org.apache.commons.math3.ml.clustering

Class FuzzyKMeansClusterer<T extends Clusterable>

• Type Parameters:
T - type of the points to cluster

public class FuzzyKMeansClusterer<T extends Clusterable>
extends Clusterer<T>
Fuzzy K-Means clustering algorithm.

The Fuzzy K-Means algorithm is a variation of the classical K-Means algorithm, with the major difference that a data point is not uniquely assigned to a single cluster. Instead, each point i has a set of weights u_ij that indicate its degree of membership in cluster j.

The algorithm then tries to minimize the objective function:

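$$ J = \sum_{i=1}^{N} \sum_{k=1}^{C} u_{ik}^{m} d_{ik}^{2} $$

where N is the number of data points, C is the number of clusters, m is the fuzziness, u_ik is the membership weight of data point i in cluster k, and d_ik is the distance between data point i and the center of cluster k.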
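
To make the objective concrete, the following minimal sketch evaluates J for a given membership matrix and distance matrix. The class `FuzzyObjective` and its method are hypothetical and not part of the library; the arrays are assumed to be indexed by data point first and cluster second, and `m` stands for the fuzziness.

```java
/** Hypothetical helper, not part of Commons Math: evaluates the fuzzy objective J. */
public final class FuzzyObjective {
    private FuzzyObjective() { }

    /**
     * @param u memberships, u[i][k] = weight of data point i in cluster k
     * @param d distances, d[i][k] = distance from data point i to center k
     * @param m fuzziness exponent, assumed to be > 1
     * @return the sum over all i and k of u[i][k]^m * d[i][k]^2
     */
    public static double objective(double[][] u, double[][] d, double m) {
        double j = 0.0;
        for (int i = 0; i < u.length; i++) {
            for (int k = 0; k < u[i].length; k++) {
                j += Math.pow(u[i][k], m) * d[i][k] * d[i][k];
            }
        }
        return j;
    }
}
```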

The algorithm requires two parameters, k and fuzziness, and accepts two optional ones, maxIterations and epsilon:

• k: the number of clusters
• fuzziness: the level of cluster fuzziness, which must be > 1.0; larger values lead to fuzzier clusters
• maxIterations: the maximum number of iterations
• epsilon: the convergence criterion, default is 1e-3
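
The effect of the fuzziness parameter can be illustrated with the standard fuzzy c-means membership rule, u_j = 1 / Σ_c (d_j / d_c)^(2/(m-1)), where d_j is the distance from a point to center j. This is only a sketch under that assumption: this page does not state which update formula the class uses, `FuzzyMembership` is a hypothetical name, and all distances are assumed to be strictly positive.

```java
/** Hypothetical helper, not part of Commons Math. */
public final class FuzzyMembership {
    private FuzzyMembership() { }

    /**
     * Computes the membership weights of one point from its distances to each center,
     * using the standard fuzzy c-means rule (an assumption, see the text above).
     *
     * @param dist distances from the point to each center, all assumed > 0
     * @param m fuzziness, assumed > 1
     * @return weights that sum to 1, one per center
     */
    public static double[] memberships(double[] dist, double m) {
        double exponent = 2.0 / (m - 1.0);
        double[] u = new double[dist.length];
        for (int j = 0; j < dist.length; j++) {
            double sum = 0.0;
            for (int c = 0; c < dist.length; c++) {
                sum += Math.pow(dist[j] / dist[c], exponent);
            }
            u[j] = 1.0 / sum;
        }
        return u;
    }
}
```

For a point at distances 1 and 3 from two centers, a fuzziness of 2 gives weights of about 0.9 and 0.1, while a fuzziness of 3 gives 0.75 and 0.25, so the larger value spreads the membership more evenly.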

The fuzzy variant of the K-Means algorithm is more robust than the classical algorithm with regard to the selection of the initial cluster centers.

Since:
3.3
Version:
$Id: FuzzyKMeansClusterer.html 908881 2014-05-15 07:10:28Z luc$
• Constructor Summary

Constructors
Constructor and Description
FuzzyKMeansClusterer(int k, double fuzziness)
Creates a new instance of a FuzzyKMeansClusterer.
FuzzyKMeansClusterer(int k, double fuzziness, int maxIterations, DistanceMeasure measure)
Creates a new instance of a FuzzyKMeansClusterer.
FuzzyKMeansClusterer(int k, double fuzziness, int maxIterations, DistanceMeasure measure, double epsilon, RandomGenerator random)
Creates a new instance of a FuzzyKMeansClusterer.
• Method Summary

Methods
Modifier and Type Method and Description
List<CentroidCluster<T>> cluster(Collection<T> dataPoints)
Performs Fuzzy K-Means cluster analysis.
List<CentroidCluster<T>> getClusters()
Returns the list of clusters resulting from the last call to cluster(Collection).
List<T> getDataPoints()
Returns an unmodifiable list of the data points used in the last call to cluster(Collection).
double getEpsilon()
Returns the convergence criterion used by this instance.
double getFuzziness()
Returns the fuzziness factor used by this instance.
int getK()
Returns the number of clusters this instance will use.
int getMaxIterations()
Returns the maximum number of iterations this instance will use.
RealMatrix getMembershipMatrix()
Returns the n × k membership matrix, where n is the number of data points and k the number of clusters.
double getObjectiveFunctionValue()
Returns the value of the objective function.
RandomGenerator getRandomGenerator()
Returns the random generator this instance will use.
• Methods inherited from class org.apache.commons.math3.ml.clustering.Clusterer

distance, getDistanceMeasure
• Methods inherited from class java.lang.Object

clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
• Constructor Detail

• FuzzyKMeansClusterer

public FuzzyKMeansClusterer(int k,
double fuzziness)
throws NumberIsTooSmallException
Creates a new instance of a FuzzyKMeansClusterer.

The Euclidean distance is used as the default distance measure.

Parameters:
k - the number of clusters to split the data into
fuzziness - the fuzziness factor, must be > 1.0
Throws:
NumberIsTooSmallException - if fuzziness <= 1.0
• FuzzyKMeansClusterer

public FuzzyKMeansClusterer(int k,
double fuzziness,
int maxIterations,
DistanceMeasure measure)
throws NumberIsTooSmallException
Creates a new instance of a FuzzyKMeansClusterer.
Parameters:
k - the number of clusters to split the data into
fuzziness - the fuzziness factor, must be > 1.0
maxIterations - the maximum number of iterations to run the algorithm for. If negative, no maximum will be used.
measure - the distance measure to use
Throws:
NumberIsTooSmallException - if fuzziness <= 1.0
• FuzzyKMeansClusterer

public FuzzyKMeansClusterer(int k,
double fuzziness,
int maxIterations,
DistanceMeasure measure,
double epsilon,
RandomGenerator random)
throws NumberIsTooSmallException
Creates a new instance of a FuzzyKMeansClusterer.
Parameters:
k - the number of clusters to split the data into
fuzziness - the fuzziness factor, must be > 1.0
maxIterations - the maximum number of iterations to run the algorithm for. If negative, no maximum will be used.
measure - the distance measure to use
epsilon - the convergence criterion (default is 1e-3)
random - random generator to use for choosing initial centers
Throws:
NumberIsTooSmallException - if fuzziness <= 1.0
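
How maxIterations and epsilon interact can be sketched with a small one-dimensional loop. This is an illustration under stated assumptions, not the library's code: `FuzzyLoopSketch` is a hypothetical class, the starting centers are passed in rather than chosen by a random generator, the standard fuzzy c-means update rules are used, and convergence is assumed to mean that the largest change in any membership value drops to epsilon or below. It also assumes every cluster keeps a non-zero total weight.

```java
/** Hypothetical 1-D sketch of a fuzzy k-means loop; not the library's code. */
public final class FuzzyLoopSketch {
    private FuzzyLoopSketch() { }

    /** Returns the final centers for 1-D data x, starting from the given centers. */
    public static double[] run(double[] x, double[] start, double m,
                               int maxIterations, double epsilon) {
        // A negative limit means "no maximum", as documented for maxIterations.
        final int max = maxIterations < 0 ? Integer.MAX_VALUE : maxIterations;
        double[] c = start.clone();
        double[][] u = memberships(x, c, m);
        int iteration = 0;
        double change;
        do {
            // Move each center to the mean of the points weighted by u^m.
            for (int j = 0; j < c.length; j++) {
                double num = 0.0;
                double den = 0.0;
                for (int i = 0; i < x.length; i++) {
                    double w = Math.pow(u[i][j], m);
                    num += w * x[i];
                    den += w;
                }
                c[j] = num / den;
            }
            // Recompute memberships and record the largest change (assumed criterion).
            double[][] next = memberships(x, c, m);
            change = 0.0;
            for (int i = 0; i < x.length; i++) {
                for (int j = 0; j < c.length; j++) {
                    change = Math.max(change, Math.abs(next[i][j] - u[i][j]));
                }
            }
            u = next;
        } while (change > epsilon && ++iteration < max);
        return c;
    }

    /** Standard fuzzy c-means memberships; a point lying on a center belongs fully to it. */
    private static double[][] memberships(double[] x, double[] c, double m) {
        double exponent = 2.0 / (m - 1.0);
        double[][] u = new double[x.length][c.length];
        for (int i = 0; i < x.length; i++) {
            // Find the first center that coincides exactly with the point, if any.
            int exact = -1;
            for (int j = 0; j < c.length; j++) {
                if (x[i] == c[j] && exact < 0) {
                    exact = j;
                }
            }
            for (int j = 0; j < c.length; j++) {
                if (exact >= 0) {
                    u[i][j] = (j == exact) ? 1.0 : 0.0;
                } else {
                    double sum = 0.0;
                    for (int k = 0; k < c.length; k++) {
                        sum += Math.pow(Math.abs(x[i] - c[j]) / Math.abs(x[i] - c[k]), exponent);
                    }
                    u[i][j] = 1.0 / sum;
                }
            }
        }
        return u;
    }
}
```

Calling `FuzzyLoopSketch.run(new double[] {0, 1, 10, 11}, new double[] {1, 10}, 2.0, -1, 1e-3)` runs with no iteration limit and stops only once the membership values settle, which is the behavior a negative maxIterations is documented to select.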
• Method Detail

• getK

public int getK()
Returns the number of clusters this instance will use.
Returns:
the number of clusters
• getFuzziness

public double getFuzziness()
Returns the fuzziness factor used by this instance.
Returns:
the fuzziness factor
• getMaxIterations

public int getMaxIterations()
Returns the maximum number of iterations this instance will use.
Returns:
the maximum number of iterations, or -1 if no maximum is set
• getEpsilon

public double getEpsilon()
Returns the convergence criterion used by this instance.
Returns:
the convergence criterion
• getRandomGenerator

public RandomGenerator getRandomGenerator()
Returns the random generator this instance will use.
Returns:
the random generator
• getMembershipMatrix

public RealMatrix getMembershipMatrix()
Returns the n × k membership matrix, where n is the number of data points and k the number of clusters.

The element U_{i,j} represents the membership value of data point i in cluster j.

Returns:
the membership matrix
Throws:
MathIllegalStateException - if cluster(Collection) has not been called before
• getObjectiveFunctionValue

public double getObjectiveFunctionValue()
Returns the value of the objective function.
Returns:
the value of the objective function as a double
Throws:
MathIllegalStateException - if cluster(Collection) has not been called before
• cluster

public List<CentroidCluster<T>> cluster(Collection<T> dataPoints)
throws MathIllegalArgumentException
Performs Fuzzy K-Means cluster analysis.
Specified by:
cluster in class Clusterer<T extends Clusterable>
Parameters:
dataPoints - the points to cluster
Returns:
the list of clusters
Throws:
MathIllegalArgumentException - if the data points are null or the number of clusters is larger than the number of data points
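
A minimal usage sketch, assuming Commons Math 3.3 or later is on the classpath. The wrapper class `FuzzyKMeansUsage`, the sample points, and the seed are hypothetical; `DoublePoint`, `EuclideanDistance`, and `JDKRandomGenerator` are library classes that this page does not describe, so consult their own documentation before relying on them.

```java
import java.util.Arrays;
import java.util.List;

import org.apache.commons.math3.ml.clustering.CentroidCluster;
import org.apache.commons.math3.ml.clustering.DoublePoint;
import org.apache.commons.math3.ml.clustering.FuzzyKMeansClusterer;
import org.apache.commons.math3.ml.distance.EuclideanDistance;
import org.apache.commons.math3.random.JDKRandomGenerator;

/** Hypothetical example class showing one way to call the clusterer. */
public final class FuzzyKMeansUsage {
    private FuzzyKMeansUsage() { }

    /** Clusters four sample points into two clusters and returns the membership matrix. */
    public static double[][] clusterAndGetMemberships() {
        List<DoublePoint> points = Arrays.asList(
            new DoublePoint(new double[] {0.0, 0.0}),
            new DoublePoint(new double[] {0.0, 1.0}),
            new DoublePoint(new double[] {10.0, 10.0}),
            new DoublePoint(new double[] {10.0, 11.0}));

        // Fixed seed (arbitrary choice) so that repeated runs start from the same state.
        JDKRandomGenerator random = new JDKRandomGenerator();
        random.setSeed(42L);

        // k = 2, fuzziness = 2.0, at most 100 iterations, epsilon = 1e-3.
        FuzzyKMeansClusterer<DoublePoint> clusterer =
            new FuzzyKMeansClusterer<DoublePoint>(
                2, 2.0, 100, new EuclideanDistance(), 1e-3, random);

        List<CentroidCluster<DoublePoint>> clusters = clusterer.cluster(points);
        for (CentroidCluster<DoublePoint> cluster : clusters) {
            System.out.println(Arrays.toString(cluster.getCenter().getPoint())
                + " -> " + cluster.getPoints().size() + " point(s)");
        }

        // Only valid after cluster(...) has been called, as documented above.
        return clusterer.getMembershipMatrix().getData();
    }
}
```

The returned array has one row per data point and one column per cluster. Calling `getMembershipMatrix()` before `cluster(...)` would instead raise the MathIllegalStateException described above.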