Class FuzzyKMeansClusterer<T extends Clusterable>
- java.lang.Object
-
- org.apache.commons.math4.legacy.ml.clustering.Clusterer<T>
-
- org.apache.commons.math4.legacy.ml.clustering.FuzzyKMeansClusterer<T>
-
- Type Parameters:
T
- type of the points to cluster
public class FuzzyKMeansClusterer<T extends Clusterable> extends Clusterer<T>
Fuzzy K-Means clustering algorithm.
The Fuzzy K-Means algorithm is a variation of the classical K-Means algorithm, with the major difference that a data point is not uniquely assigned to a single cluster. Instead, each point i has a set of weights u_{ij} which indicate its degree of membership in cluster j.
The algorithm then tries to minimize the objective function:
J = \sum_{i=1}^{C} \sum_{k=1}^{N} u_{ik}^{m} d_{ik}^{2}
with d_{ik} being the distance between data point i and the cluster center k, and m the fuzziness factor.
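As an illustration of the objective function, the following plain-Java sketch evaluates J from a membership matrix and a matrix of point-to-center distances. It is not the library's implementation: the class name FuzzyObjectiveSketch and the array layout (rows index data points, columns index clusters) are assumptions made for this example.

```java
// Hypothetical helper for illustration only; not part of Commons Math.
final class FuzzyObjectiveSketch {

    /**
     * Sums u[i][c]^m * d[i][c]^2 over all data points i and clusters c.
     *
     * @param u membership weights, one row per data point
     * @param d distances between data points and cluster centers, same shape as u
     * @param m fuzziness factor (assumed to be greater than 1)
     */
    static double objective(double[][] u, double[][] d, double m) {
        double sum = 0.0;
        for (int i = 0; i < u.length; i++) {
            for (int c = 0; c < u[i].length; c++) {
                sum += Math.pow(u[i][c], m) * d[i][c] * d[i][c];
            }
        }
        return sum;
    }

    public static void main(String[] args) {
        double[][] u = {{1.0, 0.0}, {0.5, 0.5}};
        double[][] d = {{0.0, 2.0}, {1.0, 1.0}};
        // First point contributes 0, second point contributes 0.25 + 0.25.
        System.out.println(objective(u, d, 2.0)); // prints 0.5
    }
}
```

A point that sits exactly on a center with full membership contributes nothing, so J shrinks as centers move toward the points that belong to them most strongly.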
The algorithm requires two parameters:
- k: the number of clusters
- fuzziness: determines the level of cluster fuzziness; larger values lead to fuzzier clusters
Two further parameters are optional:
- maxIterations: the maximum number of iterations
- epsilon: the convergence criterion, default is 1e-3
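To make the effect of the fuzziness parameter concrete, the sketch below computes the membership weights of a single point from its distances to the cluster centers. It uses the textbook fuzzy c-means update u_j = 1 / \sum_k (d_j / d_k)^{2/(m-1)}, which this page does not state, so treat it as an assumption rather than a description of the library's code. The class name FuzzyMembershipSketch is hypothetical, and all distances are assumed to be strictly positive.

```java
// Hypothetical helper for illustration only; not part of Commons Math.
final class FuzzyMembershipSketch {

    /**
     * Textbook fuzzy c-means membership update for one data point.
     *
     * @param dist distances from the point to each cluster center (all > 0)
     * @param m    fuzziness factor, must be greater than 1
     * @return membership weights that sum to 1
     */
    static double[] memberships(double[] dist, double m) {
        double exponent = 2.0 / (m - 1.0);
        double[] u = new double[dist.length];
        for (int j = 0; j < dist.length; j++) {
            double sum = 0.0;
            for (int k = 0; k < dist.length; k++) {
                sum += Math.pow(dist[j] / dist[k], exponent);
            }
            u[j] = 1.0 / sum;
        }
        return u;
    }

    public static void main(String[] args) {
        double[] dist = {1.0, 3.0};
        // With m = 2 the nearer cluster receives a weight of roughly 0.9.
        System.out.println(memberships(dist, 2.0)[0]);
        // With m = 3 the same cluster receives roughly 0.75: a fuzzier split.
        System.out.println(memberships(dist, 3.0)[0]);
    }
}
```

For the same distances, raising the fuzziness pulls the weights toward an even split, which matches the statement that larger values lead to fuzzier clusters.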
The fuzzy variant of the K-Means algorithm is more robust with regard to the selection of the initial cluster centers.
- Since:
- 3.3
-
-
Constructor Summary
Constructors
- FuzzyKMeansClusterer(int k, double fuzziness)
  Creates a new instance of a FuzzyKMeansClusterer.
- FuzzyKMeansClusterer(int k, double fuzziness, int maxIterations, DistanceMeasure measure)
  Creates a new instance of a FuzzyKMeansClusterer.
- FuzzyKMeansClusterer(int k, double fuzziness, int maxIterations, DistanceMeasure measure, double epsilon, org.apache.commons.rng.UniformRandomProvider random)
  Creates a new instance of a FuzzyKMeansClusterer.
-
Method Summary
- List<CentroidCluster<T>> cluster(Collection<T> dataPoints)
  Performs Fuzzy K-Means cluster analysis.
- List<CentroidCluster<T>> getClusters()
  Returns the list of clusters resulting from the last call to cluster(Collection).
- List<T> getDataPoints()
  Returns an unmodifiable list of the data points used in the last call to cluster(Collection).
- double getEpsilon()
  Returns the convergence criterion used by this instance.
- double getFuzziness()
  Returns the fuzziness factor used by this instance.
- int getK()
  Returns the number of clusters this instance will use.
- int getMaxIterations()
  Returns the maximum number of iterations this instance will use.
- RealMatrix getMembershipMatrix()
  Returns the n x k membership matrix, where n is the number of data points and k the number of clusters.
- double getObjectiveFunctionValue()
  Returns the value of the objective function.
- org.apache.commons.rng.UniformRandomProvider getRandomGenerator()
  Returns the random generator this instance will use.
Methods inherited from class org.apache.commons.math4.legacy.ml.clustering.Clusterer
distance, getDistanceMeasure
-
-
-
-
Constructor Detail
-
FuzzyKMeansClusterer
public FuzzyKMeansClusterer(int k, double fuzziness)
Creates a new instance of a FuzzyKMeansClusterer.
The Euclidean distance will be used as the default distance measure.
- Parameters:
k
- the number of clusters to split the data into
fuzziness
- the fuzziness factor, must be > 1.0
- Throws:
NumberIsTooSmallException
- if fuzziness <= 1.0
-
FuzzyKMeansClusterer
public FuzzyKMeansClusterer(int k, double fuzziness, int maxIterations, DistanceMeasure measure)
Creates a new instance of a FuzzyKMeansClusterer.
- Parameters:
k
- the number of clusters to split the data into
fuzziness
- the fuzziness factor, must be > 1.0
maxIterations
- the maximum number of iterations to run the algorithm for. If negative, no maximum will be used.
measure
- the distance measure to use
- Throws:
NumberIsTooSmallException
- if fuzziness <= 1.0
-
FuzzyKMeansClusterer
public FuzzyKMeansClusterer(int k, double fuzziness, int maxIterations, DistanceMeasure measure, double epsilon, org.apache.commons.rng.UniformRandomProvider random)
Creates a new instance of a FuzzyKMeansClusterer.
- Parameters:
k
- the number of clusters to split the data into
fuzziness
- the fuzziness factor, must be > 1.0
maxIterations
- the maximum number of iterations to run the algorithm for. If negative, no maximum will be used.
measure
- the distance measure to use
epsilon
- the convergence criterion (default is 1e-3)
random
- random generator to use for choosing initial centers
- Throws:
NumberIsTooSmallException
- if fuzziness <= 1.0
-
-
Method Detail
-
getK
public int getK()
Returns the number of clusters this instance will use.
- Returns:
- the number of clusters
-
getFuzziness
public double getFuzziness()
Returns the fuzziness factor used by this instance.
- Returns:
- the fuzziness factor
-
getMaxIterations
public int getMaxIterations()
Returns the maximum number of iterations this instance will use.
- Returns:
- the maximum number of iterations, or -1 if no maximum is set
-
getEpsilon
public double getEpsilon()
Returns the convergence criterion used by this instance.
- Returns:
- the convergence criterion
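The interplay between epsilon and maxIterations can be sketched as a stopping rule. This page does not say what quantity is compared against epsilon; the sketch assumes it is the largest change in any membership weight between two consecutive iterations, and it treats a negative maxIterations as "no maximum", as described for the constructors. The class and method names are hypothetical.

```java
// Hypothetical helper for illustration only; not part of Commons Math.
final class FuzzyStoppingSketch {

    /** Largest absolute difference between two membership matrices of equal shape. */
    static double maxChange(double[][] previous, double[][] current) {
        double max = 0.0;
        for (int i = 0; i < current.length; i++) {
            for (int j = 0; j < current[i].length; j++) {
                max = Math.max(max, Math.abs(current[i][j] - previous[i][j]));
            }
        }
        return max;
    }

    /**
     * Decides whether another iteration should run.
     * ASSUMPTION: convergence means the membership change dropped to epsilon or below.
     */
    static boolean shouldContinue(double change, double epsilon,
                                  int iterationsDone, int maxIterations) {
        // A negative maximum is treated as "no maximum".
        int limit = maxIterations < 0 ? Integer.MAX_VALUE : maxIterations;
        return change > epsilon && iterationsDone < limit;
    }

    public static void main(String[] args) {
        double[][] before = {{0.5, 0.5}};
        double[][] after = {{0.6, 0.4}};
        double change = maxChange(before, after);
        System.out.println(shouldContinue(change, 1e-3, 5, -1));  // prints true
        System.out.println(shouldContinue(change, 1e-3, 10, 10)); // prints false
    }
}
```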
-
getRandomGenerator
public org.apache.commons.rng.UniformRandomProvider getRandomGenerator()
Returns the random generator this instance will use.
- Returns:
- the random generator
-
getMembershipMatrix
public RealMatrix getMembershipMatrix()
Returns the n x k membership matrix, where n is the number of data points and k the number of clusters.
The element U_{i,j} represents the membership value of data point i in cluster j.
- Returns:
- the membership matrix
- Throws:
MathIllegalStateException
- if cluster(Collection) has not been called before
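One common use of the membership matrix is to derive a hard assignment by picking, for each data point, the cluster with the largest weight. The sketch below works on a plain double[][] with one row per data point, which is an assumption of this example; with the real class, such an array could be obtained from the returned RealMatrix, for instance through its getData() method. The class name HardAssignmentSketch is hypothetical.

```java
import java.util.Arrays;

// Hypothetical helper for illustration only; not part of Commons Math.
final class HardAssignmentSketch {

    /**
     * Returns, for each row (data point), the column index (cluster)
     * holding the largest membership value. Ties go to the lowest index.
     */
    static int[] hardAssignments(double[][] membership) {
        int[] best = new int[membership.length];
        for (int i = 0; i < membership.length; i++) {
            int argMax = 0;
            for (int j = 1; j < membership[i].length; j++) {
                if (membership[i][j] > membership[i][argMax]) {
                    argMax = j;
                }
            }
            best[i] = argMax;
        }
        return best;
    }

    public static void main(String[] args) {
        double[][] u = {{0.9, 0.1}, {0.2, 0.8}, {0.5, 0.5}};
        System.out.println(Arrays.toString(hardAssignments(u))); // prints [0, 1, 0]
    }
}
```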
-
getDataPoints
public List<T> getDataPoints()
Returns an unmodifiable list of the data points used in the last call to cluster(Collection).
- Returns:
- the list of data points, or null if cluster(Collection) has not been called before.
-
getClusters
public List<CentroidCluster<T>> getClusters()
Returns the list of clusters resulting from the last call to cluster(Collection).
- Returns:
- the list of clusters, or null if cluster(Collection) has not been called before.
-
getObjectiveFunctionValue
public double getObjectiveFunctionValue()
Returns the value of the objective function.
- Returns:
- the objective function evaluation as a double value
- Throws:
MathIllegalStateException
- if cluster(Collection) has not been called before
-
cluster
public List<CentroidCluster<T>> cluster(Collection<T> dataPoints)
Performs Fuzzy K-Means cluster analysis.
- Specified by:
cluster in class Clusterer<T extends Clusterable>
- Parameters:
dataPoints
- the points to cluster
- Returns:
- the list of clusters
- Throws:
MathIllegalArgumentException
- if the data points are null or the number of clusters is larger than the number of data points
-
-