Class KMeansPlusPlusClusterer<T extends Clusterable>
- java.lang.Object
  - org.apache.commons.math4.legacy.ml.clustering.Clusterer<T>
    - org.apache.commons.math4.legacy.ml.clustering.KMeansPlusPlusClusterer<T>
- Type Parameters:
T - type of the points to cluster
- Direct Known Subclasses:
ElkanKMeansPlusPlusClusterer, MiniBatchKMeansClusterer
public class KMeansPlusPlusClusterer<T extends Clusterable> extends Clusterer<T>
Clustering algorithm based on the k-means++ algorithm of David Arthur and Sergei Vassilvitskii.
- Since:
3.2
- See Also:
K-means++ (wikipedia)
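The seeding idea behind k-means++ can be sketched as follows. This is a minimal illustration using only the JDK, not this class's implementation: the class name `KMeansPlusPlusSeedingSketch`, the method names, the use of `double[]` points, `java.util.Random`, and the squared Euclidean distance are all assumptions made for the example. The first center is drawn uniformly at random, and each later center is drawn with probability proportional to its squared distance from the nearest center already chosen.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Random;

/** Illustrative k-means++ seeding; a sketch, not the library's implementation. */
final class KMeansPlusPlusSeedingSketch {

    private KMeansPlusPlusSeedingSketch() {}

    /** Squared Euclidean distance between two points of equal dimension. */
    static double squaredDistance(double[] a, double[] b) {
        double sum = 0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return sum;
    }

    /** Picks k distinct input points as initial centers using D^2 weighting. */
    static List<double[]> chooseInitialCenters(List<double[]> points, int k, Random random) {
        int n = points.size();
        if (k <= 0 || k > n) {
            throw new IllegalArgumentException("k must be between 1 and the number of points");
        }
        List<double[]> centers = new ArrayList<>(k);
        boolean[] taken = new boolean[n];
        // minDistSq[i]: squared distance from point i to its nearest chosen center.
        double[] minDistSq = new double[n];
        Arrays.fill(minDistSq, Double.POSITIVE_INFINITY);

        // First center: uniform draw over all points.
        int next = random.nextInt(n);
        while (true) {
            centers.add(points.get(next));
            taken[next] = true;
            if (centers.size() == k) {
                return centers;
            }
            // Refresh nearest-center distances against the newest center.
            double total = 0;
            for (int i = 0; i < n; i++) {
                if (taken[i]) {
                    continue;
                }
                double d = squaredDistance(points.get(i), points.get(next));
                minDistSq[i] = Math.min(minDistSq[i], d);
                total += minDistSq[i];
            }
            // Draw the next center with probability proportional to minDistSq.
            double r = random.nextDouble() * total;
            double cumulative = 0;
            int chosen = -1;
            int lastFree = -1;
            for (int i = 0; i < n; i++) {
                if (taken[i]) {
                    continue;
                }
                lastFree = i;
                cumulative += minDistSq[i];
                if (cumulative > r) {
                    chosen = i;
                    break;
                }
            }
            // Fallback when every remaining point coincides with a center
            // (total == 0) or when rounding prevents a hit: take the last free point.
            next = (chosen >= 0) ? chosen : lastFree;
        }
    }
}
```

Passing a seeded `Random` makes the choice of centers reproducible, which is the same reason the constructors of this class accept a random generator.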
-
-
Nested Class Summary
Nested Classes
- static class KMeansPlusPlusClusterer.EmptyClusterStrategy
  Strategies to use for replacing an empty cluster.
-
Constructor Summary
Constructors
- KMeansPlusPlusClusterer(int k)
  Build a clusterer.
- KMeansPlusPlusClusterer(int k, int maxIterations)
  Build a clusterer.
- KMeansPlusPlusClusterer(int k, int maxIterations, DistanceMeasure measure)
  Build a clusterer.
- KMeansPlusPlusClusterer(int k, int maxIterations, DistanceMeasure measure, org.apache.commons.rng.UniformRandomProvider random)
  Build a clusterer.
- KMeansPlusPlusClusterer(int k, int maxIterations, DistanceMeasure measure, org.apache.commons.rng.UniformRandomProvider random, KMeansPlusPlusClusterer.EmptyClusterStrategy emptyStrategy)
  Build a clusterer.
-
Method Summary
Methods
- List<CentroidCluster<T>> cluster(Collection<T> points)
  Runs the K-means++ clustering algorithm.
- int getMaxIterations()
  Returns the maximum number of iterations this instance will use.
- int getNumberOfClusters()
  Returns the number of clusters this instance will use.
Methods inherited from class org.apache.commons.math4.legacy.ml.clustering.Clusterer
distance, getDistanceMeasure
-
-
-
-
Constructor Detail
-
KMeansPlusPlusClusterer
public KMeansPlusPlusClusterer(int k)
Build a clusterer.
The default strategy for handling empty clusters that may appear during algorithm iterations is to split the cluster with the largest distance variance.
The Euclidean distance will be used as the default distance measure.
- Parameters:
k - the number of clusters to split the data into
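To make "the cluster with the largest distance variance" concrete, here is a minimal sketch of how such a cluster could be selected. It is an illustration under stated assumptions, not the library's code: the class `LargestVarianceSketch` and its methods are hypothetical, points are plain `double[]` arrays, the distance is Euclidean, and the population variance is used (the library may use a different variance estimator and different tie-breaking).

```java
import java.util.List;

/** Illustrative selection of the cluster whose point-to-center distances vary most. */
final class LargestVarianceSketch {

    private LargestVarianceSketch() {}

    /** Euclidean distance between two points of equal dimension. */
    static double distance(double[] a, double[] b) {
        double sum = 0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }

    /** Population variance of the distances from each member to the center. */
    static double distanceVariance(List<double[]> members, double[] center) {
        if (members.isEmpty()) {
            return 0;
        }
        // First pass: mean distance.
        double mean = 0;
        for (double[] p : members) {
            mean += distance(p, center);
        }
        mean /= members.size();
        // Second pass: mean squared deviation from that mean.
        double variance = 0;
        for (double[] p : members) {
            double dev = distance(p, center) - mean;
            variance += dev * dev;
        }
        return variance / members.size();
    }

    /** Index of the cluster with the largest distance variance; first one wins ties. */
    static int indexOfLargestVariance(List<List<double[]>> clusters, List<double[]> centers) {
        if (clusters.isEmpty() || clusters.size() != centers.size()) {
            throw new IllegalArgumentException("need one center per cluster and at least one cluster");
        }
        int best = 0;
        double bestVariance = distanceVariance(clusters.get(0), centers.get(0));
        for (int i = 1; i < clusters.size(); i++) {
            double v = distanceVariance(clusters.get(i), centers.get(i));
            if (v > bestVariance) {
                bestVariance = v;
                best = i;
            }
        }
        return best;
    }
}
```

Once that cluster is identified, one of its points can be moved out to seed the empty cluster; how the point is picked is a further design choice that this sketch leaves open.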
-
KMeansPlusPlusClusterer
public KMeansPlusPlusClusterer(int k, int maxIterations)
Build a clusterer.
The default strategy for handling empty clusters that may appear during algorithm iterations is to split the cluster with the largest distance variance.
The Euclidean distance will be used as the default distance measure.
- Parameters:
k - the number of clusters to split the data into
maxIterations - the maximum number of iterations to run the algorithm for. If negative, no maximum will be used.
-
KMeansPlusPlusClusterer
public KMeansPlusPlusClusterer(int k, int maxIterations, DistanceMeasure measure)
Build a clusterer.
The default strategy for handling empty clusters that may appear during algorithm iterations is to split the cluster with the largest distance variance.
- Parameters:
k - the number of clusters to split the data into
maxIterations - the maximum number of iterations to run the algorithm for.
measure - the distance measure to use
- Throws:
NotStrictlyPositiveException - if k <= 0.
-
KMeansPlusPlusClusterer
public KMeansPlusPlusClusterer(int k, int maxIterations, DistanceMeasure measure, org.apache.commons.rng.UniformRandomProvider random)
Build a clusterer.
The default strategy for handling empty clusters that may appear during algorithm iterations is to split the cluster with the largest distance variance.
- Parameters:
k - the number of clusters to split the data into
maxIterations - the maximum number of iterations to run the algorithm for. If negative, no maximum will be used.
measure - the distance measure to use
random - random generator to use for choosing initial centers
-
KMeansPlusPlusClusterer
public KMeansPlusPlusClusterer(int k, int maxIterations, DistanceMeasure measure, org.apache.commons.rng.UniformRandomProvider random, KMeansPlusPlusClusterer.EmptyClusterStrategy emptyStrategy)
Build a clusterer.
- Parameters:
k - the number of clusters to split the data into
maxIterations - the maximum number of iterations to run the algorithm for.
measure - the distance measure to use
random - random generator to use for choosing initial centers
emptyStrategy - strategy to use for handling empty clusters that may appear during algorithm iterations
- Throws:
NotStrictlyPositiveException - if k <= 0 or maxIterations <= 0.
-
-
Method Detail
-
getNumberOfClusters
public int getNumberOfClusters()
Returns the number of clusters this instance will use.
- Returns:
the number of clusters
-
getMaxIterations
public int getMaxIterations()
Returns the maximum number of iterations this instance will use.
- Returns:
the maximum number of iterations, or -1 if no maximum is set
-
cluster
public List<CentroidCluster<T>> cluster(Collection<T> points)
Runs the K-means++ clustering algorithm.
- Specified by:
cluster in class Clusterer<T extends Clusterable>
- Parameters:
points - the points to cluster
- Returns:
a list of clusters containing the points
- Throws:
MathIllegalArgumentException - if the data points are null or the number of clusters is larger than the number of data points
ConvergenceException - if an empty cluster is encountered and the empty cluster strategy is set to KMeansPlusPlusClusterer.EmptyClusterStrategy.ERROR
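The documented failure conditions can be illustrated with a small JDK-only sketch of one k-means iteration (assign each point to its nearest center, then recompute the centers). Everything here is hypothetical and is not the code of this class: `LloydIterationSketch` and its methods are invented for the example, `IllegalArgumentException` stands in for `MathIllegalArgumentException`, and `IllegalStateException` stands in for the `ConvergenceException` raised under the `ERROR` strategy.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;

/** Illustrative single k-means iteration with the documented precondition checks. */
final class LloydIterationSketch {

    private LloydIterationSketch() {}

    /** Mirrors the documented argument checks, using a JDK exception as a stand-in. */
    static void checkArguments(Collection<double[]> points, int k) {
        if (points == null) {
            throw new IllegalArgumentException("points must not be null");
        }
        if (k > points.size()) {
            throw new IllegalArgumentException("more clusters than data points");
        }
    }

    /** Index of the center nearest to the point (squared Euclidean distance). */
    static int nearestCenter(double[] point, List<double[]> centers) {
        int best = 0;
        double bestDist = Double.POSITIVE_INFINITY;
        for (int c = 0; c < centers.size(); c++) {
            double sum = 0;
            for (int j = 0; j < point.length; j++) {
                double d = point[j] - centers.get(c)[j];
                sum += d * d;
            }
            if (sum < bestDist) {
                bestDist = sum;
                best = c;
            }
        }
        return best;
    }

    /** Assignment step: one cluster index per point. */
    static int[] assign(List<double[]> points, List<double[]> centers) {
        int[] assignment = new int[points.size()];
        for (int i = 0; i < points.size(); i++) {
            assignment[i] = nearestCenter(points.get(i), centers);
        }
        return assignment;
    }

    /** Update step: each new center is the mean of its points; an empty cluster is an error. */
    static List<double[]> updateCenters(List<double[]> points, int[] assignment, int k) {
        int dim = points.get(0).length;
        double[][] sums = new double[k][dim];
        int[] counts = new int[k];
        for (int i = 0; i < points.size(); i++) {
            int c = assignment[i];
            counts[c]++;
            for (int j = 0; j < dim; j++) {
                sums[c][j] += points.get(i)[j];
            }
        }
        List<double[]> centers = new ArrayList<>(k);
        for (int c = 0; c < k; c++) {
            if (counts[c] == 0) {
                // Stand-in for the behaviour documented for the ERROR strategy.
                throw new IllegalStateException("empty cluster at index " + c);
            }
            for (int j = 0; j < dim; j++) {
                sums[c][j] /= counts[c];
            }
            centers.add(sums[c]);
        }
        return centers;
    }
}
```

A full run would repeat `assign` and `updateCenters` until the assignment stops changing or the iteration limit is reached. A strategy other than `ERROR` would repair the empty cluster instead of throwing.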
-
-