Class ElkanKMeansPlusPlusClusterer<T extends Clusterable>
- java.lang.Object
-
- org.apache.commons.math4.legacy.ml.clustering.Clusterer<T>
-
- org.apache.commons.math4.legacy.ml.clustering.KMeansPlusPlusClusterer<T>
-
- org.apache.commons.math4.legacy.ml.clustering.ElkanKMeansPlusPlusClusterer<T>
-
- Type Parameters:
T
- Type of the points to cluster.
public class ElkanKMeansPlusPlusClusterer<T extends Clusterable> extends KMeansPlusPlusClusterer<T>
Implementation of the k-means++ algorithm. It is based on Elkan, Charles. "Using the triangle inequality to accelerate k-means." ICML. Vol. 3. 2003.
The algorithm uses the triangle inequality to speed up computation by reducing the number of distance calculations. Towards the last iterations of the algorithm, points that are already assigned to a cluster are unlikely to move to a new one, and updates of cluster centers are usually relatively small. The triangle inequality is thus used to identify the cases where a distance computation can be skipped because a center has moved only a little, without affecting the partitioning of the points.
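The pruning idea can be illustrated with a minimal, self-contained sketch. It is not the code of this class: the names TriangleInequalitySketch, assign and distanceEvaluations are hypothetical, and it shows only one consequence of the triangle inequality (if d(c, c') >= 2 d(x, c), then c' cannot be closer to x than c), without the per-point bounds that Elkan's full algorithm maintains across iterations.

```java
/** Illustrative sketch only; hypothetical names, not the Commons Math implementation. */
final class TriangleInequalitySketch {
    /** Number of point-to-center distances evaluated by the last call to assign. */
    static int distanceEvaluations;

    /** Plain Euclidean distance between two vectors of equal length. */
    static double euclidean(double[] a, double[] b) {
        double sum = 0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }

    /**
     * Assigns each point to its nearest center. A candidate center c_j is skipped
     * whenever d(c_best, c_j) >= 2 * d(x, c_best), because the triangle inequality
     * then guarantees d(x, c_j) >= d(x, c_best).
     */
    static int[] assign(double[][] points, double[][] centers) {
        int k = centers.length;
        // Center-to-center distances, computed once per assignment pass.
        double[][] cc = new double[k][k];
        for (int i = 0; i < k; i++) {
            for (int j = i + 1; j < k; j++) {
                cc[i][j] = euclidean(centers[i], centers[j]);
                cc[j][i] = cc[i][j];
            }
        }
        distanceEvaluations = 0;
        int[] assignment = new int[points.length];
        for (int p = 0; p < points.length; p++) {
            int best = 0;
            double bestDist = euclidean(points[p], centers[0]);
            distanceEvaluations++;
            for (int j = 1; j < k; j++) {
                if (cc[best][j] >= 2 * bestDist) {
                    continue; // c_j cannot beat the current best: skip the distance
                }
                double d = euclidean(points[p], centers[j]);
                distanceEvaluations++;
                if (d < bestDist) {
                    bestDist = d;
                    best = j;
                }
            }
            assignment[p] = best;
        }
        return assignment;
    }
}
```

Calling TriangleInequalitySketch.assign(points, centers) returns the index of the nearest center for each point; comparing distanceEvaluations with points.length * centers.length shows how many distance computations the test avoided. The saving grows when centers are well separated relative to the point-to-center distances.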
To seed the initial centers, we apply the algorithm described in
Arthur, David, and Sergei Vassilvitskii. "k-means++: The advantages of careful seeding." Proceedings of the eighteenth annual ACM-SIAM symposium on Discrete algorithms. Society for Industrial and Applied Mathematics, 2007.
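As a rough illustration of the seeding step, the sketch below picks the first center uniformly at random and each further center with probability proportional to its squared distance from the nearest center already chosen. It uses java.util.Random and plain double[] points rather than the types of this package; the names KMeansPlusPlusSeedingSketch and seed are hypothetical, and the sketch assumes the data contains at least k distinct points.

```java
import java.util.Random;

/** Illustrative sketch only; hypothetical names, not the Commons Math implementation. */
final class KMeansPlusPlusSeedingSketch {
    /** Squared Euclidean distance between two vectors of equal length. */
    static double squaredDistance(double[] a, double[] b) {
        double sum = 0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return sum;
    }

    /** Returns the indices of k points selected as initial centers. */
    static int[] seed(double[][] points, int k, Random rng) {
        int n = points.length;
        int[] chosen = new int[k];
        // First center: uniform at random.
        chosen[0] = rng.nextInt(n);
        // minSq[i] = squared distance from point i to its nearest chosen center.
        double[] minSq = new double[n];
        for (int i = 0; i < n; i++) {
            minSq[i] = squaredDistance(points[i], points[chosen[0]]);
        }
        for (int c = 1; c < k; c++) {
            double total = 0;
            for (double d : minSq) {
                total += d;
            }
            // Draw index i with probability minSq[i] / total.
            double r = rng.nextDouble() * total;
            double cumulative = 0;
            int next = -1;
            for (int i = 0; i < n; i++) {
                if (minSq[i] == 0) {
                    continue; // already a center, or a duplicate of one
                }
                next = i; // last eligible index doubles as a rounding fallback
                cumulative += minSq[i];
                if (cumulative > r) {
                    break;
                }
            }
            if (next < 0) {
                throw new IllegalArgumentException("fewer than k distinct points");
            }
            chosen[c] = next;
            // Refresh nearest-center distances with the newly chosen center.
            for (int i = 0; i < n; i++) {
                minSq[i] = Math.min(minSq[i], squaredDistance(points[i], points[next]));
            }
        }
        return chosen;
    }
}
```

Because a chosen point has weight zero afterwards, the same index is never selected twice. Passing a seeded Random makes the selection reproducible.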
-
-
Nested Class Summary
-
Nested classes/interfaces inherited from class org.apache.commons.math4.legacy.ml.clustering.KMeansPlusPlusClusterer
KMeansPlusPlusClusterer.EmptyClusterStrategy
-
-
Constructor Summary
Constructor
ElkanKMeansPlusPlusClusterer(int k)
ElkanKMeansPlusPlusClusterer(int k, int maxIterations, DistanceMeasure measure, org.apache.commons.rng.UniformRandomProvider random)
ElkanKMeansPlusPlusClusterer(int k, int maxIterations, DistanceMeasure measure, org.apache.commons.rng.UniformRandomProvider random, KMeansPlusPlusClusterer.EmptyClusterStrategy emptyStrategy)
-
Method Summary
List<CentroidCluster<T>> cluster(Collection<T> points)
Runs the K-means++ clustering algorithm.
Methods inherited from class org.apache.commons.math4.legacy.ml.clustering.KMeansPlusPlusClusterer
getMaxIterations, getNumberOfClusters
-
Methods inherited from class org.apache.commons.math4.legacy.ml.clustering.Clusterer
distance, getDistanceMeasure
-
-
-
-
Constructor Detail
-
ElkanKMeansPlusPlusClusterer
public ElkanKMeansPlusPlusClusterer(int k)
- Parameters:
k
- Clustering parameter (number of clusters).
-
ElkanKMeansPlusPlusClusterer
public ElkanKMeansPlusPlusClusterer(int k, int maxIterations, DistanceMeasure measure, org.apache.commons.rng.UniformRandomProvider random)
- Parameters:
k
- Clustering parameter (number of clusters).
maxIterations
- Allowed number of iterations.
measure
- Distance measure.
random
- Random generator.
-
ElkanKMeansPlusPlusClusterer
public ElkanKMeansPlusPlusClusterer(int k, int maxIterations, DistanceMeasure measure, org.apache.commons.rng.UniformRandomProvider random, KMeansPlusPlusClusterer.EmptyClusterStrategy emptyStrategy)
- Parameters:
k
- Clustering parameter (number of clusters).
maxIterations
- Allowed number of iterations.
measure
- Distance measure.
random
- Random generator.
emptyStrategy
- Strategy for handling empty clusters that may appear while the algorithm runs.
-
-
Method Detail
-
cluster
public List<CentroidCluster<T>> cluster(Collection<T> points)
Runs the K-means++ clustering algorithm.
- Overrides:
cluster
in class KMeansPlusPlusClusterer<T extends Clusterable>
- Parameters:
points
- the points to cluster
- Returns:
- a list of clusters containing the points
-
-