Class ElkanKMeansPlusPlusClusterer<T extends Clusterable>

  • Type Parameters:
    T - Type of the points to cluster.

    public class ElkanKMeansPlusPlusClusterer<T extends Clusterable>
    extends KMeansPlusPlusClusterer<T>
    Implementation of the k-means++ algorithm. It is based on
    Elkan, Charles. "Using the triangle inequality to accelerate k-means." ICML. Vol. 3. 2003.

    The algorithm uses the triangle inequality to speed up computation by reducing the number of distance calculations. Towards the last iterations of the algorithm, points that are already assigned to a cluster are unlikely to move to a new cluster, and updates of the cluster centers are usually relatively small. The triangle inequality is thus used to identify the cases where a distance computation can be skipped, because a center moved only a little, without affecting the partitioning of the points.
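
    The pruning idea can be illustrated with a minimal sketch. It is not this class's code: the class name TrianglePruningSketch and all of its methods are hypothetical, and it only shows the basic lemma from Elkan's paper (if d(c, c') >= 2 d(x, c), then d(x, c') >= d(x, c)), not the per-point upper and lower bounds that the full algorithm maintains across iterations.

```java
/** Hypothetical illustration of triangle-inequality pruning; not the library's implementation. */
final class TrianglePruningSketch {
    /** Number of point-to-center distances computed by the last call to nearestCenter. */
    static int distanceEvaluations;

    /** Euclidean distance between two vectors of equal length. */
    static double distance(double[] a, double[] b) {
        double sum = 0;
        for (int i = 0; i < a.length; i++) {
            final double d = a[i] - b[i];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }

    /** Pairwise distances between centers, computed once and reused for every point. */
    static double[][] centerDistances(double[][] centers) {
        final int k = centers.length;
        final double[][] cc = new double[k][k];
        for (int i = 0; i < k; i++) {
            for (int j = i + 1; j < k; j++) {
                cc[i][j] = cc[j][i] = distance(centers[i], centers[j]);
            }
        }
        return cc;
    }

    /** Returns the index of the center nearest to the point, skipping provably farther centers. */
    static int nearestCenter(double[] point, double[][] centers, double[][] cc) {
        distanceEvaluations = 1;
        int best = 0;
        double bestDist = distance(point, centers[0]);
        for (int j = 1; j < centers.length; j++) {
            // Triangle inequality: d(best, j) <= d(x, best) + d(x, j).
            // Hence d(best, j) >= 2 d(x, best) implies d(x, j) >= d(x, best),
            // so center j cannot be strictly closer and its distance is not needed.
            if (cc[best][j] >= 2 * bestDist) {
                continue;
            }
            final double d = distance(point, centers[j]);
            distanceEvaluations++;
            if (d < bestDist) {
                bestDist = d;
                best = j;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        final double[][] centers = {{0, 0}, {10, 0}, {0, 10}};
        final double[][] cc = centerDistances(centers);
        final int index = nearestCenter(new double[] {1, 1}, centers, cc);
        System.out.println("nearest center: " + index
                           + ", distances computed: " + distanceEvaluations);
    }
}
```

    In this sketch the saving grows with the separation of the centers: a point lying close to its current center lets most other centers be rejected from the precomputed center-to-center distances alone.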

    To seed the initial centers, we apply the algorithm described in

    Arthur, David, and Sergei Vassilvitskii. "k-means++: The advantages of careful seeding." Proceedings of the eighteenth annual ACM-SIAM symposium on Discrete algorithms. Society for Industrial and Applied Mathematics, 2007.
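
    The seeding rule from that paper can be sketched as follows: the first center is drawn uniformly, and each later center is drawn with probability proportional to the squared distance from a point to its nearest already chosen center. This is an illustration under stated assumptions, not this class's code: the class KMeansPlusPlusSeedingSketch and its methods are hypothetical, and java.util.Random stands in for the UniformRandomProvider that the constructors accept.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

/** Hypothetical illustration of k-means++ seeding; not the library's implementation. */
final class KMeansPlusPlusSeedingSketch {
    static double squaredDistance(double[] a, double[] b) {
        double sum = 0;
        for (int i = 0; i < a.length; i++) {
            final double d = a[i] - b[i];
            sum += d * d;
        }
        return sum;
    }

    /**
     * Chooses up to k initial centers among the points.
     * Fewer than k are returned if the points have fewer than k distinct locations.
     */
    static List<double[]> chooseInitialCenters(double[][] points, int k, Random rng) {
        final int n = points.length;
        final List<double[]> centers = new ArrayList<>();
        // First center: uniform choice among all points.
        centers.add(points[rng.nextInt(n)]);
        // minSq[i]: squared distance from point i to its nearest chosen center.
        final double[] minSq = new double[n];
        for (int i = 0; i < n; i++) {
            minSq[i] = squaredDistance(points[i], centers.get(0));
        }
        while (centers.size() < k) {
            double total = 0;
            for (final double d : minSq) {
                total += d;
            }
            if (total == 0) {
                break; // Every point coincides with an already chosen center.
            }
            // Draw an index with probability minSq[i] / total.
            final double threshold = rng.nextDouble() * total;
            double cumulative = 0;
            int chosen = -1;
            for (int i = 0; i < n; i++) {
                cumulative += minSq[i];
                if (cumulative > threshold) {
                    chosen = i;
                    break;
                }
            }
            if (chosen < 0) {
                // Guard against rounding: fall back to the last point with positive weight.
                for (int i = n - 1; i >= 0; i--) {
                    if (minSq[i] > 0) {
                        chosen = i;
                        break;
                    }
                }
            }
            centers.add(points[chosen]);
            // Update the nearest-center distances with the new center.
            for (int i = 0; i < n; i++) {
                minSq[i] = Math.min(minSq[i], squaredDistance(points[i], points[chosen]));
            }
        }
        return centers;
    }

    public static void main(String[] args) {
        final double[][] points = {{0, 0}, {0, 0}, {5, 5}, {5, 5}, {9, 0}};
        for (final double[] c : chooseInitialCenters(points, 3, new Random(42))) {
            System.out.println(c[0] + ", " + c[1]);
        }
    }
}
```

    Because a point that coincides with a chosen center has weight zero, the sketch never selects the same location twice, which is what spreads the initial centers across the data.
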
    • Constructor Detail

      • ElkanKMeansPlusPlusClusterer

        public ElkanKMeansPlusPlusClusterer(int k,
                                            int maxIterations,
                                            DistanceMeasure measure,
                                            org.apache.commons.rng.UniformRandomProvider random)
        Parameters:
        k - Number of clusters.
        maxIterations - Maximum number of iterations allowed.
        measure - Distance measure.
        random - Random generator.
      • ElkanKMeansPlusPlusClusterer

        public ElkanKMeansPlusPlusClusterer(int k,
                                            int maxIterations,
                                            DistanceMeasure measure,
                                            org.apache.commons.rng.UniformRandomProvider random,
                                            KMeansPlusPlusClusterer.EmptyClusterStrategy emptyStrategy)
        Parameters:
        k - Number of clusters.
        maxIterations - Maximum number of iterations allowed.
        measure - Distance measure.
        random - Random generator.
        emptyStrategy - Strategy for handling empty clusters that may appear while the algorithm runs.