org.apache.commons.math3.ml.clustering

Class KMeansPlusPlusClusterer<T extends Clusterable>

• Type Parameters:
T - type of the points to cluster

public class KMeansPlusPlusClusterer<T extends Clusterable>
extends Clusterer<T>
Clustering algorithm based on David Arthur and Sergei Vassilvitskii's k-means++ algorithm.
Since:
3.2
Version:
$Id: KMeansPlusPlusClusterer.html 908881 2014-05-15 07:10:28Z luc$
See Also:
K-means++ (Wikipedia)
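The class description names k-means++ without spelling out its seeding rule. A minimal plain-Java sketch of that rule follows. The class and method names are hypothetical, it assumes squared Euclidean distance, and it is not the library's source. The first center is drawn uniformly at random, and each later center is drawn with probability proportional to D(x)², the squared distance from x to its nearest already-chosen center, so points far from the current centers are favored.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

/** Hypothetical sketch of k-means++ seeding (D^2 sampling); not the Commons Math source. */
final class KMeansPlusPlusSeedingSketch {

    private KMeansPlusPlusSeedingSketch() { }

    /** Squared Euclidean distance between two points of equal dimension. */
    static double squaredDistance(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return sum;
    }

    /** Chooses k initial centers from the given points. */
    static List<double[]> chooseCenters(List<double[]> points, int k, Random random) {
        int n = points.size();
        if (k < 1 || k > n) {
            throw new IllegalArgumentException("k must be between 1 and the number of points");
        }
        List<double[]> centers = new ArrayList<>();
        // First center: a uniformly random point.
        centers.add(points.get(random.nextInt(n)));
        // minSq[i] is D(x_i)^2, the squared distance from point i to its nearest chosen center.
        double[] minSq = new double[n];
        for (int i = 0; i < n; i++) {
            minSq[i] = squaredDistance(points.get(i), centers.get(0));
        }
        while (centers.size() < k) {
            double total = 0.0;
            int lastPositive = -1;
            for (int i = 0; i < n; i++) {
                total += minSq[i];
                if (minSq[i] > 0) {
                    lastPositive = i;
                }
            }
            int chosen;
            if (lastPositive < 0) {
                // Every point coincides with a center: fall back to a uniform choice.
                chosen = random.nextInt(n);
            } else {
                // Sample index i with probability minSq[i] / total.
                double r = random.nextDouble() * total;
                double cumulative = 0.0;
                chosen = lastPositive; // guards against floating-point round-off at the top end
                for (int i = 0; i < n; i++) {
                    cumulative += minSq[i];
                    if (cumulative > r) {
                        chosen = i;
                        break;
                    }
                }
            }
            double[] center = points.get(chosen);
            centers.add(center);
            // Refresh each point's distance to its nearest center.
            for (int i = 0; i < n; i++) {
                minSq[i] = Math.min(minSq[i], squaredDistance(points.get(i), center));
            }
        }
        return centers;
    }
}
```

Passing a java.util.Random built from a fixed seed makes the chosen centers reproducible from run to run.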
• Constructor Detail

• KMeansPlusPlusClusterer

public KMeansPlusPlusClusterer(int k)
Build a clusterer.

The default strategy for handling empty clusters that may appear during algorithm iterations is to split the cluster with the largest distance variance.

The Euclidean distance is used as the default distance measure.

Parameters:
k - the number of clusters to split the data into
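The default empty-cluster strategy stated for this constructor (split the cluster with the largest distance variance) can be illustrated with a short plain-Java sketch. The class and method names are hypothetical and this is not the library's code: for each non-empty cluster it computes the variance of its points' Euclidean distances to the cluster center (the population variance, which is an assumption) and returns the index of the cluster where that variance is largest. How the selected cluster is then split is not specified on this page, so the sketch stops at selecting it.

```java
import java.util.List;

/** Hypothetical sketch: pick the cluster whose point-to-center distances vary the most. */
final class LargestVarianceSketch {

    private LargestVarianceSketch() { }

    /** Euclidean distance between two points of equal dimension. */
    static double euclidean(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }

    /**
     * Returns the index of the non-empty cluster with the largest distance variance,
     * or -1 if every cluster is empty. centers.get(c) is the center of clusters.get(c).
     */
    static int largestVarianceCluster(List<List<double[]>> clusters, List<double[]> centers) {
        int best = -1;
        double bestVariance = Double.NEGATIVE_INFINITY;
        for (int c = 0; c < clusters.size(); c++) {
            List<double[]> members = clusters.get(c);
            if (members.isEmpty()) {
                continue; // an empty cluster has nothing to split
            }
            double[] distances = new double[members.size()];
            double mean = 0.0;
            for (int i = 0; i < distances.length; i++) {
                distances[i] = euclidean(members.get(i), centers.get(c));
                mean += distances[i];
            }
            mean /= distances.length;
            // Two-pass population variance of the point-to-center distances.
            double variance = 0.0;
            for (double d : distances) {
                variance += (d - mean) * (d - mean);
            }
            variance /= distances.length;
            if (variance > bestVariance) {
                bestVariance = variance;
                best = c;
            }
        }
        return best;
    }
}
```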
• KMeansPlusPlusClusterer

public KMeansPlusPlusClusterer(int k,
int maxIterations)
Build a clusterer.

The default strategy for handling empty clusters that may appear during algorithm iterations is to split the cluster with the largest distance variance.

The Euclidean distance is used as the default distance measure.

Parameters:
k - the number of clusters to split the data into
maxIterations - the maximum number of iterations to run the algorithm for. If negative, no maximum will be used.
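To make the maxIterations contract concrete, here is a hypothetical plain-Java sketch of a Lloyd-style assignment/update loop. It assumes squared Euclidean distance and fixed starting centers, and it is not the library's implementation; it only shows how a negative limit can be read as "run until the assignments stop changing".

```java
import java.util.Arrays;

/** Hypothetical sketch of a Lloyd-style loop honoring a maxIterations limit. */
final class LloydLoopSketch {

    private LloydLoopSketch() { }

    /**
     * Refines centers in place; a negative maxIterations means no limit.
     * Returns the number of update steps performed.
     */
    static int run(double[][] points, double[][] centers, int maxIterations) {
        int limit = maxIterations < 0 ? Integer.MAX_VALUE : maxIterations;
        int[] assignment = new int[points.length];
        Arrays.fill(assignment, -1);
        for (int iteration = 0; iteration < limit; iteration++) {
            // Assignment step: attach each point to its nearest center.
            boolean changed = false;
            for (int i = 0; i < points.length; i++) {
                int nearest = nearestCenter(points[i], centers);
                if (nearest != assignment[i]) {
                    assignment[i] = nearest;
                    changed = true;
                }
            }
            if (!changed) {
                return iteration; // converged before reaching the limit
            }
            // Update step: move each center to the mean of its assigned points.
            int dim = centers[0].length;
            double[][] sums = new double[centers.length][dim];
            int[] counts = new int[centers.length];
            for (int i = 0; i < points.length; i++) {
                counts[assignment[i]]++;
                for (int d = 0; d < dim; d++) {
                    sums[assignment[i]][d] += points[i][d];
                }
            }
            for (int c = 0; c < centers.length; c++) {
                // An empty cluster keeps its old center here; the real class
                // would apply its empty-cluster strategy instead.
                if (counts[c] > 0) {
                    for (int d = 0; d < dim; d++) {
                        centers[c][d] = sums[c][d] / counts[c];
                    }
                }
            }
        }
        return limit;
    }

    /** Index of the center with the smallest squared Euclidean distance to p. */
    private static int nearestCenter(double[] p, double[][] centers) {
        int best = 0;
        double bestSq = Double.POSITIVE_INFINITY;
        for (int c = 0; c < centers.length; c++) {
            double sq = 0.0;
            for (int d = 0; d < p.length; d++) {
                double diff = p[d] - centers[c][d];
                sq += diff * diff;
            }
            if (sq < bestSq) {
                bestSq = sq;
                best = c;
            }
        }
        return best;
    }
}
```

For example, with 1-D points {0, 1, 10, 11} and starting centers {0} and {1}, a limit of -1 lets the loop run until the assignments settle at centers 0.5 and 10.5 after two updates, while a limit of 1 stops after a single update with the second center at 22/3.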
• KMeansPlusPlusClusterer

public KMeansPlusPlusClusterer(int k,
int maxIterations,
DistanceMeasure measure)
Build a clusterer.

The default strategy for handling empty clusters that may appear during algorithm iterations is to split the cluster with the largest distance variance.

Parameters:
k - the number of clusters to split the data into
maxIterations - the maximum number of iterations to run the algorithm for. If negative, no maximum will be used.
measure - the distance measure to use
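The measure argument changes which center a point is assigned to, not only the reported distances. In Commons Math, DistanceMeasure (package org.apache.commons.math3.ml.distance) declares double compute(double[] a, double[] b). The sketch below avoids the library dependency by declaring a hypothetical stand-in interface with the same method shape, then shows a point whose nearest center differs between the Euclidean and Manhattan measures.

```java
/** Hypothetical stand-in with the same method shape as Commons Math's DistanceMeasure. */
interface PointDistance {
    double compute(double[] a, double[] b);
}

/** Hypothetical helpers comparing two distance measures. */
final class DistanceChoiceSketch {

    private DistanceChoiceSketch() { }

    /** Straight-line (L2) distance. */
    static final PointDistance EUCLIDEAN = (a, b) -> {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return Math.sqrt(sum);
    };

    /** City-block (L1) distance. */
    static final PointDistance MANHATTAN = (a, b) -> {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            sum += Math.abs(a[i] - b[i]);
        }
        return sum;
    };

    /** Index of the center nearest to p under the given measure. */
    static int nearest(double[] p, double[][] centers, PointDistance measure) {
        int best = 0;
        for (int c = 1; c < centers.length; c++) {
            if (measure.compute(p, centers[c]) < measure.compute(p, centers[best])) {
                best = c;
            }
        }
        return best;
    }
}
```

For the point (0, 0) and centers (3, 3) and (0, 4.5), the Euclidean distances are about 4.24 and 4.5, so the first center wins; the Manhattan distances are 6 and 4.5, so the second wins. With the real library, the analogous choice is made by passing an instance such as new ManhattanDistance() as the measure argument.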
• KMeansPlusPlusClusterer

public KMeansPlusPlusClusterer(int k,
int maxIterations,
DistanceMeasure measure,
RandomGenerator random)
Build a clusterer.

The default strategy for handling empty clusters that may appear during algorithm iterations is to split the cluster with the largest distance variance.

Parameters:
k - the number of clusters to split the data into
maxIterations - the maximum number of iterations to run the algorithm for. If negative, no maximum will be used.
measure - the distance measure to use
random - random generator to use for choosing initial centers
• KMeansPlusPlusClusterer

public KMeansPlusPlusClusterer(int k,
int maxIterations,
DistanceMeasure measure,
RandomGenerator random,
KMeansPlusPlusClusterer.EmptyClusterStrategy emptyStrategy)
Build a clusterer.

Parameters:
k - the number of clusters to split the data into
maxIterations - the maximum number of iterations to run the algorithm for. If negative, no maximum will be used.
measure - the distance measure to use
random - random generator to use for choosing initial centers
emptyStrategy - strategy to use for handling empty clusters that may appear during algorithm iterations
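The emptyStrategy values are not listed on this page. In Commons Math, KMeansPlusPlusClusterer.EmptyClusterStrategy defines LARGEST_VARIANCE (the default used by the other constructors), LARGEST_POINTS_NUMBER, FARTHEST_POINT and ERROR. The plain-Java sketch below uses a hypothetical enum and helper, not the library's code, to illustrate the last three: it removes a point from an existing cluster to serve as the new center of the empty one, or fails outright. The exception type thrown for ERROR here is an assumption.

```java
import java.util.List;
import java.util.Random;

/** Hypothetical sketch of three empty-cluster strategies; not the Commons Math source. */
final class EmptyClusterSketch {

    /** Hypothetical mirror of three EmptyClusterStrategy constants. */
    enum Strategy { LARGEST_POINTS_NUMBER, FARTHEST_POINT, ERROR }

    private EmptyClusterSketch() { }

    private static double euclidean(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }

    /**
     * Removes a point from one of the (mutable) clusters and returns it as the new center
     * for an empty cluster. centers.get(c) is the current center of clusters.get(c).
     */
    static double[] replacementCenter(List<List<double[]>> clusters, List<double[]> centers,
                                      Strategy strategy, Random random) {
        switch (strategy) {
            case LARGEST_POINTS_NUMBER: {
                // Split the most populated cluster by taking one of its points at random.
                int largest = 0;
                for (int c = 1; c < clusters.size(); c++) {
                    if (clusters.get(c).size() > clusters.get(largest).size()) {
                        largest = c;
                    }
                }
                List<double[]> members = clusters.get(largest);
                if (members.isEmpty()) {
                    throw new IllegalStateException("all clusters are empty");
                }
                return members.remove(random.nextInt(members.size()));
            }
            case FARTHEST_POINT: {
                // Take the point lying farthest from its own cluster's center.
                int bestCluster = -1;
                int bestIndex = -1;
                double bestDistance = Double.NEGATIVE_INFINITY;
                for (int c = 0; c < clusters.size(); c++) {
                    List<double[]> members = clusters.get(c);
                    for (int i = 0; i < members.size(); i++) {
                        double d = euclidean(members.get(i), centers.get(c));
                        if (d > bestDistance) {
                            bestDistance = d;
                            bestCluster = c;
                            bestIndex = i;
                        }
                    }
                }
                if (bestCluster < 0) {
                    throw new IllegalStateException("all clusters are empty");
                }
                return clusters.get(bestCluster).remove(bestIndex);
            }
            case ERROR:
            default:
                throw new IllegalStateException("empty cluster encountered");
        }
    }
}
```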