org.apache.commons.math3.stat.clustering
Class KMeansPlusPlusClusterer<T extends Clusterable<T>>

java.lang.Object
  extended by org.apache.commons.math3.stat.clustering.KMeansPlusPlusClusterer<T>
Type Parameters:
T - type of the points to cluster

public class KMeansPlusPlusClusterer<T extends Clusterable<T>>
extends Object

Clustering algorithm based on David Arthur and Sergei Vassilvitskii's k-means++ algorithm.

Since:
2.0
Version:
$Id: KMeansPlusPlusClusterer.java 1416643 2012-12-03 19:37:14Z tn $
See Also:
K-means++ (Wikipedia)
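
The seeding step that distinguishes k-means++ from plain k-means can be sketched as follows. This is a minimal illustration of the published technique (each new center is drawn with probability proportional to its squared distance from the nearest center already chosen), not the source of this class. The class name SeedingSketch, the method names, and the use of double[] for points are assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Illustrative sketch only; names are hypothetical, not the Commons Math source.
final class SeedingSketch {

    /** Squared Euclidean distance between two points of equal dimension. */
    static double squaredDistance(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return sum;
    }

    /** Chooses k initial centers from the given points using D^2 weighting. */
    static List<double[]> chooseInitialCenters(List<double[]> points, int k, Random random) {
        List<double[]> centers = new ArrayList<>();
        // The first center is drawn uniformly at random.
        centers.add(points.get(random.nextInt(points.size())));

        // minDistSq[i] is the squared distance from point i to its nearest chosen center.
        double[] minDistSq = new double[points.size()];
        for (int i = 0; i < minDistSq.length; i++) {
            minDistSq[i] = squaredDistance(points.get(i), centers.get(0));
        }

        while (centers.size() < k) {
            double total = 0.0;
            for (double d : minDistSq) {
                total += d;
            }
            if (total == 0.0) {
                // Every point coincides with a chosen center; fall back to a uniform draw.
                centers.add(points.get(random.nextInt(points.size())));
                continue;
            }
            // Draw a point with probability minDistSq[i] / total.
            double threshold = random.nextDouble() * total;
            double cumulative = 0.0;
            int chosen = points.size() - 1; // fallback in case rounding keeps the sum below the threshold
            for (int i = 0; i < minDistSq.length; i++) {
                cumulative += minDistSq[i];
                if (cumulative > threshold) {
                    chosen = i;
                    break;
                }
            }
            double[] next = points.get(chosen);
            centers.add(next);
            // Refresh nearest-center distances against the newly added center.
            for (int i = 0; i < minDistSq.length; i++) {
                minDistSq[i] = Math.min(minDistSq[i], squaredDistance(points.get(i), next));
            }
        }
        return centers;
    }
}
```

A point that is already a center has weight zero, so it is not drawn again unless every point coincides with a chosen center.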

Nested Class Summary
static class KMeansPlusPlusClusterer.EmptyClusterStrategy
          Strategies to use for replacing an empty cluster.
 
Constructor Summary
KMeansPlusPlusClusterer(Random random)
          Build a clusterer.
KMeansPlusPlusClusterer(Random random, KMeansPlusPlusClusterer.EmptyClusterStrategy emptyStrategy)
          Build a clusterer.
 
Method Summary
 List<Cluster<T>> cluster(Collection<T> points, int k, int maxIterations)
          Runs the K-means++ clustering algorithm.
 List<Cluster<T>> cluster(Collection<T> points, int k, int numTrials, int maxIterationsPerTrial)
          Runs the K-means++ clustering algorithm.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

KMeansPlusPlusClusterer

public KMeansPlusPlusClusterer(Random random)
Build a clusterer.

The default strategy for handling empty clusters that may appear during algorithm iterations is to split the cluster with the largest distance variance.

Parameters:
random - random generator to use for choosing initial centers
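
As a rough illustration of what "largest distance variance" could mean, the sketch below finds the cluster whose member-to-center distances vary the most. It is an assumption-laden example, not the library's code: the class LargestVarianceSketch, its methods, the double[] point representation, and the choice of population (rather than sample) variance are all hypothetical. How the selected cluster is then split is not described on this page and is not shown.

```java
import java.util.List;

// Illustrative sketch only; names and the variance convention are assumptions.
final class LargestVarianceSketch {

    /** Euclidean distance between two points of equal dimension. */
    static double distance(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }

    /** Population variance of the member-to-center distances (0 for an empty cluster). */
    static double distanceVariance(List<double[]> members, double[] center) {
        if (members.isEmpty()) {
            return 0.0;
        }
        double mean = 0.0;
        for (double[] p : members) {
            mean += distance(p, center);
        }
        mean /= members.size();
        double sumSq = 0.0;
        for (double[] p : members) {
            double dev = distance(p, center) - mean;
            sumSq += dev * dev;
        }
        return sumSq / members.size();
    }

    /** Index of the cluster whose distances to its center vary the most. */
    static int indexOfLargestVariance(List<List<double[]>> clusters, List<double[]> centers) {
        int best = 0;
        double bestVariance = -1.0;
        for (int c = 0; c < clusters.size(); c++) {
            double v = distanceVariance(clusters.get(c), centers.get(c));
            if (v > bestVariance) {
                bestVariance = v;
                best = c;
            }
        }
        return best;
    }
}
```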

KMeansPlusPlusClusterer

public KMeansPlusPlusClusterer(Random random,
                               KMeansPlusPlusClusterer.EmptyClusterStrategy emptyStrategy)
Build a clusterer.

Parameters:
random - random generator to use for choosing initial centers
emptyStrategy - strategy to use for handling empty clusters that may appear during algorithm iterations
Since:
2.2
Method Detail

cluster

public List<Cluster<T>> cluster(Collection<T> points,
                                int k,
                                int numTrials,
                                int maxIterationsPerTrial)
                                                throws MathIllegalArgumentException,
                                                       ConvergenceException
Runs the K-means++ clustering algorithm.

Parameters:
points - the points to cluster
k - the number of clusters to split the data into
numTrials - number of trial runs
maxIterationsPerTrial - the maximum number of iterations to run the algorithm for in each trial run. If negative, no maximum is used.
Returns:
a list of clusters containing the points
Throws:
MathIllegalArgumentException - if the data points are null or the number of clusters is larger than the number of data points
ConvergenceException - if an empty cluster is encountered and the emptyStrategy is set to ERROR
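
The multi-trial pattern behind numTrials can be sketched generically: run the algorithm several times and keep the result with the best score. This page does not state which criterion selects the best trial, so the scoring function below is left as a parameter, and "lower is better" is an assumption. The class BestOfTrialsSketch and its bestOf method are hypothetical names used only for this example.

```java
import java.util.function.Supplier;
import java.util.function.ToDoubleFunction;

// Illustrative sketch only; the selection criterion is an assumption.
final class BestOfTrialsSketch {

    /**
     * Runs runTrial numTrials times and returns the result with the lowest score.
     * Returns null when numTrials is not positive.
     */
    static <R> R bestOf(int numTrials, Supplier<R> runTrial, ToDoubleFunction<R> score) {
        R best = null;
        double bestScore = Double.POSITIVE_INFINITY;
        for (int i = 0; i < numTrials; i++) {
            R candidate = runTrial.get();
            double s = score.applyAsDouble(candidate);
            // Keep the first result unconditionally, then only strictly better ones.
            if (best == null || s < bestScore) {
                best = candidate;
                bestScore = s;
            }
        }
        return best;
    }
}
```

In a clustering setting, runTrial would wrap one full clustering run with a fresh random seeding, and score would measure cluster quality, for example a sum of within-cluster distances.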

cluster

public List<Cluster<T>> cluster(Collection<T> points,
                                int k,
                                int maxIterations)
                                                throws MathIllegalArgumentException,
                                                       ConvergenceException
Runs the K-means++ clustering algorithm.

Parameters:
points - the points to cluster
k - the number of clusters to split the data into
maxIterations - the maximum number of iterations to run the algorithm for. If negative, no maximum is used.
Returns:
a list of clusters containing the points
Throws:
MathIllegalArgumentException - if the data points are null or the number of clusters is larger than the number of data points
ConvergenceException - if an empty cluster is encountered and the emptyStrategy is set to ERROR
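
The effect of maxIterations, including the "negative means no maximum" rule, can be illustrated with a small one-dimensional assign-and-update loop. This is a simplified sketch under stated assumptions, not the library's implementation: the class IterationLimitSketch, the method lloyd, and the use of plain double values as points are hypothetical, and an empty cluster simply keeps its old center here instead of applying an empty-cluster strategy.

```java
import java.util.Arrays;

// Illustrative sketch only; one-dimensional points and hypothetical names.
final class IterationLimitSketch {

    /**
     * Alternates assignment and center-update steps, modifying centers in place.
     * Returns the number of update steps performed.
     */
    static int lloyd(double[] points, double[] centers, int maxIterations) {
        // A negative limit is treated as "no maximum".
        int max = (maxIterations < 0) ? Integer.MAX_VALUE : maxIterations;
        int[] assignment = new int[points.length];
        Arrays.fill(assignment, -1);

        for (int iter = 0; iter < max; iter++) {
            // Assignment step: attach each point to its nearest center.
            boolean changed = false;
            for (int i = 0; i < points.length; i++) {
                int nearest = 0;
                for (int c = 1; c < centers.length; c++) {
                    if (Math.abs(points[i] - centers[c]) < Math.abs(points[i] - centers[nearest])) {
                        nearest = c;
                    }
                }
                if (nearest != assignment[i]) {
                    assignment[i] = nearest;
                    changed = true;
                }
            }
            if (!changed) {
                return iter; // converged: no point changed cluster
            }
            // Update step: move each center to the mean of its assigned points.
            for (int c = 0; c < centers.length; c++) {
                double sum = 0.0;
                int count = 0;
                for (int i = 0; i < points.length; i++) {
                    if (assignment[i] == c) {
                        sum += points[i];
                        count++;
                    }
                }
                if (count > 0) {
                    centers[c] = sum / count; // an empty cluster keeps its old center in this sketch
                }
            }
        }
        return max;
    }
}
```

With a limit of 0 the centers are returned untouched, while a negative limit lets the loop run until the assignments stop changing.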


Copyright © 2003-2012 The Apache Software Foundation. All Rights Reserved.