16 Machine Learning

16.1 Overview

Machine learning support in commons-math currently provides operations to cluster data sets based on a distance measure.

16.2 Clustering algorithms and distance measures

The Clusterer class represents a clustering algorithm. The following algorithms are available:
A comparison of the available clustering algorithms:

16.3 Distance measures

Each clustering algorithm requires a distance measure to determine the distance between two points (either data points or cluster centers). The following distance measures are available:

16.4 Example
Here is an example of a clustering execution. Let us assume we have a set of locations from our domain model, where each location has the methods getX() and getY() returning its coordinates.

The clustering algorithms expect a list of Clusterable as input. Typically, we don't want to pollute our domain objects with interfaces from helper APIs. Hence, we first create a wrapper object:
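// imports (package names as in Commons Math 3.x; adjust for your version)
import java.util.ArrayList;
import java.util.List;
import org.apache.commons.math3.ml.clustering.*;

// wrapper class: adapts a domain Location to the Clusterable interface
public static class LocationWrapper implements Clusterable {
    private final double[] points;
    private final Location location;

    public LocationWrapper(Location location) {
        this.location = location;
        this.points = new double[] { location.getX(), location.getY() };
    }

    public Location getLocation() {
        return location;
    }

    // coordinates used by the clusterer to compute distances
    public double[] getPoint() {
        return points;
    }
}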
// we have a list of our locations we want to cluster; create a
// list of wrappers to pass to the clusterer.
List<Location> locations = ...;
List<LocationWrapper> clusterInput = new ArrayList<LocationWrapper>(locations.size());
for (Location location : locations) {
    clusterInput.add(new LocationWrapper(location));
}
// initialize a new clustering algorithm.
// we use KMeans++ with 10 clusters and 10000 iterations maximum.
// we did not specify a distance measure; the default (euclidean distance) is used.
KMeansPlusPlusClusterer<LocationWrapper> clusterer = new KMeansPlusPlusClusterer<LocationWrapper>(10, 10000);
List<CentroidCluster<LocationWrapper>> clusterResults = clusterer.cluster(clusterInput);
// output the clusters
for (int i = 0; i < clusterResults.size(); i++) {
    System.out.println("Cluster " + i);
    for (LocationWrapper locationWrapper : clusterResults.get(i).getPoints()) {
        System.out.println(locationWrapper.getLocation());
    }
    System.out.println();
}
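
To illustrate the role a distance measure plays in the example above, here is a minimal sketch in plain Java with no dependency on commons-math. The class DistanceSketch and the methods euclidean, manhattan and nearestCenter are hypothetical names chosen for illustration; they are not part of the library's API and do not reproduce its implementation. The sketch only shows how the distance between two points can be computed and how a point could be assigned to the closest cluster center.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration class; not part of commons-math.
public class DistanceSketch {

    /** Euclidean distance: square root of the summed squared coordinate differences. */
    public static double euclidean(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }

    /** Manhattan distance: sum of the absolute coordinate differences. */
    public static double manhattan(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            sum += Math.abs(a[i] - b[i]);
        }
        return sum;
    }

    /** Returns the index of the center closest to the point under Euclidean distance. */
    public static int nearestCenter(double[] point, List<double[]> centers) {
        int best = -1;
        double bestDistance = Double.POSITIVE_INFINITY;
        for (int i = 0; i < centers.size(); i++) {
            double d = euclidean(point, centers.get(i));
            if (d < bestDistance) {
                bestDistance = d;
                best = i;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        List<double[]> centers = Arrays.asList(new double[] {0, 0}, new double[] {10, 10});
        double[] p = {3, 4};
        System.out.println(euclidean(p, centers.get(0))); // prints 5.0
        System.out.println(manhattan(p, centers.get(0))); // prints 7.0
        System.out.println(nearestCenter(p, centers));    // prints 0
    }
}
```

The same two points are 5.0 apart under one measure and 7.0 apart under the other, so the choice of distance measure can change which center a point is closest to and therefore how the data is clustered.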