K-Means clustering is a simple and commonly used non-parametric clustering technique. It takes the number of clusters, k, as input and partitions the data into k clusters. K-Means is also notorious for getting stuck in local minima of its error (poor clusterings of the data). To get around this, it is often re-run several times from different random starting points, and the result with the lowest error is chosen.
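The paragraph above can be sketched as the standard Lloyd iteration (assign each point to its nearest mean, then move each mean to the average of its points) wrapped in a random-restart loop. This is a minimal sketch, not the author's downloadable code. The class name `KMeansSketch`, seeding with k random data points, the 100-iteration cap, and measuring error as the total squared Euclidean distance to the assigned centroid are all assumptions.

```java
import java.util.Arrays;
import java.util.Random;

// Minimal sketch of Lloyd's K-Means with random restarts (hypothetical, not the post's code).
final class KMeansSketch {

    /** One clustering: a cluster index per point, the centroids, and the total squared error. */
    static final class Result {
        final int[] assignment;
        final double[][] centroids;
        final double error;
        Result(int[] assignment, double[][] centroids, double error) {
            this.assignment = assignment;
            this.centroids = centroids;
            this.error = error;
        }
    }

    static double squaredDistance(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return sum;
    }

    /** One run from k distinct random data points; assumes 1 <= k <= data.length. */
    static Result runOnce(double[][] data, int k, Random rng, int maxIterations) {
        int n = data.length, dim = data[0].length;
        // Partial Fisher-Yates shuffle picks k distinct points as the initial centroids.
        int[] idx = new int[n];
        for (int i = 0; i < n; i++) idx[i] = i;
        double[][] centroids = new double[k][];
        for (int i = 0; i < k; i++) {
            int j = i + rng.nextInt(n - i);
            int tmp = idx[i]; idx[i] = idx[j]; idx[j] = tmp;
            centroids[i] = data[idx[i]].clone();
        }
        int[] assignment = new int[n];
        Arrays.fill(assignment, -1);
        for (int iter = 0; iter < maxIterations; iter++) {
            // Assignment step: attach each point to its nearest centroid.
            boolean changed = false;
            for (int p = 0; p < n; p++) {
                int best = 0;
                double bestDist = squaredDistance(data[p], centroids[0]);
                for (int c = 1; c < k; c++) {
                    double d = squaredDistance(data[p], centroids[c]);
                    if (d < bestDist) { bestDist = d; best = c; }
                }
                if (assignment[p] != best) { assignment[p] = best; changed = true; }
            }
            if (!changed) break; // converged: assignments are stable
            // Update step: move each centroid to the mean of its assigned points.
            double[][] sums = new double[k][dim];
            int[] counts = new int[k];
            for (int p = 0; p < n; p++) {
                counts[assignment[p]]++;
                for (int d = 0; d < dim; d++) sums[assignment[p]][d] += data[p][d];
            }
            for (int c = 0; c < k; c++) {
                if (counts[c] == 0) continue; // leave an empty cluster's centroid where it was
                for (int d = 0; d < dim; d++) centroids[c][d] = sums[c][d] / counts[c];
            }
        }
        double error = 0.0;
        for (int p = 0; p < n; p++) error += squaredDistance(data[p], centroids[assignment[p]]);
        return new Result(assignment, centroids, error);
    }

    /** Re-runs K-Means `restarts` times and keeps the run with the lowest error. */
    static Result cluster(double[][] data, int k, int restarts, long seed) {
        Random rng = new Random(seed);
        Result best = null;
        for (int r = 0; r < restarts; r++) {
            Result candidate = runOnce(data, k, rng, 100);
            if (best == null || candidate.error < best.error) best = candidate;
        }
        return best;
    }
}
```

Each restart can converge to a different local minimum, so keeping the lowest-error run is what makes the restarts useful; a single run offers no such protection.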
My implementation provides an interface, KMeansClusterable, that objects must implement to be clustered. The interface defines a method, getFeatureSpaceRepresentation(), that returns the object's representation as a vector in n-dimensional feature space, in the form of a double array. Feel free to download and try my implementation here; any feedback is welcome.
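As a rough illustration of how such an interface might be used, here is a sketch. Only the names `KMeansClusterable` and `getFeatureSpaceRepresentation()` and its `double[]` return type come from the post. The `Point2D` class and the `FeatureMatrix.of` helper are hypothetical, added to show a clusterable type and how a clusterer could gather feature vectors.

```java
import java.util.List;

// Interface name and method as described in the post; exact modifiers are assumed.
interface KMeansClusterable {
    /** Returns this object's position in n-dimensional feature space. */
    double[] getFeatureSpaceRepresentation();
}

// Hypothetical example type: a 2-D point whose features are simply its coordinates.
final class Point2D implements KMeansClusterable {
    private final double x, y;

    Point2D(double x, double y) { this.x = x; this.y = y; }

    @Override
    public double[] getFeatureSpaceRepresentation() {
        return new double[] {x, y};
    }
}

// Hypothetical helper: stacks feature vectors into a matrix, one row per object.
final class FeatureMatrix {
    static double[][] of(List<? extends KMeansClusterable> items) {
        double[][] rows = new double[items.size()][];
        for (int i = 0; i < rows.length; i++) {
            rows[i] = items.get(i).getFeatureSpaceRepresentation();
            // Every object must live in the same n-dimensional space.
            if (rows[i].length != rows[0].length) {
                throw new IllegalArgumentException("inconsistent feature dimension at index " + i);
            }
        }
        return rows;
    }
}
```

Because the clusterer sees only double arrays, any domain object (images, documents, customers) can be clustered once it can describe itself as a fixed-length feature vector.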
Some results from my implementation are shown below. The data set consists of six clusters, each drawn from a uniform distribution over a square in 2-dimensional space. I ran my implementation with the correct number of means (six), with too few means, and with too many means. In each case, I ran K-Means multiple times and kept the result with the lowest error.
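Test data of the kind described above could be generated along the following lines. This is a hypothetical sketch: the post does not give the cluster positions, square size, or number of points, so the 3x2 grid of centres in `SIX_CENTRES`, the `side` parameter, and the per-cluster count are illustrative choices.

```java
import java.util.Random;

// Hypothetical generator for clusters drawn uniformly from axis-aligned squares.
final class SquareClusters {

    /** Six illustrative centres on a 3x2 grid (not the post's actual layout). */
    static final double[][] SIX_CENTRES = {
        {0, 0}, {5, 0}, {10, 0},
        {0, 5}, {5, 5}, {10, 5}
    };

    /** Draws pointsPerCluster points uniformly from a square of the given side around each centre. */
    static double[][] generate(double[][] centres, double side, int pointsPerCluster, long seed) {
        Random rng = new Random(seed);
        double[][] data = new double[centres.length * pointsPerCluster][];
        int i = 0;
        for (double[] c : centres) {
            for (int p = 0; p < pointsPerCluster; p++) {
                // nextDouble() is uniform on [0, 1), so each coordinate is uniform on [c - side/2, c + side/2).
                double x = c[0] + (rng.nextDouble() - 0.5) * side;
                double y = c[1] + (rng.nextDouble() - 0.5) * side;
                data[i++] = new double[] {x, y};
            }
        }
        return data;
    }
}
```

With a side smaller than the spacing between centres, the squares do not overlap, so k = 6 has a clear correct answer. Running a K-Means routine on the same points with k below and above 6 reproduces the three scenarios compared here.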
