Class JaroWinklerSimilarity

java.lang.Object
org.apache.commons.text.similarity.JaroWinklerSimilarity
All Implemented Interfaces:
SimilarityScore<Double>

public class JaroWinklerSimilarity extends Object implements SimilarityScore<Double>
A similarity algorithm indicating the percentage of matched characters between two character sequences.

The Jaro measure is the weighted sum of the percentage of matched characters from each character sequence and the percentage of transposed characters. Winkler increased this measure for matching initial characters.
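
As an illustration of the description above, the following is a minimal sketch of how a Jaro-Winkler score can be derived once the match, half-transposition, and prefix counts are known. It is not this class's source code: the class name `JaroWinklerFormulaSketch`, the method `score`, the scaling factor of 0.1, and the 0.7 boost threshold are assumptions chosen because they are the conventional values, and they reproduce several of the sample results listed for `apply`. The sketch does not special-case two empty inputs.

```java
/** Illustrative sketch only; names and constants here are assumptions, not the library's API. */
final class JaroWinklerFormulaSketch {

    // Assumed Winkler scaling factor (0.1 is the conventional value).
    static final double SCALING_FACTOR = 0.1;
    // Assumed threshold below which no prefix boost is applied.
    static final double BOOST_THRESHOLD = 0.7;

    /**
     * Combines the counts into a score.
     *
     * @param leftLength length of the first sequence
     * @param rightLength length of the second sequence
     * @param matches number of matched characters
     * @param halfTranspositions number of matched characters that are out of order
     * @param prefix length of the common prefix that is rewarded
     */
    static double score(int leftLength, int rightLength,
                        int matches, int halfTranspositions, int prefix) {
        if (matches == 0) {
            return 0.0; // nothing in common
        }
        double m = matches;
        // Jaro: average of the two match ratios and the in-order ratio.
        double jaro = (m / leftLength
                + m / rightLength
                + (m - halfTranspositions / 2.0) / m) / 3.0;
        // Winkler: boost sufficiently similar inputs that share a prefix.
        return jaro < BOOST_THRESHOLD
                ? jaro
                : jaro + SCALING_FACTOR * prefix * (1.0 - jaro);
    }

    public static void main(String[] args) {
        // "frog" vs "fog": 3 matches, 0 half transpositions, common prefix "f".
        System.out.println(score(4, 3, 3, 0, 1)); // approximately 0.925
    }
}
```

Under these assumptions, "frog" and "fog" score about 0.925 and "hello" and "hallo" score about 0.88, in line with the rounded sample values shown for `apply`.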

This implementation is based on the Jaro Winkler similarity algorithm from https://en.wikipedia.org/wiki/Jaro%E2%80%93Winkler_distance.

This code has been adapted from Apache Commons Lang 3.3.

Since:
1.7
  • Constructor Details

  • Method Details

    • matches

      protected static int[] matches(CharSequence first, CharSequence second)
      Returns an array holding the Jaro-Winkler match count, half-transposition count, and prefix length for the two strings.
      Parameters:
      first - the first string to be matched
      second - the second string to be matched
      Returns:
      the mtp array containing, in order: matches, half transpositions, and prefix
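
To make the three returned values concrete, here is a minimal, self-contained sketch of one way such counts can be computed. It is not the library's implementation: the class `MatchCountSketch` and the method `counts` are hypothetical names, and the search window of `max.length() / 2 - 1` and the prefix cap of 4 characters are assumptions based on the usual formulation of the Jaro-Winkler algorithm.

```java
/** Illustrative sketch only; hypothetical names, not the library's API. */
final class MatchCountSketch {

    /** Returns {matches, halfTranspositions, prefix} for the two sequences. */
    static int[] counts(CharSequence first, CharSequence second) {
        CharSequence max = first.length() > second.length() ? first : second;
        CharSequence min = first.length() > second.length() ? second : first;

        // Assumed window: characters match only if their positions are this close.
        int range = Math.max(max.length() / 2 - 1, 0);
        boolean[] minMatched = new boolean[min.length()];
        boolean[] maxMatched = new boolean[max.length()];
        int matches = 0;
        for (int i = 0; i < min.length(); i++) {
            int from = Math.max(i - range, 0);
            int to = Math.min(i + range + 1, max.length());
            for (int k = from; k < to; k++) {
                // Each character of the longer sequence may be matched only once.
                if (!maxMatched[k] && min.charAt(i) == max.charAt(k)) {
                    minMatched[i] = true;
                    maxMatched[k] = true;
                    matches++;
                    break;
                }
            }
        }

        // Walk the matched characters of both sequences in order;
        // each position where they differ is one half transposition.
        int halfTranspositions = 0;
        int j = 0;
        for (int i = 0; i < min.length(); i++) {
            if (!minMatched[i]) {
                continue;
            }
            while (!maxMatched[j]) {
                j++;
            }
            if (min.charAt(i) != max.charAt(j)) {
                halfTranspositions++;
            }
            j++;
        }

        // Common prefix length, capped at 4 characters (assumed cap).
        int prefix = 0;
        int limit = Math.min(4, min.length());
        while (prefix < limit && first.charAt(prefix) == second.charAt(prefix)) {
            prefix++;
        }

        return new int[] {matches, halfTranspositions, prefix};
    }

    public static void main(String[] args) {
        // "martha" vs "marhta": all 6 characters match, "t" and "h" are swapped.
        System.out.println(java.util.Arrays.toString(counts("martha", "marhta")));
    }
}
```

In this sketch a swapped pair such as "th" and "ht" contributes two half transpositions, which is why the value is halved when the score is computed.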
    • apply

      public Double apply(CharSequence left, CharSequence right)
      Computes the Jaro-Winkler similarity between two character sequences.
       sim.apply(null, null)          = IllegalArgumentException
       sim.apply("foo", null)         = IllegalArgumentException
       sim.apply(null, "foo")         = IllegalArgumentException
       sim.apply("", "")              = 1.0
       sim.apply("foo", "foo")        = 1.0
       sim.apply("foo", "foo ")       = 0.94
       sim.apply("foo", "foo  ")      = 0.91
       sim.apply("foo", " foo ")      = 0.87
       sim.apply("foo", "  foo")      = 0.51
       sim.apply("", "a")             = 0.0
       sim.apply("aaapppp", "")       = 0.0
       sim.apply("frog", "fog")       = 0.93
       sim.apply("fly", "ant")        = 0.0
       sim.apply("elephant", "hippo") = 0.44
       sim.apply("hippo", "elephant") = 0.44
       sim.apply("hippo", "zzzzzzzz") = 0.0
       sim.apply("hello", "hallo")    = 0.88
       sim.apply("ABC Corporation", "ABC Corp") = 0.91
       sim.apply("D N H Enterprises Inc", "D & H Enterprises, Inc.") = 0.95
       sim.apply("My Gym Children's Fitness Center", "My Gym. Childrens Fitness") = 0.92
       sim.apply("PENNSYLVANIA", "PENNCISYLVNIA") = 0.88
       
      Specified by:
      apply in interface SimilarityScore<Double>
      Parameters:
      left - the first CharSequence, must not be null
      right - the second CharSequence, must not be null
      Returns:
      the resulting similarity score
      Throws:
      IllegalArgumentException - if either CharSequence input is null