public class JaroWinklerDistance extends Object implements SimilarityScore<Double>
The Jaro measure is the weighted sum of the percentage of matched characters from each string and the percentage of transposed characters. Winkler increased this measure for strings whose initial characters match.
This implementation is based on the Jaro Winkler similarity algorithm from http://en.wikipedia.org/wiki/Jaro%E2%80%93Winkler_distance.
This code has been adapted from Apache Commons Lang 3.3.
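For orientation, the measure is usually written as below. This is the textbook formulation found in the referenced Wikipedia article, not a statement of this class's exact code, which may differ in details such as thresholds and rounding. The symbols are assumptions of this note: m is the number of matching characters, t is half the number of matched characters that appear in a different order, |s_1| and |s_2| are the string lengths, ℓ is the length of the common prefix (capped at 4), and p is a scaling factor, commonly 0.1.

$$
\mathrm{sim}_j =
\begin{cases}
0 & \text{if } m = 0 \\
\dfrac{1}{3}\left(\dfrac{m}{|s_1|} + \dfrac{m}{|s_2|} + \dfrac{m - t}{m}\right) & \text{otherwise}
\end{cases}
\qquad
\mathrm{sim}_w = \mathrm{sim}_j + \ell \, p \, (1 - \mathrm{sim}_j)
$$

As a worked case, "hello" and "hallo" share m = 4 matching characters with t = 0, so sim_j = (4/5 + 4/5 + 4/4) / 3 ≈ 0.8667. With ℓ = 1 and p = 0.1 this gives sim_w ≈ 0.8667 + 0.1 × 0.1333 = 0.88.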
Modifier and Type | Field and Description |
---|---|
static int | INDEX_NOT_FOUND - Represents a failed index search. |
Constructor and Description |
---|
JaroWinklerDistance() |
Modifier and Type | Method and Description |
---|---|
Double | apply(CharSequence left, CharSequence right) - Finds the Jaro-Winkler distance, which indicates the similarity score between two CharSequences. |
protected static int[] | matches(CharSequence first, CharSequence second) - Returns an array holding the Jaro-Winkler matches, transpositions, prefix, and max values for the two strings. |
public static final int INDEX_NOT_FOUND
public JaroWinklerDistance()
public Double apply(CharSequence left, CharSequence right)
distance.apply(null, null) = IllegalArgumentException
distance.apply("","") = 0.0
distance.apply("","a") = 0.0
distance.apply("aaapppp", "") = 0.0
distance.apply("frog", "fog") = 0.93
distance.apply("fly", "ant") = 0.0
distance.apply("elephant", "hippo") = 0.44
distance.apply("hippo", "elephant") = 0.44
distance.apply("hippo", "zzzzzzzz") = 0.0
distance.apply("hello", "hallo") = 0.88
distance.apply("ABC Corporation", "ABC Corp") = 0.93
distance.apply("D N H Enterprises Inc", "D & H Enterprises, Inc.") = 0.95
distance.apply("My Gym Children's Fitness Center", "My Gym. Childrens Fitness") = 0.92
distance.apply("PENNSYLVANIA", "PENNCISYLVNIA") = 0.88
apply
in interface SimilarityScore<Double>
left - the first String, must not be null
right - the second String, must not be null
IllegalArgumentException - if either String input is null
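To make the behaviour of apply more concrete, the following is a minimal, self-contained sketch of a textbook Jaro-Winkler computation. It is not the source of this class: the class name JaroWinklerSketch, the method name similarity, the prefix cap of 4, and the scaling factor 0.1 are assumptions taken from the common formulation, and the sketch does not round its result, whereas the documented values appear rounded to two decimals.

```java
/** Hypothetical illustration; not part of Apache Commons Text. */
final class JaroWinklerSketch {

    private JaroWinklerSketch() {
    }

    /** Returns an unrounded textbook Jaro-Winkler similarity in [0, 1]. */
    static double similarity(CharSequence left, CharSequence right) {
        if (left == null || right == null) {
            throw new IllegalArgumentException("Strings must not be null");
        }
        int leftLen = left.length();
        int rightLen = right.length();
        if (leftLen == 0 || rightLen == 0) {
            return 0.0; // mirrors the documented result for empty input
        }

        // Two equal characters match only if their positions differ by at most this window.
        int window = Math.max(Math.max(leftLen, rightLen) / 2 - 1, 0);
        boolean[] leftMatched = new boolean[leftLen];
        boolean[] rightMatched = new boolean[rightLen];
        int matches = 0;
        for (int i = 0; i < leftLen; i++) {
            int from = Math.max(i - window, 0);
            int to = Math.min(i + window + 1, rightLen);
            for (int j = from; j < to; j++) {
                if (!rightMatched[j] && left.charAt(i) == right.charAt(j)) {
                    leftMatched[i] = true;
                    rightMatched[j] = true;
                    matches++;
                    break;
                }
            }
        }
        if (matches == 0) {
            return 0.0;
        }

        // Walk both matched sequences in order and count positions that disagree.
        int outOfOrder = 0;
        int j = 0;
        for (int i = 0; i < leftLen; i++) {
            if (!leftMatched[i]) {
                continue;
            }
            while (!rightMatched[j]) {
                j++;
            }
            if (left.charAt(i) != right.charAt(j)) {
                outOfOrder++;
            }
            j++;
        }

        double m = matches;
        double transpositions = outOfOrder / 2.0;
        double jaro = (m / leftLen + m / rightLen + (m - transpositions) / m) / 3.0;

        // Winkler boost: reward a shared prefix of up to 4 characters (assumed cap).
        int prefix = 0;
        int maxPrefix = Math.min(4, Math.min(leftLen, rightLen));
        while (prefix < maxPrefix && left.charAt(prefix) == right.charAt(prefix)) {
            prefix++;
        }
        return jaro + prefix * 0.1 * (1.0 - jaro);
    }
}
```

Under these assumptions, JaroWinklerSketch.similarity("elephant", "hippo") finds a single matching character and evaluates to roughly 0.4417, and ("fly", "ant") finds none and returns 0.0, consistent with the values 0.44 and 0.0 listed for apply.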
protected static int[] matches(CharSequence first, CharSequence second)
first - the first string to be matched
second - the second string to be matched
Copyright © 2014–2017 The Apache Software Foundation. All rights reserved.