Class LevenshteinDistance

java.lang.Object
org.apache.commons.text.similarity.LevenshteinDistance
All Implemented Interfaces:
BiFunction<CharSequence,CharSequence,Integer>, EditDistance<Integer>, ObjectSimilarityScore<CharSequence,Integer>, SimilarityScore<Integer>

public class LevenshteinDistance extends Object implements EditDistance<Integer>
An algorithm for measuring the difference between two character sequences using the Levenshtein Distance.

This is the number of changes needed to change one sequence into another, where each change is a single character modification (deletion, insertion or substitution).

This code has been adapted from Apache Commons Lang 3.3.

Since:
1.0
See Also:
  • Constructor Details

    • LevenshteinDistance

      Deprecated.
      Constructs a default instance that uses a version of the algorithm that does not use a threshold parameter.
      See Also:
    • LevenshteinDistance

      public LevenshteinDistance(Integer threshold)
      Constructs a new instance. If the threshold is not null, distance calculations will be limited to a maximum length. If the threshold is null, the unlimited version of the algorithm will be used.
      Parameters:
      threshold - If this is null then distances calculations will not be limited. This may not be negative.
  • Method Details

    • getDefaultInstance

      Gets the default instance.
      Returns:
      The default instance.
    • apply

      public Integer apply(CharSequence left, CharSequence right)
      Computes the Levenshtein distance between two Strings.

      A higher score indicates a greater distance.

      Chas Emerick has written an implementation in Java, which avoids an OutOfMemoryError which can occur when my Java implementation is used with very large strings.

       distance.apply(null, *)             = Throws IllegalArgumentException
       distance.apply(*, null)             = Throws IllegalArgumentException
       distance.apply("","")               = 0
       distance.apply("","a")              = 1
       distance.apply("aaapppp", "")       = 7
       distance.apply("frog", "fog")       = 1
       distance.apply("fly", "ant")        = 3
       distance.apply("elephant", "hippo") = 7
       distance.apply("hippo", "elephant") = 7
       distance.apply("hippo", "zzzzzzzz") = 8
       distance.apply("hello", "hallo")    = 1
       
      Specified by:
      apply in interface BiFunction<CharSequence,CharSequence,Integer>
      Specified by:
      apply in interface ObjectSimilarityScore<CharSequence,Integer>
      Specified by:
      apply in interface SimilarityScore<Integer>
      Parameters:
      left - the first input, must not be null.
      right - the second input, must not be null.
      Returns:
      result distance, or -1.
      Throws:
      IllegalArgumentException - if either String input null.
    • apply

      public <E> Integer apply(SimilarityInput<E> left, SimilarityInput<E> right)
      Computes the Levenshtein distance between two inputs.

      A higher score indicates a greater distance.

       distance.apply(null, *)             = Throws IllegalArgumentException
       distance.apply(*, null)             = Throws IllegalArgumentException
       distance.apply("","")               = 0
       distance.apply("","a")              = 1
       distance.apply("aaapppp", "")       = 7
       distance.apply("frog", "fog")       = 1
       distance.apply("fly", "ant")        = 3
       distance.apply("elephant", "hippo") = 7
       distance.apply("hippo", "elephant") = 7
       distance.apply("hippo", "zzzzzzzz") = 8
       distance.apply("hello", "hallo")    = 1
       
      Type Parameters:
      E - The type of similarity score unit.
      Parameters:
      left - the first input, must not be null.
      right - the second input, must not be null.
      Returns:
      result distance, or -1.
      Throws:
      IllegalArgumentException - if either String input null.
      Since:
      1.13.0
    • getThreshold

      Gets the distance threshold.
      Returns:
      The distance threshold.