Class LevenshteinDetailedDistance

java.lang.Object
org.apache.commons.text.similarity.LevenshteinDetailedDistance
All Implemented Interfaces:
BiFunction<CharSequence,CharSequence,LevenshteinResults>, EditDistance<LevenshteinResults>, ObjectSimilarityScore<CharSequence,LevenshteinResults>, SimilarityScore<LevenshteinResults>

An algorithm for measuring the difference between two character sequences.

This is the number of changes needed to change one sequence into another, where each change is a single character modification (deletion, insertion or substitution).

Since:
1.0
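
The definition above can be illustrated with a minimal, library-independent sketch. It is not the implementation used by this class; the class name EditDistanceSketch and the method distance are hypothetical names chosen for illustration. The sketch uses the standard two-row dynamic-programming formulation and returns only the total number of changes.

```java
// Illustrative sketch only; EditDistanceSketch is a hypothetical name,
// not part of Apache Commons Text.
final class EditDistanceSketch {

    // Returns the number of single-character insertions, deletions, or
    // substitutions needed to turn `left` into `right`.
    static int distance(CharSequence left, CharSequence right) {
        if (left == null || right == null) {
            throw new IllegalArgumentException("Inputs must not be null");
        }
        int n = left.length();
        int m = right.length();
        int[] prev = new int[m + 1]; // costs for the previous row of the table
        int[] curr = new int[m + 1]; // costs for the row being filled
        for (int j = 0; j <= m; j++) {
            prev[j] = j; // turning "" into right[0..j) takes j insertions
        }
        for (int i = 1; i <= n; i++) {
            curr[0] = i; // turning left[0..i) into "" takes i deletions
            for (int j = 1; j <= m; j++) {
                int cost = left.charAt(i - 1) == right.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(
                        Math.min(curr[j - 1] + 1, prev[j] + 1), // insert, delete
                        prev[j - 1] + cost);                    // match or substitute
            }
            int[] tmp = prev; // swap rows for the next iteration
            prev = curr;
            curr = tmp;
        }
        return prev[m];
    }

    public static void main(String[] args) {
        System.out.println(distance("frog", "fog"));
        System.out.println(distance("elephant", "hippo"));
    }
}
```

Keeping only two rows limits memory to the length of the second input, at the cost of not being able to recover which individual edits were made.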
  • Constructor Details

    • LevenshteinDetailedDistance

      Deprecated.
      Use getDefaultInstance().

      Creates the default instance, which uses a version of the algorithm that does not use a threshold parameter.

      See Also:
      getDefaultInstance()
    • LevenshteinDetailedDistance

      If the threshold is not null, distance calculations will be limited to distances no greater than the threshold.

      If the threshold is null, the unlimited version of the algorithm will be used.

      Parameters:
      threshold - If this is null, distance calculations will not be limited. Must not be negative.
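
How a threshold can limit the calculation is sketched below in plain Java. This is an assumption-laden illustration, not this class's code: ThresholdSketch and limitedDistance are hypothetical names, and returning -1 when the threshold is exceeded follows the "or -1" wording in the apply documentation. The sketch stops early when the length difference alone exceeds the threshold, or when every entry in a row of the table already exceeds it.

```java
// Illustrative sketch only; ThresholdSketch is a hypothetical name,
// not part of Apache Commons Text.
final class ThresholdSketch {

    // Returns the edit distance, or -1 if it is greater than `threshold`.
    // A null threshold means the calculation is not limited.
    static int limitedDistance(CharSequence left, CharSequence right, Integer threshold) {
        if (left == null || right == null) {
            throw new IllegalArgumentException("Inputs must not be null");
        }
        if (threshold != null && threshold < 0) {
            throw new IllegalArgumentException("Threshold must not be negative");
        }
        int n = left.length();
        int m = right.length();
        // The distance is at least the difference in lengths.
        if (threshold != null && Math.abs(n - m) > threshold) {
            return -1;
        }
        int[] prev = new int[m + 1];
        int[] curr = new int[m + 1];
        for (int j = 0; j <= m; j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= n; i++) {
            curr[0] = i;
            int rowMin = curr[0];
            for (int j = 1; j <= m; j++) {
                int cost = left.charAt(i - 1) == right.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
                rowMin = Math.min(rowMin, curr[j]);
            }
            // The final distance can never be smaller than the minimum of a row,
            // so the calculation can stop as soon as a whole row is over the limit.
            if (threshold != null && rowMin > threshold) {
                return -1;
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        int result = prev[m];
        return threshold != null && result > threshold ? -1 : result;
    }

    public static void main(String[] args) {
        System.out.println(limitedDistance("frog", "fog", 1));
        System.out.println(limitedDistance("elephant", "hippo", 3));
    }
}
```

A production implementation would typically also restrict each row to a band around the diagonal; that refinement is omitted here to keep the sketch short.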
  • Method Details

    • getDefaultInstance

      Gets the default instance.
      Returns:
      The default instance
    • apply

      Computes the Levenshtein distance between two Strings.

      A higher score indicates a greater distance.

      The previous implementation of the Levenshtein distance algorithm was from http://www.merriampark.com/ld.htm

      Chas Emerick has written an implementation in Java that avoids the OutOfMemoryError that can occur when the earlier Java implementation is used with very large strings.
      This implementation of the Levenshtein distance algorithm is from http://www.merriampark.com/ldjava.htm

       distance.apply(null, *)             = IllegalArgumentException
       distance.apply(*, null)             = IllegalArgumentException
       distance.apply("","")               = 0
       distance.apply("","a")              = 1
       distance.apply("aaapppp", "")       = 7
       distance.apply("frog", "fog")       = 1
       distance.apply("fly", "ant")        = 3
       distance.apply("elephant", "hippo") = 7
       distance.apply("hippo", "elephant") = 7
       distance.apply("hippo", "zzzzzzzz") = 8
       distance.apply("hello", "hallo")    = 1
       
      Specified by:
      apply in interface BiFunction<CharSequence,CharSequence,LevenshteinResults>
      Specified by:
      apply in interface ObjectSimilarityScore<CharSequence,LevenshteinResults>
      Specified by:
      apply in interface SimilarityScore<LevenshteinResults>
      Parameters:
      left - the first input, must not be null
      right - the second input, must not be null
      Returns:
      the result, whose distance is -1 if a threshold is set and the distance exceeds it
      Throws:
      IllegalArgumentException - if either input is null
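
Because apply returns a LevenshteinResults rather than a plain number, the result carries a breakdown of the edits as well as the total. The following plain-Java sketch shows one way such a breakdown can be derived, by filling the full table and walking back from the last cell. DetailedSketch and detailed are hypothetical names, the int[] return value is a stand-in for LevenshteinResults, and which edit counts as an insertion versus a deletion is this sketch's own convention (transforming left into right), so the counts may differ from the library's when several minimal edit scripts exist.

```java
// Illustrative sketch only; DetailedSketch is a hypothetical name,
// not part of Apache Commons Text.
final class DetailedSketch {

    // Returns {distance, insertCount, deleteCount, substituteCount}
    // for turning `left` into `right`.
    static int[] detailed(CharSequence left, CharSequence right) {
        if (left == null || right == null) {
            throw new IllegalArgumentException("Inputs must not be null");
        }
        int n = left.length();
        int m = right.length();
        int[][] d = new int[n + 1][m + 1];
        for (int i = 0; i <= n; i++) {
            d[i][0] = i;
        }
        for (int j = 0; j <= m; j++) {
            d[0][j] = j;
        }
        for (int i = 1; i <= n; i++) {
            for (int j = 1; j <= m; j++) {
                int cost = left.charAt(i - 1) == right.charAt(j - 1) ? 0 : 1;
                d[i][j] = Math.min(Math.min(d[i][j - 1] + 1, d[i - 1][j] + 1), d[i - 1][j - 1] + cost);
            }
        }
        // Walk back from the bottom-right cell, classifying each step.
        int inserts = 0;
        int deletes = 0;
        int substitutes = 0;
        int i = n;
        int j = m;
        while (i > 0 || j > 0) {
            if (i > 0 && j > 0 && left.charAt(i - 1) == right.charAt(j - 1)
                    && d[i][j] == d[i - 1][j - 1]) {
                i--; // characters match, no edit
                j--;
            } else if (i > 0 && j > 0 && d[i][j] == d[i - 1][j - 1] + 1) {
                substitutes++;
                i--;
                j--;
            } else if (i > 0 && d[i][j] == d[i - 1][j] + 1) {
                deletes++; // a character of `left` is dropped
                i--;
            } else {
                inserts++; // a character of `right` is added
                j--;
            }
        }
        return new int[] {d[n][m], inserts, deletes, substitutes};
    }

    public static void main(String[] args) {
        System.out.println(java.util.Arrays.toString(detailed("frog", "fog")));
        System.out.println(java.util.Arrays.toString(detailed("hello", "hallo")));
    }
}
```

The three counts always sum to the distance, since every step of the walk-back that is not a match accounts for exactly one edit.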
    • apply

      public <E> LevenshteinResults apply(SimilarityInput<E> left, SimilarityInput<E> right)
      Computes the Levenshtein distance between two Strings.

      A higher score indicates a greater distance.

      The previous implementation of the Levenshtein distance algorithm was from http://www.merriampark.com/ld.htm

      Chas Emerick has written an implementation in Java that avoids the OutOfMemoryError that can occur when the earlier Java implementation is used with very large strings.
      This implementation of the Levenshtein distance algorithm is from http://www.merriampark.com/ldjava.htm

       distance.apply(null, *)             = IllegalArgumentException
       distance.apply(*, null)             = IllegalArgumentException
       distance.apply("","")               = 0
       distance.apply("","a")              = 1
       distance.apply("aaapppp", "")       = 7
       distance.apply("frog", "fog")       = 1
       distance.apply("fly", "ant")        = 3
       distance.apply("elephant", "hippo") = 7
       distance.apply("hippo", "elephant") = 7
       distance.apply("hippo", "zzzzzzzz") = 8
       distance.apply("hello", "hallo")    = 1
       
      Type Parameters:
      E - The type of similarity score unit.
      Parameters:
      left - the first input, must not be null
      right - the second input, must not be null
      Returns:
      the result, whose distance is -1 if a threshold is set and the distance exceeds it
      Throws:
      IllegalArgumentException - if either input is null
      Since:
      1.13.0
    • getThreshold

      Gets the distance threshold.
      Returns:
      The distance threshold