Class LevenshteinDetailedDistance

java.lang.Object
org.apache.commons.text.similarity.LevenshteinDetailedDistance
All Implemented Interfaces:
BiFunction<CharSequence,CharSequence,LevenshteinResults>, EditDistance<LevenshteinResults>, ObjectSimilarityScore<CharSequence,LevenshteinResults>, SimilarityScore<LevenshteinResults>

An algorithm for measuring the difference between two character sequences.

This is the number of changes needed to change one sequence into another, where each change is a single character modification (deletion, insertion or substitution).
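The definition above can be illustrated with a short dynamic-programming sketch. This is not the Commons Text implementation; the class name `EditDistanceSketch` and the method `distance` are hypothetical names used only for illustration. It keeps two rows of the cost table and takes the cheapest of a substitution, a deletion, or an insertion at each cell.

```java
// Illustrative sketch only; names are hypothetical and this is not the library code.
final class EditDistanceSketch {

    /** Counts the single-character edits needed to turn left into right. */
    static int distance(CharSequence left, CharSequence right) {
        if (left == null || right == null) {
            throw new IllegalArgumentException("Inputs must not be null");
        }
        int n = left.length();
        int m = right.length();
        int[] prev = new int[m + 1]; // costs for the previous prefix of left
        int[] curr = new int[m + 1]; // costs for the current prefix of left
        for (int j = 0; j <= m; j++) {
            prev[j] = j; // j insertions turn "" into the first j chars of right
        }
        for (int i = 1; i <= n; i++) {
            curr[0] = i; // i deletions turn the first i chars of left into ""
            for (int j = 1; j <= m; j++) {
                int cost = left.charAt(i - 1) == right.charAt(j - 1) ? 0 : 1;
                int substitute = prev[j - 1] + cost;
                int delete = prev[j] + 1;
                int insert = curr[j - 1] + 1;
                curr[j] = Math.min(substitute, Math.min(delete, insert));
            }
            int[] tmp = prev; // swap rows so prev always holds the finished row
            prev = curr;
            curr = tmp;
        }
        return prev[m];
    }

    public static void main(String[] args) {
        System.out.println(distance("frog", "fog"));    // prints 1
        System.out.println(distance("hello", "hallo")); // prints 1
    }
}
```

Only two rows are kept, so memory grows with the length of one input rather than with the product of both lengths.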

Since:
1.0
  • Constructor Details

    • LevenshteinDetailedDistance

      Deprecated.
      Constructs a new instance that uses a version of the algorithm that does not use a threshold parameter.
    • LevenshteinDetailedDistance

      Constructs a new instance for a threshold.

      If the threshold is not null, distance calculations will be limited to that maximum distance.

      If the threshold is null, the unlimited version of the algorithm will be used.

      Parameters:
      threshold - If this is null, distance calculations will not be limited. Must not be negative.
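The threshold contract can be sketched as follows. This is a simplified illustration, not the library's algorithm: `ThresholdSketch` and `limitedDistance` are hypothetical names, and the sketch assumes that -1 signals a distance above the threshold and that a negative threshold is rejected with an IllegalArgumentException.

```java
// Illustrative sketch only; names are hypothetical and this is not the library code.
final class ThresholdSketch {

    /** Returns the edit distance, or -1 if it exceeds a non-null threshold (assumed contract). */
    static int limitedDistance(CharSequence left, CharSequence right, Integer threshold) {
        if (left == null || right == null) {
            throw new IllegalArgumentException("Inputs must not be null");
        }
        if (threshold != null && threshold < 0) {
            throw new IllegalArgumentException("Threshold must not be negative");
        }
        // A null threshold means the calculation is not limited.
        int limit = threshold == null ? Integer.MAX_VALUE : threshold;
        int n = left.length();
        int m = right.length();
        if (Math.abs(n - m) > limit) {
            return -1; // the length difference alone already exceeds the limit
        }
        int[] prev = new int[m + 1];
        int[] curr = new int[m + 1];
        for (int j = 0; j <= m; j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= n; i++) {
            curr[0] = i;
            int rowMin = curr[0];
            for (int j = 1; j <= m; j++) {
                int cost = left.charAt(i - 1) == right.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(prev[j - 1] + cost, Math.min(prev[j] + 1, curr[j - 1] + 1));
                rowMin = Math.min(rowMin, curr[j]);
            }
            if (rowMin > limit) {
                return -1; // costs never decrease along a path, so the result must exceed the limit
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[m] <= limit ? prev[m] : -1;
    }

    public static void main(String[] args) {
        System.out.println(limitedDistance("frog", "fog", null)); // prints 1
        System.out.println(limitedDistance("aaapppp", "", 6));    // prints -1
    }
}
```

The early exits are the reason a threshold can save work: once every cell in a row is above the limit, no later row can bring the final distance back under it.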
  • Method Details

    • getDefaultInstance

      Gets the default instance.
      Returns:
      The default instance.
    • apply

      Computes the Levenshtein distance between two character sequences.

      A higher score indicates a greater distance.

      Chas Emerick has written an implementation in Java that avoids the OutOfMemoryError that can occur when the previous Java implementation is used with very large strings.

       distance.apply(null, *)             = Throws IllegalArgumentException
       distance.apply(*, null)             = Throws IllegalArgumentException
       distance.apply("","")               = 0
       distance.apply("","a")              = 1
       distance.apply("aaapppp", "")       = 7
       distance.apply("frog", "fog")       = 1
       distance.apply("fly", "ant")        = 3
       distance.apply("elephant", "hippo") = 7
       distance.apply("hippo", "elephant") = 7
       distance.apply("hippo", "zzzzzzzz") = 8
       distance.apply("hello", "hallo")    = 1
       
      Specified by:
      apply in interface BiFunction<CharSequence,CharSequence,LevenshteinResults>
      Specified by:
      apply in interface ObjectSimilarityScore<CharSequence,LevenshteinResults>
      Specified by:
      apply in interface SimilarityScore<LevenshteinResults>
      Parameters:
      left - the first input, must not be null.
      right - the second input, must not be null.
      Returns:
      The result; its distance is -1 if the threshold is exceeded.
      Throws:
      IllegalArgumentException - if either input is null.
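A detailed result reports the distance together with how many insertions, deletions, and substitutions make it up. The sketch below shows one way such counts can be derived, by filling the full cost table and walking back from the last cell. The names `DetailedEditSketch` and `detailed` are hypothetical, and this is not the library's implementation. When several optimal edit scripts exist, the tie-breaking order chosen here may yield different counts than another implementation.

```java
// Illustrative sketch only; names are hypothetical and this is not the library code.
final class DetailedEditSketch {

    /** Returns {distance, insertions, deletions, substitutions} for turning left into right. */
    static int[] detailed(CharSequence left, CharSequence right) {
        if (left == null || right == null) {
            throw new IllegalArgumentException("Inputs must not be null");
        }
        int n = left.length();
        int m = right.length();
        int[][] d = new int[n + 1][m + 1]; // full table: memory grows with n * m
        for (int i = 0; i <= n; i++) {
            d[i][0] = i;
        }
        for (int j = 0; j <= m; j++) {
            d[0][j] = j;
        }
        for (int i = 1; i <= n; i++) {
            for (int j = 1; j <= m; j++) {
                int cost = left.charAt(i - 1) == right.charAt(j - 1) ? 0 : 1;
                d[i][j] = Math.min(d[i - 1][j - 1] + cost,
                        Math.min(d[i - 1][j] + 1, d[i][j - 1] + 1));
            }
        }
        // Walk back from the last cell and classify each step of one optimal script.
        int inserts = 0;
        int deletes = 0;
        int substitutes = 0;
        int i = n;
        int j = m;
        while (i > 0 || j > 0) {
            if (i > 0 && j > 0 && left.charAt(i - 1) == right.charAt(j - 1)
                    && d[i][j] == d[i - 1][j - 1]) {
                i--; // characters match: no edit
                j--;
            } else if (i > 0 && j > 0 && d[i][j] == d[i - 1][j - 1] + 1) {
                substitutes++;
                i--;
                j--;
            } else if (i > 0 && d[i][j] == d[i - 1][j] + 1) {
                deletes++;
                i--;
            } else {
                inserts++;
                j--;
            }
        }
        return new int[] {d[n][m], inserts, deletes, substitutes};
    }

    public static void main(String[] args) {
        int[] r = detailed("frog", "fog");
        System.out.println(r[0] + " edits, " + r[2] + " deletion(s)");
    }
}
```

The full table is what makes the walk back possible, and it is also why this approach uses far more memory than a two-row calculation on very large inputs.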
    • apply

      public <E> LevenshteinResults apply(SimilarityInput<E> left, SimilarityInput<E> right)
      Computes the Levenshtein distance between two inputs.

      A higher score indicates a greater distance.

      Chas Emerick has written an implementation in Java that avoids the OutOfMemoryError that can occur when the previous Java implementation is used with very large strings.

       distance.apply(null, *)             = Throws IllegalArgumentException
       distance.apply(*, null)             = Throws IllegalArgumentException
       distance.apply("","")               = 0
       distance.apply("","a")              = 1
       distance.apply("aaapppp", "")       = 7
       distance.apply("frog", "fog")       = 1
       distance.apply("fly", "ant")        = 3
       distance.apply("elephant", "hippo") = 7
       distance.apply("hippo", "elephant") = 7
       distance.apply("hippo", "zzzzzzzz") = 8
       distance.apply("hello", "hallo")    = 1
       
      Type Parameters:
      E - The type of similarity score unit.
      Parameters:
      left - the first input, must not be null.
      right - the second input, must not be null.
      Returns:
      The result; its distance is -1 if the threshold is exceeded.
      Throws:
      IllegalArgumentException - if either input is null.
      Since:
      1.13.0
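The type parameter means the distance does not have to be measured in characters. The sketch below illustrates the idea with a hypothetical `IndexedInput<E>` interface, a stand-in that is not the library's SimilarityInput type. It assumes only that an input exposes its length and the element at an index, and it compares elements with `equals`.

```java
// Illustrative sketch only; IndexedInput is a hypothetical stand-in, not the library type.
final class GenericInputSketch {

    /** Minimal indexed view over a sequence of elements of type E. */
    interface IndexedInput<E> {
        E at(int index);

        int length();
    }

    /** Wraps an array as an IndexedInput. */
    static <E> IndexedInput<E> of(E[] elements) {
        return new IndexedInput<E>() {
            @Override
            public E at(int index) {
                return elements[index];
            }

            @Override
            public int length() {
                return elements.length;
            }
        };
    }

    /** Counts element-level edits needed to turn left into right. */
    static <E> int distance(IndexedInput<E> left, IndexedInput<E> right) {
        if (left == null || right == null) {
            throw new IllegalArgumentException("Inputs must not be null");
        }
        int n = left.length();
        int m = right.length();
        int[] prev = new int[m + 1];
        int[] curr = new int[m + 1];
        for (int j = 0; j <= m; j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= n; i++) {
            curr[0] = i;
            for (int j = 1; j <= m; j++) {
                // Elements are compared with equals, so any element type works.
                int cost = java.util.Objects.equals(left.at(i - 1), right.at(j - 1)) ? 0 : 1;
                curr[j] = Math.min(prev[j - 1] + cost, Math.min(prev[j] + 1, curr[j - 1] + 1));
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[m];
    }

    public static void main(String[] args) {
        String[] a = {"the", "quick", "fox"};
        String[] b = {"the", "fox"};
        System.out.println(distance(of(a), of(b))); // prints 1: one word deleted
    }
}
```

Here the unit of comparison is a whole word, so removing "quick" counts as a single edit.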
    • getThreshold

      Gets the distance threshold.
      Returns:
      The distance threshold.