Interface  Description 

EditDistance<R> 
Interface for Edit Distances.

SimilarityScore<R> 
Interface for the concept of a string similarity score.

Class  Description 

CosineDistance 
Measures the cosine distance between two character sequences.

CosineSimilarity 
Measures the Cosine similarity of two vectors of an inner product space and
compares the angle between them.

EditDistanceFrom<R> 
This stores an EditDistance implementation and a CharSequence "left" string.

FuzzyScore 
A matching algorithm that is similar to the searching algorithms implemented in editors such
as Sublime Text, TextMate, Atom and others.

HammingDistance 
The Hamming distance between two strings of equal length is the number of
positions at which the corresponding symbols are different.

IntersectionResult 
Represents the intersection result between two sets.

IntersectionSimilarity<T> 
Measures the intersection of two sets created from a pair of character sequences.

JaccardDistance 
Measures the Jaccard distance between the character sets of two character sequences.

JaccardSimilarity 
Measures the Jaccard similarity (aka Jaccard index) between the character sets of
two character sequences.

JaroWinklerDistance 
Measures the Jaro-Winkler distance of two character sequences.

JaroWinklerSimilarity 
A similarity algorithm indicating the percentage of matched characters between two character sequences.

LevenshteinDetailedDistance 
An algorithm for measuring the difference between two character sequences.

LevenshteinDistance 
An algorithm for measuring the difference between two character sequences.

LevenshteinResults 
Container class to store Levenshtein distance between two character sequences.

LongestCommonSubsequence 
A similarity algorithm indicating the length of the longest common subsequence between two strings.

LongestCommonSubsequenceDistance 
An edit distance algorithm based on the length of the longest common subsequence between two strings.

SimilarityScoreFrom<R> 
This stores a SimilarityScore implementation and a CharSequence "left" string.

Provides algorithms for string similarity.
The algorithms that implement the EditDistance interface follow the same simple principle: the more similar (closer) two strings are, the lower the distance. For example, the words house and hose are closer than house and trousers.
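The house/hose/trousers comparison can be checked with a small, self-contained sketch of the classic Levenshtein edit distance. This is not the library's implementation; the class name LevenshteinSketch and its distance method are hypothetical names chosen for illustration, and the code uses only the standard two-row dynamic-programming formulation.

```java
// Hypothetical illustration class; not part of Apache Commons Text.
final class LevenshteinSketch {

    // Returns the minimum number of single-character insertions,
    // deletions, and substitutions needed to turn left into right.
    static int distance(CharSequence left, CharSequence right) {
        int n = left.length();
        int m = right.length();
        int[] prev = new int[m + 1]; // distances for the previous row
        int[] curr = new int[m + 1]; // distances for the current row

        // Turning an empty prefix into right[0..j) takes j insertions.
        for (int j = 0; j <= m; j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= n; i++) {
            curr[0] = i; // turning left[0..i) into "" takes i deletions
            for (int j = 1; j <= m; j++) {
                int cost = left.charAt(i - 1) == right.charAt(j - 1) ? 0 : 1;
                int insertion = curr[j - 1] + 1;
                int deletion = prev[j] + 1;
                int substitution = prev[j - 1] + cost;
                curr[j] = Math.min(Math.min(insertion, deletion), substitution);
            }
            int[] tmp = prev; // swap rows for the next iteration
            prev = curr;
            curr = tmp;
        }
        return prev[m];
    }
}
```

Under this sketch, distance("house", "hose") is 1 (delete the "u"), while distance("house", "trousers") is 4, so house is closer to hose than to trousers.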
The following algorithms are available at the moment:
Cosine Distance
Cosine Similarity
Fuzzy Score
Hamming Distance
JaroWinkler Distance
JaroWinkler Similarity
Levenshtein Distance
Longest Common Subsequence Distance
The Cosine Distance utilises a regular expression tokenizer (\w+).
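To make the role of the \w+ tokenizer concrete, here is a minimal sketch that splits text into word tokens with java.util.regex, counts them, and derives a cosine distance as one minus the cosine similarity of the two count vectors. The class CosineSketch and its methods are hypothetical, and returning 1.0 when either input has no tokens is an assumption of this sketch, not documented library behavior.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration class; not part of Apache Commons Text.
final class CosineSketch {

    // Same pattern the package description mentions: runs of word characters.
    private static final Pattern WORD = Pattern.compile("\\w+");

    // Counts how often each \w+ token occurs in the text.
    static Map<String, Integer> termCounts(CharSequence text) {
        Map<String, Integer> counts = new HashMap<>();
        Matcher matcher = WORD.matcher(text);
        while (matcher.find()) {
            counts.merge(matcher.group(), 1, Integer::sum);
        }
        return counts;
    }

    // Cosine distance = 1 - (a . b) / (|a| * |b|) over the token counts.
    static double distance(CharSequence left, CharSequence right) {
        Map<String, Integer> a = termCounts(left);
        Map<String, Integer> b = termCounts(right);
        double dot = 0;
        for (Map.Entry<String, Integer> entry : a.entrySet()) {
            Integer other = b.get(entry.getKey());
            if (other != null) {
                dot += entry.getValue() * other;
            }
        }
        double norm = Math.sqrt(sumOfSquares(a)) * Math.sqrt(sumOfSquares(b));
        // Assumption of this sketch: inputs without tokens are maximally distant.
        return norm == 0 ? 1.0 : 1.0 - dot / norm;
    }

    private static double sumOfSquares(Map<String, Integer> counts) {
        double sum = 0;
        for (int count : counts.values()) {
            sum += (double) count * count;
        }
        return sum;
    }
}
```

Because tokens are whole \w+ runs, punctuation and whitespace only act as separators: "the cat, the hat!" yields the tokens the (twice), cat, and hat.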
The Levenshtein Distance's behavior can be changed to take a maximum threshold into consideration.
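One way to picture a bounded Levenshtein computation is the sketch below, which gives up as soon as the result can no longer stay within a caller-supplied limit. The class BoundedLevenshteinSketch, its threshold parameter, and the choice of -1 as the "limit exceeded" result are assumptions of this illustration; the library's own algorithm is more elaborate and is not reproduced here.

```java
// Hypothetical illustration class; not part of Apache Commons Text.
final class BoundedLevenshteinSketch {

    // Returns the edit distance if it is <= threshold, otherwise -1.
    static int distance(CharSequence left, CharSequence right, int threshold) {
        if (threshold < 0) {
            throw new IllegalArgumentException("threshold must not be negative");
        }
        int n = left.length();
        int m = right.length();
        // The length difference is a lower bound on the distance.
        if (Math.abs(n - m) > threshold) {
            return -1;
        }
        int[] prev = new int[m + 1];
        int[] curr = new int[m + 1];
        for (int j = 0; j <= m; j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= n; i++) {
            curr[0] = i;
            int rowMin = curr[0];
            for (int j = 1; j <= m; j++) {
                int cost = left.charAt(i - 1) == right.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1),
                                   prev[j - 1] + cost);
                rowMin = Math.min(rowMin, curr[j]);
            }
            // Every later value is at least rowMin, so stop early
            // once the whole row is already above the limit.
            if (rowMin > threshold) {
                return -1;
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[m] <= threshold ? prev[m] : -1;
    }
}
```

With a limit of 3, comparing house and trousers stops early and yields -1, whereas a limit of 4 returns the full distance of 4.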
Copyright © 2014–2022 The Apache Software Foundation. All rights reserved.