public class LongestCommonSubsequence extends Object implements SimilarityScore<Integer>
The Longest common subsequence algorithm returns the length of the longest subsequence that two strings have in
common. Two strings that are entirely different, return a value of 0, and two strings that return a value
of the commonly shared length implies that the strings are completely the same in value and position.
Note. Generally this algorithm is fairly inefficient, as for length m, n of the input
CharSequence's left and right respectively, the runtime of the
algorithm is O(m*n).
This implementation is based on the Longest Commons Substring algorithm from https://en.wikipedia.org/wiki/Longest_common_subsequence_problem.
For further reading see:
Lothaire, M. Applied combinatorics on words. New York: Cambridge U Press, 2005. 12-13
| Constructor and Description |
|---|
LongestCommonSubsequence() |
| Modifier and Type | Method and Description |
|---|---|
Integer |
apply(CharSequence left,
CharSequence right)
Calculates longestCommonSubsequence similarity score of two
CharSequence's passed as
input. |
CharSequence |
logestCommonSubsequence(CharSequence left,
CharSequence right)
Computes the longestCommonSubsequence between the two
CharSequence's passed as
input. |
int[][] |
longestCommonSubstringLengthArray(CharSequence left,
CharSequence right)
Computes the lcsLengthArray for the sake of doing the actual lcs calculation.
|
public LongestCommonSubsequence()
public Integer apply(CharSequence left, CharSequence right)
CharSequence's passed as
input.apply in interface SimilarityScore<Integer>left - first character sequenceright - second character sequenceIllegalArgumentException - if either String input nullpublic CharSequence logestCommonSubsequence(CharSequence left, CharSequence right)
CharSequence's passed as
input.
Note, a substring and
subsequence are not necessarily the same thing. Indeed, abcxyzqrs and
xyzghfm have both the same common substring and subsequence, namely xyz. However,
axbyczqrs and abcxyzqtv have the longest common subsequence xyzq because a
subsequence need not have adjacent characters.
For reference, we give the definition of a subsequence for the reader: a subsequence is a sequence that can be derived from another sequence by deleting some elements without changing the order of the remaining elements.
left - first character sequenceright - second character sequenceIllegalArgumentException - if either String input nullpublic int[][] longestCommonSubstringLengthArray(CharSequence left, CharSequence right)
left - first character sequenceright - second character sequenceCopyright © 2014–2017 The Apache Software Foundation. All rights reserved.