public class LongestCommonSubsequence extends Object implements SimilarityScore<Integer>
The Longest common subsequence algorithm returns the length of the longest subsequence that two strings have in
common. Two strings that are entirely different, return a value of 0, and two strings that return a value
of the commonly shared length implies that the strings are completely the same in value and position.
Note. Generally this algorithm is fairly inefficient, as for length m, n of the input
CharSequence
's left
and right
respectively, the runtime of the
algorithm is O(m*n).
This implementation is based on the Longest Commons Substring algorithm from https://en.wikipedia.org/wiki/Longest_common_subsequence_problem.
For further reading see:
Lothaire, M. Applied combinatorics on words. New York: Cambridge U Press, 2005. 12-13
Constructor and Description |
---|
LongestCommonSubsequence() |
Modifier and Type | Method and Description |
---|---|
Integer |
apply(CharSequence left,
CharSequence right)
Calculates longestCommonSubsequence similarity score of two
CharSequence 's passed as
input. |
CharSequence |
logestCommonSubsequence(CharSequence left,
CharSequence right)
Computes the longestCommonSubsequence between the two
CharSequence 's passed as
input. |
int[][] |
longestCommonSubstringLengthArray(CharSequence left,
CharSequence right)
Computes the lcsLengthArray for the sake of doing the actual lcs calculation.
|
public LongestCommonSubsequence()
public Integer apply(CharSequence left, CharSequence right)
CharSequence
's passed as
input.apply
in interface SimilarityScore<Integer>
left
- first character sequenceright
- second character sequenceIllegalArgumentException
- if either String input null
public CharSequence logestCommonSubsequence(CharSequence left, CharSequence right)
CharSequence
's passed as
input.
Note, a substring and
subsequence are not necessarily the same thing. Indeed, abcxyzqrs
and
xyzghfm
have both the same common substring and subsequence, namely xyz
. However,
axbyczqrs
and abcxyzqtv
have the longest common subsequence xyzq
because a
subsequence need not have adjacent characters.
For reference, we give the definition of a subsequence for the reader: a subsequence is a sequence that can be derived from another sequence by deleting some elements without changing the order of the remaining elements.
left
- first character sequenceright
- second character sequenceIllegalArgumentException
- if either String input null
public int[][] longestCommonSubstringLengthArray(CharSequence left, CharSequence right)
left
- first character sequenceright
- second character sequenceCopyright © 2014–2017 The Apache Software Foundation. All rights reserved.