Class Soundex
- All Implemented Interfaces: Encoder, StringEncoder
This class is thread-safe. Although not strictly immutable, the mutable fields are not actually used.
-
Field Summary
Modifier and Type | Field | Description
static final char | SILENT_MARKER | The marker character used to indicate a silent (ignored) character.
static final Soundex | US_ENGLISH | An instance of Soundex using the US_ENGLISH_MAPPING mapping.
static final Soundex | US_ENGLISH_GENEALOGY | An instance of Soundex using the mapping as per the Genealogy site: http://www.genealogy.com/articles/research/00000060.html
static final String | US_ENGLISH_MAPPING_STRING | This is a default mapping of the 26 letters used in US English.
static final Soundex | US_ENGLISH_SIMPLIFIED | An instance of Soundex using the Simplified Soundex mapping, as described here: http://west-penwith.org.uk/misc/soundex.htm
Constructor Summary
-
Method Summary
Modifier and Type | Method | Description
int | difference(String s1, String s2) | Encodes the Strings and returns the number of characters in the two encoded Strings that are the same.
Object | encode(Object obj) | Encodes an Object using the soundex algorithm.
String | encode(String str) | Encodes a String using the soundex algorithm.
int | getMaxLength() | Deprecated. This feature is not needed since the encoding size must be constant.
void | setMaxLength(int maxLength) | Deprecated. This feature is not needed since the encoding size must be constant.
String | soundex(String str) | Retrieves the Soundex code for a given String object.
-
Field Details
-
SILENT_MARKER
The marker character used to indicate a silent (ignored) character. These are ignored except when they appear as the first character.
Note: the US_ENGLISH_MAPPING_STRING does not use this mechanism because changing it might break existing code. Mappings that don't contain a silent marker code are treated as though H and W are silent.
To override this, use the Soundex(String, boolean) constructor.
- Since:
- 1.11
- See Also:
-
US_ENGLISH_MAPPING_STRING
This is a default mapping of the 26 letters used in US English. A value of 0 for a letter position means do not encode, but treat as a separator when it occurs between consonants with the same code.
(This constant is provided as both an implementation convenience and to allow Javadoc to pick up the value for the constant values page.)
Note that letters H and W are treated specially. They are ignored (after the first letter) and don't act as separators between consonants with the same code.
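To make the letter-position layout concrete, here is a minimal sketch of how a 26-character mapping string can be indexed by letter. The class and method names (MappingLookupSketch, codeFor) are hypothetical and not part of this class, and the string literal is assumed to match the value of US_ENGLISH_MAPPING_STRING; check the constant values page before relying on it.

```java
// Hypothetical helper, not part of the library: looks up a code by letter position.
final class MappingLookupSketch {
    // Assumed value of US_ENGLISH_MAPPING_STRING: one code per letter, A through Z.
    static final String MAPPING = "01230120022455012623010202";

    // Returns the code for an ASCII letter, or throws if the character has no
    // position in the mapping.
    static char codeFor(char letter) {
        int index = Character.toUpperCase(letter) - 'A';
        if (index < 0 || index >= MAPPING.length()) {
            throw new IllegalArgumentException("The character is not mapped: " + letter);
        }
        return MAPPING.charAt(index);
    }

    public static void main(String[] args) {
        System.out.println(codeFor('B')); // a consonant with code 1
        System.out.println(codeFor('E')); // code 0: not encoded, acts as a separator
        System.out.println(codeFor('R')); // a consonant with code 6
    }
}
```

In this assumed mapping H and W also sit at positions holding 0; their special handling is a separate rule applied on top of the lookup.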
- See Also:
-
US_ENGLISH
An instance of Soundex using the US_ENGLISH_MAPPING mapping. This treats H and W as silent letters. Apart from when they appear as the first letter, they are ignored. They don't act as separators between duplicate codes.
- See Also:
-
US_ENGLISH_SIMPLIFIED
An instance of Soundex using the Simplified Soundex mapping, as described here: http://west-penwith.org.uk/misc/soundex.htm
This treats H and W the same as vowels (AEIOUY). Such letters aren't encoded (after the first), but they do act as separators when dropping duplicate codes. The mapping is otherwise the same as for US_ENGLISH.
- Since:
- 1.11
-
US_ENGLISH_GENEALOGY
An instance of Soundex using the mapping as per the Genealogy site: http://www.genealogy.com/articles/research/00000060.html
This treats vowels (AEIOUY), H and W as silent letters. Such letters are ignored (after the first) and do not act as separators when dropping duplicate codes.
The codes for consonants are otherwise the same as for US_ENGLISH_MAPPING_STRING and US_ENGLISH_SIMPLIFIED.
- Since:
- 1.11
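The three predefined instances differ only in how vowels, H and W interact with duplicate codes. The following is a minimal, self-contained sketch of those rules, not the library's source: the class name, the encode signature and both mapping literals are assumptions made for illustration, and the input is assumed to be a non-empty, upper-case A to Z word.

```java
// Hypothetical sketch of the three variants; it is not the library's implementation.
final class SoundexVariantsSketch {
    static final char SILENT_MARKER = '-';
    // Assumed mapping values (A through Z); verify against the constant values page.
    static final String US_ENGLISH = "01230120022455012623010202";
    static final String GENEALOGY = "-123-12--22455-12623-1-2-2";

    // Encodes a non-empty, upper-case ASCII word to a four-character code.
    // specialCaseHW: when true, H and W after the first letter are skipped entirely.
    static String encode(String word, String mapping, boolean specialCaseHW) {
        char[] out = {'0', '0', '0', '0'};
        int count = 0;
        char first = word.charAt(0);
        out[count++] = first; // the first letter is always kept as-is
        char last = mapping.charAt(first - 'A');
        for (int i = 1; i < word.length() && count < out.length; i++) {
            char ch = word.charAt(i);
            if (specialCaseHW && (ch == 'H' || ch == 'W')) {
                continue; // ignored: does not separate duplicate codes
            }
            char code = mapping.charAt(ch - 'A');
            if (code == SILENT_MARKER) {
                continue; // silent: also does not separate duplicate codes
            }
            if (code != '0' && code != last) {
                out[count++] = code; // keep a new, non-repeated consonant code
            }
            last = code; // a '0' stored here acts as a separator
        }
        return new String(out);
    }

    public static void main(String[] args) {
        for (String w : new String[] {"DODDS", "DHDDS"}) {
            System.out.println(w
                + " standard=" + encode(w, US_ENGLISH, true)
                + " simplified=" + encode(w, US_ENGLISH, false)
                + " genealogy=" + encode(w, GENEALOGY, false));
        }
    }
}
```

Under these assumptions, DODDS and DHDDS diverge only in the standard variant: the O separates the two D codes, while the H does not.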
-
-
Constructor Details
-
Soundex
public Soundex()
Creates an instance using US_ENGLISH_MAPPING
- See Also:
-
Soundex
public Soundex(char[] mapping)
Creates a soundex instance using the given mapping. This constructor can be used to provide an internationalized mapping for a non-Western character set.
Every letter of the alphabet is "mapped" to a numerical value. This char array holds the values to which each letter is mapped. This implementation contains a default map for US_ENGLISH.
If the mapping contains an instance of SILENT_MARKER then H and W are not given special treatment.
- Parameters:
mapping - Mapping array to use when finding the corresponding code for a given character
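The SILENT_MARKER rule for custom mappings can be pictured as a simple scan of the mapping array. This is a hedged sketch only: the class and method names are hypothetical, the marker value '-' is an assumption, and the library may decide this differently internally.

```java
// Hypothetical sketch: decides whether H and W get special treatment for a mapping.
final class SilentMarkerCheckSketch {
    // Assumed value of SILENT_MARKER.
    static final char SILENT_MARKER = '-';

    // Returns true if any position of the mapping holds the silent marker.
    static boolean hasMarker(char[] mapping) {
        for (char code : mapping) {
            if (code == SILENT_MARKER) {
                return true;
            }
        }
        return false;
    }

    // H and W are special-cased only when the mapping has no silent marker.
    static boolean specialCaseHW(char[] mapping) {
        return !hasMarker(mapping);
    }

    public static void main(String[] args) {
        char[] plain = "01230120022455012623010202".toCharArray();
        char[] marked = "-123-12--22455-12623-1-2-2".toCharArray();
        System.out.println(specialCaseHW(plain));  // no marker present
        System.out.println(specialCaseHW(marked)); // marker present
    }
}
```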
-
Soundex
public Soundex(String mapping)
Creates a refined soundex instance using a custom mapping. This constructor can be used to customize the mapping, and/or possibly provide an internationalized mapping for a non-Western character set.
If the mapping contains an instance of SILENT_MARKER then H and W are not given special treatment.
- Parameters:
mapping - Mapping string to use when finding the corresponding code for a given character
- Since:
- 1.4
-
Soundex
public Soundex(String mapping, boolean specialCaseHW)
Creates a refined soundex instance using a custom mapping. This constructor can be used to customize the mapping, and/or possibly provide an internationalized mapping for a non-Western character set.
- Parameters:
mapping - Mapping string to use when finding the corresponding code for a given character
specialCaseHW - if true, then
- Since:
- 1.11
-
-
Method Details
-
difference
public int difference(String s1, String s2) throws EncoderException
Encodes the Strings and returns the number of characters in the two encoded Strings that are the same. This return value ranges from 0 through 4: 0 indicates little or no similarity, and 4 indicates strong similarity or identical values.
- Parameters:
s1 - A String that will be encoded and compared.
s2 - A String that will be encoded and compared.
- Returns:
- The number of characters in the two encoded Strings that are the same, from 0 to 4.
- Throws:
EncoderException - if an error occurs encoding one of the strings
- Since:
- 1.3
- See Also:
- SoundexUtils.difference(StringEncoder,String,String)
- MS T-SQL DIFFERENCE
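The counting step behind difference can be sketched as follows, assuming the comparison is made position by position on the two encoded strings. The class and method names are hypothetical, the library's own code may differ, and the four-character codes in main are arbitrary illustrations rather than encodings of particular names.

```java
// Hypothetical sketch of the comparison step, applied to already-encoded strings.
final class DifferenceSketch {
    // Counts positions at which the two encoded strings hold the same character.
    static int differenceEncoded(String es1, String es2) {
        if (es1 == null || es2 == null) {
            return 0;
        }
        int lengthToMatch = Math.min(es1.length(), es2.length());
        int diff = 0;
        for (int i = 0; i < lengthToMatch; i++) {
            if (es1.charAt(i) == es2.charAt(i)) {
                diff++;
            }
        }
        return diff;
    }

    public static void main(String[] args) {
        System.out.println(differenceEncoded("S530", "S530")); // identical codes
        System.out.println(differenceEncoded("S530", "S550")); // one position differs
        System.out.println(differenceEncoded("A261", "T522")); // no position matches
    }
}
```

With four-character codes the count naturally falls in the 0 to 4 range described above.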
-
encode
public Object encode(Object obj) throws EncoderException
Encodes an Object using the soundex algorithm. This method is provided in order to satisfy the requirements of the Encoder interface, and will throw an EncoderException if the supplied object is not of type String.
- Specified by:
encode in interface Encoder
- Parameters:
obj - Object to encode
- Returns:
- An object (or type String) containing the soundex code which corresponds to the String supplied.
- Throws:
EncoderException - if the parameter supplied is not of type String
IllegalArgumentException - if a character is not mapped
-
encode
public String encode(String str)
Encodes a String using the soundex algorithm.
- Specified by:
encode in interface StringEncoder
- Parameters:
str - A String object to encode
- Returns:
- A Soundex code corresponding to the String supplied
- Throws:
IllegalArgumentException - if a character is not mapped
-
getMaxLength
Deprecated. This feature is not needed since the encoding size must be constant. Will be removed in 2.0.
Returns the maxLength. Standard Soundex
- Returns:
- int
-
setMaxLength
Deprecated. This feature is not needed since the encoding size must be constant. Will be removed in 2.0.
Sets the maxLength.
- Parameters:
maxLength - The maxLength to set
-
soundex
public String soundex(String str)
Retrieves the Soundex code for a given String object.
- Parameters:
str - String to encode using the Soundex algorithm
- Returns:
- A soundex code for the String supplied
- Throws:
IllegalArgumentException - if a character is not mapped
-