Oracle® Database PL/SQL Packages and Types Reference 11g Release 2 (11.2) Part Number E25788-04 |
|
|
PDF · Mobi · ePub |
The UTL_MATCH
package facilitates matching two records. This is typically used to match names, such as two First Names or two Last Names.
This chapter contains the following topics:
Overview
Security Model
"Edit Distance" also known as "Levenshtein Distance "(named after the Russian scientist Vladimir Levenshtein, who devised the algorithm in 1965), is a measure of Similarity between two strings, s1 and s2. The distance is the number of insertions, deletions or substitutions required to transform s1 to s2.�
The Edit Distance between strings "shackleford" and "shackelford" = 2
The "Jaro-Winkler algorithm" is another way of calculating Edit distance between two strings. This method, developed at the U.S. Census, is a String Comparator measure that gives values of partial agreement between two strings. The string comparator accounts for length of strings and partially accounts for typical human errors made in alphanumeric strings.
Table 230-1 shows similarity values returned by Jaro-Winkler and Edit Distance
Table 230-1 Comparison between normalized values returned by Jaro-Winkler and Edit Distance algorithms
String 1 | String 2 | Jaro Winkler | Edit Distance |
---|---|---|---|
Dunningham |
Cunnigham |
89 |
80 |
Abroms |
Abrams |
92 |
83 |
Lampley |
Campley |
90 |
86 |
Marhta |
Martha |
96 |
67 |
Jonathon |
Jonathan |
95 |
88 |
Jeraldine |
Geraldine |
92 |
89 |
The UTL_MATCH
package runs with definer's rights. UTL_MATCH
must be created under SYS
. Operations provided by this package are performed with SYS
privileges.
Table 230-2 DBMS_ALERT Package Subprograms
Subprogram | Description |
---|---|
Calculates the number of changes required to transform string-1 into string-2 |
|
Calculates the number of changes required to transform string-1 into string-2, returning a value between 0 (no match) and 100 (perfect match) |
|
Calculates the measure of agreement between string-1 and string-2 |
|
Calculates the measure of agreement between string-1 and string-2, returning a value between 0 (no match) and 100 (perfect match) |
This function calculates the number of insertions, deletions or substitutions required to transform string-1 into string-2.
UTL_MATCH.EDIT_DISTANCE ( s1 IN VARCHAR2, s2 IN VARCHAR2) RETURN PLS_INTEGER;�
Table 230-3 EDIT_DISTANCE Function Parameters
Parameter | Description |
---|---|
|
The string to be transformed |
|
The string into which |
SELECT UTL_MATCH.EDIT_DISTANCE('shackleford', 'shackelford') FROM DUAL; ------------- returns 2
This function calculates the number of insertions, deletions or substations required to transform string-1 into string-2, and returns the Normalized value of the Edit Distance between two Strings. The value is typically between 0 (no match) and 100 (perfect match).�
UTL_MATCH.EDIT_DISTANCE_SIMILARITY ( s1 IN VARCHAR2, s2 IN VARCHAR2) RETURN PLS_INTEGER;�
Table 230-4 EDIT_DISTANCE_SIMILARITY Function Parameters
Parameter | Description |
---|---|
|
The string to be transformed |
|
The string into which |
SELECT UTL_MATCH.EDIT_DISTANCE_SIMILARITY('shackleford', 'shackelford') FROM DUAL; -------------- returns 82
This function calculates the measure of agreement between two strings.�
UTL_MATCH.JARO_WINKLER ( s1 IN VARCHAR2, s2 IN VARCHAR2) RETURN BINARY_DOUBLE;�
SELECT UTL_MATCH.JARO_WINKLER('shackleford', 'shackelford') FROM DUAL; -------------- returns 9.818E-001
This function calculates the measure of agreement between two strings, and returns a score between 0 (no match) and 100 (perfect match).�
UTL_MATCH.JARO_WINKLER ( s1 IN VARCHAR2, s2 IN VARCHAR2) RETURN PLS_INTEGER;�
SELECT UTL_MATCH.JARO_WINKLER_SIMILARITY('shackleford', 'shackelford') FROM DUAL; -------------- returns 98