Showing 4 of 4 results
k3jph
phonics:Phonetic Spelling Algorithms
Provides a collection of phonetic algorithms including Soundex, Metaphone, NYSIIS, Caverphone, and others. The package is documented in <doi:10.18637/jss.v095.i08>.
Maintained by James Howard. Last updated 4 years ago.
bsd-2-license, linguistics, metaphone, nysiis, phonetic-spelling-algorithms, phonics, record-linkage, soundex, text-processing, cpp
14.8 match 30 stars 7.18 score 56 scripts 3 dependents
sym33
RecordLinkage:Record Linkage Functions for Linking and Deduplicating Data Sets
Provides functions for linking and deduplicating data sets. Methods based on a stochastic approach are implemented as well as classification algorithms from the machine learning domain. For details, see our paper "The RecordLinkage Package: Detecting Errors in Data" Sariyar M / Borg A (2010) <doi:10.32614/RJ-2010-017>.
Maintained by Murat Sariyar. Last updated 2 years ago.
3.3 match 6 stars 9.00 score 454 scripts 8 dependents
markvanderloo
stringdist:Approximate String Matching, Fuzzy Text Search, and String Distance Functions
Implements an approximate string matching version of R's native 'match' function. Also offers fuzzy text search based on various string distance measures. Can calculate various string distances based on edits (Damerau-Levenshtein, Hamming, Levenshtein, optimal string alignment), q-grams (q-gram, cosine, Jaccard distance) or heuristic metrics (Jaro, Jaro-Winkler). An implementation of soundex is provided as well. Distances can be computed between character vectors while taking proper care of encoding, or between integer vectors representing generic sequences. This package is built for speed and runs in parallel by using 'openMP'. An API for C or C++ is exposed as well. Reference: MPJ van der Loo (2014) <doi:10.32614/RJ-2014-011>.
Maintained by Mark van der Loo. Last updated 3 months ago.
0.5 match 327 stars 15.54 score 2.0k scripts 179 dependents
dmarcelinobr
SoundexBR:Phonetic-Coding for Portuguese
The SoundexBR package provides an algorithm for encoding names as phonetic codes, as pronounced in Portuguese. The goal is for homophones to be encoded to the same representation so that they can be matched despite minor differences in spelling. The algorithm mainly encodes consonants; a vowel is not encoded unless it is the first letter. The resulting soundex code is a four-character string consisting of one letter followed by three digits: the letter is the first letter of the name, and the digits encode the remaining consonants.
Maintained by Daniel Marcelino. Last updated 6 years ago.
0.5 match 13 stars 3.99 score 15 scripts
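
The letter-plus-three-digits scheme described in the SoundexBR entry can be illustrated with a minimal Python sketch. It assumes the classic English (American) Soundex letter groups, not the Portuguese-specific rules that SoundexBR applies, and the function name `soundex` and the `CODES` table are hypothetical names chosen for this illustration rather than any package's API.

```python
# Classic English Soundex letter groups (an assumption for this sketch;
# SoundexBR uses rules adapted to Portuguese pronunciation).
CODES = {}
for digit, group in {"1": "BFPV", "2": "CGJKQSXZ", "3": "DT",
                     "4": "L", "5": "MN", "6": "R"}.items():
    for letter in group:
        CODES[letter] = digit


def soundex(name: str) -> str:
    """Return a four-character code: first letter plus three digits."""
    # Keep only unaccented A-Z; accented letters are simply dropped here.
    letters = [ch for ch in name.upper() if "A" <= ch <= "Z"]
    if not letters:
        return ""
    first = letters[0]
    digits = []
    prev = CODES.get(first, "")  # the first letter's code can suppress a repeat
    for ch in letters[1:]:
        if ch in "HW":
            continue  # H and W do not break a run of identical codes
        code = CODES.get(ch, "")  # vowels and Y map to "" (not encoded)
        if code and code != prev:
            digits.append(code)
        prev = code  # a vowel resets prev, so a repeated consonant is coded again
    # Pad with zeros, then truncate to one letter plus three digits.
    return (first + "".join(digits) + "000")[:4]


if __name__ == "__main__":
    for example in ("Robert", "Rupert", "Ashcraft", "Tymczak"):
        print(example, soundex(example))
```

Under these rules, names that sound alike, such as "Robert" and "Rupert", receive the same code, which is what lets them be matched despite spelling differences.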