R-universe search: exports:wordcount

Showing 3 of 3 results

wrathematics

ngram: Fast n-Gram 'Tokenization'

An n-gram is a sequence of n "words" taken, in order, from a body of text. This is a collection of utilities for creating, displaying, summarizing, and "babbling" n-grams. The 'tokenization' and "babbling" are handled by very efficient C code, which can even be built as its own standalone library. The babbler is a simple Markov chain. The package also offers a vignette with complete example 'workflows' and information about the utilities offered in the package.
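To make the definition above concrete, here is a minimal sketch of word-level n-gram extraction. It is written in Python purely for illustration and is not the ngram package's API or its C implementation; the function name `word_ngrams` and the whitespace-only tokenization are assumptions made for this example.

```python
from collections import Counter


def word_ngrams(text, n):
    """Return the word n-grams of `text` as tuples, in order of appearance.

    Assumption: "words" are whatever results from splitting on whitespace.
    A real tokenizer may treat punctuation and case differently.
    """
    words = text.split()
    # Slide a window of length n over the word list.
    return [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]


sample = "a b a b c"
bigrams = word_ngrams(sample, 2)

# Tally how often each distinct n-gram occurs.
counts = Counter(bigrams)
```

A text with fewer than n words yields an empty list, since no complete window fits.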

Maintained by Drew Schmidt. Last updated 1 year ago.

ngram text text-mining

71 stars 10.45 score 844 scripts 7 dependents

ropensci

textreuse: Detect Text Reuse and Document Similarity

Tools for measuring similarity among documents and detecting passages which have been reused. Implements shingled n-gram, skip n-gram, and other tokenizers; similarity/dissimilarity functions; pairwise comparisons; minhash and locality sensitive hashing algorithms; and a version of the Smith-Waterman local alignment algorithm suitable for natural language.

Maintained by Yaoxiang Li. Last updated 1 month ago.

peer-reviewed cpp

200 stars 9.28 score 226 scripts

schwartstack

handyplots: Handy Plots

Several handy plots for quickly looking at the relationship between two numeric vectors of equal length. Quickly visualize scatter plots, residual plots, qq-plots, box plots, confidence intervals, and prediction intervals.

Maintained by Jonathan Schwartz. Last updated 6 years ago.

1.00 score 6 scripts