Showing 4 of 4 results
doccstat
abseil: 'C++' Header Files from 'Abseil'
Wraps the 'Abseil' 'C++' library for use by R packages. Original files are from <https://github.com/abseil/abseil-cpp>. Patches are located at <https://github.com/doccstat/abseil-r/tree/main/local/patches>.
Maintained by Xingchi Li. Last updated 12 months ago.
64.0 match 2 stars 3.00 score
meztez
bigrquerystorage: An Interface to Google's 'BigQuery Storage' API
Easily talk to Google's 'BigQuery Storage' API from R (<https://cloud.google.com/bigquery/docs/reference/storage/rpc>).
Maintained by Bruno Tremblay. Last updated 2 months ago.
33.0 match 20 stars 5.52 score 8 scripts
bnosac
tokenizers.bpe: Byte Pair Encoding Text Tokenization
Unsupervised text tokenizer focused on computational efficiency. Wraps the 'YouTokenToMe' library <https://github.com/VKCOM/YouTokenToMe> which is an implementation of fast Byte Pair Encoding (BPE) <https://aclanthology.org/P16-1162/>.
Maintained by Jan Wijffels. Last updated 2 years ago.
bpe, byte-pair-encoding, text-mining, tokenization, cpp
1.5 match 15 stars 4.56 score 48 scripts
bnosac
sentencepiece: Text Tokenization using Byte Pair Encoding and Unigram Modelling
Unsupervised text tokenizer that performs byte pair encoding and unigram modelling. Wraps the 'sentencepiece' library <https://github.com/google/sentencepiece>, which provides a language-independent tokenizer to split text into words and smaller subword units. The techniques are explained in the paper "SentencePiece: A simple and language independent subword tokenizer and detokenizer for Neural Text Processing" by Taku Kudo and John Richardson (2018) <doi:10.18653/v1/D18-2012>. Also provides straightforward access to pretrained byte pair encoding models and subword embeddings trained on Wikipedia using 'word2vec', as described in "BPEmb: Tokenization-free Pre-trained Subword Embeddings in 275 Languages" by Benjamin Heinzerling and Michael Strube (2018) <http://www.lrec-conf.org/proceedings/lrec2018/pdf/1049.pdf>.
Maintained by Jan Wijffels. Last updated 2 years ago.
byte, natural-language-processing, sentencepiece, word-segmentation, cpp
1.5 match 25 stars 4.10 score 8 scripts