Showing 4 of 4 results
bnosac
tokenizers.bpe: Byte Pair Encoding Text Tokenization
Unsupervised text tokenizer focused on computational efficiency. Wraps the 'YouTokenToMe' library <https://github.com/VKCOM/YouTokenToMe>, which is an implementation of fast Byte Pair Encoding (BPE) <https://aclanthology.org/P16-1162/>.
Maintained by Jan Wijffels. Last updated 2 years ago.
bpe · byte-pair-encoding · text-mining · tokenization · cpp
52.3 match · 15 stars · 4.56 score · 48 scripts
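A minimal sketch of training and applying a BPE model with this package, assuming the bpe() / bpe_encode() interface from its documentation; the corpus file path and vocabulary size are illustrative:

    library(tokenizers.bpe)

    ## Train a BPE model on a plain-text corpus file (path is illustrative)
    model <- bpe("corpus.txt", coverage = 0.999, vocab_size = 5000)

    ## Segment a sentence into the subword units the model learned
    bpe_encode(model, x = "Byte pair encoding splits rare words into subwords",
               type = "subwords")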
tidymodels
textrecipes: Extra 'Recipes' for Text Processing
Converting text to numerical features requires specifically created procedures, which are implemented as steps compatible with the 'recipes' package. These steps allow for tokenization, filtering, counting (tf and tf-idf), and feature hashing.
Maintained by Emil Hvitfeldt. Last updated 9 days ago.
1.9 match · 160 stars · 10.87 score · 964 scripts · 1 dependent
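A minimal sketch of how these steps compose into a preprocessing recipe, assuming a data frame my_data with a character column text (both illustrative); step_tokenize(), step_tokenfilter(), and step_tfidf() are steps the package documents:

    library(recipes)
    library(textrecipes)

    ## Tokenize the text column, keep the 100 most frequent tokens,
    ## and turn them into tf-idf features
    rec <- recipe(~ text, data = my_data) |>
      step_tokenize(text) |>
      step_tokenfilter(text, max_tokens = 100) |>
      step_tfidf(text)

    ## Estimate the steps, then apply them to the training data
    bake(prep(rec), new_data = NULL)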
davzim
rtiktoken: A Byte-Pair-Encoding (BPE) Tokenizer for OpenAI's Large Language Models
A thin wrapper around the tiktoken-rs crate for encoding text into Byte-Pair-Encoding (BPE) tokens and decoding tokens back into text. This is useful for understanding how Large Language Models (LLMs) perceive text.
Maintained by David Zimmermann-Kollenda. Last updated 4 months ago.
3.3 match · 11 stars · 4.22 score · 3 scripts
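A minimal sketch of round-tripping text through an OpenAI tokenizer, assuming the get_tokens() / decode_tokens() helpers as shown in the package README; the model name is illustrative:

    library(rtiktoken)

    ## Encode text into BPE token ids using the tokenizer for a given model
    tokens <- get_tokens("Large language models read tokens, not words",
                         "gpt-4o")

    ## Decode the token ids back into text
    decode_tokens(tokens, "gpt-4o")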
bnosac
sentencepiece: Text Tokenization using Byte Pair Encoding and Unigram Modelling
Unsupervised text tokenizer for byte pair encoding and unigram modelling. Wraps the 'sentencepiece' library <https://github.com/google/sentencepiece>, which provides a language-independent tokenizer to split text into words and smaller subword units. The techniques are explained in the paper "SentencePiece: A simple and language independent subword tokenizer and detokenizer for Neural Text Processing" by Taku Kudo and John Richardson (2018) <doi:10.18653/v1/D18-2012>. Also provides straightforward access to pretrained byte pair encoding models and subword embeddings trained on Wikipedia using 'word2vec', as described in "BPEmb: Tokenization-free Pre-trained Subword Embeddings in 275 Languages" by Benjamin Heinzerling and Michael Strube (2018) <http://www.lrec-conf.org/proceedings/lrec2018/pdf/1049.pdf>.
Maintained by Jan Wijffels. Last updated 2 years ago.
byte · natural-language-processing · sentencepiece · word-segmentation · cpp
2.6 match · 25 stars · 4.10 score · 8 scripts
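A minimal sketch of using one of the pretrained BPEmb models the description mentions, assuming the sentencepiece_download_model() / sentencepiece_load_model() / sentencepiece_encode() helpers from the package documentation; the language and vocabulary size are illustrative:

    library(sentencepiece)

    ## Fetch a pretrained BPEmb byte pair encoding model for English
    ## with a 5000-token vocabulary, then load it
    dl    <- sentencepiece_download_model("English", vocab_size = 5000)
    model <- sentencepiece_load_model(dl$file_model)

    ## Segment a sentence into subword units
    sentencepiece_encode(model, x = "Subword units cover rare words",
                         type = "subwords")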