R-universe search: unicode

gagolews

stringi:Fast and Portable Character String Processing Facilities

A collection of character string/text/natural language processing tools for pattern searching (e.g., with 'Java'-like regular expressions or the 'Unicode' collation algorithm), random string generation, case mapping, string transliteration, concatenation, sorting, padding, wrapping, Unicode normalisation, date-time formatting and parsing, and many more. They are fast, consistent, convenient, and - thanks to 'ICU' (International Components for Unicode) - portable across all locales and platforms. Documentation about 'stringi' is provided via its website at <https://stringi.gagolewski.com/> and the paper by Gagolewski (2022, <doi:10.18637/jss.v103.i02>).

Maintained by Marek Gagolewski. Last updated 1 month ago.

icu icu4c natural-language-processing nlp regex regexp string-manipulation stringi stringr text text-processing tidy-data unicode cpp

17.1 match 309 stars 18.31 score 10k scripts 8.6k dependents

kurthornik

Unicode:Unicode Data and Utilities

Data from Unicode 15.1.0 and related utilities.

Maintained by Kurt Hornik. Last updated 1 year ago.

76.4 match 3.89 score 107 scripts 4 dependents

richierocks

rebus.unicode:Unicode Extensions for the 'rebus' Package

Build regular expressions piece by piece using human readable code. This package contains Unicode functionality, and is primarily intended to be used by package developers.

Maintained by Richard Cotton. Last updated 8 years ago.

41.2 match 3.78 score 4 scripts 4 dependents

patperry

utf8:Unicode Text Processing

Process and print 'UTF-8' encoded international text (Unicode). Input, validate, normalize, encode, format, and display.

Maintained by Kirill Müller. Last updated 3 months ago.

9.4 match 113 stars 16.48 score 295 scripts 11k dependents

rolkra

utf8ify:Format Text Using Unicode

Format text (bold, italic, ...) and numbers using UTF-8. Offers functions to search for emojis and include them in your text.

Maintained by Roland Krasser. Last updated 2 months ago.

21.2 match 2 stars 4.30 score

gagolews

stringx:Replacements for Base String Functions Powered by 'stringi'

English is the native language for only 5% of the World population. Also, only 17% of us can understand this text. Moreover, the Latin alphabet is the main one for merely 36% of the total. The early computer era, now a very long time ago, was dominated by the US. Due to the proliferation of the internet, smartphones, social media, and other technologies and communication platforms, this is no longer the case. This package replaces base R string functions (such as grep(), tolower(), sprintf(), and strptime()) with ones that fully support the Unicode standards related to natural language and date-time processing. It also fixes some long-standing inconsistencies, and introduces some new, useful features. Thanks to 'ICU' (International Components for Unicode) and 'stringi', they are fast, reliable, and portable across different platforms.

Maintained by Marek Gagolewski. Last updated 2 months ago.

icu icu4c natural-language-processing nlp regex regexp string-manipulation stringi text text-processing unicode

10.8 match 28 stars 4.75 score 1 script

r-lib

cli:Helpers for Developing Command Line Interfaces

A suite of tools to build attractive command line interfaces ('CLIs'), from semantic elements: headings, lists, alerts, paragraphs, etc. Supports custom themes via a 'CSS'-like language. It also contains a number of lower level 'CLI' elements: rules, boxes, trees, and 'Unicode' symbols with 'ASCII' alternatives. It supports ANSI colors and text styles as well.

Maintained by Gábor Csárdi. Last updated 17 hours ago.

cli

2.2 match 664 stars 19.33 score 1.4k scripts 14k dependents

rstudio

reticulate:Interface to 'Python'

Interface to 'Python' modules, classes, and functions. When calling into 'Python', R data types are automatically converted to their equivalent 'Python' types. When values are returned from 'Python' to R they are converted back to R types. Compatible with all versions of 'Python' >= 2.7.

Maintained by Tomasz Kalinowski. Last updated 1 day ago.

cpp

1.9 match 1.7k stars 21.07 score 18k scripts 427 dependents

tidyverse

tidyverse:Easily Install and Load the 'Tidyverse'

The 'tidyverse' is a set of packages that work in harmony because they share common data representations and 'API' design. This package is designed to make it easy to install and load multiple 'tidyverse' packages in a single step. Learn more about the 'tidyverse' at <https://www.tidyverse.org>.

Maintained by Hadley Wickham. Last updated 5 months ago.

data-science tidyverse

1.8 match 1.7k stars 20.26 score 664k scripts 125 dependents

pursuitofdatascience

tidyEmoji:Discovers Emoji from Text

Unicode code points are not friendly to work with, and not all of them are Emoji per se, which makes obtaining Emoji statistics a difficult task. This tool helps make working with Emoji as smooth as possible, as it follows the 'tidyverse' style.

Maintained by Youzhi Yu. Last updated 2 years ago.

8.1 match 2 stars 4.00 score 7 scripts

ropensci

skimr:Compact and Flexible Summaries of Data

A simple to use summary function that can be used with pipes and displays nicely in the console. The default summary statistics may be modified by the user as can the default formatting. Support for data frames and vectors is included, and users can implement their own skim methods for specific object types as described in a vignette. Default summaries include support for inline spark graphs. Instructions for managing these on specific operating systems are given in the "Using skimr" vignette and the README.

Maintained by Elin Waring. Last updated 2 months ago.

peer-reviewed ropensci summary-statistics unconf unconf17

1.9 match 1.1k stars 16.80 score 18k scripts 14 dependents

coolbutuseless

lofifonts:Text Rendering with Bitmap and Vector Fonts

Alternate font rendering is useful when rendering text to novel graphics outputs where modern font rendering is not available or where bespoke text positioning is required. Bitmap and vector fonts allow for custom layout and rendering using pixel coordinates and line drawing. Formatted text is created as a data.frame of pixel coordinates (for bitmap fonts) or stroke coordinates (for vector fonts). All text can be easily previewed as a matrix or raster image. A selection of fonts is included with this package.

Maintained by Mike Cheng. Last updated 23 days ago.

5.3 match 7 stars 5.94 score 10 scripts

melff

RKernel:Yet another R kernel for Jupyter

Provides a kernel for Jupyter.

Maintained by Martin Elff. Last updated 14 days ago.

jupyter jupyter-kernel jupyter-kernels jupyter-notebook

6.8 match 38 stars 4.60 score

r-lib

clisymbols:Unicode Symbols at the R Prompt

A small subset of Unicode symbols that are useful when building command line applications. They fall back to alternatives on terminals that do not support Unicode. Many symbols were taken from the 'figures' 'npm' package (see <https://github.com/sindresorhus/figures>).

Maintained by Gábor Csárdi. Last updated 4 months ago.

3.9 match 82 stars 7.61 score 32 scripts 19 dependents

trevorld

bittermelon:Bitmap Tools

Provides functions for creating, modifying, and displaying bitmaps including printing them in the terminal. There is a special emphasis on monochrome bitmap fonts and their glyphs as well as colored pixel art/sprites. Provides native read/write support for the 'hex' and 'yaff' bitmap font formats and if 'monobit' <https://github.com/robhagemans/monobit> is installed can also read/write several additional bitmap font formats.

Maintained by Trevor L. Davis. Last updated 2 months ago.

4.1 match 6 stars 6.26 score 2 dependents

gadenbuie

ermoji:RStudio Addin to Search and Copy Emoji

RStudio addin to search through emoji and copy the emoji name, Unicode string or glyph to the clipboard.

Maintained by Garrick Aden-Buie. Last updated 4 years ago.

emoji emoji-picker emoji-unicode rstudio rstudio-addin

8.0 match 26 stars 3.11 score 1 script

csids

csdata:Structural Data for Norway

Datasets relating to population in municipalities, municipality/county matching, and how different municipalities have merged/redistricted over time from 2006 to 2024.

Maintained by Richard Aubrey White. Last updated 6 months ago.

csverse

4.0 match 5.78 score 6 scripts 2 dependents

dernarr

ndl:Naive Discriminative Learning

Naive discriminative learning implements learning and classification models based on the Rescorla-Wagner equations and their equilibrium equations.

Maintained by Tino Sering. Last updated 7 years ago.

cpp

6.9 match 1 star 3.00 score 66 scripts

easystats

insight:Easy Access to Model Information for Various Model Objects

A tool to provide easy, intuitive and consistent access to information contained in various R models, like model formulas, model terms, information about random effects, data that was used to fit the model or data from response variables. 'insight' mainly revolves around two types of functions: Functions that find (the names of) information, starting with 'find_', and functions that get the underlying data, starting with 'get_'. The package has a consistent syntax and works with many different model objects, where otherwise functions to access this information are missing.

Maintained by Daniel Lüdecke. Last updated 4 days ago.

easystats hacktoberfest insight models names predictors random

1.2 match 412 stars 17.24 score 568 scripts 210 dependents

bioc

CellBench:Construct Benchmarks for Single Cell Analysis Methods

This package contains infrastructure for benchmarking analysis methods and access to single cell mixture benchmarking data. It provides a framework for organising analysis methods and testing combinations of methods in a pipeline without explicitly laying out each combination. It also provides utilities for sampling and filtering SingleCellExperiment objects, constructing lists of functions with varying parameters, and multithreaded evaluation of analysis methods.

Maintained by Shian Su. Last updated 5 months ago.

software infrastructure singlecell benchmark bioinformatics

2.0 match 30 stars 8.71 score 98 scripts

trinker

lexicon:Lexicons for Text Analysis

A collection of lexical hash tables, dictionaries, and word lists.

Maintained by Tyler Rinker. Last updated 3 years ago.

hash lexicon lookup names-frequent stopwords text-dictionaries text-mining

1.5 match 111 stars 8.80 score 224 scripts 25 dependents

kwb-r

kwb.utils:General Utility Functions Developed at KWB

This package contains some small helper functions that aim at improving the quality of code developed at Kompetenzzentrum Wasser gGmbH (KWB).

Maintained by Hauke Sonnenberg. Last updated 12 months ago.

1.8 match 8 stars 7.33 score 12 scripts 78 dependents

rich-iannone

i18n:Internationalization Data from the 'Unicode CLDR' in Tabular Form

Up-to-date data from the 'Unicode CLDR Project' (where 'CLDR' stands for 'Common Locale Data Repository') are available here as a series of easy-to-parse datasets. Several functions are provided for extracting key elements from the tabular datasets.

Maintained by Richard Iannone. Last updated 9 months ago.

3.4 match 10 stars 3.70 score 9 scripts

dbosak01

reporter:Creates Statistical Reports

Contains functions to create regulatory-style statistical reports. Originally designed to create tables, listings, and figures for the pharmaceutical, biotechnology, and medical device industries, these reports are generalized enough that they could be used in any industry. Generates text, rich-text, PDF, HTML, and Microsoft Word file formats. The package specializes in printing wide and long tables with automatic page wrapping and splitting. Reports can be produced with a minimum of function calls, and without relying on other table packages. The package supports titles, footnotes, page header, page footers, spanning headers, page by variables, and automatic page numbering.

Maintained by David Bosak. Last updated 12 months ago.

report reporting reports rptr

1.3 match 16 stars 9.35 score 173 scripts 4 dependents

hneth

ds4psy:Data Science for Psychologists

All datasets and functions required for the examples and exercises of the book "Data Science for Psychologists" (by Hansjoerg Neth, Konstanz University, 2023), freely available at <https://bookdown.org/hneth/ds4psy/>. The book and course introduce principles and methods of data science to students of psychology and other biological or social sciences. The 'ds4psy' package primarily provides datasets, but also functions for data generation and manipulation (e.g., of text and time data) and graphics that are used in the book and its exercises. All functions included in 'ds4psy' are designed to be explicit and instructive, rather than efficient or elegant.

Maintained by Hansjoerg Neth. Last updated 1 month ago.

data-literacy data-science education exploratory-data-analysis psychology social-sciences visualisation

1.7 match 22 stars 6.79 score 70 scripts

cmann3

cursr:Cursor and Terminal Manipulation

A toolbox for developing applications, games, simulations, or agent-based models in the R terminal. Included functions allow users to move the cursor around the terminal screen, change text colors and attributes, clear the screen, hide and show the cursor, map key presses to functions, draw shapes and curves, among others. Most functionalities require users to be in a terminal (not the R GUI).

Maintained by Chris Mann. Last updated 4 years ago.

4.5 match 2.30 score 2 scripts

rpuggaardrode

praatpicture:'Praat Picture' Style Plots of Acoustic Data

Quickly and easily generate plots of acoustic data aligned with transcriptions similar to those made in 'Praat' using either derived signals generated directly in R with 'wrassp' or imported derived signals from 'Praat'. Provides easy and fast out-of-the-box solutions but also a high extent of flexibility. Also provides options for embedding audio in figures and animating figures.

Maintained by Rasmus Puggaard-Rode. Last updated 20 days ago.

1.8 match 29 stars 5.28 score 3 scripts

dschuhmacher

kanjistat:A Statistical Framework for the Analysis of Japanese Kanji Characters

Various tools and data sets that support the study of kanji, including their morphology, decomposition and concepts of distance and similarity between them.

Maintained by Dominic Schuhmacher. Last updated 9 months ago.

cpp

1.9 match 4 stars 4.90 score 6 scripts

robsteranium

csvwr:Read and Write CSV on the Web (CSVW) Tables and Metadata

Provide functions for reading and writing CSVW - i.e. CSV tables and JSON metadata. The metadata helps interpret CSV by setting the types and variable names.

Maintained by Robin Gower. Last updated 1 year ago.

csvw

1.7 match 15 stars 4.88 score 10 scripts

rossellhayes

and:Construct Natural-Language Lists with Internationalization

Construct language-aware lists. Make "and"-separated and "or"-separated lists that automatically conform to the user's language settings.

Maintained by Alexander Rossell Hayes. Last updated 18 days ago.

i18n internationalization translation

1.6 match 20 stars 5.01 score 6 scripts 2 dependents

rmi-pacta

pacta.loanbook:Easily Install and Load PACTA for Banks Packages

PACTA (Paris Agreement Capital Transition Assessment) for Banks is a tool that allows banks to calculate the climate alignment of their corporate lending portfolios. This package is designed to make it easy to install and load multiple PACTA for Banks packages in a single step. It also provides thorough documentation - the PACTA for Banks cookbook at <https://rmi-pacta.github.io/pacta.loanbook/articles/cookbook_overview.html> - on how to run a PACTA for Banks analysis. This covers prerequisites for the analysis, the separate steps of running the analysis, the interpretation of PACTA for Banks results, and advanced use cases.

Maintained by Jacob Kastl. Last updated 2 days ago.

1.7 match 1 star 4.68 score 12 scripts

matanhakim

rtlr:Print Right-to-Left Languages Correctly

Convenience functions to make some common tasks with right-to-left string printing easier, more convenient and with no need to remember long Unicode characters. Specifically helpful for right-to-left languages such as Arabic, Persian and Hebrew.

Maintained by Matan Hakim. Last updated 2 years ago.

2.3 match 5 stars 3.40 score 5 scripts

kurthornik

tau:Text Analysis Utilities

Utilities for text analysis.

Maintained by Kurt Hornik. Last updated 5 months ago.

1.9 match 4.02 score 115 scripts 6 dependents

matutosi

moranajp:Morphological Analysis for Japanese

Supports morphological analysis for Japanese by using 'MeCab' <https://taku910.github.io/mecab/>, 'Sudachi' <https://github.com/WorksApplications/Sudachi>, 'Chamame' <https://chamame.ninjal.ac.jp/>, or 'Ginza' <https://github.com/megagonlabs/ginza>. Can input a data.frame and obtain all results of 'MeCab' and the row number of the original data.frame as a text id.

Maintained by Toshikazu Matsumura. Last updated 8 months ago.

1.8 match 4.13 score 17 scripts

rmi-pacta

pactaverse:Easily Install and Load the 'PACTA-verse'

The 'pactaverse' is a set of packages that work to help R users implement various functionality related to the PACTA open source project.

Maintained by Jackson Hoffart. Last updated 3 months ago.

1.8 match 6 stars 3.73 score 3 scripts

macmillancontentscience

piecemaker:Tools for Preparing Text for Tokenizers

Tokenizers break text into pieces that are more usable by machine learning models. Many tokenizers share some preparation steps. This package provides those shared steps, along with a simple tokenizer.

Maintained by Jon Harmon. Last updated 2 years ago.

1.9 match 3.48 score 6 scripts 2 dependents

nielsenrich

arabicStemR:Arabic Stemmer for Text Analysis

Allows users to stem Arabic texts for text analysis.

Maintained by Rich Nielsen. Last updated 3 years ago.

3.5 match 2 stars 1.82 score 33 scripts

r-lib

brio:Basic R Input Output

Functions to handle basic input and output. These functions always read and write UTF-8 (8-bit Unicode Transformation Format) files and provide more explicit control over line endings.

Maintained by Gábor Csárdi. Last updated 7 months ago.

0.5 match 56 stars 12.00 score 39 scripts 526 dependents

cran

PersianStemmer:Persian Stemmer for Text Analysis

Allows users to stem Persian texts for text analysis.

Maintained by Roozbeh Safshekan. Last updated 6 years ago.

3.5 match 1.78 score 2 dependents

cran

NISTunits:Fundamental Physical Constants and Unit Conversions from NIST

Fundamental physical constants (Quantity, Value, Uncertainty, Unit) for SI (International System of Units) and non-SI units, plus unit conversions. Based on the data from NIST (National Institute of Standards and Technology, USA).

Maintained by Jose Gama. Last updated 9 years ago.

1.8 match 2.85 score 10 dependents

alistaire47

passport:Travel Smoothly Between Country Name and Code Formats

Smooths the process of working with country names and codes via powerful parsing, standardization, and conversion utilities arranged in a simple, consistent API. Country name formats come from multiple sources, including the common-sense standardized names in hundreds of languages from the Unicode Common Locale Data Repository (CLDR, <http://cldr.unicode.org/>).

Maintained by Edward Visel. Last updated 4 years ago.

country-codes country-data country-names

0.8 match 35 stars 6.17 score 28 scripts 1 dependent

trevorld

hexfont:'GNU Unifont' Hex Fonts

Contains most of the hex font files from the 'GNU Unifont Project' <https://unifoundry.com/unifont/> compressed by 'xz'. 'GNU Unifont' is a duospaced bitmap font that attempts to cover all the official Unicode glyphs plus several of the artificial scripts in the '(Under-)ConScript Unicode Registry' <https://www.kreativekorp.com/ucsur/>. Provides a convenience function for loading in several of them at the same time as a 'bittermelon' bitmap font object for easy rendering of the glyphs in an 'R' terminal or graphics device.

Maintained by Trevor L. Davis. Last updated 4 days ago.

0.8 match 12 stars 5.08 score

benjaminwolfe

signs:Insert Proper Minus Signs

Provides convenience functions to replace hyphen-minuses (ASCII 45) with proper minus signs (Unicode character 2212). The true minus matches the plus symbol in width, line thickness, and height above the baseline. It was designed for mathematics, looks better in presentation, and is understood properly by screen readers.

Maintained by Benjamin E. Wolfe. Last updated 5 years ago.

0.5 match 18 stars 6.36 score 28 scripts 3 dependents

ecodynizw

vietnameseConverter:Convert Vietnamese Encodings

Conversion of characters from unsupported Vietnamese character encodings to Unicode characters. These Vietnamese encodings (TCVN3, VISCII, VPS) are not natively supported in R and lead to printing of wrong characters and garbled text (mojibake). This package fixes that problem and provides readable output with the correct Unicode characters (with or without diacritics).

Maintained by Juergen Niedballa. Last updated 3 years ago.

0.8 match 2 stars 4.00 score 4 scripts

cran

WhatsR:Parsing, Anonymizing and Visualizing Exported 'WhatsApp' Chat Logs

Imports 'WhatsApp' chat logs and parses them into a usable dataframe object. The parser works on chats exported from Android or iOS phones and on Linux, macOS and Windows. The parser has multiple options for extracting smileys and emojis from the messages, extracting URLs and domains from the messages, extracting names and types of sent media files from the messages, extracting timestamps from messages, extracting and anonymizing author names from messages. Can be used to create anonymized versions of data.

Maintained by Julian Kohne. Last updated 1 year ago.

openjdk

1.7 match 1.70 score 3 scripts

paithiov909

audubon:Japanese Text Processing Tools

A collection of Japanese text processing tools for filling Japanese iteration marks, Japanese character type conversions, segmentation by phrase, and text normalization which is based on rules for the 'Sudachi' morphological analyzer and the 'NEologd' (Neologism dictionary for 'MeCab'). These features are specific to Japanese and are not implemented in 'ICU' (International Components for Unicode).

Maintained by Akiru Kato. Last updated 21 days ago.

japanese javascript

0.5 match 10 stars 5.61 score 3 scripts 1 dependent

mdlincoln

salty:Turn Clean Data into Messy Data

Take real or simulated data and salt it with errors commonly found in the wild, such as pseudo-OCR errors, Unicode problems, numeric fields with nonsensical punctuation, bad dates, etc.

Maintained by Matthew Lincoln. Last updated 7 months ago.

0.5 match 64 stars 4.81 score 20 scripts

chambm

AhoCorasickTrie:Fast Searching for Multiple Keywords in Multiple Texts

Aho-Corasick is an optimal algorithm for finding many keywords in a text. It can locate all matches in a text in O(N+M) time; i.e., the time needed scales linearly with the number of keywords (N) and the size of the text (M). Compare this to the naive approach which takes O(N*M) time to loop through each pattern and scan for it in the text. This implementation builds the trie (the generic name of the data structure) and runs the search in a single function call. If you want to search multiple texts with the same trie, the function will take a list or vector of texts and return a list of matches to each text. By default, all 128 ASCII characters are allowed in both the keywords and the text. A more efficient trie is possible if the alphabet size can be reduced. For example, DNA sequences use at most 19 distinct characters and usually only 4; protein sequences use at most 26 distinct characters and usually only 20. UTF-8 (Unicode) matching is not currently supported.

Maintained by Matt Chambers. Last updated 1 month ago.

cpp

0.5 match 10 stars 4.65 score 15 scripts 2 dependents

kurthornik

RKEAjars:R/KEA Interface Jars

External jars required for package RKEA.

Maintained by Kurt Hornik. Last updated 5 years ago.

openjdk

1.5 match 1.48 score 1 script 1 dependent

chris31415926535

tardis:Text Analysis with Rules and Dictionaries for Inferring Sentiment

Measure text's sentiment with dictionaries and simple rules covering negations and modifiers. User-supplied dictionaries are supported, including Unicode emojis and multi-word tokens, so this package can also be used to study constructs beyond sentiment.

Maintained by Christopher Belanger. Last updated 2 years ago.

nlp sentiment-analysis cpp

0.5 match 2 stars 4.00 score 10 scripts

kevin444

greekLetters:Routines for Writing Greek Letters and Mathematical Symbols on the 'RStudio' and 'RGui'

An implementation of functions to display Greek letters in 'RStudio' (including subscript and superscript indexes) and 'RGui' (without subscripts and only with superscripts 1, 2 or 3, because 'RGui' doesn't support printing the corresponding Unicode characters as a string: all subscripts ranging from 0 to 9 and superscripts equal to 0, 4, 5, 6, 7, 8 or 9). The functions in this package do not work properly in the R console. Characters are used via Unicode and encoded as UTF-8 to ensure that they can be viewed on all operating systems. Other characters related to mathematics are included, such as the infinity symbol. All of this is accessible through very simple commands. The package can be used for teaching purposes: the statistical notation for hypothesis testing can be written with it, so it is possible to build a course with the 'swirlify' package. Another use of this package is to create new summary functions that contain the functional form of the model adjusted with the Greek letters, thus making the transition from statistical theory to practice easier. In addition, it is a natural extension of the 'clisymbols' package.

Maintained by Kevin Allan Sales Rodrigues. Last updated 5 months ago.

0.8 match 2.18 score 51 scripts 1 dependent