Showing 51 of 51 results
gagolews
stringi:Fast and Portable Character String Processing Facilities
A collection of character string/text/natural language processing tools for pattern searching (e.g., with 'Java'-like regular expressions or the 'Unicode' collation algorithm), random string generation, case mapping, string transliteration, concatenation, sorting, padding, wrapping, Unicode normalisation, date-time formatting and parsing, and many more. They are fast, consistent, convenient, and - thanks to 'ICU' (International Components for Unicode) - portable across all locales and platforms. Documentation about 'stringi' is provided via its website at <https://stringi.gagolewski.com/> and the paper by Gagolewski (2022, <doi:10.18637/jss.v103.i02>).
Maintained by Marek Gagolewski. Last updated 1 month ago.
icu icu4c natural-language-processing nlp regex regexp string-manipulation stringi stringr text text-processing tidy-data unicode cpp
17.1 match 309 stars 18.31 score 10k scripts 8.6k dependents
kurthornik
Unicode:Unicode Data and Utilities
Data from Unicode 15.1.0 and related utilities.
Maintained by Kurt Hornik. Last updated 1 year ago.
76.4 match 3.89 score 107 scripts 4 dependents
richierocks
rebus.unicode:Unicode Extensions for the 'rebus' Package
Build regular expressions piece by piece using human readable code. This package contains Unicode functionality, and is primarily intended to be used by package developers.
Maintained by Richard Cotton. Last updated 8 years ago.
41.2 match 3.78 score 4 scripts 4 dependents
patperry
utf8:Unicode Text Processing
Process and print 'UTF-8' encoded international text (Unicode). Input, validate, normalize, encode, format, and display.
Maintained by Kirill Müller. Last updated 3 months ago.
9.4 match 113 stars 16.48 score 295 scripts 11k dependents
rolkra
utf8ify:Format Text Using Unicode
Format text (bold, italic, ...) and numbers using UTF-8. Offers functions to search for emojis and include them in your text.
Maintained by Roland Krasser. Last updated 2 months ago.
21.2 match 2 stars 4.30 score
gagolews
stringx:Replacements for Base String Functions Powered by 'stringi'
English is the native language for only 5% of the World population. Also, only 17% of us can understand this text. Moreover, the Latin alphabet is the main one for merely 36% of the total. The early computer era, now a very long time ago, was dominated by the US. Due to the proliferation of the internet, smartphones, social media, and other technologies and communication platforms, this is no longer the case. This package replaces base R string functions (such as grep(), tolower(), sprintf(), and strptime()) with ones that fully support the Unicode standards related to natural language and date-time processing. It also fixes some long-standing inconsistencies, and introduces some new, useful features. Thanks to 'ICU' (International Components for Unicode) and 'stringi', they are fast, reliable, and portable across different platforms.
Maintained by Marek Gagolewski. Last updated 2 months ago.
icu icu4c natural-language-processing nlp regex regexp string-manipulation stringi text text-processing unicode
10.8 match 28 stars 4.75 score 1 script
r-lib
cli:Helpers for Developing Command Line Interfaces
A suite of tools to build attractive command line interfaces ('CLIs'), from semantic elements: headings, lists, alerts, paragraphs, etc. Supports custom themes via a 'CSS'-like language. It also contains a number of lower level 'CLI' elements: rules, boxes, trees, and 'Unicode' symbols with 'ASCII' alternatives. It supports ANSI colors and text styles as well.
Maintained by Gábor Csárdi. Last updated 17 hours ago.
2.2 match 664 stars 19.33 score 1.4k scripts 14k dependents
rstudio
reticulate:Interface to 'Python'
Interface to 'Python' modules, classes, and functions. When calling into 'Python', R data types are automatically converted to their equivalent 'Python' types. When values are returned from 'Python' to R they are converted back to R types. Compatible with all versions of 'Python' >= 2.7.
Maintained by Tomasz Kalinowski. Last updated 1 day ago.
1.9 match 1.7k stars 21.07 score 18k scripts 427 dependents
tidyverse
tidyverse:Easily Install and Load the 'Tidyverse'
The 'tidyverse' is a set of packages that work in harmony because they share common data representations and 'API' design. This package is designed to make it easy to install and load multiple 'tidyverse' packages in a single step. Learn more about the 'tidyverse' at <https://www.tidyverse.org>.
Maintained by Hadley Wickham. Last updated 5 months ago.
1.8 match 1.7k stars 20.26 score 664k scripts 125 dependents
pursuitofdatascience
tidyEmoji:Discovers Emoji from Text
Unicode code points are not friendly to work with, and not all of them are Emoji, which makes obtaining Emoji statistics a difficult task. This tool can help make your experience of working with Emoji as smooth as possible, as it follows the 'tidyverse' style.
Maintained by Youzhi Yu. Last updated 2 years ago.
8.1 match 2 stars 4.00 score 7 scripts
ropensci
skimr:Compact and Flexible Summaries of Data
A simple to use summary function that can be used with pipes and displays nicely in the console. The default summary statistics may be modified by the user as can the default formatting. Support for data frames and vectors is included, and users can implement their own skim methods for specific object types as described in a vignette. Default summaries include support for inline spark graphs. Instructions for managing these on specific operating systems are given in the "Using skimr" vignette and the README.
Maintained by Elin Waring. Last updated 2 months ago.
peer-reviewed ropensci summary-statistics unconf unconf17
1.9 match 1.1k stars 16.80 score 18k scripts 14 dependents
coolbutuseless
lofifonts:Text Rendering with Bitmap and Vector Fonts
Alternate font rendering is useful when rendering text to novel graphics outputs where modern font rendering is not available or where bespoke text positioning is required. Bitmap and vector fonts allow for custom layout and rendering using pixel coordinates and line drawing. Formatted text is created as a data.frame of pixel coordinates (for bitmap fonts) or stroke coordinates (for vector fonts). All text can be easily previewed as a matrix or raster image. A selection of fonts is included with this package.
Maintained by Mike Cheng. Last updated 23 days ago.
5.3 match 7 stars 5.94 score 10 scripts
melff
RKernel:Yet another R kernel for Jupyter
Provides a kernel for Jupyter.
Maintained by Martin Elff. Last updated 14 days ago.
jupyter jupyter-kernel jupyter-kernels jupyter-notebook
6.8 match 38 stars 4.60 score
r-lib
clisymbols:Unicode Symbols at the R Prompt
A small subset of Unicode symbols that are useful when building command line applications. They fall back to alternatives on terminals that do not support Unicode. Many symbols were taken from the 'figures' 'npm' package (see <https://github.com/sindresorhus/figures>).
Maintained by Gábor Csárdi. Last updated 4 months ago.
3.9 match 82 stars 7.61 score 32 scripts 19 dependents
trevorld
bittermelon:Bitmap Tools
Provides functions for creating, modifying, and displaying bitmaps including printing them in the terminal. There is a special emphasis on monochrome bitmap fonts and their glyphs as well as colored pixel art/sprites. Provides native read/write support for the 'hex' and 'yaff' bitmap font formats and if 'monobit' <https://github.com/robhagemans/monobit> is installed can also read/write several additional bitmap font formats.
Maintained by Trevor L. Davis. Last updated 2 months ago.
4.1 match 6 stars 6.26 score 2 dependents
gadenbuie
ermoji:RStudio Addin to Search and Copy Emoji
RStudio addin to search through emoji and copy the emoji name, unicode string or glyph to the clipboard.
Maintained by Garrick Aden-Buie. Last updated 4 years ago.
emoji emoji-picker emoji-unicode rstudio rstudio-addin
8.0 match 26 stars 3.11 score 1 script
csids
csdata:Structural Data for Norway
Datasets relating to population in municipalities, municipality/county matching, and how different municipalities have merged/redistricted over time from 2006 to 2024.
Maintained by Richard Aubrey White. Last updated 6 months ago.
4.0 match 5.78 score 6 scripts 2 dependents
dernarr
ndl:Naive Discriminative Learning
Naive discriminative learning implements learning and classification models based on the Rescorla-Wagner equations and their equilibrium equations.
Maintained by Tino Sering. Last updated 7 years ago.
6.9 match 1 star 3.00 score 66 scripts
easystats
insight:Easy Access to Model Information for Various Model Objects
A tool to provide easy, intuitive and consistent access to information contained in various R models, like model formulas, model terms, information about random effects, data that was used to fit the model or data from response variables. 'insight' mainly revolves around two types of functions: Functions that find (the names of) information, starting with 'find_', and functions that get the underlying data, starting with 'get_'. The package has a consistent syntax and works with many different model objects, where otherwise functions to access this information are missing.
Maintained by Daniel Lüdecke. Last updated 4 days ago.
easystats hacktoberfest insight models names predictors random
1.2 match 412 stars 17.24 score 568 scripts 210 dependents
bioc
CellBench:Construct Benchmarks for Single Cell Analysis Methods
This package contains infrastructure for benchmarking analysis methods and access to single cell mixture benchmarking data. It provides a framework for organising analysis methods and testing combinations of methods in a pipeline without explicitly laying out each combination. It also provides utilities for sampling and filtering SingleCellExperiment objects, constructing lists of functions with varying parameters, and multithreaded evaluation of analysis methods.
Maintained by Shian Su. Last updated 5 months ago.
software infrastructure singlecell benchmark bioinformatics
2.0 match 30 stars 8.71 score 98 scripts
trinker
lexicon:Lexicons for Text Analysis
A collection of lexical hash tables, dictionaries, and word lists.
Maintained by Tyler Rinker. Last updated 3 years ago.
hash lexicon lookup names-frequent stopwords text-dictionaries text-mining
1.5 match 111 stars 8.80 score 224 scripts 25 dependents
kwb-r
kwb.utils:General Utility Functions Developed at KWB
This package contains some small helper functions that aim at improving the quality of code developed at Kompetenzzentrum Wasser gGmbH (KWB).
Maintained by Hauke Sonnenberg. Last updated 12 months ago.
1.8 match 8 stars 7.33 score 12 scripts 78 dependents
rich-iannone
i18n:Internationalization Data from the 'Unicode CLDR' in Tabular Form
Up-to-date data from the 'Unicode CLDR Project' (where 'CLDR' stands for 'Common Locale Data Repository') are available here as a series of easy-to-parse datasets. Several functions are provided for extracting key elements from the tabular datasets.
Maintained by Richard Iannone. Last updated 9 months ago.
3.4 match 10 stars 3.70 score 9 scripts
dbosak01
reporter:Creates Statistical Reports
Contains functions to create regulatory-style statistical reports. Originally designed to create tables, listings, and figures for the pharmaceutical, biotechnology, and medical device industries, these reports are generalized enough that they could be used in any industry. Generates text, rich-text, PDF, HTML, and Microsoft Word file formats. The package specializes in printing wide and long tables with automatic page wrapping and splitting. Reports can be produced with a minimum of function calls, and without relying on other table packages. The package supports titles, footnotes, page header, page footers, spanning headers, page by variables, and automatic page numbering.
Maintained by David Bosak. Last updated 12 months ago.
1.3 match 16 stars 9.35 score 173 scripts 4 dependents
hneth
ds4psy:Data Science for Psychologists
All datasets and functions required for the examples and exercises of the book "Data Science for Psychologists" (by Hansjoerg Neth, Konstanz University, 2023), freely available at <https://bookdown.org/hneth/ds4psy/>. The book and course introduce principles and methods of data science to students of psychology and other biological or social sciences. The 'ds4psy' package primarily provides datasets, but also functions for data generation and manipulation (e.g., of text and time data) and graphics that are used in the book and its exercises. All functions included in 'ds4psy' are designed to be explicit and instructive, rather than efficient or elegant.
Maintained by Hansjoerg Neth. Last updated 1 month ago.
data-literacy data-science education exploratory-data-analysis psychology social-sciences visualisation
1.7 match 22 stars 6.79 score 70 scripts
cmann3
cursr:Cursor and Terminal Manipulation
A toolbox for developing applications, games, simulations, or agent-based models in the R terminal. Included functions allow users to move the cursor around the terminal screen, change text colors and attributes, clear the screen, hide and show the cursor, map key presses to functions, draw shapes and curves, among others. Most functionalities require users to be in a terminal (not the R GUI).
Maintained by Chris Mann. Last updated 4 years ago.
4.5 match 2.30 score 2 scripts
rpuggaardrode
praatpicture:'Praat Picture' Style Plots of Acoustic Data
Quickly and easily generate plots of acoustic data aligned with transcriptions similar to those made in 'Praat' using either derived signals generated directly in R with 'wrassp' or imported derived signals from 'Praat'. Provides easy and fast out-of-the-box solutions but also a high extent of flexibility. Also provides options for embedding audio in figures and animating figures.
Maintained by Rasmus Puggaard-Rode. Last updated 20 days ago.
1.8 match 29 stars 5.28 score 3 scripts
dschuhmacher
kanjistat:A Statistical Framework for the Analysis of Japanese Kanji Characters
Various tools and data sets that support the study of kanji, including their morphology, decomposition and concepts of distance and similarity between them.
Maintained by Dominic Schuhmacher. Last updated 9 months ago.
1.9 match 4 stars 4.90 score 6 scripts
robsteranium
csvwr:Read and Write CSV on the Web (CSVW) Tables and Metadata
Provide functions for reading and writing CSVW - i.e. CSV tables and JSON metadata. The metadata helps interpret CSV by setting the types and variable names.
Maintained by Robin Gower. Last updated 1 year ago.
1.7 match 15 stars 4.88 score 10 scripts
rossellhayes
and:Construct Natural-Language Lists with Internationalization
Construct language-aware lists. Make "and"-separated and "or"-separated lists that automatically conform to the user's language settings.
Maintained by Alexander Rossell Hayes. Last updated 18 days ago.
i18n internationalization translation
1.6 match 20 stars 5.01 score 6 scripts 2 dependents
rmi-pacta
pacta.loanbook:Easily Install and Load PACTA for Banks Packages
PACTA (Paris Agreement Capital Transition Assessment) for Banks is a tool that allows banks to calculate the climate alignment of their corporate lending portfolios. This package is designed to make it easy to install and load multiple PACTA for Banks packages in a single step. It also provides thorough documentation - the PACTA for Banks cookbook at <https://rmi-pacta.github.io/pacta.loanbook/articles/cookbook_overview.html> - on how to run a PACTA for Banks analysis. This covers prerequisites for the analysis, the separate steps of running the analysis, the interpretation of PACTA for Banks results, and advanced use cases.
Maintained by Jacob Kastl. Last updated 2 days ago.
1.7 match 1 star 4.68 score 12 scripts
matanhakim
rtlr:Print Right-to-Left Languages Correctly
Convenience functions to make some common tasks with right-to-left string printing easier, more convenient and with no need to remember long Unicode characters. Specifically helpful for right-to-left languages such as Arabic, Persian and Hebrew.
Maintained by Matan Hakim. Last updated 2 years ago.
2.3 match 5 stars 3.40 score 5 scripts
kurthornik
tau:Text Analysis Utilities
Utilities for text analysis.
Maintained by Kurt Hornik. Last updated 5 months ago.
1.9 match 4.02 score 115 scripts 6 dependents
matutosi
moranajp:Morphological Analysis for Japanese
Supports morphological analysis for Japanese by using 'MeCab' <https://taku910.github.io/mecab/>, 'Sudachi' <https://github.com/WorksApplications/Sudachi>, 'Chamame' <https://chamame.ninjal.ac.jp/>, or 'Ginza' <https://github.com/megagonlabs/ginza>. Can input a data.frame and obtain all results of 'MeCab' and the row number of the original data.frame as a text id.
Maintained by Toshikazu Matsumura. Last updated 8 months ago.
1.8 match 4.13 score 17 scripts
rmi-pacta
pactaverse:Easily Install and Load the 'PACTA-verse'
The 'pactaverse' is a set of packages that work to help R users implement various functionality related to the PACTA open source project.
Maintained by Jackson Hoffart. Last updated 3 months ago.
1.8 match 6 stars 3.73 score 3 scripts
macmillancontentscience
piecemaker:Tools for Preparing Text for Tokenizers
Tokenizers break text into pieces that are more usable by machine learning models. Many tokenizers share some preparation steps. This package provides those shared steps, along with a simple tokenizer.
Maintained by Jon Harmon. Last updated 2 years ago.
1.9 match 3.48 score 6 scripts 2 dependents
nielsenrich
arabicStemR:Arabic Stemmer for Text Analysis
Allows users to stem Arabic texts for text analysis.
Maintained by Rich Nielsen. Last updated 3 years ago.
3.5 match 2 stars 1.82 score 33 scripts
r-lib
brio:Basic R Input Output
Functions to handle basic input output, these functions always read and write UTF-8 (8-bit Unicode Transformation Format) files and provide more explicit control over line endings.
Maintained by Gábor Csárdi. Last updated 7 months ago.
0.5 match 56 stars 12.00 score 39 scripts 526 dependents
cran
PersianStemmer:Persian Stemmer for Text Analysis
Allows users to stem Persian texts for text analysis.
Maintained by Roozbeh Safshekan. Last updated 6 years ago.
3.5 match 1.78 score 2 dependents
cran
NISTunits:Fundamental Physical Constants and Unit Conversions from NIST
Fundamental physical constants (Quantity, Value, Uncertainty, Unit) for SI (International System of Units) and non-SI units, plus unit conversions. Based on data from NIST (National Institute of Standards and Technology, USA).
Maintained by Jose Gama. Last updated 9 years ago.
1.8 match 2.85 score 10 dependents
alistaire47
passport:Travel Smoothly Between Country Name and Code Formats
Smooths the process of working with country names and codes via powerful parsing, standardization, and conversion utilities arranged in a simple, consistent API. Country name formats include multiple sources including the Unicode Common Locale Data Repository (CLDR, <http://cldr.unicode.org/>) common-sense standardized names in hundreds of languages.
Maintained by Edward Visel. Last updated 4 years ago.
country-codes country-data country-names
0.8 match 35 stars 6.17 score 28 scripts 1 dependent
trevorld
hexfont:'GNU Unifont' Hex Fonts
Contains most of the hex font files from the 'GNU Unifont Project' <https://unifoundry.com/unifont/> compressed by 'xz'. 'GNU Unifont' is a duospaced bitmap font that attempts to cover all the official Unicode glyphs plus several of the artificial scripts in the '(Under-)ConScript Unicode Registry' <https://www.kreativekorp.com/ucsur/>. Provides a convenience function for loading in several of them at the same time as a 'bittermelon' bitmap font object for easy rendering of the glyphs in an 'R' terminal or graphics device.
Maintained by Trevor L. Davis. Last updated 4 days ago.
0.8 match 12 stars 5.08 score
benjaminwolfe
signs:Insert Proper Minus Signs
Provides convenience functions to replace hyphen-minuses (ASCII 45) with proper minus signs (Unicode character 2212). The true minus matches the plus symbol in width, line thickness, and height above the baseline. It was designed for mathematics, looks better in presentation, and is understood properly by screen readers.
Maintained by Benjamin E. Wolfe. Last updated 5 years ago.
0.5 match 18 stars 6.36 score 28 scripts 3 dependents
ecodynizw
vietnameseConverter:Convert Vietnamese Encodings
Conversion of characters from unsupported Vietnamese character encodings to Unicode characters. These Vietnamese encodings (TCVN3, VISCII, VPS) are not natively supported in R and lead to printing of wrong characters and garbled text (mojibake). This package fixes that problem and provides readable output with the correct Unicode characters (with or without diacritics).
Maintained by Juergen Niedballa. Last updated 3 years ago.
0.8 match 2 stars 4.00 score 4 scripts
cran
WhatsR:Parsing, Anonymizing and Visualizing Exported 'WhatsApp' Chat Logs
Imports 'WhatsApp' chat logs and parses them into a usable dataframe object. The parser works on chats exported from Android or iOS phones and on Linux, macOS and Windows. The parser has multiple options for extracting smileys and emojis from the messages, extracting URLs and domains from the messages, extracting names and types of sent media files from the messages, extracting timestamps from messages, extracting and anonymizing author names from messages. Can be used to create anonymized versions of data.
Maintained by Julian Kohne. Last updated 1 year ago.
1.7 match 1.70 score 3 scripts
paithiov909
audubon:Japanese Text Processing Tools
A collection of Japanese text processing tools for filling Japanese iteration marks, Japanese character type conversions, segmentation by phrase, and text normalization which is based on rules for the 'Sudachi' morphological analyzer and the 'NEologd' (Neologism dictionary for 'MeCab'). These features are specific to Japanese and are not implemented in 'ICU' (International Components for Unicode).
Maintained by Akiru Kato. Last updated 21 days ago.
0.5 match 10 stars 5.61 score 3 scripts 1 dependent
mdlincoln
salty:Turn Clean Data into Messy Data
Take real or simulated data and salt it with errors commonly found in the wild, such as pseudo-OCR errors, Unicode problems, numeric fields with nonsensical punctuation, bad dates, etc.
Maintained by Matthew Lincoln. Last updated 7 months ago.
0.5 match 64 stars 4.81 score 20 scripts
kurthornik
RKEAjars:R/KEA Interface Jars
External jars required for package RKEA.
Maintained by Kurt Hornik. Last updated 5 years ago.
1.5 match 1.48 score 1 script 1 dependent
chris31415926535
tardis:Text Analysis with Rules and Dictionaries for Inferring Sentiment
Measure text's sentiment with dictionaries and simple rules covering negations and modifiers. User-supplied dictionaries are supported, including Unicode emojis and multi-word tokens, so this package can also be used to study constructs beyond sentiment.
Maintained by Christopher Belanger. Last updated 2 years ago.
0.5 match 2 stars 4.00 score 10 scripts