Showing 15 of 15 results
apache
arrow:Integration to 'Apache' 'Arrow'
'Apache' 'Arrow' <https://arrow.apache.org/> is a cross-language development platform for in-memory data. It specifies a standardized language-independent columnar memory format for flat and hierarchical data, organized for efficient analytic operations on modern hardware. This package provides an interface to the 'Arrow C++' library.
Maintained by Jonathan Keane. Last updated 2 months ago.
15k stars 19.25 score 10k scripts 82 dependents
quanteda
quanteda:Quantitative Analysis of Textual Data
A fast, flexible, and comprehensive framework for quantitative text analysis in R. Provides functionality for corpus management, creating and manipulating tokens and n-grams, exploring keywords in context, forming and manipulating sparse matrices of documents by features and feature co-occurrences, analyzing keywords, computing feature similarities and distances, applying content dictionaries, applying supervised and unsupervised machine learning, visually representing text and text analyses, and more.
Maintained by Kenneth Benoit. Last updated 3 months ago.
corpus natural-language-processing quanteda text-analytics onetbb cpp
851 stars 16.65 score 5.4k scripts 52 dependents
ropensci
hunspell:High-Performance Stemmer, Tokenizer, and Spell Checker
Low-level spell checker and morphological analyzer based on the famous 'hunspell' library <https://hunspell.github.io>. The package can analyze or check individual words as well as parse text, LaTeX, HTML or XML documents. For a more user-friendly interface, use the 'spelling' package, which builds on this package to automate checking of files, documentation and vignettes in all common formats.
Maintained by Jeroen Ooms. Last updated 7 days ago.
hunspell spell-check spellchecker stemmer tokenizer cpp
112 stars 13.23 score 422 scripts 30 dependents
cjvanlissa
tidySEM:Tidy Structural Equation Modeling
A tidy workflow for generating, estimating, reporting, and plotting structural equation models using 'lavaan', 'OpenMx', or 'Mplus'. Throughout this workflow, elements of syntax, results, and graphs are represented as 'tidy' data, making them easy to customize. Includes functionality to estimate latent class analyses, and to plot 'dagitty' and 'igraph' objects.
Maintained by Caspar J. van Lissa. Last updated 21 days ago.
58 stars 10.69 score 330 scripts 1 dependent
tidymodels
embed:Extra Recipes for Encoding Predictors
Predictors can be converted to one or more numeric representations using a variety of methods. Effect encodings using simple generalized linear models <doi:10.48550/arXiv.1611.09477> or nonlinear models <doi:10.48550/arXiv.1604.06737> can be used. There are also functions for dimension reduction and other approaches.
Maintained by Emil Hvitfeldt. Last updated 2 months ago.
142 stars 9.35 score 1.1k scripts
dbosak01
libr:Libraries, Data Dictionaries, and a Data Step for R
Contains a set of functions to create data libraries, generate data dictionaries, and simulate a data step. The libname() function will load a directory of data into a library in one line of code. The dictionary() function will generate data dictionaries for individual data frames or an entire library. And the datastep() function will perform row-by-row data processing.
Maintained by David Bosak. Last updated 3 months ago.
27 stars 8.27 score 48 scripts 2 dependents
rjdverse
rjd3toolkit:Utility Functions around 'JDemetra+ 3.0'
R interface to the 'JDemetra+ 3.x' (<https://github.com/jdemetra>) time series analysis software. It provides functions to model time series (creating outlier regressors, user-defined calendar regressors, UCARIMA models...), to test for the presence of trading-day or seasonal effects, and to set specifications in pre-adjustment and benchmarking when using rjd3x13 or rjd3tramoseats.
Maintained by Tanguy Barthelemy. Last updated 5 months ago.
java jdemetra seasonal-adjustment time-series timeseries openjdk
6 stars 5.74 score 48 scripts 16 dependents
vgherard
kgrams:Classical k-gram Language Models
Training and evaluating k-gram language models in R, supporting several probability smoothing techniques, perplexity computations, random text generation and more.
Maintained by Valerio Gherardi. Last updated 5 months ago.
language-models n-grams natural-language-processing cpp
7 stars 5.17 score 14 scripts 1 dependent
nalimilan
R.temis:Integrated Text Mining Solution
An integrated solution to perform a series of text mining tasks, such as importing and cleaning a corpus, and analyses like term and document counts, lexical summaries, term co-occurrences and document similarity measures, graphs of terms, correspondence analysis and hierarchical clustering. Corpora can be imported from spreadsheet-like files, directories of raw text files, as well as from 'Dow Jones Factiva', 'LexisNexis', 'Europresse' and 'Alceste' files.
Maintained by Milan Bouchet-Valat. Last updated 1 day ago.
28 stars 5.00 score 24 scripts
vgherard
sbo:Text Prediction via Stupid Back-Off N-Gram Models
Utilities for training and evaluating text predictors based on Stupid Back-Off N-gram models (Brants et al., 2007, <https://www.aclweb.org/anthology/D07-1090/>).
Maintained by Valerio Gherardi. Last updated 4 years ago.
natural-language-processing ngram-models predictive-text sbo cpp
10 stars 4.78 score 12 scripts
melff
RKernel:Yet another R kernel for Jupyter
Provides a kernel for Jupyter.
Maintained by Martin Elff. Last updated 10 days ago.
jupyter jupyter-kernel jupyter-kernels jupyter-notebook
39 stars 4.63 score
bioc
XNAString:Efficient Manipulation of Modified Oligonucleotide Sequences
The XNAString package allows for description of base sequences and associated chemical modifications in a single object. XNAString is able to capture single-stranded as well as double-stranded molecules. Chemical modifications are represented as independent strings associated with different features of the molecules (base sequence, sugar sequence, backbone sequence, modifications) and can be read from or written to HELM notation. It also enables secondary structure prediction using RNAfold from ViennaRNA. XNAString is designed to be an efficient representation of nucleic-acid-based therapeutics; it therefore stores information about target sequences and provides an interface for matching and alignment functions from the Biostrings and pwalign packages.
Maintained by Marianna Plucinska. Last updated 5 months ago.
sequencematching alignment sequencing genetics cpp
4.18 score 4 scripts
haghish
HMDA:Holistic Multimodel Domain Analysis for Exploratory Machine Learning
Holistic Multimodel Domain Analysis (HMDA) is a robust and transparent framework designed for exploratory machine learning research, aiming to enhance the process of feature assessment and selection. HMDA addresses key limitations of traditional machine learning methods by evaluating the consistency across multiple high-performing models within a fine-tuned modeling grid, thereby improving the interpretability and reliability of feature importance assessments. Specifically, it computes Weighted Mean SHapley Additive exPlanations (WMSHAP), which aggregate feature contributions from multiple models based on weighted performance metrics. HMDA also provides confidence intervals to demonstrate the stability of these feature importance estimates. This framework is particularly beneficial for analyzing complex, multidimensional datasets common in health research, supporting reliable exploration of mental health outcomes such as suicidal ideation, suicide attempts, and other psychological conditions. Additionally, HMDA includes automated procedures for feature selection based on WMSHAP ratios and performs dimension reduction analyses to identify underlying structures among features. For more details see Haghish (2025) <doi:10.13140/RG.2.2.32473.63846>.
Maintained by E. F. Haghish. Last updated 16 hours ago.
ensemble-feature-importance explainable-ai explainable-artificial-intelligence explainable-machine-learning explainable-ml exploratory-machine-learning exploratory-modelling feature-importance feature-selection-methods holistic-modeling holistic-multimodel-domain-analysis multimodel-ensemble reproducible-ai reproducible-research robust-feature-selection shapley-additive-explanations shapley-values transparent-ai weighted-mean-shap wmshap
1 star 3.48 score
brendensm
misuvi:Access the Michigan Substance Use Vulnerability Index (MI-SUVI)
Easily import the MI-SUVI data sets. The user can import data sets with full metrics, percentiles, Z-scores, or rankings. Data is available at both the County and Zip Code Tabulation Area (ZCTA) levels. This package also includes a function to import shape files for easy mapping and a function to access the full technical documentation. All data is sourced from the Michigan Department of Health and Human Services.
Maintained by Brenden Smith. Last updated 2 months ago.
3.40 score
christopherkenny
acronames:Create Acronyms for Naming Things
Simple tool for developing names based on first letters of keywords.
Maintained by Christopher T. Kenny. Last updated 3 years ago.
1 star 1.70 score 1 script