R-universe search: needs:MatrixExtra

Currently serving 26341 packages, 22657 articles, and 64224 datasets by 1265 organizations, 13662 maintainers and 22192 contributors.

Showing 28 of 28 results

dselivanov

text2vec: Modern Text Mining Framework for R

Fast and memory-friendly tools for text vectorization, topic modeling (LDA, LSA), word embeddings (GloVe), similarities. This package provides a source-agnostic streaming API, which allows researchers to perform analysis of collections of documents which are larger than available RAM. All core functions are parallelized to benefit from multicore machines.

Maintained by Dmitriy Selivanov. Last updated 8 months ago.

glove latent-dirichlet-allocation natural-language-processing text-mining topic-modeling vectorization word-embeddings word2vec cpp

860 stars 13.48 score 1.3k scripts 23 dependents

oscarkjell

text: Analyses of Text using Transformers Models from HuggingFace, Natural Language Processing and Machine Learning

Link R with Transformers from Hugging Face to transform text variables to word embeddings, where the word embeddings are used to statistically test the mean difference between sets of texts, compute semantic similarity scores between texts, predict numerical variables, and visualize statistically significant words according to various dimensions, etc. For more information see <https://www.r-text.org>.

Maintained by Oscar Kjell. Last updated 7 days ago.

deep-learning machine-learning nlp transformers openjdk

145 stars 13.21 score 436 scripts 1 dependent

tommyjones

textmineR: Functions for Text Mining and Topic Modeling

An aid for text mining in R, with a syntax that should be familiar to experienced R users. Provides a wrapper for several topic models that take similarly-formatted input and give similarly-formatted output. Provides additional functionality for analyzing and diagnosing topic models.

Maintained by Tommy Jones. Last updated 2 years ago.

cpp

106 stars 10.83 score 310 scripts 7 dependents

matloff

regtools: Regression and Classification Tools

Tools for linear, nonlinear and nonparametric regression and classification. Novel graphical methods for assessment of parametric models using nonparametric methods. One vs. All and All vs. All multiclass classification, optional class probabilities adjustment. Nonparametric regression (k-NN) for general dimension, local-linear option. Nonlinear regression with Eicker-White method for dealing with heteroscedasticity. Utilities for converting time series to rectangular form. Utilities for conversion between factors and indicator variables. Some code related to "Statistical Regression and Classification: from Linear Models to Machine Learning", N. Matloff, 2017, CRC, ISBN 9781498710916.

Maintained by Norm Matloff. Last updated 2 months ago.

127 stars 9.39 score 48 scripts 3 dependents

prodriguezsosa

conText: 'a la Carte' on Text (ConText) Embedding Regression

A fast, flexible and transparent framework to estimate context-specific word and short document embeddings using the 'a la carte' embeddings approach developed by Khodak et al. (2018) <arXiv:1805.05388> and evaluate hypotheses about covariate effects on embeddings using the regression framework developed by Rodriguez et al. (2021) <https://github.com/prodriguezsosa/EmbeddingRegression>.

Maintained by Pedro L. Rodriguez. Last updated 11 months ago.

104 stars 9.10 score 1.7k scripts

theharmonylab

topics: Creating and Significance Testing Language Features for Visualisation

Implements differential language analysis with statistical tests and offers various language visualization techniques for n-grams and topics. It also supports the 'text' package. For more information, visit <https://r-topics.org/> and <https://www.r-text.org/>.

Maintained by Oscar Kjell. Last updated 3 days ago.

openjdk

5 stars 8.38 score 22 scripts 2 dependents

matloff

qeML: Quick and Easy Machine Learning Tools

The letters 'qe' in the package title stand for "quick and easy," alluding to the convenience goal of the package. We bring together a variety of machine learning (ML) tools from standard R packages, providing wrappers with a simple, convenient, and uniform interface.

Maintained by Norm Matloff. Last updated 9 days ago.

41 stars 8.37 score 48 scripts 1 dependent

matloff

dsld: Data Science Looks at Discrimination

Statistical and graphical tools for detecting and measuring discrimination and bias, be it racial, gender, age or other. Detection and remediation of bias in machine learning algorithms. 'Python' interfaces available.

Maintained by Norm Matloff. Last updated 2 months ago.

12 stars 7.81 score 35 scripts

mhahsler

markovDP: Infrastructure for Discrete-Time Markov Decision Processes (MDP)

Provides the infrastructure to work with Markov Decision Processes (MDPs) in R. The focus is on convenience in formulating MDPs, the support of sparse representations (using sparse matrices, lists and data.frames) and visualization of results. Some key components are implemented in C++ to speed up computation. Several popular solvers are implemented.

Maintained by Michael Hahsler. Last updated 16 days ago.

control-theory markov-decision-process optimization cpp

7 stars 5.51 score 4 scripts

david-cortes

recometrics: Evaluation Metrics for Implicit-Feedback Recommender Systems

Calculates evaluation metrics for implicit-feedback recommender systems that are based on low-rank matrix factorization models, given the fitted model matrices and data, thus allowing comparison of models from a variety of libraries. Metrics include P@K (precision-at-k, for top-K recommendations), R@K (recall at k), AP@K (average precision at k), NDCG@K (normalized discounted cumulative gain at k), Hit@K (from which the 'Hit Rate' is calculated), RR@K (reciprocal rank at k, from which the 'MRR' or 'mean reciprocal rank' is calculated), ROC-AUC (area under the receiver-operating characteristic curve), and PR-AUC (area under the precision-recall curve). These are calculated on a per-user basis according to the ranking of items induced by the model, using efficient multi-threaded routines. Also provides functions for creating train-test splits for model fitting and evaluation.

Maintained by David Cortes. Last updated 3 months ago.

implicit-feedback matrix-factorization recommender-systems openblas cpp openmp

28 stars 5.45 score

occupationmeasurement

occupationMeasurement: Interactively Measure Occupations in Interviews and Beyond

Perform interactive occupation coding during interviews as described in Peycheva, D., Sakshaug, J., Calderwood, L. (2021) <doi:10.2478/jos-2021-0042> and Schierholz, M., Gensicke, M., Tschersich, N., Kreuter, F. (2018) <doi:10.1111/rssa.12297>. Generate suggestions for occupational categories based on free text input, with pre-trained machine learning models in German and a ready-to-use shiny application provided for quick and easy data collection.

Maintained by Jan Simson. Last updated 8 months ago.

3 stars 5.18 score 17 scripts

bioc

ttgsea: Tokenizing Text of Gene Set Enrichment Analysis

Functional enrichment analysis methods such as gene set enrichment analysis (GSEA) have been widely used for analyzing gene expression data. GSEA is a powerful method to infer results of gene expression data at a level of gene sets by calculating enrichment scores for predefined sets of genes. GSEA depends on the availability and accuracy of gene sets. There are overlaps between terms of gene sets or categories because multiple terms may exist for a single biological process, and it can thus lead to redundancy within enriched terms. In other words, the sets of related terms are overlapping. Using deep learning, this package aims to predict enrichment scores for unique tokens or words from text in names of gene sets to resolve this overlapping set issue. Furthermore, we can coin a new term by combining tokens and find its enrichment score by predicting such combined tokens.

Maintained by Dongmin Jung. Last updated 5 months ago.

software geneexpression genesetenrichment

4.95 score 3 scripts 3 dependents

bioc

DeepPINCS: Protein Interactions and Networks with Compounds based on Sequences using Deep Learning

The identification of novel compound-protein interaction (CPI) is important in drug discovery. Revealing unknown compound-protein interactions is useful to design a new drug for a target protein by screening candidate compounds. Accurate CPI prediction assists in an effective drug discovery process. To identify potential CPI effectively, prediction methods based on machine learning and deep learning have been developed. Data for sequences are provided as discrete symbolic data. In the data, compounds are represented as SMILES (simplified molecular-input line-entry system) strings and proteins are sequences in which the characters are amino acids. The outcome is defined as a variable that indicates how strongly two molecules interact with each other or whether there is an interaction between them. In this package, a deep-learning based model that takes only sequence information of both compounds and proteins as input and the outcome as output is used to predict CPI. The model is implemented by using compound and protein encoders with useful features. The CPI model also supports other modeling tasks, including protein-protein interaction (PPI), chemical-chemical interaction (CCI), or single compounds and proteins. Although the model is designed for proteins, DNA and RNA can be used if they are represented as sequences.

Maintained by Dongmin Jung. Last updated 5 months ago.

software network graphandnetwork neuralnetwork openjdk

4.78 score 4 scripts 2 dependents

mkearney

wactor: Word Factor Vectors

A user-friendly factor-like interface for converting strings of text into numeric vectors and rectangular data structures.

Maintained by Michael W. Kearney. Last updated 5 years ago.

text text-classification text-processing text-vectorization word-embeddings word-vectors word2vec

33 stars 4.52 score 3 scripts

javierdelahoz

LDAShiny: User-Friendly Interface for Review of Scientific Literature

Contains the development of a tool that provides a web-based graphical user interface (GUI) to perform a review of the scientific literature under the Bayesian approach of Latent Dirichlet Allocation (LDA) and machine learning algorithms. The application methodology follows the well-known procedures in topic modelling for cleaning and processing data. Contains methods described by Blei, David M., Andrew Y. Ng, and Michael I. Jordan (2003) <https://jmlr.org/papers/volume3/blei03a/blei03a.pdf>; Thomas L. Griffiths and Mark Steyvers (2004) <doi:10.1073/pnas.0307752101>; Xiong Hui, et al (2019) <doi:10.1016/j.cie.2019.06.010>.

Maintained by Javier De La Hoz Maestre. Last updated 4 years ago.

3 stars 4.48 score 3 scripts

huongtran53

PlotNormTest: Graphical Univariate/Multivariate Assessments for Normality Assumption

Graphical methods for testing the multivariate normality assumption. Methods include assessing the score function and cumulant generating functions, independent transformations, and linear transformations.

Maintained by Huong Tran. Last updated 5 months ago.

4.30 score

robindenz1

CareDensity: Calculate the Care Density or Fragmented Care Density Given a Patient-Sharing Network

Given a patient-sharing network, calculate either the classic care density as proposed by Pollack et al. (2013) <doi:10.1007/s11606-012-2104-7> or the fragmented care density as proposed by Engels et al. (2024) <doi:10.1186/s12874-023-02106-0>. By utilizing the 'igraph' and 'data.table' packages, the provided functions scale well for very large graphs.

Maintained by Robin Denz. Last updated 5 months ago.

care-coordination network-analysis patient-care

1 star 4.18 score 6 scripts

bioc

IFAA: Robust Inference for Absolute Abundance in Microbiome Analysis

This package offers a robust approach to make inference on the association of covariates with the absolute abundance (AA) of microbiome in an ecosystem. It can also be applied directly to relative abundance (RA) data to make inference on AA because the ratio of two RA is equal to the ratio of their AA. This algorithm can estimate and test the associations of interest while adjusting for potential confounders. The estimates of this method have easy interpretation like a typical regression analysis. High-dimensional covariates are handled with regularization and it is implemented by parallel computing. False discovery rate is automatically controlled by this approach. Zeros do not need to be imputed by a positive value for the analysis. The IFAA package also offers the 'MZILN' function for estimating and testing associations of abundance ratios with covariates.

Maintained by Zhigang Li. Last updated 5 months ago.

software technology sequencing microbiome regression

4.15 score 14 scripts

psychbruce

PsychWordVec: Word Embedding Research Framework for Psychological Science

An integrative toolbox of word embedding research that provides: (1) a collection of 'pre-trained' static word vectors in the '.RData' compressed format <https://psychbruce.github.io/WordVector_RData.pdf>; (2) a series of functions to process, analyze, and visualize word vectors; (3) a range of tests to examine conceptual associations, including the Word Embedding Association Test <doi:10.1126/science.aal4230> and the Relative Norm Distance <doi:10.1073/pnas.1720347115>, with permutation test of significance; (4) a set of training methods to locally train (static) word vectors from text corpora, including 'Word2Vec' <arXiv:1301.3781>, 'GloVe' <doi:10.3115/v1/D14-1162>, and 'FastText' <arXiv:1607.04606>; (5) a group of functions to download 'pre-trained' language models (e.g., 'GPT', 'BERT') and extract contextualized (dynamic) word vectors (based on the R package 'text').

Maintained by Han-Wu-Shuang Bao. Last updated 1 year ago.

22 stars 4.04 score 10 scripts

bioc

VAExprs: Generating Samples of Gene Expression Data with Variational Autoencoders

A fundamental problem in biomedical research is the low number of observations, mostly due to a lack of available biosamples, prohibitive costs, or ethical reasons. By augmenting a few real observations with artificially generated samples, their analysis could lead to more robust and more reproducible results. One possible solution to the problem is the use of generative models, which are statistical models of data that attempt to capture the entire probability distribution from the observations. Using the variational autoencoder (VAE), a well-known deep generative model, this package aims to generate samples with gene expression data, especially for single-cell RNA-seq data. Furthermore, the VAE can use conditioning to produce specific cell types or subpopulations. The conditional VAE (CVAE) allows us to create targeted samples rather than completely random ones.

Maintained by Dongmin Jung. Last updated 5 months ago.

software geneexpression singlecell openjdk

4.00 score 4 scripts

bioc

GenProSeq: Generating Protein Sequences with Deep Generative Models

Generative modeling for protein engineering is key to solving fundamental problems in synthetic biology, medicine, and material science. Machine learning has enabled us to generate useful protein sequences on a variety of scales. Generative models are machine learning methods which seek to model the distribution underlying the data, allowing for the generation of novel samples with similar properties to those on which the model was trained. Generative models of proteins can learn biologically meaningful representations helpful for a variety of downstream tasks. Furthermore, they can learn to generate protein sequences that have not been observed before and to assign higher probability to protein sequences that satisfy desired criteria. In this package, common deep generative models for protein sequences, such as variational autoencoder (VAE), generative adversarial networks (GAN), and autoregressive models are available. In the VAE and GAN, the Word2vec is used for embedding. The transformer encoder is applied to protein sequences for the autoregressive model.

Maintained by Dongmin Jung. Last updated 5 months ago.

software proteomics openjdk

4.00 score 3 scripts

dustinstoltz

text2map: R Tools for Text Matrices, Embeddings, and Networks

This is a collection of functions optimized for working with various kinds of text matrices. Focusing on the text matrix as the primary object - represented either as a base R dense matrix or a 'Matrix' package sparse matrix - allows for a consistent and intuitive interface that stays close to the underlying mathematical foundation of computational text analysis. In particular, the package includes functions for working with word embeddings, text networks, and document-term matrices. Methods developed in Stoltz and Taylor (2019) <doi:10.1007/s42001-019-00048-6>, Taylor and Stoltz (2020) <doi:10.1007/s42001-020-00075-8>, Taylor and Stoltz (2020) <doi:10.15195/v7.a23>, and Stoltz and Taylor (2021) <doi:10.1016/j.poetic.2021.101567>.

Maintained by Dustin Stoltz. Last updated 4 months ago.

3.82 score 22 scripts

emilhvitfeldt

wordsalad: Provide Tools to Extract and Analyze Word Vectors

Provides access to various word embedding methods (GloVe, fasttext and word2vec) to extract word vectors using a unified framework to increase reproducibility and correctness.

Maintained by Emil Hvitfeldt. Last updated 5 years ago.

8 stars 3.60 score 9 scripts

pilacuan-bonete-luis

LDABiplots: Biplot Graphical Interface for LDA Models

Contains the development of a tool that provides a web-based graphical user interface (GUI) to perform Biplots representations from a scraping of news from digital newspapers under the Bayesian approach of Latent Dirichlet Allocation (LDA) and machine learning algorithms. Contains LDA methods described by Blei, David M., Andrew Y. Ng and Michael I. Jordan (2003) <https://jmlr.org/papers/volume3/blei03a/blei03a.pdf>, and Biplot methods described by Gabriel K.R. (1971) <doi:10.1093/biomet/58.3.453> and Galindo-Villardon P. (1986) <https://diarium.usal.es/pgalindo/files/2012/07/Questiio.pdf>.

Maintained by Luis Pilacuan-Bonete. Last updated 3 years ago.

3.00 score 4 scripts

theogrost

NUSS: Mixed N-Grams and Unigram Sequence Segmentation

Segmentation of short text sequences, such as hashtags, into sequences of separate words, using a dictionary that may be built on a custom corpus of texts. A unigram dictionary is used to find the most probable sequence, and an n-gram approach is used to determine possible segmentations given the text corpus.

Maintained by Oskar Kosch. Last updated 8 months ago.

cpp

3.00 score 8 scripts

cran

cdparcoord: Top Frequency-Based Parallel Coordinates

Parallel coordinate plotting with resolutions for large data sets and missing values.

Maintained by Norm Matloff. Last updated 6 years ago.

2.70 score

kidoishi

MadanText: Persian Textmining Tool for Frequency Analysis, Statistical Analysis, and Word Clouds

MadanText is open-source software designed specifically for text mining in the Persian language. It allows users to examine word frequencies, download data for analysis, and generate word clouds. This tool is particularly useful for researchers and analysts working with Persian language data.

Maintained by Kido Ishikawa. Last updated 1 year ago.

openjdk

2.70 score

kidoishi

MadanTextNetwork: Persian Textmining Tool for Co-Occurrence_Network

MadanText_co-occurrence_network is open-source software designed specifically for text mining in the Persian language. It adds co-occurrence network functionality to MadanText. The input file replaces the text format with an Excel format.

Maintained by Kido Ishikawa. Last updated 1 year ago.

openjdk

2.70 score