R-universe search: topic:cpp

tidyverse

dplyr:A Grammar of Data Manipulation

A fast, consistent tool for working with data frame-like objects, both in memory and out of memory.

Maintained by Hadley Wickham. Last updated 26 days ago.

data-manipulation grammar cpp

4.8k stars 24.68 score 659k scripts 7.8k dependents

tidyverse

tidyr:Tidy Messy Data

Tools to help to create tidy data, where each column is a variable, each row is an observation, and each cell contains a single value. 'tidyr' contains tools for changing the shape (pivoting) and hierarchy (nesting and 'unnesting') of a dataset, turning deeply nested lists into rectangular data frames ('rectangling'), and extracting values out of string columns. It also includes tools for working with missing values (both implicit and explicit).

Maintained by Hadley Wickham. Last updated 26 days ago.

tidy-data cpp

1.4k stars 22.88 score 168k scripts 5.5k dependents

rcppcore

Rcpp:Seamless R and C++ Integration

The 'Rcpp' package provides R functions as well as C++ classes which offer a seamless integration of R and C++. Many R data types and objects can be mapped back and forth to C++ equivalents which facilitates both writing of new code as well as easier integration of third-party libraries. Documentation about 'Rcpp' is provided by several vignettes included in this package, via the 'Rcpp Gallery' site at <https://gallery.rcpp.org>, the paper by Eddelbuettel and Francois (2011, <doi:10.18637/jss.v040.i08>), the book by Eddelbuettel (2013, <doi:10.1007/978-1-4614-6868-4>) and the paper by Eddelbuettel and Balamuta (2018, <doi:10.1080/00031305.2017.1375990>); see 'citation("Rcpp")' for details.

Maintained by Dirk Eddelbuettel. Last updated 11 hours ago.

c-plus-plus c-plus-plus-11 c-plus-plus-14 c-plus-plus-17 c-plus-plus-20 rcpp cpp

755 stars 22.63 score 11k scripts 13k dependents

r-spatial

sf:Simple Features for R

Support for simple feature access, a standardized way to encode and analyze spatial vector data. Binds to 'GDAL' <doi:10.5281/zenodo.5884351> for reading and writing data, to 'GEOS' <doi:10.5281/zenodo.11396894> for geometrical operations, and to 'PROJ' <doi:10.5281/zenodo.5884394> for projection conversions and datum transformations. Uses by default the 's2' package for geometry operations on geodetic (long/lat degree) coordinates.

Maintained by Edzer Pebesma. Last updated 3 days ago.

gdal geos proj spatial cpp

1.4k stars 22.44 score 117k scripts 1.2k dependents

igraph

igraph:Network Analysis and Visualization

Routines for simple graphs and network analysis. It can handle large graphs very well and provides functions for generating random and regular graphs, graph visualization, centrality methods and much more.

Maintained by Kirill Müller. Last updated 5 days ago.

complex-networks graph-algorithms graph-theory mathematics network-analysis network-graph fortran libxml2 glpk openblas cpp

584 stars 21.13 score 31k scripts 1.9k dependents

rstudio

reticulate:Interface to 'Python'

Interface to 'Python' modules, classes, and functions. When calling into 'Python', R data types are automatically converted to their equivalent 'Python' types. When values are returned from 'Python' to R they are converted back to R types. Compatible with all versions of 'Python' >= 2.7.

Maintained by Tomasz Kalinowski. Last updated 4 days ago.

cpp

1.7k stars 21.02 score 18k scripts 434 dependents

r-lib

testthat:Unit Testing for R

Software testing is important, but, in part because it is frustrating and boring, many of us avoid it. 'testthat' is a testing framework for R that is easy to learn and use, and integrates with your existing 'workflow'.

Maintained by Hadley Wickham. Last updated 29 days ago.

unit-testing cpp

900 stars 20.97 score 74k scripts 465 dependents

tidyverse

readxl:Read Excel Files

Import Excel files into R. Supports '.xls' via the embedded 'libxls' C library <https://github.com/libxls/libxls> and '.xlsx' via the embedded 'RapidXML' C++ library <https://rapidxml.sourceforge.net/>. Works on Windows, Mac and Linux without external dependencies.

Maintained by Jennifer Bryan. Last updated 22 days ago.

excel spreadsheet xls xlsx cpp

734 stars 20.85 score 160k scripts 815 dependents

lme4

lme4:Linear Mixed-Effects Models using 'Eigen' and S4

Fit linear and generalized linear mixed-effects models. The models and their components are represented using S4 classes and methods. The core computational algorithms are implemented using the 'Eigen' C++ library for numerical linear algebra and 'RcppEigen' "glue".

Maintained by Ben Bolker. Last updated 4 days ago.

cpp

647 stars 20.68 score 35k scripts 1.5k dependents

r-lib

fs:Cross-Platform File System Operations Based on 'libuv'

A cross-platform interface to file system operations, built on top of the 'libuv' C library.

Maintained by Gábor Csárdi. Last updated 5 months ago.

filesystem libuv cpp

370 stars 20.26 score 8.1k scripts 5.2k dependents

tidyverse

readr:Read Rectangular Text Data

The goal of 'readr' is to provide a fast and friendly way to read rectangular data (like 'csv', 'tsv', and 'fwf'). It is designed to flexibly parse many types of data found in the wild, while still cleanly failing when data unexpectedly changes.

Maintained by Jennifer Bryan. Last updated 8 months ago.

csv fwf parsing cpp

1.0k stars 20.06 score 132k scripts 2.1k dependents

apache

arrow:Integration to 'Apache' 'Arrow'

'Apache' 'Arrow' <https://arrow.apache.org/> is a cross-language development platform for in-memory data. It specifies a standardized language-independent columnar memory format for flat and hierarchical data, organized for efficient analytic operations on modern hardware. This package provides an interface to the 'Arrow C++' library.

Maintained by Jonathan Keane. Last updated 2 months ago.

arrow curl openssl cpp

15k stars 19.25 score 10k scripts 82 dependents

slowkow

ggrepel:Automatically Position Non-Overlapping Text Labels with 'ggplot2'

Provides text and label geoms for 'ggplot2' that help to avoid overlapping text labels. Labels repel away from each other and away from the data points.

Maintained by Kamil Slowikowski. Last updated 5 months ago.

ggplot2 text visualization cpp

1.2k stars 19.20 score 37k scripts 1.2k dependents

ycphs

openxlsx:Read, Write and Edit xlsx Files

Simplifies the creation of Excel .xlsx files by providing a high level interface to writing, styling and editing worksheets. Through the use of 'Rcpp', read/write times are comparable to the 'xlsx' and 'XLConnect' packages with the added benefit of removing the dependency on Java.

Maintained by Jan Marvin Garbuszus. Last updated 2 months ago.

xlsx cpp

232 stars 19.09 score 20k scripts 277 dependents

stan-dev

rstan:R Interface to Stan

User-facing R functions are provided to parse, compile, test, estimate, and analyze Stan models by accessing the header-only Stan library provided by the 'StanHeaders' package. The Stan project develops a probabilistic programming language that implements full Bayesian statistical inference via Markov Chain Monte Carlo, rough Bayesian inference via 'variational' approximation, and (optionally penalized) maximum likelihood estimation via optimization. In all three cases, automatic differentiation is used to quickly and accurately evaluate gradients without burdening the user with the need to derive the partial derivatives.

Maintained by Ben Goodrich. Last updated 12 hours ago.

bayesian-data-analysis bayesian-inference bayesian-statistics mcmc stan cpp

1.1k stars 18.86 score 14k scripts 281 dependents

rcppcore

RcppArmadillo:'Rcpp' Integration for the 'Armadillo' Templated Linear Algebra Library

'Armadillo' is a templated C++ linear algebra library (by Conrad Sanderson) that aims towards a good balance between speed and ease of use. Integer, floating point and complex numbers are supported, as well as a subset of trigonometric and statistics functions. Various matrix decompositions are provided through optional integration with LAPACK and ATLAS libraries. The 'RcppArmadillo' package includes the header files from the templated 'Armadillo' library. Thus users do not need to install 'Armadillo' itself in order to use 'RcppArmadillo'. From release 7.800.0 on, 'Armadillo' is licensed under Apache License 2; previous releases were licensed under MPL 2.0 from version 3.800.0 onwards and under LGPL-3 prior to that; 'RcppArmadillo' (the 'Rcpp' bindings/bridge to Armadillo) is licensed under the GNU GPL version 2 or later, as is the rest of 'Rcpp'.

Maintained by Dirk Eddelbuettel. Last updated 23 hours ago.

armadillo c-plus-plus rcpp rcpparmadillo openblas cpp openmp

200 stars 18.85 score 1.9k scripts 3.4k dependents

r-dbi

RSQLite:SQLite Interface for R

Embeds the SQLite database engine in R and provides an interface compliant with the DBI package. The source for the SQLite engine and for various extensions in a recent version is included. System libraries will never be consulted because this package relies on static linking for the plugins it includes; this also ensures a consistent experience across all installations.

Maintained by Kirill Müller. Last updated 1 hour ago.

database sqlite3 cpp

331 stars 18.78 score 8.1k scripts 1.1k dependents

tidyverse

haven:Import and Export 'SPSS', 'Stata' and 'SAS' Files

Import foreign statistical formats into R via the embedded 'ReadStat' C library, <https://github.com/WizardMac/ReadStat>.

Maintained by Hadley Wickham. Last updated 6 months ago.

sas spss stata zlib cpp

427 stars 18.63 score 18k scripts 682 dependents

r-lib

xml2:Parse XML

Bindings to 'libxml2' for working with XML data using a simple, consistent interface based on 'XPath' expressions. Also supports XML schema validation; for 'XSLT' transformations see the 'xslt' package.

Maintained by Jeroen Ooms. Last updated 16 days ago.

libxml2 xml cpp

220 stars 18.52 score 6.3k scripts 2.3k dependents

r-lib

roxygen2:In-Line Documentation for R

Generate your Rd documentation, 'NAMESPACE' file, and collation field using specially formatted comments. Writing documentation in-line with code makes it easier to keep your documentation up-to-date as your requirements change. 'roxygen2' is inspired by the 'Doxygen' system for C++.

Maintained by Hadley Wickham. Last updated 8 months ago.

devtools documentation cpp

606 stars 18.51 score 2.3k scripts 219 dependents

gagolews

stringi:Fast and Portable Character String Processing Facilities

A collection of character string/text/natural language processing tools for pattern searching (e.g., with 'Java'-like regular expressions or the 'Unicode' collation algorithm), random string generation, case mapping, string transliteration, concatenation, sorting, padding, wrapping, Unicode normalisation, date-time formatting and parsing, and many more. They are fast, consistent, convenient, and - thanks to 'ICU' (International Components for Unicode) - portable across all locales and platforms. Documentation about 'stringi' is provided via its website at <https://stringi.gagolewski.com/> and the paper by Gagolewski (2022, <doi:10.18637/jss.v103.i02>).

Maintained by Marek Gagolewski. Last updated 2 days ago.

icu icu4c natural-language-processing nlp regex regexp string-manipulation stringi stringr text text-processing tidy-data unicode cpp

307 stars 18.42 score 10k scripts 8.7k dependents

hadley

plyr:Tools for Splitting, Applying and Combining Data

A set of tools that solves a common set of problems: you need to break a big problem down into manageable pieces, operate on each piece and then put all the pieces back together. For example, you might want to fit a model to each spatial location or time point in your study, summarise data by panels or collapse high-dimensional arrays to simpler summary statistics. The development of 'plyr' has been generously supported by 'Becton Dickinson'.

Maintained by Hadley Wickham. Last updated 5 months ago.

cpp

500 stars 18.16 score 83k scripts 3.3k dependents

tidyverse

vroom:Read and Write Rectangular Text Data Quickly

The goal of 'vroom' is to read and write data (like 'csv', 'tsv' and 'fwf') quickly. When reading, it uses a quick initial indexing step, then reads the values lazily, so only the data you actually use needs to be read. The writer formats the data in parallel and writes to disk asynchronously from formatting.

Maintained by Jennifer Bryan. Last updated 7 months ago.

csv csv-parser fixed-width-text tsv tsv-parser cpp

625 stars 17.82 score 4.5k scripts 2.1k dependents

r-lib

cpp11:A C++11 Interface for R's C Interface

Provides a header-only, C++11 interface to R's C interface. Compared to other approaches, 'cpp11' strives to be safe against long jumps from the C API as well as C++ exceptions, to conform to normal R function semantics, and to support interaction with 'ALTREP' vectors.

Maintained by Davis Vaughan. Last updated 25 days ago.

cpp cpp11

212 stars 17.69 score 104 scripts 8.6k dependents

rspatial

terra:Spatial Data Analysis

Methods for spatial data analysis with vector (points, lines, polygons) and raster (grid) data. Methods for vector data include geometric operations such as intersect and buffer. Raster methods include local, focal, global, zonal and geometric operations. The predict and interpolate methods facilitate the use of regression type (interpolation, machine learning) models for spatial prediction, including with satellite remote sensing data. Processing of very large files is supported. See the manual and tutorials on <https://rspatial.org/> to get started. 'terra' replaces the 'raster' package ('terra' can do more, and it is faster and easier to use).

Maintained by Robert J. Hijmans. Last updated 7 hours ago.

geospatial raster spatial vector onetbb proj gdal geos cpp

560 stars 17.65 score 17k scripts 856 dependents

robjhyndman

forecast:Forecasting Functions for Time Series and Linear Models

Methods and tools for displaying and analysing univariate time series forecasts including exponential smoothing via state space models and automatic ARIMA modelling.

Maintained by Rob Hyndman. Last updated 7 months ago.

forecast forecasting openblas cpp

1.1k stars 17.46 score 16k scripts 240 dependents

dmlc

xgboost:Extreme Gradient Boosting

Extreme Gradient Boosting, which is an efficient implementation of the gradient boosting framework from Chen & Guestrin (2016) <doi:10.1145/2939672.2939785>. This package is its R interface. The package includes an efficient linear model solver and tree learning algorithms. The package can automatically do parallel computation on a single machine, which can be more than 10 times faster than existing gradient boosting packages. It supports various objective functions, including regression, classification and ranking. The package is made to be extensible, so that users can easily define their own objectives.

Maintained by Jiaming Yuan. Last updated 2 days ago.

distributed-systems gbdt gbm gbrt machine-learning xgboost cpp openmp

27k stars 17.45 score 115 dependents

dmurdoch

rgl:3D Visualization Using OpenGL

Provides medium to high level functions for 3D interactive graphics, including functions modelled on base graphics (plot3d(), etc.) as well as functions for constructing representations of geometric objects (cube3d(), etc.). Output may be on screen using OpenGL, or to various standard 3D file formats including WebGL, PLY, OBJ, STL as well as 2D image formats, including PNG, Postscript, SVG, PGF.

Maintained by Duncan Murdoch. Last updated 17 hours ago.

graphics opengl rgl webgl libglu libglvnd libpng libx11 freetype cpp

91 stars 17.40 score 7.3k scripts 303 dependents

ropensci

magick:Advanced Graphics and Image-Processing in R

Bindings to 'ImageMagick': the most comprehensive open-source image processing library available. Supports many common formats (png, jpeg, tiff, pdf, etc) and manipulations (rotate, scale, crop, trim, flip, blur, etc). All operations are vectorized via the Magick++ STL meaning they operate either on a single frame or a series of frames for working with layers, collages, or animation. In RStudio images are automatically previewed when printed to the console, resulting in an interactive editing environment. The latest version of the package includes a native graphics device for creating in-memory graphics or drawing onto images using pixel coordinates.

Maintained by Jeroen Ooms. Last updated 6 days ago.

image-manipulation image-processing imagemagick cpp

467 stars 17.38 score 9.0k scripts 258 dependents

bioc

BiocParallel:Bioconductor facilities for parallel evaluation

This package provides modified versions and novel implementations of functions for parallel evaluation, tailored for use with Bioconductor objects.

Maintained by Martin Morgan. Last updated 1 month ago.

infrastructure bioconductor-package core-package u24ca289073 cpp

67 stars 17.31 score 7.3k scripts 1.1k dependents

r-quantities

units:Measurement Units for R Vectors

Support for measurement units in R vectors, matrices and arrays: automatic propagation, conversion, derivation and simplification of units; raising errors in case of unit incompatibility. Compatible with the POSIXct, Date and difftime classes. Uses the UNIDATA udunits library and unit database for unit compatibility checking and conversion. Documentation about 'units' is provided in the paper by Pebesma, Mailund & Hiebert (2016, <doi:10.32614/RJ-2016-061>), included in this package as a vignette; see 'citation("units")' for details.

Maintained by Edzer Pebesma. Last updated 16 days ago.

udunits cpp

181 stars 17.28 score 3.3k scripts 1.2k dependents

emmanuelparadis

ape:Analyses of Phylogenetics and Evolution

Functions for reading, writing, plotting, and manipulating phylogenetic trees, analyses of comparative data in a phylogenetic framework, ancestral character analyses, analyses of diversification and macroevolution, computing distances from DNA sequences, reading and writing nucleotide sequences as well as importing from BioConductor, and several tools such as Mantel's test, generalized skyline plots, graphical exploration of phylogenetic data (alex, trex, kronoviz), estimation of absolute evolutionary rates and clock-like trees using mean path lengths and penalized likelihood, dating trees with non-contemporaneous sequences, translating DNA into AA sequences, and assessing sequence alignments. Phylogeny estimation can be done with the NJ, BIONJ, ME, MVR, SDM, and triangle methods, and several methods handling incomplete distance matrices (NJ*, BIONJ*, MVR*, and the corresponding triangle method). Some functions call external applications (PhyML, Clustal, T-Coffee, Muscle) whose results are returned into R.

Maintained by Emmanuel Paradis. Last updated 5 days ago.

openblas cpp

64 stars 17.27 score 13k scripts 599 dependents

rspatial

raster:Geographic Data Analysis and Modeling

Reading, writing, manipulating, analyzing and modeling of spatial data. This package has been superseded by the "terra" package <https://CRAN.R-project.org/package=terra>.

Maintained by Robert J. Hijmans. Last updated 1 day ago.

cpp

163 stars 17.23 score 58k scripts 562 dependents

hadley

reshape2:Flexibly Reshape Data: A Reboot of the Reshape Package

Flexibly restructure and aggregate data using just two functions: melt and 'dcast' (or 'acast').

Maintained by Hadley Wickham. Last updated 4 years ago.

cpp

210 stars 17.19 score 94k scripts 2.0k dependents

astamm

nloptr:R Interface to NLopt

Solve optimization problems using an R interface to NLopt. NLopt is a free/open-source library for nonlinear optimization, providing a common interface for a number of different free optimization routines available online as well as original implementations of various other algorithms. See <https://nlopt.readthedocs.io/en/latest/NLopt_Algorithms/> for more information on the available algorithms. Building from included sources requires 'CMake'. On Linux and 'macOS', if a suitable system build of NLopt (2.7.0 or later) is found, it is used; otherwise, it is built from included sources via 'CMake'. On Windows, NLopt is obtained through 'rwinlib' for 'R <= 4.1.x' or grabbed from the appropriate toolchain for 'R >= 4.2.0'.

Maintained by Aymeric Stamm. Last updated 12 days ago.

cpp

107 stars 17.17 score 1.1k scripts 1.8k dependents

rstudio

promises:Abstractions for Promise-Based Asynchronous Programming

Provides fundamental abstractions for doing asynchronous programming in R using promises. Asynchronous programming is useful for allowing a single R process to orchestrate multiple tasks in the background while also attending to something else. Semantics are similar to 'JavaScript' promises, but with a syntax that is idiomatic R.

Maintained by Joe Cheng. Last updated 2 months ago.

cpp

204 stars 17.10 score 688 scripts 2.6k dependents

thomasp85

ggraph:An Implementation of Grammar of Graphics for Graphs and Networks

The grammar of graphics as implemented in ggplot2 is a poor fit for graph and network visualizations due to its reliance on tabular data input. ggraph is an extension of the ggplot2 API tailored to graph visualizations and provides the same flexible approach to building up plots layer by layer.

Maintained by Thomas Lin Pedersen. Last updated 1 year ago.

ggplot-extension ggplot2 graph-visualization network-visualization visualization cpp

1.1k stars 16.96 score 9.2k scripts 111 dependents

satijalab

Seurat:Tools for Single Cell Genomics

A toolkit for quality control, analysis, and exploration of single cell RNA sequencing data. 'Seurat' aims to enable users to identify and interpret sources of heterogeneity from single cell transcriptomic measurements, and to integrate diverse types of single cell data. See Satija R, Farrell J, Gennert D, et al (2015) <doi:10.1038/nbt.3192>, Macosko E, Basu A, Satija R, et al (2015) <doi:10.1016/j.cell.2015.05.002>, Stuart T, Butler A, et al (2019) <doi:10.1016/j.cell.2019.05.031>, and Hao, Hao, et al (2020) <doi:10.1101/2020.10.12.335331> for more details.

Maintained by Paul Hoffman. Last updated 1 year ago.

human-cell-atlas single-cell-genomics single-cell-rna-seq cpp

2.4k stars 16.86 score 50k scripts 73 dependents

glmmtmb

glmmTMB:Generalized Linear Mixed Models using Template Model Builder

Fit linear and generalized linear mixed models with various extensions, including zero-inflation. The models are fitted using maximum likelihood estimation via 'TMB' (Template Model Builder). Random effects are assumed to be Gaussian on the scale of the linear predictor and are integrated out using the Laplace approximation. Gradients are calculated using automatic differentiation.

Maintained by Mollie Brooks. Last updated 9 hours ago.

cpp openmp

314 stars 16.85 score 3.7k scripts 25 dependents

andrisignorell

DescTools:Tools for Descriptive Statistics

A collection of miscellaneous basic statistical functions and convenience wrappers for efficiently describing data. The author's intention was to create a toolbox that facilitates the (notoriously time-consuming) first descriptive tasks in data analysis: calculating descriptive statistics, drawing graphical summaries and reporting the results. The package also contains functions to produce documents using MS Word (or PowerPoint) and functions to import data from Excel. Many of the included functions can be found scattered across other packages and other sources, written partly by Titans of R. The reason for collecting them here was primarily to consolidate them in ONE package instead of dozens (which themselves might depend on other packages that are not needed at all), and to provide a common and consistent interface as far as function and argument naming, NA handling, recycling rules, etc. are concerned. Google style guides were used as naming rules (in the absence of convincing alternatives). The 'BigCamelCase' style was consistently applied to functions borrowed from contributed R packages as well.

Maintained by Andri Signorell. Last updated 19 hours ago.

fortran cpp

86 stars 16.73 score 7.7k scripts 101 dependents

klausvigo

phangorn:Phylogenetic Reconstruction and Analysis

Allows for estimation of phylogenetic trees and networks using Maximum Likelihood, Maximum Parsimony, distance methods and Hadamard conjugation (Schliep 2011). Offers methods for tree comparison, model selection and visualization of phylogenetic networks as described in Schliep et al. (2017).

Maintained by Klaus Schliep. Last updated 20 hours ago.

software technology qualitycontrol phylogenetic-analysis phylogenetics openblas cpp

206 stars 16.70 score 2.5k scripts 135 dependents

sebkrantz

collapse:Advanced and Fast Data Transformation

A C/C++ based package for advanced data transformation and statistical computing in R that is extremely fast, class-agnostic, robust and programmer friendly. Core functionality includes a rich set of S3 generic grouped and weighted statistical functions for vectors, matrices and data frames, which provide efficient low-level vectorizations, OpenMP multithreading, and skip missing values by default. These are integrated with fast grouping and ordering algorithms (also callable from C), and efficient data manipulation functions. The package also provides a flexible and rigorous approach to time series and panel data in R. It further includes fast functions for common statistical procedures, detailed (grouped, weighted) summary statistics, powerful tools to work with nested data, fast data object conversions, functions for memory efficient R programming, and helpers to effectively deal with variable labels, attributes, and missing data. It is well integrated with base R classes, 'dplyr'/'tibble', 'data.table', 'sf', 'units', 'plm' (panel-series and data frames), and 'xts'/'zoo'.

Maintained by Sebastian Krantz. Last updated 7 days ago.

data-aggregation data-analysis data-manipulation data-processing data-science data-transformation econometrics high-performance panel-data scientific-computing statistics time-series weighted weights cpp openmp

672 stars 16.68 score 708 scripts 99 dependents

quanteda

quanteda:Quantitative Analysis of Textual Data

A fast, flexible, and comprehensive framework for quantitative text analysis in R. Provides functionality for corpus management, creating and manipulating tokens and n-grams, exploring keywords in context, forming and manipulating sparse matrices of documents by features and feature co-occurrences, analyzing keywords, computing feature similarities and distances, applying content dictionaries, applying supervised and unsupervised machine learning, visually representing text and text analyses, and more.

Maintained by Kenneth Benoit. Last updated 3 months ago.

corpus natural-language-processing quanteda text-analytics onetbb cpp

851 stars 16.65 score 5.4k scripts 52 dependents

amices

mice:Multivariate Imputation by Chained Equations

Multiple imputation using Fully Conditional Specification (FCS) implemented by the MICE algorithm as described in Van Buuren and Groothuis-Oudshoorn (2011) <doi:10.18637/jss.v045.i03>. Each variable has its own imputation model. Built-in imputation models are provided for continuous data (predictive mean matching, normal), binary data (logistic regression), unordered categorical data (polytomous logistic regression) and ordered categorical data (proportional odds). MICE can also impute continuous two-level data (normal model, pan, second-level variables). Passive imputation can be used to maintain consistency between variables. Various diagnostic plots are available to inspect the quality of the imputations.

Maintained by Stef van Buuren. Last updated 19 hours ago.

chained-equations fcs imputation mice missing-data missing-values multiple-imputation multivariate-data cpp

462 stars 16.64 score 10k scripts 154 dependents

rapporter

pander:An R 'Pandoc' Writer

Contains functions that capture all messages, 'stdout' and other useful information while evaluating R code, and other helpers that return user-specified text elements (such as headers, paragraphs, tables, images and lists) in 'pandoc' markdown, or automatically transform several types of R objects to markdown format. Also capable of exporting/converting (the resulting) complex 'pandoc' documents to e.g. HTML, 'PDF', 'docx' or 'odt'. This latter reporting feature is supported in brew syntax or with a custom reference class with a smarty caching 'backend'.

Maintained by Gergely Daróczi. Last updated 28 days ago.

literate-programming markdown pandoc pandoc-markdown reproducible-research rmarkdown cpp

297 stars 16.60 score 7.6k scripts 108 dependents

mlverse

torch:Tensors and Neural Networks with 'GPU' Acceleration

Provides functionality to define and train neural networks similar to 'PyTorch' by Paszke et al (2019) <doi:10.48550/arXiv.1912.01703> but written entirely in R using the 'libtorch' library. Also supports low-level tensor operations and 'GPU' acceleration.

Maintained by Daniel Falbel. Last updated 3 days ago.

autograd deep-learning torch cpp

521 stars 16.50 score 1.4k scripts 39 dependents

bioc

fgsea:Fast Gene Set Enrichment Analysis

The package implements an algorithm for fast gene set enrichment analysis. The fast algorithm makes it possible to run more permutations and obtain more fine-grained p-values, which in turn allows the use of accurate standard approaches to multiple hypothesis correction.

Maintained by Alexey Sergushichev. Last updated 9 days ago.

geneexpression differentialexpression genesetenrichment pathways cpp

392 stars 16.31 score 3.9k scripts 101 dependents

r-dbi

odbc:Connect to ODBC Compatible Databases (using the DBI Interface)

A DBI-compatible interface to ODBC databases.

Maintained by Hadley Wickham. Last updated 2 days ago.

database odbc unixodbc cpp

396 stars 16.31 score 2.9k scripts 23 dependents

imbs-hl

ranger:A Fast Implementation of Random Forests

A fast implementation of Random Forests, particularly suited for high dimensional data. Ensembles of classification, regression, survival and probability prediction trees are supported. Data from genome-wide association studies can be analyzed efficiently. In addition to data frames, datasets of class 'gwaa.data' (R package 'GenABEL') and 'dgCMatrix' (R package 'Matrix') can be directly analyzed.

Maintained by Marvin N. Wright. Last updated 5 months ago.

cpp

783 stars 16.22 score 9.2k scripts 189 dependents

bioc

DESeq2:Differential gene expression analysis based on the negative binomial distribution

Estimate variance-mean dependence in count data from high-throughput sequencing assays and test for differential expression based on a model using the negative binomial distribution.

Maintained by Michael Love. Last updated 24 days ago.

sequencing rnaseq chipseq geneexpression transcription normalization differentialexpression bayesian regression principalcomponent clustering immunooncology openblas cpp

375 stars 16.11 score 17k scripts 115 dependents

jlmelville

uwot:The Uniform Manifold Approximation and Projection (UMAP) Method for Dimensionality Reduction

An implementation of the Uniform Manifold Approximation and Projection dimensionality reduction by McInnes et al. (2018) <doi:10.48550/arXiv.1802.03426>. It also provides means to transform new data and to carry out supervised dimensionality reduction. An implementation of the related LargeVis method of Tang et al. (2016) <doi:10.48550/arXiv.1602.00370> is also provided. This is a complete re-implementation in R (and C++, via the 'Rcpp' package): no Python installation is required. See the uwot website (<https://github.com/jlmelville/uwot>) for more documentation and examples.

Maintained by James Melville. Last updated 3 days ago.

dimensionality-reduction umap cpp

329 stars 16.08 score 2.0k scripts 145 dependents

thomasp85

ggforce:Accelerating 'ggplot2'

The aim of 'ggplot2' is to aid in visual data investigations. This focus has led to a lack of facilities for composing specialised plots. 'ggforce' aims to be a collection of mainly new stats and geoms that fills this gap. All additional functionality is aimed to come through the official extension system so using 'ggforce' should be a stable experience.

Maintained by Thomas Lin Pedersen. Last updated 4 days ago.

ggplot-extension ggplot2 visualization cpp

929 stars 15.98 score 9.3k scripts 298 dependents

bioc

rhdf5:R Interface to HDF5

This package provides an interface between HDF5 and R. HDF5's main features are the ability to store and access very large and/or complex datasets and a wide variety of metadata on mass storage (disk) through a completely portable file format. The rhdf5 package is thus suited for the exchange of large and/or complex datasets between R and other software packages, and for letting R applications work on datasets that are larger than the available RAM.

Maintained by Mike Smith. Last updated 5 days ago.

infrastructure dataimport hdf5 rhdf5 openssl curl zlib cpp

62 stars 15.87 score 4.2k scripts 232 dependents

r-lib

later:Utilities for Scheduling Functions to Execute Later with Event Loops

Executes arbitrary R or C functions some time after the current time, after the R execution stack has emptied. The functions are scheduled in an event loop.

Maintained by Winston Chang. Last updated 2 months ago.

cpp

143 stars 15.86 score 234 scripts 2.6k dependents

jeroen

V8:Embedded JavaScript and WebAssembly Engine for R

An R interface to V8 <https://v8.dev>: Google's open source JavaScript and WebAssembly engine. This package can be compiled either with V8 version 6 and up or NodeJS when built as a shared library.

Maintained by Jeroen Ooms. Last updated 6 days ago.

javascript libv8 wasm nodejs cpp

201 stars 15.81 score 508 scripts 337 dependents

stan-dev

rstanarm:Bayesian Applied Regression Modeling via Stan

Estimates previously compiled regression models using the 'rstan' package, which provides the R interface to the Stan C++ library for Bayesian estimation. Users specify models via the customary R syntax with a formula and data.frame plus some additional arguments for priors.

Maintained by Ben Goodrich. Last updated 10 days ago.

bayesian bayesian-data-analysis bayesian-inference bayesian-methods bayesian-statistics multilevel-models rstan rstanarm stan statistical-modeling cpp

393 stars 15.70 score 5.0k scripts 13 dependents

rcppcore

RcppEigen:'Rcpp' Integration for the 'Eigen' Templated Linear Algebra Library

R and 'Eigen' integration using 'Rcpp'. 'Eigen' is a C++ template library for linear algebra: matrices, vectors, numerical solvers and related algorithms. It supports dense and sparse matrices on integer, floating point and complex numbers, decompositions of such matrices, and solutions of linear systems. Its performance on many algorithms is comparable with some of the best implementations based on 'Lapack' and level-3 'BLAS'. The 'RcppEigen' package includes the header files from the 'Eigen' C++ template library. Thus users do not need to install 'Eigen' itself in order to use 'RcppEigen'. Since version 3.1.1, 'Eigen' is licensed under the Mozilla Public License (version 2); earlier versions were licensed under the GNU LGPL version 3 or later. 'RcppEigen' (the 'Rcpp' bindings/bridge to 'Eigen') is licensed under the GNU GPL version 2 or later, as is the rest of 'Rcpp'.

Maintained by Dirk Eddelbuettel. Last updated 7 months ago.

algorithm c-plus-plus eigen eigen-library openblas cpp

114 stars 15.66 score 356 scripts 3.8k dependents

r-lib

systemfonts:System Native Font Finding

Provides system native access to the font catalogue. As font handling varies between systems it is difficult to correctly locate installed fonts across different operating systems. The 'systemfonts' package provides bindings to the native libraries on Windows, macOS and Linux for finding font files that can then be used further by e.g. graphic devices. The main use is intended to be from compiled code but 'systemfonts' also provides access from R.

Maintained by Thomas Lin Pedersen. Last updated 2 months ago.

fonts fontconfig freetype cpp

95 stars 15.62 score 384 scripts 990 dependents

mhahsler

dbscan:Density-Based Spatial Clustering of Applications with Noise (DBSCAN) and Related Algorithms

A fast reimplementation of several density-based algorithms of the DBSCAN family. Includes the clustering algorithms DBSCAN (density-based spatial clustering of applications with noise) and HDBSCAN (hierarchical DBSCAN), the ordering algorithm OPTICS (ordering points to identify the clustering structure), shared nearest neighbor clustering, and the outlier detection algorithms LOF (local outlier factor) and GLOSH (global-local outlier score from hierarchies). The implementations use the kd-tree data structure (from library ANN) for faster k-nearest neighbor search. An R interface to fast kNN and fixed-radius NN search is also provided. Hahsler, Piekenbrock and Doran (2019) <doi:10.18637/jss.v091.i01>.

Maintained by Michael Hahsler. Last updated 2 months ago.

clustering dbscan density-based-clustering hdbscan lof optics cpp

324 stars 15.60 score 1.6k scripts 85 dependents

facebook

prophet:Automatic Forecasting Procedure

Implements a procedure for forecasting time series data based on an additive model where non-linear trends are fit with yearly, weekly, and daily seasonality, plus holiday effects. It works best with time series that have strong seasonal effects and several seasons of historical data. Prophet is robust to missing data and shifts in the trend, and typically handles outliers well.

Maintained by Sean Taylor. Last updated 5 months ago.

forecasting python cpp

19k stars 15.59 score 976 scripts 13 dependents

r-lib

svglite:An 'SVG' Graphics Device

A graphics device for R that produces 'Scalable Vector Graphics'. 'svglite' is a fork of the older 'RSvgDevice' package.

Maintained by Thomas Lin Pedersen. Last updated 5 months ago.

svg libpng cpp

181 stars 15.57 score 4.7k scripts 228 dependents

rstudio

httpuv:HTTP and WebSocket Server Library

Provides low-level socket and protocol support for handling HTTP and WebSocket requests directly from within R. It is primarily intended as a building block for other packages, rather than making it particularly easy to create complete web applications using httpuv alone. httpuv is built on top of the libuv and http-parser C libraries, both of which were developed by Joyent, Inc. (See LICENSE file for libuv and http-parser license information.)

Maintained by Winston Chang. Last updated 10 days ago.

libuv1 cpp

236 stars 15.42 score 708 scripts 2.1k dependents

bioc

Rsamtools:Binary alignment (BAM), FASTA, variant call (BCF), and tabix file import

This package provides an interface to the 'samtools', 'bcftools', and 'tabix' utilities for manipulating SAM (Sequence Alignment / Map), FASTA, binary variant call (BCF) and compressed indexed tab-delimited (tabix) files.

Maintained by Bioconductor Package Maintainer. Last updated 4 months ago.

dataimport sequencing coverage alignment qualitycontrol bioconductor-package core-package curl bzip2 xz-utils zlib cpp

28 stars 15.34 score 3.2k scripts 569 dependents

dankelley

oce:Analysis of Oceanographic Data

Supports the analysis of Oceanographic data, including 'ADCP' measurements, measurements made with 'argo' floats, 'CTD' measurements, sectional data, sea-level time series, coastline and topographic data, etc. Provides specialized functions for calculating seawater properties such as potential temperature in either the 'UNESCO' or 'TEOS-10' equation of state. Produces graphical displays that conform to the conventions of the Oceanographic literature. This package is discussed extensively by Kelley (2018) "Oceanographic Analysis with R" <doi:10.1007/978-1-4939-8844-0>.

Maintained by Dan Kelley. Last updated 2 days ago.

oceanography fortran cpp

146 stars 15.34 score 4.2k scripts 18 dependents

rstudio

sass:Syntactically Awesome Style Sheets ('Sass')

An 'SCSS' compiler, powered by the 'LibSass' library. With this, R developers can use variables, inheritance, and functions to generate dynamic style sheets. The package uses the 'Sass CSS' extension language, which is stable, powerful, and CSS compatible.

Maintained by Carson Sievert. Last updated 11 months ago.

cpp

100 stars 15.30 score 252 scripts 4.3k dependents

xrobin

pROC:Display and Analyze ROC Curves

Tools for visualizing, smoothing and comparing receiver operating characteristic (ROC) curves. (Partial) area under the curve (AUC) can be compared with statistical tests based on U-statistics or bootstrap. Confidence intervals can be computed for (p)AUC or ROC curves.

Maintained by Xavier Robin. Last updated 5 months ago.

bootstrapping covariance hypothesis-testing machine-learning plot plotting roc roc-curve variance cpp

125 stars 15.18 score 16k scripts 445 dependents

r-lib

ragg:Graphic Devices Based on AGG

Anti-Grain Geometry (AGG) is a high-quality and high-performance 2D drawing library. The 'ragg' package provides a set of graphic devices based on AGG to use as an alternative to the raster devices provided through the 'grDevices' package.

Maintained by Thomas Lin Pedersen. Last updated 11 days ago.

drawing graphics vector-graphics freetype libpng tiff libjpeg-turbo cpp

175 stars 15.18 score 1.8k scripts 485 dependents

trevorhastie

glmnet:Lasso and Elastic-Net Regularized Generalized Linear Models

Extremely efficient procedures for fitting the entire lasso or elastic-net regularization path for linear regression, logistic and multinomial regression models, Poisson regression, Cox model, multiple-response Gaussian, and the grouped multinomial regression; see <doi:10.18637/jss.v033.i01> and <doi:10.18637/jss.v039.i05>. There are two new and important additions. The family argument can be a GLM family object, which opens the door to any programmed family (<doi:10.18637/jss.v106.i01>). This comes with a modest computational cost, so when the built-in families suffice, they should be used instead. The other novelty is the relax option, which refits each of the active sets in the path unpenalized. The algorithm uses cyclical coordinate descent in a path-wise fashion, as described in the papers cited.

Maintained by Trevor Hastie. Last updated 2 years ago.

fortran cpp

82 stars 15.15 score 22k scripts 736 dependents

adeverse

ade4:Analysis of Ecological Data: Exploratory and Euclidean Methods in Environmental Sciences

Tools for multivariate data analysis. Several methods are provided for the analysis (i.e., ordination) of one-table (e.g., principal component analysis, correspondence analysis), two-table (e.g., coinertia analysis, redundancy analysis), three-table (e.g., RLQ analysis) and K-table (e.g., STATIS, multiple coinertia analysis). The philosophy of the package is described in Dray and Dufour (2007) <doi:10.18637/jss.v022.i04>.

Maintained by Aurélie Siberchicot. Last updated 9 days ago.

openblas cpp

40 stars 15.10 score 2.2k scripts 257 dependents

kosukeimai

MatchIt:Nonparametric Preprocessing for Parametric Causal Inference

Selects matched samples of the original treated and control groups with similar covariate distributions -- can be used to match exactly on covariates, to match on propensity scores, or perform a variety of other matching procedures. The package also implements a series of recommendations offered in Ho, Imai, King, and Stuart (2007) <DOI:10.1093/pan/mpl013>. (The 'gurobi' package, which is not on CRAN, is optional and comes with an installation of the Gurobi Optimizer, available at <https://www.gurobi.com>.)

Maintained by Noah Greifer. Last updated 14 days ago.

cpp openmp

220 stars 15.03 score 2.4k scripts 21 dependents

r-lib

isoband:Generate Isolines and Isobands from Regularly Spaced Elevation Grids

A fast C++ implementation to generate contour lines (isolines) and contour polygons (isobands) from regularly spaced grids containing elevation data.

Maintained by Hadley Wickham. Last updated 2 years ago.

cpp

132 stars 15.01 score 75 scripts 7.6k dependents

mjskay

ggdist:Visualizations of Distributions and Uncertainty

Provides primitives for visualizing distributions using 'ggplot2' that are particularly tuned for visualizing uncertainty in either a frequentist or Bayesian mode. Both analytical distributions (such as frequentist confidence distributions or Bayesian priors) and distributions represented as samples (such as bootstrap distributions or Bayesian posterior samples) are easily visualized. Visualization primitives include but are not limited to: points with multiple uncertainty intervals, eye plots (Spiegelhalter D., 1999) <https://ideas.repec.org/a/bla/jorssa/v162y1999i1p45-58.html>, density plots, gradient plots, dot plots (Wilkinson L., 1999) <doi:10.1080/00031305.1999.10474474>, quantile dot plots (Kay M., Kola T., Hullman J., Munson S., 2016) <doi:10.1145/2858036.2858558>, complementary cumulative distribution function barplots (Fernandes M., Walls L., Munson S., Hullman J., Kay M., 2018) <doi:10.1145/3173574.3173718>, and fit curves with multiple uncertainty ribbons.

Maintained by Matthew Kay. Last updated 4 months ago.

ggplot2 uncertainty uncertainty-visualization visualization cpp

859 stars 14.95 score 3.1k scripts 62 dependents

philchalmers

mirt:Multidimensional Item Response Theory

Analysis of discrete response data using unidimensional and multidimensional item analysis models under the Item Response Theory paradigm (Chalmers (2012) <doi:10.18637/jss.v048.i06>). Exploratory and confirmatory item factor analysis models are estimated with quadrature (EM) or stochastic (MHRM) methods. Confirmatory bi-factor and two-tier models are available for modeling item testlets using dimension reduction EM algorithms, while multiple group analyses and mixed effects designs are included for detecting differential item, bundle, and test functioning, and for modeling item and person covariates. Finally, latent class models such as the DINA, DINO, multidimensional latent class, mixture IRT models, and zero-inflated response models are supported, as well as a wide family of probabilistic unfolding models.

Maintained by Phil Chalmers. Last updated 2 days ago.

irt mirt openblas cpp openmp

212 stars 14.93 score 2.5k scripts 40 dependents

rcppcore

RcppParallel:Parallel Programming Tools for 'Rcpp'

High level functions for parallel programming with 'Rcpp'. For example, the 'parallelFor()' function can be used to convert the work of a standard serial "for" loop into a parallel one and the 'parallelReduce()' function can be used for accumulating aggregate or other values.

Maintained by Kevin Ushey. Last updated 10 days ago.

onetbb cpp

174 stars 14.89 score 215 scripts 800 dependents

r-dbi

RPostgres:C++ Interface to PostgreSQL

Fully DBI-compliant C++-backed interface to PostgreSQL <https://www.postgresql.org/>, an open-source relational database.

Maintained by Kirill Müller. Last updated 1 month ago.

database postgres postgresql cpp

338 stars 14.78 score 1.6k scripts 31 dependents

thomasp85

tidygraph:A Tidy API for Graph Manipulation

A graph, while not "tidy" in itself, can be thought of as two tidy data frames describing node and edge data respectively. 'tidygraph' provides an approach to manipulate these two virtual data frames using the API defined in the 'dplyr' package, and also provides tidy interfaces to many common graph algorithms.

Maintained by Thomas Lin Pedersen. Last updated 2 months ago.

graph-algorithms graph-manipulation igraph network-analysis tidyverse cpp

553 stars 14.74 score 4.6k scripts 136 dependents

lrberge

fixest:Fast Fixed-Effects Estimations

Fast and user-friendly estimation of econometric models with multiple fixed-effects. Includes ordinary least squares (OLS), generalized linear models (GLM) and the negative binomial. The core of the package is based on optimized parallel C++ code, scaling especially well for large data sets. The method to obtain the fixed-effects coefficients is based on Berge (2018) <https://github.com/lrberge/fixest/blob/master/_DOCS/FENmlm_paper.pdf>. It also provides tools to export and view the results of several estimations, with an intuitive design for clustering the standard errors.

Maintained by Laurent Berge. Last updated 7 months ago.

cpp openmp

394 stars 14.69 score 3.8k scripts 26 dependents

vincentarelbundock

marginaleffects:Predictions, Comparisons, Slopes, Marginal Means, and Hypothesis Tests

Compute and plot predictions, slopes, marginal means, and comparisons (contrasts, risk ratios, odds, etc.) for over 100 classes of statistical and machine learning models in R. Conduct linear and non-linear hypothesis tests, or equivalence tests. Calculate uncertainty estimates using the delta method, bootstrapping, or simulation-based inference. Details can be found in Arel-Bundock, Greifer, and Heiss (2024) <doi:10.18637/jss.v111.i09>.

Maintained by Vincent Arel-Bundock. Last updated 3 days ago.

cpp

509 stars 14.56 score 1.8k scripts 10 dependents

alexcb

rjson:JSON for R

Converts R objects into JSON objects and vice versa.

Maintained by Alex Couture-Beil. Last updated 6 months ago.

cpp

21 stars 14.54 score 7.4k scripts 852 dependents

r-lib

clock:Date-Time Types and Tools

Provides a comprehensive library for date-time manipulations using a new family of orthogonal date-time classes (durations, time points, zoned-times, and calendars) that partition responsibilities so that the complexities of time zones are only considered when they are really needed. Capabilities include: date-time parsing, formatting, arithmetic, extraction and updating of components, and rounding.

Maintained by Davis Vaughan. Last updated 12 days ago.

cpp

106 stars 14.53 score 296 scripts 407 dependents

ropensci

osmdata:Import 'OpenStreetMap' Data as Simple Features or Spatial Objects

Download and import of 'OpenStreetMap' ('OSM') data as 'sf' or 'sp' objects. 'OSM' data are extracted from the 'Overpass' web server (<https://overpass-api.de/>) and processed with very fast 'C++' routines for return to 'R'.

Maintained by Mark Padgham. Last updated 1 month ago.

open0street0map openstreetmap overpass0api osm cpp osm-data overpass-api peer-reviewed cpp

322 stars 14.53 score 2.8k scripts 14 dependents

r-lidar

lidR:Airborne LiDAR Data Manipulation and Visualization for Forestry Applications

Airborne LiDAR (Light Detection and Ranging) interface for data manipulation and visualization. Read/write 'las' and 'laz' files, computation of metrics in area based approach, point filtering, artificial point reduction, classification from geographic data, normalization, individual tree segmentation and other manipulations.

Maintained by Jean-Romain Roussel. Last updated 2 months ago.

als forestry las laz lidar point-cloud remote-sensing openblas cpp openmp

623 stars 14.47 score 844 scripts 8 dependents

statistikat

VIM:Visualization and Imputation of Missing Values

New tools for the visualization of missing and/or imputed values are introduced, which can be used for exploring the data and the structure of the missing and/or imputed values. Depending on this structure, the corresponding methods may help to identify the mechanism generating the missing values and allow exploration of the data, including the missing values. In addition, the quality of imputation can be visually explored using various univariate, bivariate, multiple and multivariate plot methods. A graphical user interface available in the separate package VIMGUI allows easy handling of the implemented plot methods.

Maintained by Matthias Templ. Last updated 8 months ago.

hotdeck imputation-methods model-predictions visualization cpp

85 stars 14.44 score 2.6k scripts 19 dependents

davidgohel

ggiraph:Make 'ggplot2' Graphics Interactive

Create interactive 'ggplot2' graphics using 'htmlwidgets'.

Maintained by David Gohel. Last updated 1 day ago.

libpng cpp

822 stars 14.37 score 4.1k scripts 35 dependents

bioc

xcms:LC-MS and GC-MS Data Analysis

Framework for processing and visualization of chromatographically separated and single-spectra mass spectral data. Imports from AIA/ANDI NetCDF, mzXML, mzData and mzML files. Preprocesses data for high-throughput, untargeted analyte profiling.

Maintained by Steffen Neumann. Last updated 15 days ago.

immunooncology massspectrometry metabolomics bioconductor feature-detection mass-spectrometry peak-detection cpp

196 stars 14.31 score 984 scripts 11 dependents

thomasp85

farver:High Performance Colour Space Manipulation

The encoding of colour can be handled in many different ways, using different colour spaces. As different colour spaces have different uses, efficient conversion between these representations is important. The 'farver' package provides a set of functions that give access to very fast colour space conversion and comparisons implemented in C++, and offers speed improvements over the 'convertColor' function in the 'grDevices' package.

Maintained by Thomas Lin Pedersen. Last updated 11 months ago.

color-conversion cpp

136 stars 14.22 score 164 scripts 8.0k dependents

bioc

GOSemSim:GO-terms Semantic Similarity Measures

The semantic comparisons of Gene Ontology (GO) annotations provide quantitative ways to compute similarities between genes and gene groups, and have become an important basis for many bioinformatics analysis approaches. GOSemSim is an R package for semantic similarity computation among GO terms, sets of GO terms, gene products and gene clusters. GOSemSim implements five methods, proposed by Resnik, Schlicker, Jiang, Lin and Wang respectively.

Maintained by Guangchuang Yu. Last updated 5 months ago.

annotation go clustering pathways network software bioinformatics gene-ontology semantic-similarity cpp

63 stars 14.12 score 708 scripts 68 dependents

qsbase

qs:Quick Serialization of R Objects

Provides functions for quickly writing and reading any R object to and from disk.

Maintained by Travers Ching. Last updated 8 days ago.

compression data-storage encoding serialization libzstd lz4 cpp

417 stars 14.05 score 2.5k scripts 51 dependents

r-lib

fastmap:Fast Data Structures

Fast implementation of data structures, including a key-value store, stack, and queue. Environments are commonly used as key-value stores in R, but every time a new key is used, it is added to R's global symbol table, causing a small amount of memory leakage. This can be problematic in cases where many different keys are used. Fastmap avoids this memory leak issue by implementing the map using data structures in C++.

Maintained by Winston Chang. Last updated 11 months ago.

cpp

134 stars 14.04 score 102 scripts 5.5k dependents

bertcarnell

lhs:Latin Hypercube Samples

Provides a number of methods for creating and augmenting Latin Hypercube Samples and Orthogonal Array Latin Hypercube Samples.

Maintained by Rob Carnell. Last updated 9 months ago.

latin-hypercube latin-hypercube-sample latin-hypercube-sampling lhs orthogonal-arrays cpp

45 stars 14.04 score 1.5k scripts 110 dependents

jkrijthe

Rtsne:T-Distributed Stochastic Neighbor Embedding using a Barnes-Hut Implementation

An R wrapper around the fast T-distributed Stochastic Neighbor Embedding implementation by Van der Maaten (see <https://github.com/lvdmaaten/bhtsne/> for more information on the original implementation).

Maintained by Jesse Krijthe. Last updated 10 months ago.

openblas cpp openmp

256 stars 14.01 score 4.4k scripts 233 dependents

r-forge

survey:Analysis of Complex Survey Samples

Summary statistics, two-sample tests, rank tests, generalised linear models, cumulative link models, Cox models, loglinear models, and general maximum pseudolikelihood estimation for multistage stratified, cluster-sampled, unequally weighted survey samples. Variances by Taylor series linearisation or replicate weights. Post-stratification, calibration, and raking. Two-phase and multiphase subsampling designs. Graphics. PPS sampling without replacement. Small-area estimation. Dual-frame designs.

Maintained by Thomas Lumley. Last updated 1 day ago.

cpp

1 stars 13.94 score 13k scripts 234 dependents

eddelbuettel

anytime:Anything to 'POSIXct' or 'Date' Converter

Convert input in any one of character, integer, numeric, factor, or ordered type into 'POSIXct' (or 'Date') objects, using one of a number of predefined formats, and relying on Boost facilities for date and time parsing.

Maintained by Dirk Eddelbuettel. Last updated 17 days ago.

boost c-plus-plus-11 conversions cpp11 date datetime posixct rcpp cpp

165 stars 13.91 score 1.4k scripts 99 dependents

gbm-developers

gbm:Generalized Boosted Regression Models

An implementation of extensions to Freund and Schapire's AdaBoost algorithm and Friedman's gradient boosting machine. Includes regression methods for least squares, absolute loss, t-distribution loss, quantile regression, logistic, multinomial logistic, Poisson, Cox proportional hazards partial likelihood, AdaBoost exponential loss, Huberized hinge loss, and Learning to Rank measures (LambdaMart). Originally developed by Greg Ridgeway. Newer version available at github.com/gbm-developers/gbm3.

Maintained by Greg Ridgeway. Last updated 9 months ago.

cpp

52 stars 13.85 score 6.8k scripts 91 dependents

config-i1

smooth:Forecasting Using State Space Models

Functions implementing Single Source of Error state space models for purposes of time series analysis and forecasting. The package includes ADAM (Svetunkov, 2023, <https://openforecast.org/adam/>), Exponential Smoothing (Hyndman et al., 2008, <doi:10.1007/978-3-540-71918-2>), SARIMA (Svetunkov & Boylan, 2019, <doi:10.1080/00207543.2019.1600764>), Complex Exponential Smoothing (Svetunkov & Kourentzes, 2018, <doi:10.13140/RG.2.2.24986.29123>), Simple Moving Average (Svetunkov & Petropoulos, 2018, <doi:10.1080/00207543.2017.1380326>) and several simulation functions. It also allows dealing with intermittent demand based on the iETS framework (Svetunkov & Boylan, 2019, <doi:10.13140/RG.2.2.35897.06242>).

Maintained by Ivan Svetunkov. Last updated 12 days ago.

arima arima-forecasting ces ets exponential-smoothing forecast state-space time-series openblas cpp

90 stars 13.83 score 412 scripts 25 dependents

duckdb

duckdb:DBI Package for the DuckDB Database Management System

The DuckDB project is an embedded analytical data management system with support for the Structured Query Language (SQL). This package includes all of DuckDB and an R Database Interface (DBI) connector.

Maintained by Kirill Müller. Last updated 10 days ago.

database duckdb olap cpp

159 stars 13.80 score 1.7k scripts 46 dependents

rspatial

geosphere:Spherical Trigonometry

Spherical trigonometry for geographic applications. That is, compute distances and related measures for angular (longitude/latitude) locations.

Maintained by Robert J. Hijmans. Last updated 6 months ago.

cpp

36 stars 13.79 score 5.7k scripts 116 dependents

r-spatial

s2:Spherical Geometry Operators Using the S2 Geometry Library

Provides R bindings for Google's s2 library for geometric calculations on the sphere. High-performance constructors and exporters provide high compatibility with existing spatial packages, transformers construct new geometries from existing geometries, predicates provide a means to select geometries based on spatial relationships, and accessors extract information about geometries.

Maintained by Edzer Pebesma. Last updated 12 days ago.

openssl cpp

74 stars 13.76 score 207 scripts 1.2k dependents

immunogenomics

harmony:Fast, Sensitive, and Accurate Integration of Single Cell Data

Implementation of the Harmony algorithm for single cell integration, described in Korsunsky et al. <doi:10.1038/s41592-019-0619-0>. The package includes a standalone Harmony function and interfaces to external frameworks.

Maintained by Ilya Korsunsky. Last updated 5 months ago.

algorithm data-integration scrna-seq openblas cpp

554 stars 13.74 score 5.5k scripts 8 dependents

richarddmorey

BayesFactor:Computation of Bayes Factors for Common Designs

A suite of functions for computing various Bayes factors for simple designs, including contingency tables, one- and two-sample designs, one-way designs, general ANOVA designs, and linear regression.

Maintained by Richard D. Morey. Last updated 1 year ago.

cpp

132 stars 13.71 score 1.7k scripts 21 dependents

knausb

vcfR:Manipulate and Visualize VCF Data

Facilitates easy manipulation of variant call format (VCF) data. Functions are provided to rapidly read from and write to VCF files. Once VCF data is read into R, a parser function extracts matrices of data. This information can then be used for quality control or other purposes. Additional functions provide visualization of genomic data. Once processing is complete, data may be written to a VCF file (*.vcf.gz). It also may be converted into other popular R objects (e.g., genlight, DNAbin). vcfR provides a link between VCF data and familiar R software.

Maintained by Brian J. Knaus. Last updated 1 month ago.

genomics population-genetics population-genomics rcpp vcf-data visualization zlib cpp

256 stars 13.66 score 3.1k scripts 19 dependents

janmarvin

openxlsx2:Read, Write and Edit 'xlsx' Files

Simplifies the creation of 'xlsx' files by providing a high level interface to writing, styling and editing worksheets.

Maintained by Jan Marvin Garbuszus. Last updated 11 hours ago.

xlsx cpp

139 stars 13.64 score 194 scripts 11 dependents

r-lib

textshaping:Bindings to the 'HarfBuzz' and 'Fribidi' Libraries for Text Shaping

Provides access to the text shaping functionality in the 'HarfBuzz' library and the bidirectional algorithm in the 'Fribidi' library. 'textshaping' is a low-level utility package mainly for graphic devices that expands upon the font tool-set provided by the 'systemfonts' package.

Maintained by Thomas Lin Pedersen. Last updated 2 months ago.

harfbuzz freetype fribidi cpp

19 stars 13.58 score 66 scripts 484 dependents

tidyverts

fable:Forecasting Models for Tidy Time Series

Provides a collection of commonly used univariate and multivariate time series forecasting models including automatically selected exponential smoothing (ETS) and autoregressive integrated moving average (ARIMA) models. These models work within the 'fable' framework provided by the 'fabletools' package, which provides the tools to evaluate, visualise, and combine models in a workflow consistent with the tidyverse.

Maintained by Mitchell O'Hara-Wild. Last updated 4 months ago.

forecasting cpp

569 stars 13.54 score 2.1k scripts 6 dependents

asgr

imager:Image Processing Library Based on 'CImg'

Fast image processing for images in up to 4 dimensions (two spatial dimensions, one time/depth dimension, one colour dimension). Provides most traditional image processing tools (filtering, morphology, transformations, etc.) as well as various functions for easily analysing image data using R. The package wraps 'CImg', <http://cimg.eu>, a simple, modern C++ library for image processing.

Maintained by Aaron Robotham. Last updated 5 days ago.

libx11 fftw3 tiff cpp openmp

17 stars 13.53 score 2.4k scripts 44 dependents

dselivanov

text2vec:Modern Text Mining Framework for R

Fast and memory-friendly tools for text vectorization, topic modeling (LDA, LSA), word embeddings (GloVe), similarities. This package provides a source-agnostic streaming API, which allows researchers to perform analysis of collections of documents which are larger than available RAM. All core functions are parallelized to benefit from multicore machines.

Maintained by Dmitriy Selivanov. Last updated 8 months ago.

glove latent-dirichlet-allocation natural-language-processing text-mining topic-modeling vectorization word-embeddings word2vec cpp

860 stars 13.48 score 1.3k scripts 23 dependents

kkholst

mets:Analysis of Multivariate Event Times

Implementation of various statistical models for multivariate event history data <doi:10.1007/s10985-013-9244-x>. Including multivariate cumulative incidence models <doi:10.1002/sim.6016>, and bivariate random effects probit models (Liability models) <doi:10.1016/j.csda.2015.01.014>. Modern methods for survival analysis, including regression modelling (Cox, Fine-Gray, Ghosh-Lin, Binomial regression) with fast computation of influence functions.

Maintained by Klaus K. Holst. Last updated 19 hours ago.

multivariate-time-to-event survival-analysis time-to-event fortran openblas cpp

14 stars 13.46 score 236 scripts 42 dependents

ironholds

urltools:Vectorised Tools for URL Handling and Parsing

A toolkit for all URL-handling needs, including encoding and decoding, parsing, parameter extraction and modification. All functions are designed to be both fast and entirely vectorised. It is intended to be useful for people dealing with web-related datasets, such as server-side logs, although it may be useful for other situations involving large sets of URLs.

Maintained by Os Keyes. Last updated 4 years ago.

access-logs data-import url cpp

131 stars 13.43 score 968 scripts 264 dependents

ropensci

tokenizers:Fast, Consistent Tokenization of Natural Language Text

Convert natural language text into tokens. Includes tokenizers for shingled n-grams, skip n-grams, words, word stems, sentences, paragraphs, characters, shingled characters, lines, Penn Treebank, regular expressions, as well as functions for counting characters, words, and sentences, and a function for splitting longer texts into separate documents, each with the same number of words. The tokenizers have a consistent interface, and the package is built on the 'stringi' and 'Rcpp' packages for fast yet correct tokenization in 'UTF-8'.

Maintained by Thomas Charlon. Last updated 1 year ago.

nlp peer-reviewed text-mining tokenizer cpp

186 stars 13.33 score 1.1k scripts 81 dependents
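A shingled n-gram, one of the token types listed for 'tokenizers', is every run of n consecutive words, with neighbouring n-grams overlapping by n - 1 words. The Python sketch below only illustrates that definition: it is not the package's implementation (which is built on 'stringi' and 'Rcpp'), it assumes the text has already been split into words, and the function name `shingled_ngrams` is hypothetical.

```python
def shingled_ngrams(words, n):
    """Return every run of n consecutive words, joined by single spaces.

    Consecutive n-grams overlap by n - 1 words ("shingles").
    Returns an empty list when there are fewer than n words.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    return [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]


words = "the quick brown fox".split()
print(shingled_ngrams(words, 2))  # ['the quick', 'quick brown', 'brown fox']
```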

chjackson

flexsurv:Flexible Parametric Survival and Multi-State Models

Flexible parametric models for time-to-event data, including the Royston-Parmar spline model, generalized gamma and generalized F distributions. Any user-defined parametric distribution can be fitted, given at least an R function defining the probability density or hazard. There are also tools for fitting and predicting from fully parametric multi-state models, based on either cause-specific hazards or mixture models.

Maintained by Christopher Jackson. Last updated 2 months ago.

cpp

57 stars 13.31 score 632 scripts 43 dependents

ropensci

hunspell:High-Performance Stemmer, Tokenizer, and Spell Checker

Low level spell checker and morphological analyzer based on the famous 'hunspell' library <https://hunspell.github.io>. The package can analyze or check individual words as well as parse text, latex, html or xml documents. For a more user-friendly interface use the 'spelling' package which builds on this package to automate checking of files, documentation and vignettes in all common formats.

Maintained by Jeroen Ooms. Last updated 7 days ago.

hunspell spell-check spellchecker stemmer tokenizer cpp

112 stars 13.23 score 422 scripts 30 dependents

bioc

dada2:Accurate, high-resolution sample inference from amplicon sequencing data

The dada2 package infers exact amplicon sequence variants (ASVs) from high-throughput amplicon sequencing data, replacing the coarser and less accurate OTU clustering approach. The dada2 pipeline takes as input demultiplexed fastq files, and outputs the sequence variants and their sample-wise abundances after removing substitution and chimera errors. Taxonomic classification is available via a native implementation of the RDP naive Bayesian classifier, and species-level assignment to 16S rRNA gene fragments by exact matching.

Maintained by Benjamin Callahan. Last updated 5 months ago.

immunooncology microbiome sequencing classification metagenomics amplicon bioconductor bioinformatics metabarcoding taxonomy cpp

487 stars 13.17 score 3.0k scripts 4 dependents

fstpackage

fst:Lightning Fast Serialization of Data Frames

Multithreaded serialization of compressed data frames using the 'fst' format. The 'fst' format allows for full random access of stored data and a wide range of compression settings using the LZ4 and ZSTD compressors.

Maintained by Mark Klik. Last updated 6 months ago.

compression data-frame data-storage cpp

624 stars 13.16 score 1.9k scripts 56 dependents

daqana

dqrng:Fast Pseudo Random Number Generators

Several fast random number generators are provided as C++ header-only libraries: the PCG family by O'Neill (2014 <https://www.cs.hmc.edu/tr/hmc-cs-2014-0905.pdf>) as well as the Xoroshiro / Xoshiro family by Blackman and Vigna (2021 <doi:10.1145/3460772>). In addition, fast functions for generating random numbers according to a uniform, normal and exponential distribution are included. The latter two use the Ziggurat algorithm originally proposed by Marsaglia and Tsang (2000, <doi:10.18637/jss.v005.i08>). The fast sampling methods support unweighted sampling both with and without replacement. These functions are exported both to R and as a C++ interface, and can be used with the default 64 bit generator from the PCG family, Xoroshiro128+/++/** and Xoshiro256+/++/**, as well as the 64 bit version of the 20-round Threefry engine (Salmon et al., 2011, <doi:10.1145/2063384.2063405>) as provided by the package 'sitmo'.

Maintained by Ralf Stubner. Last updated 7 months ago.

random random-distributions random-generation random-sampling rng cpp

42 stars 13.12 score 188 scripts 183 dependents
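The 'dqrng' description names the Xoshiro256 family. To show how small such a generator's state update is, the Python sketch below implements the xoshiro256++ step as published by Blackman and Vigna, masking to 64 bits because Python integers are unbounded. Seeding the four state words with SplitMix64 and turning the top 53 bits into a double in [0, 1) are common conventions assumed here, not taken from 'dqrng'; the class and method names are hypothetical, and pure Python is far too slow for real use.

```python
MASK64 = (1 << 64) - 1


def rotl(x, k):
    """Rotate a 64-bit integer left by k bits."""
    return ((x << k) | (x >> (64 - k))) & MASK64


def splitmix64(state):
    """One SplitMix64 step, returning (new_state, output); used only for seeding."""
    state = (state + 0x9E3779B97F4A7C15) & MASK64
    z = state
    z = ((z ^ (z >> 30)) * 0xBF58476D1CE4E5B9) & MASK64
    z = ((z ^ (z >> 27)) * 0x94D049BB133111EB) & MASK64
    return state, z ^ (z >> 31)


class Xoshiro256pp:
    """Minimal xoshiro256++ generator (hypothetical wrapper class)."""

    def __init__(self, seed):
        # Expand one integer seed into four 64-bit state words.
        st = seed & MASK64
        self.s = []
        for _ in range(4):
            st, out = splitmix64(st)
            self.s.append(out)

    def next_u64(self):
        s = self.s
        result = (rotl((s[0] + s[3]) & MASK64, 23) + s[0]) & MASK64
        t = (s[1] << 17) & MASK64
        s[2] ^= s[0]
        s[3] ^= s[1]
        s[1] ^= s[2]
        s[0] ^= s[3]
        s[2] ^= t
        s[3] = rotl(s[3], 45)
        return result

    def uniform(self):
        # Keep the top 53 bits and scale them to a double in [0, 1).
        return (self.next_u64() >> 11) * (1.0 / (1 << 53))


rng = Xoshiro256pp(42)
print([rng.uniform() for _ in range(3)])
```

Re-creating the generator with the same seed reproduces the same stream, which is the property that makes seeded simulations repeatable.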

bioc

pcaMethods:A collection of PCA methods

Provides Bayesian PCA, Probabilistic PCA, Nipals PCA, Inverse Non-Linear PCA and the conventional SVD PCA. A cluster based method for missing value estimation is included for comparison. BPCA, PPCA and NipalsPCA may be used to perform PCA on incomplete data as well as for accurate missing value estimation. A set of methods for printing and plotting the results is also provided. All PCA methods make use of the same data structure (pcaRes) to provide a common interface to the PCA results. Initiated at the Max-Planck Institute for Molecular Plant Physiology, Golm, Germany.

Maintained by Henning Redestig. Last updated 5 months ago.

bayesian cpp

49 stars 13.10 score 538 scripts 73 dependents

ropensci

pdftools:Text Extraction, Rendering and Converting of PDF Documents

Utilities based on 'libpoppler' <https://poppler.freedesktop.org> for extracting text, fonts, attachments and metadata from a PDF file. Also supports high quality rendering of PDF documents into PNG, JPEG, TIFF format, or into raw bitmap vectors for further processing in R.

Maintained by Jeroen Ooms. Last updated 26 days ago.

pdf-files pdf-format pdftools poppler poppler-library text-extraction cpp

529 stars 13.10 score 3.3k scripts 47 dependents

tagteam

riskRegression:Risk Regression Models and Prediction Scores for Survival Analysis with Competing Risks

Implementation of the following methods for event history analysis. Risk regression models for survival endpoints, also in the presence of competing risks, are fitted using binomial regression based on a time sequence of binary event status variables. The package provides a formula interface for the Fine-Gray regression model, an interface for combining cause-specific Cox regression models, and a toolbox for assessing and comparing the performance of risk predictions (risk markers and risk prediction models). Prediction performance is measured by the Brier score and the area under the ROC curve for binary, possibly time-dependent, outcomes. Inverse probability of censoring weighting and pseudo values are used to deal with right-censored data. Lists of risk markers and lists of risk models are assessed simultaneously. Cross-validation repeatedly splits the data, trains the risk prediction models on one part of each split and then summarizes and compares the performance across splits.

Maintained by Thomas Alexander Gerds. Last updated 30 days ago.

openblas cpp

47 stars 13.07 score 736 scripts 37 dependents
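The two performance measures named for 'riskRegression', the Brier score and the area under the ROC curve, are easiest to see for a plain binary outcome. The Python sketch below assumes uncensored 0/1 outcomes at a single time point, so it omits the inverse probability of censoring weighting and time dependence the package handles; the function names are hypothetical. The AUC is computed in its Mann-Whitney form, as the share of (case, control) pairs ranked correctly.

```python
def brier_score(y, p):
    """Mean squared difference between 0/1 outcomes y and predicted risks p."""
    if not y or len(y) != len(p):
        raise ValueError("y and p must be non-empty and of equal length")
    return sum((yi - pi) ** 2 for yi, pi in zip(y, p)) / len(y)


def auc(y, p):
    """Probability that a random case (y = 1) gets a higher predicted risk
    than a random control (y = 0); ties count one half."""
    cases = [pi for yi, pi in zip(y, p) if yi == 1]
    controls = [pi for yi, pi in zip(y, p) if yi == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one control")
    score = 0.0
    for pc in cases:
        for pn in controls:
            if pc > pn:
                score += 1.0
            elif pc == pn:
                score += 0.5
    return score / (len(cases) * len(controls))


y = [1, 0, 1, 0]
p = [0.9, 0.1, 0.8, 0.3]
print(brier_score(y, p), auc(y, p))
```

Lower Brier scores are better, while an AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation of cases from controls.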

bioc

scran:Methods for Single-Cell RNA-Seq Data Analysis

Implements miscellaneous functions for interpretation of single-cell RNA-seq data. Methods are provided for assignment of cell cycle phase, detection of highly variable and significantly correlated genes, identification of marker genes, and other common tasks in routine single-cell analysis workflows.

Maintained by Aaron Lun. Last updated 5 months ago.

immunooncology normalization sequencing rnaseq software geneexpression transcriptomics singlecell clustering bioconductor-package human-cell-atlas single-cell-rna-seq openblas cpp

41 stars 13.05 score 7.6k scripts 37 dependents

biodiverse

unmarked:Models for Data from Unmarked Animals

Fits hierarchical models of animal abundance and occurrence to data collected using survey methods such as point counts, site occupancy sampling, distance sampling, removal sampling, and double observer sampling. Parameters governing the state and observation processes can be modeled as functions of covariates. References: Kellner et al. (2023) <doi:10.1111/2041-210X.14123>, Fiske and Chandler (2011) <doi:10.18637/jss.v043.i10>.

Maintained by Ken Kellner. Last updated 10 days ago.

openblas cpp openmp

4 stars 13.02 score 652 scripts 12 dependents

r-forge

tm:Text Mining Package

A framework for text mining applications within R.

Maintained by Kurt Hornik. Last updated 1 month ago.

cpp

13.00 score 14k scripts 100 dependents

davidcsterratt

geometry:Mesh Generation and Surface Tessellation

Makes the 'Qhull' library <http://www.qhull.org> available in R, in a similar manner as in Octave and MATLAB. Qhull computes convex hulls, Delaunay triangulations, halfspace intersections about a point, Voronoi diagrams, furthest-site Delaunay triangulations, and furthest-site Voronoi diagrams. It runs in 2D, 3D, 4D, and higher dimensions. It implements the Quickhull algorithm for computing the convex hull. Qhull does not support constrained Delaunay triangulations, or mesh generation of non-convex objects, but the package does include some R functions that allow for this.

Maintained by David C. Sterratt. Last updated 2 months ago.

cpp

16 stars 12.98 score 776 scripts 139 dependents
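Qhull computes convex hulls with the Quickhull algorithm in any dimension. For the 2D case only, the sketch below swaps in a different and simpler algorithm, Andrew's monotone chain, which sorts the points and builds the lower and upper hulls with a cross-product turn test in O(n log n) time. It is meant to show what a convex hull computation returns, not how Qhull or the 'geometry' package work internally; the function names are hypothetical.

```python
def cross(o, a, b):
    """z-component of (a - o) x (b - o); positive for a counter-clockwise turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points):
    """Return the hull vertices of 2D points in counter-clockwise order,
    starting from the lowest-x (then lowest-y) point. Points lying on a
    hull edge (collinear) are dropped."""
    pts = sorted(set(map(tuple, points)))
    if len(pts) <= 2:
        return pts
    lower = []
    for p in pts:
        # Pop while the last two points and p do not make a left turn.
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    upper = []
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    # The last point of each chain is the first point of the other.
    return lower[:-1] + upper[:-1]


square_with_centre = [(0, 0), (1, 0), (1, 1), (0, 1), (0.5, 0.5)]
print(convex_hull(square_with_centre))  # [(0, 0), (1, 0), (1, 1), (0, 1)]
```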

nimble-dev

nimble:MCMC, Particle Filtering, and Programmable Hierarchical Modeling

A system for writing hierarchical statistical models largely compatible with 'BUGS' and 'JAGS', writing nimbleFunctions to operate models and do basic R-style math, and compiling both models and nimbleFunctions via custom-generated C++. 'NIMBLE' includes default methods for MCMC, Laplace Approximation, Monte Carlo Expectation Maximization, and some other tools. The nimbleFunction system makes it easy to do things like implement new MCMC samplers from R, customize the assignment of samplers to different parts of a model from R, and compile the new samplers automatically via C++ alongside the samplers 'NIMBLE' provides. 'NIMBLE' extends the 'BUGS'/'JAGS' language by making it extensible: New distributions and functions can be added, including as calls to external compiled code. Although most people think of MCMC as the main goal of the 'BUGS'/'JAGS' language for writing models, one can use 'NIMBLE' for writing arbitrary other kinds of model-generic algorithms as well. A full User Manual is available at <https://r-nimble.org>.

Maintained by Christopher Paciorek. Last updated 17 days ago.

bayesian-inference bayesian-methods hierarchical-models mcmc probabilistic-programming openblas cpp

169 stars 12.97 score 2.6k scripts 19 dependents

r-spatial

lwgeom:Bindings to Selected 'liblwgeom' Functions for Simple Features

Access to selected functions found in 'liblwgeom' <https://github.com/postgis/postgis/tree/master/liblwgeom>, the light-weight geometry library used by 'PostGIS' <http://postgis.net/>.

Maintained by Edzer Pebesma. Last updated 2 months ago.

proj geos cpp

61 stars 12.95 score 1.7k scripts 66 dependents

openair-project

openair:Tools for the Analysis of Air Pollution Data

Tools to analyse, interpret and understand air pollution data. Data are typically regular time series; air quality measurements, meteorological data and dispersion model output can all be analysed. The package is described in Carslaw and Ropkins (2012, <doi:10.1016/j.envsoft.2011.09.008>) and subsequent papers.

Maintained by David Carslaw. Last updated 1 day ago.

air-quality air-quality-data meteorology openair cpp

316 stars 12.94 score 1.2k scripts 12 dependents

thomasp85

tweenr:Interpolate Data for Smooth Animations

In order to create smooth animations between states of data, tweening is necessary. This package provides a range of functions for creating tweened data that can be used as a basis for animation. Furthermore, it adds a number of vectorized interpolators for common R data types such as numeric, date and colour.

Maintained by Thomas Lin Pedersen. Last updated 1 year ago.

animation plotting transition tweening cpp

399 stars 12.93 score 440 scripts 324 dependents
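Tweening, as described for 'tweenr', means generating the intermediate frames between two states. For numeric data this reduces to interpolating each value along a parameter t running from 0 to 1, optionally reshaped by an easing function. The Python sketch below shows that idea only; it is not the tweenr API, and the function names and the particular cubic easing curve are illustrative assumptions.

```python
def ease_linear(t):
    return t


def ease_cubic_in_out(t):
    # Slow start, fast middle, slow finish; maps 0 -> 0, 0.5 -> 0.5, 1 -> 1.
    if t < 0.5:
        return 4 * t ** 3
    return 1 - (-2 * t + 2) ** 3 / 2


def tween_numeric(start, end, n_frames, ease=ease_linear):
    """Return n_frames states interpolated from start to end (inclusive)."""
    if len(start) != len(end):
        raise ValueError("start and end must have the same length")
    if n_frames < 2:
        raise ValueError("need at least two frames")
    frames = []
    for k in range(n_frames):
        t = ease(k / (n_frames - 1))
        frames.append([a + (b - a) * t for a, b in zip(start, end)])
    return frames


print(tween_numeric([0, 10], [10, 20], 3))  # [[0.0, 10.0], [5.0, 15.0], [10.0, 20.0]]
```

The easing function changes only how t advances between frames, so the first and last frames always equal the start and end states.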

cvxgrp

CVXR:Disciplined Convex Optimization

An object-oriented modeling language for disciplined convex programming (DCP) as described in Fu, Narasimhan, and Boyd (2020, <doi:10.18637/jss.v094.i14>). It allows the user to formulate convex optimization problems in a natural way following mathematical convention and DCP rules. The system analyzes the problem, verifies its convexity, converts it into a canonical form, and hands it off to an appropriate solver to obtain the solution. Interfaces to solvers on CRAN and elsewhere are provided, both commercial and open source.

Maintained by Anqi Fu. Last updated 5 months ago.

cpp

207 stars 12.89 score 768 scripts 51 dependents

paleolimbot

wk:Lightweight Well-Known Geometry Parsing

Provides a minimal R and C++ API for parsing well-known binary and well-known text representation of geometries to and from R-native formats. Well-known binary is compact and fast to parse; well-known text is human-readable and is useful for writing tests. These formats are useful in R only if the information they contain can be accessed in R, for which high-performance functions are provided here.

Maintained by Dewey Dunnington. Last updated 6 months ago.

cpp

47 stars 12.85 score 89 scripts 1.2k dependents
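Well-known binary (WKB), one of the formats 'wk' parses, is defined in the OGC Simple Features specification. For a 2D point it is 21 bytes: a byte-order flag (1 for little-endian, 0 for big-endian), a 32-bit geometry type code (1 for Point) and two 64-bit floats. The Python sketch below encodes and decodes only that case, plus the matching well-known text string; it is not the 'wk' API, the function names are hypothetical, and multi-part geometries, Z/M coordinates and extended (EWKB) type codes are deliberately ignored.

```python
import struct

WKB_POINT = 1  # geometry type code for a 2D Point


def point_to_wkb(x, y):
    # Byte-order flag 1 = little-endian, then uint32 type code, then x and y doubles.
    return struct.pack("<BIdd", 1, WKB_POINT, x, y)


def wkb_to_point(buf):
    """Decode a 2D WKB point written in either byte order."""
    if len(buf) < 21:
        raise ValueError("buffer too short for a WKB point")
    if buf[0] == 1:
        fmt = "<Idd"
    elif buf[0] == 0:
        fmt = ">Idd"
    else:
        raise ValueError("unknown byte-order flag")
    geom_type, x, y = struct.unpack(fmt, buf[1:21])
    if geom_type != WKB_POINT:
        raise ValueError("not a 2D point")
    return x, y


def point_to_wkt(x, y):
    return f"POINT ({x:g} {y:g})"


wkb = point_to_wkb(1.5, -2.0)
print(len(wkb), wkb_to_point(wkb), point_to_wkt(1.5, -2.0))
```

The binary form is compact and fast to decode, while the text form is what you would write by hand in a test, which matches the trade-off the 'wk' description draws.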

bioc

SingleR:Reference-Based Single-Cell RNA-Seq Annotation

Performs unbiased cell type recognition from single-cell RNA sequencing data, by leveraging reference transcriptomic datasets of pure cell types to infer the cell of origin of each single cell independently.

Maintained by Aaron Lun. Last updated 1 month ago.

software singlecell geneexpression transcriptomics classification clustering annotation bioconductor singler cpp

184 stars 12.83 score 2.1k scripts 2 dependents

tkonopka

umap:Uniform Manifold Approximation and Projection

Uniform manifold approximation and projection is a technique for dimension reduction. The algorithm was described by McInnes and Healy (2018) in <arXiv:1802.03426>. This package provides an interface for two implementations. One is written from scratch, including components for nearest-neighbor search and for embedding. The second implementation is a wrapper for the 'python' package 'umap-learn' (requires separate installation, see vignette for more details).

Maintained by Tomasz Konopka. Last updated 11 months ago.

dimensionality-reduction umap cpp

132 stars 12.82 score 3.6k scripts 45 dependents

davidgohel

gdtools:Utilities for Graphical Rendering and Fonts Management

Tools are provided to compute metrics of formatted strings and to check the availability of a font. Another set of functions is provided to support the collection of fonts from 'Google Fonts' in a cache. Their use is simple within 'R Markdown' documents and 'shiny' applications but also with graphic productions generated with the 'ggiraph', 'ragg' and 'svglite' packages or with tabular productions from the 'flextable' package.

Maintained by David Gohel. Last updated 4 days ago.

cairo freetype cpp

26 stars 12.80 score 234 scripts 152 dependents

spedygiorgio

markovchain:Easy Handling Discrete Time Markov Chains

Functions and S4 methods to create and manage discrete time Markov chains more easily. In addition, functions to perform statistical (fitting and drawing random variates) and probabilistic (analysis of their structural properties) analysis are provided. See Spedicato (2017) <doi:10.32614/RJ-2017-036>. Some functions for continuous-time Markov chains depend on the suggested ctmcd package.

Maintained by Giorgio Alfredo Spedicato. Last updated 5 months ago.

ctmc dtmc markov-chain markov-model r-programming rcpp openblas cpp

104 stars 12.78 score 712 scripts 4 dependents
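For a discrete time Markov chain observed as a single sequence of states, the maximum likelihood estimate of the transition matrix comes from counting the observed transitions out of each state and dividing each row by its total. The Python sketch below shows that calculation only; it is not the 'markovchain' fitting interface, the function name is hypothetical, and giving an all-zero row to a state that is never left is a simplifying assumption.

```python
from collections import defaultdict


def fit_transition_matrix(sequence):
    """MLE transition probabilities as a dict of dicts: matrix[from][to]."""
    counts = defaultdict(lambda: defaultdict(int))
    states = sorted(set(sequence))
    # Count each observed transition a -> b between consecutive elements.
    for a, b in zip(sequence, sequence[1:]):
        counts[a][b] += 1
    matrix = {}
    for a in states:
        total = sum(counts[a].values())
        # A state that is never left (e.g. seen only at the end) gets a zero row here.
        matrix[a] = {b: (counts[a][b] / total if total else 0.0) for b in states}
    return matrix


weather = ["sunny", "sunny", "rainy", "sunny", "rainy", "rainy", "sunny"]
for state, row in fit_transition_matrix(weather).items():
    print(state, row)
```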

bioc

mzR:parser for netCDF, mzXML and mzML and mzIdentML files (mass spectrometry data)

mzR provides a unified API to the common file formats and parsers available for mass spectrometry data. It comes with a subset of the proteowizard library for mzXML, mzML and mzIdentML. The netCDF reading code has previously been used in XCMS.

Maintained by Steffen Neumann. Last updated 2 months ago.

immunooncology infrastructure dataimport proteomics metabolomics massspectrometry zlib cpp

45 stars 12.77 score 204 scripts 44 dependents

bioc

EBImage:Image processing and analysis toolbox for R

EBImage provides general purpose functionality for image processing and analysis. In the context of (high-throughput) microscopy-based cellular assays, EBImage offers tools to segment cells and extract quantitative cellular descriptors. This allows the automation of such tasks using the R programming language and facilitates the use of other tools in the R environment for signal processing, statistical modeling, machine learning and visualization with image data.

Maintained by Andrzej Oleś. Last updated 5 months ago.

visualization bioinformatics image-analysis image-processing cpp

71 stars 12.77 score 1.5k scripts 33 dependents

bioc

MSnbase:Base Functions and Classes for Mass Spectrometry and Proteomics

MSnbase provides infrastructure for manipulation, processing and visualisation of mass spectrometry and proteomics data, ranging from raw to quantitative and annotated data.

Maintained by Laurent Gatto. Last updated 15 days ago.

immunooncology infrastructure proteomics massspectrometry qualitycontrol dataimport bioconductor bioinformatics mass-spectrometry proteomics-data visualisation cpp

131 stars 12.76 score 772 scripts 36 dependents

ropensci

readODS:Read and Write ODS Files

Reads ODS (OpenDocument Spreadsheet) files into R as data frames. Also supports writing data frames into ODS files.

Maintained by Chung-hong Chan. Last updated 3 months ago.

cpp

55 stars 12.74 score 808 scripts 26 dependents

rfastofficial

Rfast:A Collection of Efficient and Extremely Fast R Functions

A collection of fast (utility) functions for data analysis. Column- and row-wise means, medians, variances, minimums, maximums, many t, F and G-square tests, and many regressions (normal, logistic, Poisson) are some of the many fast functions. References: a) Tsagris M., Papadakis M. (2018). Taking R to its limits: 70+ tips. PeerJ Preprints 6:e26605v1 <doi:10.7287/peerj.preprints.26605v1>. b) Tsagris M. and Papadakis M. (2018). Forward regression in R: from the extreme slow to the extreme fast. Journal of Data Science, 16(4): 771--780. <doi:10.6339/JDS.201810_16(4).00006>. c) Chatzipantsiou C., Dimitriadis M., Papadakis M. and Tsagris M. (2020). Extremely Efficient Permutation and Bootstrap Hypothesis Tests Using R. Journal of Modern Applied Statistical Methods, 18(2), eP2898. <doi:10.48550/arXiv.1806.10947>. d) Tsagris M., Papadakis M., Alenazi A. and Alzeley O. (2024). Computationally Efficient Outlier Detection for High-Dimensional Data Using the MDP Algorithm. Computation, 12(9): 185. <doi:10.3390/computation12090185>. e) Tsagris M. and Papadakis M. (2025). Fast and light-weight energy statistics using the R package Rfast. <doi:10.48550/arXiv.2501.02849>.

Maintained by Manos Papadakis. Last updated 30 days ago.

openblas cpp openmp

147 stars 12.67 score 1.2k scripts 174 dependents

covaruber

sommer:Solving Mixed Model Equations in R

Structural multivariate-univariate linear mixed model solver for estimation of multiple random effects with unknown variance-covariance structures (e.g., heterogeneous and unstructured) and known covariance among levels of random effects (e.g., pedigree and genomic relationship matrices) (Covarrubias-Pazaran, 2016 <doi:10.1371/journal.pone.0156744>; Maier et al., 2015 <doi:10.1016/j.ajhg.2014.12.006>; Jensen et al., 1997). REML estimates can be obtained using the Direct-Inversion Newton-Raphson and Direct-Inversion Average Information algorithms for the problems r x r (r being the number of records) or using the Henderson-based average information algorithm for the problem c x c (c being the number of coefficients to estimate). Spatial models can also be fitted using the two-dimensional spline functionality available.

Maintained by Giovanny Covarrubias-Pazaran. Last updated 2 days ago.

average-information mixed-models rcpparmadillo openblas cpp openmp

44 stars 12.63 score 300 scripts 10 dependents

bstewart

stm:Estimation of the Structural Topic Model

The Structural Topic Model (STM) allows researchers to estimate topic models with document-level covariates. The package also includes tools for model selection, visualization, and estimation of topic-covariate regressions. The methods were developed in Roberts et al. (2014) <doi:10.1111/ajps.12103> and Roberts et al. (2016) <doi:10.1080/01621459.2016.1141684>. The vignette is Roberts et al. (2019) <doi:10.18637/jss.v091.i02>.

Maintained by Brandon Stewart. Last updated 1 year ago.

openblas cpp

404 stars 12.63 score 1.6k scripts 6 dependents

bioc

SNPRelate:Parallel Computing Toolset for Relatedness and Principal Component Analysis of SNP Data

Genome-wide association studies (GWAS) are widely used to investigate the genetic basis of diseases and traits, but they pose many computational challenges. We developed an R package SNPRelate to provide a binary format for single-nucleotide polymorphism (SNP) data in GWAS utilizing CoreArray Genomic Data Structure (GDS) data files. The GDS format offers efficient operations specifically designed for two-bit integers, since a SNP genotype can be stored in only two bits. SNPRelate is also designed to accelerate two key computations on SNP data using parallel computing for multi-core symmetric multiprocessing computer architectures: Principal Component Analysis (PCA) and relatedness analysis using Identity-By-Descent measures. The SNP GDS format is also used by the GWASTools package with the support of S4 classes and generic functions. The extended GDS format is implemented in the SeqArray package to support the storage of single nucleotide variations (SNVs), insertion/deletion polymorphism (indel) and structural variation calls in whole-genome and whole-exome variant data.

Maintained by Xiuwen Zheng. Last updated 5 months ago.

infrastructure genetics statisticalmethod principalcomponent bioinformatics gds-format pca simd snp openblas cpp

105 stars 12.57 score 1.6k scripts 19 dependents
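The SNPRelate description relies on a SNP genotype fitting in two bits, so four genotypes can share one byte. The Python sketch below illustrates that packing under stated assumptions: codes 0, 1 and 2 count copies of one allele and 3 marks a missing call, and genotype i occupies bits 2*(i % 4) and up within byte i // 4. That coding and bit order are illustrative choices, not the GDS on-disk layout, and the function names are hypothetical.

```python
def pack_genotypes(genotypes):
    """Pack genotype codes 0-3 into bytes, four per byte, lowest bits first."""
    out = bytearray((len(genotypes) + 3) // 4)
    for i, g in enumerate(genotypes):
        if not 0 <= g <= 3:
            raise ValueError(f"genotype code {g} does not fit in two bits")
        out[i // 4] |= g << (2 * (i % 4))
    return bytes(out)


def unpack_genotypes(packed, n):
    """Recover the first n two-bit genotype codes from packed bytes."""
    return [(packed[i // 4] >> (2 * (i % 4))) & 0b11 for i in range(n)]


calls = [0, 1, 2, 3, 2, 0]          # 3 = missing (assumed coding)
packed = pack_genotypes(calls)
print(len(packed), list(packed))     # 2 [228, 2]
print(unpack_genotypes(packed, len(calls)))
```

The number of genotypes has to be stored separately, because the unused bits in the last byte are indistinguishable from genotype code 0.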

r-dbi

bigrquery:An Interface to Google's 'BigQuery' 'API'

Easily talk to Google's 'BigQuery' database from R.

Maintained by Hadley Wickham. Last updated 1 month ago.

bigquery database cpp

520 stars 12.47 score 1.8k scripts 4 dependents

r-spatialecology

landscapemetrics:Landscape Metrics for Categorical Map Patterns

Calculates landscape metrics for categorical landscape patterns in a tidy workflow. 'landscapemetrics' reimplements the most common metrics from 'FRAGSTATS' (<https://www.fragstats.org/>) and new ones from the current literature on landscape metrics. This package supports 'terra' SpatRaster objects as input arguments. It further provides utility functions to visualize patches and select metrics, as well as building blocks for developing new metrics.

Maintained by Maximilian H.K. Hesselbarth. Last updated 2 months ago.

landscape-ecology landscape-metrics raster spatial cpp

240 stars 12.47 score 584 scripts 4 dependents

drostlab

philentropy:Similarity and Distance Quantification Between Probability Functions

Computes 46 optimized distance and similarity measures for comparing probability functions (Drost (2018) <doi:10.21105/joss.00765>). These comparisons between probability functions have their foundations in a broad range of scientific disciplines from mathematics to ecology. The aim of this package is to provide a core framework for clustering, classification, statistical inference, goodness-of-fit, non-parametric statistics, information theory, and machine learning tasks that are based on comparing univariate or multivariate probability functions.

Maintained by Hajk-Georg Drost. Last updated 4 months ago.

distance-measures distance-quantification information-theory jensen-shannon-divergence parametric-distributions similarity-measures statistics cpp

137 stars 12.44 score 484 scripts 24 dependents
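The Jensen-Shannon divergence, one of the measures tagged for 'philentropy', compares two probability vectors P and Q through their average M = (P + Q) / 2 as JSD = KL(P || M) / 2 + KL(Q || M) / 2, where KL is the Kullback-Leibler divergence. The Python sketch below computes it with base-2 logarithms, which bounds the result between 0 and 1; the log base and the function names are assumptions, not the package's interface.

```python
import math


def kl_divergence(p, q, base=2.0):
    """Kullback-Leibler divergence D(p || q); terms with p_i = 0 contribute 0."""
    total = 0.0
    for pi, qi in zip(p, q):
        if pi > 0:
            if qi == 0:
                return math.inf
            total += pi * math.log(pi / qi, base)
    return total


def jensen_shannon(p, q, base=2.0):
    """Symmetric, always finite divergence between two probability vectors."""
    if len(p) != len(q):
        raise ValueError("p and q must have the same length")
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    # m_i > 0 wherever p_i or q_i > 0, so neither KL term can be infinite.
    return 0.5 * kl_divergence(p, m, base) + 0.5 * kl_divergence(q, m, base)


print(jensen_shannon([0.5, 0.5], [0.5, 0.5]))  # 0.0
print(jensen_shannon([1.0, 0.0], [0.0, 1.0]))  # 1.0
```

Unlike KL itself, the divergence stays finite when one distribution has zeros where the other does not, which is one reason it is popular for comparing empirical distributions.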

yixuan

RSpectra:Solvers for Large-Scale Eigenvalue and SVD Problems

R interface to the 'Spectra' library <https://spectralib.org/> for large-scale eigenvalue and SVD problems. It is typically used to compute a few eigenvalues/vectors of an n by n matrix, e.g., the k largest eigenvalues, which is usually more efficient than eigen() if k << n. This package provides the 'eigs()' function, which does a similar job to its counterparts in 'Matlab', 'Octave', 'Python SciPy' and 'Julia'. It also provides the 'svds()' function to calculate the largest k singular values and corresponding singular vectors of a real matrix. The matrix to be computed on can be dense, sparse, or in the form of an operator defined by the user.

Maintained by Yixuan Qiu. Last updated 8 months ago.

eigenvalues spectra svd openblas cpp

81 stars 12.40 score 394 scripts 433 dependents
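Solvers like those in 'Spectra' only need the matrix through matrix-vector products, which is why 'RSpectra' can accept dense, sparse or user-defined operators. The Python sketch below shows that property using plain power iteration, a much simpler method than the Krylov-subspace solvers 'Spectra' uses, and one that finds only the single largest-magnitude eigenvalue. The function names, iteration limit and tolerance are illustrative assumptions.

```python
import math


def mat_vec(a, v):
    """Dense matrix-vector product for a list-of-rows matrix."""
    return [sum(aij * vj for aij, vj in zip(row, v)) for row in a]


def power_iteration(matvec, n, n_iter=1000, tol=1e-12):
    """Estimate the largest-magnitude eigenvalue of an n x n matrix that is
    only available through matvec(v); returns (eigenvalue, unit eigenvector).

    Assumes that eigenvalue is strictly larger in magnitude than the others
    and that the uniform start vector is not orthogonal to its eigenvector.
    """
    v = [1.0 / math.sqrt(n)] * n
    lam = 0.0
    for _ in range(n_iter):
        w = matvec(v)
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            return 0.0, v  # v was mapped to zero
        v = [x / norm for x in w]
        # Rayleigh quotient v^T A v (v has unit length).
        new_lam = sum(x * y for x, y in zip(v, matvec(v)))
        if abs(new_lam - lam) < tol:
            return new_lam, v
        lam = new_lam
    return lam, v


a = [[2.0, 1.0], [1.0, 2.0]]  # eigenvalues 3 and 1
lam, vec = power_iteration(lambda v: mat_vec(a, v), 2)
print(round(lam, 6))  # 3.0
```

Swapping the lambda for any function that applies a sparse or implicit operator leaves the solver unchanged, which is the same idea as the user-defined operator form in 'RSpectra'.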

schochastics

graphlayouts:Additional Layout Algorithms for Network Visualizations

Several new layout algorithms to visualize networks are provided which are not part of 'igraph'. Most are based on the concept of stress majorization by Gansner et al. (2004) <doi:10.1007/978-3-540-31843-9_25>. Some more specific algorithms allow the user to emphasize hidden group structures in networks or focus on specific nodes.

Maintained by David Schoch. Last updated 2 months ago.

ggraph graph-algorithms network-analysis network-visualization cpp

277 stars 12.38 score 322 scripts 115 dependents

asardaes

dtwclust:Time Series Clustering Along with Optimizations for the Dynamic Time Warping Distance

Time series clustering along with optimized techniques related to the Dynamic Time Warping distance and its corresponding lower bounds. Implementations of partitional, hierarchical, fuzzy, k-Shape and TADPole clustering are available. Functionality can be easily extended with custom distance measures and centroid definitions. Implementations of DTW barycenter averaging, a distance based on global alignment kernels, and the soft-DTW distance and centroid routines are also provided. All included distance functions have custom loops optimized for the calculation of cross-distance matrices, including parallelization support. Several cluster validity indices are included.

Maintained by Alexis Sarda. Last updated 8 months ago.

clustering dtw time-series openblas cpp

262 stars 12.35 score 406 scripts 14 dependents
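The Dynamic Time Warping distance at the core of 'dtwclust' aligns two series of possibly different lengths by letting one point match several points of the other, and sums the local costs along the cheapest alignment. The Python sketch below is the textbook dynamic-programming recurrence with an absolute-difference local cost, no window constraint and an unweighted step pattern; those choices are assumptions, and none of the package's lower bounds or optimized loops are reproduced. The function name is hypothetical.

```python
import math


def dtw_distance(x, y):
    """DTW distance between sequences x and y (absolute-difference cost)."""
    n, m = len(x), len(y)
    # d[i][j] = cheapest alignment cost of x[:i] with y[:j].
    d = [[math.inf] * (m + 1) for _ in range(n + 1)]
    d[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(x[i - 1] - y[j - 1])
            # Extend a match, or repeat a point of either series.
            d[i][j] = cost + min(d[i - 1][j], d[i][j - 1], d[i - 1][j - 1])
    return d[n][m]


print(dtw_distance([1, 2, 3], [1, 2, 2, 3]))  # 0.0
print(dtw_distance([0, 0], [1, 1]))          # 2.0
```

The first example shows why DTW suits time series clustering: a series and a time-stretched copy of it are at distance zero, which a point-by-point Euclidean distance cannot express.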

juliainterop

JuliaCall:Seamless Integration Between R and 'Julia'

Provides an R interface to 'Julia', which is a high-level, high-performance dynamic programming language for numerical computing, see <https://julialang.org/> for more information. It provides a high-level interface as well as a low-level interface. Using the high-level interface, you can call any 'Julia' function just like any R function, with automatic type conversion. Using the low-level interface, you can deal with C-level SEXP directly while enjoying the convenience of using a high-level programming language like 'Julia'.

Maintained by Changcheng Li. Last updated 4 months ago.

julia cpp

270 stars 12.33 score 380 scripts 8 dependents

eddelbuettel

RcppTOML:'Rcpp' Bindings to Parser for "Tom's Obvious Markup Language"

The configuration format defined by 'TOML' (which expands to "Tom's Obvious Markup Language") specifies an excellent format (described at <https://toml.io/en/>) suitable both for human editing and for the common uses of a machine-readable format. This package uses 'Rcpp' to connect the 'toml++' parser, written by Mark Gillard, to R.

Maintained by Dirk Eddelbuettel. Last updated 21 days ago.

c-plus-plus-11 toml toml-parser toml-parsing cpp

36 stars 12.32 score 124 scripts 433 dependents

jefferislab

RANN:Fast Nearest Neighbour Search (Wraps ANN Library) Using L2 Metric

Finds the k nearest neighbours for every point in a given dataset in O(N log N) time using Arya and Mount's ANN library (v1.1.3). There is support for approximate as well as exact searches, fixed radius searches and 'bd' as well as 'kd' trees. The distance is computed using the L2 (Euclidean) metric. Please see package 'RANN.L1' for the same functionality using the L1 (Manhattan, taxicab) metric.

Maintained by Gregory Jefferis. Last updated 7 months ago.

ann-library nearest-neighbors nearest-neighbours cpp

58 stars 12.31 score 1.3k scripts 193 dependents
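'RANN' answers k-nearest-neighbour queries under the L2 (Euclidean) metric using the kd- and bd-tree structures of the ANN library. The Python sketch below is not a tree search: it is a brute-force exact search costing O(N) per query point, which is the kind of slow reference implementation one might use to check tree-based results on small data. The function name and return format are hypothetical.

```python
import heapq
import math


def knn_l2(data, query, k):
    """For each query point, return [(index, distance), ...] for its k nearest
    data points under the Euclidean (L2) metric, nearest first."""
    results = []
    for q in query:
        dists = ((math.dist(q, p), i) for i, p in enumerate(data))
        nearest = heapq.nsmallest(k, dists)  # ties broken by lower index
        results.append([(i, d) for d, i in nearest])
    return results


data = [(0, 0), (1, 0), (0, 2), (5, 5)]
print(knn_l2(data, [(0, 0), (4, 4)], 2))
```

When the query points are the data points themselves, each point comes back as its own nearest neighbour at distance 0, so a search for k true neighbours needs k + 1 results.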

alexkz

kernlab:Kernel-Based Machine Learning Lab

Kernel-based machine learning methods for classification, regression, clustering, novelty detection, quantile regression and dimensionality reduction. Among other methods 'kernlab' includes Support Vector Machines, Spectral Clustering, Kernel PCA, Gaussian Processes and a QP solver.

Maintained by Alexandros Karatzoglou. Last updated 8 months ago.

openblas cpp

21 stars 12.26 score 7.8k scripts 487 dependents

bioc

bsseq:Analyze, manage and store whole-genome methylation data

A collection of tools for analyzing and visualizing whole-genome methylation data from sequencing. This includes whole-genome bisulfite sequencing and Oxford nanopore data.

Maintained by Kasper Daniel Hansen. Last updated 3 months ago.

dnamethylation cpp

37 stars 12.26 score 676 scripts 15 dependents

alexiosg

rugarch:Univariate GARCH Models

ARFIMA, in-mean, external regressors and various GARCH flavors, with methods for fit, forecast, simulation, inference and plotting.

Maintained by Alexios Galanos. Last updated 3 months ago.

cpp

26 stars 12.25 score 1.3k scripts 16 dependents

markmfredrickson

optmatch:Functions for Optimal Matching

Distance based bipartite matching using minimum cost flow, oriented to matching of treatment and control groups in observational studies ('Hansen' and 'Klopfer' 2006 <doi:10.1198/106186006X137047>). Routines are provided to generate distances from generalised linear models (propensity score matching), formulas giving variables on which to limit matched distances, stratified or exact matching directives, or calipers, alone or in combination.

Maintained by Josh Errickson. Last updated 4 months ago.

matching openblas cpp

47 stars 12.22 score 588 scripts 5 dependents

r-dbi

RMariaDB:Database Interface and MariaDB Driver

Implements a DBI-compliant interface to MariaDB (<https://mariadb.org/>) and MySQL (<https://www.mysql.com/>) databases.

Maintained by Kirill Müller. Last updated 1 month ago.

database mariadb mysql cpp

133 stars 12.20 score 792 scripts 10 dependents

steffenmoritz

imputeTS:Time Series Missing Value Imputation

Imputation (replacement) of missing values in univariate time series. Offers several imputation functions and missing data plots. Available imputation algorithms include: 'Mean', 'LOCF', 'Interpolation', 'Moving Average', 'Seasonal Decomposition', 'Kalman Smoothing on Structural Time Series models', 'Kalman Smoothing on ARIMA models'. Published in Moritz and Bartz-Beielstein (2017) <doi:10.32614/RJ-2017-009>.

Maintained by Steffen Moritz. Last updated 3 years ago.

data-visualization imputation imputation-algorithm imputets missing-data time-series cpp

162 stars 12.18 score 1.9k scripts 27 dependents
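Two of the imputation algorithms listed for 'imputeTS', 'LOCF' (last observation carried forward) and linear 'Interpolation', are simple enough to sketch directly. The Python functions below use `None` for missing values and leave gaps they cannot fill (leading gaps for LOCF, leading and trailing gaps for interpolation) as `None`; that edge-case behaviour is an assumption, not necessarily what 'imputeTS' does, and the names `fill_locf` and `fill_linear` are hypothetical.

```python
def fill_locf(x):
    """Replace each None with the most recent non-missing value."""
    out = list(x)
    last = None
    for i, v in enumerate(out):
        if v is None:
            out[i] = last  # stays None until a first value is seen
        else:
            last = v
    return out


def fill_linear(x):
    """Fill interior gaps by straight-line interpolation between neighbours."""
    out = list(x)
    known = [i for i, v in enumerate(out) if v is not None]
    for a, b in zip(known, known[1:]):
        for i in range(a + 1, b):
            t = (i - a) / (b - a)
            out[i] = out[a] + (out[b] - out[a]) * t
    return out


series = [1.0, None, None, 4.0, None]
print(fill_locf(series))    # [1.0, 1.0, 1.0, 4.0, 4.0]
print(fill_linear(series))  # [1.0, 2.0, 3.0, 4.0, None]
```

LOCF suits series that change in steps, while interpolation suits smoothly varying series; neither uses seasonality, which is what the decomposition and Kalman-based methods in the list add.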

stuart-lab

Signac:Analysis of Single-Cell Chromatin Data

A framework for the analysis and exploration of single-cell chromatin data. The 'Signac' package contains functions for quantifying single-cell chromatin data, computing per-cell quality control metrics, dimension reduction and normalization, visualization, and DNA sequence motif analysis. Reference: Stuart et al. (2021) <doi:10.1038/s41592-021-01282-5>.

Maintained by Tim Stuart. Last updated 7 months ago.

atac bioinformatics single-cell zlib cpp

355 stars 12.18 score 3.7k scripts 1 dependent

bioc

glmGamPoi:Fit a Gamma-Poisson Generalized Linear Model

Fit linear models to overdispersed count data. The package can estimate the overdispersion and fit repeated models for matrix input. It is designed to handle large input datasets as they typically occur in single cell RNA-seq experiments.

Maintained by Constantin Ahlmann-Eltze. Last updated 12 days ago.

regression rnaseq software singlecell gamma-poisson glm negative-binomial-regression on-disk openblas cpp

111 stars 12.16 score 1.0k scripts 4 dependents

r-lib

lobstr:Visualize R Data Structures with Trees

A set of tools for inspecting and understanding R data structures inspired by str(). Includes ast() for visualizing abstract syntax trees, ref() for showing shared references, cst() for showing call stack trees, and obj_size() for computing object sizes.

Maintained by Hadley Wickham. Last updated 1 year ago.

cpp

305 stars 12.15 score 732 scripts 95 dependents

openpharma

mmrm:Mixed Models for Repeated Measures

Mixed models for repeated measures (MMRM) are a popular choice for analyzing longitudinal continuous outcomes in randomized clinical trials and beyond; see Cnaan, Laird and Slasor (1997) <doi:10.1002/(SICI)1097-0258(19971030)16:20%3C2349::AID-SIM667%3E3.0.CO;2-E> for a tutorial and Mallinckrodt, Lane, Schnell, Peng and Mancuso (2008) <doi:10.1177/009286150804200402> for a review. This package implements MMRM based on the marginal linear model without random effects using Template Model Builder ('TMB'), which enables fast and robust model fitting. Users can specify a variety of covariance matrices, weight observations, fit models with restricted or standard maximum likelihood inference, perform hypothesis testing with Satterthwaite or Kenward-Roger adjustment, and extract least-squares means estimates by using 'emmeans'.

Maintained by Daniel Sabanes Bove. Last updated 22 days ago.

cpp

138 stars 12.15 score 113 scripts 4 dependents

isciences

exactextractr:Fast Extraction from Raster Datasets using Polygons

Quickly and accurately summarizes raster values over polygonal areas ("zonal statistics").

Maintained by Daniel Baston. Last updated 8 months ago.

gis raster rcpp geos cpp

286 stars 12.13 score 1.4k scripts 14 dependents

rstudio

shinytest2:Testing for Shiny Applications

Automated unit testing of Shiny applications through a headless 'Chromium' browser.

Maintained by Barret Schloerke. Last updated 3 days ago.

cpp

108 stars 12.13 score 704 scripts 1 dependent

vspinu

timechange:Efficient Manipulation of Date-Times

Efficient routines for manipulation of date-time objects while accounting for time-zones and daylight saving times. The package includes utilities for updating of date-time components (year, month, day etc.), modification of time-zones, rounding of date-times, period addition and subtraction etc. Parts of the 'CCTZ' source code, released under the Apache 2.0 License, are included in this package. See <https://github.com/google/cctz> for more details.

Maintained by Vitalie Spinu. Last updated 1 year ago.

ceiling date-time period rounding time time-zones update cpp

30 stars 12.12 score 68 scripts 1.9k dependents

bioc

SeqArray:Data management of large-scale whole-genome sequence variant calls using GDS files

Data management of large-scale whole-genome sequencing variant calls with thousands of individuals: genotypic data (e.g., SNVs, indels and structural variation calls) and annotations in SeqArray GDS files are stored in an array-oriented and compressed manner, with efficient data access using the R programming language.

Maintained by Xiuwen Zheng. Last updated 6 days ago.

infrastructure datarepresentation sequencing genetics bioinformatics gds-format snp snv wes wgs cpp

45 stars 12.11 score 1.1k scripts 9 dependents

bioc

BiocSingular:Singular Value Decomposition for Bioconductor Packages

Implements exact and approximate methods for singular value decomposition and principal components analysis, in a framework that allows them to be easily switched within Bioconductor packages or workflows. Where possible, parallelization is achieved using the BiocParallel framework.

Maintained by Aaron Lun. Last updated 5 months ago.

software dimensionreduction principalcomponent bioconductor-package human-cell-atlas singular-value-decomposition cpp

7 stars 12.10 score 1.2k scripts 103 dependents

stephens999

ashr:Methods for Adaptive Shrinkage, using Empirical Bayes

The R package 'ashr' implements an Empirical Bayes approach for large-scale hypothesis testing and false discovery rate (FDR) estimation based on the methods proposed in M. Stephens, 2016, "False discovery rates: a new deal", <DOI:10.1093/biostatistics/kxw041>. These methods can be applied whenever two sets of summary statistics---estimated effects and standard errors---are available, just as 'qvalue' can be applied to previously computed p-values. Two main interfaces are provided: ash(), which is more user-friendly; and ash.workhorse(), which has more options and is geared toward advanced users. Both ash() and ash.workhorse() also provide a flexible modeling interface that can accommodate a variety of likelihoods (e.g., normal, Poisson) and mixture priors (e.g., uniform, normal).

Maintained by Peter Carbonetto. Last updated 11 months ago.

cpp

82 stars 12.10 score 780 scripts 15 dependents

bioc

ShortRead:FASTQ input and manipulation

This package implements sampling, iteration, and input of FASTQ files. The package includes functions for filtering and trimming reads, and for generating a quality assessment report. Data are represented as DNAStringSet-derived objects, and easily manipulated for a diversity of purposes. The package also contains legacy support for early single-end, ungapped alignment formats.

Maintained by Bioconductor Package Maintainer. Last updated 5 months ago.

dataimport sequencing qualitycontrol bioconductor-package core-package zlib cpp

8 stars 12.08 score 1.8k scripts 49 dependents

jolars

eulerr:Area-Proportional Euler and Venn Diagrams with Ellipses

Generate area-proportional Euler diagrams using numerical optimization. An Euler diagram is a generalization of a Venn diagram, relaxing the criterion that all interactions need to be represented. Diagrams may be fit with ellipses and circles via a wide range of inputs and can be visualized in numerous ways.

Maintained by Johan Larsson. Last updated 1 year ago.

euler-diagram venn-diagram openblas cpp

131 stars 12.08 score 1.2k scripts 5 dependents

truecluster

ff:Memory-Efficient Storage of Large Data on Disk and Fast Access Functions

The ff package provides data structures that are stored on disk but behave (almost) as if they were in RAM by transparently mapping only a section (pagesize) into main memory; this section is the effective virtual memory consumption per ff object. ff supports R's standard atomic data types 'double', 'logical', 'raw' and 'integer' and non-standard atomic types boolean (1 bit), quad (2 bit unsigned), nibble (4 bit unsigned), byte (1 byte signed with NAs), ubyte (1 byte unsigned), short (2 byte signed with NAs), ushort (2 byte unsigned), single (4 byte float with NAs). For example, 'quad' allows efficient storage of genomic data as an 'A','T','G','C' factor. The unsigned types support 'circular' arithmetic. There is also support for close-to-atomic types 'factor', 'ordered', 'POSIXct', 'Date' and custom close-to-atomic types. ff has native C support for vectors, matrices and arrays with flexible dimorder (column-major order, row-major order and generalizations for arrays). There is also an ffdf class not unlike data.frames, and import/export filters for csv files. ff objects store raw data in binary flat files in native encoding, and complement this with metadata stored in R as physical and virtual attributes. ff objects have well-defined hybrid copying semantics, which give rise to certain performance improvements through virtualization. ff objects can be stored and reopened across R sessions. ff files can be shared by multiple ff R objects (using different data en/de-coding schemes) in the same process or from multiple R processes to exploit parallelism. A wide choice of finalizer options allows working with 'permanent' files as well as creating and removing 'temporary' ff files completely transparently to the user. On certain OS/filesystem combinations, creating the ff files works without notable delay thanks to sparse file allocation.
Several access optimization techniques such as Hybrid Index Preprocessing and Virtualization are implemented to achieve good performance even with large datasets, for example a virtual matrix transpose without touching a single byte on disk. Further, to reduce disk I/O, 'logicals' and non-standard data types are stored natively and compactly in binary flat files, i.e. logicals take up exactly 2 bits to represent TRUE, FALSE and NA. Beyond basic access functions, the ff package also provides compatibility functions that facilitate writing code for ff and RAM objects, and support for batch processing on ff objects (e.g. as.ram, as.ff, ffapply). ff interfaces closely with functionality from package 'bit': chunked looping, fast bit operations and coercions between different objects that can store subscript information ('bit', 'bitwhich', ff 'boolean', ri range index, hi hybrid index). This allows users to work interactively with selections of large datasets and quickly modify selection criteria. Further high-performance enhancements can be made available upon request.

Maintained by Jens Oehlschlägel. Last updated 3 months ago.

cpp

27 stars 12.01 score 764 scripts 71 dependents

bioc

sparseMatrixStats:Summary Statistics for Rows and Columns of Sparse Matrices

High performance functions for row and column operations on sparse matrices. For example: col / rowMeans2, col / rowMedians, col / rowVars etc. Currently, the optimizations are limited to data in the column sparse format. This package is inspired by the matrixStats package by Henrik Bengtsson.

Maintained by Constantin Ahlmann-Eltze. Last updated 5 months ago.

infrastructure software datarepresentation cpp

54 stars 11.98 score 174 scripts 130 dependents

eddelbuettel

RcppAnnoy:'Rcpp' Bindings for 'Annoy', a Library for Approximate Nearest Neighbors

'Annoy' is a small C++ library for Approximate Nearest Neighbors written for efficient memory usage as well as an ability to load from / save to disk. This package provides an R interface by relying on the 'Rcpp' package, exposing the same interface as the original Python wrapper to 'Annoy'. See <https://github.com/spotify/annoy> for more on 'Annoy'. 'Annoy' is released under Version 2.0 of the Apache License. Also included is a small Windows port of 'mmap' which is released under the MIT license.

Maintained by Dirk Eddelbuettel. Last updated 21 days ago.

annoy nearest nearest-neighbors cpp

72 stars 11.97 score 57 scripts 147 dependents

exaexa

scattermore:Scatterplots with More Points

C-based conversion of large scatterplot data to rasters plus other operations such as data blurring or data alpha blending. Speeds up plotting of data with millions of points.

Maintained by Mirek Kratochvil. Last updated 1 year ago.

performance plot scatterplot visualization cpp

244 stars 11.95 score 596 scripts 85 dependents

hadley

pryr:Tools for Computing on the Language

Useful tools to pry back the covers of R and understand the language at a deeper level.

Maintained by Hadley Wickham. Last updated 1 year ago.

cpp

204 stars 11.93 score 1.9k scripts 57 dependents

rspatial

dismo:Species Distribution Modeling

Methods for species distribution modeling, that is, predicting the environmental similarity of any site to that of the locations of known occurrences of a species.

Maintained by Robert J. Hijmans. Last updated 4 months ago.

cpp

25 stars 11.88 score 2.8k scripts 21 dependents

kaneplusplus

bigmemory:Manage Massive Matrices with Shared Memory and Memory-Mapped Files

Create, store, access, and manipulate massive matrices. Matrices are allocated to shared memory and may use memory-mapped files. Packages 'biganalytics', 'bigtabulate', 'synchronicity', and 'bigalgebra' provide advanced functionality.

Maintained by Michael J. Kane. Last updated 1 year ago.

cpp

127 stars 11.87 score 920 scripts 64 dependents

epiforecasts

EpiNow2:Estimate Real-Time Case Counts and Time-Varying Epidemiological Parameters

Estimates the time-varying reproduction number, rate of spread, and doubling time using a range of open-source tools (Abbott et al. (2020) <doi:10.12688/wellcomeopenres.16006.1>), and current best practices (Gostic et al. (2020) <doi:10.1101/2020.06.18.20134858>). It aims to help users avoid some of the limitations of naive implementations in a framework that is informed by community feedback and is actively supported.

Maintained by Sebastian Funk. Last updated 1 month ago.

backcalculation covid-19 gaussian-processes open-source reproduction-number stan cpp

123 stars 11.86 score 210 scripts

jackstat

ModelMetrics:Rapid Calculation of Model Metrics

Collection of metrics for evaluating models written in C++ using 'Rcpp'. Popular metrics include area under the curve, log loss, root mean square error, etc.

Maintained by Tyler Hunt. Last updated 4 years ago.

auc logloss machine-learning metrics model-evaluation model-metrics cpp

29 stars 11.83 score 1.3k scripts 306 dependents

apache

nanoarrow:Interface to the 'nanoarrow' 'C' Library

Provides an 'R' interface to the 'nanoarrow' 'C' library and the 'Apache Arrow' application binary interface. Functions to import and export 'ArrowArray', 'ArrowSchema', and 'ArrowArrayStream' 'C' structures to and from 'R' objects are provided alongside helpers to facilitate zero-copy data transfer among 'R' bindings to libraries implementing the 'Arrow' 'C' data interface.

Maintained by Dewey Dunnington. Last updated 17 hours ago.

cpp

185 stars 11.83 score 37 scripts 27 dependents

bnosac

udpipe:Tokenization, Parts of Speech Tagging, Lemmatization and Dependency Parsing with the 'UDPipe' 'NLP' Toolkit

This natural language processing toolkit provides language-agnostic 'tokenization', 'parts of speech tagging', 'lemmatization' and 'dependency parsing' of raw text. Next to text parsing, the package also allows you to train annotation models based on data of 'treebanks' in 'CoNLL-U' format as provided at <https://universaldependencies.org/format.html>. The techniques are explained in detail in the paper: 'Tokenizing, POS Tagging, Lemmatizing and Parsing UD 2.0 with UDPipe', available at <doi:10.18653/v1/K17-3009>. The toolkit also contains functionality for commonly used data manipulations on texts enriched with the output of the parser, namely functions and algorithms for collocations, token co-occurrence, document-term matrix handling, term frequency-inverse document frequency calculations, information retrieval metrics (Okapi BM25), handling of multi-word expressions, keyword detection (Rapid Automatic Keyword Extraction, noun phrase extraction, syntactical patterns), sentiment scoring and semantic similarity analysis.

Maintained by Jan Wijffels. Last updated 2 years ago.

conll dependency-parser lemmatization natural-language-processing nlp pos-tagging r-pkg rcpp text-mining tokenizer udpipe cpp

215 stars 11.83 score 1.2k scripts 9 dependents

bioc

methylKit:DNA methylation analysis from high-throughput bisulfite sequencing results

methylKit is an R package for DNA methylation analysis and annotation from high-throughput bisulfite sequencing. The package is designed to deal with sequencing data from RRBS and its variants, but also target-capture methods and whole genome bisulfite sequencing. It also has functions to analyze base-pair resolution 5hmC data from experimental protocols such as oxBS-Seq and TAB-Seq. Methylation calling can be performed directly from Bismark aligned BAM files.

Maintained by Altuna Akalin. Last updated 29 days ago.

dnamethylation sequencing methylseq genome-biology methylation statistical-analysis visualization curl bzip2 xz-utils zlib cpp

220 stars 11.80 score 578 scripts 3 dependents

r-lib

archive:Multi-Format Archive and Compression Support

Bindings to 'libarchive' <http://www.libarchive.org>, the multi-format archive and compression library. Offers R connections and direct extraction for many archive formats including 'tar', 'ZIP', '7-zip', 'RAR', 'CAB' and compression formats including 'gzip', 'bzip2', 'compress', 'lzma' and 'xz'.

Maintained by Gábor Csárdi. Last updated 5 days ago.

compression connections libarchive cpp

144 stars 11.80 score 494 scripts 27 dependents

tiledb-inc

tiledb:Modern Database Engine for Complex Data Based on Multi-Dimensional Arrays

The modern database 'TileDB' introduces a powerful on-disk format for storing and accessing any complex data based on multi-dimensional arrays. It supports dense and sparse arrays, dataframes and key-value stores, cloud storage ('S3', 'GCS', 'Azure'), chunked arrays, multiple compression, encryption and checksum filters, uses a fully multi-threaded implementation, and supports parallel I/O, data versioning ('time travel'), metadata and groups. It is implemented as an embeddable cross-platform C++ library with APIs from several languages, and integrations. This package provides the R support.

Maintained by Isaiah Norton. Last updated 1 day ago.

array hdfs s3 storage-manager tiledb cpp

108 stars 11.79 score 306 scripts 4 dependents

kevinushey

sourcetools:Tools for Reading, Tokenizing and Parsing R Code

Tools for Reading, Tokenizing and Parsing R Code.

Maintained by Kevin Ushey. Last updated 2 years ago.

cpp

78 stars 11.77 score 32 scripts 1.8k dependents

r-forge

minqa:Derivative-Free Optimization Algorithms by Quadratic Approximation

Derivative-free optimization by quadratic approximation based on an interface to Fortran implementations by M. J. D. Powell.

Maintained by Katharine M. Mullen. Last updated 3 months ago.

fortran cpp

1 star 11.73 score 227 scripts 1.7k dependents

prioritizr

prioritizr:Systematic Conservation Prioritization in R

Systematic conservation prioritization using mixed integer linear programming (MILP). It provides a flexible interface for building and solving conservation planning problems. Once built, conservation planning problems can be solved using a variety of commercial and open-source exact algorithm solvers. By using exact algorithm solvers, solutions can be generated that are guaranteed to be optimal (or within a pre-specified optimality gap). Furthermore, conservation problems can be constructed to optimize the spatial allocation of different management actions or zones, meaning that conservation practitioners can identify solutions that benefit multiple stakeholders. To solve large-scale or complex conservation planning problems, users should install the Gurobi optimization software (available from <https://www.gurobi.com/>) and the 'gurobi' R package (see Gurobi Installation Guide vignette for details). Users can also install the IBM CPLEX software (<https://www.ibm.com/products/ilog-cplex-optimization-studio/cplex-optimizer>) and the 'cplexAPI' R package (available at <https://github.com/cran/cplexAPI>). Additionally, the 'rcbc' R package (available at <https://github.com/dirkschumacher/rcbc>) can be used to generate solutions using the CBC optimization software (<https://github.com/coin-or/Cbc>). For further details, see Hanson et al. (2025) <doi:10.1111/cobi.14376>.

Maintained by Richard Schuster. Last updated 1 day ago.

biodiversity conservation conservation-planner optimization prioritization solver spatial cpp

124 stars 11.71 score 584 scripts 2 dependents

satijalab

SeuratObject:Data Structures for Single Cell Data

Defines S4 classes for single-cell genomic data and associated information, such as dimensionality reduction embeddings, nearest-neighbor graphs, and spatially-resolved coordinates. Provides data access methods and R-native hooks to ensure the Seurat object is familiar to other R users. See Satija R, Farrell J, Gennert D, et al (2015) <doi:10.1038/nbt.3192>, Macosko E, Basu A, Satija R, et al (2015) <doi:10.1016/j.cell.2015.05.002>, and Stuart T, Butler A, et al (2019) <doi:10.1016/j.cell.2019.05.031> for more details.

Maintained by Paul Hoffman. Last updated 2 years ago.

cpp

25 stars 11.69 score 1.2k scripts 88 dependents

wilkelab

gridtext:Improved Text Rendering Support for 'Grid' Graphics

Provides support for rendering of formatted text using 'grid' graphics. Text can be formatted via a minimal subset of 'Markdown', 'HTML', and inline 'CSS' directives, and it can be rendered both with and without word wrap.

Maintained by Brenton M. Wiernik. Last updated 1 year ago.

cpp

96 stars 11.66 score 344 scripts 208 dependents

twolodzko

extraDistr:Additional Univariate and Multivariate Distributions

Density, distribution function, quantile function and random generation for a number of univariate and multivariate distributions. This package implements the following distributions: Bernoulli, beta-binomial, beta-negative binomial, beta prime, Bhattacharjee, Birnbaum-Saunders, bivariate normal, bivariate Poisson, categorical, Dirichlet, Dirichlet-multinomial, discrete gamma, discrete Laplace, discrete normal, discrete uniform, discrete Weibull, Frechet, gamma-Poisson, generalized extreme value, Gompertz, generalized Pareto, Gumbel, half-Cauchy, half-normal, half-t, Huber density, inverse chi-squared, inverse-gamma, Kumaraswamy, Laplace, location-scale t, logarithmic, Lomax, multivariate hypergeometric, multinomial, negative hypergeometric, non-standard beta, normal mixture, Poisson mixture, Pareto, power, reparametrized beta, Rayleigh, shifted Gompertz, Skellam, slash, triangular, truncated binomial, truncated normal, truncated Poisson, Tukey lambda, Wald, zero-inflated binomial, zero-inflated negative binomial, zero-inflated Poisson.

Maintained by Tymoteusz Wolodzko. Last updated 24 days ago.

c-plus-plus c-plus-plus-11 distribution multivariate-distributions probability random-generation rcpp statistics cpp

53 stars 11.60 score 1.5k scripts 107 dependents

luca-scr

GA:Genetic Algorithms

Flexible general-purpose toolbox implementing genetic algorithms (GAs) for stochastic optimisation. Binary, real-valued, and permutation representations are available to optimize a fitness function, i.e. a function provided by users depending on their objective function. Several genetic operators are available and can be combined to explore the best settings for the current task. Furthermore, users can define new genetic operators and easily evaluate their performance. Local search using general-purpose optimisation algorithms can be applied stochastically to exploit interesting regions. GAs can be run sequentially or in parallel, using an explicit master-slave parallelisation or a coarse-grain islands approach. For more details see Scrucca (2013) <doi:10.18637/jss.v053.i04> and Scrucca (2017) <doi:10.32614/RJ-2017-008>.

Maintained by Luca Scrucca. Last updated 7 months ago.

genetic-algorithm optimisation cpp

93 stars 11.58 score 624 scripts 52 dependents

declaredesign

estimatr:Fast Estimators for Design-Based Inference

Fast procedures for a small set of commonly used, design-appropriate estimators with robust standard errors and confidence intervals. Includes estimators for linear regression, instrumental variables regression, difference-in-means, Horvitz-Thompson estimation, and regression improving precision of experimental estimates by interacting treatment with centered pre-treatment covariates introduced by Lin (2013) <doi:10.1214/12-AOAS583>.

Maintained by Graeme Blair. Last updated 2 months ago.

cpp

133 stars 11.58 score 1.7k scripts 11 dependents

edwinth

padr:Quickly Get Datetime Data Ready for Analysis

Transforms datetime data into a format ready for analysis. It offers two core functionalities: aggregating data to a higher level interval (thicken) and imputing records where observations were absent (pad).

Maintained by Edwin Thoen. Last updated 4 months ago.

cpp

132 stars 11.55 score 428 scripts 20 dependents

functionaldata

fdapace:Functional Data Analysis and Empirical Dynamics

A versatile package that provides implementations of various methods of Functional Data Analysis (FDA) and Empirical Dynamics. The core of this package is Functional Principal Component Analysis (FPCA), a key technique for functional data analysis, for sparsely or densely sampled random trajectories and time courses, via the Principal Analysis by Conditional Estimation (PACE) algorithm. This core algorithm yields covariance and mean functions, eigenfunctions and principal component scores, for both functional data and derivatives, for both dense (functional) and sparse (longitudinal) sampling designs. For sparse designs, it provides fitted continuous trajectories with confidence bands, even for subjects with very few longitudinal observations. PACE is a viable and flexible alternative to random effects modeling of longitudinal data. There is also a Matlab version (PACE) that contains some methods not available in fdapace and vice versa. Updates to fdapace were supported by grants from NIH Echo and NSF DMS-1712864 and DMS-2014626. Please cite our package if you use it (you may run the command citation("fdapace") to get the citation format and bibtex entry). References: Wang, J.L., Chiou, J., Müller, H.G. (2016) <doi:10.1146/annurev-statistics-041715-033624>; Chen, K., Zhang, X., Petersen, A., Müller, H.G. (2017) <doi:10.1007/s12561-015-9137-5>.

Maintained by Yidong Zhou. Last updated 9 months ago.

cpp

31 stars 11.54 score 474 scripts 25 dependents

urbananalyst

dodgr:Distances on Directed Graphs

Distances on dual-weighted directed graphs using priority-queue shortest paths (Padgham (2019) <doi:10.32866/6945>). Weighted directed graphs have weights from A to B which may differ from those from B to A. Dual-weighted directed graphs have two sets of such weights. A canonical example is a street network to be used for routing in which routes are calculated by weighting distances according to the type of way and mode of transport, yet lengths of routes must be calculated from direct distances.

Maintained by Mark Padgham. Last updated 19 hours ago.

distance openstreetmap router shortest-paths street-networks cpp

129 stars 11.52 score 229 scripts 4 dependents

tylermorganwall

rayshader:Create Maps and Visualize Data in 2D and 3D

Uses a combination of raytracing and multiple hill shading methods to produce 2D and 3D data visualizations and maps. Includes water detection and layering functions, programmable color palette generation, several built-in textures for hill shading, 2D and 3D plotting options, a built-in path tracer, 'Wavefront' OBJ file export, and the ability to save 3D visualizations to a 3D printable format.

Maintained by Tyler Morgan-Wall. Last updated 2 months ago.

cpp

2.1k stars 11.51 score 1.5k scripts 5 dependents

wenjie2wang

splines2:Regression Spline Functions and Classes

Constructs basis functions of B-splines, M-splines, I-splines, convex splines (C-splines), periodic splines, natural cubic splines, generalized Bernstein polynomials, their derivatives, and integrals (except C-splines) by closed-form recursive formulas. It also contains a C++ header-only library integrated with Rcpp. See Wang and Yan (2021) <doi:10.6339/21-JDS1020> for details.

Maintained by Wenjie Wang. Last updated 24 days ago.

derivative integral rcpp splines openblas cpp

43 stars 11.46 score 394 scripts 34 dependents

bioc

msa:Multiple Sequence Alignment

The 'msa' package provides a unified R/Bioconductor interface to the multiple sequence alignment algorithms ClustalW, ClustalOmega, and Muscle. All three algorithms are integrated in the package, therefore, they do not depend on any external software tools and are available for all major platforms. The multiple sequence alignment algorithms are complemented by a function for pretty-printing multiple sequence alignments using the LaTeX package TeXshade.

Maintained by Ulrich Bodenhofer. Last updated 1 month ago.

multiplesequencealignment alignment multiplecomparison sequencing cpp

17 stars 11.46 score 744 scripts 6 dependents

privefl

bigsnpr:Analysis of Massive SNP Arrays

Easy-to-use, efficient, flexible and scalable tools for analyzing massive SNP arrays. Privé et al. (2018) <doi:10.1093/bioinformatics/bty185>.

Maintained by Florian Privé. Last updated 23 days ago.

big-data bioinformatics memory-mapped-file parallel-computing polygenic-scores population-structure-inference snp-data statistical-methods openblas zlib cpp openmp

200 stars 11.44 score 1.5k scripts 3 dependents

eddelbuettel

RProtoBuf:R Interface to the 'Protocol Buffers' 'API' (Version 2 or 3)

Protocol Buffers are a way of encoding structured data in an efficient yet extensible format. Google uses Protocol Buffers for almost all of its internal 'RPC' protocols and file formats. Additional documentation is available in two included vignettes, one of which corresponds to our 'JSS' paper (2016, <doi:10.18637/jss.v071.i02>). A sufficiently recent version of the 'Protocol Buffers' library is required; currently version 3.3.0 from 2017 is the stated minimum.

Maintained by Dirk Eddelbuettel. Last updated 13 days ago.

c-plus-plus protocol-buffers protobuf cpp

73 stars 11.44 score 126 scripts 21 dependents

bioc

destiny:Creates diffusion maps

Create and plot diffusion maps.

Maintained by Philipp Angerer. Last updated 4 months ago.

cellbiology cellbasedassays clustering software visualization diffusion-maps dimensionality-reduction cpp

82 stars 11.44 score 792 scripts 1 dependent

r-simmer

simmer:Discrete-Event Simulation for R

A process-oriented and trajectory-based Discrete-Event Simulation (DES) package for R. It is designed as a generic yet powerful framework. The architecture encloses a robust and fast simulation core written in 'C++' with automatic monitoring capabilities. It provides a rich and flexible R API that revolves around the concept of trajectory, a common path in the simulation model for entities of the same type. Documentation about 'simmer' is provided by several vignettes included in this package, via the paper by Ucar, Smeets & Azcorra (2019, <doi:10.18637/jss.v090.i02>), and the paper by Ucar, Hernández, Serrano & Azcorra (2018, <doi:10.1109/MCOM.2018.1700960>); see 'citation("simmer")' for details.

Maintained by Iñaki Ucar. Last updated 6 months ago.

discrete-event simulation cpp

223 stars 11.43 score 440 scripts 6 dependents

sachaepskamp

qgraph:Graph Plotting Methods, Psychometric Data Visualization and Graphical Model Estimation

Fork of qgraph - Weighted network visualization and analysis, as well as Gaussian graphical model computation. See Epskamp et al. (2012) <doi:10.18637/jss.v048.i04>.

Maintained by Sacha Epskamp. Last updated 1 year ago.

cpp

69 stars 11.43 score 1.2k scripts 63 dependents

epimodel

EpiModel:Mathematical Modeling of Infectious Disease Dynamics

Tools for simulating mathematical models of infectious disease dynamics. Epidemic model classes include deterministic compartmental models, stochastic individual-contact models, and stochastic network models. Network models use the robust statistical methods of exponential-family random graph models (ERGMs) from the Statnet suite of software packages in R. Standard templates for epidemic modeling include SI, SIR, and SIS disease types. EpiModel features an API for extending these templates to address novel scientific research aims. Full methods for EpiModel are detailed in Jenness et al. (2018, <doi:10.18637/jss.v084.i08>).

Maintained by Samuel Jenness. Last updated 2 months ago.

agent-based-modeling epidemics epidemiology infectious-diseases network-graph cpp

250 stars 11.43 score 315 scripts