R-universe search: knowledge extraction

leifeld

texreg:Conversion of R Regression Output to LaTeX or HTML Tables

Converts coefficients, standard errors, significance stars, and goodness-of-fit statistics of statistical models into LaTeX tables or HTML tables/MS Word documents or to nicely formatted screen output for the R console for easy model comparison. A list of several models can be combined in a single table. The output is highly customizable. New model types can be easily implemented. Details can be found in Leifeld (2013), JStatSoft <doi:10.18637/jss.v055.i08>.)

Maintained by Philip Leifeld. Last updated 2 months ago.

html-tables latex latex-tables regression reporting table texreg

220.2 match 113 stars 14.09 score 1.8k scripts 67 dependents

cran

nlme:Linear and Nonlinear Mixed Effects Models

Fit and compare Gaussian linear and nonlinear mixed-effects models.

Maintained by R Core Team. Last updated 2 months ago.

fortran

105.0 match 6 stars 13.00 score 13k scripts 8.7k dependents

skoval

RISmed:Download Content from NCBI Databases

A set of tools to extract bibliographic content from the National Center for Biotechnology Information (NCBI) databases, including PubMed. The name RISmed is a portmanteau of RIS (for Research Information Systems, a common tag format for bibliographic data) and PubMed.

Maintained by Stephanie Kovalchik. Last updated 3 years ago.

155.7 match 38 stars 6.94 score 252 scripts 3 dependents

bjw34032

oro.nifti:Rigorous - 'NIfTI' + 'ANALYZE' + 'AFNI' : Input / Output

Functions for the input/output and visualization of medical imaging data that follow either the 'ANALYZE', 'NIfTI' or 'AFNI' formats. This package is part of the Rigorous Analytics bundle.

Maintained by Brandon Whitcher. Last updated 7 months ago.

afni analyze nifti

136.6 match 2 stars 7.70 score 764 scripts 38 dependents

dankelley

oce:Analysis of Oceanographic Data

Supports the analysis of Oceanographic data, including 'ADCP' measurements, measurements made with 'argo' floats, 'CTD' measurements, sectional data, sea-level time series, coastline and topographic data, etc. Provides specialized functions for calculating seawater properties such as potential temperature in either the 'UNESCO' or 'TEOS-10' equation of state. Produces graphical displays that conform to the conventions of the Oceanographic literature. This package is discussed extensively by Kelley (2018) "Oceanographic Analysis with R" <doi:10.1007/978-1-4939-8844-0>.

Maintained by Dan Kelley. Last updated 3 days ago.

oceanography fortran cpp

55.8 match 146 stars 15.42 score 4.2k scripts 18 dependents

trinker

qdapRegex:Regular Expression Removal, Extraction, and Replacement Tools

A collection of regular expression tools associated with the 'qdap' package that may be useful outside of the context of discourse analysis. Tools include removal/extraction/replacement of abbreviations, dates, dollar amounts, email addresses, hash tags, numbers, percentages, citations, person tags, phone numbers, times, and zip codes.

Maintained by Tyler Rinker. Last updated 1 years ago.

qdapregex regular-expression

69.2 match 50 stars 9.48 score 502 scripts 41 dependents

ipums

ipumsr:An R Interface for Downloading, Reading, and Handling IPUMS Data

An easy way to work with census, survey, and geographic data provided by IPUMS in R. Generate and download data through the IPUMS API and load IPUMS files into R with their associated metadata to make analysis easier. IPUMS data describing 1.4 billion individuals drawn from over 750 censuses and surveys is available free of charge from the IPUMS website <https://www.ipums.org>.

Maintained by Derek Burk. Last updated 21 days ago.

53.8 match 28 stars 11.07 score 720 scripts 2 dependents

lrberge

fixest:Fast Fixed-Effects Estimations

Fast and user-friendly estimation of econometric models with multiple fixed-effects. Includes ordinary least squares (OLS), generalized linear models (GLM) and the negative binomial. The core of the package is based on optimized parallel C++ code, scaling especially well for large data sets. The method to obtain the fixed-effects coefficients is based on Berge (2018) <https://github.com/lrberge/fixest/blob/master/_DOCS/FENmlm_paper.pdf>. Further provides tools to export and view the results of several estimations with intuitive design to cluster the standard-errors.

Maintained by Laurent Berge. Last updated 7 months ago.

cpp openmp

38.7 match 387 stars 14.69 score 3.8k scripts 25 dependents

spatstat

spatstat.geom:Geometrical Functionality of the 'spatstat' Family

Defines spatial data types and supports geometrical operations on them. Data types include point patterns, windows (domains), pixel images, line segment patterns, tessellations and hyperframes. Capabilities include creation and manipulation of data (using command line or graphical interaction), plotting, geometrical operations (rotation, shift, rescale, affine transformation), convex hull, discretisation and pixellation, Dirichlet tessellation, Delaunay triangulation, pairwise distances, nearest-neighbour distances, distance transform, morphological operations (erosion, dilation, closing, opening), quadrat counting, geometrical measurement, geometrical covariance, colour maps, calculus on spatial domains, Gaussian blur, level sets of images, transects of images, intersections between objects, minimum distance matching. (Excludes spatial data on a network, which are supported by the package 'spatstat.linnet'.)

Maintained by Adrian Baddeley. Last updated 10 hours ago.

classes-and-objects distance-calculation geometry geometry-processing images mensuration plotting point-patterns spatial-data spatial-data-analysis

45.6 match 7 stars 12.10 score 241 scripts 227 dependents

tidyverts

fable:Forecasting Models for Tidy Time Series

Provides a collection of commonly used univariate and multivariate time series forecasting models including automatically selected exponential smoothing (ETS) and autoregressive integrated moving average (ARIMA) models. These models work within the 'fable' framework provided by the 'fabletools' package, which provides the tools to evaluate, visualise, and combine models in a workflow consistent with the tidyverse.

Maintained by Mitchell OHara-Wild. Last updated 4 months ago.

forecasting cpp

38.6 match 565 stars 13.52 score 2.1k scripts 6 dependents

paul-buerkner

brms:Bayesian Regression Models using 'Stan'

Fit Bayesian generalized (non-)linear multivariate multilevel models using 'Stan' for full Bayesian inference. A wide range of distributions and link functions are supported, allowing users to fit -- among others -- linear, robust linear, count data, survival, response times, ordinal, zero-inflated, hurdle, and even self-defined mixture models all in a multilevel context. Further modeling options include both theory-driven and data-driven non-linear terms, auto-correlation structures, censoring and truncation, meta-analytic standard errors, and quite a few more. In addition, all parameters of the response distribution can be predicted in order to perform distributional regression. Prior specifications are flexible and explicitly encourage users to apply prior distributions that actually reflect their prior knowledge. Models can easily be evaluated and compared using several methods assessing posterior or prior predictions. References: Bürkner (2017) <doi:10.18637/jss.v080.i01>; Bürkner (2018) <doi:10.32614/RJ-2018-017>; Bürkner (2021) <doi:10.18637/jss.v100.i05>; Carpenter et al. (2017) <doi:10.18637/jss.v076.i01>.

Maintained by Paul-Christian Bürkner. Last updated 4 days ago.

bayesian-inference brms multilevel-models stan statistical-models

28.6 match 1.3k stars 16.61 score 13k scripts 34 dependents

ropensci

tidyhydat:Extract and Tidy Canadian 'Hydrometric' Data

Provides functions to access historical and real-time national 'hydrometric' data from Water Survey of Canada data sources (<https://dd.weather.gc.ca/hydrometric/csv/> and <https://collaboration.cmc.ec.gc.ca/cmc/hydrometrics/www/>) and then applies tidy data principles.

Maintained by Sam Albers. Last updated 7 days ago.

citz government-data hydrology hydrometrics tidy-data water-resources

46.1 match 71 stars 9.59 score 202 scripts 3 dependents

bioc

xcms:LC-MS and GC-MS Data Analysis

Framework for processing and visualization of chromatographically separated and single-spectra mass spectral data. Imports from AIA/ANDI NetCDF, mzXML, mzData and mzML files. Preprocesses data for high-throughput, untargeted analyte profiling.

Maintained by Steffen Neumann. Last updated 4 days ago.

immunooncology massspectrometry metabolomics bioconductor feature-detection mass-spectrometry peak-detection cpp

30.5 match 196 stars 14.31 score 984 scripts 11 dependents

spatstat

spatstat.model:Parametric Statistical Modelling and Inference for the 'spatstat' Family

Functionality for parametric statistical modelling and inference for spatial data, mainly spatial point patterns, in the 'spatstat' family of packages. (Excludes analysis of spatial data on a linear network, which is covered by the separate package 'spatstat.linnet'.) Supports parametric modelling, formal statistical inference, and model validation. Parametric models include Poisson point processes, Cox point processes, Neyman-Scott cluster processes, Gibbs point processes and determinantal point processes. Models can be fitted to data using maximum likelihood, maximum pseudolikelihood, maximum composite likelihood and the method of minimum contrast. Fitted models can be simulated and predicted. Formal inference includes hypothesis tests (quadrat counting tests, Cressie-Read tests, Clark-Evans test, Berman test, Diggle-Cressie-Loosmore-Ford test, scan test, studentised permutation test, segregation test, ANOVA tests of fitted models, adjusted composite likelihood ratio test, envelope tests, Dao-Genton test, balanced independent two-stage test), confidence intervals for parameters, and prediction intervals for point counts. Model validation techniques include leverage, influence, partial residuals, added variable plots, diagnostic plots, pseudoscore residual plots, model compensators and Q-Q plots.

Maintained by Adrian Baddeley. Last updated 9 days ago.

analysis-of-variance cluster-process confidence-intervals cox-process determinantal-point-processes gibbs-process influence leverage model-diagnostics neyman-scott parameter-estimation poisson-process spatial-analysis spatial-modelling spatial-point-processes statistical-inference

46.3 match 5 stars 9.09 score 6 scripts 46 dependents

muschellij2

fslr:Wrapper Functions for 'FSL' ('FMRIB' Software Library) from Functional MRI of the Brain ('FMRIB')

Wrapper functions that interface with 'FSL' <http://fsl.fmrib.ox.ac.uk/fsl/fslwiki/>, a powerful and commonly-used 'neuroimaging' software, using system commands. The goal is to be able to interface with 'FSL' completely in R, where you pass R objects of class 'nifti', implemented by package 'oro.nifti', and the function executes an 'FSL' command and returns an R object of class 'nifti' if desired.

Maintained by John Muschelli. Last updated 1 months ago.

fsl fslr neuroimaging neuroimaging-analysis neuroimaging-data-science

52.1 match 41 stars 8.01 score 420 scripts

tagteam

riskRegression:Risk Regression Models and Prediction Scores for Survival Analysis with Competing Risks

Implementation of the following methods for event history analysis. Risk regression models for survival endpoints also in the presence of competing risks are fitted using binomial regression based on a time sequence of binary event status variables. A formula interface for the Fine-Gray regression model and an interface for the combination of cause-specific Cox regression models. A toolbox for assessing and comparing performance of risk predictions (risk markers and risk prediction models). Prediction performance is measured by the Brier score and the area under the ROC curve for binary possibly time-dependent outcome. Inverse probability of censoring weighting and pseudo values are used to deal with right censored data. Lists of risk markers and lists of risk models are assessed simultaneously. Cross-validation repeatedly splits the data, trains the risk prediction models on one part of each split and then summarizes and compares the performance across splits.

Maintained by Thomas Alexander Gerds. Last updated 19 days ago.

openblas cpp

29.1 match 46 stars 13.00 score 736 scripts 35 dependents

gavinsimpson

gratia:Graceful 'ggplot'-Based Graphics and Other Functions for GAMs Fitted Using 'mgcv'

Graceful 'ggplot'-based graphics and utility functions for working with generalized additive models (GAMs) fitted using the 'mgcv' package. Provides a reimplementation of the plot() method for GAMs that 'mgcv' provides, as well as 'tidyverse' compatible representations of estimated smooths.

Maintained by Gavin L. Simpson. Last updated 2 days ago.

distributional-regression gam gamm generalized-additive-mixed-models generalized-additive-models ggplot2 glm lm mgcv penalized-spline random-effects smoothing splines

29.1 match 217 stars 12.99 score 1.6k scripts 2 dependents

zpneal

backbone:Extracts the Backbone from Graphs

An implementation of methods for extracting an unweighted unipartite graph (i.e. a backbone) from an unweighted unipartite graph, a weighted unipartite graph, the projection of an unweighted bipartite graph, or the projection of a weighted bipartite graph (Neal, 2022 <doi:10.1371/journal.pone.0269137>).

Maintained by Zachary Neal. Last updated 1 years ago.

cpp

49.6 match 41 stars 7.06 score 31 scripts 2 dependents

tidymodels

recipes:Preprocessing and Feature Engineering Steps for Modeling

A recipe prepares your data for modeling. We provide an extensible framework for pipeable sequences of feature engineering steps provides preprocessing tools to be applied to data. Statistical parameters for the steps can be estimated from an initial data set and then applied to other data sets. The resulting processed output can then be used as inputs for statistical or machine learning models.

Maintained by Max Kuhn. Last updated 8 days ago.

18.4 match 584 stars 18.71 score 7.2k scripts 380 dependents

ikosmidis

enrichwith:Methods to Enrich R Objects with Extra Components

Provides the "enrich" method to enrich list-like R objects with new, relevant components. The current version has methods for enriching objects of class 'family', 'link-glm', 'lm', 'glm' and 'betareg'. The resulting objects preserve their class, so all methods associated with them still apply. The package also provides the 'enriched_glm' function that has the same interface as 'glm' but results in objects of class 'enriched_glm'. In addition to the usual components in a `glm` object, 'enriched_glm' objects carry an object-specific simulate method and functions to compute the scores, the observed and expected information matrix, the first-order bias, as well as model densities, probabilities, and quantiles at arbitrary parameter values. The package can also be used to produce customizable source code templates for the structured implementation of methods to compute new components and enrich arbitrary objects.

Maintained by Ioannis Kosmidis. Last updated 5 years ago.

infrastructure

46.7 match 6 stars 7.35 score 16 scripts 12 dependents

mjskay

tidybayes:Tidy Data and 'Geoms' for Bayesian Models

Compose data for and extract, manipulate, and visualize posterior draws from Bayesian models ('JAGS', 'Stan', 'rstanarm', 'brms', 'MCMCglmm', 'coda', ...) in a tidy data format. Functions are provided to help extract tidy data frames of draws from Bayesian models and that generate point summaries and intervals in a tidy format. In addition, 'ggplot2' 'geoms' and 'stats' are provided for common visualization primitives like points with multiple uncertainty intervals, eye plots (intervals plus densities), and fit curves with multiple, arbitrary uncertainty bands.

Maintained by Matthew Kay. Last updated 6 months ago.

bayesian-data-analysis brms ggplot2 jags stan tidy-data visualization

22.6 match 733 stars 14.72 score 7.3k scripts 20 dependents

biodiverse

spOccupancy:Single-Species, Multi-Species, and Integrated Spatial Occupancy Models

Fits single-species, multi-species, and integrated non-spatial and spatial occupancy models using Markov Chain Monte Carlo (MCMC). Models are fit using Polya-Gamma data augmentation detailed in Polson, Scott, and Windle (2013) <doi:10.1080/01621459.2013.829001>. Spatial models are fit using either Gaussian processes or Nearest Neighbor Gaussian Processes (NNGP) for large spatial datasets. Details on NNGP models are given in Datta, Banerjee, Finley, and Gelfand (2016) <doi:10.1080/01621459.2015.1044091> and Finley, Datta, and Banerjee (2022) <doi:10.18637/jss.v103.i05>. Provides functionality for data integration of multiple single-species occupancy data sets using a joint likelihood framework. Details on data integration are given in Miller, Pacifici, Sanderlin, and Reich (2019) <doi:10.1111/2041-210X.13110>. Details on single-species and multi-species models are found in MacKenzie, Nichols, Lachman, Droege, Royle, and Langtimm (2002) <doi:10.1890/0012-9658(2002)083[2248:ESORWD]2.0.CO;2> and Dorazio and Royle <doi:10.1198/016214505000000015>, respectively.

Maintained by Jeffrey Doser. Last updated 24 days ago.

openblas cpp openmp

45.3 match 59 stars 7.31 score 204 scripts

gtatters

Thermimage:Thermal Image Analysis

A collection of functions and routines for inputting thermal image video files, plotting and converting binary raw data into estimates of temperature. First published 2015-03-26. Written primarily for research purposes in biological applications of thermal images. v1 included the base calculations for converting thermal image binary values to temperatures. v2 included additional equations for providing heat transfer calculations and an import function for thermal image files (v2.2.3 fixed error importing thermal image to windows OS). v3. Added numerous functions for converting thermal image, videos, rewriting and exporting. v3.1. Added new functions to convert files. v3.2. Fixed the various functions related to finding frame times. v4.0. fixed an error in atmospheric attenuation constants, affecting raw2temp and temp2raw functions. Recommend update for use with long distance calculations. v.4.1.3. changed to frameLocates to reflect change to as.character() to format().

Maintained by Glenn J. Tattersall. Last updated 3 years ago.

animal-physiology heat-exchange heat-flux image-frames temperature thermal-biology thermal-images

56.5 match 169 stars 5.85 score 83 scripts

wkumler

RaMS:R Access to Mass-Spec Data

R-based access to mass-spectrometry (MS) data. While many packages exist to process MS data, many of these make it difficult to access the underlying mass-to-charge ratio (m/z), intensity, and retention time of the files themselves. This package is designed to format MS data in a tidy fashion and allows the user perform the plotting and analysis.

Maintained by William Kumler. Last updated 5 months ago.

mass-spectrometry-data tidy-data

35.2 match 24 stars 8.78 score 84 scripts 5 dependents

joshwlambert

DAISIEprep:Extracts Phylogenetic Island Community Data from Phylogenetic Trees

Extracts colonisation and branching times of island species to be used for analysis in the R package 'DAISIE'. It uses phylogenetic and endemicity data to extract the separate island colonists and store them.

Maintained by Joshua W. Lambert. Last updated 1 months ago.

data-science island-biogeography phylogenetics

44.9 match 6 stars 6.78 score 24 scripts

michaelhallquist

MplusAutomation:An R Package for Facilitating Large-Scale Latent Variable Analyses in Mplus

Leverages the R language to automate latent variable model estimation and interpretation using 'Mplus', a powerful latent variable modeling program developed by Muthen and Muthen (<https://www.statmodel.com>). Specifically, this package provides routines for creating related groups of models, running batches of models, and extracting and tabulating model parameters and fit statistics.

Maintained by Michael Hallquist. Last updated 2 months ago.

23.2 match 86 stars 12.96 score 664 scripts 13 dependents

pecanproject

PEcAn.data.atmosphere:PEcAn Functions Used for Managing Climate Driver Data

The Predictive Ecosystem Carbon Analyzer (PEcAn) is a scientific workflow management tool that is designed to simplify the management of model parameterization, execution, and analysis. The PECAn.data.atmosphere package converts climate driver data into a standard format for models integrated into PEcAn. As a standalone package, it provides an interface to access diverse climate data sets.

Maintained by David LeBauer. Last updated 4 days ago.

bayesian cyberinfrastructure data-assimilation data-science ecosystem-model ecosystem-science forecasting meta-analysis national-science-foundation pecan plants

25.6 match 216 stars 11.59 score 64 scripts 14 dependents

philchalmers

mirt:Multidimensional Item Response Theory

Analysis of discrete response data using unidimensional and multidimensional item analysis models under the Item Response Theory paradigm (Chalmers (2012) <doi:10.18637/jss.v048.i06>). Exploratory and confirmatory item factor analysis models are estimated with quadrature (EM) or stochastic (MHRM) methods. Confirmatory bi-factor and two-tier models are available for modeling item testlets using dimension reduction EM algorithms, while multiple group analyses and mixed effects designs are included for detecting differential item, bundle, and test functioning, and for modeling item and person covariates. Finally, latent class models such as the DINA, DINO, multidimensional latent class, mixture IRT models, and zero-inflated response models are supported, as well as a wide family of probabilistic unfolding models.

Maintained by Phil Chalmers. Last updated 13 days ago.

irt mirt openblas cpp openmp

19.5 match 210 stars 14.98 score 2.5k scripts 40 dependents

stan-dev

cmdstanr:R Interface to 'CmdStan'

A lightweight interface to 'Stan' <https://mc-stan.org>. The 'CmdStanR' interface is an alternative to 'RStan' that calls the command line interface for compilation and running algorithms instead of interfacing with C++ via 'Rcpp'. This has many benefits including always being compatible with the latest version of Stan, fewer installation errors, fewer unexpected crashes in RStudio, and a more permissive license.

Maintained by Andrew Johnson. Last updated 10 months ago.

bayes bayesian markov-chain-monte-carlo maximum-likelihood mcmc stan variational-inference

23.6 match 145 stars 12.27 score 5.2k scripts 9 dependents

ohdsi

OhdsiReportGenerator:Observational Health Data Sciences and Informatics Report Generator

Extract results into R from the Observational Health Data Sciences and Informatics result database (see <https://ohdsi.github.io/Strategus/results-schema/index.html>) and generate reports/presentations via 'quarto' that summarize results in HTML format. Learn more about 'OhdsiReportGenerator' at <https://ohdsi.github.io/OhdsiReportGenerator/>.

Maintained by Jenna Reps. Last updated 20 days ago.

openjdk

62.6 match 4.54 score 2 scripts

neotomadb

neotoma2:Working with the Neotoma Paleoecology Database

Access and manipulation of data using the Neotoma Paleoecology Database. <https://api.neotomadb.org/api-docs/>.

Maintained by Dominguez Vidana Socorro. Last updated 8 months ago.

earthcube neotoma nsf paleoecology

52.3 match 8 stars 5.35 score 56 scripts

pln-team

PLNmodels:Poisson Lognormal Models

The Poisson-lognormal model and variants (Chiquet, Mariadassou and Robin, 2021 <doi:10.3389/fevo.2021.588292>) can be used for a variety of multivariate problems when count data are at play, including principal component analysis for count data, discriminant analysis, model-based clustering and network inference. Implements variational algorithms to fit such models accompanied with a set of functions for visualization and diagnostic.

Maintained by Julien Chiquet. Last updated 6 days ago.

count-data multivariate-analysis network-inference pca poisson-lognormal-model openblas cpp

28.6 match 56 stars 9.50 score 226 scripts

shixiangwang

sigminer:Extract, Analyze and Visualize Mutational Signatures for Genomic Variations

Genomic alterations including single nucleotide substitution, copy number alteration, etc. are the major force for cancer initialization and development. Due to the specificity of molecular lesions caused by genomic alterations, we can generate characteristic alteration spectra, called 'signature' (Wang, Shixiang, et al. (2021) <DOI:10.1371/journal.pgen.1009557> & Alexandrov, Ludmil B., et al. (2020) <DOI:10.1038/s41586-020-1943-3> & Steele Christopher D., et al. (2022) <DOI:10.1038/s41586-022-04738-6>). This package helps users to extract, analyze and visualize signatures from genomic alteration records, thus providing new insight into cancer study.

Maintained by Shixiang Wang. Last updated 5 months ago.

bayesian-nmf bioinformatics cancer-research cnv copynumber-signatures cosmic-signatures dbs easy-to-use indel mutational-signatures nmf nmf-extraction sbs signature-extraction somatic-mutations somatic-variants visualization cpp

27.7 match 150 stars 9.48 score 123 scripts 2 dependents

kkholst

lava:Latent Variable Models

A general implementation of Structural Equation Models with latent variables (MLE, 2SLS, and composite likelihood estimators) with both continuous, censored, and ordinal outcomes (Holst and Budtz-Joergensen (2013) <doi:10.1007/s00180-012-0344-y>). Mixture latent variable models and non-linear latent variable models (Holst and Budtz-Joergensen (2020) <doi:10.1093/biostatistics/kxy082>). The package also provides methods for graph exploration (d-separation, back-door criterion), simulation of general non-linear latent variable models, and estimation of influence functions for a broad range of statistical models.

Maintained by Klaus K. Holst. Last updated 2 months ago.

latent-variable-models simulation statistics structural-equation-models

20.4 match 33 stars 12.85 score 610 scripts 476 dependents

eitsupi

neopolars:R Bindings for the 'polars' Rust Library

Lightning-fast 'DataFrame' library written in 'Rust'. Convert R data to 'Polars' data and vice versa. Perform fast, lazy, larger-than-memory and optimized data queries. 'Polars' is interoperable with the package 'arrow', as both are based on the 'Apache Arrow' Columnar Format.

Maintained by Tatsuya Shima. Last updated 18 hours ago.

rust cargo

53.0 match 40 stars 4.87 score 1 scripts

bioc

maftools:Summarize, Analyze and Visualize MAF Files

Analyze and visualize Mutation Annotation Format (MAF) files from large scale sequencing studies. This package provides various functions to perform most commonly used analyses in cancer genomics and to create feature rich customizable visualzations with minimal effort.

Maintained by Anand Mayakonda. Last updated 5 months ago.

datarepresentation dnaseq visualization drivermutation variantannotation featureextraction classification somaticmutation sequencing functionalgenomics survival bioinformatics cancer-genome-atlas cancer-genomics genomics maf-files tcga curl bzip2 xz-utils zlib

17.5 match 459 stars 14.63 score 948 scripts 18 dependents

rorynolan

strex:Extra String Manipulation Functions

There are some things that I wish were easier with the 'stringr' or 'stringi' packages. The foremost of these is the extraction of numbers from strings. 'stringr' and 'stringi' make you figure out the regular expression for yourself; 'strex' takes care of this for you. There are many other handy functionalities in 'strex'. Contributions to this package are encouraged; it is intended as a miscellany of string manipulation functions that cannot be found in 'stringi' or 'stringr'.

Maintained by Rory Nolan. Last updated 6 months ago.

23.9 match 41 stars 10.59 score 1.2k scripts 18 dependents

stewid

SimInf:A Framework for Data-Driven Stochastic Disease Spread Simulations

Provides an efficient and very flexible framework to conduct data-driven epidemiological modeling in realistic large scale disease spread simulations. The framework integrates infection dynamics in subpopulations as continuous-time Markov chains using the Gillespie stochastic simulation algorithm and incorporates available data such as births, deaths and movements as scheduled events at predefined time-points. Using C code for the numerical solvers and 'OpenMP' (if available) to divide work over multiple processors ensures high performance when simulating a sample outcome. One of our design goals was to make the package extendable and enable usage of the numerical solvers from other R extension packages in order to facilitate complex epidemiological research. The package contains template models and can be extended with user-defined models. For more details see the paper by Widgren, Bauer, Eriksson and Engblom (2019) <doi:10.18637/jss.v091.i12>. The package also provides functionality to fit models to time series data using the Approximate Bayesian Computation Sequential Monte Carlo ('ABC-SMC') algorithm of Toni and others (2009) <doi:10.1098/rsif.2008.0172>.

Maintained by Stefan Widgren. Last updated 6 days ago.

data-driven epidemiology high-performance-computing markov-chain mathematical-modelling gsl openmp

24.9 match 35 stars 10.09 score 227 scripts

bmaitner

BIEN:Tools for Accessing the Botanical Information and Ecology Network Database

Provides Tools for Accessing the Botanical Information and Ecology Network Database. The BIEN database contains cleaned and standardized botanical data including occurrence, trait, plot and taxonomic data (See <https://bien.nceas.ucsb.edu/bien/> for more Information). This package provides functions that query the BIEN database by constructing and executing optimized SQL queries.

Maintained by Brian Maitner. Last updated 2 months ago.

41.2 match 6.04 score 205 scripts 5 dependents

bioc

GenomicFeatures:Query the gene models of a given organism/assembly

Extract the genomic locations of genes, transcripts, exons, introns, and CDS, for the gene models stored in a TxDb object. A TxDb object is a small database that contains the gene models of a given organism/assembly. Bioconductor provides a small collection of TxDb objects in the form of ready-to-install TxDb packages for the most commonly studied organisms. Additionally, the user can easily make a TxDb object (or package) for the organism/assembly of their choice by using the tools from the txdbmaker package.

Maintained by H. Pagès. Last updated 4 months ago.

genetics infrastructure annotation sequencing genomeannotation bioconductor-package core-package

16.1 match 26 stars 15.34 score 5.3k scripts 339 dependents

bioc

MicrobiotaProcess:A comprehensive R package for managing and analyzing microbiome and other ecological data within the tidy framework

MicrobiotaProcess is an R package for analysis, visualization and biomarker discovery of microbial datasets. It introduces MPSE class, this make it more interoperable with the existing computing ecosystem. Moreover, it introduces a tidy microbiome data structure paradigm and analysis grammar. It provides a wide variety of microbiome data analysis procedures under the unified and common framework (tidy-like framework).

Maintained by Shuangbin Xu. Last updated 5 months ago.

visualization microbiome software multiplecomparison featureextraction microbiome-analysis microbiome-data

24.4 match 183 stars 9.70 score 126 scripts 1 dependents

rpolars

polars:Lightning-Fast 'DataFrame' Library

Lightning-fast 'DataFrame' library written in 'Rust'. Convert R data to 'Polars' data and vice versa. Perform fast, lazy, larger-than-memory and optimized data queries. 'Polars' is interoperable with the package 'arrow', as both are based on the 'Apache Arrow' Columnar Format.

Maintained by Soren Welling. Last updated 5 days ago.

arrow polars rust

19.7 match 499 stars 12.01 score 1.0k scripts 2 dependents

insightsengineering

teal.transform:Functions for Extracting and Merging Data in the 'teal' Framework

A standardized user interface for column selection, that facilitates dataset merging in 'teal' framework.

Maintained by Dawid Kaledkowski. Last updated 1 months ago.

merge modules nest transform

27.7 match 3 stars 8.39 score 9 scripts 4 dependents

bodkan

slendr:A Simulation Framework for Spatiotemporal Population Genetics

A framework for simulating spatially explicit genomic data which leverages real cartographic information for programmatic and visual encoding of spatiotemporal population dynamics on real geographic landscapes. Population genetic models are then automatically executed by the 'SLiM' software by Haller et al. (2019) <doi:10.1093/molbev/msy228> behind the scenes, using a custom built-in simulation 'SLiM' script. Additionally, fully abstract spatial models not tied to a specific geographic location are supported, and users can also simulate data from standard, non-spatial, random-mating models. These can be simulated either with the 'SLiM' built-in back-end script, or using an efficient coalescent population genetics simulator 'msprime' by Baumdicker et al. (2022) <doi:10.1093/genetics/iyab229> with a custom-built 'Python' script bundled with the R package. Simulated genomic data is saved in a tree-sequence format and can be loaded, manipulated, and summarised using tree-sequence functionality via an R interface to the 'Python' module 'tskit' by Kelleher et al. (2019) <doi:10.1038/s41588-019-0483-y>. Complete model configuration, simulation and analysis pipelines can be therefore constructed without a need to leave the R environment, eliminating friction between disparate tools for population genetic simulations and data analysis.

Maintained by Martin Petr. Last updated 14 days ago.

popgen population-genetics simulations spatial-statistics

25.3 match 56 stars 9.15 score 88 scripts

jknowles

merTools:Tools for Analyzing Mixed Effect Regression Models

Provides methods for extracting results from mixed-effect model objects fit with the 'lme4' package. Allows construction of prediction intervals efficiently from large scale linear and generalized linear mixed-effects models. This method draws from the simulation framework used in the Gelman and Hill (2007) textbook: Data Analysis Using Regression and Multilevel/Hierarchical Models.

Maintained by Jared E. Knowles. Last updated 1 years ago.

22.0 match 105 stars 10.49 score 768 scripts

lme4

lme4:Linear Mixed-Effects Models using 'Eigen' and S4

Fit linear and generalized linear mixed-effects models. The models and their components are represented using S4 classes and methods. The core computational algorithms are implemented using the 'Eigen' C++ library for numerical linear algebra and 'RcppEigen' "glue".

Maintained by Ben Bolker. Last updated 5 days ago.

cpp

11.0 match 647 stars 20.69 score 35k scripts 1.5k dependents

wviechtb

metafor:Meta-Analysis Package for R

A comprehensive collection of functions for conducting meta-analyses in R. The package includes functions to calculate various effect sizes or outcome measures, fit equal-, fixed-, random-, and mixed-effects models to such data, carry out moderator and meta-regression analyses, and create various types of meta-analytical plots (e.g., forest, funnel, radial, L'Abbe, Baujat, bubble, and GOSH plots). For meta-analyses of binomial and person-time data, the package also provides functions that implement specialized methods, including the Mantel-Haenszel method, Peto's method, and a variety of suitable generalized linear (mixed-effects) models (i.e., mixed-effects logistic and Poisson regression models). Finally, the package provides functionality for fitting meta-analytic multivariate/multilevel models that account for non-independent sampling errors and/or true effects (e.g., due to the inclusion of multiple treatment studies, multiple endpoints, or other forms of clustering). Network meta-analyses and meta-analyses accounting for known correlation structures (e.g., due to phylogenetic relatedness) can also be conducted. An introduction to the package can be found in Viechtbauer (2010) <doi:10.18637/jss.v036.i03>.

Maintained by Wolfgang Viechtbauer. Last updated 3 days ago.

meta-analysis mixed-effects multilevel-models multivariate

13.8 match 246 stars 16.30 score 4.9k scripts 92 dependents

kassambara

factoextra:Extract and Visualize the Results of Multivariate Data Analyses

Provides some easy-to-use functions to extract and visualize the output of multivariate data analyses, including 'PCA' (Principal Component Analysis), 'CA' (Correspondence Analysis), 'MCA' (Multiple Correspondence Analysis), 'FAMD' (Factor Analysis of Mixed Data), 'MFA' (Multiple Factor Analysis) and 'HMFA' (Hierarchical Multiple Factor Analysis) functions from different R packages. It contains also functions for simplifying some clustering analysis steps and provides 'ggplot2' - based elegant data visualization.

Maintained by Alboukadel Kassambara. Last updated 5 years ago.

15.8 match 363 stars 14.13 score 15k scripts 52 dependents

rspatial

terra:Spatial Data Analysis

Methods for spatial data analysis with vector (points, lines, polygons) and raster (grid) data. Methods for vector data include geometric operations such as intersect and buffer. Raster methods include local, focal, global, zonal and geometric operations. The predict and interpolate methods facilitate the use of regression type (interpolation, machine learning) models for spatial prediction, including with satellite remote sensing data. Processing of very large files is supported. See the manual and tutorials on <https://rspatial.org/> to get started. 'terra' replaces the 'raster' package ('terra' can do more, and it is faster and easier to use).

Maintained by Robert J. Hijmans. Last updated 3 hours ago.

geospatial raster spatial vector onetbb proj gdal geos cpp

12.5 match 559 stars 17.63 score 17k scripts 851 dependents

bozenne

LMMstar:Repeated Measurement Models for Discrete Times

Companion R package for the course "Statistical analysis of correlated and repeated measurements for health science researchers" taught by the section of Biostatistics of the University of Copenhagen. It implements linear mixed models where the model for the variance-covariance of the residuals is specified via patterns (compound symmetry, toeplitz, unstructured, ...). Statistical inference for mean, variance, and correlation parameters is performed based on the observed information and a Satterthwaite approximation of the degrees of freedom. Normalized residuals are provided to assess model misspecification. Statistical inference can be performed for arbitrary linear or non-linear combination(s) of model coefficients. Predictions can be computed conditional to covariates only or also to outcome values.

Maintained by Brice Ozenne. Last updated 5 months ago.

34.7 match 4 stars 6.28 score 141 scripts

jaredlander

coefplot:Plots Coefficients from Fitted Models

Plots the coefficients from model objects. This very quickly shows the user the point estimates and confidence intervals for fitted models.

Maintained by Jared P. Lander. Last updated 3 years ago.

26.3 match 27 stars 8.28 score 744 scripts 1 dependents

alexpate30

rcprd:Extraction and Management of Clinical Practice Research Datalink Data

Simplify the process of extracting and processing Clinical Practice Research Datalink (CPRD) data in order to build datasets ready for statistical analysis. This process is difficult in 'R', as the raw data is very large and cannot be read into the R workspace. 'rcprd' utilises 'RSQLite' to create 'SQLite' databases which are stored on the hard disk. These are then queried to extract the required information for a cohort of interest, and create datasets ready for statistical analysis. The processes follow closely that from the 'rEHR' package, see Springate et al., (2017) <doi:10.1371/journal.pone.0171784>.

Maintained by Alexander Pate. Last updated 21 days ago.

39.2 match 2 stars 5.48 score 5 scripts

bbuchsbaum

neuroim:Data Structures and Handling for Neuroimaging Data

A collection of data structures that represent volumetric brain imaging data. The focus is on basic data handling for 3D and 4D neuroimaging data. In addition, there are function to read and write NIFTI files and limited support for reading AFNI files.

Maintained by Bradley Buchsbaum. Last updated 4 years ago.

cpp

37.7 match 6 stars 5.64 score 48 scripts

easystats

see:Model Visualisation Toolbox for 'easystats' and 'ggplot2'

Provides plotting utilities supporting packages in the 'easystats' ecosystem (<https://github.com/easystats/easystats>) and some extra themes, geoms, and scales for 'ggplot2'. Color scales are based on <https://materialui.co/>. References: Lüdecke et al. (2021) <doi:10.21105/joss.03393>.

Maintained by Indrajeet Patil. Last updated 7 days ago.

data-visualization easystats ggplot2 hacktoberfest plotting see statistics visualisation visualization

15.9 match 902 stars 13.22 score 2.0k scripts 3 dependents

philips-software

latrend:A Framework for Clustering Longitudinal Data

A framework for clustering longitudinal datasets in a standardized way. The package provides an interface to existing R packages for clustering longitudinal univariate trajectories, facilitating reproducible and transparent analyses. Additionally, standard tools are provided to support cluster analyses, including repeated estimation, model validation, and model assessment. The interface enables users to compare results between methods, and to implement and evaluate new methods with ease. The 'akmedoids' package is available from <https://github.com/MAnalytics/akmedoids>.

Maintained by Niek Den Teuling. Last updated 2 months ago.

cluster-analysis clustering-evaluation clustering-methods data-science longitudinal-clustering longitudinal-data mixture-models time-series-analysis

31.0 match 30 stars 6.77 score 26 scripts

tomasfryda

h2o:R Interface for the 'H2O' Scalable Machine Learning Platform

R interface for 'H2O', the scalable open source machine learning platform that offers parallelized implementations of many supervised and unsupervised machine learning algorithms such as Generalized Linear Models (GLM), Gradient Boosting Machines (including XGBoost), Random Forests, Deep Neural Networks (Deep Learning), Stacked Ensembles, Naive Bayes, Generalized Additive Models (GAM), ANOVA GLM, Cox Proportional Hazards, K-Means, PCA, ModelSelection, Word2Vec, as well as a fully automatic machine learning algorithm (H2O AutoML).

Maintained by Tomas Fryda. Last updated 1 years ago.

25.5 match 3 stars 8.20 score 7.8k scripts 11 dependents

cran

mgcv:Mixed GAM Computation Vehicle with Automatic Smoothness Estimation

Generalized additive (mixed) models, some of their extensions and other generalized ridge regression with multiple smoothing parameter estimation by (Restricted) Marginal Likelihood, Generalized Cross Validation and similar, or using iterated nested Laplace approximation for fully Bayesian inference. See Wood (2017) <doi:10.1201/9781315370279> for an overview. Includes a gam() function, a wide variety of smoothers, 'JAGS' support and distributions beyond the exponential family.

Maintained by Simon Wood. Last updated 1 years ago.

openblas openmp

16.2 match 32 stars 12.71 score 17k scripts 7.8k dependents

tidyverse

stringr:Simple, Consistent Wrappers for Common String Operations

A consistent, simple and easy to use set of wrappers around the fantastic 'stringi' package. All function and argument names (and positions) are consistent, all functions deal with "NA"'s and zero length vectors in the same way, and the output from one function is easy to feed into the input of another.

Maintained by Hadley Wickham. Last updated 7 months ago.

regular-expression strings

9.4 match 628 stars 21.99 score 164k scripts 8.3k dependents

gagolews

stringi:Fast and Portable Character String Processing Facilities

A collection of character string/text/natural language processing tools for pattern searching (e.g., with 'Java'-like regular expressions or the 'Unicode' collation algorithm), random string generation, case mapping, string transliteration, concatenation, sorting, padding, wrapping, Unicode normalisation, date-time formatting and parsing, and many more. They are fast, consistent, convenient, and - thanks to 'ICU' (International Components for Unicode) - portable across all locales and platforms. Documentation about 'stringi' is provided via its website at <https://stringi.gagolewski.com/> and the paper by Gagolewski (2022, <doi:10.18637/jss.v103.i02>).

Maintained by Marek Gagolewski. Last updated 1 months ago.

icu icu4c natural-language-processing nlp regex regexp string-manipulation stringi stringr text text-processing tidy-data unicode cpp

11.2 match 309 stars 18.31 score 10k scripts 8.6k dependents

tidyverts

fabletools:Core Tools for Packages in the 'fable' Framework

Provides tools, helpers and data structures for developing models and time series functions for 'fable' and extension packages. These tools support a consistent and tidy interface for time series modelling and analysis.

Maintained by Mitchell OHara-Wild. Last updated 1 months ago.

16.8 match 91 stars 12.18 score 396 scripts 18 dependents

bioc

ballgown:Flexible, isoform-level differential expression analysis

Tools for statistical analysis of assembled transcriptomes, including flexible differential expression analysis, visualization of transcript structures, and matching of assembled transcripts to annotation.

Maintained by Jack Fu. Last updated 5 months ago.

immunooncology rnaseq statisticalmethod preprocessing differentialexpression

19.4 match 146 stars 10.51 score 338 scripts 1 dependents

r-lib

httr2:Perform HTTP Requests and Process the Responses

Tools for creating and modifying HTTP requests, then performing them and processing the results. 'httr2' is a modern re-imagining of 'httr' that uses a pipe-based interface and solves more of the problems that API wrapping packages face.

Maintained by Hadley Wickham. Last updated 10 days ago.

http

11.4 match 246 stars 17.66 score 1.9k scripts 1.1k dependents

alexzwanenburg

familiar:End-to-End Automated Machine Learning and Model Evaluation

Single unified interface for end-to-end modelling of regression, categorical and time-to-event (survival) outcomes. Models created using familiar are self-containing, and their use does not require additional information such as baseline survival, feature clustering, or feature transformation and normalisation parameters. Model performance, calibration, risk group stratification, (permutation) variable importance, individual conditional expectation, partial dependence, and more, are assessed automatically as part of the evaluation process and exported in tabular format and plotted, and may also be computed manually using export and plot functions. Where possible, metrics and values obtained during the evaluation process come with confidence intervals.

Maintained by Alex Zwanenburg. Last updated 6 months ago.

ai explainable-ai machine-learning survival-analysis tabular-data

39.9 match 31 stars 5.05 score 18 scripts

biodiverse

ubms:Bayesian Models for Data from Unmarked Animals using 'Stan'

Fit Bayesian hierarchical models of animal abundance and occurrence via the 'rstan' package, the R interface to the 'Stan' C++ library. Supported models include single-season occupancy, dynamic occupancy, and N-mixture abundance models. Covariates on model parameters are specified using a formula-based interface similar to package 'unmarked', while also allowing for estimation of random slope and intercept terms. References: Carpenter et al. (2017) <doi:10.18637/jss.v076.i01>; Fiske and Chandler (2011) <doi:10.18637/jss.v043.i10>.

Maintained by Ken Kellner. Last updated 20 days ago.

distance-sampling hierarchical-models n-mixture-model occupancy stan openblas cpp

25.1 match 35 stars 7.88 score 73 scripts

bart1

move:Visualizing and Analyzing Animal Track Data

Contains functions to access movement data stored in 'movebank.org' as well as tools to visualize and statistically analyze animal movement data, among others functions to calculate dynamic Brownian Bridge Movement Models. Move helps addressing movement ecology questions.

Maintained by Bart Kranstauber. Last updated 4 months ago.

cpp

22.4 match 8.74 score 690 scripts 3 dependents

ropensci

restez:Create and Query a Local Copy of 'GenBank' in R

Download large sections of 'GenBank' <https://www.ncbi.nlm.nih.gov/genbank/> and generate a local SQL-based database. A user can then query this database using 'restez' functions or through 'rentrez' <https://CRAN.R-project.org/package=rentrez> wrappers.

Maintained by Joel H. Nitta. Last updated 11 days ago.

dna entrez genbank sequence

27.7 match 26 stars 7.01 score 175 scripts 1 dependents

sciviews

pastecs:Package for Analysis of Space-Time Ecological Series

Regularisation, decomposition and analysis of space-time series. The pastecs R package is a PNEC-Art4 and IFREMER (Benoit Beliaeff <Benoit.Beliaeff@ifremer.fr>) initiative to bring PASSTEC 2000 functionalities to R.

Maintained by Philippe Grosjean. Last updated 1 years ago.

ecology time-series

18.7 match 4 stars 10.34 score 2.1k scripts 13 dependents

ropensci

rdhs:API Client and Dataset Management for the Demographic and Health Survey (DHS) Data

Provides a client for (1) querying the DHS API for survey indicators and metadata (<https://api.dhsprogram.com/#/index.html>), (2) identifying surveys and datasets for analysis, (3) downloading survey datasets from the DHS website, (4) loading datasets and associate metadata into R, and (5) extracting variables and combining datasets for pooled analysis.

Maintained by OJ Watson. Last updated 19 days ago.

dataset dhs dhs-api extract peer-reviewed survey-data

19.1 match 35 stars 10.07 score 286 scripts 3 dependents

glmmtmb

glmmTMB:Generalized Linear Mixed Models using Template Model Builder

Fit linear and generalized linear mixed models with various extensions, including zero-inflation. The models are fitted using maximum likelihood estimation via 'TMB' (Template Model Builder). Random effects are assumed to be Gaussian on the scale of the linear predictor and are integrated out using the Laplace approximation. Gradients are calculated using automatic differentiation.

Maintained by Mollie Brooks. Last updated 20 hours ago.

cpp openmp

12.8 match 312 stars 14.80 score 3.7k scripts 24 dependents

epimodel

EpiModel:Mathematical Modeling of Infectious Disease Dynamics

Tools for simulating mathematical models of infectious disease dynamics. Epidemic model classes include deterministic compartmental models, stochastic individual-contact models, and stochastic network models. Network models use the robust statistical methods of exponential-family random graph models (ERGMs) from the Statnet suite of software packages in R. Standard templates for epidemic modeling include SI, SIR, and SIS disease types. EpiModel features an API for extending these templates to address novel scientific research aims. Full methods for EpiModel are detailed in Jenness et al. (2018, <doi:10.18637/jss.v084.i08>).

Maintained by Samuel Jenness. Last updated 2 months ago.

agent-based-modeling epidemics epidemiology infectious-diseases network-graph cpp

16.3 match 250 stars 11.57 score 315 scripts

lcbc-uio

galamm:Generalized Additive Latent and Mixed Models

Estimates generalized additive latent and mixed models using maximum marginal likelihood, as defined in Sorensen et al. (2023) <doi:10.1007/s11336-023-09910-z>, which is an extension of Rabe-Hesketh and Skrondal (2004)'s unifying framework for multilevel latent variable modeling <doi:10.1007/BF02295939>. Efficient computation is done using sparse matrix methods, Laplace approximation, and automatic differentiation. The framework includes generalized multilevel models with heteroscedastic residuals, mixed response types, factor loadings, smoothing splines, crossed random effects, and combinations thereof. Syntax for model formulation is close to 'lme4' (Bates et al. (2015) <doi:10.18637/jss.v067.i01>) and 'PLmixed' (Rockwood and Jeon (2019) <doi:10.1080/00273171.2018.1516541>).

Maintained by Øystein Sørensen. Last updated 6 months ago.

generalized-additive-models hierarchical-models item-response-theory latent-variable-models structural-equation-models cpp

25.6 match 29 stars 7.33 score 41 scripts

biodiverse

spAbundance:Univariate and Multivariate Spatial Modeling of Species Abundance

Fits single-species (univariate) and multi-species (multivariate) non-spatial and spatial abundance models in a Bayesian framework using Markov Chain Monte Carlo (MCMC). Spatial models are fit using Nearest Neighbor Gaussian Processes (NNGPs). Details on NNGP models are given in Datta, Banerjee, Finley, and Gelfand (2016) <doi:10.1080/01621459.2015.1044091> and Finley, Datta, and Banerjee (2022) <doi:10.18637/jss.v103.i05>. Fits single-species and multi-species spatial and non-spatial versions of generalized linear mixed models (Gaussian, Poisson, Negative Binomial), N-mixture models (Royle 2004 <doi:10.1111/j.0006-341X.2004.00142.x>) and hierarchical distance sampling models (Royle, Dawson, Bates (2004) <doi:10.1890/03-3127>). Multi-species spatial models are fit using a spatial factor modeling approach with NNGPs for computational efficiency.

Maintained by Jeffrey Doser. Last updated 19 days ago.

openblas cpp openmp

29.8 match 17 stars 6.15 score 43 scripts 1 dependents

chockemeyer

kstMatrix:Basic Functions in Knowledge Space Theory Using Matrix Representation

Knowledge space theory by Doignon and Falmagne (1999) <doi:10.1007/978-3-642-58625-5> is a set- and order-theoretical framework, which proposes mathematical formalisms to operationalize knowledge structures in a particular domain. The 'kstMatrix' package provides basic functionalities to generate, handle, and manipulate knowledge structures and knowledge spaces. Opposed to the 'kst' package, 'kstMatrix' uses matrix representations for knowledge structures. Furthermore, 'kstMatrix' contains several knowledge spaces developed by the research group around Cornelia Dowling through querying experts.

Maintained by Cord Hockemeyer. Last updated 2 months ago.

52.1 match 2 stars 3.43 score 15 scripts 1 dependents

pablobarbera

Rfacebook:Access to Facebook API via R

Provides an interface to the Facebook API.

Maintained by Pablo Barbera. Last updated 5 years ago.

22.9 match 351 stars 7.75 score 268 scripts

bozenne

BuyseTest:Generalized Pairwise Comparisons

Implementation of the Generalized Pairwise Comparisons (GPC) as defined in Buyse (2010) <doi:10.1002/sim.3923> for complete observations, and extended in Peron (2018) <doi:10.1177/0962280216658320> to deal with right-censoring. GPC compare two groups of observations (intervention vs. control group) regarding several prioritized endpoints to estimate the probability that a random observation drawn from one group performs better/worse/equivalently than a random observation drawn from the other group. Summary statistics such as the net treatment benefit, win ratio, or win odds are then deduced from these probabilities. Confidence intervals and p-values are obtained based on asymptotic results (Ozenne 2021 <doi:10.1177/09622802211037067>), non-parametric bootstrap, or permutations. The software enables the use of thresholds of minimal importance difference, stratification, non-prioritized endpoints (O Brien test), and can handle right-censoring and competing-risks.

Maintained by Brice Ozenne. Last updated 6 days ago.

generalized-pairwise-comparisons non-parametric statistics cpp

30.0 match 5 stars 5.91 score 90 scripts

r-forge

R2MLwiN:Running 'MLwiN' from Within R

An R command interface to the 'MLwiN' multilevel modelling software package.

Maintained by Zhengzheng Zhang. Last updated 5 months ago.

32.9 match 5.35 score 125 scripts

gamlss-dev

gamlss:Generalized Additive Models for Location Scale and Shape

Functions for fitting the Generalized Additive Models for Location Scale and Shape introduced by Rigby and Stasinopoulos (2005), <doi:10.1111/j.1467-9876.2005.00510.x>. The models use a distributional regression approach where all the parameters of the conditional distribution of the response variable are modelled using explanatory variables.

Maintained by Mikis Stasinopoulos. Last updated 4 months ago.

15.6 match 16 stars 11.23 score 2.0k scripts 49 dependents

evolecolgroup

pastclim:Manipulate Time Series of Climate Reconstructions

Methods to easily extract and manipulate climate reconstructions for ecological and anthropological analyses, as described in Leonardi et al. (2023) <doi:10.1111/ecog.06481>. The package includes datasets of palaeoclimate reconstructions, present observations, and future projections from multiple climate models.

Maintained by Andrea Manica. Last updated 5 days ago.

climate-data paleoclimate species-distribution-modelling

21.5 match 38 stars 8.12 score 49 scripts

spatstat

spatstat.linnet:Linear Networks Functionality of the 'spatstat' Family

Defines types of spatial data on a linear network and provides functionality for geometrical operations, data analysis and modelling of data on a linear network, in the 'spatstat' family of packages. Contains definitions and support for linear networks, including creation of networks, geometrical measurements, topological connectivity, geometrical operations such as inserting and deleting vertices, intersecting a network with another object, and interactive editing of networks. Data types defined on a network include point patterns, pixel images, functions, and tessellations. Exploratory methods include kernel estimation of intensity on a network, K-functions and pair correlation functions on a network, simulation envelopes, nearest neighbour distance and empty space distance, relative risk estimation with cross-validated bandwidth selection. Formal hypothesis tests of random pattern (chi-squared, Kolmogorov-Smirnov, Monte Carlo, Diggle-Cressie-Loosmore-Ford, Dao-Genton, two-stage Monte Carlo) and tests for covariate effects (Cox-Berman-Waller-Lawson, Kolmogorov-Smirnov, ANOVA) are also supported. Parametric models can be fitted to point pattern data using the function lppm() similar to glm(). Only Poisson models are implemented so far. Models may involve dependence on covariates and dependence on marks. Models are fitted by maximum likelihood. Fitted point process models can be simulated, automatically. Formal hypothesis tests of a fitted model are supported (likelihood ratio test, analysis of deviance, Monte Carlo tests) along with basic tools for model selection (stepwise(), AIC()) and variable selection (sdr). Tools for validating the fitted model include simulation envelopes, residuals, residual plots and Q-Q plots, leverage and influence diagnostics, partial residuals, and added variable plots. Random point patterns on a network can be generated using a variety of models.

Maintained by Adrian Baddeley. Last updated 2 months ago.

density-estimation heat-equation kernel-density-estimation network-analysis point-processes spatial-data-analysis statistical-analysis statistical-inference statistical-models

17.7 match 6 stars 9.64 score 35 scripts 43 dependents

bioc

Rtpca:Thermal proximity co-aggregation with R

R package for performing thermal proximity co-aggregation analysis with thermal proteome profiling datasets to analyse protein complex assembly and (differential) protein-protein interactions across conditions.

Maintained by Nils Kurzawa. Last updated 5 months ago.

software proteomics dataimport

38.3 match 4.46 score 29 scripts

ropensci

pdftools:Text Extraction, Rendering and Converting of PDF Documents

Utilities based on 'libpoppler' <https://poppler.freedesktop.org> for extracting text, fonts, attachments and metadata from a PDF file. Also supports high quality rendering of PDF documents into PNG, JPEG, TIFF format, or into raw bitmap vectors for further processing in R.

Maintained by Jeroen Ooms. Last updated 15 days ago.

pdf-files pdf-format pdftools poppler poppler-library text-extraction cpp

12.9 match 529 stars 13.10 score 3.3k scripts 47 dependents

joshuaulrich

xts:eXtensible Time Series

Provide for uniform handling of R's different time-based data classes by extending zoo, maximizing native format information preservation and allowing for user level customization and extension, while simplifying cross-class interoperability.

Maintained by Joshua M. Ulrich. Last updated 4 months ago.

c time-series

9.2 match 221 stars 18.38 score 12k scripts 654 dependents

bioc

ORFik:Open Reading Frames in Genomics

R package for analysis of transcript and translation features through manipulation of sequence data and NGS data like Ribo-Seq, RNA-Seq, TCP-Seq and CAGE. It is generalized in the sense that any transcript region can be analysed, as the name hints to it was made with investigation of ribosomal patterns over Open Reading Frames (ORFs) as it's primary use case. ORFik is extremely fast through use of C++, data.table and GenomicRanges. Package allows to reassign starts of the transcripts with the use of CAGE-Seq data, automatic shifting of RiboSeq reads, finding of Open Reading Frames for whole genomes and much more.

Maintained by Haakon Tjeldnes. Last updated 29 days ago.

immunooncology software sequencing riboseq rnaseq functionalgenomics coverage alignment dataimport cpp

15.9 match 33 stars 10.63 score 115 scripts 2 dependents

mnrzrad

ImpShrinkage:Improved Shrinkage Estimations for Multiple Linear Regression

A variety of improved shrinkage estimators in the area of statistical analysis: unrestricted; restricted; preliminary test; improved preliminary test; Stein; and positive-rule Stein. More details can be found in chapter 7 of Saleh, A. K. Md. E. (2006) <ISBN: 978-0-471-56375-4>.

Maintained by Mina Norouzirad. Last updated 11 months ago.

46.5 match 4 stars 3.60 score 1 scripts

cjvanlissa

tidySEM:Tidy Structural Equation Modeling

A tidy workflow for generating, estimating, reporting, and plotting structural equation models using 'lavaan', 'OpenMx', or 'Mplus'. Throughout this workflow, elements of syntax, results, and graphs are represented as 'tidy' data, making them easy to customize. Includes functionality to estimate latent class analyses, and to plot 'dagitty' and 'igraph' objects.

Maintained by Caspar J. van Lissa. Last updated 9 days ago.

15.6 match 58 stars 10.69 score 330 scripts 1 dependents

sparklyr

sparklyr:R Interface to Apache Spark

R interface to Apache Spark, a fast and general engine for big data processing, see <https://spark.apache.org/>. This package supports connecting to local and remote Apache Spark clusters, provides a 'dplyr' compatible back-end, and provides an interface to Spark's built-in machine learning algorithms.

Maintained by Edgar Ruiz. Last updated 8 hours ago.

apache-spark distributed dplyr ide livy machine-learning remote-clusters spark sparklyr

11.0 match 959 stars 15.20 score 4.0k scripts 21 dependents

stan-dev

rstan:R Interface to Stan

User-facing R functions are provided to parse, compile, test, estimate, and analyze Stan models by accessing the header-only Stan library provided by the 'StanHeaders' package. The Stan project develops a probabilistic programming language that implements full Bayesian statistical inference via Markov Chain Monte Carlo, rough Bayesian inference via 'variational' approximation, and (optionally penalized) maximum likelihood estimation via optimization. In all three cases, automatic differentiation is used to quickly and accurately evaluate gradients without burdening the user with the need to derive the partial derivatives.

Maintained by Ben Goodrich. Last updated 3 days ago.

bayesian-data-analysis bayesian-inference bayesian-statistics mcmc stan cpp

8.9 match 1.1k stars 18.67 score 14k scripts 279 dependents

markusfritsch

pdynmc:Moment Condition Based Estimation of Linear Dynamic Panel Data Models

Linear dynamic panel data modeling based on linear and nonlinear moment conditions as proposed by Holtz-Eakin, Newey, and Rosen (1988) <doi:10.2307/1913103>, Ahn and Schmidt (1995) <doi:10.1016/0304-4076(94)01641-C>, and Arellano and Bover (1995) <doi:10.1016/0304-4076(94)01642-D>. Estimation of the model parameters relies on the Generalized Method of Moments (GMM) and instrumental variables (IV) estimation, numerical optimization (when nonlinear moment conditions are employed) and the computation of closed form solutions (when estimation is based on linear moment conditions). One-step, two-step and iterated estimation is available. For inference and specification testing, Windmeijer (2005) <doi:10.1016/j.jeconom.2004.02.005> and doubly corrected standard errors (Hwang, Kang, Lee, 2021 <doi:10.1016/j.jeconom.2020.09.010>) are available. Additionally, serial correlation tests, tests for overidentification, and Wald tests are provided. Functions for visualizing panel data structures and modeling results obtained from GMM estimation are also available. The plot methods include functions to plot unbalanced panel structure, coefficient ranges and coefficient paths across GMM iterations (the latter is implemented according to the plot shown in Hansen and Lee, 2021 <doi:10.3982/ECTA16274>). For a more detailed description of the GMM-based functionality, please see Fritsch, Pua, Schnurbus (2021) <doi:10.32614/RJ-2021-035>. For more details on the IV-based estimation routines, see Fritsch, Pua, and Schnurbus (WP, 2024) and Han and Phillips (2010) <doi:10.1017/S026646660909063X>.

Maintained by Markus Fritsch. Last updated 15 days ago.

24.9 match 4 stars 6.65 score 106 scripts

samuel-marsh

scCustomize:Custom Visualizations & Functions for Streamlined Analyses of Single Cell Sequencing

Collection of functions created and/or curated to aid in the visualization and analysis of single-cell data using 'R'. 'scCustomize' aims to provide 1) Customized visualizations for aid in ease of use and to create more aesthetic and functional visuals. 2) Improve speed/reproducibility of common tasks/pieces of code in scRNA-seq analysis with a single or group of functions. For citation please use: Marsh SE (2021) "Custom Visualizations & Functions for Streamlined Analyses of Single Cell Sequencing" <doi:10.5281/zenodo.5706430> RRID:SCR_024675.

Maintained by Samuel Marsh. Last updated 3 months ago.

customization ggplot2 scrna-seq seurat single-cell single-cell-genomics single-cell-rna-seq visualization

18.5 match 242 stars 8.75 score 1.1k scripts

henrikbengtsson

R.utils:Various Programming Utilities

Utility functions useful when programming and developing R packages.

Maintained by Henrik Bengtsson. Last updated 1 years ago.

11.8 match 63 stars 13.74 score 5.7k scripts 814 dependents

ropensci

osmdata:Import 'OpenStreetMap' Data as Simple Features or Spatial Objects

Download and import of 'OpenStreetMap' ('OSM') data as 'sf' or 'sp' objects. 'OSM' data are extracted from the 'Overpass' web server (<https://overpass-api.de/>) and processed with very fast 'C++' routines for return to 'R'.

Maintained by Mark Padgham. Last updated 1 months ago.

open0street0map openstreetmap overpass0api osm cpp osm-data overpass-api peer-reviewed cpp

11.1 match 322 stars 14.53 score 2.8k scripts 14 dependents

r-lib

rlang:Functions for Base Types and Core R and 'Tidyverse' Features

A toolbox for working with base types, core R features like the condition system, and core 'Tidyverse' features like tidy evaluation.

Maintained by Lionel Henry. Last updated 21 days ago.

7.8 match 517 stars 20.53 score 9.8k scripts 15k dependents

briencj

dae:Functions Useful in the Design and ANOVA of Experiments

The content falls into the following groupings: (i) Data, (ii) Factor manipulation functions, (iii) Design functions, (iv) ANOVA functions, (v) Matrix functions, (vi) Projector and canonical efficiency functions, and (vii) Miscellaneous functions. There is a vignette describing how to use the design functions for randomizing and assessing designs available as a vignette called 'DesignNotes'. The ANOVA functions facilitate the extraction of information when the 'Error' function has been used in the call to 'aov'. The package 'dae' can also be installed from <http://chris.brien.name/rpackages/>.

Maintained by Chris Brien. Last updated 4 months ago.

18.5 match 1 stars 8.62 score 356 scripts 7 dependents

dkaschek

dMod:Dynamic Modeling and Parameter Estimation in ODE Models

The framework provides functions to generate ODEs of reaction networks, parameter transformations, observation functions, residual functions, etc. The framework follows the paradigm that derivative information should be used for optimization whenever possible. Therefore, all major functions produce and can handle expressions for symbolic derivatives.

Maintained by Daniel Kaschek. Last updated 11 days ago.

19.1 match 20 stars 8.35 score 251 scripts

r-lib

httr:Tools for Working with URLs and HTTP

Useful tools for working with HTTP organised by HTTP verbs (GET(), POST(), etc). Configuration functions make it easy to control additional request components (authenticate(), add_headers() and so on).

Maintained by Hadley Wickham. Last updated 1 years ago.

api curl http

7.8 match 989 stars 20.56 score 29k scripts 4.3k dependents

ropensci

EndoMineR:Functions to mine endoscopic and associated pathology datasets

This script comprises the functions that are used to clean up endoscopic reports and pathology reports as well as many of the scripts used for analysis. The scripts assume the endoscopy and histopathology data set is merged already but it can also be used of course with the unmerged datasets.

Maintained by Sebastian Zeki. Last updated 7 months ago.

endoscopy gastroenterology peer-reviewed semi-structured-data text-mining

28.5 match 13 stars 5.47 score 30 scripts

quanteda

spacyr:Wrapper to the 'spaCy' 'NLP' Library

An R wrapper to the 'Python' 'spaCy' 'NLP' library, from <https://spacy.io>.

Maintained by Kenneth Benoit. Last updated 1 months ago.

extract-entities nlp spacy speech-tagging

14.6 match 253 stars 10.68 score 408 scripts 6 dependents

knausb

vcfR:Manipulate and Visualize VCF Data

Facilitates easy manipulation of variant call format (VCF) data. Functions are provided to rapidly read from and write to VCF files. Once VCF data is read into R a parser function extracts matrices of data. This information can then be used for quality control or other purposes. Additional functions provide visualization of genomic data. Once processing is complete data may be written to a VCF file (*.vcf.gz). It also may be converted into other popular R objects (e.g., genlight, DNAbin). VcfR provides a link between VCF data and familiar R software.

Maintained by Brian J. Knaus. Last updated 25 days ago.

genomics population-genetics population-genomics rcpp vcf-data visualization zlib cpp

11.4 match 254 stars 13.59 score 3.1k scripts 19 dependents

predictiveecology

SpaDES.core:Core Utilities for Developing and Running Spatially Explicit Discrete Event Models

Provides the core framework for a discrete event system to implement a complete data-to-decisions, reproducible workflow. The core components facilitate the development of modular pieces, and enable the user to include additional functionality by running user-built modules. Includes conditional scheduling, restart after interruption, packaging of reusable modules, tools for developing arbitrary automated workflows, automated interweaving of modules of different temporal resolution, and tools for visualizing and understanding the within-project dependencies. The suggested package 'NLMR' can be installed from the repository (<https://PredictiveEcology.r-universe.dev>).

Maintained by Eliot J B McIntire. Last updated 20 days ago.

discrete-events-simulations simulation-framework simulation-modeling

14.5 match 10 stars 10.61 score 142 scripts 6 dependents

billdenney

PKNCA:Perform Pharmacokinetic Non-Compartmental Analysis

Compute standard Non-Compartmental Analysis (NCA) parameters for typical pharmacokinetic analyses and summarize them.

Maintained by Bill Denney. Last updated 18 days ago.

nca noncompartmental-analysis pharmacokinetics

12.1 match 73 stars 12.61 score 214 scripts 4 dependents

projectmosaic

mosaic:Project MOSAIC Statistics and Mathematics Teaching Utilities

Data sets and utilities from Project MOSAIC (<http://www.mosaic-web.org>) used to teach mathematics, statistics, computation and modeling. Funded by the NSF, Project MOSAIC is a community of educators working to tie together aspects of quantitative work that students in science, technology, engineering and mathematics will need in their professional lives, but which are usually taught in isolation, if at all.

Maintained by Randall Pruim. Last updated 1 years ago.

11.3 match 93 stars 13.32 score 7.2k scripts 7 dependents

satopaa

metaggR:Calculate the Knowledge-Weighted Estimate

According to a phenomenon known as "the wisdom of the crowds," combining point estimates from multiple judges often provides a more accurate aggregate estimate than using a point estimate from a single judge. However, if the judges use shared information in their estimates, the simple average will over-emphasize this common component at the expense of the judges’ private information. Asa Palley & Ville Satopää (2021) "Boosting the Wisdom of Crowds Within a Single Judgment Problem: Selective Averaging Based on Peer Predictions" <https://papers.ssrn.com/sol3/Papers.cfm?abstract_id=3504286> proposes a procedure for calculating a weighted average of the judges’ individual estimates such that resulting aggregate estimate appropriately combines the judges' collective information within a single estimation problem. The authors use both simulation and data from six experimental studies to illustrate that the weighting procedure outperforms existing averaging-like methods, such as the equally weighted average, trimmed average, and median. This aggregate estimate -- know as "the knowledge-weighted estimate" -- inputs a) judges' estimates of a continuous outcome (E) and b) predictions of others' average estimate of this outcome (P). In this R-package, the function knowledge_weighted_estimate(E,P) implements the knowledge-weighted estimate. Its use is illustrated with a simple stylized example and on real-world experimental data.

Maintained by Ville Satopää. Last updated 3 years ago.

52.4 match 1 stars 2.85 score 14 scripts

usdaforestservice

FIESTA:Forest Inventory Estimation and Analysis

A research estimation tool for analysts that work with sample-based inventory data from the U.S. Department of Agriculture, Forest Service, Forest Inventory and Analysis (FIA) Program.

Maintained by Grayson White. Last updated 17 hours ago.

20.4 match 30 stars 7.25 score 62 scripts

comeetie

greed:Clustering and Model Selection with the Integrated Classification Likelihood

An ensemble of algorithms that enable the clustering of networks and data matrices (such as counts, categorical or continuous) with different type of generative models. Model selection and clustering is performed in combination by optimizing the Integrated Classification Likelihood (which is equivalent to minimizing the description length). Several models are available such as: Stochastic Block Model, degree corrected Stochastic Block Model, Mixtures of Multinomial, Latent Block Model. The optimization is performed thanks to a combination of greedy local search and a genetic algorithm (see <arXiv:2002:11577> for more details).

Maintained by Etienne Côme. Last updated 2 years ago.

openblas cpp openmp

24.6 match 14 stars 5.94 score 41 scripts

tidyverse

magrittr:A Forward-Pipe Operator for R

Provides a mechanism for chaining commands with a new forward-pipe operator, %>%. This operator will forward a value, or the result of an expression, into the next function call/expression. There is flexible support for the type of right-hand side expressions. For more information, see package vignette. To quote Rene Magritte, "Ceci n'est pas un pipe."

Maintained by Lionel Henry. Last updated 2 years ago.

pipe

6.9 match 961 stars 21.06 score 82k scripts 14k dependents

graemeleehickey

joineRML:Joint Modelling of Multivariate Longitudinal Data and Time-to-Event Outcomes

Fits the joint model proposed by Henderson and colleagues (2000) <doi:10.1093/biostatistics/1.4.465>, but extended to the case of multiple continuous longitudinal measures. The time-to-event data is modelled using a Cox proportional hazards regression model with time-varying covariates. The multiple longitudinal outcomes are modelled using a multivariate version of the Laird and Ware linear mixed model. The association is captured by a multivariate latent Gaussian process. The model is estimated using a Monte Carlo Expectation Maximization algorithm. This project was funded by the Medical Research Council (Grant number MR/M013227/1).

Maintained by Graeme L. Hickey. Last updated 1 months ago.

armadillo biostatistics clinical-trials cox dynamic joint-models longitudinal-data multivariate-analysis multivariate-data multivariate-longitudinal-data prediction rcpp regression-models statistics survival openblas cpp openmp

16.0 match 30 stars 8.93 score 146 scripts 1 dependents

cemos-mannheim

MALDIcellassay:Automated MALDI Cell Assays Using Dose-Response Curve Fitting

Conduct automated cell-based assays using Matrix-Assisted Laser Desorption/Ionization (MALDI) methods for high-throughput screening of signals responsive to treatments. The package efficiently identifies high variance signals and fits dose-response curves to them. Quality metrics such as Z', V', log2 Fold-Change, and Curve response score (CRS) are provided for evaluating the potential of signals as biomarkers. The methodologies were introduced by Weigt et al. (2018) <doi:10.1038/s41598-018-29677-z> and refined by Unger et al. (2021) <doi:10.1038/s41596-021-00624-z>.

Maintained by Thomas Enzlein. Last updated 2 months ago.

cell-based-assay concentration-response-analysis maldi-tof-ms mass-spectrometry untargeted-metabolomics

30.1 match 4.74 score 9 scripts

braverock

PortfolioAnalytics:Portfolio Analysis, Including Numerical Methods for Optimization of Portfolios

Portfolio optimization and analysis routines and graphics.

Maintained by Brian G. Peterson. Last updated 3 months ago.

12.3 match 81 stars 11.49 score 626 scripts 2 dependents

bioc

slingshot:Tools for ordering single-cell sequencing

Provides functions for inferring continuous, branching lineage structures in low-dimensional data. Slingshot was designed to model developmental trajectories in single-cell RNA sequencing data and serve as a component in an analysis pipeline after dimensionality reduction and clustering. It is flexible enough to handle arbitrarily many branching events and allows for the incorporation of prior knowledge through supervised graph construction.

Maintained by Kelly Street. Last updated 5 months ago.

clustering differentialexpression geneexpression rnaseq sequencing software singlecell transcriptomics visualization

11.8 match 283 stars 12.01 score 1.0k scripts 4 dependents

pecanproject

PEcAn.data.land:PEcAn Functions Used for Ecological Forecasts and Reanalysis

The Predictive Ecosystem Carbon Analyzer (PEcAn) is a scientific workflow management tool that is designed to simplify the management of model parameterization, execution, and analysis. The goal of PECAn is to streamline the interaction between data and models, and to improve the efficacy of scientific investigation.

Maintained by Mike Dietze. Last updated 4 days ago.

bayesian cyberinfrastructure data-assimilation data-science ecosystem-model ecosystem-science forecasting meta-analysis national-science-foundation pecan plants jags cpp

15.1 match 216 stars 9.32 score 19 scripts 10 dependents

vegandevs

vegan:Community Ecology Package

Ordination methods, diversity analysis and other functions for community and vegetation ecologists.

Maintained by Jari Oksanen. Last updated 18 days ago.

ecological-modelling ecology ordination fortran openblas

7.2 match 472 stars 19.41 score 15k scripts 440 dependents

tyee001

VGAM:Vector Generalized Linear and Additive Models

An implementation of about 6 major classes of statistical regression models. The central algorithm is Fisher scoring and iterative reweighted least squares. At the heart of this package are the vector generalized linear and additive model (VGLM/VGAM) classes. VGLMs can be loosely thought of as multivariate GLMs. VGAMs are data-driven VGLMs that use smoothing. The book "Vector Generalized Linear and Additive Models: With an Implementation in R" (Yee, 2015) <DOI:10.1007/978-1-4939-2818-7> gives details of the statistical framework and the package. Currently only fixed-effects models are implemented. Many (100+) models and distributions are estimated by maximum likelihood estimation (MLE) or penalized MLE. The other classes are RR-VGLMs (reduced-rank VGLMs), quadratic RR-VGLMs, doubly constrained RR-VGLMs, quadratic RR-VGLMs, reduced-rank VGAMs, RCIMs (row-column interaction models)---these classes perform constrained and unconstrained quadratic ordination (CQO/UQO) models in ecology, as well as constrained additive ordination (CAO). Hauck-Donner effect detection is implemented. Note that these functions are subject to change; see the NEWS and ChangeLog files for latest changes.

Maintained by Thomas Yee. Last updated 1 months ago.

fortran

13.0 match 10 stars 10.67 score 3.6k scripts 169 dependents

nicholasjclark

mvgam:Multivariate (Dynamic) Generalized Additive Models

Fit Bayesian Dynamic Generalized Additive Models to multivariate observations. Users can build nonlinear State-Space models that can incorporate semiparametric effects in observation and process components, using a wide range of observation families. Estimation is performed using Markov Chain Monte Carlo with Hamiltonian Monte Carlo in the software 'Stan'. References: Clark & Wells (2023) <doi:10.1111/2041-210X.13974>.

Maintained by Nicholas J Clark. Last updated 1 days ago.

bayesian-statistics dynamic-factor-models ecological-modelling forecasting gaussian-process generalised-additive-models generalized-additive-models joint-species-distribution-modelling multilevel-models multivariate-timeseries stan time-series-analysis timeseries vector-autoregression vectorautoregression cpp

14.1 match 139 stars 9.85 score 117 scripts

mlampros

OpenImageR:An Image Processing Toolkit

Incorporates functions for image preprocessing, filtering and image recognition. The package takes advantage of 'RcppArmadillo' to speed up computationally intensive functions. The histogram of oriented gradients descriptor is a modification of the 'findHOGFeatures' function of the 'SimpleCV' computer vision platform, the average_hash(), dhash() and phash() functions are based on the 'ImageHash' python library. The Gabor Feature Extraction functions are based on 'Matlab' code of the paper, "CloudID: Trustworthy cloud-based and cross-enterprise biometric identification" by M. Haghighat, S. Zonouz, M. Abdel-Mottaleb, Expert Systems with Applications, vol. 42, no. 21, pp. 7905-7916, 2015, <doi:10.1016/j.eswa.2015.06.025>. The 'SLIC' and 'SLICO' superpixel algorithms were explained in detail in (i) "SLIC Superpixels Compared to State-of-the-art Superpixel Methods", Radhakrishna Achanta, Appu Shaji, Kevin Smith, Aurelien Lucchi, Pascal Fua, and Sabine Suesstrunk, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 34, num. 11, p. 2274-2282, May 2012, <doi:10.1109/TPAMI.2012.120> and (ii) "SLIC Superpixels", Radhakrishna Achanta, Appu Shaji, Kevin Smith, Aurelien Lucchi, Pascal Fua, and Sabine Suesstrunk, EPFL Technical Report no. 149300, June 2010.

Maintained by Lampros Mouselimis. Last updated 2 years ago.

filtering gabor-feature-extraction gabor-filters hog-features image image-hashing processing rcpparmadillo recognition slic slico superpixels openblas cpp openmp

14.0 match 60 stars 9.86 score 358 scripts 8 dependents

r-lib

bit64:A S3 Class for Vectors of 64bit Integers

Package 'bit64' provides serializable S3 atomic 64bit (signed) integers. These are useful for handling database keys and exact counting in +-2^63. WARNING: do not use them as replacement for 32bit integers, integer64 are not supported for subscripting by R-core and they have different semantics when combined with double, e.g. integer64 + double => integer64. Class integer64 can be used in vectors, matrices, arrays and data.frames. Methods are available for coercion from and to logicals, integers, doubles, characters and factors as well as many elementwise and summary functions. Many fast algorithmic operations such as 'match' and 'order' support inter- active data exploration and manipulation and optionally leverage caching.

Maintained by Michael Chirico. Last updated 6 days ago.

9.2 match 35 stars 14.91 score 1.5k scripts 3.2k dependents

jacobseedorff21

BranchGLM:Efficient Best Subset Selection for GLMs via Branch and Bound Algorithms

Performs efficient and scalable glm best subset selection using a novel implementation of a branch and bound algorithm. To speed up the model fitting process, a range of optimization methods are implemented in 'RcppArmadillo'. Parallel computation is available using 'OpenMP'.

Maintained by Jacob Seedorff. Last updated 6 months ago.

generalized-linear-models regression statistics subset-selection variable-selection openblas cpp openmp

22.2 match 7 stars 6.20 score 30 scripts

bioc

VennDetail:A package for visualization and extract details

A set of functions to generate high-resolution Venn,Vennpie plot,extract and combine details of these subsets with user datasets in data frame is available.

Maintained by Kai Guo. Last updated 5 months ago.

datarepresentation graphandnetwork extract venndiagram

20.2 match 29 stars 6.75 score 65 scripts

wavx

bioacoustics:Analyse Audio Recordings and Automatically Extract Animal Vocalizations

Contains all the necessary tools to process audio recordings of various formats (e.g., WAV, WAC, MP3, ZC), filter noisy files, display audio signals, detect and extract automatically acoustic features for further analysis such as classification.

Maintained by Jean Marchal. Last updated 1 years ago.

fftw3 cpp

17.7 match 47 stars 7.71 score 72 scripts 5 dependents

edsandorf

spdesign:Designing Stated Preference Experiments

Contemporary software commonly used to design stated preference experiments are expensive and the code is closed source. This is a free software package with an easy to use interface to make flexible stated preference experimental designs using state-of-the-art methods. For an overview of stated choice experimental design theory, see e.g., Rose, J. M. & Bliemer, M. C. J. (2014) in Hess S. & Daly. A. <doi:10.4337/9781781003152>. The package website can be accessed at <https://spdesign.edsandorf.me>. We acknowledge funding from the European Union’s Horizon 2020 research and innovation program under the Marie Sklodowska-Curie grant INSPiRE (Grant agreement ID: 793163).

Maintained by Erlend Dancke Sandorf. Last updated 5 months ago.

29.5 match 4.60 score 20 scripts

jenniniku

gllvm:Generalized Linear Latent Variable Models

Analysis of multivariate data using generalized linear latent variable models (gllvm). Estimation is performed using either the Laplace method, variational approximations, or extended variational approximations, implemented via TMB (Kristensen et al. (2016), <doi:10.18637/jss.v070.i05>).

Maintained by Jenni Niku. Last updated 1 days ago.

cpp openmp

12.8 match 52 stars 10.54 score 176 scripts 1 dependents

bioc

BASiCS:Bayesian Analysis of Single-Cell Sequencing data

Single-cell mRNA sequencing can uncover novel cell-to-cell heterogeneity in gene expression levels in seemingly homogeneous populations of cells. However, these experiments are prone to high levels of technical noise, creating new challenges for identifying genes that show genuine heterogeneous expression within the population of cells under study. BASiCS (Bayesian Analysis of Single-Cell Sequencing data) is an integrated Bayesian hierarchical model to perform statistical analyses of single-cell RNA sequencing datasets in the context of supervised experiments (where the groups of cells of interest are known a priori, e.g. experimental conditions or cell types). BASiCS performs built-in data normalisation (global scaling) and technical noise quantification (based on spike-in genes). BASiCS provides an intuitive detection criterion for highly (or lowly) variable genes within a single group of cells. Additionally, BASiCS can compare gene expression patterns between two or more pre-specified groups of cells. Unlike traditional differential expression tools, BASiCS quantifies changes in expression that lie beyond comparisons of means, also allowing the study of changes in cell-to-cell heterogeneity. The latter can be quantified via a biological over-dispersion parameter that measures the excess of variability that is observed with respect to Poisson sampling noise, after normalisation and technical noise removal. Due to the strong mean/over-dispersion confounding that is typically observed for scRNA-seq datasets, BASiCS also tests for changes in residual over-dispersion, defined by residual values with respect to a global mean/over-dispersion trend.

Maintained by Catalina Vallejos. Last updated 5 months ago.

immunooncology normalization sequencing rnaseq software geneexpression transcriptomics singlecell differentialexpression bayesian cellbiology bioconductor-package gene-expression rcpp rcpparmadillo scrna-seq single-cell openblas cpp openmp

13.0 match 83 stars 10.26 score 368 scripts 1 dependents

usaid-oha-si

gisr:Geospatial Analytics Utility functions

R Spatial functions for HIV/AIDS related Geospatial Analytics.

Maintained by Baboyma Kagniniwa. Last updated 1 years ago.

gis map

24.9 match 2 stars 5.29 score 328 scripts

jhelvy

logitr:Logit Models w/Preference & WTP Space Utility Parameterizations

Fast estimation of multinomial (MNL) and mixed logit (MXL) models in R. Models can be estimated using "Preference" space or "Willingness-to-pay" (WTP) space utility parameterizations. Weighted models can also be estimated. An option is available to run a parallelized multistart optimization loop with random starting points in each iteration, which is useful for non-convex problems like MXL models or models with WTP space utility parameterizations. The main optimization loop uses the 'nloptr' package to minimize the negative log-likelihood function. Additional functions are available for computing and comparing WTP from both preference space and WTP space models and for predicting expected choices and choice probabilities for sets of alternatives based on an estimated model. Mixed logit models can include uncorrelated or correlated heterogeneity covariances and are estimated using maximum simulated likelihood based on the algorithms in Train (2009) <doi:10.1017/CBO9780511805271>. More details can be found in Helveston (2023) <doi:10.18637/jss.v105.i10>.

Maintained by John Helveston. Last updated 5 months ago.

log-likelihood logit logit-model mixed-logit mlogit multinomial-regression mxl mxl-models preference-space preferences willingness-to-pay wtp

14.4 match 54 stars 9.10 score 119 scripts 1 dependents

benjaminhlina

nichetools:Complementary Package to 'nicheROVER' and 'SIBER'

Provides functions complementary to packages 'nicheROVER' and 'SIBER' allowing the user to extract Bayesian estimates from data objects created by the packages 'nicheROVER' and 'SIBER'. Please see the following publications for detailed methods on 'nicheROVER' and 'SIBER' Hansen et al. (2015) <doi:10.1890/14-0235.1>, Jackson et al. (2011) <doi:10.1111/j.1365-2656.2011.01806.x>, and Layman et al. (2007) <doi:10.1890/0012-9658(2007)88[42:CSIRPF]2.0.CO;2>, respectfully.

Maintained by Benjamin L. Hlina. Last updated 5 days ago.

jags cpp

20.5 match 2 stars 6.39 score 17 scripts

alarm-redist

redist:Simulation Methods for Legislative Redistricting

Enables researchers to sample redistricting plans from a pre-specified target distribution using Sequential Monte Carlo and Markov Chain Monte Carlo algorithms. The package allows for the implementation of various constraints in the redistricting process such as geographic compactness and population parity requirements. Tools for analysis such as computation of various summary statistics and plotting functionality are also included. The package implements the SMC algorithm of McCartan and Imai (2023) <doi:10.1214/23-AOAS1763>, the enumeration algorithm of Fifield, Imai, Kawahara, and Kenny (2020) <doi:10.1080/2330443X.2020.1791773>, the Flip MCMC algorithm of Fifield, Higgins, Imai and Tarr (2020) <doi:10.1080/10618600.2020.1739532>, the Merge-split/Recombination algorithms of Carter et al. (2019) <arXiv:1911.01503> and DeFord et al. (2021) <doi:10.1162/99608f92.eb30390f>, and the Short-burst optimization algorithm of Cannon et al. (2020) <arXiv:2011.02288>.

Maintained by Christopher T. Kenny. Last updated 2 months ago.

geospatial gerrymandering redistricting sampling openblas cpp openmp

14.2 match 68 stars 9.17 score 259 scripts

gitdemont

IFC:Tools for Imaging Flow Cytometry

Contains several tools to treat imaging flow cytometry data from 'ImageStream®' and 'FlowSight®' cytometers ('Amnis®' 'Cytek®'). Provides an easy and simple way to read and write .fcs, .rif, .cif and .daf files. Information such as masks, features, regions and populations set within these files can be retrieved for each single cell. In addition, raw data such as images stored can also be accessed. Users, may hopefully increase their productivity thanks to dedicated functions to extract, visualize, manipulate and export 'IFC' data. Toy data example can be installed through the 'IFCdata' package of approximately 32 MB, which is available in a 'drat' repository <https://gitdemont.github.io/IFCdata/>. See file 'COPYRIGHTS' and file 'AUTHORS' for a list of copyright holders and authors.

Maintained by Yohann Demont. Last updated 11 days ago.

cytometry cytometry-data flow flow-cytometry flow-cytometry-analysis flow-cytometry-data flow-cytometry-files ifc image imaging-flow-cytometry imaging-flow-cytometry-data microscopy cpp

24.1 match 4 stars 5.34 score 12 scripts

beckerbenj

eatGADS:Data Management of Large Hierarchical Data

Import 'SPSS' data, handle and change 'SPSS' meta data, store and access large hierarchical data in 'SQLite' data bases.

Maintained by Benjamin Becker. Last updated 25 days ago.

17.5 match 1 stars 7.36 score 34 scripts 1 dependents

paleolimbot

wk:Lightweight Well-Known Geometry Parsing

Provides a minimal R and C++ API for parsing well-known binary and well-known text representation of geometries to and from R-native formats. Well-known binary is compact and fast to parse; well-known text is human-readable and is useful for writing tests. These formats are useful in R only if the information they contain can be accessed in R, for which high-performance functions are provided here.

Maintained by Dewey Dunnington. Last updated 5 months ago.

cpp

10.0 match 47 stars 12.85 score 89 scripts 1.2k dependents

yihui

knitr:A General-Purpose Package for Dynamic Report Generation in R

Provides a general-purpose tool for dynamic report generation in R using Literate Programming techniques.

Maintained by Yihui Xie. Last updated 2 days ago.

dynamic-documents knitr literate-programming rmarkdown sweave

5.4 match 2.4k stars 23.61 score 116k scripts 4.2k dependents

robjhyndman

tsfeatures:Time Series Feature Extraction

Methods for extracting various features from time series data. The features provided are those from Hyndman, Wang and Laptev (2013) <doi:10.1109/ICDMW.2015.104>, Kang, Hyndman and Smith-Miles (2017) <doi:10.1016/j.ijforecast.2016.09.004> and from Fulcher, Little and Jones (2013) <doi:10.1098/rsif.2013.0048>. Features include spectral entropy, autocorrelations, measures of the strength of seasonality and trend, and so on. Users can also define their own feature functions.

Maintained by Rob Hyndman. Last updated 8 months ago.

feature-extraction time-series

11.1 match 254 stars 11.47 score 268 scripts 22 dependents

tidyverse

ggplot2:Create Elegant Data Visualisations Using the Grammar of Graphics

A system for 'declaratively' creating graphics, based on "The Grammar of Graphics". You provide the data, tell 'ggplot2' how to map variables to aesthetics, what graphical primitives to use, and it takes care of the details.

Maintained by Thomas Lin Pedersen. Last updated 11 days ago.

data-visualisation visualisation

5.1 match 6.6k stars 25.10 score 645k scripts 7.5k dependents

wraff

wrMisc:Analyze Experimental High-Throughput (Omics) Data

The efficient treatment and convenient analysis of experimental high-throughput (omics) data gets facilitated through this collection of diverse functions. Several functions address advanced object-conversions, like manipulating lists of lists or lists of arrays, reorganizing lists to arrays or into separate vectors, merging of multiple entries, etc. Another set of functions provides speed-optimized calculation of standard deviation (sd), coefficient of variance (CV) or standard error of the mean (SEM) for data in matrixes or means per line with respect to additional grouping (eg n groups of replicates). A group of functions facilitate dealing with non-redundant information, by indexing unique, adding counters to redundant or eliminating lines with respect redundancy in a given reference-column, etc. Help is provided to identify very closely matching numeric values to generate (partial) distance matrixes for very big data in a memory efficient manner or to reduce the complexity of large data-sets by combining very close values. Other functions help aligning a matrix or data.frame to a reference using partial matching or to mine an experimental setup to extract patterns of replicate samples. Many times large experimental datasets need some additional filtering, adequate functions are provided. Convenient data normalization is supported in various different modes, parameter estimation via permutations or boot-strap as well as flexible testing of multiple pair-wise combinations using the framework of 'limma' is provided, too. Batch reading (or writing) of sets of files and combining data to arrays is supported, too.

Maintained by Wolfgang Raffelsberger. Last updated 7 months ago.

28.7 match 4.44 score 33 scripts 4 dependents

chockemeyer

kst:Knowledge Space Theory

Knowledge space theory by Doignon and Falmagne (1999) <doi:10.1007/978-3-642-58625-5> is a set- and order-theoretical framework, which proposes mathematical formalisms to operationalize knowledge structures in a particular domain. The 'kst' package provides basic functionalities to generate, handle, and manipulate knowledge structures and knowledge spaces.

Maintained by Cord Hockemeyer. Last updated 2 years ago.

37.5 match 6 stars 3.36 score 38 scripts

tidyverse

tidyr:Tidy Messy Data

Tools to help to create tidy data, where each column is a variable, each row is an observation, and each cell contains a single value. 'tidyr' contains tools for changing the shape (pivoting) and hierarchy (nesting and 'unnesting') of a dataset, turning deeply nested lists into rectangular data frames ('rectangling'), and extracting values out of string columns. It also includes tools for working with missing values (both implicit and explicit).

Maintained by Hadley Wickham. Last updated 15 days ago.

tidy-data cpp

5.5 match 1.4k stars 22.88 score 168k scripts 5.5k dependents

easystats

parameters:Processing of Model Parameters

Utilities for processing the parameters of various statistical models. Beyond computing p values, CIs, and other indices for a wide variety of models (see list of supported models using the function 'insight::supported_models()'), this package implements features like bootstrapping or simulating of parameters and models, feature reduction (feature extraction and variable selection) as well as functions to describe data and variable characteristics (e.g. skewness, kurtosis, smoothness or distribution).

Maintained by Daniel Lüdecke. Last updated 13 hours ago.

beta bootstrap ci confidence-intervals data-reduction easystats fa feature-extraction feature-reduction hacktoberfest parameters pca pvalues regression-models robust-statistics standardize standardized-estimates statistical-models

8.0 match 454 stars 15.65 score 1.8k scripts 56 dependents

alexanderrobitzsch

CDM:Cognitive Diagnosis Modeling

Functions for cognitive diagnosis modeling and multidimensional item response modeling for dichotomous and polytomous item responses. This package enables the estimation of the DINA and DINO model (Junker & Sijtsma, 2001, <doi:10.1177/01466210122032064>), the multiple group (polytomous) GDINA model (de la Torre, 2011, <doi:10.1007/s11336-011-9207-7>), the multiple choice DINA model (de la Torre, 2009, <doi:10.1177/0146621608320523>), the general diagnostic model (GDM; von Davier, 2008, <doi:10.1348/000711007X193957>), the structured latent class model (SLCA; Formann, 1992, <doi:10.1080/01621459.1992.10475229>) and regularized latent class analysis (Chen, Li, Liu, & Ying, 2017, <doi:10.1007/s11336-016-9545-6>). See George, Robitzsch, Kiefer, Gross, and Uenlue (2017) <doi:10.18637/jss.v074.i02> or Robitzsch and George (2019, <doi:10.1007/978-3-030-05584-4_26>) for further details on estimation and the package structure. For tutorials on how to use the CDM package see George and Robitzsch (2015, <doi:10.20982/tqmp.11.3.p189>) as well as Ravand and Robitzsch (2015).

Maintained by Alexander Robitzsch. Last updated 9 months ago.

cognitive-diagnostic-models item-response-theory cpp

14.2 match 22 stars 8.76 score 138 scripts 28 dependents

pecanproject

PEcAn.data.remote:PEcAn Functions Used for Extracting Remote Sensing Data

PEcAn module for processing remote data. Python module requirements: requests, json, re, ast, panads, sys. If any of these modules are missing, install using pip install <module name>.

Maintained by Bailey Morrison. Last updated 4 days ago.

bayesian cyberinfrastructure data-assimilation data-science ecosystem-model ecosystem-science forecasting meta-analysis national-science-foundation pecan plants

14.0 match 216 stars 8.74 score 6 scripts 5 dependents

bioc

iSEE:Interactive SummarizedExperiment Explorer

Create an interactive Shiny-based graphical user interface for exploring data stored in SummarizedExperiment objects, including row- and column-level metadata. The interface supports transmission of selections between plots and tables, code tracking, interactive tours, interactive or programmatic initialization, preservation of app state, and extensibility to new panel types via S4 classes. Special attention is given to single-cell data in a SingleCellExperiment object with visualization of dimensionality reduction results.

Maintained by Kevin Rue-Albrecht. Last updated 12 days ago.

cellbasedassays clustering dimensionreduction featureextraction geneexpression gui immunooncology shinyapps singlecell transcription transcriptomics visualization dimension-reduction feature-extraction gene-expression hacktoberfest human-cell-atlas shiny single-cell

9.5 match 225 stars 12.86 score 380 scripts 9 dependents

adibender

pammtools:Piece-Wise Exponential Additive Mixed Modeling Tools for Survival Analysis

The Piece-wise exponential (Additive Mixed) Model (PAMM; Bender and others (2018) <doi: 10.1177/1471082X17748083>) is a powerful model class for the analysis of survival (or time-to-event) data, based on Generalized Additive (Mixed) Models (GA(M)Ms). It offers intuitive specification and robust estimation of complex survival models with stratified baseline hazards, random effects, time-varying effects, time-dependent covariates and cumulative effects (Bender and others (2019)), as well as support for left-truncated, competing risks and recurrent events data. pammtools provides tidy workflow for survival analysis with PAMMs, including data simulation, transformation and other functions for data preprocessing and model post-processing as well as visualization.

Maintained by Andreas Bender. Last updated 2 months ago.

additive-models pamm pammtools piece-wise-exponential survival-analysis

13.9 match 48 stars 8.78 score 310 scripts 8 dependents

mikejohnson51

climateR:climateR

Find, subset, and retrive geospatial data by AOI.

Maintained by Mike Johnson. Last updated 3 months ago.

aoi climate dataset geospatial gridded-climate-data weather

13.9 match 187 stars 8.74 score 156 scripts 1 dependents

patzaw

TKCat:Tailored Knowledge Catalog

Facilitate the management of data from knowledge resources that are frequently used alone or together in research environments. In 'TKCat', knowledge resources are manipulated as modeled database (MDB) objects. These objects provide access to the data tables along with a general description of the resource and a detail data model documenting the tables, their fields and their relationships. These MDBs are then gathered in catalogs that can be easily explored an shared. Finally, 'TKCat' provides tools to easily subset, filter and combine MDBs and create new catalogs suited for specific needs.

Maintained by Patrice Godard. Last updated 20 hours ago.

19.8 match 5 stars 6.13 score 27 scripts

amices

mice:Multivariate Imputation by Chained Equations

Multiple imputation using Fully Conditional Specification (FCS) implemented by the MICE algorithm as described in Van Buuren and Groothuis-Oudshoorn (2011) <doi:10.18637/jss.v045.i03>. Each variable has its own imputation model. Built-in imputation models are provided for continuous data (predictive mean matching, normal), binary data (logistic regression), unordered categorical data (polytomous logistic regression) and ordered categorical data (proportional odds). MICE can also impute continuous two-level data (normal model, pan, second-level variables). Passive imputation can be used to maintain consistency between variables. Various diagnostic plots are available to inspect the quality of the imputations.

Maintained by Stef van Buuren. Last updated 8 days ago.

chained-equations fcs imputation mice missing-data missing-values multiple-imputation multivariate-data cpp

7.3 match 462 stars 16.50 score 10k scripts 154 dependents

hzambran

hydroTSM:Time Series Management and Analysis for Hydrological Modelling

S3 functions for management, analysis, interpolation and plotting of time series used in hydrology and related environmental sciences. In particular, this package is highly oriented to hydrological modelling tasks. The focus of this package has been put in providing a collection of tools useful for the daily work of hydrologists (although an effort was made to optimise each function as much as possible, functionality has had priority over speed). Bugs / comments / questions / collaboration of any kind are very welcomed, and in particular, datasets that can be included in this package for academic purposes.

Maintained by Mauricio Zambrano-Bigiarini. Last updated 1 months ago.

hydrology hydrology-modeling hydrology-statistical resource water-resources

11.9 match 45 stars 10.14 score 340 scripts 10 dependents

tidymodels

hardhat:Construct Modeling Packages

Building modeling packages is hard. A large amount of effort generally goes into providing an implementation for a new method that is efficient, fast, and correct, but often less emphasis is put on the user interface. A good interface requires specialized knowledge about S3 methods and formulas, which the average package developer might not have. The goal of 'hardhat' is to reduce the burden around building new modeling packages by providing functionality for preprocessing, predicting, and validating input.

Maintained by Hannah Frick. Last updated 2 months ago.

8.1 match 103 stars 14.88 score 175 scripts 436 dependents

khliland

baseline:Baseline Correction of Spectra

Collection of baseline correction algorithms, along with a framework and a Tcl/Tk enabled GUI for optimising baseline algorithm parameters. Typical use of the package is for removing background effects from spectra originating from various types of spectroscopy and spectrometry, possibly optimizing this with regard to regression or classification results. Correction methods include polynomial fitting, weighted local smoothers and many more.

Maintained by Kristian Hovde Liland. Last updated 10 months ago.

16.9 match 9 stars 7.08 score 74 scripts 12 dependents

stan-dev

posterior:Tools for Working with Posterior Distributions

Provides useful tools for both users and developers of packages for fitting Bayesian models or working with output from Bayesian models. The primary goals of the package are to: (a) Efficiently convert between many different useful formats of draws (samples) from posterior or prior distributions. (b) Provide consistent methods for operations commonly performed on draws, for example, subsetting, binding, or mutating draws. (c) Provide various summaries of draws in convenient formats. (d) Provide lightweight implementations of state of the art posterior inference diagnostics. References: Vehtari et al. (2021) <doi:10.1214/20-BA1221>.

Maintained by Paul-Christian Bürkner. Last updated 12 days ago.

bayes bayesian mcmc

7.4 match 168 stars 16.13 score 3.3k scripts 342 dependents

bioc

IsoformSwitchAnalyzeR:Identify, Annotate and Visualize Isoform Switches with Functional Consequences from both short- and long-read RNA-seq data

Analysis of alternative splicing and isoform switches with predicted functional consequences (e.g. gain/loss of protein domains etc.) from quantification of all types of RNASeq by tools such as Kallisto, Salmon, StringTie, Cufflinks/Cuffdiff etc.

Maintained by Kristoffer Vitting-Seerup. Last updated 5 months ago.

geneexpression transcription alternativesplicing differentialexpression differentialsplicing visualization statisticalmethod transcriptomevariant biomedicalinformatics functionalgenomics systemsbiology transcriptomics rnaseq annotation functionalprediction geneprediction dataimport multiplecomparison batcheffect immunooncology

12.8 match 108 stars 9.26 score 125 scripts

rspatial

raster:Geographic Data Analysis and Modeling

Reading, writing, manipulating, analyzing and modeling of spatial data. This package has been superseded by the "terra" package <https://CRAN.R-project.org/package=terra>.

Maintained by Robert J. Hijmans. Last updated 2 months ago.

cpp

7.0 match 164 stars 17.05 score 58k scripts 555 dependents

luukvdmeer

sfnetworks:Tidy Geospatial Networks

Provides a tidy approach to spatial network analysis, in the form of classes and functions that enable a seamless interaction between the network analysis package 'tidygraph' and the spatial analysis package 'sf'.

Maintained by Lucas van der Meer. Last updated 3 months ago.

geospatial-networks network-analysis rspatial simple-features spatial-analysis spatial-data-science spatial-networks tidygraph tidyverse

12.3 match 372 stars 9.63 score 332 scripts 6 dependents

pbastide

PhylogeneticEM:Automatic Shift Detection using a Phylogenetic EM

Implementation of the automatic shift detection method for Brownian Motion (BM) or Ornstein–Uhlenbeck (OU) models of trait evolution on phylogenies. Some tools to handle equivalent shifts configurations are also available. See Bastide et al. (2017) <doi:10.1111/rssb.12206> and Bastide et al. (2018) <doi:10.1093/sysbio/syy005>.

Maintained by Paul Bastide. Last updated 1 months ago.

openblas cpp

17.0 match 16 stars 6.96 score 47 scripts

epiverse-trace

epiparameter:Classes and Helper Functions for Working with Epidemiological Parameters

Classes and helper functions for loading, extracting, converting, manipulating, plotting and aggregating epidemiological parameters for infectious diseases. Epidemiological parameters extracted from the literature are loaded from the 'epiparameterDB' R package.

Maintained by Joshua W. Lambert. Last updated 2 months ago.

data-access data-package epidemiology epiverse probability-distribution

11.9 match 33 stars 9.84 score 102 scripts 1 dependents

eco-hydro

phenofit:Extract Remote Sensing Vegetation Phenology

The merits of 'TIMESAT' and 'phenopix' are adopted. Besides, a simple and growing season dividing method and a practical snow elimination method based on Whittaker were proposed. 7 curve fitting methods and 4 phenology extraction methods were provided. Parameters boundary are considered for every curve fitting methods according to their ecological meaning. And 'optimx' is used to select best optimization method for different curve fitting methods. Reference: Kong, D., (2020). R package: A state-of-the-art Vegetation Phenology extraction package, phenofit version 0.3.1, <doi:10.5281/zenodo.5150204>; Kong, D., Zhang, Y., Wang, D., Chen, J., & Gu, X. (2020). Photoperiod Explains the Asynchronization Between Vegetation Carbon Phenology and Vegetation Greenness Phenology. Journal of Geophysical Research: Biogeosciences, 125(8), e2020JG005636. <doi:10.1029/2020JG005636>; Kong, D., Zhang, Y., Gu, X., & Wang, D. (2019). A robust method for reconstructing global MODIS EVI time series on the Google Earth Engine. ISPRS Journal of Photogrammetry and Remote Sensing, 155, 13–24; Zhang, Q., Kong, D., Shi, P., Singh, V.P., Sun, P., 2018. Vegetation phenology on the Qinghai-Tibetan Plateau and its response to climate change (1982–2013). Agric. For. Meteorol. 248, 408–417. <doi:10.1016/j.agrformet.2017.10.026>.

Maintained by Dongdong Kong. Last updated 1 months ago.

phenology remote-sensing openblas cpp openmp

15.2 match 78 stars 7.71 score 332 scripts

alexanderrobitzsch

TAM:Test Analysis Modules

Includes marginal maximum likelihood estimation and joint maximum likelihood estimation for unidimensional and multidimensional item response models. The package functionality covers the Rasch model, 2PL model, 3PL model, generalized partial credit model, multi-faceted Rasch model, nominal item response model, structured latent class model, mixture distribution IRT models, and located latent class models. Latent regression models and plausible value imputation are also supported. For details see Adams, Wilson and Wang, 1997 <doi:10.1177/0146621697211001>, Adams, Wilson and Wu, 1997 <doi:10.3102/10769986022001047>, Formann, 1982 <doi:10.1002/bimj.4710240209>, Formann, 1992 <doi:10.1080/01621459.1992.10475229>.

Maintained by Alexander Robitzsch. Last updated 6 months ago.

item-response-theory openblas cpp

13.1 match 16 stars 8.93 score 258 scripts 25 dependents

branchlab

metasnf:Meta Clustering with Similarity Network Fusion

Framework to facilitate patient subtyping with similarity network fusion and meta clustering. The similarity network fusion (SNF) algorithm was introduced by Wang et al. (2014) in <doi:10.1038/nmeth.2810>. SNF is a data integration approach that can transform high-dimensional and diverse data types into a single similarity network suitable for clustering with minimal loss of information from each initial data source. The meta clustering approach was introduced by Caruana et al. (2006) in <doi:10.1109/ICDM.2006.103>. Meta clustering involves generating a wide range of cluster solutions by adjusting clustering hyperparameters, then clustering the solutions themselves into a manageable number of qualitatively similar solutions, and finally characterizing representative solutions to find ones that are best for the user's specific context. This package provides a framework to easily transform multi-modal data into a wide range of similarity network fusion-derived cluster solutions as well as to visualize, characterize, and validate those solutions. Core package functionality includes easy customization of distance metrics, clustering algorithms, and SNF hyperparameters to generate diverse clustering solutions; calculation and plotting of associations between features, between patients, and between cluster solutions; and standard cluster validation approaches including resampled measures of cluster stability, standard metrics of cluster quality, and label propagation to evaluate generalizability in unseen data. Associated vignettes guide the user through using the package to identify patient subtypes while adhering to best practices for unsupervised learning.

Maintained by Prashanth S Velayudhan. Last updated 7 days ago.

bioinformatics clustering metaclustering snf

14.2 match 8 stars 8.21 score 30 scripts

stan-dev

rstanarm:Bayesian Applied Regression Modeling via Stan

Estimates previously compiled regression models using the 'rstan' package, which provides the R interface to the Stan C++ library for Bayesian estimation. Users specify models via the customary R syntax with a formula and data.frame plus some additional arguments for priors.

Maintained by Ben Goodrich. Last updated 9 months ago.

bayesian bayesian-data-analysis bayesian-inference bayesian-methods bayesian-statistics multilevel-models rstan rstanarm stan statistical-modeling cpp

7.4 match 393 stars 15.68 score 5.0k scripts 13 dependents

bioc

PREDA:Position Related Data Analysis

Package for the position related analysis of quantitative functional genomics data.

Maintained by Francesco Ferrari. Last updated 5 months ago.

software copynumbervariation geneexpression genetics

26.9 match 4.30 score 9 scripts

europeanifcbgroup

iRfcb:Tools for Managing Imaging FlowCytobot (IFCB) Data

A comprehensive suite of tools for managing, processing, and analyzing data from the IFCB. I R FlowCytobot ('iRfcb') supports quality control, geospatial analysis, and preparation of IFCB data for publication in databases like <https://www.gbif.org>, <https://www.obis.org>, <https://emodnet.ec.europa.eu/en>, <https://shark.smhi.se/>, and <https://www.ecotaxa.org>. The package integrates with the MATLAB 'ifcb-analysis' tool, which is described in Sosik and Olson (2007) <doi:10.4319/lom.2007.5.204>, and provides features for working with raw, manually classified, and machine learning–classified image datasets. Key functionalities include image extraction, particle size distribution analysis, taxonomic data handling, and biomass concentration calculations, essential for plankton research.

Maintained by Anders Torstensson. Last updated 2 days ago.

20.2 match 1 stars 5.72 score

ycroissant

plm:Linear Models for Panel Data

A set of estimators for models and (robust) covariance matrices, and tests for panel data econometrics, including within/fixed effects, random effects, between, first-difference, nested random effects as well as instrumental-variable (IV) and Hausman-Taylor-style models, panel generalized method of moments (GMM) and general FGLS models, mean groups (MG), demeaned MG, and common correlated effects (CCEMG) and pooled (CCEP) estimators with common factors, variable coefficients and limited dependent variables models. Test functions include model specification, serial correlation, cross-sectional dependence, panel unit root and panel Granger (non-)causality. Typical references are general econometrics text books such as Baltagi (2021), Econometric Analysis of Panel Data (<doi:10.1007/978-3-030-53953-5>), Hsiao (2014), Analysis of Panel Data (<doi:10.1017/CBO9781139839327>), and Croissant and Millo (2018), Panel Data Econometrics with R (<doi:10.1002/9781119504641>).

Maintained by Kevin Tappe. Last updated 2 days ago.

9.5 match 59 stars 12.06 score 39 dependents

stan-dev

loo:Efficient Leave-One-Out Cross-Validation and WAIC for Bayesian Models

Efficient approximate leave-one-out cross-validation (LOO) for Bayesian models fit using Markov chain Monte Carlo, as described in Vehtari, Gelman, and Gabry (2017) <doi:10.1007/s11222-016-9696-4>. The approximation uses Pareto smoothed importance sampling (PSIS), a new procedure for regularizing importance weights. As a byproduct of the calculations, we also obtain approximate standard errors for estimated predictive errors and for the comparison of predictive errors between models. The package also provides methods for using stacking and other model weighting techniques to average Bayesian predictive distributions.

Maintained by Jonah Gabry. Last updated 5 days ago.

bayes bayesian bayesian-data-analysis bayesian-inference bayesian-methods bayesian-statistics cross-validation information-criterion model-comparison stan

6.6 match 152 stars 17.30 score 2.6k scripts 297 dependents

bioc

tidytof:Analyze High-dimensional Cytometry Data Using Tidy Data Principles

This package implements an interactive, scientific analysis pipeline for high-dimensional cytometry data built using tidy data principles. It is specifically designed to play well with both the tidyverse and Bioconductor software ecosystems, with functionality for reading/writing data files, data cleaning, preprocessing, clustering, visualization, modeling, and other quality-of-life functions. tidytof implements a "grammar" of high-dimensional cytometry data analysis.

Maintained by Timothy Keyes. Last updated 5 months ago.

singlecell flowcytometry bioinformatics cytometry data-science single-cell tidyverse cpp

15.6 match 18 stars 7.24 score 35 scripts

stevenvb12

patternize:Quantification of Color Pattern Variation

Quantification of variation in organismal color patterns as obtained from image data. Patternize defines homology between pattern positions across images either through fixed landmarks or image registration. Pattern identification is performed by categorizing the distribution of colors using RGB thresholds or image segmentation.

Maintained by Steven Van Belleghem. Last updated 9 months ago.

22.6 match 31 stars 4.97 score 30 scripts

open-aims

bayesnec:A Bayesian No-Effect- Concentration (NEC) Algorithm

Implementation of No-Effect-Concentration estimation that uses 'brms' (see Burkner (2017)<doi:10.18637/jss.v080.i01>; Burkner (2018)<doi:10.32614/RJ-2018-017>; Carpenter 'et al.' (2017)<doi:10.18637/jss.v076.i01> to fit concentration(dose)-response data using Bayesian methods for the purpose of estimating 'ECx' values, but more particularly 'NEC' (see Fox (2010)<doi:10.1016/j.ecoenv.2009.09.012>), 'NSEC' (see Fisher and Fox (2023)<doi:10.1002/etc.5610>), and 'N(S)EC (see Fisher et al. 2023<doi:10.1002/ieam.4809>). A full description of this package can be found in Fisher 'et al.' (2024)<doi:10.18637/jss.v110.i05>. This package expands and supersedes an original version implemented in 'R2jags' (see Su and Yajima (2020)<https://CRAN.R-project.org/package=R2jags>; Fisher et al. (2020)<doi:10.5281/ZENODO.3966864>).

Maintained by Rebecca Fisher. Last updated 7 months ago.

bayesian-inference concentration-response ecotoxicology no-effect-concentration non-linear-decay threshold-derivation toxicology

13.6 match 13 stars 8.15 score 360 scripts

ironholds

urltools:Vectorised Tools for URL Handling and Parsing

A toolkit for all URL-handling needs, including encoding and decoding, parsing, parameter extraction and modification. All functions are designed to be both fast and entirely vectorised. It is intended to be useful for people dealing with web-related datasets, such as server-side logs, although may be useful for other situations involving large sets of URLs.

Maintained by Os Keyes. Last updated 4 years ago.

access-logs data-import url cpp

8.2 match 131 stars 13.43 score 968 scripts 264 dependents

bioc

dreamlet:Scalable differential expression analysis of single cell transcriptomics datasets with complex study designs

Recent advances in single cell/nucleus transcriptomic technology has enabled collection of cohort-scale datasets to study cell type specific gene expression differences associated disease state, stimulus, and genetic regulation. The scale of these data, complex study designs, and low read count per cell mean that characterizing cell type specific molecular mechanisms requires a user-frieldly, purpose-build analytical framework. We have developed the dreamlet package that applies a pseudobulk approach and fits a regression model for each gene and cell cluster to test differential expression across individuals associated with a trait of interest. Use of precision-weighted linear mixed models enables accounting for repeated measures study designs, high dimensional batch effects, and varying sequencing depth or observed cells per biosample.

Maintained by Gabriel Hoffman. Last updated 5 months ago.

rnaseq geneexpression differentialexpression batcheffect qualitycontrol regression genesetenrichment generegulation epigenetics functionalgenomics transcriptomics normalization singlecell preprocessing sequencing immunooncology software cpp

13.4 match 12 stars 7.99 score 128 scripts

bbolker

broom.mixed:Tidying Methods for Mixed Models

Convert fitted objects from various R mixed-model packages into tidy data frames along the lines of the 'broom' package. The package provides three S3 generics for each model: tidy(), which summarizes a model's statistical findings such as coefficients of a regression; augment(), which adds columns to the original data such as predictions, residuals and cluster assignments; and glance(), which provides a one-row summary of model-level statistics.

Maintained by Ben Bolker. Last updated 3 months ago.

7.0 match 231 stars 15.22 score 4.0k scripts 37 dependents

ropensci

RNeXML:Semantically Rich I/O for the 'NeXML' Format

Provides access to phyloinformatic data in 'NeXML' format. The package should add new functionality to R such as the possibility to manipulate 'NeXML' objects in more various and refined way and compatibility with 'ape' objects.

Maintained by Carl Boettiger. Last updated 11 months ago.

metadata nexml phylogenetics linked-data

10.6 match 13 stars 9.92 score 100 scripts 19 dependents

phuse-org

sendigR:Enable Cross-Study Analysis of 'CDISC' 'SEND' Datasets

A system enables cross study Analysis by extracting and filtering study data for control animals from 'CDISC' 'SEND' Study Repository. These data types are supported: Body Weights, Laboratory test results and Microscopic findings. These database types are supported: 'SQLite' and 'Oracle'.

Maintained by Wenxian Wang. Last updated 12 days ago.

16.7 match 12 stars 6.28 score 6 scripts

atorus-research

Tplyr:A Traceability Focused Grammar of Clinical Data Summary

A traceability focused tool created to simplify the data manipulation necessary to create clinical summaries.

Maintained by Mike Stackhouse. Last updated 1 years ago.

pharma tables

11.0 match 95 stars 9.49 score 138 scripts 2 dependents

bstatcomp

bayes4psy:User Friendly Bayesian Data Analysis for Psychology

Contains several Bayesian models for data analysis of psychological tests. A user friendly interface for these models should enable students and researchers to perform professional level Bayesian data analysis without advanced knowledge in programming and Bayesian statistics. This package is based on the Stan platform (Carpenter et el. 2017 <doi:10.18637/jss.v076.i01>).

Maintained by Jure Demšar. Last updated 1 years ago.

cpp

16.0 match 14 stars 6.44 score 33 scripts

ropensci

jstor:Read Data from JSTOR/DfR

Functions and helpers to import metadata, ngrams and full-texts delivered by Data for Research by JSTOR.

Maintained by Thomas Klebel. Last updated 8 months ago.

jstor peer-reviewed text-analysis text-mining

14.1 match 47 stars 7.29 score 55 scripts

ropensci

osmextract:Download and Import Open Street Map Data Extracts

Match, download, convert and import Open Street Map data extracts obtained from several providers.

Maintained by Andrea Gilardi. Last updated 2 months ago.

geo geofabrik-zone open-data osm osm-pbf

10.4 match 173 stars 9.73 score 342 scripts

mindthegap-erc

admtools:Estimate and Manipulate Age-Depth Models

Estimate age-depth models from stratigraphic and sedimentological data, and transform data between the time and stratigraphic domain.

Maintained by Niklas Hohmann. Last updated 3 months ago.

age-depth-model geochronology sedimentology stratigraphy

14.4 match 4 stars 7.01 score 34 scripts 1 dependents

ropensci

rtika:R Interface to 'Apache Tika'

Extract text or metadata from over a thousand file types, using Apache Tika <https://tika.apache.org/>. Get either plain text or structured XHTML content.

Maintained by Sasha Goodman. Last updated 2 years ago.

extract-metadata extract-text java parse pdf-files peer-reviewed tesseract tika

16.9 match 55 stars 6.00 score 12 scripts

ropensci

rsat:Dealing with Multiplatform Satellite Images

Downloading, customizing, and processing time series of satellite images for a region of interest. 'rsat' functions allow a unified access to multispectral images from Landsat, MODIS and Sentinel repositories. 'rsat' also offers capabilities for customizing satellite images, such as tile mosaicking, image cropping and new variables computation. Finally, 'rsat' covers the processing, including cloud masking, compositing and gap-filling/smoothing time series of images (Militino et al., 2018 <doi:10.3390/rs10030398> and Militino et al., 2019 <doi:10.1109/TGRS.2019.2904193>).

Maintained by Unai Pérez - Goya. Last updated 11 months ago.

satellite-images

13.6 match 54 stars 7.45 score 52 scripts

davidearn

epigrowthfit:Nonlinear Mixed Effects Models of Epidemic Growth

Maximum likelihood estimation of nonlinear mixed effects models of epidemic growth using Template Model Builder ('TMB'). Enables joint estimation for collections of disease incidence time series, including time series that describe multiple epidemic waves. Supports a set of widely used phenomenological models: exponential, logistic, Richards (generalized logistic), subexponential, and Gompertz. Provides methods for interrogating model objects and several auxiliary functions, including one for computing basic reproduction numbers from fitted values of the initial exponential growth rate. Preliminary versions of this software were applied in Ma et al. (2014) <doi:10.1007/s11538-013-9918-2> and in Earn et al. (2020) <doi:10.1073/pnas.2004904117>.

Maintained by Mikael Jagan. Last updated 26 days ago.

cpp openmp

17.5 match 9 stars 5.75 score 45 scripts

denisrustand

INLAjoint:Multivariate Joint Modeling for Longitudinal and Time-to-Event Outcomes with 'INLA'

Estimation of joint models for multivariate longitudinal markers (with various distributions available) and survival outcomes (possibly accounting for competing risks) with Integrated Nested Laplace Approximations (INLA). The flexible and user friendly function joint() facilitates the use of the fast and reliable inference technique implemented in the 'INLA' package for joint modeling. More details are given in the help page of the joint() function (accessible via ?joint in the R console) and the vignette associated to the joint() function (accessible via vignette("INLAjoint") in the R console).

Maintained by Denis Rustand. Last updated 26 days ago.

13.7 match 19 stars 7.36 score 40 scripts

c1au6i0

extractox:Extract Tox Info from Various Databases

Extract toxicological and chemical information from databases maintained by scientific agencies and resources, including the Comparative Toxicogenomics Database <https://ctdbase.org/>, the Integrated Chemical Environment <https://ice.ntp.niehs.nih.gov/>, the Integrated Risk Information System <https://cfpub.epa.gov/ncea/iris/>, Provisional Peer-Reviewed Toxicity Values <https://www.epa.gov/pprtv/provisional-peer-reviewed-toxicity-values-pprtvs-assessments>, the CompTox Chemicals Dashboard Resource Hub <https://www.epa.gov/comptox-tools/comptox-chemicals-dashboard-resource-hub>, PubChem <https://pubchem.ncbi.nlm.nih.gov/>, and others.

Maintained by Claudio Zanettini. Last updated 1 months ago.

21.9 match 3 stars 4.59 score 3 scripts

quanteda

quanteda:Quantitative Analysis of Textual Data

A fast, flexible, and comprehensive framework for quantitative text analysis in R. Provides functionality for corpus management, creating and manipulating tokens and n-grams, exploring keywords in context, forming and manipulating sparse matrices of documents by features and feature co-occurrences, analyzing keywords, computing feature similarities and distances, applying content dictionaries, applying supervised and unsupervised machine learning, visually representing text and text analyses, and more.

Maintained by Kenneth Benoit. Last updated 2 months ago.

corpus natural-language-processing quanteda text-analytics onetbb cpp

6.0 match 851 stars 16.68 score 5.4k scripts 51 dependents

bioc

musicatk:Mutational Signature Comprehensive Analysis Toolkit

Mutational signatures are carcinogenic exposures or aberrant cellular processes that can cause alterations to the genome. We created musicatk (MUtational SIgnature Comprehensive Analysis ToolKit) to address shortcomings in versatility and ease of use in other pre-existing computational tools. Although many different types of mutational data have been generated, current software packages do not have a flexible framework to allow users to mix and match different types of mutations in the mutational signature inference process. Musicatk enables users to count and combine multiple mutation types, including SBS, DBS, and indels. Musicatk calculates replication strand, transcription strand and combinations of these features along with discovery from unique and proprietary genomic feature associated with any mutation type. Musicatk also implements several methods for discovery of new signatures as well as methods to infer exposure given an existing set of signatures. Musicatk provides functions for visualization and downstream exploratory analysis including the ability to compare signatures between cohorts and find matching signatures in COSMIC V2 or COSMIC V3.

Maintained by Joshua D. Campbell. Last updated 5 months ago.

software biologicalquestion somaticmutation variantannotation

14.3 match 13 stars 7.02 score 20 scripts

choonghyunryu

dlookr:Tools for Data Diagnosis, Exploration, Transformation

A collection of tools that support data diagnosis, exploration, and transformation. Data diagnostics provides information and visualization of missing values, outliers, and unique and negative values to help you understand the distribution and quality of your data. Data exploration provides information and visualization of the descriptive statistics of univariate variables, normality tests and outliers, correlation of two variables, and the relationship between the target variable and predictor. Data transformation supports binning for categorizing continuous variables, imputes missing values and outliers, and resolves skewness. And it creates automated reports that support these three tasks.

Maintained by Choonghyun Ryu. Last updated 9 months ago.

9.1 match 212 stars 11.05 score 748 scripts 2 dependents

insightsengineering

rbmi:Reference Based Multiple Imputation

Implements standard and reference based multiple imputation methods for continuous longitudinal endpoints (Gower-Page et al. (2022) <doi:10.21105/joss.04251>). In particular, this package supports deterministic conditional mean imputation and jackknifing as described in Wolbers et al. (2022) <doi:10.1002/pst.2234>, Bayesian multiple imputation as described in Carpenter et al. (2013) <doi:10.1080/10543406.2013.834911>, and bootstrapped maximum likelihood imputation as described in von Hippel and Bartlett (2021) <doi: 10.1214/20-STS793>.

Maintained by Isaac Gravestock. Last updated 25 days ago.

11.3 match 18 stars 8.78 score 33 scripts 1 dependents

ropenspain

opendataes:Interact with the datos.gob.es API to download public data from all of Spain

Easily interact with the API from http://datos.gob.es to download data over 19,000 files from all different provinces of Spain.

Maintained by Jorge Cimentada. Last updated 4 years ago.

27.6 match 20 stars 3.60 score 9 scripts

bioc

limma:Linear Models for Microarray and Omics Data

Data analysis, linear models and differential expression for omics data.

Maintained by Gordon Smyth. Last updated 7 days ago.

exonarray geneexpression transcription alternativesplicing differentialexpression differentialsplicing genesetenrichment dataimport bayesian clustering regression timecourse microarray micrornaarray mrnamicroarray onechannel proprietaryplatforms twochannel sequencing rnaseq batcheffect multiplecomparison normalization preprocessing qualitycontrol biomedicalinformatics cellbiology cheminformatics epigenetics functionalgenomics genetics immunooncology metabolomics proteomics systemsbiology transcriptomics

7.2 match 13.81 score 16k scripts 585 dependents

rpahl

container:Extending Base 'R' Lists

Extends the functionality of base 'R' lists and provides specialized data structures 'deque', 'set', 'dict', and 'dict.table', the latter to extend the 'data.table' package.

Maintained by Roman Pahl. Last updated 2 months ago.

container data-structures deque dict sets

13.9 match 16 stars 7.13 score 140 scripts

helske

KFAS:Kalman Filter and Smoother for Exponential Family State Space Models

State space modelling is an efficient and flexible framework for statistical inference of a broad class of time series and other data. KFAS includes computationally efficient functions for Kalman filtering, smoothing, forecasting, and simulation of multivariate exponential family state space models, with observations from Gaussian, Poisson, binomial, negative binomial, and gamma distributions. See the paper by Helske (2017) <doi:10.18637/jss.v078.i10> for details.

Maintained by Jouni Helske. Last updated 6 months ago.

dynamic-linear-model exponential-family fortran gaussian-models state-space time-series openblas

9.0 match 64 stars 10.97 score 242 scripts 16 dependents

ropensci

dynamite:Bayesian Modeling and Causal Inference for Multivariate Longitudinal Data

Easy-to-use and efficient interface for Bayesian inference of complex panel (time series) data using dynamic multivariate panel models by Helske and Tikka (2024) <doi:10.1016/j.alcr.2024.100617>. The package supports joint modeling of multiple measurements per individual, time-varying and time-invariant effects, and a wide range of discrete and continuous distributions. Estimation of these dynamic multivariate panel models is carried out via 'Stan'. For an in-depth tutorial of the package, see (Tikka and Helske, 2024) <doi:10.48550/arXiv.2302.01607>.

Maintained by Santtu Tikka. Last updated 21 days ago.

bayesian-inference panel-data stan statistical-models

12.4 match 29 stars 7.92 score 20 scripts

khamidieh

RND:Risk Neutral Density Extraction Package

Extract the implied risk neutral density from options using various methods.

Maintained by Kam Hamidieh. Last updated 8 years ago.

35.1 match 1 stars 2.80 score 70 scripts

rstudio

gt:Easily Create Presentation-Ready Display Tables

Build display tables from tabular data with an easy-to-use set of functions. With its progressive approach, we can construct display tables with a cohesive set of table parts. Table values can be formatted using any of the included formatting functions. Footnotes and cell styles can be precisely added through a location targeting system. The way in which 'gt' handles things for you means that you don't often have to worry about the fine details.

Maintained by Richard Iannone. Last updated 13 days ago.

docx easy-to-use html latex rtf summary-tables

5.3 match 2.1k stars 18.36 score 20k scripts 112 dependents

ohdsi

DatabaseConnector:Connecting to Various Database Platforms

An R 'DataBase Interface' ('DBI') compatible interface to various database platforms ('PostgreSQL', 'Oracle', 'Microsoft SQL Server', 'Amazon Redshift', 'Microsoft Parallel Database Warehouse', 'IBM Netezza', 'Apache Impala', 'Google BigQuery', 'Snowflake', 'Spark', 'SQLite', and 'InterSystems IRIS'). Also includes support for fetching data as 'Andromeda' objects. Uses either 'Java Database Connectivity' ('JDBC') or other 'DBI' drivers to connect to databases.

Maintained by Martijn Schuemie. Last updated 2 months ago.

hades openjdk

7.8 match 56 stars 12.63 score 772 scripts 11 dependents

easystats

insight:Easy Access to Model Information for Various Model Objects

A tool to provide an easy, intuitive and consistent access to information contained in various R models, like model formulas, model terms, information about random effects, data that was used to fit the model or data from response variables. 'insight' mainly revolves around two types of functions: Functions that find (the names of) information, starting with 'find_', and functions that get the underlying data, starting with 'get_'. The package has a consistent syntax and works with many different model objects, where otherwise functions to access these information are missing.

Maintained by Daniel Lüdecke. Last updated 13 hours ago.

easystats hacktoberfest insight models names predictors random

5.6 match 412 stars 17.24 score 568 scripts 211 dependents

chop-cgtinformatics

REDCapTidieR:Extract 'REDCap' Databases into Tidy 'Tibble's

Convert 'REDCap' exports into tidy tables for easy handling of 'REDCap' repeat instruments and event arms.

Maintained by Richard Hanna. Last updated 1 months ago.

redcap redcap-api tidy-data

12.0 match 35 stars 8.08 score 36 scripts

traitecoevo

austraits:Helpful functions to access the AusTraits database and wrangle data from other traits.build databases

`austraits` allow users to **access, explore and wrangle data** from traits.build relational databases. It is also an R interface to AusTraits, the Australian plant trait database. This package contains functions for joining data from various tables, filtering to specific records, combining multiple databases and visualising the distribution of the data. We expect this package will assist users in working with `traits.build` databases.

Maintained by Fonti Kar. Last updated 2 months ago.

australia database plants traits

16.3 match 22 stars 5.93 score 43 scripts 1 dependents

usaid-oha-si

grabr:OHA/SI APIs Package

Provides a series of base functions useful to the GH OHA SI team. These function extend the utility functions in glamr, focusing primarily on API utility functions.

Maintained by Aaron Chafetz. Last updated 6 months ago.

18.7 match 1 stars 5.14 score 69 scripts

briencj

growthPheno:Functional Analysis of Phenotypic Growth Data to Smooth and Extract Traits

Assists in the plotting and functional smoothing of traits measured over time and the extraction of features from these traits, implementing the SET (Smoothing and Extraction of Traits) method described in Brien et al. (2020) Plant Methods, 16. Smoothing of growth trends for individual plants using natural cubic smoothing splines or P-splines is available for removing transient effects and segmented smoothing is available to deal with discontinuities in growth trends. There are graphical tools for assessing the adequacy of trait smoothing, both when using this and other packages, such as those that fit nonlinear growth models. A range of per-unit (plant, pot, plot) growth traits or features can be extracted from the data, including single time points, interval growth rates and other growth statistics, such as maximum growth or days to maximum growth. The package also has tools adapted to inputting data from high-throughput phenotyping facilities, such from a Lemna-Tec Scananalyzer 3D (see <https://www.youtube.com/watch?v=MRAF_mAEa7E/> for more information). The package 'growthPheno' can also be installed from <http://chris.brien.name/rpackages/>.

Maintained by Chris Brien. Last updated 2 days ago.

14.4 match 6 stars 6.66 score 42 scripts

wenchao-ma

GDINA:The Generalized DINA Model Framework

A set of psychometric tools for cognitive diagnosis modeling based on the generalized deterministic inputs, noisy and gate (G-DINA) model by de la Torre (2011) <DOI:10.1007/s11336-011-9207-7> and its extensions, including the sequential G-DINA model by Ma and de la Torre (2016) <DOI:10.1111/bmsp.12070> for polytomous responses, and the polytomous G-DINA model by Chen and de la Torre <DOI:10.1177/0146621613479818> for polytomous attributes. Joint attribute distribution can be independent, saturated, higher-order, loglinear smoothed or structured. Q-matrix validation, item and model fit statistics, model comparison at test and item level and differential item functioning can also be conducted. A graphical user interface is also provided. For tutorials, please check Ma and de la Torre (2020) <DOI:10.18637/jss.v093.i14>, Ma and de la Torre (2019) <DOI:10.1111/emip.12262>, Ma (2019) <DOI:10.1007/978-3-030-05584-4_29> and de la Torre and Akbay (2019).

Maintained by Wenchao Ma. Last updated 1 months ago.

cdm cognitive-diagnosis dcm dina-model dino estimation-models gdina item-response-theory psychometrics openblas cpp

10.7 match 30 stars 8.92 score 94 scripts 6 dependents

opengeos

whitebox:'WhiteboxTools' R Frontend

An R frontend for the 'WhiteboxTools' library, which is an advanced geospatial data analysis platform developed by Prof. John Lindsay at the University of Guelph's Geomorphometry and Hydrogeomatics Research Group. 'WhiteboxTools' can be used to perform common geographical information systems (GIS) analysis operations, such as cost-distance analysis, distance buffering, and raster reclassification. Remote sensing and image processing tasks include image enhancement (e.g. panchromatic sharpening, contrast adjustments), image mosaicing, numerous filtering operations, simple classification (k-means), and common image transformations. 'WhiteboxTools' also contains advanced tooling for spatial hydrological analysis (e.g. flow-accumulation, watershed delineation, stream network analysis, sink removal), terrain analysis (e.g. common terrain indices such as slope, curvatures, wetness index, hillshading; hypsometric analysis; multi-scale topographic position analysis), and LiDAR data processing. Suggested citation: Lindsay (2016) <doi:10.1016/j.cageo.2016.07.003>.

Maintained by Andrew Brown. Last updated 5 months ago.

geomorphometry geoprocessing geospatial gis hydrology remote-sensing rstudio

9.9 match 173 stars 9.65 score 203 scripts 2 dependents

ips-lmu

emuR:Main Package of the EMU Speech Database Management System

Provide the EMU Speech Database Management System (EMU-SDMS) with database management, data extraction, data preparation and data visualization facilities. See <https://ips-lmu.github.io/The-EMU-SDMS-Manual/> for more details.

Maintained by Markus Jochim. Last updated 1 years ago.

13.8 match 24 stars 6.89 score 135 scripts 1 dependents

tidymodels

parsnip:A Common API to Modeling and Analysis Functions

A common interface is provided to allow users to specify a model without having to remember the different argument names across different functions or computational engines (e.g. 'R', 'Spark', 'Stan', 'H2O', etc).

Maintained by Max Kuhn. Last updated 6 days ago.

5.8 match 612 stars 16.37 score 3.4k scripts 69 dependents

alexander-pastukhov

eyelinkReader:Import Gaze Data for EyeLink Eye Tracker

Import gaze data from edf files generated by the SR Research <https://www.sr-research.com/> EyeLink eye tracker. Gaze data, both recorded events and samples, is imported per trial. The package allows to extract events of interest, such as saccades, blinks, etc. as well as recorded variables and custom events (areas of interest, triggers) into separate tables. The package requires EDF API library that can be obtained at <https://www.sr-research.com/support/>.

Maintained by Alexander Pastukhov. Last updated 3 months ago.

edf eye-tracking eyelink sr-research cpp

14.5 match 13 stars 6.52 score 34 scripts

momx

Momocs:Morphometrics using R

The goal of 'Momocs' is to provide a complete, convenient, reproducible and open-source toolkit for 2D morphometrics. It includes most common 2D morphometrics approaches on outlines, open outlines, configurations of landmarks, traditional morphometrics, and facilities for data preparation, manipulation and visualization with a consistent grammar throughout. It allows reproducible, complex morphometrics analyses and other morphometrics approaches should be easy to plug in, or develop from, on top of this canvas.

Maintained by Vincent Bonhomme. Last updated 1 years ago.

morphometrics

12.8 match 51 stars 7.42 score 346 scripts