Showing 200 of 997 results
bioc
BiocParallel:Bioconductor facilities for parallel evaluation
This package provides modified versions and novel implementation of functions for parallel evaluation, tailored to use with Bioconductor objects.
Maintained by Martin Morgan. Last updated 24 days ago.
infrastructure, bioconductor-package, core-package, u24ca289073, cpp
29.7 match 67 stars 17.40 score 7.3k scripts 1.1k dependents
kylebaron
mrgsim.parallel:Simulate with 'mrgsolve' in Parallel
Simulation from an 'mrgsolve' <https://cran.r-project.org/package=mrgsolve> model using a parallel backend. Input data sets are split (chunked) and simulated in parallel using mclapply() or future_lapply() <https://cran.r-project.org/package=future.apply>.
Maintained by Kyle Baron. Last updated 3 months ago.
57.4 match 5 stars 5.11 score 17 scripts
revolutionanalytics
foreach:Provides Foreach Looping Construct
Support for the foreach looping construct. Foreach is an idiom that allows for iterating over elements in a collection, without the use of an explicit loop counter. This package in particular is intended to be used for its return value, rather than for its side effects. In that sense, it is similar to the standard lapply function, but doesn't require the evaluation of a function. Using foreach without side effects also facilitates executing the loop in parallel.
Maintained by Folashade Daniel. Last updated 3 years ago.
12.3 match 54 stars 17.16 score 43k scripts 2.8k dependents
shikokuchuo
mirai:Minimalist Async Evaluation Framework for R
Designed for simplicity, a 'mirai' evaluates an R expression asynchronously in a parallel process, locally or distributed over the network. The result is automatically available upon completion. Modern networking and concurrency, built on 'nanonext' and 'NNG' (Nanomsg Next Gen), ensures reliable and efficient scheduling over fast inter-process communications or TCP/IP secured by TLS. Distributed computing can launch remote resources via SSH or cluster managers. An inherently queued architecture handles many more tasks than available processes, and requires no storage on the file system. Innovative features include support for otherwise non-exportable reference objects, event-driven promises, and asynchronous parallel map.
Maintained by Charlie Gao. Last updated 20 hours ago.
async, asynchronous-tasks, concurrency, distributed-computing, high-performance-computing, parallel-computing
14.4 match 217 stars 11.94 score 130 scripts 7 dependents
sfcheung
manymome:Mediation, Moderation and Moderated-Mediation After Model Fitting
Computes indirect effects, conditional effects, and conditional indirect effects in a structural equation model or path model after model fitting, with no need to define any user parameters or label any paths in the model syntax, using the approach presented in Cheung and Cheung (2024) <doi:10.3758/s13428-023-02224-z>. Can also form bootstrap confidence intervals by doing bootstrapping only once and reusing the bootstrap estimates in all subsequent computations. Supports bootstrap confidence intervals for standardized (partially or completely) indirect effects, conditional effects, and conditional indirect effects as described in Cheung (2009) <doi:10.3758/BRM.41.2.425> and Cheung, Cheung, Lau, Hui, and Vong (2022) <doi:10.1037/hea0001188>. Model fitting can be done by structural equation modeling using lavaan() or regression using lm().
Maintained by Shu Fai Cheung. Last updated 21 days ago.
bootstrapping, confidence-interval, lavaan, manymome, mediation, moderated-mediation, moderation, regression, sem, standardized-effect-size, structural-equation-modeling
21.0 match 1 stars 8.06 score 172 scripts 4 dependents
deepayan
lattice:Trellis Graphics for R
A powerful and elegant high-level data visualization system inspired by Trellis graphics, with an emphasis on multivariate data. Lattice is sufficient for typical graphics needs, and is also flexible enough to handle most nonstandard requirements. See ?Lattice for an introduction.
Maintained by Deepayan Sarkar. Last updated 11 months ago.
9.4 match 68 stars 17.33 score 27k scripts 13k dependents
r-lib
testthat:Unit Testing for R
Software testing is important, but, in part because it is frustrating and boring, many of us avoid it. 'testthat' is a testing framework for R that is easy to learn and use, and integrates with your existing 'workflow'.
Maintained by Hadley Wickham. Last updated 15 days ago.
7.7 match 900 stars 20.97 score 74k scripts 465 dependents
datacloning
dclone:Data Cloning and MCMC Tools for Maximum Likelihood Methods
Low level functions for implementing maximum likelihood estimating procedures for complex models using data cloning and Bayesian Markov chain Monte Carlo methods as described in Solymos 2010 <doi:10.32614/RJ-2010-011>. Sequential and parallel MCMC support for 'JAGS', 'WinBUGS', 'OpenBUGS', and 'Stan'.
Maintained by Peter Solymos. Last updated 6 months ago.
22.9 match 7 stars 6.91 score 215 scripts 4 dependents
coatless-rpkg
sitmo:Parallel Pseudo Random Number Generator (PPRNG) 'sitmo' Header Files
Provided within are two high quality and fast PPRNGs that may be used in an 'OpenMP' parallel environment. In addition, there is a generator for a one-dimensional low-discrepancy sequence. The objective of this library is to consolidate the distribution of the 'sitmo' (C++98 & C++11), 'threefry' and 'vandercorput' (C++11-only) engines on CRAN by enabling others to link to the header files inside of 'sitmo' instead of including a copy of each engine within their individual package. Lastly, the package contains example implementations using the 'sitmo' package and three accompanying vignettes that provide additional information.
Maintained by James Balamuta. Last updated 1 year ago.
parallel, random-generation, rcpp, cpp, openmp
15.9 match 7 stars 9.75 score 15 scripts 201 dependents
smartdata-analysis-and-statistics
SimTOST:Sample Size Estimation for Bio-Equivalence Trials Through Simulation
Sample size estimation for bio-equivalence trials is supported through a simulation-based approach that extends the Two One-Sided Tests (TOST) procedure. The methodology provides flexibility in hypothesis testing, accommodates multiple treatment comparisons, and accounts for correlated endpoints. Users can model complex trial scenarios, including parallel and crossover designs, intra-subject variability, and different equivalence margins. Monte Carlo simulations enable accurate estimation of power and type I error rates, ensuring well-calibrated study designs. The statistical framework builds on established methods for equivalence testing and multiple hypothesis testing in bio-equivalence studies, as described in Schuirmann (1987) <doi:10.1007/BF01068419>, Mielke et al. (2018) <doi:10.1080/19466315.2017.1371071>, Shieh (2022) <doi:10.1371/journal.pone.0269128>, and Sozu et al. (2015) <doi:10.1007/978-3-319-22005-5>. Comprehensive documentation and vignettes guide users through implementation and interpretation of results.
Maintained by Thomas Debray. Last updated 25 days ago.
mcmc, multi-arm, multiple-comparisons, sample-size-calculation, sample-size-estimation, trial-simulation, openblas, cpp
22.9 match 2 stars 6.47 score 7 scripts
jwood000
RcppAlgos:High Performance Tools for Combinatorics and Computational Mathematics
Provides optimized functions and flexible iterators implemented in C++ for solving problems in combinatorics and computational mathematics. Handles various combinatorial objects including combinations, permutations, integer partitions and compositions, Cartesian products, unordered Cartesian products, and partition of groups. Utilizes the RMatrix class from 'RcppParallel' for thread safety. The combination and permutation functions contain constraint parameters that allow for generation of all results of a vector meeting specific criteria (e.g. finding all combinations such that the sum is between two bounds). Capable of ranking/unranking combinatorial objects efficiently (e.g. retrieve only the nth lexicographical result) which sets up nicely for parallelization as well as random sampling. Gmp support permits exploration where the total number of results is large (e.g. comboSample(10000, 500, n = 4)). Additionally, there are several high performance number theoretic functions that are useful for problems common in computational mathematics. Some of these functions make use of the fast integer division library 'libdivide'. The primeSieve function is based on the segmented sieve of Eratosthenes implementation by Kim Walisch. It is also efficient for large numbers by using the cache friendly improvements originally developed by Tomás Oliveira. Finally, there is a prime counting function that implements Legendre's formula based on the work of Kim Walisch.
Maintained by Joseph Wood. Last updated 1 month ago.
combinations, combinatorics, factorization, number-theory, parallel, permutation, prime-factorizations, primesieve, gmp, cpp
14.3 match 45 stars 10.04 score 153 scripts 12 dependents
mihaiconstantin
parabar:Progress Bar for Parallel Tasks
A simple interface in the form of R6 classes for executing tasks in parallel, tracking their progress, and displaying accurate progress bars.
Maintained by Mihai Constantin. Last updated 3 months ago.
parallel-computing, progress-bar
18.7 match 19 stars 7.53 score 20 scripts 5 dependents
mllg
batchtools:Tools for Computation on Batch Systems
As a successor of the packages 'BatchJobs' and 'BatchExperiments', this package provides a parallel implementation of the Map function for high performance computing systems managed by schedulers 'IBM Spectrum LSF' (<https://www.ibm.com/products/hpc-workload-management>), 'OpenLava' (<https://www.openlava.org/>), 'Univa Grid Engine'/'Oracle Grid Engine' (<https://www.univa.com/>), 'Slurm' (<https://slurm.schedmd.com/>), 'TORQUE/PBS' (<https://adaptivecomputing.com/cherry-services/torque-resource-manager/>), or 'Docker Swarm' (<https://docs.docker.com/engine/swarm/>). Multicore and socket modes allow parallelization on a local machine, and multiple machines can be hooked up via SSH to create a makeshift cluster. Moreover, the package provides an abstraction mechanism to define large-scale computer experiments in a well-organized and reproducible way.
Maintained by Michel Lang. Last updated 2 years ago.
batchexperiments, batchjobs, docker-swarm, high-performance-computing, hpc, hpc-clusters, lsf, openlava, parallel-computing, reproducibility, sge, slurm, torque
12.0 match 175 stars 11.39 score 772 scripts 14 dependents
bioc
metapod:Meta-Analyses on P-Values of Differential Analyses
Implements a variety of methods for combining p-values in differential analyses of genome-scale datasets. Functions can combine p-values across different tests in the same analysis (e.g., genomic windows in ChIP-seq, exons in RNA-seq) or for corresponding tests across separate analyses (e.g., replicated comparisons, effect of different treatment conditions). Support is provided for handling log-transformed input p-values, missing values and weighting where appropriate.
Maintained by Aaron Lun. Last updated 3 months ago.
multiplecomparison, differentialpeakcalling, cpp
18.0 match 7.44 score 17 scripts 46 dependents
bioc
SeqArray:Data Management of Large-Scale Whole-Genome Sequence Variant Calls
Data management of large-scale whole-genome sequencing variant calls with thousands of individuals: genotypic data (e.g., SNVs, indels and structural variation calls) and annotations in SeqArray GDS files are stored in an array-oriented and compressed manner, with efficient data access using the R programming language.
Maintained by Xiuwen Zheng. Last updated 8 days ago.
infrastructure, datarepresentation, sequencing, genetics, bioinformatics, gds-format, snp, snv, wes, wgs, cpp
11.0 match 45 stars 12.08 score 1.1k scripts 9 dependents
ohdsi
ParallelLogger:Support for Parallel Computation, Logging, and Function Automation
Support for parallel computation with progress bar, and option to stop or proceed on errors. Also provides logging to console and disk, and the logging persists in the parallel threads. Additional functions support function call automation with delayed execution (e.g. for executing functions in parallel).
Maintained by Martijn Schuemie. Last updated 6 months ago.
13.5 match 12 stars 9.18 score 87 scripts 11 dependents
bioc
RnBeads:RnBeads
RnBeads facilitates comprehensive analysis of various types of DNA methylation data at the genome scale.
Maintained by Fabian Mueller. Last updated 1 month ago.
dnamethylation, methylationarray, methylseq, epigenetics, qualitycontrol, preprocessing, batcheffect, differentialmethylation, sequencing, cpgisland, immunooncology, twochannel, dataimport
18.0 match 6.85 score 169 scripts 1 dependents
florianhartig
BayesianTools:General-Purpose MCMC and SMC Samplers and Tools for Bayesian Statistics
General-purpose MCMC and SMC samplers, as well as plots and diagnostic functions for Bayesian statistics, with a particular focus on calibrating complex system models. Implemented samplers include various Metropolis MCMC variants (including adaptive and/or delayed rejection MH), the T-walk, two differential evolution MCMCs, two DREAM MCMCs, and a sequential Monte Carlo (SMC) particle filter.
Maintained by Florian Hartig. Last updated 1 year ago.
bayes, ecological-models, mcmc, optimization, smc, systems-biology, cpp
11.7 match 122 stars 10.17 score 580 scripts 5 dependents
tlverse
delayed:A Framework for Parallelizing Dependent Tasks
Mechanisms to parallelize dependent tasks in a manner that optimizes the compute resources available. It provides access to "delayed" computations, which may be parallelized using futures. It is, to an extent, a facsimile of the 'Dask' library (<https://www.dask.org/>), for the 'Python' language.
Maintained by Jeremy Coyle. Last updated 11 months ago.
16.4 match 23 stars 7.03 score 39 scripts 8 dependents
renozao
doRNG:Generic Reproducible Parallel Backend for 'foreach' Loops
Provides functions to perform reproducible parallel foreach loops, using independent random streams as generated by L'Ecuyer's combined multiple-recursive generator [L'Ecuyer (1999), <DOI:10.1287/opre.47.1.159>]. It makes it easy to convert standard '%dopar%' loops into fully reproducible loops, independently of the number of workers, the task scheduling strategy, or the chosen parallel environment and associated foreach backend.
Maintained by Renaud Gaujoux. Last updated 2 years ago.
9.0 match 20 stars 12.63 score 4.3k scripts 183 dependents
kvnkuang
pbmcapply:Tracking the Progress of Mc*pply with Progress Bar
A lightweight package that helps you track and visualize the progress of parallel versions of vectorized R functions (mc*apply). Parallelization (mc.cores > 1) works only on *nix systems (Linux, Unix such as macOS) because Windows lacks the fork() functionality that mc*apply relies on.
Maintained by Kevin Kuang. Last updated 3 years ago.
10.8 match 44 stars 10.28 score 972 scripts 65 dependents
boennecd
parglm:Parallel GLM
Provides a parallel estimation method for generalized linear models without compiling with a multithreaded LAPACK or BLAS.
Maintained by Benjamin Christoffersen. Last updated 3 years ago.
generalized-linear-models, parallel-computing, openblas, cpp
15.4 match 11 stars 6.41 score 39 scripts 4 dependents
acguidoum
Sim.DiffProc:Simulation of Diffusion Processes
It provides users with a wide range of tools to simulate, estimate, analyze, and visualize the dynamics of stochastic differential systems in both Ito and Stratonovich forms. Statistical analysis of SDEs with parallel Monte Carlo and moment equation methods <doi:10.18637/jss.v096.i02>. It enables researchers in different domains to use these equations to model practical problems in financial and actuarial modeling and other areas of application, e.g., modeling and simulation of the first passage time problem in shallow water using the attractive center (Boukhetala K, 1996) ISBN:1-56252-342-2.
Maintained by Arsalane Chouaib Guidoum. Last updated 1 year ago.
dynamic-system, moment-equations, monte-carlo-simulation, parallel-computing, stochastic-calculus, stochastic-differential-equation, transition-density
12.7 match 13 stars 7.69 score 86 scripts 4 dependents
miraisolutions
rTRNG:Advanced and Parallel Random Number Generation via 'TRNG'
Embeds sources and headers from Tina's Random Number Generator ('TRNG') C++ library. Exposes some functionality for easier access, testing and benchmarking into R. Provides examples of how to use parallel RNG with 'RcppParallel'. The methods and techniques behind 'TRNG' are illustrated in the package vignettes and examples. Full documentation is available in Bauke (2021) <https://github.com/rabauke/trng4/blob/v4.23.1/doc/trng.pdf>.
Maintained by Riccardo Porreca. Last updated 1 year ago.
17.1 match 19 stars 5.63 score 15 scripts
avi-kenny
SimEngine:A Modular Framework for Statistical Simulations in R
An open-source R package for structuring, maintaining, running, and debugging statistical simulations on both local and cluster-based computing environments. See full documentation at <https://avi-kenny.github.io/SimEngine/>.
Maintained by Avi Kenny. Last updated 21 days ago.
12.8 match 12 stars 7.18 score 50 scripts
revolutionanalytics
doParallel:Foreach Parallel Adaptor for the 'parallel' Package
Provides a parallel backend for the %dopar% function using the parallel package.
Maintained by Folashade Daniel. Last updated 3 years ago.
6.2 match 5 stars 14.56 score 50k scripts 1.4k dependents
cloudyr
googleComputeEngineR:R Interface with Google Compute Engine
Interact with the 'Google Compute Engine' API in R. Lets you create, start and stop instances in the 'Google Cloud'. Support for preconfigured instances, with templates for common R needs.
Maintained by Mark Edmondson. Last updated 3 years ago.
api, cloud-computing, cloudyr, google-cloud, googleauthr, launching-virtual-machines
9.2 match 152 stars 9.78 score 235 scripts
tidyverse
purrr:Functional Programming Tools
A complete and consistent functional programming toolkit for R.
Maintained by Hadley Wickham. Last updated 1 month ago.
4.0 match 1.3k stars 22.12 score 59k scripts 6.9k dependents
calvagone
campsis:Generic PK/PD Simulation Platform CAMPSIS
A generic, easy-to-use and intuitive pharmacokinetic/pharmacodynamic (PK/PD) simulation platform based on R packages 'rxode2' and 'mrgsolve'. CAMPSIS provides an abstraction layer over the underlying processes of writing a PK/PD model, assembling a custom dataset and running a simulation. CAMPSIS has a strong dependency on the R package 'campsismod', which allows reading/writing a model from/to files and adapting it further on the fly in the R environment. Package 'campsis' allows the user to assemble a dataset in an intuitive manner. Once the user’s dataset is ready, the package is in charge of preparing the simulation, calling 'rxode2' or 'mrgsolve' (at the user's choice) and returning the results, for the given model, dataset and desired simulation settings.
Maintained by Nicolas Luyckx. Last updated 1 month ago.
11.6 match 8 stars 7.52 score 93 scripts
bioc
Spectra:Spectra Infrastructure for Mass Spectrometry Data
The Spectra package defines an efficient infrastructure for storing and handling mass spectrometry spectra and functionality to subset, process, visualize and compare spectra data. It provides different implementations (backends) to store mass spectrometry data. These comprise backends tuned for fast data access and processing and backends for very large data sets ensuring a small memory footprint.
Maintained by RforMassSpectrometry Package Maintainer. Last updated 8 days ago.
infrastructure, proteomics, massspectrometry, metabolomics, bioconductor, hacktoberfest, mass-spectrometry
6.7 match 41 stars 13.01 score 254 scripts 35 dependents
rcppcore
RcppParallel:Parallel Programming Tools for 'Rcpp'
High level functions for parallel programming with 'Rcpp'. For example, the 'parallelFor()' function can be used to convert the work of a standard serial "for" loop into a parallel one and the 'parallelReduce()' function can be used for accumulating aggregate or other values.
Maintained by Kevin Ushey. Last updated 2 months ago.
5.8 match 173 stars 14.89 score 215 scripts 790 dependents
privefl
bigsnpr:Analysis of Massive SNP Arrays
Easy-to-use, efficient, flexible and scalable tools for analyzing massive SNP arrays. Privé et al. (2018) <doi:10.1093/bioinformatics/bty185>.
Maintained by Florian Privé. Last updated 9 days ago.
big-data, bioinformatics, memory-mapped-file, parallel-computing, polygenic-scores, population-structure-inference, snp-data, statistical-methods, openblas, zlib, cpp, openmp
7.5 match 200 stars 11.44 score 1.5k scripts 3 dependents
bnosac
udpipe:Tokenization, Parts of Speech Tagging, Lemmatization and Dependency Parsing with the 'UDPipe' 'NLP' Toolkit
This natural language processing toolkit provides language-agnostic 'tokenization', 'parts of speech tagging', 'lemmatization' and 'dependency parsing' of raw text. Next to text parsing, the package also allows you to train annotation models based on data of 'treebanks' in 'CoNLL-U' format as provided at <https://universaldependencies.org/format.html>. The techniques are explained in detail in the paper: 'Tokenizing, POS Tagging, Lemmatizing and Parsing UD 2.0 with UDPipe', available at <doi:10.18653/v1/K17-3009>. The toolkit also contains functionalities for commonly used data manipulations on texts which are enriched with the output of the parser. Namely functionalities and algorithms for collocations, token co-occurrence, document term matrix handling, term frequency inverse document frequency calculations, information retrieval metrics (Okapi BM25), handling of multi-word expressions, keyword detection (Rapid Automatic Keyword Extraction, noun phrase extraction, syntactical patterns) sentiment scoring and semantic similarity analysis.
Maintained by Jan Wijffels. Last updated 2 years ago.
conll, dependency-parser, lemmatization, natural-language-processing, nlp, pos-tagging, r-pkg, rcpp, text-mining, tokenizer, udpipe, cpp
7.2 match 215 stars 11.83 score 1.2k scripts 9 dependents
mrc-ide
hipercow:High Performance Computing
Set up cluster environments and jobs. Moo.
Maintained by Rich FitzJohn. Last updated 10 days ago.
12.4 match 1 stars 6.53 score 45 scripts 1 dependents
norskregnesentral
shapr:Prediction Explanation with Dependence-Aware Shapley Values
Complex machine learning models are often hard to interpret. However, in many situations it is crucial to understand and explain why a model made a specific prediction. Shapley values are the only method for such prediction explanation with a solid theoretical foundation. Previously known methods for estimating the Shapley values do, however, assume feature independence. This package implements methods which account for any feature dependence, and thereby produce more accurate estimates of the true Shapley values. An accompanying 'Python' wrapper ('shaprpy') is available through the GitHub repository.
Maintained by Martin Jullum. Last updated 1 month ago.
explainable-ai, explainable-ml, rcpp, rcpparmadillo, shapley, openblas, cpp, openmp
7.5 match 153 stars 10.62 score 175 scripts 1 dependents
privefl
bigstatsr:Statistical Tools for Filebacked Big Matrices
Easy-to-use, efficient, flexible and scalable statistical tools. Package bigstatsr provides and uses Filebacked Big Matrices via memory-mapping. It provides for instance matrix operations, Principal Component Analysis, sparse linear supervised models, utility functions and more <doi:10.1093/bioinformatics/bty185>.
Maintained by Florian Privé. Last updated 6 months ago.
big-data, large-matrices, memory-mapped-file, parallel-computing, statistical-methods, openblas, cpp, openmp
7.5 match 180 stars 10.59 score 394 scripts 16 dependents
grvanderploeg
parafac4microbiome:Parallel Factor Analysis Modelling of Longitudinal Microbiome Data
Creation and selection of PARAllel FACtor Analysis (PARAFAC) models of longitudinal microbiome data. You can import your own data with our import functions or use one of the example datasets to create your own PARAFAC models. Selection of the optimal number of components can be done using assessModelQuality() and assessModelStability(). The selected model can then be plotted using plotPARAFACmodel(). The Parallel Factor Analysis method was originally described by Carroll and Chang (1970) <doi:10.1007/BF02310791> and Harshman (1970) <https://www.psychology.uwo.ca/faculty/harshman/wpppfac0.pdf>.
Maintained by Geert Roelof van der Ploeg. Last updated 19 days ago.
dimensionality-reduction, microbiome, microbiome-data, multiway, multiway-algorithms, parallel-factor-analysis
12.3 match 6 stars 6.31 score 13 scripts
fifis
pnd:Parallel Numerical Derivatives, Gradients, Jacobians, and Hessians of Arbitrary Accuracy Order
Numerical derivatives through finite-difference approximations can be calculated using the 'pnd' package with parallel capabilities and optimal step-size selection to improve accuracy. These functions facilitate efficient computation of derivatives, gradients, Jacobians, and Hessians, allowing for more evaluations to reduce the mathematical and machine errors. Designed for compatibility with the 'numDeriv' package, which has not received updates in several years, it introduces advanced features such as computing derivatives of arbitrary order, improving the accuracy of Hessian approximations by avoiding repeated differencing, and parallelising slow functions on Windows, Mac, and Linux.
Maintained by Andreï Victorovitch Kostyrka. Last updated 4 days ago.
14.6 match 1 stars 5.22 score 5 scripts
mdsteiner
EFAtools:Fast and Flexible Implementations of Exploratory Factor Analysis Tools
Provides functions to perform exploratory factor analysis (EFA) procedures and compare their solutions. The goal is to provide state-of-the-art factor retention methods and a high degree of flexibility in the EFA procedures. This way, for example, implementations from R 'psych' and 'SPSS' can be compared. Moreover, functions for Schmid-Leiman transformation and the computation of omegas are provided. To speed up the analyses, some of the iterative procedures, like principal axis factoring (PAF), are implemented in C++.
Maintained by Markus Steiner. Last updated 3 months ago.
11.3 match 10 stars 6.57 score 83 scripts 1 dependents
pbreheny
biglasso:Extending Lasso Model Fitting to Big Data
Extend lasso and elastic-net model fitting for large data sets that cannot be loaded into memory. Designed to be more memory- and computation-efficient than existing lasso-fitting packages like 'glmnet' and 'ncvreg', thus allowing the user to analyze big data with limited RAM <doi:10.32614/RJ-2021-001>.
Maintained by Patrick Breheny. Last updated 10 days ago.
bigdata, lasso, out-of-core, parallel-computing, cpp, openmp
7.5 match 113 stars 9.84 score 74 scripts 1 dependents
florafauna
optimParallel:Parallel Version of the L-BFGS-B Optimization Method
Provides a parallel version of the L-BFGS-B method of optim(). The main function of the package is optimParallel(), which has the same usage and output as optim(). Using optimParallel() can significantly reduce the optimization time.
Maintained by Florian Gerber. Last updated 4 years ago.
8.0 match 9 stars 9.19 score 157 scripts 91 dependents
bioc
peakPantheR:Peak Picking and Annotation of High Resolution Experiments
An automated pipeline for the detection, integration and reporting of predefined features across a large number of mass spectrometry data files. It enables the real time annotation of multiple compounds in a single file, or the parallel annotation of multiple compounds in multiple files. A graphical user interface as well as command line functions will assist in assessing the quality of annotation and update fitting parameters until a satisfactory result is obtained.
Maintained by Arnaud Wolfer. Last updated 5 months ago.
massspectrometry, metabolomics, peakdetection, feature-detection, mass-spectrometry
10.7 match 12 stars 6.82 score 23 scripts
r-lidar
lidR:Airborne LiDAR Data Manipulation and Visualization for Forestry Applications
Airborne LiDAR (Light Detection and Ranging) interface for data manipulation and visualization. Read/write 'las' and 'laz' files, computation of metrics in area based approach, point filtering, artificial point reduction, classification from geographic data, normalization, individual tree segmentation and other manipulations.
Maintained by Jean-Romain Roussel. Last updated 1 month ago.
als, forestry, las, laz, lidar, point-cloud, remote-sensing, openblas, cpp, openmp
5.0 match 623 stars 14.47 score 844 scripts 8 dependents
matloff
partools:Tools for the 'Parallel' Package
Miscellaneous utilities for parallelizing large computations. Alternative to MapReduce. File splitting and distributed operations such as sort and aggregate. "Software Alchemy" method for parallelizing most statistical methods, presented in N. Matloff, Parallel Computation for Data Science, Chapman and Hall, 2015. Includes a debugging aid.
Maintained by Norm Matloff. Last updated 2 years ago.
9.6 match 40 stars 7.51 score 30 scripts 3 dependents
ropensci
stplanr:Sustainable Transport Planning
Tools for transport planning with an emphasis on spatial transport data and non-motorized modes. The package was originally developed to support the 'Propensity to Cycle Tool', a publicly available strategic cycle network planning tool (Lovelace et al. 2017) <doi:10.5198/jtlu.2016.862>, but has since been extended to support public transport routing and accessibility analysis (Moreno-Monroy et al. 2017) <doi:10.1016/j.jtrangeo.2017.08.012> and routing with locally hosted routing engines such as 'OSRM' (Lowans et al. 2023) <doi:10.1016/j.enconman.2023.117337>. The main functions are for creating and manipulating geographic "desire lines" from origin-destination (OD) data (building on the 'od' package); calculating routes on the transport network locally and via interfaces to routing services such as <https://cyclestreets.net/> (Desjardins et al. 2021) <doi:10.1007/s11116-021-10197-1>; and calculating route segment attributes such as bearing. The package implements the 'travel flow aggregation' method described in Morgan and Lovelace (2020) <doi:10.1177/2399808320942779> and the 'OD jittering' method described in Lovelace et al. (2022) <doi:10.32866/001c.33873>. Further information on the package's aim and scope can be found in the vignettes and in a paper in the R Journal (Lovelace and Ellison 2018) <doi:10.32614/RJ-2018-053>, and in a paper outlining the landscape of open source software for geographic methods in transport planning (Lovelace, 2021) <doi:10.1007/s10109-020-00342-2>.
Maintained by Robin Lovelace. Last updated 7 months ago.
cycle, cycling, desire-lines, origin-destination, peer-reviewed, pubic-transport, route-network, routes, routing, spatial, transport, transport-planning, transportation, walking
5.7 match 427 stars 12.31 score 684 scripts 3 dependents
dazzimonti
KrigInv:Kriging-Based Inversion for Deterministic and Noisy Computer Experiments
Criteria and algorithms for sequentially estimating level sets of a multivariate numerical function, possibly observed with noise.
Maintained by Dario Azzimonti. Last updated 3 years ago.
24.8 match 2.81 score 54 scripts 4 dependents
cran
epiR:Tools for the Analysis of Epidemiological Data
Tools for the analysis of epidemiological and surveillance data. Contains functions for directly and indirectly adjusting measures of disease frequency, quantifying measures of association on the basis of single or multiple strata of count data presented in a contingency table, computation of confidence intervals around incidence risk and incidence rate estimates and sample size calculations for cross-sectional, case-control and cohort studies. Surveillance tools include functions to calculate an appropriate sample size for 1- and 2-stage representative freedom surveys, functions to estimate surveillance system sensitivity and functions to support scenario tree modelling analyses.
Maintained by Mark Stevenson. Last updated 1 month ago.
8.5 match 10 stars 8.18 score 10 dependents
mrc-ide
dust:Iterate Multiple Realisations of Stochastic Models
An engine for simulating stochastic models. Includes support for running stochastic models in parallel, either with shared or varying parameters. Simulations are run efficiently in compiled code and can be run with a fraction of simulated states returned to R, allowing control over memory usage. Support is provided for building a bootstrap particle filter for performing Sequential Monte Carlo (e.g., Gordon et al. 1993 <doi:10.1049/ip-f-2.1993.0015>). The core of the simulation engine is the 'xoshiro256**' algorithm (Blackman and Vigna <arXiv:1805.01407>), and the package is further described in FitzJohn et al. 2021 <doi:10.12688/wellcomeopenres.16466.2>.
Maintained by Rich FitzJohn. Last updated 5 months ago.
8.9 match 18 stars 7.84 score 60 scripts 3 dependents
philchalmers
SimDesign:Structure for Organizing Monte Carlo Simulation Designs
Provides tools to safely and efficiently organize and execute Monte Carlo simulation experiments in R. The package controls the structure and back-end of Monte Carlo simulation experiments by utilizing a generate-analyse-summarise workflow. The workflow safeguards against common simulation coding issues: it automatically re-simulates non-convergent results, prevents inadvertent overwriting of simulation files, catches error and warning messages during execution, implicitly supports parallel processing with high-quality random number generation, and provides tools for managing high-performance computing (HPC) array jobs submitted to schedulers such as SLURM. For a pedagogical introduction to the package see Sigal and Chalmers (2016) <doi:10.1080/10691898.2016.1246953>. For a more in-depth overview of the package and its design philosophy see Chalmers and Adkins (2020) <doi:10.20982/tqmp.16.4.p248>.
Maintained by Phil Chalmers. Last updated 4 days ago.
monte-carlo-simulationsimulationsimulation-framework
5.2 match 62 stars 13.35 score 253 scripts 46 dependents
moderndive
moderndive:Tidyverse-Friendly Introductory Linear Regression
Datasets and wrapper functions for tidyverse-friendly introductory linear regression, used in "Statistical Inference via Data Science: A ModernDive into R and the Tidyverse" available at <https://moderndive.com/>.
Maintained by Albert Y. Kim. Last updated 3 months ago.
6.1 match 88 stars 11.35 score 1.8k scripts
rsparapa
BART:Bayesian Additive Regression Trees
Bayesian Additive Regression Trees (BART) provide flexible nonparametric modeling of covariates for continuous, binary, categorical and time-to-event outcomes. For more information see Sparapani, Spanbauer and McCulloch <doi:10.18637/jss.v097.i01>.
Maintained by Rodney Sparapani. Last updated 9 months ago.
8.6 match 14 stars 7.96 score 474 scripts 10 dependents
thomasp85
ggraph:An Implementation of Grammar of Graphics for Graphs and Networks
The grammar of graphics as implemented in ggplot2 is a poor fit for graph and network visualizations due to its reliance on tabular data input. ggraph is an extension of the ggplot2 API tailored to graph visualizations and provides the same flexible approach to building up plots layer by layer.
Maintained by Thomas Lin Pedersen. Last updated 1 year ago.
ggplot-extensionggplot2graph-visualizationnetwork-visualizationvisualizationcpp
4.0 match 1.1k stars 16.96 score 9.2k scripts 111 dependents
detlew
PowerTOST:Power and Sample Size for (Bio)Equivalence Studies
Contains functions to calculate power and sample size for various study designs used in bioequivalence studies. Use known.designs() to see the designs supported. Power and sample size can be obtained based on different methods, amongst them prominently the TOST procedure (two one-sided t-tests). See README and NEWS for further information.
Maintained by Detlew Labes. Last updated 12 months ago.
7.0 match 20 stars 9.61 score 112 scripts 4 dependents
asardaes
dtwclust:Time Series Clustering Along with Optimizations for the Dynamic Time Warping Distance
Time series clustering along with optimized techniques related to the Dynamic Time Warping distance and its corresponding lower bounds. Implementations of partitional, hierarchical, fuzzy, k-Shape and TADPole clustering are available. Functionality can be easily extended with custom distance measures and centroid definitions. Implementations of DTW barycenter averaging, a distance based on global alignment kernels, and the soft-DTW distance and centroid routines are also provided. All included distance functions have custom loops optimized for the calculation of cross-distance matrices, including parallelization support. Several cluster validity indices are included.
Maintained by Alexis Sarda. Last updated 8 months ago.
clusteringdtwtime-seriesopenblascpp
5.3 match 261 stars 12.39 score 406 scripts 14 dependents
merck
simtrial:Clinical Trial Simulation
Provides some basic routines for simulating a clinical trial. The primary intent is to provide some tools to generate trial simulations for trials with time to event outcomes. Piecewise exponential failure rates and piecewise constant enrollment rates are the underlying mechanism used to simulate a broad range of scenarios such as those presented in Lin et al. (2020) <doi:10.1080/19466315.2019.1697738>. However, the basic generation of data is done using pipes to allow maximum flexibility for users to meet different needs.
Maintained by Yujie Zhao. Last updated 1 day ago.
7.2 match 21 stars 9.16 score 52 scripts
nanxstats
protr:Generating Various Numerical Representation Schemes for Protein Sequences
Comprehensive toolkit for generating various numerical features of protein sequences described in Xiao et al. (2015) <DOI:10.1093/bioinformatics/btv042>. For full functionality, the software 'ncbi-blast+' is needed, see <https://blast.ncbi.nlm.nih.gov/doc/blast-help/downloadblastdata.html> for more information.
Maintained by Nan Xiao. Last updated 6 months ago.
bioinformaticsfeature-engineeringfeature-extractionmachine-learningpeptidesprotein-sequencessequence-analysis
6.5 match 52 stars 10.02 score 173 scripts 3 dependents
mlr-org
rush:Rapid Parallel and Distributed Computing
Parallel computing with a network of local and remote workers. Fast exchange of results between the workers through a 'Redis' database. Key features include task queues, local caching, and sophisticated error handling.
Maintained by Marc Becker. Last updated 4 months ago.
12.9 match 11 stars 4.94 score 5 scripts
till-tietz
parsel:Parallel Dynamic Web-Scraping Using 'RSelenium'
A system to increase the efficiency of dynamic web-scraping with 'RSelenium' by leveraging parallel processing. You provide a function wrapper for your 'RSelenium' scraping routine with a set of inputs, and 'parsel' runs it in several browser instances. Chunked input processing as well as error catching and logging ensures seamless execution and minimal data loss, even when unforeseen 'RSelenium' errors occur. You can additionally build safe scraping functions with minimal coding by utilizing constructor functions that act as wrappers around 'RSelenium' methods.
Maintained by Till Tietz. Last updated 1 year ago.
16.4 match 15 stars 3.88 score 8 scripts
bioc
SNPRelate:Parallel Computing Toolset for Relatedness and Principal Component Analysis of SNP Data
Genome-wide association studies (GWAS) are widely used to investigate the genetic basis of diseases and traits, but they pose many computational challenges. We developed an R package SNPRelate to provide a binary format for single-nucleotide polymorphism (SNP) data in GWAS utilizing CoreArray Genomic Data Structure (GDS) data files. The GDS format offers efficient operations specifically designed for integers with two bits, since a SNP could occupy only two bits. SNPRelate is also designed to accelerate two key computations on SNP data using parallel computing for multi-core symmetric multiprocessing computer architectures: Principal Component Analysis (PCA) and relatedness analysis using Identity-By-Descent measures. The SNP GDS format is also used by the GWASTools package with the support of S4 classes and generic functions. The extended GDS format is implemented in the SeqArray package to support the storage of single nucleotide variations (SNVs), insertion/deletion polymorphisms (indels) and structural variation calls in whole-genome and whole-exome variant data.
Maintained by Xiuwen Zheng. Last updated 5 months ago.
infrastructuregeneticsstatisticalmethodprincipalcomponentbioinformaticsgds-formatpcasimdsnpopenblascpp
5.0 match 104 stars 12.69 score 1.6k scripts 18 dependents
eagerai
fastai:Interface to 'fastai'
The 'fastai' <https://docs.fast.ai/index.html> library simplifies training fast and accurate neural networks using modern best practices. It is based on research into deep learning best practices undertaken at 'fast.ai', including 'out of the box' support for vision, text, tabular, audio, time series, and collaborative filtering models.
Maintained by Turgut Abdullayev. Last updated 11 months ago.
audiocollaborative-filteringdarknetdarknet-image-classificationfastaimedicalobject-detectiontabulartextvision
6.6 match 118 stars 9.40 score 76 scripts
vivianephilipps
marqLevAlg:A Parallelized General-Purpose Optimization Based on Marquardt-Levenberg Algorithm
This algorithm provides a numerical solution to the problem of unconstrained local minimization (or maximization). It is particularly suited for complex problems and more efficient than the Gauss-Newton-like algorithm when starting from points very far from the final minimum (or maximum). Each iteration is parallelized and convergence relies on a stringent stopping criterion based on the first and second derivatives. See Philipps et al, 2021 <doi:10.32614/RJ-2021-089>.
Maintained by Viviane Philipps. Last updated 1 year ago.
9.3 match 7 stars 6.52 score 12 scripts 10 dependents
giuseppec
iml:Interpretable Machine Learning
Interpretability methods to analyze the behavior and predictions of any machine learning model. Implemented methods are: Feature importance described by Fisher et al. (2018) <doi:10.48550/arxiv.1801.01489>, accumulated local effects plots described by Apley (2018) <doi:10.48550/arxiv.1612.08468>, partial dependence plots described by Friedman (2001) <www.jstor.org/stable/2699986>, individual conditional expectation ('ice') plots described by Goldstein et al. (2013) <doi:10.1080/10618600.2014.907095>, local models (variant of 'lime') described by Ribeiro et al. (2016) <doi:10.48550/arXiv.1602.04938>, the Shapley Value described by Strumbelj et al. (2014) <doi:10.1007/s10115-013-0679-x>, feature interactions described by Friedman et al. <doi:10.1214/07-AOAS148> and tree surrogate models.
Maintained by Giuseppe Casalicchio. Last updated 19 days ago.
4.6 match 494 stars 12.86 score 642 scripts 4 dependents
amices
mice:Multivariate Imputation by Chained Equations
Multiple imputation using Fully Conditional Specification (FCS) implemented by the MICE algorithm as described in Van Buuren and Groothuis-Oudshoorn (2011) <doi:10.18637/jss.v045.i03>. Each variable has its own imputation model. Built-in imputation models are provided for continuous data (predictive mean matching, normal), binary data (logistic regression), unordered categorical data (polytomous logistic regression) and ordered categorical data (proportional odds). MICE can also impute continuous two-level data (normal model, pan, second-level variables). Passive imputation can be used to maintain consistency between variables. Various diagnostic plots are available to inspect the quality of the imputations.
Maintained by Stef van Buuren. Last updated 5 days ago.
chained-equationsfcsimputationmicemissing-datamissing-valuesmultiple-imputationmultivariate-datacpp
3.6 match 462 stars 16.50 score 10k scripts 154 dependents
earthlab
rslurm:Submit R Calculations to a 'Slurm' Cluster
Functions that simplify submitting R scripts to a 'Slurm' workload manager, in part by automating the division of embarrassingly parallel calculations across cluster nodes.
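The "division of embarrassingly parallel calculations across cluster nodes" mentioned above amounts to splitting a list of independent parameter sets into roughly equal chunks, one per node. The Python sketch below only illustrates that idea; it is not rslurm's code or API, and the function name `split_into_chunks` is an assumption made for this example.

```python
def split_into_chunks(items, n_chunks):
    """Split items into n_chunks contiguous pieces whose sizes differ by at most one."""
    base, extra = divmod(len(items), n_chunks)
    chunks, start = [], 0
    for i in range(n_chunks):
        # The first `extra` chunks each take one additional item.
        size = base + (1 if i < extra else 0)
        chunks.append(items[start:start + size])
        start += size
    return chunks


# Hypothetical workload: ten independent parameter sets spread over three nodes.
params = list(range(10))
print(split_into_chunks(params, 3))  # [[0, 1, 2, 3], [4, 5, 6], [7, 8, 9]]
```

Because the chunks share no state, each one can be handed to a separate node and the results concatenated afterwards in chunk order.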
Maintained by Erick Verleye. Last updated 2 years ago.
7.1 match 54 stars 8.29 score 303 scripts 1 dependent
raicheg
nFactors:Parallel Analysis and Other Non Graphical Solutions to the Cattell Scree Test
Indices, heuristics and strategies to help determine the number of factors/components to retain: 1. Acceleration factor (af with or without Parallel Analysis); 2. Optimal Coordinates (noc with or without Parallel Analysis); 3. Parallel analysis (components, factors and bootstrap); 4. lambda > mean(lambda) (Kaiser, CFA and related); 5. Cattell-Nelson-Gorsuch (CNG); 6. Zoski and Jurs multiple regression (b, t and p); 7. Zoski and Jurs standard error of the regression coefficient (sescree); 8. Nelson R2; 9. Bartlett khi-2; 10. Anderson khi-2; 11. Lawley khi-2 and 12. Bentler-Yuan khi-2.
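Rule 4 in the list above (retain the components whose eigenvalue exceeds the mean eigenvalue) is simple enough to sketch directly. The Python snippet below illustrates the rule only and is not nFactors code; the function name `kaiser_rule` and the eigenvalues are made up for the example.

```python
def kaiser_rule(eigenvalues):
    """Count the eigenvalues that exceed the mean eigenvalue."""
    mean_lambda = sum(eigenvalues) / len(eigenvalues)
    return sum(1 for lam in eigenvalues if lam > mean_lambda)


# Hypothetical eigenvalues of a 5-variable correlation matrix (they sum to 5,
# so the mean eigenvalue is 1 and the rule reduces to "eigenvalue > 1").
eigenvalues = [2.6, 1.3, 0.6, 0.3, 0.2]
print(kaiser_rule(eigenvalues))  # 2
```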
Maintained by Gilles Raiche. Last updated 2 years ago.
10.6 match 5.46 score 498 scripts 4 dependents
ocbe-uio
BayesMallows:Bayesian Preference Learning with the Mallows Rank Model
An implementation of the Bayesian version of the Mallows rank model (Vitelli et al., Journal of Machine Learning Research, 2018 <https://jmlr.org/papers/v18/15-481.html>; Crispino et al., Annals of Applied Statistics, 2019 <doi:10.1214/18-AOAS1203>; Sorensen et al., R Journal, 2020 <doi:10.32614/RJ-2020-026>; Stein, PhD Thesis, 2023 <https://eprints.lancs.ac.uk/id/eprint/195759>). Both Metropolis-Hastings and sequential Monte Carlo algorithms for estimating the models are available. Cayley, footrule, Hamming, Kendall, Spearman, and Ulam distances are supported in the models. The rank data to be analyzed can be in the form of complete rankings, top-k rankings, partially missing rankings, as well as consistent and inconsistent pairwise preferences. Several functions for plotting and studying the posterior distributions of parameters are provided. The package also provides functions for estimating the partition function (normalizing constant) of the Mallows rank model, both with the importance sampling algorithm of Vitelli et al. and asymptotic approximation with the IPFP algorithm (Mukherjee, Annals of Statistics, 2016 <doi:10.1214/15-AOS1389>).
Maintained by Oystein Sorensen. Last updated 1 month ago.
mallows-modelopenblascppopenmp
7.3 match 21 stars 7.91 score 36 scripts 1 dependent
mucollective
multiverse:Create 'multiverse analysis' in R
Implement 'multiverse' style analyses (Steegen S., Tuerlinckx F, Gelman A., Vanpaemel, W., 2016) <doi:10.1177/1745691616658637> to show the robustness of statistical inference. 'Multiverse analysis' is a philosophy of statistical reporting where paper authors report the outcomes of many different statistical analyses in order to show how fragile or robust their findings are. The 'multiverse' package (Sarma A., Kale A., Moon M., Taback N., Chevalier F., Hullman J., Kay M., 2021) <doi:10.31219/osf.io/yfbwm> allows users to concisely and flexibly implement 'multiverse-style' analysis, which involves declaring alternate ways of performing an analysis step, in R and R Notebooks.
Maintained by Abhraneel Sarma. Last updated 4 months ago.
6.8 match 62 stars 8.37 score 42 scripts
dselivanov
text2vec:Modern Text Mining Framework for R
Fast and memory-friendly tools for text vectorization, topic modeling (LDA, LSA), word embeddings (GloVe), similarities. This package provides a source-agnostic streaming API, which allows researchers to perform analysis of collections of documents which are larger than available RAM. All core functions are parallelized to benefit from multicore machines.
Maintained by Dmitriy Selivanov. Last updated 7 months ago.
glovelatent-dirichlet-allocationnatural-language-processingtext-miningtopic-modelingvectorizationword-embeddingsword2veccpp
4.2 match 860 stars 13.48 score 1.3k scripts 23 dependents
munterfi
eRTG3D:Empirically Informed Random Trajectory Generation in 3-D
Creates realistic random trajectories in a 3-D space between two given fix points, so-called conditional empirical random walks (CERWs). The trajectory generation is based on empirical distribution functions extracted from observed trajectories (training data) and thus reflects the geometrical movement characteristics of the mover. A digital elevation model (DEM), representing the Earth's surface, and a background layer of probabilities (e.g. food sources, uplift potential, waterbodies, etc.) can be used to influence the trajectories. Unterfinger M (2018). "3-D Trajectory Simulation in Movement Ecology: Conditional Empirical Random Walk". Master's thesis, University of Zurich. <https://www.geo.uzh.ch/dam/jcr:6194e41e-055c-4635-9807-53c5a54a3be7/MasterThesis_Unterfinger_2018.pdf>. Technitis G, Weibel R, Kranstauber B, Safi K (2016). "An algorithm for empirically informed random trajectory generation between two endpoints". GIScience 2016: Ninth International Conference on Geographic Information Science, 9, online. <doi:10.5167/uzh-130652>.
Maintained by Merlin Unterfinger. Last updated 3 years ago.
3dbirdsconditional-empirical-random-walkgliding-and-soaringmachine-learningmovement-ecologyrandom-trajectory-generatorrandom-walksimulationtrajectory-generation
9.8 match 6 stars 5.71 score 19 scripts
vlarmet
cppRouting:Algorithms for Routing and Solving the Traffic Assignment Problem
Calculation of distances, shortest paths and isochrones on weighted graphs using several variants of Dijkstra's algorithm. Proposed algorithms are unidirectional Dijkstra (Dijkstra, E. W. (1959) <doi:10.1007/BF01386390>), bidirectional Dijkstra (Goldberg, Andrew & Fonseca F. Werneck, Renato (2005) <https://archive.siam.org/meetings/alenex05/papers/03agoldberg.pdf>), A* search (P. E. Hart, N. J. Nilsson and B. Raphael (1968) <doi:10.1109/TSSC.1968.300136>), new bidirectional A* (Pijls & Post (2009) <https://repub.eur.nl/pub/16100/ei2009-10.pdf>), Contraction hierarchies (R. Geisberger, P. Sanders, D. Schultes and D. Delling (2008) <doi:10.1007/978-3-540-68552-4_24>), PHAST (D. Delling, A.Goldberg, A. Nowatzyk, R. Werneck (2011) <doi:10.1016/j.jpdc.2012.02.007>). Algorithms for solving the traffic assignment problem are All-or-Nothing assignment, Method of Successive Averages, Frank-Wolfe algorithm (M. Fukushima (1984) <doi:10.1016/0191-2615(84)90029-8>), Conjugate and Bi-Conjugate Frank-Wolfe algorithms (M. Mitradjieva, P. O. Lindberg (2012) <doi:10.1287/trsc.1120.0409>), Algorithm-B (R. B. Dial (2006) <doi:10.1016/j.trb.2006.02.008>).
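As a rough illustration of the first algorithm named above, unidirectional Dijkstra on a weighted graph can be sketched in a few lines of Python. This is not cppRouting's implementation or API; the adjacency-dict layout, the function name `dijkstra` and the toy graph are assumptions made for the example.

```python
import heapq


def dijkstra(graph, source):
    """Return shortest distances from source, assuming non-negative edge weights.

    graph is a hypothetical adjacency dict: {node: [(neighbor, weight), ...]}.
    """
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist.get(node, float("inf")):
            continue  # stale queue entry; a shorter path was already found
        for neighbor, weight in graph.get(node, []):
            candidate = d + weight
            if candidate < dist.get(neighbor, float("inf")):
                dist[neighbor] = candidate
                heapq.heappush(heap, (candidate, neighbor))
    return dist


# Toy directed graph: going a -> b -> c (cost 3) beats the direct edge a -> c (cost 4).
toy = {"a": [("b", 1.0), ("c", 4.0)], "b": [("c", 2.0)], "c": []}
print(dijkstra(toy, "a"))  # {'a': 0.0, 'b': 1.0, 'c': 3.0}
```

The bidirectional, A* and contraction-hierarchy variants listed in the description add a second search front, a heuristic, or preprocessing on top of this basic loop.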
Maintained by Vincent Larmet. Last updated 9 months ago.
algorithmalgorithm-bbidirectional-a-star-algorithmc-plus-pluscontraction-hierarchiesdijkstra-algorithmdistancefrank-wolfeisochronesparallel-computingrcppshortest-pathstraffic-assignmentcpp
7.5 match 112 stars 7.42 score 39 scripts 4 dependents
uscbiostats
fmcmc:A friendly MCMC framework
Provides a friendly (flexible) Markov Chain Monte Carlo (MCMC) framework for implementing the Metropolis-Hastings algorithm in a modular way, allowing users to specify an automatic convergence checker, personalized transition kernels, and out-of-the-box multiple MCMC chains using parallel computing. Most of the methods implemented in this package can be found in Brooks et al. (2011, ISBN 9781420079425). Among the methods included, we have: Haario (2001) <doi:10.1007/s11222-011-9269-5> Adaptive Metropolis, Vihola (2012) <doi:10.1007/s11222-011-9269-5> Robust Adaptive Metropolis, and Thawornwattana et al. (2018) <doi:10.1214/17-BA1084> Mirror transition kernels.
Maintained by George Vega Yon. Last updated 1 year ago.
adaptivebayesian-inferencemarkov-chain-monte-carlomcmcmetropolis-hastingsparallel-computing
8.0 match 16 stars 6.79 score 86 scripts 1 dependent
elvanceyhan
pcds:Proximity Catch Digraphs and Their Applications
Contains the functions for construction and visualization of various families of the proximity catch digraphs (PCDs) (see Ceyhan (2005) ISBN:978-3-639-19063-2), and for computing the graph invariants for testing the patterns of segregation and association against complete spatial randomness (CSR) or uniformity in one, two and three dimensional cases. The package also has tools for generating points from these spatial patterns. The graph invariants used in testing spatial point data are the domination number (Ceyhan (2011) <doi:10.1080/03610921003597211>) and arc density (Ceyhan et al. (2006) <doi:10.1016/j.csda.2005.03.002>; Ceyhan et al. (2007) <doi:10.1002/cjs.5550350106>). The PCD families considered are Arc-Slice PCDs, Proportional-Edge PCDs, and Central Similarity PCDs.
Maintained by Elvan Ceyhan. Last updated 2 years ago.
9.3 match 5.80 score 21 scripts 2 dependents
myles-lewis
mcprogress:Progress Bars and Messages for Parallel Processes
Tools for monitoring progress during parallel processing. Lightweight package which acts as a wrapper around mclapply() and adds a progress bar to it in 'RStudio' or 'Linux' environments. Simply replace your original call to mclapply() with pmclapply(). A progress bar can also be displayed during parallelisation via the 'foreach' package. Also included are functions to safely print messages (including error messages) from within parallelised code, which can be useful for debugging parallelised R code.
Maintained by Myles Lewis. Last updated 6 months ago.
11.9 match 1 star 4.48 score 2 scripts 1 dependent
alexeckert
parallelDist:Parallel Distance Matrix Computation using Multiple Threads
A fast parallelized alternative to R's native 'dist' function to calculate distance matrices for continuous, binary, and multi-dimensional input matrices, which supports a broad variety of 41 predefined distance functions from the 'stats', 'proxy' and 'dtw' R packages, as well as user-defined functions written in C++. For ease of use, the 'parDist' function extends the signature of the 'dist' function and uses the same parameter naming conventions as distance methods of existing R packages. The package is mainly implemented in C++ and leverages the 'RcppParallel' package to parallelize the distance computations with the help of the 'TinyThread' library. Furthermore, the 'Armadillo' linear algebra library is used for optimized matrix operations during distance calculations. The curiously recurring template pattern (CRTP) technique is applied to avoid virtual functions, which improves the Dynamic Time Warping calculations while the implementation stays flexible enough to support different DTW step patterns and normalization methods.
Maintained by Alexander Eckert. Last updated 3 years ago.
data-sciencedistance-computationsmatricesopenblascpp
5.3 match 51 stars 9.92 score 432 scripts 14 dependents
sales-lab
parmigene:Parallel Mutual Information Estimation for Gene Network Reconstruction
Parallel estimation of the mutual information based on entropy estimates from k-nearest neighbors distances and algorithms for the reconstruction of gene regulatory networks (Sales et al, 2011 <doi:10.1093/bioinformatics/btr274>).
Maintained by Gabriele Sales. Last updated 5 months ago.
8.7 match 5 stars 6.06 score 38 scripts 4 dependents
sahakyanlab
ROptimus:A Parallel General-Purpose Adaptive Optimisation Engine
A general-purpose optimisation engine that supports i) Monte Carlo optimisation with Metropolis criterion [Metropolis et al. (1953) <doi:10.1063/1.1699114>, Hastings (1970) <doi:10.1093/biomet/57.1.97>] and Acceptance Ratio Simulated Annealing [Kirkpatrick et al. (1983) <doi:10.1126/science.220.4598.671>, Černý (1985) <doi:10.1007/BF00940812>] on multiple cores, and ii) Acceptance Ratio Replica Exchange Monte Carlo Optimisation. In each case, the system pseudo-temperature is dynamically adjusted such that the observed acceptance ratio is kept near to the desired (fixed or changing) acceptance ratio.
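The Metropolis criterion referred to above accepts every move that lowers the objective and accepts a worsening move with probability exp(-ΔE / T), where T is the pseudo-temperature. The Python sketch below shows only that acceptance step; it is not ROptimus code, the function name `metropolis_accept` is an assumption, and the package's dynamic adjustment of T towards a target acceptance ratio is not reproduced here.

```python
import math
import random


def metropolis_accept(delta_e, temperature, rng):
    """Metropolis criterion: always accept improvements, otherwise accept
    with probability exp(-delta_e / temperature)."""
    if delta_e <= 0:
        return True
    return rng.random() < math.exp(-delta_e / temperature)


# Observed acceptance ratio for a fixed worsening step at a fixed pseudo-temperature.
rng = random.Random(1)
trials = 10_000
accepted = sum(metropolis_accept(1.0, 2.0, rng) for _ in range(trials))
ratio = accepted / trials
print(ratio)  # should be close to exp(-0.5), roughly 0.61
```

Raising the pseudo-temperature pushes the observed ratio towards 1 and lowering it pushes the ratio towards 0, which is the lever an acceptance-ratio-driven scheme has available.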
Maintained by Aleksandr B. Sahakyan. Last updated 2 years ago.
13.9 match 4 stars 3.78 score 2 scripts
glmmtmb
glmmTMB:Generalized Linear Mixed Models using Template Model Builder
Fit linear and generalized linear mixed models with various extensions, including zero-inflation. The models are fitted using maximum likelihood estimation via 'TMB' (Template Model Builder). Random effects are assumed to be Gaussian on the scale of the linear predictor and are integrated out using the Laplace approximation. Gradients are calculated using automatic differentiation.
Maintained by Mollie Brooks. Last updated 10 days ago.
3.1 match 312 stars 16.77 score 3.7k scripts 24 dependents
heike
ggpcp:Parallel Coordinate Plots in the 'ggplot2' Framework
Modern Parallel Coordinate Plots were introduced in the 1980s as a way to visualize arbitrarily many numeric variables. This Grammar of Graphics implementation also incorporates categorical variables into the plots in a principled manner. By separating the data managing part from the visual rendering, we give full access to the users while keeping the number of parameters manageably low.
Maintained by Heike Hofmann. Last updated 3 days ago.
12.9 match 1 star 4.04 score 73 scripts
weecology
LDATS:Latent Dirichlet Allocation Coupled with Time Series Analyses
Combines Latent Dirichlet Allocation (LDA) and Bayesian multinomial time series methods in a two-stage analysis to quantify dynamics in high-dimensional temporal data. LDA decomposes multivariate data into lower-dimension latent groupings, whose relative proportions are modeled using generalized Bayesian time series models that include abrupt changepoints and smooth dynamics. The methods are described in Blei et al. (2003) <doi:10.1162/jmlr.2003.3.4-5.993>, Western and Kleykamp (2004) <doi:10.1093/pan/mph023>, Venables and Ripley (2002, ISBN-13:978-0387954578), and Christensen et al. (2018) <doi:10.1002/ecy.2373>.
Maintained by Juniper L. Simonis. Last updated 5 years ago.
changepointldaparallel-temperingportalsoftmax
7.5 match 25 stars 6.93 score 45 scripts
bhklab
mRMRe:Parallelized Minimum Redundancy, Maximum Relevance (mRMR)
Computes mutual information matrices from continuous, categorical and survival variables, as well as feature selection with minimum redundancy, maximum relevance (mRMR) and a new ensemble mRMR technique. Published in De Jay et al. (2013) <doi:10.1093/bioinformatics/btt383>.
Maintained by Benjamin Haibe-Kains. Last updated 4 years ago.
5.7 match 19 stars 8.95 score 105 scripts 2 dependents
stochastictree
stochtree:Stochastic Tree Ensembles (XBART and BART) for Supervised Learning and Causal Inference
Flexible stochastic tree ensemble software. Robust implementations of Bayesian Additive Regression Trees (BART) Chipman, George, McCulloch (2010) <doi:10.1214/09-AOAS285> for supervised learning and Bayesian Causal Forests (BCF) Hahn, Murray, Carvalho (2020) <doi:10.1214/19-BA1195> for causal inference. Enables model serialization and parallel sampling and provides a low-level interface for custom stochastic forest samplers.
Maintained by Drew Herren. Last updated 16 days ago.
bartbayesian-machine-learningbayesian-methodsdecision-treesgradient-boosted-treesmachine-learningprobabilistic-modelstree-ensemblescpp
5.9 match 20 stars 8.52 score 40 scripts
statistikat
VIM:Visualization and Imputation of Missing Values
New tools for the visualization of missing and/or imputed values are introduced, which can be used for exploring the data and the structure of the missing and/or imputed values. Depending on the structure of the missing values, the corresponding methods may help to identify the mechanism generating the missing values and to explore the data including missing values. In addition, the quality of imputation can be visually explored using various univariate, bivariate, multiple and multivariate plot methods. A graphical user interface available in the separate package VIMGUI allows easy handling of the implemented plot methods.
Maintained by Matthias Templ. Last updated 7 months ago.
hotdeckimputation-methodsmodel-predictionsvisualizationcpp
3.5 match 85 stars 14.44 score 2.6k scripts 19 dependents
doi-usgs
EGRET:Exploration and Graphics for RivEr Trends
Statistics and graphics for streamflow history, water quality trends, and the statistical modeling algorithm: Weighted Regressions on Time, Discharge, and Season (WRTDS).
Maintained by Laura DeCicco. Last updated 4 months ago.
usgswater-qualitywater-quality-data
4.7 match 90 stars 10.72 score 362 scripts 1 dependent
ropensci
canaper:Categorical Analysis of Neo- And Paleo-Endemism
Provides functions to analyze the spatial distribution of biodiversity, in particular categorical analysis of neo- and paleo-endemism (CANAPE) as described in Mishler et al (2014) <doi:10.1038/ncomms5473>. 'canaper' conducts statistical tests to determine the types of endemism that occur in a study area while accounting for the evolutionary relationships of species.
Maintained by Joel H. Nitta. Last updated 2 years ago.
9.3 match 7 stars 5.38 score 23 scripts
masurp
specr:Conducting and Visualizing Specification Curve Analyses
Provides utilities for conducting specification curve analyses (Simonsohn, Simmons & Nelson, 2020, <doi:10.1038/s41562-020-0912-z>) or multiverse analyses (Steegen, Tuerlinckx, Gelman & Vanpaemel, 2016, <doi:10.1177/1745691616658637>), including functions to set up, run, evaluate, and plot all specifications.
Maintained by Philipp K. Masur. Last updated 10 months ago.
6.2 match 68 stars 8.02 score 85 scripts
mihaiconstantin
doParabar:'foreach' Parallel Adapter for 'parabar' Backends
Provides a 'foreach' parallel adapter for 'parabar' backends. This package offers a minimal implementation of the '%dopar%' operator, enabling users to run 'foreach' loops in parallel, leveraging the parallel and progress-tracking capabilities of the 'parabar' package. Learn more about 'parabar' and 'doParabar' at <https://parabar.mihaiconstantin.com>.
Maintained by Mihai Constantin. Last updated 2 months ago.
13.5 match 1 star 3.65 score 5 scripts 1 dependent
azure
azuremlsdk:Interface to the 'Azure Machine Learning' 'SDK'
Interface to the 'Azure Machine Learning' Software Development Kit ('SDK'). Data scientists can use the 'SDK' to train, deploy, automate, and manage machine learning models on the 'Azure Machine Learning' service. To learn more about 'Azure Machine Learning' visit the website: <https://docs.microsoft.com/en-us/azure/machine-learning/service/overview-what-is-azure-ml>.
Maintained by Diondra Peck. Last updated 3 years ago.
amlcomputeazureazure-machine-learningazuremldsimachine-learningrstudiosdk-r
5.5 match 106 stars 8.91 score 221 scripts
r-lidar
lasR:Fast and Pipeable Airborne LiDAR Data Tools
Fast and pipeable airborne lidar processing tools. Read/write 'las' and 'laz' files, computation of metrics in area based approach, point filtering, normalization, individual tree segmentation and other manipulations in a powerful and versatile processing chain.
Maintained by Jean-Romain Roussel. Last updated 20 days ago.
7.3 match 17 stars 6.76 score 26 scripts
skstavroglou
patterncausality:Pattern Causality Algorithm
A comprehensive package for detecting and analyzing causal relationships in complex systems using pattern-based approaches. Key features include state space reconstruction, pattern identification, and causality strength evaluation.
Maintained by Hui Wang. Last updated 28 days ago.
8.0 match 1 star 6.08 score 20 scripts
monty-se
PINstimation:Estimation of the Probability of Informed Trading
A comprehensive bundle of utilities for the estimation of probability of informed trading models: original PIN in Easley and O'Hara (1992) and Easley et al. (1996); Multilayer PIN (MPIN) in Ersan (2016); Adjusted PIN (AdjPIN) in Duarte and Young (2009); and volume-synchronized PIN (VPIN) in Easley et al. (2011, 2012). Implementations of various estimation methods suggested in the literature are included. Additional features include posterior probabilities, an implementation of an expectation-maximization (EM) algorithm, and PIN decomposition into layers and into bad/good components. Versatile data simulation tools and trade classification algorithms are among the supplementary utilities. The package provides fast, compact, and precise utilities to tackle the sophisticated, error-prone, and time-consuming estimation procedure of informed trading, using only the raw trade-level data.
Maintained by Montasser Ghachem. Last updated 5 months ago.
clustering-analysisexpectation-maximisation-algorithmhierarchical-clusteringinformation-asymmetrymarket-microstructuremaximum-likelihood-estimationmixture-distributionspoisson-distribution
7.5 match 36 stars 6.48 score 14 scriptswuqian77
TrialSize:R Functions for Chapter 3,4,6,7,9,10,11,12,14,15 of Sample Size Calculation in Clinical Research
Functions and Examples in Sample Size Calculation in Clinical Research.
Maintained by Vicky Qian Wu. Last updated 4 months ago.
12.8 match 3 stars 3.78 score 95 scripts 1 dependentscbhurley
PairViz:Visualization using Graph Traversal
Improving graphics by ameliorating order effects, using Eulerian tours and Hamiltonian decompositions of graphs. References for the methods presented here are C.B. Hurley and R.W. Oldford (2010) <doi:10.1198/jcgs.2010.09136> and C.B. Hurley and R.W. Oldford (2011) <doi:10.1007/s00180-011-0229-5>.
Maintained by Catherine Hurley. Last updated 3 years ago.
8.4 match 1 stars 5.75 score 42 scripts 3 dependentsbeccadaniel
doMC:Foreach Parallel Adaptor for 'parallel'
Provides a parallel backend for the %dopar% function using the multicore functionality of the parallel package.
Maintained by Folashade Daniel. Last updated 3 years ago.
6.5 match 7.39 score 10k scripts 2 dependentsmicrosoft
finnts:Microsoft Finance Time Series Forecasting Framework
Automated time series forecasting developed by Microsoft Finance. The Microsoft Finance Time Series Forecasting Framework, aka Finn, can be used to forecast any component of the income statement, balance sheet, or any other area of interest to finance. Finn can be used to forecast any numerical quantity over time. While it can be applied outside of the finance domain, Finn was built to meet the needs of financial analysts to better forecast their businesses within a company, and has a lot of built-in features that are specific to the needs of financial forecasters. Happy forecasting!
Maintained by Mike Tokic. Last updated 24 days ago.
businessdata-sciencefeature-selectionfinancefinntsforecastingmachine-learningmicrosofttime-series
5.1 match 193 stars 9.45 score 39 scriptsazure
AzureRMR:Interface to 'Azure Resource Manager'
A lightweight but powerful R interface to the 'Azure Resource Manager' REST API. The package exposes a comprehensive class framework and related tools for creating, updating and deleting 'Azure' resource groups, resources and templates. While 'AzureRMR' can be used to manage any 'Azure' service, it can also be extended by other packages to provide extra functionality for specific services. Part of the 'AzureR' family of packages.
Maintained by Hong Ooi. Last updated 1 year ago.
azureazure-resource-managerazure-sdk-rcloud
4.8 match 20 stars 9.94 score 51 scripts 12 dependentsmschubert
clustermq:Evaluate Function Calls on HPC Schedulers (LSF, SGE, SLURM, PBS/Torque)
Evaluate arbitrary function calls using workers on HPC schedulers in a single line of code. All processing is done on the network without accessing the file system. Remote schedulers are supported via SSH.
Maintained by Michael Schubert. Last updated 23 days ago.
clusterhigh-performance-computinglsfsgeslurmsshzeromq3cpp
4.6 match 149 stars 10.23 score 253 scriptstimelyportfolio
parcoords:'Htmlwidget' for 'd3.js' Parallel Coordinates Chart
Create interactive parallel coordinates charts with this 'htmlwidget' wrapper for 'd3.js' <https://github.com/BigFatDog/parcoords-es> {'parallel-coordinates'}.
Maintained by Kenton Russell. Last updated 3 years ago.
8.3 match 77 stars 5.73 score 141 scriptsrdatatable
data.table:Extension of `data.frame`
Fast aggregation of large data (e.g. 100GB in RAM), fast ordered joins, fast add/modify/delete of columns by group using no copies at all, list columns, friendly and fast character-separated-value read/write. Offers a natural and flexible syntax, for faster development.
Maintained by Tyson Barrett. Last updated 17 hours ago.
2.0 match 3.7k stars 23.53 score 230k scripts 4.6k dependentsmrc-ide
drjacoby:Flexible Markov Chain Monte Carlo via Reparameterization
drjacoby is an R package for performing Bayesian inference via Markov chain Monte Carlo (MCMC). In addition to being highly flexible, it implements some advanced techniques that can improve mixing in tricky situations.
Maintained by Bob Verity. Last updated 9 months ago.
7.5 match 12 stars 6.27 score 77 scriptsdipterix
dipsaus:A Dipping Sauce for Data Analysis and Visualizations
Works as an "add-on" to packages like 'shiny', 'future', as well as 'rlang', and provides utility functions. Just like dipping sauce adding flavors to potato chips or pita bread, 'dipsaus' for data analysis and visualizations adds handy functions and enhancements to popular packages. The goal is to provide simple solutions that are frequently asked for online, such as how to synchronize 'shiny' inputs without freezing the app, or how to get memory size on 'Linux' or 'MacOS' systems. The enhancements roughly fall into these four categories: 1. 'shiny' input widgets; 2. high-performance computing using the 'future' package; 3. modify R calls and convert among numbers, strings, and other objects; 4. utility functions to get system information such as CPU chip-set, memory limit, etc.
Maintained by Zhengjia Wang. Last updated 4 days ago.
5.9 match 13 stars 7.90 score 85 scripts 3 dependentslindbrook
cholera:Amend, Augment and Aid Analysis of John Snow's Cholera Map
Amends errors, augments data and aids analysis of John Snow's map of the 1854 London cholera outbreak.
Maintained by lindbrook. Last updated 16 hours ago.
choleradata-visualizationdatasetsepidemiologyjohn-snowpublic-healthtriangulation-delaunayvoronoivoronoi-polygons
5.0 match 136 stars 9.33 score 95 scriptsrstudio
keras3:R Interface to 'Keras'
Interface to 'Keras' <https://keras.io>, a high-level neural networks API. 'Keras' was developed with a focus on enabling fast experimentation, supports both convolution based networks and recurrent networks (as well as combinations of the two), and runs seamlessly on both CPU and GPU devices.
Maintained by Tomasz Kalinowski. Last updated 3 days ago.
3.4 match 845 stars 13.57 score 264 scripts 2 dependentsfstpackage
fst:Lightning Fast Serialization of Data Frames
Multithreaded serialization of compressed data frames using the 'fst' format. The 'fst' format allows for full random access of stored data and a wide range of compression settings using the LZ4 and ZSTD compressors.
Maintained by Mark Klik. Last updated 6 months ago.
compressiondata-framedata-storagecpp
3.5 match 624 stars 13.14 score 1.9k scripts 55 dependentsbioc
GenomicPlot:Plot profiles of next generation sequencing data in genomic features
Visualization of next generation sequencing (NGS) data is essential for interpreting high-throughput genomics experiment results. 'GenomicPlot' facilitates plotting of NGS data in various formats (bam, bed, wig and bigwig); both coverage and enrichment over input can be computed and displayed with respect to genomic features (such as UTR, CDS, enhancer), and user defined genomic loci or regions. Statistical tests on signal intensity within user defined regions of interest can be performed and represented as boxplots or bar graphs. Parallel processing is used to speed up computation on multicore platforms. In addition to genomic plots, which are suitable for displaying coverage of genomic DNA (such as ChIPseq data), metagenomic (without introns) plots can be made for RNAseq or CLIPseq data.
Maintained by Shuye Pu. Last updated 1 month ago.
alternativesplicingchipseqcoveragegeneexpressionrnaseqsequencingsoftwaretranscriptionvisualizationannotation
8.2 match 3 stars 5.62 score 4 scriptsbioc
MAPFX:MAssively Parallel Flow cytometry Xplorer (MAPFX): A Toolbox for Analysing Data from the Massively-Parallel Cytometry Experiments
MAPFX is an end-to-end toolbox that pre-processes the raw data from MPC experiments (e.g., BioLegend's LEGENDScreen and BD Lyoplates assays), and further imputes the ‘missing’ infinity markers in the wells without those measurements. The pipeline starts by performing background correction on raw intensities to remove the noise from electronic baseline restoration and fluorescence compensation by adapting a normal-exponential convolution model. Unwanted technical variation, from sources such as well effects, is then removed using a log-normal model with plate, column, and row factors, after which infinity markers are imputed using the informative backbone markers as predictors. The completed dataset can then be used for clustering and other statistical analyses. MAPFX can also be used to normalise data from FFC assays.
Maintained by Hsiao-Chi Liao. Last updated 5 months ago.
softwareflowcytometrycellbasedassayssinglecellproteomicsclustering
10.1 match 1 stars 4.54 scorejoshuawlambert
rFSA:Feasible Solution Algorithm for Finding Best Subsets and Interactions
Assists in statistical model building to find optimal and semi-optimal higher order interactions and best subsets. Uses the lm(), glm(), and other R functions to fit models generated from a feasible solution algorithm. Discussed in Subset Selection in Regression, A Miller (2002). Applied and explained for least median of squares in Hawkins (1993) <doi:10.1016/0167-9473(93)90246-P>. The feasible solution algorithm comes up with model forms of a specific type that can have fixed variables, higher order interactions and their lower order terms.
Maintained by Joshua Lambert. Last updated 4 years ago.
algorithmfsainteractionmodelsparallelstatisticalstatisticssubset
11.0 match 7 stars 4.15 score 20 scriptsbioc
rhdf5:R Interface to HDF5
This package provides an interface between HDF5 and R. HDF5's main features are the ability to store and access very large and/or complex datasets and a wide variety of metadata on mass storage (disk) through a completely portable file format. The rhdf5 package is thus suited for the exchange of large and/or complex datasets between R and other software packages, and for letting R applications work on datasets that are larger than the available RAM.
Maintained by Mike Smith. Last updated 2 months ago.
infrastructuredataimporthdf5rhdf5opensslcurlzlibcpp
2.8 match 62 stars 15.93 score 4.2k scripts 232 dependentsgrasia
knnp:Time Series Prediction using K-Nearest Neighbors Algorithm (Parallel)
Two main functionalities are provided. One of them is predicting values with k-nearest neighbors algorithm and the other is optimizing the parameters k and d of the algorithm. These are carried out in parallel using multiple threads.
Maintained by Daniel Bastarrica Lacalle. Last updated 5 years ago.
knearest-neighbor-algorithmparalleltime-series-forecasting
16.6 match 1 stars 2.70 score 8 scriptsmkoohafkan
reval:Argument Table Generation for Sensitivity Analysis
Simplified scenario testing and sensitivity analysis, redesigned to use packages 'future' and 'furrr'. Provides functions for generating function argument sets using one-factor-at-a-time (OFAT) and (sampled) permutations.
Maintained by Michael C Koohafkan. Last updated 6 months ago.
11.0 match 2 stars 4.04 score 11 scriptsdaqana
dqrng:Fast Pseudo Random Number Generators
Several fast random number generators are provided as C++ header only libraries: The PCG family by O'Neill (2014 <https://www.cs.hmc.edu/tr/hmc-cs-2014-0905.pdf>) as well as the Xoroshiro / Xoshiro family by Blackman and Vigna (2021 <doi:10.1145/3460772>). In addition fast functions for generating random numbers according to a uniform, normal and exponential distribution are included. The latter two use the Ziggurat algorithm originally proposed by Marsaglia and Tsang (2000, <doi:10.18637/jss.v005.i08>). The fast sampling methods support unweighted sampling both with and without replacement. These functions are exported to R and as a C++ interface and are enabled for use with the default 64 bit generator from the PCG family, Xoroshiro128+/++/** and Xoshiro256+/++/** as well as the 64 bit version of the 20 rounds Threefry engine (Salmon et al., 2011, <doi:10.1145/2063384.2063405>) as provided by the package 'sitmo'.
Maintained by Ralf Stubner. Last updated 6 months ago.
randomrandom-distributionsrandom-generationrandom-samplingrngcpp
3.3 match 42 stars 13.12 score 188 scripts 183 dependentsurbananalyst
dodgr:Distances on Directed Graphs
Distances on dual-weighted directed graphs using priority-queue shortest paths (Padgham (2019) <doi:10.32866/6945>). Weighted directed graphs have weights from A to B which may differ from those from B to A. Dual-weighted directed graphs have two sets of such weights. A canonical example is a street network to be used for routing in which routes are calculated by weighting distances according to the type of way and mode of transport, yet lengths of routes must be calculated from direct distances.
Maintained by Mark Padgham. Last updated 4 days ago.
distanceopenstreetmaproutershortest-pathsstreet-networkscpp
3.8 match 129 stars 11.53 score 229 scripts 4 dependentsropensci
drake:A Pipeline Toolkit for Reproducible Computation at Scale
A general-purpose computational engine for data analysis, drake rebuilds intermediate data objects when their dependencies change, and it skips work when the results are already up to date. Not every execution starts from scratch; there is native support for parallel and distributed computing, and completed projects have tangible evidence that they are reproducible. Extensive documentation, from beginner-friendly tutorials to practical examples and more, is available at the reference website <https://docs.ropensci.org/drake/> and the online manual <https://books.ropensci.org/drake/>.
Maintained by William Michael Landau. Last updated 3 months ago.
data-sciencedrakehigh-performance-computingmakefilepeer-reviewedpipelinereproducibilityreproducible-researchropensciworkflow
3.8 match 1.3k stars 11.49 score 1.7k scripts 1 dependentssnoweye
pbdMPI:R Interface to MPI for HPC Clusters (Programming with Big Data Project)
A simplified, efficient interface to MPI for HPC clusters. It is a derivation and rethinking of the Rmpi package. pbdMPI embraces the prevalent parallel programming style on HPC clusters. Beyond the interface, a collection of functions for global work with distributed data and resource-independent RNG reproducibility is included. It is based on S4 classes and methods.
Maintained by Wei-Chen Chen. Last updated 6 months ago.
6.0 match 2 stars 7.11 score 179 scripts 3 dependentsbraverock
PortfolioAnalytics:Portfolio Analysis, Including Numerical Methods for Optimization of Portfolios
Portfolio optimization and analysis routines and graphics.
Maintained by Brian G. Peterson. Last updated 3 months ago.
3.7 match 81 stars 11.49 score 626 scripts 2 dependentsmrc-ide
monty:Monte Carlo Models
Experimental sources for the next generation of mcstate, now called 'monty', which will support much of the old mcstate functionality but new things like better parameter interfaces, Hamiltonian Monte Carlo, and other features.
Maintained by Rich FitzJohn. Last updated 1 month ago.
5.7 match 3 stars 7.52 score 29 scripts 3 dependentsmattmar
rasterdiv:Diversity Indices for Numerical Matrices
Provides methods to calculate diversity indices on numerical matrices based on information theory, as described in Rocchini, Marcantonio and Ricotta (2017) <doi:10.1016/j.ecolind.2016.07.039>, and Rocchini et al. (2021) <doi:10.1101/2021.01.23.427872>.
Maintained by Matteo Marcantonio. Last updated 18 days ago.
5.5 match 15 stars 7.65 score 44 scripts 1 dependentshenrikbengtsson
marshal:Framework to Marshal Objects to be Used in Another R Process
Some types of R objects can be used only in the R session in which they were created. If used as-is in another R process, such objects often result in an immediate error or in obscure and hard-to-troubleshoot outcomes. Because of this, they cannot be saved to file and re-used at a later time. They also cannot be exported to a worker in parallel processing. These objects are sometimes referred to as non-exportable or non-serializable objects. One solution to this problem is to use "marshalling" to encode the R object into an exportable representation that then can be used to re-create a copy of that object in another R process. This package provides a framework for marshalling and unmarshalling R objects such that they can be transferred using functions such as serialize() and unserialize() of base R.
Maintained by Henrik Bengtsson. Last updated 1 year ago.
marshallingparallelserialization
13.4 match 14 stars 3.10 score 18 scriptsprivefl
bigparallelr:Easy Parallel Tools
Utility functions for easy parallelism in R. Includes some reexports from other packages, utility functions for splitting and parallelizing over blocks, and choosing and setting the number of cores used.
Maintained by Florian Privé. Last updated 5 months ago.
6.4 match 4 stars 6.44 score 76 scripts 19 dependentsbioc
HIBAG:HLA Genotype Imputation with Attribute Bagging
Imputes HLA classical alleles using GWAS SNP data, and it relies on a training set of HLA and SNP genotypes. HIBAG can be used by researchers with published parameter estimates instead of requiring access to large training sample datasets. It combines the concepts of attribute bagging, an ensemble classifier method, with haplotype inference for SNPs and HLA types. Attribute bagging is a technique which improves the accuracy and stability of classifier ensembles using bootstrap aggregating and random variable selection.
Maintained by Xiuwen Zheng. Last updated 4 months ago.
geneticsstatisticalmethodbioinformaticsgpuhlaimputationmhcsnpcpp
5.0 match 30 stars 8.24 score 48 scriptsggobi
GGally:Extension to 'ggplot2'
The R package 'ggplot2' is a plotting system based on the grammar of graphics. 'GGally' extends 'ggplot2' by adding several functions to reduce the complexity of combining geometric objects with transformed data. Some of these functions include a pairwise plot matrix, a two group pairwise plot matrix, a parallel coordinates plot, a survival plot, and several functions to plot networks.
Maintained by Barret Schloerke. Last updated 10 months ago.
2.5 match 597 stars 16.15 score 17k scripts 154 dependentsiiasa
ibis.iSDM:Modelling framework for integrated biodiversity distribution scenarios
Integrated framework for modelling the distribution of species and ecosystems in a suitability framing. This package allows the estimation of integrated species distribution models (iSDM) based on several sources of evidence and provided presence-only and presence-absence datasets. It makes heavy use of point-process models for estimating habitat suitability and allows spatial latent effects and priors to be included in the estimation. To do so, 'ibis.iSDM' supports a number of engines for Bayesian and more non-parametric machine learning estimation. Further, 'ibis.iSDM' is specifically customized to support spatial-temporal projections of habitat suitability into the future.
Maintained by Martin Jung. Last updated 4 months ago.
bayesianbiodiversityintegrated-frameworkpoisson-processscenariossdmspatial-grainspatial-predictionsspecies-distribution-modelling
9.2 match 21 stars 4.36 score 12 scripts 1 dependentsaphp
heemod:Markov Models for Health Economic Evaluations
An implementation of the modelling and reporting features described in the reference textbook and guidelines (Briggs, Andrew, et al. Decision Modelling for Health Economic Evaluation. Oxford Univ. Press, 2011; Siebert, U. et al. State-Transition Modeling. Medical Decision Making 32, 690-700 (2012)): deterministic and probabilistic sensitivity analysis, heterogeneity analysis, time dependency on state-time and model-time (semi-Markov and non-homogeneous Markov models), etc.
Maintained by Kevin Zarca. Last updated 6 months ago.
4.5 match 15 stars 8.81 score 204 scriptsfutureverse
future.tools:Tools for Working with Futures
Tools for Working with Futures.
Maintained by Henrik Bengtsson. Last updated 9 months ago.
parallel-computingparallel-programming
15.0 match 2 stars 2.60 scorebioc
sitePath:Phylogeny-based sequence clustering with site polymorphism
Using site polymorphism is one of the ways to cluster DNA/protein sequences but it is possible for the sequences with the same polymorphism on a single site to be genetically distant. This package is aimed at clustering sequences using site polymorphism and their corresponding phylogenetic trees. By considering their location on the tree, only the structurally adjacent sequences will be clustered. However, the adjacent sequences may not necessarily have the same polymorphism. So a branch-and-bound like algorithm is used to minimize the entropy representing the purity of site polymorphism of each cluster.
Maintained by Chengyang Ji. Last updated 5 months ago.
alignmentmultiplesequencealignmentphylogeneticssnpsoftwaremutationcpp
7.5 match 8 stars 5.20 score 9 scripts2005m
kit:Data Manipulation Functions Implemented in C
Basic functions, implemented in C, for large data manipulation. Fast vectorised ifelse()/nested if()/switch() functions, psum()/pprod() functions equivalent to pmin()/pmax() plus others which are missing from base R. Most of these functions are callable at C level.
Maintained by Morgan Jacob. Last updated 6 months ago.
4.3 match 58 stars 9.11 score 92 scripts 5 dependentsmartynplummer
rjags:Bayesian Graphical Models using MCMC
Interface to the JAGS MCMC library.
Maintained by Martyn Plummer. Last updated 7 months ago.
4.0 match 7 stars 9.48 score 4.0k scripts 165 dependentsfunecology
fundiversity:Easy Computation of Functional Diversity Indices
Computes six functional diversity indices, namely Functional Divergence (FDiv), Functional Evenness (FEve), Functional Richness (FRic), Functional Richness intersections (FRic_intersect), Functional Dispersion (FDis), and Rao's entropy (Q) (reviewed in Villéger et al. 2008 <doi:10.1890/07-1206.1>). Provides efficient, modular, and parallel functions to compute functional diversity indices (Grenié & Gruson 2023 <doi:10.1111/ecog.06585>).
Maintained by Matthias Grenié. Last updated 8 months ago.
biodiversitybiodiversity-indicatorsbiodiversity-informaticsfunctional-diversityfunctional-ecologyfunctional-traitfunctional-traitstraittrait-basedtraits
5.2 match 38 stars 7.34 score 38 scriptsjohncoene
echarts4r:Create Interactive Graphs with 'Echarts JavaScript' Version 5
Easily create interactive charts by leveraging the 'Echarts Javascript' library which includes 36 chart types, themes, 'Shiny' proxies and animations.
Maintained by David Munoz Tord. Last updated 1 day ago.
echartshacktoberfesthtmlwidgethtmlwidgetsvisualization
3.3 match 603 stars 11.45 score 1.3k scripts 11 dependentsheike
ggparallel:Variations of Parallel Coordinate Plots for Categorical Data
Create hammock plots, parallel sets, and common angle plots with 'ggplot2'.
Maintained by Heike Hofmann. Last updated 1 year ago.
7.1 match 41 stars 5.32 score 51 scriptssapfluxnet
sapfluxnetr:Working with 'Sapfluxnet' Project Data
Access, modify, aggregate and plot data from the 'Sapfluxnet' project (<http://sapfluxnet.creaf.cat>), the first global database of sap flow measurements.
Maintained by Victor Granda. Last updated 2 years ago.
5.8 match 25 stars 6.57 score 49 scriptsr-spatial
RSAGA:SAGA Geoprocessing and Terrain Analysis
Provides access to geocomputing and terrain analysis functions of the geographical information system (GIS) 'SAGA' (System for Automated Geoscientific Analyses) from within R by running the command line version of SAGA. This package furthermore provides several R functions for handling ASCII grids, including a flexible framework for applying local functions (including predict methods of fitted models) and focal functions to multiple grids. SAGA GIS is available under GPL-2 / LGPL-2 licences from <https://sourceforge.net/projects/saga-gis/>.
Maintained by Alexander Brenning. Last updated 1 month ago.
4.3 match 23 stars 8.72 score 275 scriptsjwb133
smcfcs:Multiple Imputation of Covariates by Substantive Model Compatible Fully Conditional Specification
Implements multiple imputation of missing covariates by Substantive Model Compatible Fully Conditional Specification. This is a modification of the popular FCS/chained equations multiple imputation approach, and allows imputation of missing covariate values from models which are compatible with the user specified substantive model.
Maintained by Jonathan Bartlett. Last updated 15 hours ago.
4.0 match 11 stars 9.00 score 59 scripts 1 dependentsbcallaway11
did:Treatment Effects with Multiple Periods and Groups
The standard Difference-in-Differences (DID) setup involves two periods and two groups -- a treated group and untreated group. Many applications of DID methods involve more than two periods and have individuals that are treated at different points in time. This package contains tools for computing average treatment effect parameters in Difference in Differences setups with more than two periods and with variation in treatment timing using the methods developed in Callaway and Sant'Anna (2021) <doi:10.1016/j.jeconom.2020.12.001>. The main parameters are group-time average treatment effects which are the average treatment effect for a particular group at a particular time. These can be aggregated into a smaller number of treatment effect parameters, and the package deals with the cases where there is selective treatment timing, dynamic treatment effects, calendar time effects, or combinations of these. There are also functions for testing the Difference in Differences assumption, and plotting group-time average treatment effects.
Maintained by Brantly Callaway. Last updated 4 months ago.
3.0 match 327 stars 12.01 score 696 scripts 3 dependentseddelbuettel
prrd:Parallel Runs of Reverse Depends
Reverse depends for a given package are queued such that multiple workers can run the reverse-dependency tests in parallel.
Maintained by Dirk Eddelbuettel. Last updated 28 days ago.
hacktoberfestreverse-dependencies
7.3 match 12 stars 4.95 score 2 scriptsbrry
berryFunctions:Function Collection Related to Plotting and Hydrology
Draw horizontal histograms, color scattered points by 3rd dimension, enhance date- and log-axis plots, zoom in X11 graphics, trace errors and warnings, use the unit hydrograph in a linear storage cascade, convert lists to data.frames and arrays, fit multiple functions.
Maintained by Berry Boessenkool. Last updated 1 month ago.
3.8 match 13 stars 9.43 score 350 scripts 16 dependentsprioritizr
prioritizr:Systematic Conservation Prioritization in R
Systematic conservation prioritization using mixed integer linear programming (MILP). It provides a flexible interface for building and solving conservation planning problems. Once built, conservation planning problems can be solved using a variety of commercial and open-source exact algorithm solvers. By using exact algorithm solvers, solutions can be generated that are guaranteed to be optimal (or within a pre-specified optimality gap). Furthermore, conservation problems can be constructed to optimize the spatial allocation of different management actions or zones, meaning that conservation practitioners can identify solutions that benefit multiple stakeholders. To solve large-scale or complex conservation planning problems, users should install the Gurobi optimization software (available from <https://www.gurobi.com/>) and the 'gurobi' R package (see Gurobi Installation Guide vignette for details). Users can also install the IBM CPLEX software (<https://www.ibm.com/products/ilog-cplex-optimization-studio/cplex-optimizer>) and the 'cplexAPI' R package (available at <https://github.com/cran/cplexAPI>). Additionally, the 'rcbc' R package (available at <https://github.com/dirkschumacher/rcbc>) can be used to generate solutions using the CBC optimization software (<https://github.com/coin-or/Cbc>). For further details, see Hanson et al. (2025) <doi:10.1111/cobi.14376>.
Maintained by Richard Schuster. Last updated 10 days ago.
biodiversityconservationconservation-planneroptimizationprioritizationsolverspatialcpp
3.0 match 124 stars 11.82 score 584 scripts 2 dependentsteppeiyamamoto
mediation:Causal Mediation Analysis
We implement parametric and non parametric mediation analysis. This package performs the methods and suggestions in Imai, Keele and Yamamoto (2010) <DOI:10.1214/10-STS321>, Imai, Keele and Tingley (2010) <DOI:10.1037/a0020761>, Imai, Tingley and Yamamoto (2013) <DOI:10.1111/j.1467-985X.2012.01032.x>, Imai and Yamamoto (2013) <DOI:10.1093/pan/mps040> and Yamamoto (2013) <http://web.mit.edu/teppei/www/research/IVmediate.pdf>. In addition to the estimation of causal mediation effects, the software also allows researchers to conduct sensitivity analysis for certain parametric models.
Maintained by Teppei Yamamoto. Last updated 6 years ago.
3.4 match 10.48 score 896 scripts 11 dependentscran
Rmpi:Interface (Wrapper) to MPI (Message-Passing Interface)
An interface (wrapper) to MPI. It also provides interactive R manager and worker environment.
Maintained by Hao Yu. Last updated 2 months ago.
7.4 match 5 stars 4.76 score 5 dependentsbioc
matter:Out-of-core statistical computing and signal processing
Toolbox for larger-than-memory scientific computing and visualization, providing efficient out-of-core data structures using files or shared memory, for dense and sparse vectors, matrices, and arrays, with applications to nonuniformly sampled signals and images.
Maintained by Kylie A. Bemis. Last updated 3 months ago.
infrastructuredatarepresentationdataimportdimensionreductionpreprocessingcpp
3.7 match 57 stars 9.52 score 64 scripts 2 dependentsmiracum
DQAstats:Core Functions for Data Quality Assessment
Perform data quality assessment ('DQA') of electronic health records ('EHR'). Publication: Kapsner et al. (2021) <doi:10.1055/s-0041-1733847>.
Maintained by Lorenz A. Kapsner. Last updated 12 days ago.
5.3 match 9 stars 6.55 score 4 scripts 1 dependentsarturstat
TPmsm:Estimation of Transition Probabilities in Multistate Models
Estimation of transition probabilities for the illness-death model and/or the three-state progressive model.
Maintained by Artur Araujo. Last updated 1 year ago.
illness-death-modelkaplan-meiermonte-carlo-simulationmulti-state-modelsopenmp-parallelizationsurvival-analysistransition-probabilitiesopenblasopenmp
7.5 match 1 stars 4.52 score 22 scripts 1 dependentsbioc
flowViz:Visualization for flow cytometry
Provides visualization tools for flow cytometry data.
Maintained by Mike Jiang. Last updated 5 months ago.
immunooncologyinfrastructureflowcytometrycellbasedassaysvisualization
4.6 match 7.44 score 231 scripts 12 dependentscsgillespie
benchmarkme:Crowd Sourced System Benchmarks
Benchmark your CPU and compare against other CPUs. Also provides functions for obtaining system specifications, such as RAM, CPU type, and R version.
Maintained by Colin Gillespie. Last updated 10 months ago.
3.8 match 41 stars 8.96 score 118 scripts 13 dependents
bioc
mpra:Analyze massively parallel reporter assays
Tools for data management, count preprocessing, and differential analysis in massively parallel reporter assays (MPRA).
Maintained by Leslie Myint. Last updated 5 months ago.
softwaregeneregulationsequencingfunctionalgenomics
5.3 match 6 stars 6.28 score 15 scripts
roliveros-ramos
calibrar:Automated Parameter Estimation for Complex Models
General optimisation and specific tools for the parameter estimation (i.e. calibration) of complex models, including stochastic ones. It implements generic functions that can be used for fitting any type of model, especially those with non-differentiable objective functions, with the same syntax as 'stats::optim()'. It supports multi-phase estimation (sequential parameter masking), constrained optimization (bounding box restrictions) and automatic parallel computation of numerical gradients. Some common maximum likelihood estimation methods and automated construction of the objective function from simulated model outputs are provided. See <https://roliveros-ramos.github.io/calibrar/> for more details.
Maintained by Ricardo Oliveros-Ramos. Last updated 18 days ago.
modelingoptimizationoptimization-methods
5.5 match 7 stars 6.05 score 27 scripts
ropensci
targets:Dynamic Function-Oriented 'Make'-Like Declarative Pipelines
Pipeline tools coordinate the pieces of computationally demanding analysis projects. The 'targets' package is a 'Make'-like pipeline tool for statistics and data science in R. The package skips costly runtime for tasks that are already up to date, orchestrates the necessary computation with implicit parallel computing, and abstracts files as R objects. If all the current output matches the current upstream code and data, then the whole pipeline is up to date, and the results are more trustworthy than otherwise. The methodology in this package borrows from GNU 'Make' (2015, ISBN:978-9881443519) and 'drake' (2018, <doi:10.21105/joss.00550>).
Maintained by William Michael Landau. Last updated 14 hours ago.
data-sciencehigh-performance-computingmakepeer-reviewedpipeliner-targetopiareproducibilityreproducible-researchtargetsworkflow
2.2 match 973 stars 15.20 score 4.6k scripts 22 dependents
vegandevs
vegan:Community Ecology Package
Ordination methods, diversity analysis and other functions for community and vegetation ecologists.
Maintained by Jari Oksanen. Last updated 15 days ago.
ecological-modellingecologyordinationfortranopenblas
1.7 match 472 stars 19.41 score 15k scripts 440 dependents
r-spatial
spdep:Spatial Dependence: Weighting Schemes, Statistics
A collection of functions to create spatial weights matrix objects from polygon 'contiguities', from point patterns by distance and tessellations, for summarizing these objects, and for permitting their use in spatial data analysis, including regional aggregation by minimum spanning tree; a collection of tests for spatial 'autocorrelation', including global 'Morans I' and 'Gearys C' proposed by 'Cliff' and 'Ord' (1973, ISBN: 0850860369) and (1981, ISBN: 0850860814), 'Hubert/Mantel' general cross product statistic, Empirical Bayes estimates and 'Assunção/Reis' (1999) <doi:10.1002/(SICI)1097-0258(19990830)18:16%3C2147::AID-SIM179%3E3.0.CO;2-I> Index, 'Getis/Ord' G ('Getis' and 'Ord' 1992) <doi:10.1111/j.1538-4632.1992.tb00261.x> and multicoloured join count statistics, 'APLE' ('Li 'et al.' ) <doi:10.1111/j.1538-4632.2007.00708.x>, local 'Moran's I', 'Gearys C' ('Anselin' 1995) <doi:10.1111/j.1538-4632.1995.tb00338.x> and 'Getis/Ord' G ('Ord' and 'Getis' 1995) <doi:10.1111/j.1538-4632.1995.tb00912.x>, 'saddlepoint' approximations ('Tiefelsdorf' 2002) <doi:10.1111/j.1538-4632.2002.tb01084.x> and exact tests for global and local 'Moran's I' ('Bivand et al.' 2009) <doi:10.1016/j.csda.2008.07.021> and 'LOSH' local indicators of spatial heteroscedasticity ('Ord' and 'Getis') <doi:10.1007/s00168-011-0492-y>. The implementation of most of these measures is described in 'Bivand' and 'Wong' (2018) <doi:10.1007/s11749-018-0599-x>, with further extensions in 'Bivand' (2022) <doi:10.1111/gean.12319>. 'Lagrange' multiplier tests for spatial dependence in linear models are provided ('Anselin et al'. 1996) <doi:10.1016/0166-0462(95)02111-6>, as are 'Rao' score tests for hypothesised spatial 'Durbin' models based on linear models ('Koley' and 'Bera' 2023) <doi:10.1080/17421772.2023.2256810>. A local indicators for categorical data (LICD) implementation based on 'Carrer et al.' (2021) <doi:10.1016/j.jas.2020.105306> and 'Bivand et al.' 
(2017) <doi:10.1016/j.spasta.2017.03.003> was added in 1.3-7. From 'spdep' and 'spatialreg' versions >= 1.2-1, the model fitting functions previously present in this package are defunct in 'spdep' and may be found in 'spatialreg'.
Maintained by Roger Bivand. Last updated 17 days ago.
spatial-autocorrelationspatial-dependencespatial-weights
2.0 match 131 stars 16.62 score 6.0k scripts 107 dependents
r-lib
httr2:Perform HTTP Requests and Process the Responses
Tools for creating and modifying HTTP requests, then performing them and processing the results. 'httr2' is a modern re-imagining of 'httr' that uses a pipe-based interface and solves more of the problems that API wrapping packages face.
Maintained by Hadley Wickham. Last updated 7 days ago.
1.9 match 246 stars 17.66 score 1.9k scripts 1.1k dependents
auto-optimization
iraceplot:Plots for Visualizing the Data Produced by the 'irace' Package
Graphical visualization tools for analyzing the data produced by 'irace'. The 'iraceplot' package enables users to analyze the performance and the parameter space data sampled by the configuration during the search process. It provides a set of functions that generate different plots to visualize the configurations sampled during the execution of 'irace' and their performance. The functions just require the log file generated by 'irace' and, in some cases, they can be used with user-provided data.
Maintained by Manuel López-Ibáñez. Last updated 1 month ago.
5.8 match 5 stars 5.70 score 7 scripts
t-kalinowski
keras:R Interface to 'Keras'
Interface to 'Keras' <https://keras.io>, a high-level neural networks 'API'. 'Keras' was developed with a focus on enabling fast experimentation, supports both convolution based networks and recurrent networks (as well as combinations of the two), and runs seamlessly on both 'CPU' and 'GPU' devices.
Maintained by Tomasz Kalinowski. Last updated 11 months ago.
3.0 match 10.82 score 10k scripts 54 dependents
bioc
parglms:support for parallelized estimation of GLMs/GEEs
This package provides support for parallelized estimation of GLMs/GEEs, catering for dispersed data.
Maintained by VJ Carey. Last updated 5 months ago.
9.8 match 3.30 score 3 scripts
marcellgranat
currr:Apply Mapping Functions in Frequent Saving
Implementations of the family of map() functions with frequent saving of the intermediate results. The contained functions let you start the evaluation of the iterations where you stopped (reading the already evaluated ones from cache), and work with the currently evaluated iterations while remaining ones are running in a background job. Parallel computing is also easier with the workers parameter.
Maintained by Marcell Granat. Last updated 7 months ago.
checkpointsparallel-computingpurrr
8.0 match 21 stars 4.02 score 7 scripts
cran
snowfall:Easier Cluster Computing (Based on 'snow')
Usability wrapper around snow for easier development of parallel R programs. This package offers, e.g., extended error checks and additional functions. All functions also work in sequential mode if no cluster is present or desired. The package is also designed as a connector to the cluster management tool sfCluster, but can also be used without it.
Maintained by Jochen Knaus. Last updated 1 year ago.
4.1 match 7.89 score 1.8k scripts 48 dependents
business-science
modeltime:The Tidymodels Extension for Time Series Modeling
The time series forecasting framework for use with the 'tidymodels' ecosystem. Models include ARIMA, Exponential Smoothing, and additional time series models from the 'forecast' and 'prophet' packages. Refer to "Forecasting Principles & Practice, Second edition" (<https://otexts.com/fpp2/>). Refer to "Prophet: forecasting at scale" (<https://research.facebook.com/blog/2017/02/prophet-forecasting-at-scale/>).
Maintained by Matt Dancho. Last updated 5 months ago.
arimadata-sciencedeep-learningetsforecastingmachine-learningmachine-learning-algorithmsmodeltimeprophettbatstidymodelingtidymodelstimetime-seriestime-series-analysistimeseriestimeseries-forecasting
3.0 match 549 stars 10.57 score 1.1k scripts 7 dependents
suyusung
R2jags:Using R to Run 'JAGS'
Providing wrapper functions to implement Bayesian analysis in JAGS. Some major features include monitoring convergence of a MCMC model using Rubin and Gelman Rhat statistics, automatically running a MCMC model till it converges, and implementing parallel processing of a MCMC model for multiple chains.
Maintained by Yu-Sung Su. Last updated 4 months ago.
2.8 match 8 stars 11.43 score 3.4k scripts 47 dependents
slzhang-fd
mirtjml:Joint Maximum Likelihood Estimation for High-Dimensional Item Factor Analysis
Provides constrained joint maximum likelihood estimation algorithms for item factor analysis (IFA) based on multidimensional item response theory models. So far, we provide functions for exploratory and confirmatory IFA based on the multidimensional two parameter logistic (M2PL) model for binary response data. Compared with traditional estimation methods for IFA, the methods implemented in this package scale better to data with large numbers of respondents, items, and latent factors. The computation is facilitated by the 'OpenMP' multiprocessing API. For more information, please refer to: 1. Chen, Y., Li, X., & Zhang, S. (2018). Joint Maximum Likelihood Estimation for High-Dimensional Exploratory Item Factor Analysis. Psychometrika, 1-23. <doi:10.1007/s11336-018-9646-5>; 2. Chen, Y., Li, X., & Zhang, S. (2019). Structured Latent Factor Analysis for Large-scale Data: Identifiability, Estimability, and Their Implications. Journal of the American Statistical Association, <doi:10.1080/01621459.2019.1635485>.
Maintained by Siliang Zhang. Last updated 4 years ago.
ifaitem-factor-analysislarge-scale-assessmentparallel-computingpsychometricsopenblascppopenmp
7.5 match 9 stars 4.21 score 12 scripts 1 dependents
bioc
monocle:Clustering, differential expression, and trajectory analysis for single-cell RNA-Seq
Monocle performs differential expression and time-series analysis for single-cell expression experiments. It orders individual cells according to progress through a biological process, without knowing ahead of time which genes define progress through that process. Monocle also performs differential expression analysis, clustering, visualization, and other useful tasks on single cell expression data. It is designed to work with RNA-Seq and qPCR data, but could be used with other types as well.
Maintained by Cole Trapnell. Last updated 5 months ago.
immunooncologysequencingrnaseqgeneexpressiondifferentialexpressioninfrastructuredataimportdatarepresentationvisualizationclusteringmultiplecomparisonqualitycontrolcpp
3.5 match 8.89 score 1.6k scripts 2 dependents
jeff-hughes
paramtest:Run a Function Iteratively While Varying Parameters
Run simulations or other functions while easily varying parameters from one iteration to the next. Some common use cases would be grid search for machine learning algorithms, running sets of simulations (e.g., estimating statistical power for complex models), or bootstrapping under various conditions. See the 'paramtest' documentation for more information and examples.
Maintained by Jeffrey Hughes. Last updated 7 years ago.
6.5 match 1 stars 4.85 score 47 scripts
marcozanotti
dispositionEffect:Analysis of Disposition Effect on Financial Portfolios
Evaluate the presence of the disposition effect and other irrational investor behaviors based solely on investors' transactions and financial market data. Experimental data can also be used to perform the analysis. Four different methodologies are implemented to account for the different nature of human behaviors on financial markets. Novel analyses such as portfolio driven and time series disposition effect are also allowed.
Maintained by Marco Zanotti. Last updated 3 years ago.
behavioral-economicsbehavioral-scienceseconometricseconomicsfinancefinancial-analysisfinancial-datafinancial-marketstime-series
6.0 match 4 stars 5.20 score 9 scripts
christophergandrud
mcreplicate:Multi-Core Replicate
Multi-core replication function to make it easier to do fast Monte Carlo simulation. Based on the mcreplicate() function from the 'rethinking' package. The 'rethinking' package requires installing 'rstan', which is onerous to install, while also not adding capabilities to this function.
Maintained by Christopher Gandrud. Last updated 4 years ago.
7.5 match 5 stars 4.16 score 29 scripts
ss3sim
ss3sim:Fisheries Stock Assessment Simulation Testing with Stock Synthesis
A framework for fisheries stock assessment simulation testing with Stock Synthesis (SS3) as described in Anderson et al. (2014) <doi:10.1371/journal.pone.0092725>.
Maintained by Kelli F. Johnson. Last updated 5 months ago.
fisheriessimulationstock-synthesis
3.5 match 39 stars 8.89 score 149 scripts
lme4
lme4:Linear Mixed-Effects Models using 'Eigen' and S4
Fit linear and generalized linear mixed-effects models. The models and their components are represented using S4 classes and methods. The core computational algorithms are implemented using the 'Eigen' C++ library for numerical linear algebra and 'RcppEigen' "glue".
Maintained by Ben Bolker. Last updated 1 day ago.
1.5 match 647 stars 20.69 score 35k scripts 1.5k dependents
bioc
GenomicFiles:Distributed computing by file or by range
This package provides infrastructure for parallel computations distributed 'by file' or 'by range'. User defined MAPPER and REDUCER functions provide added flexibility for data combination and manipulation.
Maintained by Bioconductor Package Maintainer. Last updated 5 months ago.
geneticsinfrastructuredataimportsequencingcoverage
4.5 match 6.86 score 89 scripts 16 dependents
bwlewis
doRedis:'Foreach' Parallel Adapter Using the 'Redis' Database
Create and manage fault-tolerant task queues for the 'foreach' package using the 'Redis' key/value database.
Maintained by B. W. Lewis. Last updated 4 years ago.
4.7 match 71 stars 6.56 score 42 scripts
branchlab
metasnf:Meta Clustering with Similarity Network Fusion
Framework to facilitate patient subtyping with similarity network fusion and meta clustering. The similarity network fusion (SNF) algorithm was introduced by Wang et al. (2014) in <doi:10.1038/nmeth.2810>. SNF is a data integration approach that can transform high-dimensional and diverse data types into a single similarity network suitable for clustering with minimal loss of information from each initial data source. The meta clustering approach was introduced by Caruana et al. (2006) in <doi:10.1109/ICDM.2006.103>. Meta clustering involves generating a wide range of cluster solutions by adjusting clustering hyperparameters, then clustering the solutions themselves into a manageable number of qualitatively similar solutions, and finally characterizing representative solutions to find ones that are best for the user's specific context. This package provides a framework to easily transform multi-modal data into a wide range of similarity network fusion-derived cluster solutions as well as to visualize, characterize, and validate those solutions. Core package functionality includes easy customization of distance metrics, clustering algorithms, and SNF hyperparameters to generate diverse clustering solutions; calculation and plotting of associations between features, between patients, and between cluster solutions; and standard cluster validation approaches including resampled measures of cluster stability, standard metrics of cluster quality, and label propagation to evaluate generalizability in unseen data. Associated vignettes guide the user through using the package to identify patient subtypes while adhering to best practices for unsupervised learning.
Maintained by Prashanth S Velayudhan. Last updated 3 days ago.
bioinformaticsclusteringmetaclusteringsnf
3.8 match 8 stars 8.21 score 30 scripts
bonsook
REN:Regularization Ensemble for Robust Portfolio Optimization
Portfolio optimization is achieved through a combination of regularization techniques and ensemble methods that are designed to generate stable out-of-sample return predictions, particularly in the presence of strong correlations among assets. The package includes functions for data preparation, parallel processing, and portfolio analysis using methods such as Mean-Variance, James-Stein, LASSO, Ridge Regression, and Equal Weighting. It also provides visualization tools and performance metrics, such as the Sharpe ratio, volatility, and maximum drawdown, to assess the results.
Maintained by Bonsoo Koo. Last updated 5 months ago.
6.1 match 1 stars 5.04 score 2 scripts
skranz
ParallelTrendsPlot:Experimental Package: Plots to diagnose parallel trends in DID regression with additional control variables.
Experimental Package: Plots to diagnose parallel trends in DID regression with additional control variables.
Maintained by Sebastian Kranz. Last updated 3 years ago.
12.4 match 6 stars 2.48 score 3 scripts
benubah
control:A Control Systems Toolbox
Solves control systems problems relating to time/frequency response, LTI systems design and analysis, transfer function manipulations, and system conversion.
Maintained by Ben C. Ubah. Last updated 5 years ago.
5.2 match 19 stars 5.86 score 76 scripts
sparklyr
sparklyr:R Interface to Apache Spark
R interface to Apache Spark, a fast and general engine for big data processing, see <https://spark.apache.org/>. This package supports connecting to local and remote Apache Spark clusters, provides a 'dplyr' compatible back-end, and provides an interface to Spark's built-in machine learning algorithms.
Maintained by Edgar Ruiz. Last updated 8 days ago.
apache-sparkdistributeddplyridelivymachine-learningremote-clusterssparksparklyr
2.0 match 959 stars 15.16 score 4.0k scripts 21 dependents
cran
cdparcoord:Top Frequency-Based Parallel Coordinates
Parallel coordinate plotting with resolutions for large data sets and missing values.
Maintained by Norm Matloff. Last updated 6 years ago.
10.8 match 2.81 score 13 scripts
bioc
coMethDMR:Accurate identification of co-methylated and differentially methylated regions in epigenome-wide association studies
coMethDMR identifies genomic regions associated with continuous phenotypes by optimally leveraging covariations among CpGs within predefined genomic regions. Instead of testing all CpGs within a genomic region, coMethDMR carries out an additional step that selects co-methylated sub-regions first without using any outcome information. Next, coMethDMR tests association between methylation within the sub-region and continuous phenotype using a random coefficient mixed effects model, which models both variations between CpG sites within the region and differential methylation simultaneously.
Maintained by Fernanda Veitzman. Last updated 5 months ago.
dnamethylationepigeneticsmethylationarraydifferentialmethylationgenomewideassociation
4.7 match 7 stars 6.47 score 42 scripts
bioc
S4Vectors:Foundation of vector-like and list-like containers in Bioconductor
The S4Vectors package defines the Vector and List virtual classes and a set of generic functions that extend the semantics of ordinary vectors and lists in R. Package developers can easily implement vector-like or list-like objects as concrete subclasses of Vector or List. In addition, a few low-level concrete subclasses of general interest (e.g. DataFrame, Rle, Factor, and Hits) are implemented in the S4Vectors package itself (many more are implemented in the IRanges package and in other Bioconductor infrastructure packages).
Maintained by Hervé Pagès. Last updated 1 month ago.
infrastructuredatarepresentationbioconductor-packagecore-package
1.9 match 18 stars 16.05 score 1.0k scripts 1.9k dependents
stan-dev
bayesplot:Plotting for Bayesian Models
Plotting functions for posterior analysis, MCMC diagnostics, prior and posterior predictive checks, and other visualizations to support the applied Bayesian workflow advocated in Gabry, Simpson, Vehtari, Betancourt, and Gelman (2019) <doi:10.1111/rssa.12378>. The package is designed not only to provide convenient functionality for users, but also a common set of functions that can be easily used by developers working on a variety of R packages for Bayesian modeling, particularly (but not exclusively) packages interfacing with 'Stan'.
Maintained by Jonah Gabry. Last updated 1 month ago.
bayesianggplot2mcmcpandocstanstatistical-graphicsvisualization
1.8 match 436 stars 16.69 score 6.5k scripts 98 dependents
patriciamar
ShinyItemAnalysis:Test and Item Analysis via Shiny
Package including functions and interactive shiny application for the psychometric analysis of educational tests, psychological assessments, health-related and other types of multi-item measurements, or ratings from multiple raters.
Maintained by Patricia Martinkova. Last updated 1 month ago.
assessmentdifferential-item-functioningitem-analysisitem-response-theorypsychometricsshiny
3.8 match 44 stars 7.88 score 105 scripts 3 dependents
thomasp85
ggforce:Accelerating 'ggplot2'
The aim of 'ggplot2' is to aid in visual data investigations. This focus has led to a lack of facilities for composing specialised plots. 'ggforce' aims to be a collection of mainly new stats and geoms that fills this gap. All additional functionality is aimed to come through the official extension system so using 'ggforce' should be a stable experience.
Maintained by Thomas Lin Pedersen. Last updated 1 year ago.
ggplot-extensionggplot2visualizationcpp
1.9 match 920 stars 15.83 score 9.3k scripts 293 dependents
f-rousset
spaMM:Mixed-Effect Models, with or without Spatial Random Effects
Inference based on models with or without spatially-correlated random effects, multivariate responses, or non-Gaussian random effects (e.g., Beta). Variation in residual variance (heteroscedasticity) can itself be represented by a mixed-effect model. Both classical geostatistical models (Rousset and Ferdy 2014 <doi:10.1111/ecog.00566>), and Markov random field models on irregular grids (as considered in the 'INLA' package, <https://www.r-inla.org>), can be fitted, with distinct computational procedures exploiting the sparse matrix representations for the latter case and other autoregressive models. Laplace approximations are used for likelihood or restricted likelihood. Penalized quasi-likelihood and other variants discussed in the h-likelihood literature (Lee and Nelder 2001 <doi:10.1093/biomet/88.4.987>) are also implemented.
Maintained by François Rousset. Last updated 9 months ago.
6.0 match 4.94 score 208 scripts 5 dependents
marce10
warbleR:Streamline Bioacoustic Analysis
Functions aiming to facilitate the analysis of the structure of animal acoustic signals in 'R'. 'warbleR' makes use of the basic sound analysis tools from the packages 'tuneR' and 'seewave', and offers new tools to explore and quantify acoustic signal structure. The package allows users to organize and manipulate multiple sound files, create spectrograms of complete recordings or individual signals in different formats, run several measures of acoustic structure, and characterize different structural levels in acoustic signals.
Maintained by Marcelo Araya-Salas. Last updated 2 months ago.
animal-acoustic-signalsaudio-processingbioacousticsspectrogramstreamline-analysiscpp
2.7 match 54 stars 11.01 score 270 scripts 4 dependents
r-forge
tm:Text Mining Package
A framework for text mining applications within R.
Maintained by Kurt Hornik. Last updated 24 days ago.
2.3 match 12.96 score 14k scripts 101 dependents
zdk123
pulsar:Parallel Utilities for Lambda Selection along a Regularization Path
Model selection for penalized graphical models using the Stability Approach to Regularization Selection ('StARS'), with options for speed-ups including Bounded StARS (B-StARS), batch computing, and other stability metrics (e.g., graphlet stability G-StARS). Christian L. Müller, Richard Bonneau, Zachary Kurtz (2016) <arXiv:1605.07072>.
Maintained by Zachary Kurtz. Last updated 1 year ago.
4.7 match 10 stars 6.16 score 65 scripts
rapidsurveys
bbw:Blocked Weighted Bootstrap
The blocked weighted bootstrap (BBW) is an estimation technique for use with data from two-stage cluster sampled surveys in which either prior weighting (e.g. population-proportional sampling or PPS as used in Standardized Monitoring and Assessment of Relief and Transitions or SMART surveys) or posterior weighting (e.g. as used in rapid assessment method or RAM and simple spatial sampling method or S3M surveys) is implemented. See Cameron et al (2008) <doi:10.1162/rest.90.3.414> for application of bootstrap to cluster samples. See Aaron et al (2016) <doi:10.1371/journal.pone.0163176> and Aaron et al (2016) <doi:10.1371/journal.pone.0162462> for application of the blocked weighted bootstrap to estimate indicators from two-stage cluster sampled surveys.
Maintained by Ernest Guevarra. Last updated 2 months ago.
bootstrapping-statisticsramsurveys
4.9 match 3 stars 5.91 score 9 scripts 2 dependents
bioc
gdsfmt:R Interface to CoreArray Genomic Data Structure (GDS) Files
Provides a high-level R interface to CoreArray Genomic Data Structure (GDS) data files. GDS is portable across platforms, with a hierarchical structure for storing multiple scalable array-oriented data sets with metadata information. It is suited for large-scale datasets, especially for data which are much larger than the available random-access memory. The gdsfmt package offers efficient operations specifically designed for integers of less than 8 bits, since a diploid genotype, like a single-nucleotide polymorphism (SNP), usually occupies fewer bits than a byte. Data compression and decompression are available with relatively efficient random access. A GDS file can also be read in parallel by multiple R processes, supported by the package parallel.
Maintained by Xiuwen Zheng. Last updated 19 days ago.
infrastructuredataimportbioinformaticsgds-formatgenomicscpp
2.6 match 18 stars 11.30 score 920 scripts 30 dependents
statnet
ergm:Fit, Simulate and Diagnose Exponential-Family Models for Networks
An integrated set of tools to analyze and simulate networks based on exponential-family random graph models (ERGMs). 'ergm' is a part of the Statnet suite of packages for network analysis. See Hunter, Handcock, Butts, Goodreau, and Morris (2008) <doi:10.18637/jss.v024.i03> and Krivitsky, Hunter, Morris, and Klumb (2023) <doi:10.18637/jss.v105.i06>.
Maintained by Pavel N. Krivitsky. Last updated 5 days ago.
1.9 match 100 stars 15.36 score 1.4k scripts 36 dependents
jpquast
protti:Bottom-Up Proteomics and LiP-MS Quality Control and Data Analysis Tools
Useful functions and workflows for proteomics quality control and data analysis of both limited proteolysis-coupled mass spectrometry (LiP-MS) (Feng et al. (2014) <doi:10.1038/nbt.2999>) and regular bottom-up proteomics experiments. Data generated with search tools such as 'Spectronaut', 'MaxQuant' and 'Proteome Discoverer' can be easily used due to the flexibility of the functions.
Maintained by Jan-Philipp Quast. Last updated 5 months ago.
data-analysislip-msmass-spectrometryomicsproteinproteomicssystems-biology
3.3 match 61 stars 8.58 score 83 scripts
alexchristensen
NetworkToolbox:Methods and Measures for Brain, Cognitive, and Psychometric Network Analysis
Implements network analysis and graph theory measures used in neuroscience, cognitive science, and psychology. Methods include various filtering methods and approaches such as threshold, dependency (Kenett, Tumminello, Madi, Gur-Gershgoren, Mantegna, & Ben-Jacob, 2010 <doi:10.1371/journal.pone.0015032>), Information Filtering Networks (Barfuss, Massara, Di Matteo, & Aste, 2016 <doi:10.1103/PhysRevE.94.062306>), and Efficiency-Cost Optimization (Fallani, Latora, & Chavez, 2017 <doi:10.1371/journal.pcbi.1005305>). Brain methods include the recently developed Connectome Predictive Modeling (see references in package). Also implements several network measures including local network characteristics (e.g., centrality), community-level network characteristics (e.g., community centrality), global network characteristics (e.g., clustering coefficient), and various other measures associated with the reliability and reproducibility of network analysis.
Maintained by Alexander Christensen. Last updated 2 years ago.
4.0 match 23 stars 6.99 score 101 scripts 4 dependents
rstudio
tfdatasets:Interface to 'TensorFlow' Datasets
Interface to 'TensorFlow' Datasets, a high-level library for building complex input pipelines from simple, re-usable pieces. See <https://www.tensorflow.org/guide> for additional details.
Maintained by Tomasz Kalinowski. Last updated 3 days ago.
3.0 match 34 stars 9.32 score 656 scripts 3 dependents
beccadaniel
doSNOW:Foreach Parallel Adaptor for the 'snow' Package
Provides a parallel backend for the %dopar% function using the snow package of Tierney, Rossini, Li, and Sevcikova.
Maintained by Folashade Daniel. Last updated 3 years ago.
3.5 match 1 stars 7.88 score 2.6k scripts 98 dependents
pecanproject
PEcAn.data.remote:PEcAn Functions Used for Extracting Remote Sensing Data
PEcAn module for processing remote data. Python module requirements: requests, json, re, ast, pandas, sys. If any of these modules are missing, install them using pip install <module name>.
Maintained by Bailey Morrison. Last updated 14 hours ago.
bayesiancyberinfrastructuredata-assimilationdata-scienceecosystem-modelecosystem-scienceforecastingmeta-analysisnational-science-foundationpecanplants
3.2 match 216 stars 8.74 score 6 scripts 5 dependents
jonlachmann
FBMS:Flexible Bayesian Model Selection and Model Averaging
Implements the Mode Jumping Markov Chain Monte Carlo algorithm described in <doi:10.1016/j.csda.2018.05.020> and its Genetically Modified counterpart described in <doi:10.1613/jair.1.13047> as well as the sub-sampling versions described in <doi:10.1016/j.ijar.2022.08.018> for flexible Bayesian model selection and model averaging.
Maintained by Jon Lachmann. Last updated 16 days ago.
11.3 match 2.45 score 28 scripts
r-spatial
stars:Spatiotemporal Arrays, Raster and Vector Data Cubes
Reading, manipulating, writing and plotting spatiotemporal arrays (raster and vector data cubes) in 'R', using 'GDAL' bindings provided by 'sf', and 'NetCDF' bindings by 'ncmeta' and 'RNetCDF'.
Maintained by Edzer Pebesma. Last updated 29 days ago.
1.5 match 568 stars 18.26 score 7.2k scripts 135 dependents
snoweye
pmclust:Parallel Model-Based Clustering using Expectation-Gathering-Maximization Algorithm for Finite Mixture Gaussian Model
Aims to utilize model-based clustering (unsupervised) for high dimensional and ultra large data, especially in a distributed manner. The code employs 'pbdMPI' to perform an expectation-gathering-maximization algorithm for finite mixture Gaussian models. Unstructured dispersion matrices are assumed in the Gaussian models. By default, the implementation follows the single program multiple data programming model. The code can be executed through 'pbdMPI' and 'MPI' implementations such as 'OpenMPI' and 'MPICH'. See the High Performance Statistical Computing website <https://snoweye.github.io/hpsc/> for more information, documents and examples.
Maintained by Wei-Chen Chen. Last updated 2 years ago.
7.4 match 5 stars 3.70 score 4 scripts
antonio-pgarcia
rrepast:Invoke 'Repast Simphony' Simulation Models
An R and Repast integration tool for running individual-based (IbM) simulation models developed using the 'Repast Simphony' agent-based framework directly from R code, with support for multicore execution. This package integrates 'Repast Simphony' models within the R environment, making it easier to run models and analyze their output data for automated parameter calibration and for uncertainty and sensitivity analysis using the power of the R environment.
Maintained by Antonio Prestes Garcia. Last updated 5 years ago.
6.0 match 3 stars 4.53 score 38 scripts 1 dependents
bioc
easyRNASeq:Count summarization and normalization for RNA-Seq data
Calculates the coverage of high-throughput short-reads against a genome of reference and summarizes it per feature of interest (e.g. exon, gene, transcript). The data can be normalized as 'RPKM' or by the 'DESeq' or 'edgeR' package.
Maintained by Nicolas Delhomme. Last updated 5 months ago.
geneexpressionrnaseqgeneticspreprocessingimmunooncology
5.0 match 5.43 score 15 scripts 1 dependents