R-universe search: parallel

bioc

BiocParallel:Bioconductor facilities for parallel evaluation

This package provides modified versions and novel implementations of functions for parallel evaluation, tailored for use with Bioconductor objects.

Maintained by Martin Morgan. Last updated 24 days ago.

infrastructure bioconductor-package core-package u24ca289073 cpp

29.7 match 67 stars 17.40 score 7.3k scripts 1.1k dependents

kylebaron

mrgsim.parallel:Simulate with 'mrgsolve' in Parallel

Simulation from an 'mrgsolve' <https://cran.r-project.org/package=mrgsolve> model using a parallel backend. Input data sets are split (chunked) and simulated in parallel using mclapply() or future_lapply() <https://cran.r-project.org/package=future.apply>.

Maintained by Kyle Baron. Last updated 3 months ago.

future mrgsolve parallelization

57.4 match 5 stars 5.11 score 17 scripts

revolutionanalytics

foreach:Provides Foreach Looping Construct

Support for the foreach looping construct. Foreach is an idiom that allows for iterating over elements in a collection, without the use of an explicit loop counter. This package in particular is intended to be used for its return value, rather than for its side effects. In that sense, it is similar to the standard lapply function, but doesn't require the evaluation of a function. Using foreach without side effects also facilitates executing the loop in parallel.

Maintained by Folashade Daniel. Last updated 3 years ago.

foreach parallel-computing

12.3 match 54 stars 17.16 score 43k scripts 2.8k dependents

revelle

psych:Procedures for Psychological, Psychometric, and Personality Research

A general purpose toolbox developed originally for personality, psychometric theory and experimental psychology. Functions are primarily for multivariate analysis and scale construction using factor analysis, principal component analysis, cluster analysis and reliability analysis, although others provide basic descriptive statistics. Item Response Theory is done using factor analysis of tetrachoric and polychoric correlations. Functions for analyzing data at multiple levels include within and between group statistics, including correlations and factor analysis. Validation and cross validation of scales developed using basic machine learning algorithms are provided, as are functions for simulating and testing particular item and test structures. Several functions serve as a useful front end for structural equation modeling. Graphical displays of path diagrams, including mediation models, factor analysis and structural equation models are created using basic graphics. Some of the functions are written to support a book on psychometric theory as well as publications in personality research. For more information, see the <https://personality-project.org/r/> web page.

Maintained by William Revelle. Last updated 3 months ago.

12.4 match 52 stars 13.94 score 29k scripts 317 dependents

shikokuchuo

mirai:Minimalist Async Evaluation Framework for R

Designed for simplicity, a 'mirai' evaluates an R expression asynchronously in a parallel process, locally or distributed over the network. The result is automatically available upon completion. Modern networking and concurrency, built on 'nanonext' and 'NNG' (Nanomsg Next Gen), ensures reliable and efficient scheduling over fast inter-process communications or TCP/IP secured by TLS. Distributed computing can launch remote resources via SSH or cluster managers. An inherently queued architecture handles many more tasks than available processes, and requires no storage on the file system. Innovative features include support for otherwise non-exportable reference objects, event-driven promises, and asynchronous parallel map.

Maintained by Charlie Gao. Last updated 20 hours ago.

async asynchronous-tasks concurrency distributed-computing high-performance-computing parallel-computing

14.4 match 217 stars 11.94 score 130 scripts 7 dependents

sfcheung

manymome:Mediation, Moderation and Moderated-Mediation After Model Fitting

Computes indirect effects, conditional effects, and conditional indirect effects in a structural equation model or path model after model fitting, with no need to define any user parameters or label any paths in the model syntax, using the approach presented in Cheung and Cheung (2024) <doi:10.3758/s13428-023-02224-z>. Can also form bootstrap confidence intervals by doing bootstrapping only once and reusing the bootstrap estimates in all subsequent computations. Supports bootstrap confidence intervals for standardized (partially or completely) indirect effects, conditional effects, and conditional indirect effects as described in Cheung (2009) <doi:10.3758/BRM.41.2.425> and Cheung, Cheung, Lau, Hui, and Vong (2022) <doi:10.1037/hea0001188>. Model fitting can be done by structural equation modeling using lavaan() or regression using lm().

Maintained by Shu Fai Cheung. Last updated 21 days ago.

bootstrapping confidence-interval lavaan manymome mediation moderated-mediation moderation regression sem standardized-effect-size structural-equation-modeling

21.0 match 1 star 8.06 score 172 scripts 4 dependents

deepayan

lattice:Trellis Graphics for R

A powerful and elegant high-level data visualization system inspired by Trellis graphics, with an emphasis on multivariate data. Lattice is sufficient for typical graphics needs, and is also flexible enough to handle most nonstandard requirements. See ?Lattice for an introduction.

Maintained by Deepayan Sarkar. Last updated 11 months ago.

9.4 match 68 stars 17.33 score 27k scripts 13k dependents

r-lib

testthat:Unit Testing for R

Software testing is important, but, in part because it is frustrating and boring, many of us avoid it. 'testthat' is a testing framework for R that is easy to learn and use, and integrates with your existing 'workflow'.

Maintained by Hadley Wickham. Last updated 15 days ago.

unit-testing cpp

7.7 match 900 stars 20.97 score 74k scripts 465 dependents

datacloning

dclone:Data Cloning and MCMC Tools for Maximum Likelihood Methods

Low level functions for implementing maximum likelihood estimating procedures for complex models using data cloning and Bayesian Markov chain Monte Carlo methods as described in Solymos 2010 <doi:10.32614/RJ-2010-011>. Sequential and parallel MCMC support for 'JAGS', 'WinBUGS', 'OpenBUGS', and 'Stan'.

Maintained by Peter Solymos. Last updated 6 months ago.

jags cpp

22.9 match 7 stars 6.91 score 215 scripts 4 dependents

coatless-rpkg

sitmo:Parallel Pseudo Random Number Generator (PPRNG) 'sitmo' Header Files

Provided within are two high-quality and fast PPRNGs that may be used in an 'OpenMP' parallel environment. In addition, there is a generator for a one-dimensional low-discrepancy sequence. The objective of this library is to consolidate the distribution of the 'sitmo' (C++98 & C++11), 'threefry' and 'vandercorput' (C++11-only) engines on CRAN by enabling others to link to the header files inside of 'sitmo' instead of including a copy of each engine within their individual package. Lastly, the package contains example implementations using the 'sitmo' package and three accompanying vignettes that provide additional information.

Maintained by James Balamuta. Last updated 1 year ago.

parallel random-generation rcpp cpp openmp

15.9 match 7 stars 9.75 score 15 scripts 201 dependents

smartdata-analysis-and-statistics

SimTOST:Sample Size Estimation for Bio-Equivalence Trials Through Simulation

Sample size estimation for bio-equivalence trials is supported through a simulation-based approach that extends the Two One-Sided Tests (TOST) procedure. The methodology provides flexibility in hypothesis testing, accommodates multiple treatment comparisons, and accounts for correlated endpoints. Users can model complex trial scenarios, including parallel and crossover designs, intra-subject variability, and different equivalence margins. Monte Carlo simulations enable accurate estimation of power and type I error rates, ensuring well-calibrated study designs. The statistical framework builds on established methods for equivalence testing and multiple hypothesis testing in bio-equivalence studies, as described in Schuirmann (1987) <doi:10.1007/BF01068419>, Mielke et al. (2018) <doi:10.1080/19466315.2017.1371071>, Shieh (2022) <doi:10.1371/journal.pone.0269128>, and Sozu et al. (2015) <doi:10.1007/978-3-319-22005-5>. Comprehensive documentation and vignettes guide users through implementation and interpretation of results.

Maintained by Thomas Debray. Last updated 25 days ago.

mcmc multi-arm multiple-comparisons sample-size-calculation sample-size-estimation trial-simulation openblas cpp

22.9 match 2 stars 6.47 score 7 scripts

jwood000

RcppAlgos:High Performance Tools for Combinatorics and Computational Mathematics

Provides optimized functions and flexible iterators implemented in C++ for solving problems in combinatorics and computational mathematics. Handles various combinatorial objects including combinations, permutations, integer partitions and compositions, Cartesian products, unordered Cartesian products, and partition of groups. Utilizes the RMatrix class from 'RcppParallel' for thread safety. The combination and permutation functions contain constraint parameters that allow for generation of all results of a vector meeting specific criteria (e.g. finding all combinations such that the sum is between two bounds). Capable of ranking/unranking combinatorial objects efficiently (e.g. retrieve only the nth lexicographical result) which sets up nicely for parallelization as well as random sampling. Gmp support permits exploration where the total number of results is large (e.g. comboSample(10000, 500, n = 4)). Additionally, there are several high performance number theoretic functions that are useful for problems common in computational mathematics. Some of these functions make use of the fast integer division library 'libdivide'. The primeSieve function is based on the segmented sieve of Eratosthenes implementation by Kim Walisch. It is also efficient for large numbers by using the cache friendly improvements originally developed by Tomás Oliveira. Finally, there is a prime counting function that implements Legendre's formula based on the work of Kim Walisch.

Maintained by Joseph Wood. Last updated 1 month ago.

combinations combinatorics factorization number-theory parallel permutation prime-factorizations primesieve gmp cpp

14.3 match 45 stars 10.04 score 153 scripts 12 dependents

mihaiconstantin

parabar:Progress Bar for Parallel Tasks

A simple interface in the form of R6 classes for executing tasks in parallel, tracking their progress, and displaying accurate progress bars.

Maintained by Mihai Constantin. Last updated 3 months ago.

parallel-computing progress-bar

18.7 match 19 stars 7.53 score 20 scripts 5 dependents

mllg

batchtools:Tools for Computation on Batch Systems

As a successor of the packages 'BatchJobs' and 'BatchExperiments', this package provides a parallel implementation of the Map function for high performance computing systems managed by schedulers 'IBM Spectrum LSF' (<https://www.ibm.com/products/hpc-workload-management>), 'OpenLava' (<https://www.openlava.org/>), 'Univa Grid Engine'/'Oracle Grid Engine' (<https://www.univa.com/>), 'Slurm' (<https://slurm.schedmd.com/>), 'TORQUE/PBS' (<https://adaptivecomputing.com/cherry-services/torque-resource-manager/>), or 'Docker Swarm' (<https://docs.docker.com/engine/swarm/>). Multicore and socket modes allow parallelization on a local machine, and multiple machines can be hooked up via SSH to create a makeshift cluster. Moreover, the package provides an abstraction mechanism to define large-scale computer experiments in a well-organized and reproducible way.

Maintained by Michel Lang. Last updated 2 years ago.

batchexperiments batchjobs docker-swarm high-performance-computing hpc hpc-clusters lsf openlava parallel-computing reproducibility sge slurm torque

12.0 match 175 stars 11.39 score 772 scripts 14 dependents

bioc

metapod:Meta-Analyses on P-Values of Differential Analyses

Implements a variety of methods for combining p-values in differential analyses of genome-scale datasets. Functions can combine p-values across different tests in the same analysis (e.g., genomic windows in ChIP-seq, exons in RNA-seq) or for corresponding tests across separate analyses (e.g., replicated comparisons, effect of different treatment conditions). Support is provided for handling log-transformed input p-values, missing values and weighting where appropriate.

Maintained by Aaron Lun. Last updated 3 months ago.

multiplecomparison differentialpeakcalling cpp

18.0 match 7.44 score 17 scripts 46 dependents

bioc

SeqArray:Data Management of Large-Scale Whole-Genome Sequence Variant Calls

Data management of large-scale whole-genome sequencing variant calls with thousands of individuals: genotypic data (e.g., SNVs, indels and structural variation calls) and annotations in SeqArray GDS files are stored in an array-oriented and compressed manner, with efficient data access using the R programming language.

Maintained by Xiuwen Zheng. Last updated 8 days ago.

infrastructure datarepresentation sequencing genetics bioinformatics gds-format snp snv wes wgs cpp

11.0 match 45 stars 12.08 score 1.1k scripts 9 dependents

ohdsi

ParallelLogger:Support for Parallel Computation, Logging, and Function Automation

Support for parallel computation with progress bar, and option to stop or proceed on errors. Also provides logging to console and disk, and the logging persists in the parallel threads. Additional functions support function call automation with delayed execution (e.g. for executing functions in parallel).

Maintained by Martijn Schuemie. Last updated 6 months ago.

hades

13.5 match 12 stars 9.18 score 87 scripts 11 dependents

bioc

RnBeads:RnBeads

RnBeads facilitates comprehensive analysis of various types of DNA methylation data at the genome scale.

Maintained by Fabian Mueller. Last updated 1 month ago.

dnamethylation methylationarray methylseq epigenetics qualitycontrol preprocessing batcheffect differentialmethylation sequencing cpgisland immunooncology twochannel dataimport

18.0 match 6.85 score 169 scripts 1 dependent

tyee001

VGAM:Vector Generalized Linear and Additive Models

An implementation of about 6 major classes of statistical regression models. The central algorithm is Fisher scoring and iteratively reweighted least squares. At the heart of this package are the vector generalized linear and additive model (VGLM/VGAM) classes. VGLMs can be loosely thought of as multivariate GLMs. VGAMs are data-driven VGLMs that use smoothing. The book "Vector Generalized Linear and Additive Models: With an Implementation in R" (Yee, 2015) <DOI:10.1007/978-1-4939-2818-7> gives details of the statistical framework and the package. Currently only fixed-effects models are implemented. Many (100+) models and distributions are estimated by maximum likelihood estimation (MLE) or penalized MLE. The other classes are RR-VGLMs (reduced-rank VGLMs), quadratic RR-VGLMs, doubly constrained RR-VGLMs, reduced-rank VGAMs, and RCIMs (row-column interaction models); these classes perform constrained and unconstrained quadratic ordination (CQO/UQO) models in ecology, as well as constrained additive ordination (CAO). Hauck-Donner effect detection is implemented. Note that these functions are subject to change; see the NEWS and ChangeLog files for the latest changes.

Maintained by Thomas Yee. Last updated 1 month ago.

fortran

11.3 match 10 stars 10.67 score 3.6k scripts 169 dependents

florianhartig

BayesianTools:General-Purpose MCMC and SMC Samplers and Tools for Bayesian Statistics

General-purpose MCMC and SMC samplers, as well as plots and diagnostic functions for Bayesian statistics, with a particular focus on calibrating complex system models. Implemented samplers include various Metropolis MCMC variants (including adaptive and/or delayed rejection MH), the T-walk, two differential evolution MCMCs, two DREAM MCMCs, and a sequential Monte Carlo (SMC) particle filter.

Maintained by Florian Hartig. Last updated 1 year ago.

bayes ecological-models mcmc optimization smc systems-biology cpp

11.7 match 122 stars 10.17 score 580 scripts 5 dependents

tlverse

delayed:A Framework for Parallelizing Dependent Tasks

Mechanisms to parallelize dependent tasks in a manner that optimizes the compute resources available. It provides access to "delayed" computations, which may be parallelized using futures. It is, to an extent, a facsimile of the 'Dask' library (<https://www.dask.org/>), for the 'Python' language.

Maintained by Jeremy Coyle. Last updated 11 months ago.

parallel-computing

16.4 match 23 stars 7.03 score 39 scripts 8 dependents

renozao

doRNG:Generic Reproducible Parallel Backend for 'foreach' Loops

Provides functions to perform reproducible parallel foreach loops, using independent random streams as generated by L'Ecuyer's combined multiple-recursive generator [L'Ecuyer (1999), <DOI:10.1287/opre.47.1.159>]. It makes it easy to convert standard '%dopar%' loops into fully reproducible loops, independently of the number of workers, the task scheduling strategy, or the chosen parallel environment and associated foreach backend.

Maintained by Renaud Gaujoux. Last updated 2 years ago.

9.0 match 20 stars 12.63 score 4.3k scripts 183 dependents

ropensci

chopin:Computation of Spatial Data by Hierarchical and Objective Partitioning of Inputs for Parallel Processing

Geospatial data computation is parallelized by grid, hierarchy, or raster files. Based on future and mirai parallel backends, terra and sf functions as well as convenience functions in the package can be distributed over multiple threads. The simplest way of parallelizing generic geospatial computation is to go from the `par_pad_*` functions to the `par_grid`, `par_hierarchy`, or `par_multirasters` functions. Virtually any function accepting classes from the terra or sf packages can be used in the three parallelization functions. A common raster-vector overlay operation is provided as a function `extract_at`, which uses exactextractr, with options for kernel weights for summarizing raster values at vector geometries. Other convenience functions for vector-vector operations, including simple areal interpolation (`summarize_aw`) and summation of exponentially decaying weights (`summarize_sedc`), are also provided.

Maintained by Insang Song. Last updated 13 days ago.

18.2 match 16 stars 6.11 score 23 scripts

kvnkuang

pbmcapply:Tracking the Progress of Mc*pply with Progress Bar

A lightweight package that helps you track and visualize the progress of parallel versions of vectorized R functions (mc*apply). Parallelization (mc.cores > 1) works only on *nix (Linux, Unix such as macOS) systems, because Windows lacks the fork() functionality that mc*apply requires.

Maintained by Kevin Kuang. Last updated 3 years ago.

parallelization progress-bar

10.8 match 44 stars 10.28 score 972 scripts 65 dependents

boennecd

parglm:Parallel GLM

Provides a parallel estimation method for generalized linear models without compiling with a multithreaded LAPACK or BLAS.

Maintained by Benjamin Christoffersen. Last updated 3 years ago.

generalized-linear-models parallel-computing openblas cpp

15.4 match 11 stars 6.41 score 39 scripts 4 dependents

acguidoum

Sim.DiffProc:Simulation of Diffusion Processes

It provides users with a wide range of tools to simulate, estimate, analyze, and visualize the dynamics of stochastic differential systems in both Itô and Stratonovich forms. Statistical analysis of SDEs with parallel Monte Carlo and moment-equation methods <doi:10.18637/jss.v096.i02>. It enables researchers in different domains to use these equations to model practical problems in financial and actuarial modeling and other areas of application, e.g., modeling and simulation of the first passage time problem in shallow water using the attractive center (Boukhetala K, 1996) ISBN:1-56252-342-2.

Maintained by Arsalane Chouaib Guidoum. Last updated 1 year ago.

dynamic-system moment-equations monte-carlo-simulation parallel-computing stochastic-calculus stochastic-differential-equation transition-density

12.7 match 13 stars 7.69 score 86 scripts 4 dependents

miraisolutions

rTRNG:Advanced and Parallel Random Number Generation via 'TRNG'

Embeds sources and headers from Tina's Random Number Generator ('TRNG') C++ library. Exposes some functionality for easier access, testing and benchmarking into R. Provides examples of how to use parallel RNG with 'RcppParallel'. The methods and techniques behind 'TRNG' are illustrated in the package vignettes and examples. Full documentation is available in Bauke (2021) <https://github.com/rabauke/trng4/blob/v4.23.1/doc/trng.pdf>.

Maintained by Riccardo Porreca. Last updated 1 year ago.

hpc parallel rcpp trng cpp

17.1 match 19 stars 5.63 score 15 scripts

avi-kenny

SimEngine:A Modular Framework for Statistical Simulations in R

An open-source R package for structuring, maintaining, running, and debugging statistical simulations on both local and cluster-based computing environments. See full documentation at <https://avi-kenny.github.io/SimEngine/>.

Maintained by Avi Kenny. Last updated 21 days ago.

12.8 match 12 stars 7.18 score 50 scripts

revolutionanalytics

doParallel:Foreach Parallel Adaptor for the 'parallel' Package

Provides a parallel backend for the %dopar% function using the parallel package.

Maintained by Folashade Daniel. Last updated 3 years ago.

6.2 match 5 stars 14.56 score 50k scripts 1.4k dependents

cloudyr

googleComputeEngineR:R Interface with Google Compute Engine

Interact with the 'Google Compute Engine' API in R. Lets you create, start and stop instances in the 'Google Cloud'. Support for preconfigured instances, with templates for common R needs.

Maintained by Mark Edmondson. Last updated 3 years ago.

api cloud-computing cloudyr google-cloud googleauthr launching-virtual-machines

9.2 match 152 stars 9.78 score 235 scripts

tidyverse

purrr:Functional Programming Tools

A complete and consistent functional programming toolkit for R.

Maintained by Hadley Wickham. Last updated 1 month ago.

functional-programming

4.0 match 1.3k stars 22.12 score 59k scripts 6.9k dependents

calvagone

campsis:Generic PK/PD Simulation Platform CAMPSIS

A generic, easy-to-use and intuitive pharmacokinetic/pharmacodynamic (PK/PD) simulation platform based on R packages 'rxode2' and 'mrgsolve'. CAMPSIS provides an abstraction layer over the underlying processes of writing a PK/PD model, assembling a custom dataset and running a simulation. CAMPSIS has a strong dependency on the R package 'campsismod', which allows reading and writing a model from/to files and adapting it further on the fly in the R environment. Package 'campsis' allows the user to assemble a dataset in an intuitive manner. Once the user’s dataset is ready, the package is in charge of preparing the simulation, calling 'rxode2' or 'mrgsolve' (at the user's choice) and returning the results, for the given model, dataset and desired simulation settings.

Maintained by Nicolas Luyckx. Last updated 1 month ago.

11.6 match 8 stars 7.52 score 93 scripts

bioc

Spectra:Spectra Infrastructure for Mass Spectrometry Data

The Spectra package defines an efficient infrastructure for storing and handling mass spectrometry spectra and functionality to subset, process, visualize and compare spectra data. It provides different implementations (backends) to store mass spectrometry data. These comprise backends tuned for fast data access and processing and backends for very large data sets ensuring a small memory footprint.

Maintained by RforMassSpectrometry Package Maintainer. Last updated 8 days ago.

infrastructure proteomics massspectrometry metabolomics bioconductor hacktoberfest mass-spectrometry

6.7 match 41 stars 13.01 score 254 scripts 35 dependents

rcppcore

RcppParallel:Parallel Programming Tools for 'Rcpp'

High level functions for parallel programming with 'Rcpp'. For example, the 'parallelFor()' function can be used to convert the work of a standard serial "for" loop into a parallel one and the 'parallelReduce()' function can be used for accumulating aggregate or other values.

Maintained by Kevin Ushey. Last updated 2 months ago.

onetbb cpp

5.8 match 173 stars 14.89 score 215 scripts 790 dependents

privefl

bigsnpr:Analysis of Massive SNP Arrays

Easy-to-use, efficient, flexible and scalable tools for analyzing massive SNP arrays. Privé et al. (2018) <doi:10.1093/bioinformatics/bty185>.

Maintained by Florian Privé. Last updated 9 days ago.

big-data bioinformatics memory-mapped-file parallel-computing polygenic-scores population-structure-inference snp-data statistical-methods openblas zlib cpp openmp

7.5 match 200 stars 11.44 score 1.5k scripts 3 dependents

bnosac

udpipe:Tokenization, Parts of Speech Tagging, Lemmatization and Dependency Parsing with the 'UDPipe' 'NLP' Toolkit

This natural language processing toolkit provides language-agnostic 'tokenization', 'parts of speech tagging', 'lemmatization' and 'dependency parsing' of raw text. Next to text parsing, the package also allows you to train annotation models based on data of 'treebanks' in 'CoNLL-U' format as provided at <https://universaldependencies.org/format.html>. The techniques are explained in detail in the paper: 'Tokenizing, POS Tagging, Lemmatizing and Parsing UD 2.0 with UDPipe', available at <doi:10.18653/v1/K17-3009>. The toolkit also contains functionalities for commonly used data manipulations on texts which are enriched with the output of the parser. These include functionalities and algorithms for collocations, token co-occurrence, document term matrix handling, term frequency inverse document frequency calculations, information retrieval metrics (Okapi BM25), handling of multi-word expressions, keyword detection (Rapid Automatic Keyword Extraction, noun phrase extraction, syntactical patterns), sentiment scoring and semantic similarity analysis.

Maintained by Jan Wijffels. Last updated 2 years ago.

conll dependency-parser lemmatization natural-language-processing nlp pos-tagging r-pkg rcpp text-mining tokenizer udpipe cpp

7.2 match 215 stars 11.83 score 1.2k scripts 9 dependents

mrc-ide

hipercow:High Performance Computing

Set up cluster environments and jobs. Moo.

Maintained by Rich FitzJohn. Last updated 10 days ago.

12.4 match 1 star 6.53 score 45 scripts 1 dependent

norskregnesentral

shapr:Prediction Explanation with Dependence-Aware Shapley Values

Complex machine learning models are often hard to interpret. However, in many situations it is crucial to understand and explain why a model made a specific prediction. Shapley values are the only method for such prediction explanation with a solid theoretical foundation. Previously known methods for estimating the Shapley values do, however, assume feature independence. This package implements methods that account for any feature dependence, and thereby produce more accurate estimates of the true Shapley values. An accompanying 'Python' wrapper ('shaprpy') is available through the GitHub repository.

Maintained by Martin Jullum. Last updated 1 month ago.

explainable-ai explainable-ml rcpp rcpparmadillo shapley openblas cpp openmp

7.5 match 153 stars 10.62 score 175 scripts 1 dependent

privefl

bigstatsr:Statistical Tools for Filebacked Big Matrices

Easy-to-use, efficient, flexible and scalable statistical tools. Package bigstatsr provides and uses Filebacked Big Matrices via memory-mapping. It provides for instance matrix operations, Principal Component Analysis, sparse linear supervised models, utility functions and more <doi:10.1093/bioinformatics/bty185>.

Maintained by Florian Privé. Last updated 6 months ago.

big-data large-matrices memory-mapped-file parallel-computing statistical-methods openblas cpp openmp

7.5 match 180 stars 10.59 score 394 scripts 16 dependents

grvanderploeg

parafac4microbiome:Parallel Factor Analysis Modelling of Longitudinal Microbiome Data

Creation and selection of PARAllel FACtor Analysis (PARAFAC) models of longitudinal microbiome data. You can import your own data with our import functions or use one of the example datasets to create your own PARAFAC models. Selection of the optimal number of components can be done using assessModelQuality() and assessModelStability(). The selected model can then be plotted using plotPARAFACmodel(). The Parallel Factor Analysis method was originally described by Carroll and Chang (1970) <doi:10.1007/BF02310791> and Harshman (1970) <https://www.psychology.uwo.ca/faculty/harshman/wpppfac0.pdf>.

Maintained by Geert Roelof van der Ploeg. Last updated 19 days ago.

dimensionality-reduction microbiome microbiome-data multiway multiway-algorithms parallel-factor-analysis

12.3 match 6 stars 6.31 score 13 scripts

fifis

pnd:Parallel Numerical Derivatives, Gradients, Jacobians, and Hessians of Arbitrary Accuracy Order

Numerical derivatives through finite-difference approximations can be calculated using the 'pnd' package with parallel capabilities and optimal step-size selection to improve accuracy. These functions facilitate efficient computation of derivatives, gradients, Jacobians, and Hessians, allowing for more evaluations to reduce the mathematical and machine errors. Designed for compatibility with the 'numDeriv' package, which has not received updates in several years, it introduces advanced features such as computing derivatives of arbitrary order, improving the accuracy of Hessian approximations by avoiding repeated differencing, and parallelising slow functions on Windows, Mac, and Linux.

Maintained by Andreï Victorovitch Kostyrka. Last updated 4 days ago.

14.6 match 1 star 5.22 score 5 scripts

mdsteiner

EFAtools:Fast and Flexible Implementations of Exploratory Factor Analysis Tools

Provides functions to perform exploratory factor analysis (EFA) procedures and compare their solutions. The goal is to provide state-of-the-art factor retention methods and a high degree of flexibility in the EFA procedures. This way, for example, implementations from R 'psych' and 'SPSS' can be compared. Moreover, functions for Schmid-Leiman transformation and the computation of omegas are provided. To speed up the analyses, some of the iterative procedures, like principal axis factoring (PAF), are implemented in C++.

Maintained by Markus Steiner. Last updated 3 months ago.

openblas cpp openmp

11.3 match 10 stars 6.57 score 83 scripts 1 dependent

pbreheny

biglasso:Extending Lasso Model Fitting to Big Data

Extend lasso and elastic-net model fitting for large data sets that cannot be loaded into memory. Designed to be more memory- and computation-efficient than existing lasso-fitting packages like 'glmnet' and 'ncvreg', thus allowing the user to analyze big data with limited RAM <doi:10.32614/RJ-2021-001>.

Maintained by Patrick Breheny. Last updated 10 days ago.

bigdata lasso out-of-core parallel-computing cpp openmp

7.5 match 113 stars 9.84 score 74 scripts 1 dependent

florafauna

optimParallel:Parallel Version of the L-BFGS-B Optimization Method

Provides a parallel version of the L-BFGS-B method of optim(). The main function of the package is optimParallel(), which has the same usage and output as optim(). Using optimParallel() can significantly reduce the optimization time.

Maintained by Florian Gerber. Last updated 4 years ago.

8.0 match 9 stars 9.19 score 157 scripts 91 dependents

bioc

peakPantheR:Peak Picking and Annotation of High Resolution Experiments

An automated pipeline for the detection, integration and reporting of predefined features across a large number of mass spectrometry data files. It enables the real time annotation of multiple compounds in a single file, or the parallel annotation of multiple compounds in multiple files. A graphical user interface as well as command line functions will assist in assessing the quality of annotation and update fitting parameters until a satisfactory result is obtained.

Maintained by Arnaud Wolfer. Last updated 5 months ago.

massspectrometry metabolomics peakdetection feature-detection mass-spectrometry

10.7 match 12 stars 6.82 score 23 scripts

r-lidar

lidR:Airborne LiDAR Data Manipulation and Visualization for Forestry Applications

Airborne LiDAR (Light Detection and Ranging) interface for data manipulation and visualization. Read/write 'las' and 'laz' files, computation of metrics in area based approach, point filtering, artificial point reduction, classification from geographic data, normalization, individual tree segmentation and other manipulations.

Maintained by Jean-Romain Roussel. Last updated 1 month ago.

als forestry las laz lidar point-cloud remote-sensing openblas cpp openmp

5.0 match 623 stars 14.47 score 844 scripts 8 dependents

matloff

partools:Tools for the 'Parallel' Package

Miscellaneous utilities for parallelizing large computations. Alternative to MapReduce. File splitting and distributed operations such as sort and aggregate. "Software Alchemy" method for parallelizing most statistical methods, presented in N. Matloff, Parallel Computation for Data Science, Chapman and Hall, 2015. Includes a debugging aid.

Maintained by Norm Matloff. Last updated 2 years ago.

9.6 match 40 stars 7.51 score 30 scripts 3 dependents

ropensci

stplanr:Sustainable Transport Planning

Tools for transport planning with an emphasis on spatial transport data and non-motorized modes. The package was originally developed to support the 'Propensity to Cycle Tool', a publicly available strategic cycle network planning tool (Lovelace et al. 2017) <doi:10.5198/jtlu.2016.862>, but has since been extended to support public transport routing and accessibility analysis (Moreno-Monroy et al. 2017) <doi:10.1016/j.jtrangeo.2017.08.012> and routing with locally hosted routing engines such as 'OSRM' (Lowans et al. 2023) <doi:10.1016/j.enconman.2023.117337>. The main functions are for creating and manipulating geographic "desire lines" from origin-destination (OD) data (building on the 'od' package); calculating routes on the transport network locally and via interfaces to routing services such as <https://cyclestreets.net/> (Desjardins et al. 2021) <doi:10.1007/s11116-021-10197-1>; and calculating route segment attributes such as bearing. The package implements the 'travel flow aggregation' method described in Morgan and Lovelace (2020) <doi:10.1177/2399808320942779> and the 'OD jittering' method described in Lovelace et al. (2022) <doi:10.32866/001c.33873>. Further information on the package's aim and scope can be found in the vignettes and in a paper in the R Journal (Lovelace and Ellison 2018) <doi:10.32614/RJ-2018-053>, and in a paper outlining the landscape of open source software for geographic methods in transport planning (Lovelace, 2021) <doi:10.1007/s10109-020-00342-2>.

Maintained by Robin Lovelace. Last updated 7 months ago.

cycle cycling desire-lines origin-destination peer-reviewed pubic-transport route-network routes routing spatial transport transport-planning transportation walking

5.7 match 427 stars 12.31 score 684 scripts 3 dependents

dazzimonti

KrigInv:Kriging-Based Inversion for Deterministic and Noisy Computer Experiments

Criteria and algorithms for sequentially estimating level sets of a multivariate numerical function, possibly observed with noise.

Maintained by Dario Azzimonti. Last updated 3 years ago.

24.8 match 2.81 score 54 scripts 4 dependents

cran

epiR:Tools for the Analysis of Epidemiological Data

Tools for the analysis of epidemiological and surveillance data. Contains functions for directly and indirectly adjusting measures of disease frequency, quantifying measures of association on the basis of single or multiple strata of count data presented in a contingency table, computation of confidence intervals around incidence risk and incidence rate estimates and sample size calculations for cross-sectional, case-control and cohort studies. Surveillance tools include functions to calculate an appropriate sample size for 1- and 2-stage representative freedom surveys, functions to estimate surveillance system sensitivity and functions to support scenario tree modelling analyses.

Maintained by Mark Stevenson. Last updated 1 month ago.

8.5 match 10 stars 8.18 score 10 dependents

mrc-ide

dust:Iterate Multiple Realisations of Stochastic Models

An engine for simulation of stochastic models. Includes support for running stochastic models in parallel, either with shared or varying parameters. Simulations are run efficiently in compiled code and can be run with a fraction of simulated states returned to R, allowing control over memory usage. Support is provided for building a bootstrap particle filter for performing Sequential Monte Carlo (e.g., Gordon et al. 1993 <doi:10.1049/ip-f-2.1993.0015>). The core of the simulation engine is the 'xoshiro256**' algorithm (Blackman and Vigna <arXiv:1805.01407>), and the package is further described in FitzJohn et al. 2021 <doi:10.12688/wellcomeopenres.16466.2>.

Maintained by Rich FitzJohn. Last updated 5 months ago.

cpp openmp

8.9 match 18 stars 7.84 score 60 scripts 3 dependents

philchalmers

SimDesign:Structure for Organizing Monte Carlo Simulation Designs

Provides tools to safely and efficiently organize and execute Monte Carlo simulation experiments in R. The package controls the structure and back-end of Monte Carlo simulation experiments by utilizing a generate-analyse-summarise workflow. The workflow safeguards against common simulation coding issues, such as automatically re-simulating non-convergent results, prevents inadvertently overwriting simulation files, catches error and warning messages during execution, implicitly supports parallel processing with high-quality random number generation, and provides tools for managing high-performance computing (HPC) array jobs submitted to schedulers such as SLURM. For a pedagogical introduction to the package see Sigal and Chalmers (2016) <doi:10.1080/10691898.2016.1246953>. For a more in-depth overview of the package and its design philosophy see Chalmers and Adkins (2020) <doi:10.20982/tqmp.16.4.p248>.

Maintained by Phil Chalmers. Last updated 4 days ago.

monte-carlo-simulation simulation simulation-framework

5.2 match 62 stars 13.35 score 253 scripts 46 dependents

moderndive

moderndive:Tidyverse-Friendly Introductory Linear Regression

Datasets and wrapper functions for tidyverse-friendly introductory linear regression, used in "Statistical Inference via Data Science: A ModernDive into R and the Tidyverse" available at <https://moderndive.com/>.

Maintained by Albert Y. Kim. Last updated 3 months ago.

6.1 match 88 stars 11.35 score 1.8k scripts

rsparapa

BART:Bayesian Additive Regression Trees

Bayesian Additive Regression Trees (BART) provide flexible nonparametric modeling of covariates for continuous, binary, categorical and time-to-event outcomes. For more information see Sparapani, Spanbauer and McCulloch <doi:10.18637/jss.v097.i01>.

Maintained by Rodney Sparapani. Last updated 9 months ago.

cpp openmp

8.6 match 14 stars 7.96 score 474 scripts 10 dependents

thomasp85

ggraph:An Implementation of Grammar of Graphics for Graphs and Networks

The grammar of graphics as implemented in ggplot2 is a poor fit for graph and network visualizations due to its reliance on tabular data input. ggraph is an extension of the ggplot2 API tailored to graph visualizations and provides the same flexible approach to building up plots layer by layer.

Maintained by Thomas Lin Pedersen. Last updated 1 year ago.

ggplot-extension ggplot2 graph-visualization network-visualization visualization cpp

4.0 match 1.1k stars 16.96 score 9.2k scripts 111 dependents

detlew

PowerTOST:Power and Sample Size for (Bio)Equivalence Studies

Contains functions to calculate power and sample size for various study designs used in bioequivalence studies. Use known.designs() to see the designs supported. Power and sample size can be obtained based on different methods, amongst them prominently the TOST procedure (two one-sided t-tests). See README and NEWS for further information.

Maintained by Detlew Labes. Last updated 12 months ago.

7.0 match 20 stars 9.61 score 112 scripts 4 dependents

asardaes

dtwclust:Time Series Clustering Along with Optimizations for the Dynamic Time Warping Distance

Time series clustering along with optimized techniques related to the Dynamic Time Warping distance and its corresponding lower bounds. Implementations of partitional, hierarchical, fuzzy, k-Shape and TADPole clustering are available. Functionality can be easily extended with custom distance measures and centroid definitions. Implementations of DTW barycenter averaging, a distance based on global alignment kernels, and the soft-DTW distance and centroid routines are also provided. All included distance functions have custom loops optimized for the calculation of cross-distance matrices, including parallelization support. Several cluster validity indices are included.

Maintained by Alexis Sarda. Last updated 8 months ago.

clustering dtw time-series openblas cpp

5.3 match 261 stars 12.39 score 406 scripts 14 dependents

merck

simtrial:Clinical Trial Simulation

Provides some basic routines for simulating a clinical trial. The primary intent is to provide some tools to generate trial simulations for trials with time to event outcomes. Piecewise exponential failure rates and piecewise constant enrollment rates are the underlying mechanism used to simulate a broad range of scenarios such as those presented in Lin et al. (2020) <doi:10.1080/19466315.2019.1697738>. However, the basic generation of data is done using pipes to allow maximum flexibility for users to meet different needs.

Maintained by Yujie Zhao. Last updated 1 day ago.

cpp

7.2 match 21 stars 9.16 score 52 scripts

nanxstats

protr:Generating Various Numerical Representation Schemes for Protein Sequences

Comprehensive toolkit for generating various numerical features of protein sequences described in Xiao et al. (2015) <DOI:10.1093/bioinformatics/btv042>. For full functionality, the software 'ncbi-blast+' is needed, see <https://blast.ncbi.nlm.nih.gov/doc/blast-help/downloadblastdata.html> for more information.

Maintained by Nan Xiao. Last updated 6 months ago.

bioinformatics feature-engineering feature-extraction machine-learning peptides protein-sequences sequence-analysis

6.5 match 52 stars 10.02 score 173 scripts 3 dependents

mlr-org

rush:Rapid Parallel and Distributed Computing

Parallel computing with a network of local and remote workers. Fast exchange of results between the workers through a 'Redis' database. Key features include task queues, local caching, and sophisticated error handling.

Maintained by Marc Becker. Last updated 4 months ago.

mlr3 parallel-computing

12.9 match 11 stars 4.94 score 5 scripts

till-tietz

parsel:Parallel Dynamic Web-Scraping Using 'RSelenium'

A system to increase the efficiency of dynamic web-scraping with 'RSelenium' by leveraging parallel processing. You provide a function wrapper for your 'RSelenium' scraping routine with a set of inputs, and 'parsel' runs it in several browser instances. Chunked input processing as well as error catching and logging ensure seamless execution and minimal data loss, even when unforeseen 'RSelenium' errors occur. You can additionally build safe scraping functions with minimal coding by utilizing constructor functions that act as wrappers around 'RSelenium' methods.

Maintained by Till Tietz. Last updated 1 year ago.

parallel rselenium web-scraping

16.4 match 15 stars 3.88 score 8 scripts

bioc

SNPRelate:Parallel Computing Toolset for Relatedness and Principal Component Analysis of SNP Data

Genome-wide association studies (GWAS) are widely used to investigate the genetic basis of diseases and traits, but they pose many computational challenges. We developed the R package SNPRelate to provide a binary format for single-nucleotide polymorphism (SNP) data in GWAS utilizing CoreArray Genomic Data Structure (GDS) data files. The GDS format offers efficient operations specifically designed for integers with two bits, since a SNP could occupy only two bits. SNPRelate is also designed to accelerate two key computations on SNP data using parallel computing for multi-core symmetric multiprocessing computer architectures: Principal Component Analysis (PCA) and relatedness analysis using Identity-By-Descent measures. The SNP GDS format is also used by the GWASTools package with the support of S4 classes and generic functions. The extended GDS format is implemented in the SeqArray package to support the storage of single nucleotide variations (SNVs), insertion/deletion polymorphism (indel) and structural variation calls in whole-genome and whole-exome variant data.

Maintained by Xiuwen Zheng. Last updated 5 months ago.

infrastructure genetics statisticalmethod principalcomponent bioinformatics gds-format pca simd snp openblas cpp

5.0 match 104 stars 12.69 score 1.6k scripts 18 dependents

eagerai

fastai:Interface to 'fastai'

The 'fastai' <https://docs.fast.ai/index.html> library simplifies training fast and accurate neural networks using modern best practices. It is based on research into deep learning best practices undertaken at 'fast.ai', including 'out of the box' support for vision, text, tabular, audio, time series, and collaborative filtering models.

Maintained by Turgut Abdullayev. Last updated 11 months ago.

audio collaborative-filtering darknet darknet-image-classification fastai medical object-detection tabular text vision

6.6 match 118 stars 9.40 score 76 scripts

vivianephilipps

marqLevAlg:A Parallelized General-Purpose Optimization Based on Marquardt-Levenberg Algorithm

This algorithm provides a numerical solution to the problem of unconstrained local minimization (or maximization). It is particularly suited for complex problems and more efficient than the Gauss-Newton-like algorithm when starting from points very far from the final minimum (or maximum). Each iteration is parallelized and convergence relies on a stringent stopping criterion based on the first and second derivatives. See Philipps et al, 2021 <doi:10.32614/RJ-2021-089>.

Maintained by Viviane Philipps. Last updated 1 year ago.

fortran

9.3 match 7 stars 6.52 score 12 scripts 10 dependents

giuseppec

iml:Interpretable Machine Learning

Interpretability methods to analyze the behavior and predictions of any machine learning model. Implemented methods are: Feature importance described by Fisher et al. (2018) <doi:10.48550/arxiv.1801.01489>, accumulated local effects plots described by Apley (2018) <doi:10.48550/arxiv.1612.08468>, partial dependence plots described by Friedman (2001) <www.jstor.org/stable/2699986>, individual conditional expectation ('ice') plots described by Goldstein et al. (2013) <doi:10.1080/10618600.2014.907095>, local models (variant of 'lime') described by Ribeiro et al. (2016) <doi:10.48550/arXiv.1602.04938>, the Shapley Value described by Strumbelj et al. (2014) <doi:10.1007/s10115-013-0679-x>, feature interactions described by Friedman et al. <doi:10.1214/07-AOAS148> and tree surrogate models.

Maintained by Giuseppe Casalicchio. Last updated 19 days ago.

4.6 match 494 stars 12.86 score 642 scripts 4 dependents

amices

mice:Multivariate Imputation by Chained Equations

Multiple imputation using Fully Conditional Specification (FCS) implemented by the MICE algorithm as described in Van Buuren and Groothuis-Oudshoorn (2011) <doi:10.18637/jss.v045.i03>. Each variable has its own imputation model. Built-in imputation models are provided for continuous data (predictive mean matching, normal), binary data (logistic regression), unordered categorical data (polytomous logistic regression) and ordered categorical data (proportional odds). MICE can also impute continuous two-level data (normal model, pan, second-level variables). Passive imputation can be used to maintain consistency between variables. Various diagnostic plots are available to inspect the quality of the imputations.

Maintained by Stef van Buuren. Last updated 5 days ago.

chained-equations fcs imputation mice missing-data missing-values multiple-imputation multivariate-data cpp

3.6 match 462 stars 16.50 score 10k scripts 154 dependents

earthlab

rslurm:Submit R Calculations to a 'Slurm' Cluster

Functions that simplify submitting R scripts to a 'Slurm' workload manager, in part by automating the division of embarrassingly parallel calculations across cluster nodes.

Maintained by Erick Verleye. Last updated 2 years ago.

slurm

7.1 match 54 stars 8.29 score 303 scripts 1 dependent

raicheg

nFactors:Parallel Analysis and Other Non Graphical Solutions to the Cattell Scree Test

Indices, heuristics and strategies to help determine the number of factors/components to retain: 1. Acceleration factor (af with or without Parallel Analysis); 2. Optimal Coordinates (noc with or without Parallel Analysis); 3. Parallel analysis (components, factors and bootstrap); 4. lambda > mean(lambda) (Kaiser, CFA and related); 5. Cattell-Nelson-Gorsuch (CNG); 6. Zoski and Jurs multiple regression (b, t and p); 7. Zoski and Jurs standard error of the regression coefficient (sescree); 8. Nelson R2; 9. Bartlett khi-2; 10. Anderson khi-2; 11. Lawley khi-2 and 12. Bentler-Yuan khi-2.

Maintained by Gilles Raiche. Last updated 2 years ago.

10.6 match 5.46 score 498 scripts 4 dependents

ocbe-uio

BayesMallows:Bayesian Preference Learning with the Mallows Rank Model

An implementation of the Bayesian version of the Mallows rank model (Vitelli et al., Journal of Machine Learning Research, 2018 <https://jmlr.org/papers/v18/15-481.html>; Crispino et al., Annals of Applied Statistics, 2019 <doi:10.1214/18-AOAS1203>; Sorensen et al., R Journal, 2020 <doi:10.32614/RJ-2020-026>; Stein, PhD Thesis, 2023 <https://eprints.lancs.ac.uk/id/eprint/195759>). Both Metropolis-Hastings and sequential Monte Carlo algorithms for estimating the models are available. Cayley, footrule, Hamming, Kendall, Spearman, and Ulam distances are supported in the models. The rank data to be analyzed can be in the form of complete rankings, top-k rankings, partially missing rankings, as well as consistent and inconsistent pairwise preferences. Several functions for plotting and studying the posterior distributions of parameters are provided. The package also provides functions for estimating the partition function (normalizing constant) of the Mallows rank model, both with the importance sampling algorithm of Vitelli et al. and asymptotic approximation with the IPFP algorithm (Mukherjee, Annals of Statistics, 2016 <doi:10.1214/15-AOS1389>).

Maintained by Oystein Sorensen. Last updated 1 month ago.

mallows-model openblas cpp openmp

7.3 match 21 stars 7.91 score 36 scripts 1 dependent

mucollective

multiverse:Create 'multiverse analysis' in R

Implement 'multiverse' style analyses (Steegen S., Tuerlinckx F., Gelman A., Vanpaemel W., 2016) <doi:10.1177/1745691616658637> to show the robustness of statistical inference. 'Multiverse analysis' is a philosophy of statistical reporting where paper authors report the outcomes of many different statistical analyses in order to show how fragile or robust their findings are. The 'multiverse' package (Sarma A., Kale A., Moon M., Taback N., Chevalier F., Hullman J., Kay M., 2021) <doi:10.31219/osf.io/yfbwm> allows users to concisely and flexibly implement 'multiverse-style' analyses, which involve declaring alternate ways of performing an analysis step, in R and R Notebooks.

Maintained by Abhraneel Sarma. Last updated 4 months ago.

6.8 match 62 stars 8.37 score 42 scripts

dselivanov

text2vec:Modern Text Mining Framework for R

Fast and memory-friendly tools for text vectorization, topic modeling (LDA, LSA), word embeddings (GloVe), similarities. This package provides a source-agnostic streaming API, which allows researchers to perform analysis of collections of documents which are larger than available RAM. All core functions are parallelized to benefit from multicore machines.

Maintained by Dmitriy Selivanov. Last updated 7 months ago.

glove latent-dirichlet-allocation natural-language-processing text-mining topic-modeling vectorization word-embeddings word2vec cpp

4.2 match 860 stars 13.48 score 1.3k scripts 23 dependents

munterfi

eRTG3D:Empirically Informed Random Trajectory Generation in 3-D

Creates realistic random trajectories in a 3-D space between two given fixed points, so-called conditional empirical random walks (CERWs). The trajectory generation is based on empirical distribution functions extracted from observed trajectories (training data) and thus reflects the geometrical movement characteristics of the mover. A digital elevation model (DEM), representing the Earth's surface, and a background layer of probabilities (e.g. food sources, uplift potential, waterbodies, etc.) can be used to influence the trajectories. Unterfinger M (2018). "3-D Trajectory Simulation in Movement Ecology: Conditional Empirical Random Walk". Master's thesis, University of Zurich. <https://www.geo.uzh.ch/dam/jcr:6194e41e-055c-4635-9807-53c5a54a3be7/MasterThesis_Unterfinger_2018.pdf>. Technitis G, Weibel R, Kranstauber B, Safi K (2016). "An algorithm for empirically informed random trajectory generation between two endpoints". GIScience 2016: Ninth International Conference on Geographic Information Science, 9, online. <doi:10.5167/uzh-130652>.

Maintained by Merlin Unterfinger. Last updated 3 years ago.

3d birds conditional-empirical-random-walk gliding-and-soaring machine-learning movement-ecology random-trajectory-generator random-walk simulation trajectory-generation

9.8 match 6 stars 5.71 score 19 scripts

vlarmet

cppRouting:Algorithms for Routing and Solving the Traffic Assignment Problem

Calculation of distances, shortest paths and isochrones on weighted graphs using several variants of Dijkstra's algorithm. Proposed algorithms are unidirectional Dijkstra (Dijkstra, E. W. (1959) <doi:10.1007/BF01386390>), bidirectional Dijkstra (Goldberg, Andrew & Fonseca F. Werneck, Renato (2005) <https://archive.siam.org/meetings/alenex05/papers/03agoldberg.pdf>), A* search (P. E. Hart, N. J. Nilsson and B. Raphael (1968) <doi:10.1109/TSSC.1968.300136>), new bidirectional A* (Pijls & Post (2009) <https://repub.eur.nl/pub/16100/ei2009-10.pdf>), Contraction hierarchies (R. Geisberger, P. Sanders, D. Schultes and D. Delling (2008) <doi:10.1007/978-3-540-68552-4_24>), PHAST (D. Delling, A. Goldberg, A. Nowatzyk, R. Werneck (2011) <doi:10.1016/j.jpdc.2012.02.007>). Algorithms for solving the traffic assignment problem are All-or-Nothing assignment, Method of Successive Averages, Frank-Wolfe algorithm (M. Fukushima (1984) <doi:10.1016/0191-2615(84)90029-8>), Conjugate and Bi-Conjugate Frank-Wolfe algorithms (M. Mitradjieva, P. O. Lindberg (2012) <doi:10.1287/trsc.1120.0409>), Algorithm-B (R. B. Dial (2006) <doi:10.1016/j.trb.2006.02.008>).

Maintained by Vincent Larmet. Last updated 9 months ago.

algorithm algorithm-b bidirectional-a-star-algorithm c-plus-plus contraction-hierarchies dijkstra-algorithm distance frank-wolfe isochrones parallel-computing rcpp shortest-paths traffic-assignment cpp

7.5 match 112 stars 7.42 score 39 scripts 4 dependents

uscbiostats

fmcmc:A friendly MCMC framework

Provides a friendly (flexible) Markov Chain Monte Carlo (MCMC) framework for implementing the Metropolis-Hastings algorithm in a modular way, allowing users to specify an automatic convergence checker, personalized transition kernels, and out-of-the-box multiple MCMC chains using parallel computing. Most of the methods implemented in this package can be found in Brooks et al. (2011, ISBN 9781420079425). Among the methods included, we have: Haario (2001) <doi:10.1007/s11222-011-9269-5> Adaptive Metropolis, Vihola (2012) <doi:10.1007/s11222-011-9269-5> Robust Adaptive Metropolis, and Thawornwattana et al. (2018) <doi:10.1214/17-BA1084> Mirror transition kernels.

Maintained by George Vega Yon. Last updated 1 year ago.

adaptive bayesian-inference markov-chain-monte-carlo mcmc metropolis-hastings parallel-computing

8.0 match 16 stars 6.79 score 86 scripts 1 dependent

elvanceyhan

pcds:Proximity Catch Digraphs and Their Applications

Contains the functions for construction and visualization of various families of the proximity catch digraphs (PCDs) (see Ceyhan (2005) ISBN:978-3-639-19063-2), and for computing the graph invariants for testing the patterns of segregation and association against complete spatial randomness (CSR) or uniformity in one, two and three dimensional cases. The package also has tools for generating points from these spatial patterns. The graph invariants used in testing spatial point data are the domination number (Ceyhan (2011) <doi:10.1080/03610921003597211>) and arc density (Ceyhan et al. (2006) <doi:10.1016/j.csda.2005.03.002>; Ceyhan et al. (2007) <doi:10.1002/cjs.5550350106>). The PCD families considered are Arc-Slice PCDs, Proportional-Edge PCDs, and Central Similarity PCDs.

Maintained by Elvan Ceyhan. Last updated 2 years ago.

9.3 match 5.80 score 21 scripts 2 dependents

myles-lewis

mcprogress:Progress Bars and Messages for Parallel Processes

Tools for monitoring progress during parallel processing. Lightweight package which acts as a wrapper around mclapply() and adds a progress bar to it in 'RStudio' or 'Linux' environments. Simply replace your original call to mclapply() with pmclapply(). A progress bar can also be displayed during parallelisation via the 'foreach' package. Also included are functions to safely print messages (including error messages) from within parallelised code, which can be useful for debugging parallelised R code.

Maintained by Myles Lewis. Last updated 6 months ago.

11.9 match 1 star 4.48 score 2 scripts 1 dependent

alexeckert

parallelDist:Parallel Distance Matrix Computation using Multiple Threads

A fast parallelized alternative to R's native 'dist' function to calculate distance matrices for continuous, binary, and multi-dimensional input matrices, which supports a broad variety of 41 predefined distance functions from the 'stats', 'proxy' and 'dtw' R packages, as well as user-defined functions written in C++. For ease of use, the 'parDist' function extends the signature of the 'dist' function and uses the same parameter naming conventions as distance methods of existing R packages. The package is mainly implemented in C++ and leverages the 'RcppParallel' package to parallelize the distance computations with the help of the 'TinyThread' library. Furthermore, the 'Armadillo' linear algebra library is used for optimized matrix operations during distance calculations. The curiously recurring template pattern (CRTP) technique is applied to avoid virtual functions, which improves the Dynamic Time Warping calculations while the implementation stays flexible enough to support different DTW step patterns and normalization methods.

Maintained by Alexander Eckert. Last updated 3 years ago.

data-science distance-computations matrices openblas cpp

5.3 match 51 stars 9.92 score 432 scripts 14 dependents

sales-lab

parmigene:Parallel Mutual Information Estimation for Gene Network Reconstruction

Parallel estimation of the mutual information based on entropy estimates from k-nearest neighbors distances and algorithms for the reconstruction of gene regulatory networks (Sales et al, 2011 <doi:10.1093/bioinformatics/btr274>).

Maintained by Gabriele Sales. Last updated 5 months ago.

openmp

8.7 match 5 stars 6.06 score 38 scripts 4 dependents

sahakyanlab

ROptimus:A Parallel General-Purpose Adaptive Optimisation Engine

A general-purpose optimisation engine that supports i) Monte Carlo optimisation with Metropolis criterion [Metropolis et al. (1953) <doi:10.1063/1.1699114>, Hastings (1970) <doi:10.1093/biomet/57.1.97>] and Acceptance Ratio Simulated Annealing [Kirkpatrick et al. (1983) <doi:10.1126/science.220.4598.671>, Černý (1985) <doi:10.1007/BF00940812>] on multiple cores, and ii) Acceptance Ratio Replica Exchange Monte Carlo Optimisation. In each case, the system pseudo-temperature is dynamically adjusted such that the observed acceptance ratio is kept near to the desired (fixed or changing) acceptance ratio.

Maintained by Aleksandr B. Sahakyan. Last updated 2 years ago.

optimisation parallel

13.9 match 4 stars 3.78 score 2 scripts

glmmtmb

glmmTMB:Generalized Linear Mixed Models using Template Model Builder

Fit linear and generalized linear mixed models with various extensions, including zero-inflation. The models are fitted using maximum likelihood estimation via 'TMB' (Template Model Builder). Random effects are assumed to be Gaussian on the scale of the linear predictor and are integrated out using the Laplace approximation. Gradients are calculated using automatic differentiation.

Maintained by Mollie Brooks. Last updated 10 days ago.

cpp openmp

3.1 match 312 stars 16.77 score 3.7k scripts 24 dependents

heike

ggpcp:Parallel Coordinate Plots in the 'ggplot2' Framework

Modern Parallel Coordinate Plots were introduced in the 1980s as a way to visualize arbitrarily many numeric variables. This Grammar of Graphics implementation also incorporates categorical variables into the plots in a principled manner. By separating the data managing part from the visual rendering, we give full access to the users while keeping the number of parameters manageably low.

Maintained by Heike Hofmann. Last updated 3 days ago.

12.9 match 1 star 4.04 score 73 scripts

weecology

LDATS:Latent Dirichlet Allocation Coupled with Time Series Analyses

Combines Latent Dirichlet Allocation (LDA) and Bayesian multinomial time series methods in a two-stage analysis to quantify dynamics in high-dimensional temporal data. LDA decomposes multivariate data into lower-dimension latent groupings, whose relative proportions are modeled using generalized Bayesian time series models that include abrupt changepoints and smooth dynamics. The methods are described in Blei et al. (2003) <doi:10.1162/jmlr.2003.3.4-5.993>, Western and Kleykamp (2004) <doi:10.1093/pan/mph023>, Venables and Ripley (2002, ISBN-13:978-0387954578), and Christensen et al. (2018) <doi:10.1002/ecy.2373>.

Maintained by Juniper L. Simonis. Last updated 5 years ago.

changepoint lda parallel-tempering portal softmax

7.5 match 25 stars 6.93 score 45 scripts

bhklab

mRMRe:Parallelized Minimum Redundancy, Maximum Relevance (mRMR)

Computes mutual information matrices from continuous, categorical and survival variables, as well as feature selection with minimum redundancy, maximum relevance (mRMR) and a new ensemble mRMR technique. Published in De Jay et al. (2013) <doi:10.1093/bioinformatics/btt383>.

Maintained by Benjamin Haibe-Kains. Last updated 4 years ago.

cpp openmp

5.7 match 19 stars 8.95 score 105 scripts 2 dependents

stochastictree

stochtree:Stochastic Tree Ensembles (XBART and BART) for Supervised Learning and Causal Inference

Flexible stochastic tree ensemble software. Robust implementations of Bayesian Additive Regression Trees (BART) Chipman, George, McCulloch (2010) <doi:10.1214/09-AOAS285> for supervised learning and Bayesian Causal Forests (BCF) Hahn, Murray, Carvalho (2020) <doi:10.1214/19-BA1195> for causal inference. Enables model serialization and parallel sampling and provides a low-level interface for custom stochastic forest samplers.

Maintained by Drew Herren. Last updated 16 days ago.

bart bayesian-machine-learning bayesian-methods decision-trees gradient-boosted-trees machine-learning probabilistic-models tree-ensembles cpp

5.9 match 20 stars 8.52 score 40 scripts

statistikat

VIM:Visualization and Imputation of Missing Values

New tools for the visualization of missing and/or imputed values are introduced, which can be used for exploring the data and the structure of the missing and/or imputed values. Depending on this structure of the missing values, the corresponding methods may help to identify the mechanism generating the missing values and allow exploration of the data including missing values. In addition, the quality of imputation can be visually explored using various univariate, bivariate, multiple and multivariate plot methods. A graphical user interface available in the separate package VIMGUI allows easy handling of the implemented plot methods.

Maintained by Matthias Templ. Last updated 7 months ago.

hotdeck imputation-methods model-predictions visualization cpp

3.5 match 85 stars 14.44 score 2.6k scripts 19 dependents

doi-usgs

EGRET:Exploration and Graphics for RivEr Trends

Statistics and graphics for streamflow history, water quality trends, and the statistical modeling algorithm: Weighted Regressions on Time, Discharge, and Season (WRTDS).

Maintained by Laura DeCicco. Last updated 4 months ago.

usgs water-quality water-quality-data

4.7 match 90 stars 10.72 score 362 scripts 1 dependents

ropensci

canaper:Categorical Analysis of Neo- And Paleo-Endemism

Provides functions to analyze the spatial distribution of biodiversity, in particular categorical analysis of neo- and paleo-endemism (CANAPE) as described in Mishler et al (2014) <doi:10.1038/ncomms5473>. 'canaper' conducts statistical tests to determine the types of endemism that occur in a study area while accounting for the evolutionary relationships of species.

Maintained by Joel H. Nitta. Last updated 2 years ago.

biodiversity canape

9.3 match 7 stars 5.38 score 23 scripts

masurp

specr:Conducting and Visualizing Specification Curve Analyses

Provides utilities for conducting specification curve analyses (Simonsohn, Simmons & Nelson, 2020, <doi: 10.1038/s41562-020-0912-z>) or multiverse analyses (Steegen, Tuerlinckx, Gelman & Vanpaemel, 2016, <doi: 10.1177/1745691616658637>), including functions to set up, run, evaluate, and plot all specifications.

Maintained by Philipp K. Masur. Last updated 10 months ago.

multiverse specification-curve

6.2 match 68 stars 8.02 score 85 scripts

mihaiconstantin

doParabar:'foreach' Parallel Adapter for 'parabar' Backends

Provides a 'foreach' parallel adapter for 'parabar' backends. This package offers a minimal implementation of the '%dopar%' operator, enabling users to run 'foreach' loops in parallel, leveraging the parallel and progress-tracking capabilities of the 'parabar' package. Learn more about 'parabar' and 'doParabar' at <https://parabar.mihaiconstantin.com>.

Maintained by Mihai Constantin. Last updated 2 months ago.

foreach parallel-computing

13.5 match 1 stars 3.65 score 5 scripts 1 dependents

azure

azuremlsdk:Interface to the 'Azure Machine Learning' 'SDK'

Interface to the 'Azure Machine Learning' Software Development Kit ('SDK'). Data scientists can use the 'SDK' to train, deploy, automate, and manage machine learning models on the 'Azure Machine Learning' service. To learn more about 'Azure Machine Learning' visit the website: <https://docs.microsoft.com/en-us/azure/machine-learning/service/overview-what-is-azure-ml>.

Maintained by Diondra Peck. Last updated 3 years ago.

amlcompute azure azure-machine-learning azureml dsi machine-learning rstudio sdk-r

5.5 match 106 stars 8.91 score 221 scripts

r-lidar

lasR:Fast and Pipeable Airborne LiDAR Data Tools

Fast and pipeable airborne lidar processing tools. Read/write 'las' and 'laz' files, computation of metrics in area based approach, point filtering, normalization, individual tree segmentation and other manipulations in a powerful and versatile processing chain.

Maintained by Jean-Romain Roussel. Last updated 20 days ago.

gdal cpp openmp

7.3 match 17 stars 6.76 score 26 scripts

skstavroglou

patterncausality:Pattern Causality Algorithm

A comprehensive package for detecting and analyzing causal relationships in complex systems using pattern-based approaches. Key features include state space reconstruction, pattern identification, and causality strength evaluation.

Maintained by Hui Wang. Last updated 28 days ago.

8.0 match 1 stars 6.08 score 20 scripts

monty-se

PINstimation:Estimation of the Probability of Informed Trading

A comprehensive bundle of utilities for the estimation of probability of informed trading models: original PIN in Easley and O'Hara (1992) and Easley et al. (1996); Multilayer PIN (MPIN) in Ersan (2016); Adjusted PIN (AdjPIN) in Duarte and Young (2009); and volume-synchronized PIN (VPIN) in Easley et al. (2011, 2012). Implementations of various estimation methods suggested in the literature are included. Additional features comprise posterior probabilities, an implementation of an expectation-maximization (EM) algorithm, and PIN decomposition into layers and into bad/good components. Versatile data simulation tools and trade classification algorithms are among the supplementary utilities. The package provides fast, compact, and precise utilities to tackle the sophisticated, error-prone, and time-consuming estimation procedure of informed trading, using only the raw trade-level data.

Maintained by Montasser Ghachem. Last updated 5 months ago.

clustering-analysis expectation-maximisation-algorithm hierarchical-clustering information-asymmetry market-microstructure maximum-likelihood-estimation mixture-distributions poisson-distribution

7.5 match 36 stars 6.48 score 14 scripts

wuqian77

TrialSize:R Functions for Chapter 3,4,6,7,9,10,11,12,14,15 of Sample Size Calculation in Clinical Research

Functions and Examples in Sample Size Calculation in Clinical Research.

Maintained by Vicky Qian Wu. Last updated 4 months ago.

cpp

12.8 match 3 stars 3.78 score 95 scripts 1 dependents

cbhurley

PairViz:Visualization using Graph Traversal

Improving graphics by ameliorating order effects, using Eulerian tours and Hamiltonian decompositions of graphs. References for the methods presented here are C.B. Hurley and R.W. Oldford (2010) <doi:10.1198/jcgs.2010.09136> and C.B. Hurley and R.W. Oldford (2011) <doi:10.1007/s00180-011-0229-5>.

Maintained by Catherine Hurley. Last updated 3 years ago.

8.4 match 1 stars 5.75 score 42 scripts 3 dependents

beccadaniel

doMC:Foreach Parallel Adaptor for 'parallel'

Provides a parallel backend for the %dopar% function using the multicore functionality of the parallel package.

Maintained by Folashade Daniel. Last updated 3 years ago.

6.5 match 7.39 score 10k scripts 2 dependents

microsoft

finnts:Microsoft Finance Time Series Forecasting Framework

Automated time series forecasting developed by Microsoft Finance. The Microsoft Finance Time Series Forecasting Framework, aka Finn, can be used to forecast any component of the income statement, balance sheet, or any other area of interest by finance. Finn can be used to forecast any numerical quantity over time. While it can be applied outside of the finance domain, Finn was built to meet the needs of financial analysts to better forecast their businesses within a company, and has a lot of built-in features that are specific to the needs of financial forecasters. Happy forecasting!

Maintained by Mike Tokic. Last updated 24 days ago.

business data-science feature-selection finance finnts forecasting machine-learning microsoft time-series

5.1 match 193 stars 9.45 score 39 scripts

azure

AzureRMR:Interface to 'Azure Resource Manager'

A lightweight but powerful R interface to the 'Azure Resource Manager' REST API. The package exposes a comprehensive class framework and related tools for creating, updating and deleting 'Azure' resource groups, resources and templates. While 'AzureRMR' can be used to manage any 'Azure' service, it can also be extended by other packages to provide extra functionality for specific services. Part of the 'AzureR' family of packages.

Maintained by Hong Ooi. Last updated 1 year ago.

azure azure-resource-manager azure-sdk-r cloud

4.8 match 20 stars 9.94 score 51 scripts 12 dependents

mschubert

clustermq:Evaluate Function Calls on HPC Schedulers (LSF, SGE, SLURM, PBS/Torque)

Evaluate arbitrary function calls using workers on HPC schedulers in a single line of code. All processing is done on the network without accessing the file system. Remote schedulers are supported via SSH.

Maintained by Michael Schubert. Last updated 23 days ago.

cluster high-performance-computing lsf sge slurm ssh zeromq3 cpp

4.6 match 149 stars 10.23 score 253 scripts

timelyportfolio

parcoords:'Htmlwidget' for 'd3.js' Parallel Coordinates Chart

Create interactive parallel coordinates charts with this 'htmlwidget' wrapper for 'd3.js' <https://github.com/BigFatDog/parcoords-es> {'parallel-coordinates'}.

Maintained by Kenton Russell. Last updated 3 years ago.

8.3 match 77 stars 5.73 score 141 scripts

rdatatable

data.table:Extension of `data.frame`

Fast aggregation of large data (e.g. 100GB in RAM), fast ordered joins, fast add/modify/delete of columns by group using no copies at all, list columns, friendly and fast character-separated-value read/write. Offers a natural and flexible syntax, for faster development.

Maintained by Tyson Barrett. Last updated 17 hours ago.

2.0 match 3.7k stars 23.53 score 230k scripts 4.6k dependents

mrc-ide

drjacoby:Flexible Markov Chain Monte Carlo via Reparameterization

drjacoby is an R package for performing Bayesian inference via Markov chain Monte Carlo (MCMC). In addition to being highly flexible, it implements some advanced techniques that can improve mixing in tricky situations.

Maintained by Bob Verity. Last updated 9 months ago.

cpp

7.5 match 12 stars 6.27 score 77 scripts

dipterix

dipsaus:A Dipping Sauce for Data Analysis and Visualizations

Works as an "add-on" to packages like 'shiny', 'future', as well as 'rlang', and provides utility functions. Just like dipping sauce adding flavors to potato chips or pita bread, 'dipsaus' for data analysis and visualizations adds handy functions and enhancements to popular packages. The goal is to provide simple solutions that are frequently asked for online, such as how to synchronize 'shiny' inputs without freezing the app, or how to get memory size on 'Linux' or 'MacOS' systems. The enhancements roughly fall into these four categories: 1. 'shiny' input widgets; 2. high-performance computing using the 'future' package; 3. functions to modify R calls and convert among numbers, strings, and other objects; 4. utility functions to get system information such as CPU chip-set, memory limit, etc.

Maintained by Zhengjia Wang. Last updated 4 days ago.

cpp

5.9 match 13 stars 7.90 score 85 scripts 3 dependents

lindbrook

cholera:Amend, Augment and Aid Analysis of John Snow's Cholera Map

Amends errors, augments data and aids analysis of John Snow's map of the 1854 London cholera outbreak.

Maintained by lindbrook. Last updated 16 hours ago.

cholera data-visualization datasets epidemiology john-snow public-health triangulation-delaunay voronoi voronoi-polygons

5.0 match 136 stars 9.33 score 95 scripts

rstudio

keras3:R Interface to 'Keras'

Interface to 'Keras' <https://keras.io>, a high-level neural networks API. 'Keras' was developed with a focus on enabling fast experimentation, supports both convolution based networks and recurrent networks (as well as combinations of the two), and runs seamlessly on both CPU and GPU devices.

Maintained by Tomasz Kalinowski. Last updated 3 days ago.

3.4 match 845 stars 13.57 score 264 scripts 2 dependents

fstpackage

fst:Lightning Fast Serialization of Data Frames

Multithreaded serialization of compressed data frames using the 'fst' format. The 'fst' format allows for full random access of stored data and a wide range of compression settings using the LZ4 and ZSTD compressors.

Maintained by Mark Klik. Last updated 6 months ago.

compression data-frame data-storage cpp

3.5 match 624 stars 13.14 score 1.9k scripts 55 dependents

bioc

GenomicPlot:Plot profiles of next generation sequencing data in genomic features

Visualization of next generation sequencing (NGS) data is essential for interpreting high-throughput genomics experiment results. 'GenomicPlot' facilitates plotting of NGS data in various formats (bam, bed, wig and bigwig); both coverage and enrichment over input can be computed and displayed with respect to genomic features (such as UTR, CDS, enhancer), and user defined genomic loci or regions. Statistical tests on signal intensity within user defined regions of interest can be performed and represented as boxplots or bar graphs. Parallel processing is used to speed up computation on multicore platforms. In addition to genomic plots, which are suitable for displaying coverage of genomic DNA (such as ChIPseq data), metagenomic (without introns) plots can also be made for RNAseq or CLIPseq data.

Maintained by Shuye Pu. Last updated 1 month ago.

alternativesplicing chipseq coverage geneexpression rnaseq sequencing software transcription visualization annotation

8.2 match 3 stars 5.62 score 4 scripts

bioc

MAPFX:MAssively Parallel Flow cytometry Xplorer (MAPFX): A Toolbox for Analysing Data from the Massively-Parallel Cytometry Experiments

MAPFX is an end-to-end toolbox that pre-processes the raw data from MPC experiments (e.g., BioLegend's LEGENDScreen and BD Lyoplates assays), and further imputes the ‘missing’ infinity markers in the wells without those measurements. The pipeline starts by performing background correction on raw intensities to remove the noise from electronic baseline restoration and fluorescence compensation by adapting a normal-exponential convolution model. Unwanted technical variation, from sources such as well effects, is then removed using a log-normal model with plate, column, and row factors, after which infinity markers are imputed using the informative backbone markers as predictors. The completed dataset can then be used for clustering and other statistical analyses. Additionally, MAPFX can be used to normalise data from FFC assays.

Maintained by Hsiao-Chi Liao. Last updated 5 months ago.

software flowcytometry cellbasedassays singlecell proteomics clustering

10.1 match 1 stars 4.54 score

joshuawlambert

rFSA:Feasible Solution Algorithm for Finding Best Subsets and Interactions

Assists in statistical model building to find optimal and semi-optimal higher order interactions and best subsets. Uses the lm(), glm(), and other R functions to fit models generated from a feasible solution algorithm. Discussed in Subset Selection in Regression, A Miller (2002). Applied and explained for least median of squares in Hawkins (1993) <doi:10.1016/0167-9473(93)90246-P>. The feasible solution algorithm comes up with model forms of a specific type that can have fixed variables, higher order interactions and their lower order terms.

Maintained by Joshua Lambert. Last updated 4 years ago.

algorithm fsa interaction models parallel statistical statistics subset

11.0 match 7 stars 4.15 score 20 scripts

bioc

rhdf5:R Interface to HDF5

This package provides an interface between HDF5 and R. HDF5's main features are the ability to store and access very large and/or complex datasets and a wide variety of metadata on mass storage (disk) through a completely portable file format. The rhdf5 package is thus suited for the exchange of large and/or complex datasets between R and other software packages, and for letting R applications work on datasets that are larger than the available RAM.

Maintained by Mike Smith. Last updated 2 months ago.

infrastructure dataimport hdf5 rhdf5 openssl curl zlib cpp

2.8 match 62 stars 15.93 score 4.2k scripts 232 dependents

grasia

knnp:Time Series Prediction using K-Nearest Neighbors Algorithm (Parallel)

Two main functionalities are provided. One of them is predicting values with k-nearest neighbors algorithm and the other is optimizing the parameters k and d of the algorithm. These are carried out in parallel using multiple threads.

Maintained by Daniel Bastarrica Lacalle. Last updated 5 years ago.

knearest-neighbor-algorithm parallel time-series-forecasting

16.6 match 1 stars 2.70 score 8 scripts

mkoohafkan

reval:Argument Table Generation for Sensitivity Analysis

Simplified scenario testing and sensitivity analysis, redesigned to use packages 'future' and 'furrr'. Provides functions for generating function argument sets using one-factor-at-a-time (OFAT) and (sampled) permutations.

Maintained by Michael C Koohafkan. Last updated 6 months ago.

parallel sensitivity-analysis

11.0 match 2 stars 4.04 score 11 scripts

daqana

dqrng:Fast Pseudo Random Number Generators

Several fast random number generators are provided as C++ header only libraries: The PCG family by O'Neill (2014 <https://www.cs.hmc.edu/tr/hmc-cs-2014-0905.pdf>) as well as the Xoroshiro / Xoshiro family by Blackman and Vigna (2021 <doi:10.1145/3460772>). In addition fast functions for generating random numbers according to a uniform, normal and exponential distribution are included. The latter two use the Ziggurat algorithm originally proposed by Marsaglia and Tsang (2000, <doi:10.18637/jss.v005.i08>). The fast sampling methods support unweighted sampling both with and without replacement. These functions are exported to R and as a C++ interface and are enabled for use with the default 64 bit generator from the PCG family, Xoroshiro128+/++/** and Xoshiro256+/++/** as well as the 64 bit version of the 20 rounds Threefry engine (Salmon et al., 2011, <doi:10.1145/2063384.2063405>) as provided by the package 'sitmo'.

Maintained by Ralf Stubner. Last updated 6 months ago.

random random-distributions random-generation random-sampling rng cpp

3.3 match 42 stars 13.12 score 188 scripts 183 dependents

urbananalyst

dodgr:Distances on Directed Graphs

Distances on dual-weighted directed graphs using priority-queue shortest paths (Padgham (2019) <doi:10.32866/6945>). Weighted directed graphs have weights from A to B which may differ from those from B to A. Dual-weighted directed graphs have two sets of such weights. A canonical example is a street network to be used for routing in which routes are calculated by weighting distances according to the type of way and mode of transport, yet lengths of routes must be calculated from direct distances.

Maintained by Mark Padgham. Last updated 4 days ago.

distance openstreetmap router shortest-paths street-networks cpp

3.8 match 129 stars 11.53 score 229 scripts 4 dependents

ropensci

drake:A Pipeline Toolkit for Reproducible Computation at Scale

A general-purpose computational engine for data analysis, drake rebuilds intermediate data objects when their dependencies change, and it skips work when the results are already up to date. Not every execution starts from scratch, there is native support for parallel and distributed computing, and completed projects have tangible evidence that they are reproducible. Extensive documentation, from beginner-friendly tutorials to practical examples and more, is available at the reference website <https://docs.ropensci.org/drake/> and the online manual <https://books.ropensci.org/drake/>.

Maintained by William Michael Landau. Last updated 3 months ago.

data-science drake high-performance-computing makefile peer-reviewed pipeline reproducibility reproducible-research ropensci workflow

3.8 match 1.3k stars 11.49 score 1.7k scripts 1 dependents

snoweye

pbdMPI:R Interface to MPI for HPC Clusters (Programming with Big Data Project)

A simplified, efficient, interface to MPI for HPC clusters. It is a derivation and rethinking of the Rmpi package. pbdMPI embraces the prevalent parallel programming style on HPC clusters. Beyond the interface, a collection of functions for global work with distributed data and resource-independent RNG reproducibility is included. It is based on S4 classes and methods.

Maintained by Wei-Chen Chen. Last updated 6 months ago.

openmpi

6.0 match 2 stars 7.11 score 179 scripts 3 dependents

braverock

PortfolioAnalytics:Portfolio Analysis, Including Numerical Methods for Optimization of Portfolios

Portfolio optimization and analysis routines and graphics.

Maintained by Brian G. Peterson. Last updated 3 months ago.

3.7 match 81 stars 11.49 score 626 scripts 2 dependents

mrc-ide

monty:Monte Carlo Models

Experimental sources for the next generation of mcstate, now called 'monty', which will support much of the old mcstate functionality but new things like better parameter interfaces, Hamiltonian Monte Carlo, and other features.

Maintained by Rich FitzJohn. Last updated 1 month ago.

cpp

5.7 match 3 stars 7.52 score 29 scripts 3 dependents

xiaolei-lab

rMVP:Memory-Efficient, Visualize-Enhanced, Parallel-Accelerated GWAS Tool

A memory-efficient, visualize-enhanced, parallel-accelerated Genome-Wide Association Study (GWAS) tool. It can (1) effectively process large data, (2) rapidly evaluate population structure, (3) efficiently estimate variance components using several algorithms, (4) implement parallel-accelerated association tests of markers using three methods, (5) provide a globally efficient design for GWAS computation, (6) enhance visualization of related information. 'rMVP' contains three models: GLM (Alkes Price (2006) <DOI:10.1038/ng1847>), MLM (Jianming Yu (2006) <DOI:10.1038/ng1702>) and FarmCPU (Xiaolei Liu (2016) <doi:10.1371/journal.pgen.1005767>); variance components estimation methods EMMAX (Hyunmin Kang (2008) <DOI:10.1534/genetics.107.080101>), FaSTLMM (method: Christoph Lippert (2011) <DOI:10.1038/nmeth.1681>, R implementation from 'GAPIT2': You Tang and Xiaolei Liu (2016) <DOI:10.1371/journal.pone.0107684> and 'SUPER': Qishan Wang and Feng Tian (2014) <DOI:10.1371/journal.pone.0107684>), and HE regression (Xiang Zhou (2017) <DOI:10.1214/17-AOAS1052>).

Maintained by Xiaolei Liu. Last updated 2 months ago.

openblas cpp openmp

5.3 match 287 stars 8.06 score 38 scripts

mattmar

rasterdiv:Diversity Indices for Numerical Matrices

Provides methods to calculate diversity indices on numerical matrices based on information theory, as described in Rocchini, Marcantonio and Ricotta (2017) <doi:10.1016/j.ecolind.2016.07.039>, and Rocchini et al. (2021) <doi:10.1101/2021.01.23.427872>.

Maintained by Matteo Marcantonio. Last updated 18 days ago.

5.5 match 15 stars 7.65 score 44 scripts 1 dependents

henrikbengtsson

marshal:Framework to Marshal Objects to be Used in Another R Process

Some types of R objects can be used only in the R session in which they were created. If used as-is in another R process, such objects often result in an immediate error or in obscure and hard-to-troubleshoot outcomes. Because of this, they cannot be saved to file and re-used at a later time. They also cannot be exported to a worker in parallel processing. These objects are sometimes referred to as non-exportable or non-serializable objects. One solution to this problem is to use "marshalling" to encode the R object into an exportable representation that then can be used to re-create a copy of that object in another R process. This package provides a framework for marshalling and unmarshalling R objects such that they can be transferred using functions such as serialize() and unserialize() of base R.

Maintained by Henrik Bengtsson. Last updated 1 year ago.

marshalling parallel serialization

13.4 match 14 stars 3.10 score 18 scripts

futureverse

marshal:Framework to Marshal Objects to be Used in Another R Process

Some types of R objects can be used only in the R session in which they were created. If used as-is in another R process, such objects often result in an immediate error or in obscure and hard-to-troubleshoot outcomes. Because of this, they cannot be saved to file and re-used at a later time. They also cannot be exported to a worker in parallel processing. These objects are sometimes referred to as non-exportable or non-serializable objects. One solution to this problem is to use "marshalling" to encode the R object into an exportable representation that then can be used to re-create a copy of that object in another R process. This package provides a framework for marshalling and unmarshalling R objects such that they can be transferred using functions such as serialize() and unserialize() of base R.

Maintained by Henrik Bengtsson. Last updated 1 year ago.

marshalling parallel serialization

13.4 match 14 stars 3.10 score 18 scripts

privefl

bigparallelr:Easy Parallel Tools

Utility functions for easy parallelism in R. Includes some reexports from other packages, utility functions for splitting and parallelizing over blocks, and choosing and setting the number of cores used.

Maintained by Florian Privé. Last updated 5 months ago.

6.4 match 4 stars 6.44 score 76 scripts 19 dependents

bioc

HIBAG:HLA Genotype Imputation with Attribute Bagging

Imputes HLA classical alleles using GWAS SNP data, and it relies on a training set of HLA and SNP genotypes. HIBAG can be used by researchers with published parameter estimates instead of requiring access to large training sample datasets. It combines the concepts of attribute bagging, an ensemble classifier method, with haplotype inference for SNPs and HLA types. Attribute bagging is a technique which improves the accuracy and stability of classifier ensembles using bootstrap aggregating and random variable selection.

Maintained by Xiuwen Zheng. Last updated 4 months ago.

genetics statisticalmethod bioinformatics gpu hla imputation mhc snp cpp

5.0 match 30 stars 8.24 score 48 scripts

ggobi

GGally:Extension to 'ggplot2'

The R package 'ggplot2' is a plotting system based on the grammar of graphics. 'GGally' extends 'ggplot2' by adding several functions to reduce the complexity of combining geometric objects with transformed data. Some of these functions include a pairwise plot matrix, a two group pairwise plot matrix, a parallel coordinates plot, a survival plot, and several functions to plot networks.

Maintained by Barret Schloerke. Last updated 10 months ago.

2.5 match 597 stars 16.15 score 17k scripts 154 dependents

iiasa

ibis.iSDM:Modelling framework for integrated biodiversity distribution scenarios

Integrated framework for modelling the distribution of species and ecosystems in a suitability framing. This package allows the estimation of integrated species distribution models (iSDM) based on several sources of evidence and provided presence-only and presence-absence datasets. It makes heavy use of point-process models for estimating habitat suitability and allows spatial latent effects and priors to be included in the estimation. To do so, 'ibis.iSDM' supports a number of engines for Bayesian and more non-parametric machine learning estimation. Further, 'ibis.iSDM' is specifically customized to support spatial-temporal projections of habitat suitability into the future.

Maintained by Martin Jung. Last updated 4 months ago.

bayesian biodiversity integrated-framework poisson-process scenarios sdm spatial-grain spatial-predictions species-distribution-modelling

9.2 match 21 stars 4.36 score 12 scripts 1 dependents

aphp

heemod:Markov Models for Health Economic Evaluations

An implementation of the modelling and reporting features described in reference textbooks and guidelines (Briggs, Andrew, et al. Decision Modelling for Health Economic Evaluation. Oxford Univ. Press, 2011; Siebert, U. et al. State-Transition Modeling. Medical Decision Making 32, 690-700 (2012)): deterministic and probabilistic sensitivity analysis, heterogeneity analysis, time dependency on state-time and model-time (semi-Markov and non-homogeneous Markov models), etc.

Maintained by Kevin Zarca. Last updated 6 months ago.

4.5 match 15 stars 8.81 score 204 scripts

futureverse

future.tools:Tools for Working with Futures

Tools for Working with Futures.

Maintained by Henrik Bengtsson. Last updated 9 months ago.

parallel-computing parallel-programming

15.0 match 2 stars 2.60 score

bioc

sitePath:Phylogeny-based sequence clustering with site polymorphism

Using site polymorphism is one of the ways to cluster DNA/protein sequences but it is possible for the sequences with the same polymorphism on a single site to be genetically distant. This package is aimed at clustering sequences using site polymorphism and their corresponding phylogenetic trees. By considering their location on the tree, only the structurally adjacent sequences will be clustered. However, the adjacent sequences may not necessarily have the same polymorphism. So a branch-and-bound like algorithm is used to minimize the entropy representing the purity of site polymorphism of each cluster.

Maintained by Chengyang Ji. Last updated 5 months ago.

alignment multiplesequencealignment phylogenetics snp software mutation cpp

7.5 match 8 stars 5.20 score 9 scripts

2005m

kit:Data Manipulation Functions Implemented in C

Basic functions, implemented in C, for large data manipulation. Fast vectorised ifelse()/nested if()/switch() functions, psum()/pprod() functions equivalent to pmin()/pmax() plus others which are missing from base R. Most of these functions are callable at C level.

Maintained by Morgan Jacob. Last updated 6 months ago.

openmp

4.3 match 58 stars 9.11 score 92 scripts 5 dependents

martynplummer

rjags:Bayesian Graphical Models using MCMC

Interface to the JAGS MCMC library.

Maintained by Martyn Plummer. Last updated 7 months ago.

jags cpp

4.0 match 7 stars 9.48 score 4.0k scripts 165 dependents

funecology

fundiversity:Easy Computation of Functional Diversity Indices

Computes six functional diversity indices, namely Functional Divergence (FDiv), Functional Evenness (FEve), Functional Richness (FRic), Functional Richness intersections (FRic_intersect), Functional Dispersion (FDis), and Rao's entropy (Q) (reviewed in Villéger et al. 2008 <doi:10.1890/07-1206.1>). Provides efficient, modular, and parallel functions to compute functional diversity indices (Grenié & Gruson 2023 <doi:10.1111/ecog.06585>).

Maintained by Matthias Grenié. Last updated 8 months ago.

biodiversity biodiversity-indicators biodiversity-informatics functional-diversity functional-ecology functional-trait functional-traits trait trait-based traits

5.2 match 38 stars 7.34 score 38 scripts

johncoene

echarts4r:Create Interactive Graphs with 'Echarts JavaScript' Version 5

Easily create interactive charts by leveraging the 'Echarts Javascript' library which includes 36 chart types, themes, 'Shiny' proxies and animations.

Maintained by David Munoz Tord. Last updated 1 day ago.

echarts hacktoberfest htmlwidget htmlwidgets visualization

3.3 match 603 stars 11.45 score 1.3k scripts 11 dependents

heike

ggparallel:Variations of Parallel Coordinate Plots for Categorical Data

Create hammock plots, parallel sets, and common angle plots with 'ggplot2'.

Maintained by Heike Hofmann. Last updated 1 year ago.

7.1 match 41 stars 5.32 score 51 scripts

sapfluxnet

sapfluxnetr:Working with 'Sapfluxnet' Project Data

Access, modify, aggregate and plot data from the 'Sapfluxnet' project (<http://sapfluxnet.creaf.cat>), the first global database of sap flow measurements.

Maintained by Victor Granda. Last updated 2 years ago.

5.8 match 25 stars 6.57 score 49 scripts

r-spatial

RSAGA:SAGA Geoprocessing and Terrain Analysis

Provides access to geocomputing and terrain analysis functions of the geographical information system (GIS) 'SAGA' (System for Automated Geoscientific Analyses) from within R by running the command line version of SAGA. This package furthermore provides several R functions for handling ASCII grids, including a flexible framework for applying local functions (including predict methods of fitted models) and focal functions to multiple grids. SAGA GIS is available under GPL-2 / LGPL-2 licences from <https://sourceforge.net/projects/saga-gis/>.

Maintained by Alexander Brenning. Last updated 1 month ago.

4.3 match 23 stars 8.72 score 275 scripts

jwb133

smcfcs:Multiple Imputation of Covariates by Substantive Model Compatible Fully Conditional Specification

Implements multiple imputation of missing covariates by Substantive Model Compatible Fully Conditional Specification. This is a modification of the popular FCS/chained equations multiple imputation approach, and allows imputation of missing covariate values from models which are compatible with the user specified substantive model.

Maintained by Jonathan Bartlett. Last updated 15 hours ago.

4.0 match 11 stars 9.00 score 59 scripts 1 dependents

bcallaway11

did:Treatment Effects with Multiple Periods and Groups

The standard Difference-in-Differences (DID) setup involves two periods and two groups -- a treated group and an untreated group. Many applications of DID methods involve more than two periods and have individuals that are treated at different points in time. This package contains tools for computing average treatment effect parameters in Difference-in-Differences setups with more than two periods and with variation in treatment timing using the methods developed in Callaway and Sant'Anna (2021) <doi:10.1016/j.jeconom.2020.12.001>. The main parameters are group-time average treatment effects, which are the average treatment effect for a particular group at a particular time. These can be aggregated into a smaller number of treatment effect parameters, and the package deals with the cases where there is selective treatment timing, dynamic treatment effects, calendar time effects, or combinations of these. There are also functions for testing the Difference-in-Differences assumption, and plotting group-time average treatment effects.

Maintained by Brantly Callaway. Last updated 4 months ago.

3.0 match 327 stars 12.01 score 696 scripts 3 dependents

eddelbuettel

prrd:Parallel Runs of Reverse Depends

Reverse depends for a given package are queued such that multiple workers can run the reverse-dependency tests in parallel.

Maintained by Dirk Eddelbuettel. Last updated 28 days ago.

hacktoberfest reverse-dependencies

7.3 match 12 stars 4.95 score 2 scripts

brry

berryFunctions:Function Collection Related to Plotting and Hydrology

Draw horizontal histograms, color scattered points by 3rd dimension, enhance date- and log-axis plots, zoom in X11 graphics, trace errors and warnings, use the unit hydrograph in a linear storage cascade, convert lists to data.frames and arrays, fit multiple functions.

Maintained by Berry Boessenkool. Last updated 1 month ago.

3.8 match 13 stars 9.43 score 350 scripts 16 dependents

prioritizr

prioritizr:Systematic Conservation Prioritization in R

Systematic conservation prioritization using mixed integer linear programming (MILP). It provides a flexible interface for building and solving conservation planning problems. Once built, conservation planning problems can be solved using a variety of commercial and open-source exact algorithm solvers. By using exact algorithm solvers, solutions can be generated that are guaranteed to be optimal (or within a pre-specified optimality gap). Furthermore, conservation problems can be constructed to optimize the spatial allocation of different management actions or zones, meaning that conservation practitioners can identify solutions that benefit multiple stakeholders. To solve large-scale or complex conservation planning problems, users should install the Gurobi optimization software (available from <https://www.gurobi.com/>) and the 'gurobi' R package (see Gurobi Installation Guide vignette for details). Users can also install the IBM CPLEX software (<https://www.ibm.com/products/ilog-cplex-optimization-studio/cplex-optimizer>) and the 'cplexAPI' R package (available at <https://github.com/cran/cplexAPI>). Additionally, the 'rcbc' R package (available at <https://github.com/dirkschumacher/rcbc>) can be used to generate solutions using the CBC optimization software (<https://github.com/coin-or/Cbc>). For further details, see Hanson et al. (2025) <doi:10.1111/cobi.14376>.

Maintained by Richard Schuster. Last updated 10 days ago.

biodiversity conservation conservation-planner optimization prioritization solver spatial cpp

3.0 match 124 stars 11.82 score 584 scripts 2 dependents

teppeiyamamoto

mediation:Causal Mediation Analysis

We implement parametric and non parametric mediation analysis. This package performs the methods and suggestions in Imai, Keele and Yamamoto (2010) <DOI:10.1214/10-STS321>, Imai, Keele and Tingley (2010) <DOI:10.1037/a0020761>, Imai, Tingley and Yamamoto (2013) <DOI:10.1111/j.1467-985X.2012.01032.x>, Imai and Yamamoto (2013) <DOI:10.1093/pan/mps040> and Yamamoto (2013) <http://web.mit.edu/teppei/www/research/IVmediate.pdf>. In addition to the estimation of causal mediation effects, the software also allows researchers to conduct sensitivity analysis for certain parametric models.

Maintained by Teppei Yamamoto. Last updated 6 years ago.

3.4 match 10.48 score 896 scripts 11 dependents

cran

Rmpi:Interface (Wrapper) to MPI (Message-Passing Interface)

An interface (wrapper) to MPI. It also provides interactive R manager and worker environment.

Maintained by Hao Yu. Last updated 2 months ago.

openmpi

7.4 match 5 stars 4.76 score 5 dependents

bioc

matter:Out-of-core statistical computing and signal processing

Toolbox for larger-than-memory scientific computing and visualization, providing efficient out-of-core data structures using files or shared memory, for dense and sparse vectors, matrices, and arrays, with applications to nonuniformly sampled signals and images.

Maintained by Kylie A. Bemis. Last updated 3 months ago.

infrastructure datarepresentation dataimport dimensionreduction preprocessing cpp

3.7 match 57 stars 9.52 score 64 scripts 2 dependents

miracum

DQAstats:Core Functions for Data Quality Assessment

Perform data quality assessment ('DQA') of electronic health records ('EHR'). Publication: Kapsner et al. (2021) <doi:10.1055/s-0041-1733847>.

Maintained by Lorenz A. Kapsner. Last updated 12 days ago.

data-quality openjdk

5.3 match 9 stars 6.55 score 4 scripts 1 dependents

arturstat

TPmsm:Estimation of Transition Probabilities in Multistate Models

Estimation of transition probabilities for the illness-death model and/or the three-state progressive model.

Maintained by Artur Araujo. Last updated 1 year ago.

illness-death-model kaplan-meier monte-carlo-simulation multi-state-models openmp-parallelization survival-analysis transition-probabilities openblas openmp

7.5 match 1 stars 4.52 score 22 scripts 1 dependents

bioc

flowViz:Visualization for flow cytometry

Provides visualization tools for flow cytometry data.

Maintained by Mike Jiang. Last updated 5 months ago.

immunooncology infrastructure flowcytometry cellbasedassays visualization

4.6 match 7.44 score 231 scripts 12 dependents

rostats

RSurveillance:Design and Analysis of Disease Surveillance Activities

A range of functions for the design and analysis of disease surveillance activities. These functions were originally developed for animal health surveillance activities but can be equally applied to aquatic animal, wildlife, plant and human health surveillance activities. Utilities are included for sample size calculation and analysis of representative surveys for disease freedom, risk-based studies for disease freedom and for prevalence estimation. This package is based on Cameron A., Conraths F., Frohlich A., Schauer B., Schulz K., Sergeant E., Sonnenburg J., Staubach C. (2015). R package of functions for risk-based surveillance. Deliverable 6.24, WP 6 - Decision making tools for implementing risk-based surveillance, Grant Number no. 310806, RISKSUR (<https://www.fp7-risksur.eu/sites/default/files/documents/Deliverables/RISKSUR_%28310806%29_D6.24.pdf>). Many of the 'RSurveillance' functions are incorporated into the 'epitools' website: Sergeant, ESG, 2019. Epitools epidemiological calculators. Ausvet Pty Ltd. Available at: <http://epitools.ausvet.com.au>.

Maintained by Rohan Sadler. Last updated 5 years ago.

8.5 match 3.98 score 64 scripts

csgillespie

benchmarkme:Crowd Sourced System Benchmarks

Benchmark your CPU and compare against other CPUs. Also provides functions for obtaining system specifications, such as RAM, CPU type, and R version.

Maintained by Colin Gillespie. Last updated 10 months ago.

benchmark

3.8 match 41 stars 8.96 score 118 scripts 13 dependents

bioc

mpra:Analyze massively parallel reporter assays

Tools for data management, count preprocessing, and differential analysis in massively parallel reporter assays (MPRA).

Maintained by Leslie Myint. Last updated 5 months ago.

software generegulation sequencing functionalgenomics

5.3 match 6 stars 6.28 score 15 scripts

roliveros-ramos

calibrar:Automated Parameter Estimation for Complex Models

General optimisation and specific tools for the parameter estimation (i.e. calibration) of complex models, including stochastic ones. It implements generic functions that can be used for fitting any type of models, especially those with non-differentiable objective functions, with the same syntax as 'stats::optim()'. It supports multiple phases estimation (sequential parameter masking), constrained optimization (bounding box restrictions) and automatic parallel computation of numerical gradients. Some common maximum likelihood estimation methods and automated construction of the objective function from simulated model outputs is provided. See <https://roliveros-ramos.github.io/calibrar/> for more details.

Maintained by Ricardo Oliveros-Ramos. Last updated 18 days ago.

modeling optimization optimization-methods

5.5 match 7 stars 6.05 score 27 scripts

ropensci

targets:Dynamic Function-Oriented 'Make'-Like Declarative Pipelines

Pipeline tools coordinate the pieces of computationally demanding analysis projects. The 'targets' package is a 'Make'-like pipeline tool for statistics and data science in R. The package skips costly runtime for tasks that are already up to date, orchestrates the necessary computation with implicit parallel computing, and abstracts files as R objects. If all the current output matches the current upstream code and data, then the whole pipeline is up to date, and the results are more trustworthy than otherwise. The methodology in this package borrows from GNU 'Make' (2015, ISBN:978-9881443519) and 'drake' (2018, <doi:10.21105/joss.00550>).

Maintained by William Michael Landau. Last updated 14 hours ago.

data-science high-performance-computing make peer-reviewed pipeline r-targetopia reproducibility reproducible-research targets workflow

2.2 match 973 stars 15.20 score 4.6k scripts 22 dependents

vegandevs

vegan:Community Ecology Package

Ordination methods, diversity analysis and other functions for community and vegetation ecologists.

Maintained by Jari Oksanen. Last updated 15 days ago.

ecological-modelling ecology ordination fortran openblas

1.7 match 472 stars 19.41 score 15k scripts 440 dependents

r-spatial

spdep:Spatial Dependence: Weighting Schemes, Statistics

A collection of functions to create spatial weights matrix objects from polygon 'contiguities', from point patterns by distance and tessellations, for summarizing these objects, and for permitting their use in spatial data analysis, including regional aggregation by minimum spanning tree; a collection of tests for spatial 'autocorrelation', including global 'Morans I' and 'Gearys C' proposed by 'Cliff' and 'Ord' (1973, ISBN: 0850860369) and (1981, ISBN: 0850860814), 'Hubert/Mantel' general cross product statistic, Empirical Bayes estimates and 'Assunção/Reis' (1999) <doi:10.1002/(SICI)1097-0258(19990830)18:16%3C2147::AID-SIM179%3E3.0.CO;2-I> Index, 'Getis/Ord' G ('Getis' and 'Ord' 1992) <doi:10.1111/j.1538-4632.1992.tb00261.x> and multicoloured join count statistics, 'APLE' ('Li 'et al.' ) <doi:10.1111/j.1538-4632.2007.00708.x>, local 'Moran's I', 'Gearys C' ('Anselin' 1995) <doi:10.1111/j.1538-4632.1995.tb00338.x> and 'Getis/Ord' G ('Ord' and 'Getis' 1995) <doi:10.1111/j.1538-4632.1995.tb00912.x>, 'saddlepoint' approximations ('Tiefelsdorf' 2002) <doi:10.1111/j.1538-4632.2002.tb01084.x> and exact tests for global and local 'Moran's I' ('Bivand et al.' 2009) <doi:10.1016/j.csda.2008.07.021> and 'LOSH' local indicators of spatial heteroscedasticity ('Ord' and 'Getis') <doi:10.1007/s00168-011-0492-y>. The implementation of most of these measures is described in 'Bivand' and 'Wong' (2018) <doi:10.1007/s11749-018-0599-x>, with further extensions in 'Bivand' (2022) <doi:10.1111/gean.12319>. 'Lagrange' multiplier tests for spatial dependence in linear models are provided ('Anselin et al'. 1996) <doi:10.1016/0166-0462(95)02111-6>, as are 'Rao' score tests for hypothesised spatial 'Durbin' models based on linear models ('Koley' and 'Bera' 2023) <doi:10.1080/17421772.2023.2256810>. A local indicators for categorical data (LICD) implementation based on 'Carrer et al.' (2021) <doi:10.1016/j.jas.2020.105306> and 'Bivand et al.' (2017) <doi:10.1016/j.spasta.2017.03.003> was added in 1.3-7. From 'spdep' and 'spatialreg' versions >= 1.2-1, the model fitting functions previously present in this package are defunct in 'spdep' and may be found in 'spatialreg'.

Maintained by Roger Bivand. Last updated 17 days ago.

spatial-autocorrelation spatial-dependence spatial-weights

2.0 match 131 stars 16.62 score 6.0k scripts 107 dependents

r-lib

httr2:Perform HTTP Requests and Process the Responses

Tools for creating and modifying HTTP requests, then performing them and processing the results. 'httr2' is a modern re-imagining of 'httr' that uses a pipe-based interface and solves more of the problems that API wrapping packages face.

Maintained by Hadley Wickham. Last updated 7 days ago.

http

1.9 match 246 stars 17.66 score 1.9k scripts 1.1k dependents

auto-optimization

iraceplot:Plots for Visualizing the Data Produced by the 'irace' Package

Graphical visualization tools for analyzing the data produced by 'irace'. The 'iraceplot' package enables users to analyze the performance and the parameter space data sampled by the configuration during the search process. It provides a set of functions that generate different plots to visualize the configurations sampled during the execution of 'irace' and their performance. The functions just require the log file generated by 'irace' and, in some cases, they can be used with user-provided data.

Maintained by Manuel López-Ibáñez. Last updated 1 month ago.

irace parameter-tuning

5.8 match 5 stars 5.70 score 7 scripts

t-kalinowski

keras:R Interface to 'Keras'

Interface to 'Keras' <https://keras.io>, a high-level neural networks 'API'. 'Keras' was developed with a focus on enabling fast experimentation, supports both convolution based networks and recurrent networks (as well as combinations of the two), and runs seamlessly on both 'CPU' and 'GPU' devices.

Maintained by Tomasz Kalinowski. Last updated 11 months ago.

3.0 match 10.82 score 10k scripts 54 dependents

bioc

parglms:support for parallelized estimation of GLMs/GEEs

This package provides support for parallelized estimation of GLMs/GEEs, catering for dispersed data.

Maintained by VJ Carey. Last updated 5 months ago.

9.8 match 3.30 score 3 scripts

marcellgranat

currr:Apply Mapping Functions in Frequent Saving

Implementations of the family of map() functions with frequent saving of the intermediate results. The contained functions let you start the evaluation of the iterations where you stopped (reading the already evaluated ones from cache), and work with the currently evaluated iterations while remaining ones are running in a background job. Parallel computing is also easier with the workers parameter.

Maintained by Marcell Granat. Last updated 7 months ago.

checkpoints parallel-computing purrr

8.0 match 21 stars 4.02 score 7 scripts

cran

snowfall:Easier Cluster Computing (Based on 'snow')

Usability wrapper around snow for easier development of parallel R programs. This package offers e.g. extended error checks, and additional functions. All functions work in sequential mode, too, if no cluster is present or wished. The package is also designed as a connector to the cluster management tool sfCluster, but can also be used without it.

Maintained by Jochen Knaus. Last updated 1 year ago.

4.1 match 7.89 score 1.8k scripts 48 dependents

business-science

modeltime:The Tidymodels Extension for Time Series Modeling

The time series forecasting framework for use with the 'tidymodels' ecosystem. Models include ARIMA, Exponential Smoothing, and additional time series models from the 'forecast' and 'prophet' packages. Refer to "Forecasting Principles & Practice, Second edition" (<https://otexts.com/fpp2/>). Refer to "Prophet: forecasting at scale" (<https://research.facebook.com/blog/2017/02/prophet-forecasting-at-scale/>).

Maintained by Matt Dancho. Last updated 5 months ago.

arima data-science deep-learning ets forecasting machine-learning machine-learning-algorithms modeltime prophet tbats tidymodeling tidymodels time time-series time-series-analysis timeseries timeseries-forecasting

3.0 match 549 stars 10.57 score 1.1k scripts 7 dependents

suyusung

R2jags:Using R to Run 'JAGS'

Providing wrapper functions to implement Bayesian analysis in JAGS. Some major features include monitoring convergence of a MCMC model using Rubin and Gelman Rhat statistics, automatically running a MCMC model till it converges, and implementing parallel processing of a MCMC model for multiple chains.

Maintained by Yu-Sung Su. Last updated 4 months ago.

jags cpp

2.8 match 8 stars 11.43 score 3.4k scripts 47 dependents

slzhang-fd

mirtjml:Joint Maximum Likelihood Estimation for High-Dimensional Item Factor Analysis

Provides constrained joint maximum likelihood estimation algorithms for item factor analysis (IFA) based on multidimensional item response theory models. So far, we provide functions for exploratory and confirmatory IFA based on the multidimensional two parameter logistic (M2PL) model for binary response data. Compared with traditional estimation methods for IFA, the methods implemented in this package scale better to data with large numbers of respondents, items, and latent factors. The computation is facilitated by the multiprocessing 'OpenMP' API. For more information, please refer to: 1. Chen, Y., Li, X., & Zhang, S. (2018). Joint Maximum Likelihood Estimation for High-Dimensional Exploratory Item Factor Analysis. Psychometrika, 1-23. <doi:10.1007/s11336-018-9646-5>; 2. Chen, Y., Li, X., & Zhang, S. (2019). Structured Latent Factor Analysis for Large-scale Data: Identifiability, Estimability, and Their Implications. Journal of the American Statistical Association, <doi:10.1080/01621459.2019.1635485>.

Maintained by Siliang Zhang. Last updated 4 years ago.

ifa item-factor-analysis large-scale-assessment parallel-computing psychometrics openblas cpp openmp

7.5 match 9 stars 4.21 score 12 scripts 1 dependents

bioc

monocle:Clustering, differential expression, and trajectory analysis for single-cell RNA-Seq

Monocle performs differential expression and time-series analysis for single-cell expression experiments. It orders individual cells according to progress through a biological process, without knowing ahead of time which genes define progress through that process. Monocle also performs differential expression analysis, clustering, visualization, and other useful tasks on single cell expression data. It is designed to work with RNA-Seq and qPCR data, but could be used with other types as well.

Maintained by Cole Trapnell. Last updated 5 months ago.

immunooncology sequencing rnaseq geneexpression differentialexpression infrastructure dataimport datarepresentation visualization clustering multiplecomparison qualitycontrol cpp

3.5 match 8.89 score 1.6k scripts 2 dependents

jeff-hughes

paramtest:Run a Function Iteratively While Varying Parameters

Run simulations or other functions while easily varying parameters from one iteration to the next. Some common use cases would be grid search for machine learning algorithms, running sets of simulations (e.g., estimating statistical power for complex models), or bootstrapping under various conditions. See the 'paramtest' documentation for more information and examples.

Maintained by Jeffrey Hughes. Last updated 7 years ago.

6.5 match 1 stars 4.85 score 47 scripts

marcozanotti

dispositionEffect:Analysis of Disposition Effect on Financial Portfolios

Evaluate the presence of the disposition effect and other irrational investor behaviors based solely on investors' transactions and financial market data. Experimental data can also be used to perform the analysis. Four different methodologies are implemented to account for the different nature of human behaviors on financial markets. Novel analyses such as portfolio driven and time series disposition effect are also allowed.

Maintained by Marco Zanotti. Last updated 3 years ago.

behavioral-economics behavioral-sciences econometrics economics finance financial-analysis financial-data financial-markets time-series

6.0 match 4 stars 5.20 score 9 scripts

christophergandrud

mcreplicate:Multi-Core Replicate

Multi-core replication function to make it easier to do fast Monte Carlo simulation. Based on the mcreplicate() function from the 'rethinking' package. The 'rethinking' package requires installing 'rstan', which is onerous to install, while also not adding capabilities to this function.

Maintained by Christopher Gandrud. Last updated 4 years ago.

parallel-computing simulation

7.5 match 5 stars 4.16 score 29 scripts

liamrevell

phytools:Phylogenetic Tools for Comparative Biology (and Other Things)

A wide range of methods for phylogenetic analysis - concentrated in phylogenetic comparative biology, but also including numerous techniques for visualizing, analyzing, manipulating, reading or writing, and even inferring phylogenetic trees. Included among the functions in phylogenetic comparative biology are various for ancestral state reconstruction, model-fitting, and simulation of phylogenies and trait data. A broad range of plotting methods for phylogenies and comparative data include (but are not restricted to) methods for mapping trait evolution on trees, for projecting trees into phenotype space or onto a geographic map, and for visualizing correlated speciation between trees. Lastly, numerous functions are designed for reading, writing, analyzing, inferring, simulating, and manipulating phylogenetic trees and comparative data. For instance, there are functions for computing consensus phylogenies from a set, for simulating phylogenetic trees and data under a range of models, for randomly or non-randomly attaching species or clades to a tree, as well as for a wide range of other manipulations and analyses that phylogenetic biologists might find useful in their research.

Maintained by Liam J. Revell. Last updated 26 days ago.

2.3 match 218 stars 13.85 score 4.8k scripts 76 dependents

ss3sim

ss3sim:Fisheries Stock Assessment Simulation Testing with Stock Synthesis

A framework for fisheries stock assessment simulation testing with Stock Synthesis (SS3) as described in Anderson et al. (2014) <doi:10.1371/journal.pone.0092725>.

Maintained by Kelli F. Johnson. Last updated 5 months ago.

fisheries simulation stock-synthesis

3.5 match 39 stars 8.89 score 149 scripts

lme4

lme4:Linear Mixed-Effects Models using 'Eigen' and S4

Fit linear and generalized linear mixed-effects models. The models and their components are represented using S4 classes and methods. The core computational algorithms are implemented using the 'Eigen' C++ library for numerical linear algebra and 'RcppEigen' "glue".

Maintained by Ben Bolker. Last updated 1 day ago.

cpp

1.5 match 647 stars 20.69 score 35k scripts 1.5k dependents

bioc

GenomicFiles:Distributed computing by file or by range

This package provides infrastructure for parallel computations distributed 'by file' or 'by range'. User defined MAPPER and REDUCER functions provide added flexibility for data combination and manipulation.

Maintained by Bioconductor Package Maintainer. Last updated 5 months ago.

genetics infrastructure dataimport sequencing coverage

4.5 match 6.86 score 89 scripts 16 dependents

bwlewis

doRedis:'Foreach' Parallel Adapter Using the 'Redis' Database

Create and manage fault-tolerant task queues for the 'foreach' package using the 'Redis' key/value database.

Maintained by B. W. Lewis. Last updated 4 years ago.

4.7 match 71 stars 6.56 score 42 scripts

branchlab

metasnf:Meta Clustering with Similarity Network Fusion

Framework to facilitate patient subtyping with similarity network fusion and meta clustering. The similarity network fusion (SNF) algorithm was introduced by Wang et al. (2014) in <doi:10.1038/nmeth.2810>. SNF is a data integration approach that can transform high-dimensional and diverse data types into a single similarity network suitable for clustering with minimal loss of information from each initial data source. The meta clustering approach was introduced by Caruana et al. (2006) in <doi:10.1109/ICDM.2006.103>. Meta clustering involves generating a wide range of cluster solutions by adjusting clustering hyperparameters, then clustering the solutions themselves into a manageable number of qualitatively similar solutions, and finally characterizing representative solutions to find ones that are best for the user's specific context. This package provides a framework to easily transform multi-modal data into a wide range of similarity network fusion-derived cluster solutions as well as to visualize, characterize, and validate those solutions. Core package functionality includes easy customization of distance metrics, clustering algorithms, and SNF hyperparameters to generate diverse clustering solutions; calculation and plotting of associations between features, between patients, and between cluster solutions; and standard cluster validation approaches including resampled measures of cluster stability, standard metrics of cluster quality, and label propagation to evaluate generalizability in unseen data. Associated vignettes guide the user through using the package to identify patient subtypes while adhering to best practices for unsupervised learning.

Maintained by Prashanth S Velayudhan. Last updated 3 days ago.

bioinformatics clustering metaclustering snf

3.8 match 8 stars 8.21 score 30 scripts

bonsook

REN:Regularization Ensemble for Robust Portfolio Optimization

Portfolio optimization is achieved through a combination of regularization techniques and ensemble methods that are designed to generate stable out-of-sample return predictions, particularly in the presence of strong correlations among assets. The package includes functions for data preparation, parallel processing, and portfolio analysis using methods such as Mean-Variance, James-Stein, LASSO, Ridge Regression, and Equal Weighting. It also provides visualization tools and performance metrics, such as the Sharpe ratio, volatility, and maximum drawdown, to assess the results.

Maintained by Bonsoo Koo. Last updated 5 months ago.

6.1 match 1 stars 5.04 score 2 scripts

skranz

ParallelTrendsPlot:Experimental Package: Plots to diagnose parallel trends in DID regression with additional control variables.

Experimental Package: Plots to diagnose parallel trends in DID regression with additional control variables.

Maintained by Sebastian Kranz. Last updated 3 years ago.

12.4 match 6 stars 2.48 score 3 scripts

benubah

control:A Control Systems Toolbox

Solves control systems problems relating to time/frequency response, LTI systems design and analysis, transfer function manipulations, and system conversion.

Maintained by Ben C. Ubah. Last updated 5 years ago.

5.2 match 19 stars 5.86 score 76 scripts

sparklyr

sparklyr:R Interface to Apache Spark

R interface to Apache Spark, a fast and general engine for big data processing, see <https://spark.apache.org/>. This package supports connecting to local and remote Apache Spark clusters, provides a 'dplyr' compatible back-end, and provides an interface to Spark's built-in machine learning algorithms.

Maintained by Edgar Ruiz. Last updated 8 days ago.

apache-spark distributed dplyr ide livy machine-learning remote-clusters spark sparklyr

2.0 match 959 stars 15.16 score 4.0k scripts 21 dependents

cran

cdparcoord:Top Frequency-Based Parallel Coordinates

Parallel coordinate plotting with resolutions for large data sets and missing values.

Maintained by Norm Matloff. Last updated 6 years ago.

10.8 match 2.81 score 13 scripts

bioc

coMethDMR:Accurate identification of co-methylated and differentially methylated regions in epigenome-wide association studies

coMethDMR identifies genomic regions associated with continuous phenotypes by optimally leveraging covariations among CpGs within predefined genomic regions. Instead of testing all CpGs within a genomic region, coMethDMR carries out an additional step that selects co-methylated sub-regions first without using any outcome information. Next, coMethDMR tests association between methylation within the sub-region and continuous phenotype using a random coefficient mixed effects model, which models both variations between CpG sites within the region and differential methylation simultaneously.

Maintained by Fernanda Veitzman. Last updated 5 months ago.

dnamethylation epigenetics methylationarray differentialmethylation genomewideassociation

4.7 match 7 stars 6.47 score 42 scripts

bioc

S4Vectors:Foundation of vector-like and list-like containers in Bioconductor

The S4Vectors package defines the Vector and List virtual classes and a set of generic functions that extend the semantics of ordinary vectors and lists in R. Package developers can easily implement vector-like or list-like objects as concrete subclasses of Vector or List. In addition, a few low-level concrete subclasses of general interest (e.g. DataFrame, Rle, Factor, and Hits) are implemented in the S4Vectors package itself (many more are implemented in the IRanges package and in other Bioconductor infrastructure packages).

Maintained by Hervé Pagès. Last updated 1 month ago.

infrastructure datarepresentation bioconductor-package core-package

1.9 match 18 stars 16.05 score 1.0k scripts 1.9k dependents

stan-dev

bayesplot:Plotting for Bayesian Models

Plotting functions for posterior analysis, MCMC diagnostics, prior and posterior predictive checks, and other visualizations to support the applied Bayesian workflow advocated in Gabry, Simpson, Vehtari, Betancourt, and Gelman (2019) <doi:10.1111/rssa.12378>. The package is designed not only to provide convenient functionality for users, but also a common set of functions that can be easily used by developers working on a variety of R packages for Bayesian modeling, particularly (but not exclusively) packages interfacing with 'Stan'.

Maintained by Jonah Gabry. Last updated 1 month ago.

bayesian ggplot2 mcmc pandoc stan statistical-graphics visualization

1.8 match 436 stars 16.69 score 6.5k scripts 98 dependents

patriciamar

ShinyItemAnalysis:Test and Item Analysis via Shiny

Package including functions and interactive shiny application for the psychometric analysis of educational tests, psychological assessments, health-related and other types of multi-item measurements, or ratings from multiple raters.

Maintained by Patricia Martinkova. Last updated 1 month ago.

assessment differential-item-functioning item-analysis item-response-theory psychometrics shiny

3.8 match 44 stars 7.88 score 105 scripts 3 dependents

thomasp85

ggforce:Accelerating 'ggplot2'

The aim of 'ggplot2' is to aid in visual data investigations. This focus has led to a lack of facilities for composing specialised plots. 'ggforce' aims to be a collection of mainly new stats and geoms that fills this gap. All additional functionality is aimed to come through the official extension system so using 'ggforce' should be a stable experience.

Maintained by Thomas Lin Pedersen. Last updated 1 year ago.

ggplot-extension ggplot2 visualization cpp

1.9 match 920 stars 15.83 score 9.3k scripts 293 dependents

f-rousset

spaMM:Mixed-Effect Models, with or without Spatial Random Effects

Inference based on models with or without spatially-correlated random effects, multivariate responses, or non-Gaussian random effects (e.g., Beta). Variation in residual variance (heteroscedasticity) can itself be represented by a mixed-effect model. Both classical geostatistical models (Rousset and Ferdy 2014 <doi:10.1111/ecog.00566>), and Markov random field models on irregular grids (as considered in the 'INLA' package, <https://www.r-inla.org>), can be fitted, with distinct computational procedures exploiting the sparse matrix representations for the latter case and other autoregressive models. Laplace approximations are used for likelihood or restricted likelihood. Penalized quasi-likelihood and other variants discussed in the h-likelihood literature (Lee and Nelder 2001 <doi:10.1093/biomet/88.4.987>) are also implemented.

Maintained by François Rousset. Last updated 9 months ago.

gsl cpp openmp

6.0 match 4.94 score 208 scripts 5 dependents

marce10

warbleR:Streamline Bioacoustic Analysis

Functions aiming to facilitate the analysis of the structure of animal acoustic signals in 'R'. 'warbleR' makes use of the basic sound analysis tools from the packages 'tuneR' and 'seewave', and offers new tools for exploring and quantifying acoustic signal structure. The package allows users to organize and manipulate multiple sound files, create spectrograms of complete recordings or individual signals in different formats, run several measures of acoustic structure, and characterize different structural levels in acoustic signals.

Maintained by Marcelo Araya-Salas. Last updated 2 months ago.

animal-acoustic-signals audio-processing bioacoustics spectrogram streamline-analysis cpp

2.7 match 54 stars 11.01 score 270 scripts 4 dependents

r-forge

tm:Text Mining Package

A framework for text mining applications within R.

Maintained by Kurt Hornik. Last updated 24 days ago.

cpp

2.3 match 12.96 score 14k scripts 101 dependents

zdk123

pulsar:Parallel Utilities for Lambda Selection along a Regularization Path

Model selection for penalized graphical models using the Stability Approach to Regularization Selection ('StARS'), with options for speed-ups including Bounded StARS (B-StARS), batch computing, and other stability metrics (e.g., graphlet stability G-StARS). Christian L. Müller, Richard Bonneau, Zachary Kurtz (2016) <arXiv:1605.07072>.

Maintained by Zachary Kurtz. Last updated 1 year ago.

graphical-models

4.7 match 10 stars 6.16 score 65 scripts

rapidsurveys

bbw:Blocked Weighted Bootstrap

The blocked weighted bootstrap (BBW) is an estimation technique for use with data from two-stage cluster sampled surveys in which either prior weighting (e.g. population-proportional sampling or PPS as used in Standardized Monitoring and Assessment of Relief and Transitions or SMART surveys) or posterior weighting (e.g. as used in rapid assessment method or RAM and simple spatial sampling method or S3M surveys) is implemented. See Cameron et al. (2008) <doi:10.1162/rest.90.3.414> for application of bootstrap to cluster samples. See Aaron et al. (2016) <doi:10.1371/journal.pone.0163176> and Aaron et al. (2016) <doi:10.1371/journal.pone.0162462> for application of the blocked weighted bootstrap to estimate indicators from two-stage cluster sampled surveys.

Maintained by Ernest Guevarra. Last updated 2 months ago.

bootstrapping-statistics ram surveys

4.9 match 3 stars 5.91 score 9 scripts 2 dependents

bioc

gdsfmt:R Interface to CoreArray Genomic Data Structure (GDS) Files

Provides a high-level R interface to CoreArray Genomic Data Structure (GDS) data files. GDS is portable across platforms, with a hierarchical structure to store multiple scalable array-oriented data sets with metadata information. It is suited for large-scale datasets, especially for data which are much larger than the available random-access memory. The gdsfmt package offers efficient operations specifically designed for integers of less than 8 bits, since a diploid genotype, like a single-nucleotide polymorphism (SNP), usually occupies fewer bits than a byte. Data compression and decompression are available with relatively efficient random access. A GDS file can also be read in parallel by multiple R processes, as supported by the 'parallel' package.

Maintained by Xiuwen Zheng. Last updated 19 days ago.

infrastructure dataimport bioinformatics gds-format genomics cpp

2.6 match 18 stars 11.30 score 920 scripts 30 dependents

statnet

ergm:Fit, Simulate and Diagnose Exponential-Family Models for Networks

An integrated set of tools to analyze and simulate networks based on exponential-family random graph models (ERGMs). 'ergm' is a part of the Statnet suite of packages for network analysis. See Hunter, Handcock, Butts, Goodreau, and Morris (2008) <doi:10.18637/jss.v024.i03> and Krivitsky, Hunter, Morris, and Klumb (2023) <doi:10.18637/jss.v105.i06>.

Maintained by Pavel N. Krivitsky. Last updated 5 days ago.

1.9 match 100 stars 15.36 score 1.4k scripts 36 dependents

jpquast

protti:Bottom-Up Proteomics and LiP-MS Quality Control and Data Analysis Tools

Useful functions and workflows for proteomics quality control and data analysis of both limited proteolysis-coupled mass spectrometry (LiP-MS) (Feng et al. (2014) <doi:10.1038/nbt.2999>) and regular bottom-up proteomics experiments. Data generated with search tools such as 'Spectronaut', 'MaxQuant' and 'Proteome Discoverer' can be easily used due to the flexibility of the functions.

Maintained by Jan-Philipp Quast. Last updated 5 months ago.

data-analysis lip-ms mass-spectrometry omics protein proteomics systems-biology

3.3 match 61 stars 8.58 score 83 scripts

alexchristensen

NetworkToolbox:Methods and Measures for Brain, Cognitive, and Psychometric Network Analysis

Implements network analysis and graph theory measures used in neuroscience, cognitive science, and psychology. Methods include various filtering methods and approaches such as threshold, dependency (Kenett, Tumminello, Madi, Gur-Gershgoren, Mantegna, & Ben-Jacob, 2010 <doi:10.1371/journal.pone.0015032>), Information Filtering Networks (Barfuss, Massara, Di Matteo, & Aste, 2016 <doi:10.1103/PhysRevE.94.062306>), and Efficiency-Cost Optimization (Fallani, Latora, & Chavez, 2017 <doi:10.1371/journal.pcbi.1005305>). Brain methods include the recently developed Connectome Predictive Modeling (see references in package). Also implements several network measures including local network characteristics (e.g., centrality), community-level network characteristics (e.g., community centrality), global network characteristics (e.g., clustering coefficient), and various other measures associated with the reliability and reproducibility of network analysis.

Maintained by Alexander Christensen. Last updated 2 years ago.

network-analysis

4.0 match 23 stars 6.99 score 101 scripts 4 dependents

rstudio

tfdatasets:Interface to 'TensorFlow' Datasets

Interface to 'TensorFlow' Datasets, a high-level library for building complex input pipelines from simple, re-usable pieces. See <https://www.tensorflow.org/guide> for additional details.

Maintained by Tomasz Kalinowski. Last updated 3 days ago.

3.0 match 34 stars 9.32 score 656 scripts 3 dependents

beccadaniel

doSNOW:Foreach Parallel Adaptor for the 'snow' Package

Provides a parallel backend for the %dopar% function using the snow package of Tierney, Rossini, Li, and Sevcikova.

Maintained by Folashade Daniel. Last updated 3 years ago.

3.5 match 1 stars 7.88 score 2.6k scripts 98 dependents

pecanproject

PEcAn.data.remote:PEcAn Functions Used for Extracting Remote Sensing Data

PEcAn module for processing remote data. Python module requirements: requests, json, re, ast, pandas, sys. If any of these modules are missing, install using pip install <module name>.

Maintained by Bailey Morrison. Last updated 14 hours ago.

bayesian cyberinfrastructure data-assimilation data-science ecosystem-model ecosystem-science forecasting meta-analysis national-science-foundation pecan plants

3.2 match 216 stars 8.74 score 6 scripts 5 dependents

jonlachmann

FBMS:Flexible Bayesian Model Selection and Model Averaging

Implements the Mode Jumping Markov Chain Monte Carlo algorithm described in <doi:10.1016/j.csda.2018.05.020> and its Genetically Modified counterpart described in <doi:10.1613/jair.1.13047> as well as the sub-sampling versions described in <doi:10.1016/j.ijar.2022.08.018> for flexible Bayesian model selection and model averaging.

Maintained by Jon Lachmann. Last updated 16 days ago.

cpp

11.3 match 2.45 score 28 scripts

r-spatial

stars:Spatiotemporal Arrays, Raster and Vector Data Cubes

Reading, manipulating, writing and plotting spatiotemporal arrays (raster and vector data cubes) in 'R', using 'GDAL' bindings provided by 'sf', and 'NetCDF' bindings by 'ncmeta' and 'RNetCDF'.

Maintained by Edzer Pebesma. Last updated 29 days ago.

raster satellite-images spatial

1.5 match 568 stars 18.26 score 7.2k scripts 135 dependents

snoweye

pmclust:Parallel Model-Based Clustering using Expectation-Gathering-Maximization Algorithm for Finite Mixture Gaussian Model

Aims to utilize model-based clustering (unsupervised) for high dimensional and ultra large data, especially in a distributed manner. The code employs 'pbdMPI' to perform an expectation-gathering-maximization algorithm for finite mixture Gaussian models. Unstructured dispersion matrices are assumed in the Gaussian models. By default, the implementation follows the single program, multiple data programming model. The code can be executed through 'pbdMPI' and 'MPI' implementations such as 'OpenMPI' and 'MPICH'. See the High Performance Statistical Computing website <https://snoweye.github.io/hpsc/> for more information, documents and examples.

Maintained by Wei-Chen Chen. Last updated 2 years ago.

7.4 match 5 stars 3.70 score 4 scripts

antonio-pgarcia

rrepast:Invoke 'Repast Simphony' Simulation Models

An R and Repast integration tool for running individual-based (IbM) simulation models developed using the 'Repast Simphony' agent-based framework directly from R code, with support for multicore execution. This package integrates 'Repast Simphony' models within the R environment, making it easier to run models and analyze their output data for automated parameter calibration and for carrying out uncertainty and sensitivity analysis using the power of the R environment.

Maintained by Antonio Prestes Garcia. Last updated 5 years ago.

openjdk

6.0 match 3 stars 4.53 score 38 scripts 1 dependent

bioc

easyRNASeq:Count summarization and normalization for RNA-Seq data

Calculates the coverage of high-throughput short reads against a reference genome and summarizes it per feature of interest (e.g. exon, gene, transcript). The data can be normalized as 'RPKM' or by the 'DESeq' or 'edgeR' package.

Maintained by Nicolas Delhomme. Last updated 5 months ago.

geneexpression rnaseq genetics preprocessing immunooncology

5.0 match 5.43 score 15 scripts 1 dependent