R-universe search: seeds

koheiw

newsmap:Semi-Supervised Model for Geographical Document Classification

Semi-supervised model for geographical document classification (Watanabe 2018) <doi:10.1080/21670811.2017.1293487>. This package currently contains seed dictionaries in English, German, French, Spanish, Italian, Russian, Hebrew, Arabic, Turkish, Japanese and Chinese (Simplified and Traditional).

Maintained by Kohei Watanabe. Last updated 9 months ago.

machine-learning news-stories quanteda text-analysis

49.0 match 62 stars 6.05 score 8 scripts

newmi1988

seeds:Estimate Hidden Inputs using the Dynamic Elastic Net

Algorithms to calculate the hidden inputs of systems of differential equations. These hidden inputs can be interpreted as a control that tries to minimize the discrepancies between a given model and taken measurements. The idea is also called the Dynamic Elastic Net, as proposed in the paper "Learning (from) the errors of a systems biology model" (Engelhardt, Froelich, Kschischo 2016) <doi:10.1038/srep20772>. To use the experimental SBML import function, the 'rsbml' package is required. For installation I refer to the official 'rsbml' page: <https://bioconductor.org/packages/release/bioc/html/rsbml.html>.

Maintained by Tobias Newmiwaka. Last updated 4 years ago.

56.3 match 3.00 score 2 scripts

bioc

DelayedArray:A unified framework for working transparently with on-disk and in-memory array-like datasets

Wrapping an array-like object (typically an on-disk object) in a DelayedArray object allows one to perform common array operations on it without loading the object in memory. In order to reduce memory usage and optimize performance, operations on the object are either delayed or executed using a block processing mechanism. Note that this also works on in-memory array-like objects like DataFrame objects (typically with Rle columns), Matrix objects, ordinary arrays, and data frames.

Maintained by Hervé Pagès. Last updated 1 month ago.

infrastructure datarepresentation annotation genomeannotation bioconductor-package core-package u24ca289073

9.1 match 27 stars 15.59 score 538 scripts 1.2k dependents

statmanrobin

Stat2Data:Datasets for Stat2

Datasets for the textbook Stat2: Modeling with Regression and ANOVA (second edition). The package also includes data for the first edition, Stat2: Building Models for a World of Data and a few functions for plotting diagnostics.

Maintained by Robin Lock. Last updated 6 years ago.

27.9 match 5 stars 4.94 score 544 scripts

flavjack

GerminaR:Indices and Graphics for Assess Seed Germination Process

A collection of different indices and visualization techniques for evaluating the seed germination process in ecophysiological studies (Lozano-Isla et al. 2019) <doi:10.1111/1440-1703.1275>.

Maintained by Flavio Lozano-Isla. Last updated 3 days ago.

germianaquant inkaverse plant-science seed-germination shinyapp

20.9 match 4 stars 6.29 score 81 scripts

aravind-j

germinationmetrics:Seed Germination Indices and Curve Fitting

Provides functions to compute various germination indices such as germinability, median germination time, mean germination time, mean germination rate, speed of germination, Timson's index, germination value, coefficient of uniformity of germination, uncertainty of germination process, synchrony of germination etc. from germination count data. Includes functions for fitting cumulative seed germination curves using a four-parameter Hill function and for computing the associated parameters. See the vignette for more, including a full list of citations for the methods implemented.

Maintained by J. Aravind. Last updated 5 months ago.

curve-fitting germination germination-indices seed seed-germination-curve seed-germination-indices

29.5 match 3 stars 4.38 score 26 scripts
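
Several of the indices named in the germinationmetrics entry above are simple functions of germination count data. The following is a minimal base R sketch, not the package's own API: the data are made up for illustration, and the formulas used are the common textbook definitions (germinability as a percentage of seeds sown, mean germination time as a count-weighted mean of observation times, mean germination rate as its reciprocal).

```r
# Hypothetical germination count data (illustrative values only).
times       <- c(1, 2, 3, 4, 5)    # days after sowing at which counts were taken
counts      <- c(0, 4, 10, 5, 1)   # seeds newly germinated at each count
total_seeds <- 25                  # number of seeds sown

# Germinability: percentage of sown seeds that germinated.
germinability <- 100 * sum(counts) / total_seeds

# Mean germination time: mean of the count times, weighted by the
# number of seeds that germinated at each time.
mgt <- sum(counts * times) / sum(counts)

# Mean germination rate: reciprocal of the mean germination time.
mgr <- 1 / mgt

print(c(germinability = germinability, mgt = mgt, mgr = mgr))
```

The package itself may parameterize these quantities differently (for example, by accepting cumulative counts), so its documentation should be checked before comparing results.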

kwstat

agridat:Agricultural Datasets

Datasets from books, papers, and websites related to agriculture. Example graphics and analyses are included. Data come from small-plot trials, multi-environment trials, uniformity trials, yield monitors, and more.

Maintained by Kevin Wright. Last updated 1 month ago.

data

10.5 match 126 stars 10.78 score 1.7k scripts 1 dependent

ropensci

targets:Dynamic Function-Oriented 'Make'-Like Declarative Pipelines

Pipeline tools coordinate the pieces of computationally demanding analysis projects. The 'targets' package is a 'Make'-like pipeline tool for statistics and data science in R. The package skips costly runtime for tasks that are already up to date, orchestrates the necessary computation with implicit parallel computing, and abstracts files as R objects. If all the current output matches the current upstream code and data, then the whole pipeline is up to date, and the results are more trustworthy than otherwise. The methodology in this package borrows from GNU 'Make' (2015, ISBN:978-9881443519) and 'drake' (2018, <doi:10.21105/joss.00550>).

Maintained by William Michael Landau. Last updated 2 days ago.

data-science high-performance-computing make peer-reviewed pipeline r-targetopia reproducibility reproducible-research targets workflow

7.3 match 975 stars 15.18 score 4.6k scripts 22 dependents

julianfaraway

faraway:Datasets and Functions for Books by Julian Faraway

Books are "Linear Models with R" published 1st Ed. August 2004, 2nd Ed. July 2014, 3rd Ed. February 2025 by CRC press, ISBN 9781439887332, and "Extending the Linear Model with R" published by CRC press in 1st Ed. December 2005 and 2nd Ed. March 2016, ISBN 9781584884248 and "Practical Regression and ANOVA in R" contributed documentation on CRAN (now very dated).

Maintained by Julian Faraway. Last updated 1 month ago.

data

10.8 match 29 stars 9.43 score 1.7k scripts 1 dependent

tacazares

SeedMatchR:Find Matches to Canonical SiRNA Seeds in Genomic Features

On-target gene knockdown using siRNA ideally results from binding fully complementary regions in mRNA transcripts to induce cleavage. Off-target siRNA gene knockdown can occur through several modes, one being a seed-mediated mechanism mimicking miRNA gene regulation. Seed-mediated off-target effects occur when the ~8 nucleotides at the 5’ end of the guide strand, called a seed region, bind the 3’ untranslated regions of mRNA, causing reduced translation. Experiments using siRNA knockdown paired with RNA-seq can be used to detect siRNA sequences with potential off-target effects driven by the seed region. 'SeedMatchR' provides tools for exploring and detecting potential seed-mediated off-target effects of siRNA in RNA-seq experiments. 'SeedMatchR' is designed to extend current differential expression analysis tools, such as 'DESeq2', by annotating results with predicted seed matches. Using publicly available data, we demonstrate the ability of 'SeedMatchR' to detect cumulative changes in differential gene expression attributed to siRNA seed regions.

Maintained by Tareian Cazares. Last updated 1 year ago.

deseq2-analysis mirna rna-seq sirna transcriptomics

20.5 match 7 stars 4.54 score 7 scripts

koheiw

LSX:Semi-Supervised Algorithm for Document Scaling

A word embeddings-based semi-supervised model for document scaling (Watanabe 2020) <doi:10.1080/19312458.2020.1832976>. LSS allows users to analyze large and complex corpora on arbitrary dimensions with seed words, exploiting the efficiency of word embeddings (SVD, GloVe). It can generate word vectors on a user-provided corpus or incorporate pre-trained word vectors.

Maintained by Kohei Watanabe. Last updated 2 months ago.

lsa quanteda sentiment-analysis text-analysis

12.9 match 55 stars 6.09 score 14 scripts

efernandezpascual

seedr:Hydro and Thermal Time Seed Germination Models in R

Analysis of seed germination data using the physiological time modelling approach. Includes functions to fit hydrotime and thermal-time models with the traditional approaches of Bradford (1990) <doi:10.1104/pp.94.2.840> and Garcia-Huidobro (1982) <doi:10.1093/jxb/33.2.288>. Allows fitting models to grouped datasets, i.e. datasets containing multiple species, seedlots or experiments.

Maintained by Fernández-Pascual Eduardo. Last updated 4 years ago.

agronomy botany ecology germination seeds

19.6 match 2 stars 4.00 score 6 scripts

coatless-rpkg

sitmo:Parallel Pseudo Random Number Generator (PPRNG) 'sitmo' Header Files

Provided within are two high quality and fast PPRNGs that may be used in an 'OpenMP' parallel environment. In addition, there is a generator for a one-dimensional low-discrepancy sequence. The objective of this library is to consolidate the distribution of the 'sitmo' (C++98 & C++11), 'threefry' and 'vandercorput' (C++11-only) engines on CRAN by enabling others to link to the header files inside of 'sitmo' instead of including a copy of each engine within their individual package. Lastly, the package contains example implementations using the 'sitmo' package and three accompanying vignettes that provide additional information.

Maintained by James Balamuta. Last updated 1 year ago.

parallel random-generation rcpp cpp openmp

8.0 match 7 stars 9.75 score 15 scripts 201 dependents

nlmixr2

rxode2:Facilities for Simulating from ODE-Based Models

Facilities for running simulations from ordinary differential equation ('ODE') models, such as pharmacometrics and other compartmental models. A compilation manager translates the ODE model into C, compiles it, and dynamically loads the object code into R for improved computational efficiency. An event table object facilitates the specification of complex dosing regimens (optional) and sampling schedules. NB: The use of this package requires both C and Fortran compilers, for details on their use with R please see Section 6.3, Appendix A, and Appendix D in the "R Administration and Installation" manual. Also the code is mostly released under GPL. The 'VODE' and 'LSODA' are in the public domain. The information is available in the inst/COPYRIGHTS.

Maintained by Matthew L. Fidler. Last updated 1 month ago.

fortran openblas cpp openmp

6.9 match 40 stars 11.24 score 220 scripts 13 dependents

emf-creaf

indicspecies:Relationship Between Species and Groups of Sites

Functions to assess the strength and statistical significance of the relationship between species occurrence/abundance and groups of sites [De Caceres & Legendre (2009) <doi:10.1890/08-1823.1>]. Also includes functions to measure species niche breadth using resource categories [De Caceres et al. (2011) <doi:10.1111/J.1600-0706.2011.19679.x>].

Maintained by Miquel De Cáceres. Last updated 27 days ago.

7.5 match 10 stars 9.49 score 386 scripts 4 dependents

daqana

dqrng:Fast Pseudo Random Number Generators

Several fast random number generators are provided as C++ header only libraries: The PCG family by O'Neill (2014 <https://www.cs.hmc.edu/tr/hmc-cs-2014-0905.pdf>) as well as the Xoroshiro / Xoshiro family by Blackman and Vigna (2021 <doi:10.1145/3460772>). In addition fast functions for generating random numbers according to a uniform, normal and exponential distribution are included. The latter two use the Ziggurat algorithm originally proposed by Marsaglia and Tsang (2000, <doi:10.18637/jss.v005.i08>). The fast sampling methods support unweighted sampling both with and without replacement. These functions are exported to R and as a C++ interface and are enabled for use with the default 64 bit generator from the PCG family, Xoroshiro128+/++/** and Xoshiro256+/++/** as well as the 64 bit version of the 20 rounds Threefry engine (Salmon et al., 2011, <doi:10.1145/2063384.2063405>) as provided by the package 'sitmo'.

Maintained by Ralf Stubner. Last updated 6 months ago.

random random-distributions random-generation random-sampling rng cpp

5.4 match 42 stars 13.12 score 188 scripts 183 dependents

handcock

RDS:Respondent-Driven Sampling

Provides functionality for carrying out estimation with data collected using Respondent-Driven Sampling. This includes Heckathorn's RDS-I and RDS-II estimators as well as Gile's Sequential Sampling estimator. The package is part of the "RDS Analyst" suite of packages for the analysis of respondent-driven sampling data. See Gile and Handcock (2010) <doi:10.1111/j.1467-9531.2010.01223.x>, Gile and Handcock (2015) <doi:10.1111/rssa.12091> and Gile, Beaudry, Handcock and Ott (2018) <doi:10.1146/annurev-statistics-031017-100704>.

Maintained by Mark S. Handcock. Last updated 6 months ago.

17.9 match 1 star 3.87 score 82 scripts 3 dependents

signaturescience

rplanes:Plausibility Analysis of Epidemiological Signals

Provides functionality to prepare data and analyze plausibility of both forecasted and reported epidemiological signals. The functions implement a set of plausibility algorithms that are agnostic to geographic and time resolutions and are calculated independently then presented as a combined score.

Maintained by VP Nagraj. Last updated 8 months ago.

11.4 match 9 stars 6.03 score 7 scripts

adeverse

ade4:Analysis of Ecological Data: Exploratory and Euclidean Methods in Environmental Sciences

Tools for multivariate data analysis. Several methods are provided for the analysis (i.e., ordination) of one-table (e.g., principal component analysis, correspondence analysis), two-table (e.g., coinertia analysis, redundancy analysis), three-table (e.g., RLQ analysis) and K-table (e.g., STATIS, multiple coinertia analysis). The philosophy of the package is described in Dray and Dufour (2007) <doi:10.18637/jss.v022.i04>.

Maintained by Aurélie Siberchicot. Last updated 15 days ago.

openblas cpp

4.5 match 39 stars 14.96 score 2.2k scripts 256 dependents

onofriandreapg

drcSeedGerm:Utilities for Data Analyses in Seed Germination/Emergence Assays

Utility functions to be used to analyse datasets obtained from seed germination/emergence assays. Fits several types of seed germination/emergence models, including those reported in Onofri et al. (2018) "Hydrothermal-time-to-event models for seed germination", European Journal of Agronomy, 101, 129-139 <doi:10.1016/j.eja.2018.08.011>. Contains several datasets for practicing.

Maintained by Andrea Onofri. Last updated 2 months ago.

nonlinear-regression seed-germination-assays time-to-event

15.5 match 5 stars 3.97 score 37 scripts

dpc10ster

RJafroc:Artificial Intelligence Systems and Observer Performance

Analyzing the performance of artificial intelligence (AI) systems/algorithms characterized by a 'search-and-report' strategy. Historically observer performance has dealt with measuring radiologists' performances in search tasks, e.g., searching for lesions in medical images and reporting them, but the implicit location information has been ignored. The implemented methods apply to analyzing the absolute and relative performances of AI systems, comparing AI performance to a group of human readers or optimizing the reporting threshold of an AI system. In addition to performing historical receiver operating characteristic (ROC) analysis (localization information ignored), the software also performs free-response receiver operating characteristic (FROC) analysis, where lesion localization information is used. A book using the software has been published: Chakraborty DP: Observer Performance Methods for Diagnostic Imaging - Foundations, Modeling, and Applications with R-Based Examples, Taylor-Francis LLC; 2017: <https://www.routledge.com/Observer-Performance-Methods-for-Diagnostic-Imaging-Foundations-Modeling/Chakraborty/p/book/9781482214840>. Online updates to this book, which use the software, are at <https://dpc10ster.github.io/RJafrocQuickStart/>, <https://dpc10ster.github.io/RJafrocRocBook/> and at <https://dpc10ster.github.io/RJafrocFrocBook/>. Supported data collection paradigms are the ROC, FROC and the location ROC (LROC). ROC data consists of a single rating per image, where a rating is the perceived confidence level that the image is that of a diseased patient. An ROC curve is a plot of true positive fraction vs. false positive fraction. FROC data consists of a variable number (zero or more) of mark-rating pairs per image, where a mark is the location of a reported suspicious region and the rating is the confidence level that it is a real lesion. LROC data consists of a rating and a location of the most suspicious region, for every image.
Four models of observer performance, and curve-fitting software, are implemented: the binormal model (BM), the contaminated binormal model (CBM), the correlated contaminated binormal model (CORCBM), and the radiological search model (RSM). Unlike the binormal model, CBM, CORCBM and RSM predict 'proper' ROC curves that do not inappropriately cross the chance diagonal. Additionally, RSM parameters are related to search performance (not measured in conventional ROC analysis) and classification performance. Search performance refers to finding lesions, i.e., true positives, while simultaneously not finding false positive locations. Classification performance measures the ability to distinguish between true and false positive locations. Knowing these separate performances allows principled optimization of reader or AI system performance. This package supersedes Windows JAFROC (jackknife alternative FROC) software V4.2.1, <https://github.com/dpc10ster/WindowsJafroc>. Package functions are organized as follows. Data file related function names are preceded by 'Df', curve fitting functions by 'Fit', included data sets by 'dataset', plotting functions by 'Plot', significance testing functions by 'St', sample size related functions by 'Ss', data simulation functions by 'Simulate' and utility functions by 'Util'. Implemented are figures of merit (FOMs) for quantifying performance and functions for visualizing empirical or fitted operating characteristics: e.g., ROC, FROC, alternative FROC (AFROC) and weighted AFROC (wAFROC) curves. For fully crossed study designs significance testing of reader-averaged FOM differences between modalities is implemented via either Dorfman-Berbaum-Metz or the Obuchowski-Rockette methods. Also implemented is single modality analysis, which allows comparison of performance of a group of radiologists to a specified value, or comparison of AI to a group of radiologists interpreting the same cases. 
Crossed-modality analysis is implemented wherein there are two crossed modality factors and the aim is to determine performance in each modality factor averaged over all levels of the second factor. Sample size estimation tools are provided for ROC and FROC studies; these use estimates of the relevant variances from a pilot study to predict required numbers of readers and cases in a pivotal study to achieve the desired power. Utility and data file manipulation functions allow data to be read in any of the currently used input formats, including Excel, and the results of the analysis can be viewed in text or Excel output files. The methods are illustrated with several included datasets from the author's collaborations. This update includes improvements to the code, some as a result of user-reported bugs and new feature requests, and others discovered during ongoing testing and code simplification.

Maintained by Dev Chakraborty. Last updated 5 months ago.

ai-optimization artificial-intelligence-algorithms computer-aided-diagnosis froc-analysis roc-analysis target-classification target-localization cpp

10.3 match 19 stars 5.69 score 65 scripts

philchalmers

SimDesign:Structure for Organizing Monte Carlo Simulation Designs

Provides tools to safely and efficiently organize and execute Monte Carlo simulation experiments in R. The package controls the structure and back-end of Monte Carlo simulation experiments by utilizing a generate-analyse-summarise workflow. The workflow safeguards against common simulation coding issues, such as automatically re-simulating non-convergent results, prevents inadvertently overwriting simulation files, catches error and warning messages during execution, implicitly supports parallel processing with high-quality random number generation, and provides tools for managing high-performance computing (HPC) array jobs submitted to schedulers such as SLURM. For a pedagogical introduction to the package see Sigal and Chalmers (2016) <doi:10.1080/10691898.2016.1246953>. For a more in-depth overview of the package and its design philosophy see Chalmers and Adkins (2020) <doi:10.20982/tqmp.16.4.p248>.

Maintained by Phil Chalmers. Last updated 2 days ago.

monte-carlo-simulation simulation simulation-framework

4.4 match 62 stars 13.38 score 253 scripts 46 dependents

rstudio

tensorflow:R Interface to 'TensorFlow'

Interface to 'TensorFlow' <https://www.tensorflow.org/>, an open source software library for numerical computation using data flow graphs. Nodes in the graph represent mathematical operations, while the graph edges represent the multidimensional data arrays (tensors) communicated between them. The flexible architecture allows you to deploy computation to one or more 'CPUs' or 'GPUs' in a desktop, server, or mobile device with a single 'API'. 'TensorFlow' was originally developed by researchers and engineers working on the Google Brain Team within Google's Machine Intelligence research organization for the purposes of conducting machine learning and deep neural networks research, but the system is general enough to be applicable in a wide variety of other domains as well.

Maintained by Tomasz Kalinowski. Last updated 19 days ago.

3.8 match 1.3k stars 15.35 score 3.2k scripts 74 dependents

alenxav

NAM:Nested Association Mapping

Designed for association studies in nested association mapping (NAM) panels, experimental and random panels. The method is described by Xavier et al. (2015) <doi:10.1093/bioinformatics/btv448>. It includes tools for genome-wide associations of multiple populations, marker quality control, population genetics analysis, genome-wide prediction, solving mixed models and finding variance components through likelihood and Bayesian methods.

Maintained by Alencar Xavier. Last updated 5 years ago.

cpp

10.0 match 2 stars 5.72 score 44 scripts 1 dependent

snoweye

pbdMPI:R Interface to MPI for HPC Clusters (Programming with Big Data Project)

A simplified, efficient, interface to MPI for HPC clusters. It is a derivation and rethinking of the Rmpi package. pbdMPI embraces the prevalent parallel programming style on HPC clusters. Beyond the interface, a collection of functions for global work with distributed data and resource-independent RNG reproducibility is included. It is based on S4 classes and methods.

Maintained by Wei-Chen Chen. Last updated 6 months ago.

openmpi

8.0 match 2 stars 7.11 score 179 scripts 3 dependents

bbolker

emdbook:Support Functions and Data for "Ecological Models and Data"

Auxiliary functions and data sets for "Ecological Models and Data", a book presenting maximum likelihood estimation and related topics for ecologists (ISBN 978-0-691-12522-0).

Maintained by Ben Bolker. Last updated 8 months ago.

6.9 match 4 stars 8.04 score 656 scripts 21 dependents

stats-uoa

s20x:Functions for University of Auckland Course STATS 201/208 Data Analysis

A set of functions used in teaching STATS 201/208 Data Analysis at the University of Auckland. The functions are designed to make parts of R more accessible to a large undergraduate population who are mostly not statistics majors.

Maintained by James Curran. Last updated 2 years ago.

8.3 match 3 stars 6.40 score 211 scripts 3 dependents

pik-piam

magpie4:MAgPIE outputs R package for MAgPIE version 4.x

Common output routines for extracting results from the MAgPIE framework (versions 4.x).

Maintained by Benjamin Leon Bodirsky. Last updated 2 days ago.

6.6 match 2 stars 7.88 score 254 scripts 9 dependents

yiluheihei

RevEcoR:Reverse Ecology Analysis on Microbiome

An implementation of the reverse ecology framework. Reverse ecology refers to the use of genomics to study ecology with no a priori assumptions about the organism(s) under consideration, linking organisms to their environment. It allows researchers to reconstruct the metabolic networks and study the ecology of poorly characterized microbial species from their genomic information, and has substantial potential for microbial community ecological analysis.

Maintained by Yang Cao. Last updated 6 years ago.

8.9 match 6 stars 5.77 score 22 scripts 1 dependent

renozao

rngtools:Utility Functions for Working with Random Number Generators

Provides a set of functions for working with Random Number Generators (RNGs). In particular, a generic S4 framework is defined for getting/setting the current RNG, or RNG data that are embedded into objects for reproducibility. Notably, convenient default methods greatly facilitate the way current RNG settings can be changed.

Maintained by Renaud Gaujoux. Last updated 3 years ago.

5.8 match 6 stars 8.65 score 85 scripts 216 dependents

hdhshowalter

SeedMaker:Generate a Collection of Seeds from a Single Seed

A mechanism for easily generating and organizing a collection of seeds from a single seed, which may be subsequently used to ensure reproducibility in processes/pipelines that utilize multiple random components (e.g., trial simulation).

Maintained by Hollins Showalter. Last updated 2 months ago.

13.0 match 3.70 score
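
The idea in the SeedMaker entry above, deriving a collection of seeds from a single seed, can be sketched in base R. This is an illustration of the general technique only and does not use or reproduce SeedMaker's API; the helper name `derive_seeds` and the component names are hypothetical.

```r
# Hypothetical helper (not part of SeedMaker): derive one integer seed per
# named component from a single master seed.
derive_seeds <- function(master_seed, component_names) {
  # Note: this resets the global RNG state to `master_seed`.
  set.seed(master_seed)
  # Draw distinct integers (sampling without replacement) to use as child seeds.
  seeds <- sample.int(.Machine$integer.max, length(component_names))
  stats::setNames(seeds, component_names)
}

# The same master seed always yields the same named collection of seeds.
seeds_a <- derive_seeds(2024, c("enrollment", "dropout", "outcome"))
seeds_b <- derive_seeds(2024, c("enrollment", "dropout", "outcome"))
print(identical(seeds_a, seeds_b))

# Each random component of a pipeline can then be seeded independently.
set.seed(seeds_a[["dropout"]])
dropout_draws <- runif(3)
```

Recording only the master seed is then enough to reproduce every component, which is the property the package description emphasizes.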

rstudio

keras3:R Interface to 'Keras'

Interface to 'Keras' <https://keras.io>, a high-level neural networks API. 'Keras' was developed with a focus on enabling fast experimentation, supports both convolution based networks and recurrent networks (as well as combinations of the two), and runs seamlessly on both CPU and GPU devices.

Maintained by Tomasz Kalinowski. Last updated 3 days ago.

3.3 match 845 stars 13.60 score 264 scripts 2 dependents

onofriandreapg

drcte:Statistical Approaches for Time-to-Event Data in Agriculture

A specific and comprehensive framework for the analyses of time-to-event data in agriculture. Fit non-parametric and parametric time-to-event models. Compare time-to-event curves for different experimental groups. Plots and other displays. It is particularly tailored to the analyses of data from germination and emergence assays. The methods are described in Onofri et al. (2022) "A unified framework for the analysis of germination, emergence, and other time-to-event data in weed science", Weed Science, 70, 259-271 <doi:10.1017/wsc.2022.8>.

Maintained by Andrea Onofri. Last updated 1 year ago.

non-linear-regression seed-germination time-to-event

10.9 match 4.07 score 39 scripts 2 dependents

calvagone

campsis:Generic PK/PD Simulation Platform CAMPSIS

A generic, easy-to-use and intuitive pharmacokinetic/pharmacodynamic (PK/PD) simulation platform based on R packages 'rxode2' and 'mrgsolve'. CAMPSIS provides an abstraction layer over the underlying processes of writing a PK/PD model, assembling a custom dataset and running a simulation. CAMPSIS has a strong dependency on the R package 'campsismod', which allows reading/writing a model from/to files and adapting it further on the fly in the R environment. Package 'campsis' allows the user to assemble a dataset in an intuitive manner. Once the user’s dataset is ready, the package is in charge of preparing the simulation, calling 'rxode2' or 'mrgsolve' (at the user's choice) and returning the results, for the given model, dataset and desired simulation settings.

Maintained by Nicolas Luyckx. Last updated 1 month ago.

5.7 match 8 stars 7.52 score 93 scripts

bioc

GDSArray:Representing GDS files as array-like objects

GDS files are widely used to represent genotyping or sequence data. The GDSArray package implements the `GDSArray` class to represent nodes in GDS files in a matrix-like representation that allows easy manipulation (e.g., subsetting, mathematical transformation) in _R_. The data remains on disk until needed, so that very large files can be processed.

Maintained by Xiuwen Zheng. Last updated 16 hours ago.

infrastructure datarepresentation sequencing genotypingarray

6.3 match 5 stars 6.78 score 8 scripts 2 dependents

alenxav

bWGR:Bayesian Whole-Genome Regression

Whole-genome regression methods on Bayesian framework fitted via EM or Gibbs sampling, single step (<doi:10.1534/g3.119.400728>), univariate and multivariate (<doi:10.1186/s12711-022-00730-w>, <doi:10.1093/genetics/iyae179>), with optional kernel term and sampling techniques (<doi:10.1186/s12859-017-1582-3>).

Maintained by Alencar Xavier. Last updated 3 months ago.

cpp

10.0 match 7 stars 4.24 score 16 scripts

yihui

knitr:A General-Purpose Package for Dynamic Report Generation in R

Provides a general-purpose tool for dynamic report generation in R using Literate Programming techniques.

Maintained by Yihui Xie. Last updated 3 days ago.

dynamic-documents knitr literate-programming rmarkdown sweave

1.8 match 2.4k stars 23.61 score 116k scripts 4.2k dependents

r-lib

withr:Run Code 'With' Temporarily Modified Global State

A set of functions to run code 'with' safely and temporarily modified global state. Many of these functions were originally a part of the 'devtools' package; this package provides access to them with limited dependencies.

Maintained by Lionel Henry. Last updated 22 days ago.

2.3 match 176 stars 17.92 score 1.2k scripts 12k dependents
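
The pattern described in the withr entry above, running code under temporarily modified global state and then restoring it, can be sketched in base R for the random seed. This is an assumption-laden illustration rather than withr's implementation; the helper name `with_temp_seed` is hypothetical.

```r
# Hypothetical helper (not withr's code): evaluate `code` under a fixed seed,
# then restore the caller's RNG state, even if `code` throws an error.
with_temp_seed <- function(seed, code) {
  had_seed <- exists(".Random.seed", envir = globalenv(), inherits = FALSE)
  old_seed <- if (had_seed) get(".Random.seed", envir = globalenv(), inherits = FALSE)
  on.exit({
    if (had_seed) {
      assign(".Random.seed", old_seed, envir = globalenv())
    } else if (exists(".Random.seed", envir = globalenv(), inherits = FALSE)) {
      rm(list = ".Random.seed", envir = globalenv())
    }
  })
  set.seed(seed)
  force(code)  # `code` is a lazy argument, so it is evaluated only here
}

set.seed(1)
inside_first  <- with_temp_seed(42, runif(2))
after_wrapper <- runif(1)   # continues the stream started by set.seed(1)

set.seed(1)
without_wrapper <- runif(1) # the same draw, with no wrapper call in between

inside_second <- with_temp_seed(42, runif(2))
print(identical(after_wrapper, without_wrapper))
print(identical(inside_first, inside_second))
```

Registering the restore step with `on.exit()` before changing the state is the design choice that makes the helper safe: the previous state comes back whether the code returns normally or fails.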

bioc

chihaya:Save Delayed Operations to a HDF5 File

Saves the delayed operations of a DelayedArray to a HDF5 file. This enables efficient recovery of the DelayedArray's contents in other languages and analysis frameworks.

Maintained by Aaron Lun. Last updated 5 months ago.

dataimport datarepresentation zlib cpp

9.1 match 4.38 score 16 scripts

mrc-ide

dust:Iterate Multiple Realisations of Stochastic Models

An engine for simulation of stochastic models. Includes support for running stochastic models in parallel, either with shared or varying parameters. Simulations are run efficiently in compiled code and can be run with a fraction of simulated states returned to R, allowing control over memory usage. Support is provided for building a bootstrap particle filter for performing Sequential Monte Carlo (e.g., Gordon et al. 1993 <doi:10.1049/ip-f-2.1993.0015>). The core of the simulation engine is the 'xoshiro256**' algorithm (Blackman and Vigna <arXiv:1805.01407>), and the package is further described in FitzJohn et al 2021 <doi:10.12688/wellcomeopenres.16466.2>.

Maintained by Rich FitzJohn. Last updated 6 months ago.

cpp openmp

4.9 match 18 stars 7.84 score 60 scripts 3 dependents

rstudio

reticulate:Interface to 'Python'

Interface to 'Python' modules, classes, and functions. When calling into 'Python', R data types are automatically converted to their equivalent 'Python' types. When values are returned from 'Python' to R they are converted back to R types. Compatible with all versions of 'Python' >= 2.7.

Maintained by Tomasz Kalinowski. Last updated 1 day ago.

cpp

1.8 match 1.7k stars 21.05 score 18k scripts 432 dependents

pachadotdev

analogsea:Interface to 'DigitalOcean'

Provides a set of functions for interacting with the 'DigitalOcean' API <https://www.digitalocean.com/>, including creating images, destroying them, rebooting, getting details on regions, and available images.

Maintained by Mauricio Vargas. Last updated 2 years ago.

cloud-computing droplet ssh

5.0 match 159 stars 7.56 score 100 scripts 1 dependent

igraph

igraph:Network Analysis and Visualization

Routines for simple graphs and network analysis. It can handle large graphs very well and provides functions for generating random and regular graphs, graph visualization, centrality methods and much more.

Maintained by Kirill Müller. Last updated 17 hours ago.

complex-networks graph-algorithms graph-theory mathematics network-analysis network-graph fortran libxml2 glpk openblas cpp

1.8 match 583 stars 21.10 score 31k scripts 1.9k dependents

bioc

XDE:XDE: a Bayesian hierarchical model for cross-study analysis of differential gene expression

Multi-level model for cross-study detection of differential gene expression.

Maintained by Robert Scharpf. Last updated 5 months ago.

microarray differentialexpression cpp

8.6 match 4.20 score 10 scripts

r-forge

distrSim:Simulation Classes Based on Package 'distr'

S4-classes for setting up a coherent framework for simulation within the distr family of packages.

Maintained by Peter Ruckdeschel. Last updated 2 months ago.

8.1 match 4.16 score 7 scripts 3 dependents

ramnathv

htmlwidgets:HTML Widgets for R

A framework for creating HTML widgets that render in various contexts including the R console, 'R Markdown' documents, and 'Shiny' web applications.

Maintained by Carson Sievert. Last updated 1 year ago.

1.8 match 791 stars 19.05 score 7.4k scripts 3.1k dependents

openpharma

crmPack:Object-Oriented Implementation of CRM Designs

Implements a wide range of model-based dose escalation designs, ranging from classical and modern continual reassessment methods (CRMs) based on dose-limiting toxicity endpoints to dual-endpoint designs taking into account a biomarker/efficacy outcome. The focus is on Bayesian inference, making it very easy to set up a new design with its own JAGS code. However, it is also possible to implement 3+3 designs for comparison or models with non-Bayesian estimation. The whole package is written in a modular form in the S4 class system, making it very flexible for adaptation to new models, escalation or stopping rules. Further details are presented in Sabanes Bove et al. (2019) <doi:10.18637/jss.v089.i10>.

Maintained by Daniel Sabanes Bove. Last updated 2 months ago.

jags cpp

4.2 match 21 stars 7.79 score 208 scripts

alarm-redist

redist:Simulation Methods for Legislative Redistricting

Enables researchers to sample redistricting plans from a pre-specified target distribution using Sequential Monte Carlo and Markov Chain Monte Carlo algorithms. The package allows for the implementation of various constraints in the redistricting process such as geographic compactness and population parity requirements. Tools for analysis such as computation of various summary statistics and plotting functionality are also included. The package implements the SMC algorithm of McCartan and Imai (2023) <doi:10.1214/23-AOAS1763>, the enumeration algorithm of Fifield, Imai, Kawahara, and Kenny (2020) <doi:10.1080/2330443X.2020.1791773>, the Flip MCMC algorithm of Fifield, Higgins, Imai and Tarr (2020) <doi:10.1080/10618600.2020.1739532>, the Merge-split/Recombination algorithms of Carter et al. (2019) <arXiv:1911.01503> and DeFord et al. (2021) <doi:10.1162/99608f92.eb30390f>, and the Short-burst optimization algorithm of Cannon et al. (2020) <arXiv:2011.02288>.

Maintained by Christopher T. Kenny. Last updated 2 months ago.

geospatial gerrymandering redistricting sampling openblas cpp openmp

3.5 match 68 stars 9.17 score 259 scripts

r-forge

setRNG:Set (Normal) Random Number Generator and Seed

Provides utilities to help set and record the setting of the seed and the uniform and normal generators used when a random experiment is run. The utilities can be used in other functions that do random experiments to simplify recording and/or setting all the necessary information for reproducibility. See the vignette and reference manual for examples.

Maintained by Paul Gilbert. Last updated 2 months ago.

5.3 match 5.92 score 17 scripts 5 dependents
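
The idea behind the setRNG entry above, recording the seed and generator setup so a random experiment can be replayed, can be sketched in a few lines. This is a minimal Python illustration of the concept only, not the setRNG API; the function name `run_experiment` and the shape of the recorded dictionary are assumptions made for this example.

```python
import random

def run_experiment(seed=None, state=None):
    """Draw three normal variates and return them with the recorded RNG setup.

    Hypothetical helper for illustration: pass `seed` for a fresh run,
    or `state` (from an earlier run's record) to replay that run.
    """
    rng = random.Random(seed)
    if state is not None:
        # Restore the exact generator state captured by an earlier run.
        rng.setstate(state)
    # Record everything needed for reproducibility *before* drawing.
    recorded = {"seed": seed, "state": rng.getstate()}
    draws = [rng.gauss(0.0, 1.0) for _ in range(3)]
    return draws, recorded

first, setup = run_experiment(seed=123)
replay, _ = run_experiment(state=setup["state"])
```

Storing the full generator state alongside the seed means a replay does not depend on how the seed was originally turned into a state.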

bioc

Rsubread:Mapping, quantification and variant analysis of sequencing data

Alignment, quantification and analysis of RNA sequencing data (including both bulk RNA-seq and scRNA-seq) and DNA sequencing data (including ATAC-seq, ChIP-seq, WGS, WES, etc.). Includes functionality for read mapping, read counting, SNP calling, structural variant detection and gene fusion discovery. Can be applied to all major sequencing technologies and to both short and long sequence reads.

Maintained by Wei Shi. Last updated 2 days ago.

sequencing alignment sequencematching rnaseq chipseq singlecell geneexpression generegulation genetics immunooncology snp geneticvariability preprocessing qualitycontrol genomeannotation genefusiondetection indeldetection variantannotation variantdetection multiplesequencealignment zlib

3.4 match 9.24 score 892 scripts 10 dependents

wjbraun

DAAG:Data Analysis and Graphics Data and Functions

Functions and data sets used in examples and exercises in the text Maindonald, J.H. and Braun, W.J. (2003, 2007, 2010) "Data Analysis and Graphics Using R", and in an upcoming Maindonald, Braun, and Andrews text that builds on this earlier text.

Maintained by W. John Braun. Last updated 11 months ago.

3.8 match 8.25 score 1.2k scripts 1 dependent

comeetie

greed:Clustering and Model Selection with the Integrated Classification Likelihood

An ensemble of algorithms that enable the clustering of networks and data matrices (such as counts, categorical or continuous) with different types of generative models. Model selection and clustering are performed in combination by optimizing the Integrated Classification Likelihood (which is equivalent to minimizing the description length). Several models are available, such as: Stochastic Block Model, degree corrected Stochastic Block Model, Mixtures of Multinomial, Latent Block Model. The optimization is performed thanks to a combination of greedy local search and a genetic algorithm (see <arXiv:2002.11577> for more details).

Maintained by Etienne Côme. Last updated 2 years ago.

openblas cpp openmp

5.2 match 14 stars 5.94 score 41 scripts

ealedo

orthGS:Orthology vs Paralogy Relationships among Glutamine Synthetase from Plants

Tools to analyze and infer orthology and paralogy relationships between glutamine synthetase proteins in seed plants.

Maintained by Elena Aledo. Last updated 4 months ago.

7.9 match 3.80 score 21 scripts

tomasfryda

h2o:R Interface for the 'H2O' Scalable Machine Learning Platform

R interface for 'H2O', the scalable open source machine learning platform that offers parallelized implementations of many supervised and unsupervised machine learning algorithms such as Generalized Linear Models (GLM), Gradient Boosting Machines (including XGBoost), Random Forests, Deep Neural Networks (Deep Learning), Stacked Ensembles, Naive Bayes, Generalized Additive Models (GAM), ANOVA GLM, Cox Proportional Hazards, K-Means, PCA, ModelSelection, Word2Vec, as well as a fully automatic machine learning algorithm (H2O AutoML).

Maintained by Tomas Fryda. Last updated 1 year ago.

3.6 match 3 stars 8.20 score 7.8k scripts 11 dependents

mlverse

torch:Tensors and Neural Networks with 'GPU' Acceleration

Provides functionality to define and train neural networks similar to 'PyTorch' by Paszke et al. (2019) <doi:10.48550/arXiv.1912.01703> but written entirely in R using the 'libtorch' library. Also supports low-level tensor operations and 'GPU' acceleration.

Maintained by Daniel Falbel. Last updated 9 days ago.

autograd deep-learning torch cpp

1.8 match 520 stars 16.52 score 1.4k scripts 38 dependents

koheiw

seededlda:Seeded Sequential LDA for Topic Modeling

Seeded Sequential LDA can classify sentences of texts into pre-defined topics with a small number of seed words (Watanabe & Baturo, 2023) <doi:10.1177/08944393231178605>. Implements Seeded LDA (Lu et al., 2010) <doi:10.1109/ICDMW.2011.125> and Sequential LDA (Du et al., 2012) <doi:10.1007/s10115-011-0425-1> with the distributed LDA algorithm (Newman et al., 2009) for parallel computing.

Maintained by Kohei Watanabe. Last updated 2 months ago.

semi-supervised-learning text-classification onetbb cpp

3.9 match 75 stars 7.38 score 177 scripts 1 dependent

r-forge

Sleuth3:Data Sets from Ramsey and Schafer's "Statistical Sleuth (3rd Ed)"

Data sets from Ramsey, F.L. and Schafer, D.W. (2013), "The Statistical Sleuth: A Course in Methods of Data Analysis (3rd ed)", Cengage Learning.

Maintained by Berwin A Turlach. Last updated 1 year ago.

4.5 match 6.38 score 522 scripts

hturner

BradleyTerry2:Bradley-Terry Models

Specify and fit the Bradley-Terry model, including structured versions in which the parameters are related to explanatory variables through a linear predictor and versions with contest-specific effects, such as a home advantage.

Maintained by Heather Turner. Last updated 6 years ago.

bradley-terry-models paired-comparisons statistical-models

3.6 match 20 stars 7.97 score 172 scripts 1 dependent

hanjunwei-lab

ssMutPA:Single-Sample Mutation-Based Pathway Analysis

A systematic bioinformatics tool to perform single-sample mutation-based pathway analysis by integrating somatic mutation data with the Protein-Protein Interaction (PPI) network. In this method, we use local and global weighted strategies to evaluate the effects of network genes from mutations according to the network topology and then calculate the mutation-based pathway enrichment score (ssMutPES) to reflect the accumulated effect of mutations of each pathway. Subsequently, the ssMutPES profiles are used for unsupervised spectral clustering to identify cancer subtypes.

Maintained by Junwei Han. Last updated 5 months ago.

7.0 match 4.00 score 9 scripts

bioc

PhyloProfile:PhyloProfile

PhyloProfile is a tool for exploring complex phylogenetic profiles. Phylogenetic profiles, presence/absence patterns of genes over a set of species, are commonly used to trace the functional and evolutionary history of genes across species and time. With PhyloProfile we can enrich regular phylogenetic profiles with further data like sequence/structure similarity, to make phylogenetic profiling more meaningful. Besides the interactive visualisation powered by R-Shiny, the package offers a set of further analysis features to gain insights like the gene age estimation or core gene identification.

Maintained by Vinh Tran. Last updated 9 days ago.

software visualization datarepresentation multiplecomparison functionalprediction dimensionreduction bioinformatics heatmap interactive-visualizations orthologs phylogenetic-profile shiny

3.5 match 33 stars 7.77 score 10 scripts

thothorn

HSAUR3:A Handbook of Statistical Analyses Using R (3rd Edition)

Functions, data sets, analyses and examples from the third edition of the book ''A Handbook of Statistical Analyses Using R'' (Torsten Hothorn and Brian S. Everitt, Chapman & Hall/CRC, 2014). The first chapter of the book, which is entitled ''An Introduction to R'', is completely included in this package; for all other chapters, a vignette containing all data analyses is available. In addition, Sweave source code for slides of selected chapters is included in this package (see HSAUR3/inst/slides). The publisher's web page is '<https://www.routledge.com/A-Handbook-of-Statistical-Analyses-using-R/Hothorn-Everitt/p/book/9781482204582>'.

Maintained by Torsten Hothorn. Last updated 7 months ago.

4.0 match 6 stars 6.72 score 120 scripts 2 dependents

maximilianaxer

quaxnat:Estimation of Natural Regeneration Potential

Functions for estimating the potential dispersal of tree species using regeneration densities and dispersal distances to nearest seed trees. A quantile regression is implemented to determine the dispersal potential. Spatial prediction can be used to identify natural regeneration potential for forest restoration as described in Axer et al. (2021) <doi:10.1016/j.foreco.2020.118802>.

Maintained by Maximilian Axer. Last updated 5 months ago.

7.3 match 1 star 3.54 score 2 scripts

r-forge

Sleuth2:Data Sets from Ramsey and Schafer's "Statistical Sleuth (2nd Ed)"

Data sets from Ramsey, F.L. and Schafer, D.W. (2002), "The Statistical Sleuth: A Course in Methods of Data Analysis (2nd ed)", Duxbury.

Maintained by Berwin A Turlach. Last updated 1 year ago.

4.5 match 5.70 score 191 scripts

momx

Momocs:Morphometrics using R

The goal of 'Momocs' is to provide a complete, convenient, reproducible and open-source toolkit for 2D morphometrics. It includes most common 2D morphometrics approaches on outlines, open outlines, configurations of landmarks, traditional morphometrics, and facilities for data preparation, manipulation and visualization with a consistent grammar throughout. It allows reproducible, complex morphometrics analyses, and other morphometrics approaches should be easy to plug in, or develop from, on top of this canvas.

Maintained by Vincent Bonhomme. Last updated 1 year ago.

morphometrics

3.4 match 51 stars 7.42 score 346 scripts

jeffreyracine

np:Nonparametric Kernel Smoothing Methods for Mixed Data Types

Nonparametric (and semiparametric) kernel methods that seamlessly handle a mix of continuous, unordered, and ordered factor data types. We would like to gratefully acknowledge support from the Natural Sciences and Engineering Research Council of Canada (NSERC, <https://www.nserc-crsng.gc.ca/>), the Social Sciences and Humanities Research Council of Canada (SSHRC, <https://www.sshrc-crsh.gc.ca/>), and the Shared Hierarchical Academic Research Computing Network (SHARCNET, <https://sharcnet.ca/>). We would also like to acknowledge the contributions of the GNU GSL authors. In particular, we adapt the GNU GSL B-spline routine gsl_bspline.c adding automated support for quantile knots (in addition to uniform knots), providing missing functionality for derivatives, and for extending the splines beyond their endpoints.

Maintained by Jeffrey S. Racine. Last updated 1 month ago.

2.0 match 49 stars 12.64 score 672 scripts 44 dependents

bertcarnell

lhs:Latin Hypercube Samples

Provides a number of methods for creating and augmenting Latin Hypercube Samples and Orthogonal Array Latin Hypercube Samples.

Maintained by Rob Carnell. Last updated 9 months ago.

latin-hypercube latin-hypercube-sample latin-hypercube-sampling lhs orthogonal-arrays cpp

1.8 match 44 stars 13.95 score 1.5k scripts 108 dependents
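
For readers unfamiliar with the technique named in the lhs entry above, a basic Latin hypercube sample can be sketched as follows. This is a plain-Python illustration of the general method, not the lhs package's implementation; the function name `latin_hypercube` and its parameters are assumptions made for this example.

```python
import random

def latin_hypercube(n, k, seed=None):
    """Return n points in [0, 1)^k forming a basic Latin hypercube sample.

    Each dimension is cut into n equal-width strata, and every stratum
    contains exactly one of the n points.
    """
    rng = random.Random(seed)
    columns = []
    for _ in range(k):
        # One uniform draw inside each of the n strata of this dimension.
        column = [(i + rng.random()) / n for i in range(n)]
        # Shuffle so strata are paired at random across dimensions.
        rng.shuffle(column)
        columns.append(column)
    # Transpose the k columns into n points.
    return [tuple(column[i] for column in columns) for i in range(n)]

sample = latin_hypercube(5, 2, seed=42)
```

Compared with plain uniform sampling, the stratification guarantees that each one-dimensional margin is covered evenly, which is why the design is popular for computer experiments.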

stan-dev

rstan:R Interface to Stan

User-facing R functions are provided to parse, compile, test, estimate, and analyze Stan models by accessing the header-only Stan library provided by the 'StanHeaders' package. The Stan project develops a probabilistic programming language that implements full Bayesian statistical inference via Markov Chain Monte Carlo, rough Bayesian inference via 'variational' approximation, and (optionally penalized) maximum likelihood estimation via optimization. In all three cases, automatic differentiation is used to quickly and accurately evaluate gradients without burdening the user with the need to derive the partial derivatives.

Maintained by Ben Goodrich. Last updated 5 hours ago.

bayesian-data-analysis bayesian-inference bayesian-statistics mcmc stan cpp

1.3 match 1.1k stars 18.76 score 14k scripts 280 dependents

marie-perrotdockes

MultiVarSel:Variable Selection in a Multivariate Linear Model

Performs variable selection in a multivariate linear model by estimating the covariance matrix of the residuals, using it to remove the dependence that may exist among the responses, and finally performing variable selection with the Lasso criterion. The method is described in the paper Perrot-Dockès et al. (2017) <arXiv:1704.00076>.

Maintained by Marie Perrot-Dockès. Last updated 6 years ago.

12.5 match 2.00 score 8 scripts

bioc

rfaRm:An R interface to the Rfam database

rfaRm provides a client interface to the Rfam database of RNA families. Data that can be retrieved include RNA families, secondary structure images, covariance models, sequences within each family, alignments leading to the identification of a family and secondary structures in the dot-bracket format.

Maintained by Lara Selles Vidal. Last updated 5 months ago.

functionalgenomics dataimport thirdpartyclient visualization multiplesequencealignment

7.5 match 3.30 score 1 script

weecology

LDATS:Latent Dirichlet Allocation Coupled with Time Series Analyses

Combines Latent Dirichlet Allocation (LDA) and Bayesian multinomial time series methods in a two-stage analysis to quantify dynamics in high-dimensional temporal data. LDA decomposes multivariate data into lower-dimension latent groupings, whose relative proportions are modeled using generalized Bayesian time series models that include abrupt changepoints and smooth dynamics. The methods are described in Blei et al. (2003) <doi:10.1162/jmlr.2003.3.4-5.993>, Western and Kleykamp (2004) <doi:10.1093/pan/mph023>, Venables and Ripley (2002, ISBN-13:978-0387954578), and Christensen et al. (2018) <doi:10.1002/ecy.2373>.

Maintained by Juniper L. Simonis. Last updated 5 years ago.

changepoint lda parallel-tempering portal softmax

3.5 match 25 stars 6.93 score 45 scripts

thothorn

HSAUR:A Handbook of Statistical Analyses Using R (1st Edition)

Functions, data sets, analyses and examples from the book ''A Handbook of Statistical Analyses Using R'' (Brian S. Everitt and Torsten Hothorn, Chapman & Hall/CRC, 2006). The first chapter of the book, which is entitled ''An Introduction to R'', is completely included in this package; for all other chapters, a vignette containing all data analyses is available.

Maintained by Torsten Hothorn. Last updated 3 years ago.

4.0 match 6.07 score 253 scripts 5 dependents

projectmosaic

mosaic:Project MOSAIC Statistics and Mathematics Teaching Utilities

Data sets and utilities from Project MOSAIC (<http://www.mosaic-web.org>) used to teach mathematics, statistics, computation and modeling. Funded by the NSF, Project MOSAIC is a community of educators working to tie together aspects of quantitative work that students in science, technology, engineering and mathematics will need in their professional lives, but which are usually taught in isolation, if at all.

Maintained by Randall Pruim. Last updated 1 year ago.

1.8 match 93 stars 13.32 score 7.2k scripts 7 dependents

skranz

RTutor:Interactive R problem sets with automatic testing of solutions and automatic hints

Interactive R problem sets with automatic testing of solutions and automatic hints

Maintained by Sebastian Kranz. Last updated 1 year ago.

economics learn-to-code problem-set rstudio rtutor shiny teaching

4.0 match 205 stars 5.83 score 111 scripts 1 dependent

peterkdunn

GLMsData:Generalized Linear Model Data Sets

Data sets from the book Generalized Linear Models with Examples in R by Dunn and Smyth.

Maintained by Peter K. Dunn. Last updated 3 years ago.

9.0 match 2.61 score 220 scripts

remkoduursma

lgrdata:Example Datasets for a Learning Guide to R

A largish collection of example datasets, including several classics. Many of these datasets are well suited for regression, classification, and visualization.

Maintained by Remko Duursma. Last updated 6 years ago.

7.5 match 3.11 score 26 scripts

celevitz

touRnamentofchampions:Tournament of Champions Data

Several datasets which describe the challenges and results of competitions in Tournament of Champions. This data is useful for practicing data wrangling, graphing, and analyzing how each season of Tournament of Champions played out.

Maintained by Levitz Carly. Last updated 10 days ago.

6.0 match 3.70 score

hankstevens

primer:Functions and Data for the Book, a Primer of Ecology with R

Functions are primarily for systems of ordinary differential equations, difference equations, and eigenanalysis and projection of demographic matrices; data are for examples.

Maintained by Hank Stevens. Last updated 4 years ago.

3.8 match 14 stars 5.92 score 118 scripts

thothorn

HSAUR2:A Handbook of Statistical Analyses Using R (2nd Edition)

Functions, data sets, analyses and examples from the second edition of the book ''A Handbook of Statistical Analyses Using R'' (Brian S. Everitt and Torsten Hothorn, Chapman & Hall/CRC, 2008). The first chapter of the book, which is entitled ''An Introduction to R'', is completely included in this package; for all other chapters, a vignette containing all data analyses is available. In addition, the package contains Sweave code for producing slides for selected chapters (see HSAUR2/inst/slides).

Maintained by Torsten Hothorn. Last updated 2 years ago.

4.0 match 5.51 score 181 scripts 1 dependent

bioc

BiocParallel:Bioconductor facilities for parallel evaluation

This package provides modified versions and novel implementations of functions for parallel evaluation, tailored to use with Bioconductor objects.

Maintained by Martin Morgan. Last updated 29 days ago.

infrastructure bioconductor-package core-package u24ca289073 cpp

1.3 match 67 stars 17.40 score 7.3k scripts 1.1k dependents

martynplummer

rjags:Bayesian Graphical Models using MCMC

Interface to the JAGS MCMC library.

Maintained by Martyn Plummer. Last updated 7 months ago.

jags cpp

2.3 match 7 stars 9.60 score 4.0k scripts 165 dependents

gabrielshimizu

AgroR:Experimental Statistics and Graphics for Agricultural Sciences

Performs the analysis of completely randomized experimental designs (CRD), randomized blocks (RBD) and Latin square (LSD), experiments in double and triple factorial scheme (in CRD and RBD), experiments in subdivided plot scheme (in CRD and RBD), subdivided and joint analysis of experiments in CRD and RBD, linear regression analysis, test for two samples. The package performs analysis of variance, ANOVA assumptions and multiple comparison test of means or regression, according to Pimentel-Gomes (2009, ISBN: 978-85-7133-055-9), nonparametric test (Conover, 1999, ISBN: 0471160687), test for two samples, joint analysis of experiments according to Ferreira (2018, ISBN: 978-85-7269-566-4) and generalized linear model (glm) for binomial and Poisson family in CRD and RBD (Carvalho, FJ (2019), <doi:10.14393/ufu.te.2019.1244>). It can also be used to obtain descriptive measures and graphics, in addition to correlations and creative graphics used in agricultural sciences (Agronomy, Zootechnics, Food Science and related areas).

Maintained by Gabriel Danilo Shimizu. Last updated 11 months ago.

6.9 match 1 star 3.11 score 173 scripts

bioc

GeDi:Defining and visualizing the distances between different genesets

The package provides different distance measurements to calculate the difference between genesets. Based on these scores, the genesets are clustered and visualized as a graph. This is all presented in an interactive Shiny application for easy usage.

Maintained by Annekathrin Nedwed. Last updated 5 months ago.

gui genesetenrichment software transcription rnaseq visualization clustering pathways reportwriting go kegg reactome shinyapps

3.9 match 1 star 5.52 score 22 scripts

stan-dev

loo:Efficient Leave-One-Out Cross-Validation and WAIC for Bayesian Models

Efficient approximate leave-one-out cross-validation (LOO) for Bayesian models fit using Markov chain Monte Carlo, as described in Vehtari, Gelman, and Gabry (2017) <doi:10.1007/s11222-016-9696-4>. The approximation uses Pareto smoothed importance sampling (PSIS), a new procedure for regularizing importance weights. As a byproduct of the calculations, we also obtain approximate standard errors for estimated predictive errors and for the comparison of predictive errors between models. The package also provides methods for using stacking and other model weighting techniques to average Bayesian predictive distributions.

Maintained by Jonah Gabry. Last updated 6 days ago.

bayes bayesian bayesian-data-analysis bayesian-inference bayesian-methods bayesian-statistics cross-validation information-criterion model-comparison stan

1.2 match 152 stars 17.30 score 2.6k scripts 297 dependents

mlr-org

mlr3misc:Helper Functions for 'mlr3'

Frequently used helper functions and assertions used in 'mlr3' and its companion packages. Comes with helper functions for functional programming, for printing, to work with 'data.table', as well as some generally useful 'R6' classes. This package also supersedes the package 'BBmisc'.

Maintained by Marc Becker. Last updated 4 months ago.

machine-learning miscellaneous mlr3

2.0 match 12 stars 10.28 score 302 scripts 42 dependents

murrayefford

secr:Spatially Explicit Capture-Recapture

Functions to estimate the density and size of a spatially distributed animal population sampled with an array of passive detectors, such as traps, or by searching polygons or transects. Models incorporating distance-dependent detection are fitted by maximizing the likelihood. Tools are included for data manipulation and model selection.

Maintained by Murray Efford. Last updated 2 days ago.

cpp

2.0 match 3 stars 10.13 score 410 scripts 5 dependents

hdvinod

generalCorr:Generalized Correlations, Causal Paths and Portfolio Selection

Function gmcmtx0() computes a more reliable (general) correlation matrix. Since causal paths from data are important for all sciences, the package provides many sophisticated functions. causeSummBlk() and causeSum2Blk() give easy-to-interpret causal paths. Let Z denote control variables and compare two flipped kernel regressions: X=f(Y, Z)+e1 and Y=g(X, Z)+e2. Our criterion Cr1 says that if |e1*Y|>|e2*X| then variation in X is more "exogenous or independent" than in Y, and the causal path is X to Y. Criterion Cr2 requires |e2|<|e1|. These inequalities between many absolute values are quantified by four orders of stochastic dominance. Our third criterion Cr3, for the causal path X to Y, requires new generalized partial correlations to satisfy |r*(x|y,z)|< |r*(y|x,z)|. The function parcorVec() reports generalized partials between the first variable and all others. The package provides several R functions including get0outliers() for outlier detection, bigfp() for numerical integration by the trapezoidal rule, stochdom2() for stochastic dominance, pillar3D() for 3D charts, canonRho() for generalized canonical correlations, depMeas() for measuring nonlinear dependence, and causeSummary(mtx) for reporting a summary of causal paths among matrix columns. Portfolio selection: decileVote(), momentVote(), dif4mtx(), exactSdMtx() can rank several stocks. Functions whose names begin with 'boot' provide bootstrap statistical inference, including a new bootGcRsq() test for "Granger-causality" allowing nonlinear relations. A new tool for evaluation of out-of-sample portfolio performance is outOFsamp(). Panel data implementation is now included. See eight vignettes of the package for theory, examples, and usage tips. See Vinod (2019) <doi:10.1080/03610918.2015.1122048>.

Maintained by H. D. Vinod. Last updated 1 year ago.

4.5 match 2 stars 4.48 score 63 scripts 1 dependent

f-rousset

spaMM:Mixed-Effect Models, with or without Spatial Random Effects

Inference based on models with or without spatially-correlated random effects, multivariate responses, or non-Gaussian random effects (e.g., Beta). Variation in residual variance (heteroscedasticity) can itself be represented by a mixed-effect model. Both classical geostatistical models (Rousset and Ferdy 2014 <doi:10.1111/ecog.00566>), and Markov random field models on irregular grids (as considered in the 'INLA' package, <https://www.r-inla.org>), can be fitted, with distinct computational procedures exploiting the sparse matrix representations for the latter case and other autoregressive models. Laplace approximations are used for likelihood or restricted likelihood. Penalized quasi-likelihood and other variants discussed in the h-likelihood literature (Lee and Nelder 2001 <doi:10.1093/biomet/88.4.987>) are also implemented.

Maintained by François Rousset. Last updated 9 months ago.

gsl cpp openmp

4.0 match 4.94 score 208 scripts 5 dependents

laresbernardo

lares:Analytics & Machine Learning Sidekick

Auxiliary package for better/faster analytics, visualization, data mining, and machine learning tasks. With a wide variety of function families, like Machine Learning, Data Wrangling, Marketing Mix Modeling (Robyn), Exploratory, API, and Scraper, it helps the analyst or data scientist to get quick and robust results, without the need for repetitive coding or advanced R programming skills.

Maintained by Bernardo Lares. Last updated 27 days ago.

analytics api automation automl data-science descriptive-statistics h2o machine-learning marketing mmm predictive-modeling puzzle rlanguage robyn visualization

2.0 match 233 stars 9.84 score 185 scripts 1 dependent

koheiw

wordmap:Feature Extraction and Document Classification with Noisy Labels

Extract features and classify documents with noisy labels given by document metadata or keyword matching (Watanabe & Zhou, 2020) <doi:10.1177/0894439320907027>.

Maintained by Kohei Watanabe. Last updated 2 months ago.

4.0 match 2 stars 4.86 score 1 script

ropensci

drake:A Pipeline Toolkit for Reproducible Computation at Scale

A general-purpose computational engine for data analysis, drake rebuilds intermediate data objects when their dependencies change, and it skips work when the results are already up to date. Not every execution starts from scratch; there is native support for parallel and distributed computing, and completed projects have tangible evidence that they are reproducible. Extensive documentation, from beginner-friendly tutorials to practical examples and more, is available at the reference website <https://docs.ropensci.org/drake/> and the online manual <https://books.ropensci.org/drake/>.

Maintained by William Michael Landau. Last updated 4 months ago.

data-science drake high-performance-computing makefile peer-reviewed pipeline reproducibility reproducible-research ropensci workflow

1.7 match 1.3k stars 11.49 score 1.7k scripts 1 dependent

mandymejia

BayesfMRI:Spatial Bayesian Methods for Task Functional MRI Studies

Performs a spatial Bayesian general linear model (GLM) for task functional magnetic resonance imaging (fMRI) data on the cortical surface. Additional models include group analysis and inference to detect thresholded areas of activation. Includes direct support for the 'CIFTI' neuroimaging file format. For more information see A. F. Mejia, Y. R. Yue, D. Bolin, F. Lindgren, M. A. Lindquist (2020) <doi:10.1080/01621459.2019.1611582> and D. Spencer, Y. R. Yue, D. Bolin, S. Ryan, A. F. Mejia (2022) <doi:10.1016/j.neuroimage.2022.118908>.

Maintained by Amanda Mejia. Last updated 11 days ago.

cpp

3.3 match 26 stars 5.77 score 19 scripts

emf-creaf

medfateland:Mediterranean Landscape Simulation

Simulate forest hydrology, forest function and dynamics over landscapes [De Caceres et al. (2015) <doi:10.1016/j.agrformet.2015.06.012>]. Parallelization is allowed in several simulation functions and simulations may be conducted including spatial processes such as lateral water transfer and seed dispersal.

Maintained by Miquel De Cáceres. Last updated 28 days ago.

cpp

3.5 match 5 stars 5.41 score 41 scripts

braverock

PortfolioAnalytics:Portfolio Analysis, Including Numerical Methods for Optimization of Portfolios

Portfolio optimization and analysis routines and graphics.

Maintained by Brian G. Peterson. Last updated 4 months ago.

1.6 match 81 stars 11.49 score 626 scripts 2 dependents

jhmaindonald

hwde:Models and Tests for Departure from Hardy-Weinberg Equilibrium and Independence Between Loci

Fits models for genotypic disequilibria, as described in Huttley and Wilson (2000) <doi:10.1093/genetics/156.4.2127>, Weir (1996) and Weir and Wilson (1986). Contrast terms are available that account for first order interactions between loci. Also implements, for a single locus in a single population, a conditional exact test for Hardy-Weinberg equilibrium.

Maintained by John Maindonald. Last updated 2 years ago.

4.9 match 3.74 score 11 scripts

agronomiar

seedreg:Regression Analysis for Seed Germination as a Function of Temperature

Regression analysis using common models in seed temperature studies, such as the Gaussian model (Martins, JF, Barroso, AAM, & Alves, PLCA (2017) <doi:10.1590/s0100-83582017350100039>), quadratic (Nunes, AL, Sossmeier, S, Gotz, AP, & Bispo, NB (2018) <doi:10.17265/2161-6264/2018.06.002>) and others with potential for use, such as those implemented in the 'drc' package (Ritz, C, Baty, F, Streibig, JC, & Gerhard, D (2015) <doi:10.1371/journal.pone.0146021>), in the estimation of the ideal and cardinal temperature for the occurrence of plant seed germination. The functions return graphs with the equations automatically.

Maintained by Gabriel Danilo Shimizu. Last updated 3 years ago.

8.9 match 2.04 score 22 scripts

mikemeredith

wiqid:Quick and Dirty Estimates for Wildlife Populations

Provides simple, fast functions for maximum likelihood and Bayesian estimates of wildlife population parameters, suitable for use with simulated data or bootstraps. Early versions were indeed quick and dirty, but optional error-checking routines and meaningful error messages have been added. Includes single and multi-season occupancy, closed capture population estimation, survival, species richness and distance measures.

Maintained by Ngumbang Juat. Last updated 2 years ago.

3.8 match 2 stars 4.84 score 115 scripts 1 dependents

predictiveecology

reproducible:Enhance Reproducibility of R Code

A collection of high-level, machine- and OS-independent tools for making reproducible and reusable content in R. The two workhorse functions are Cache() and prepInputs(). Cache() allows for nested caching, is robust to environments and objects with environments (like functions), and deals with some classes of file-backed R objects e.g., from terra and raster packages. Both functions have been developed to be foundational components of data retrieval and processing in continuous workflow situations. In both functions, efforts are made to make the first and subsequent calls of functions have the same result, but faster at subsequent times by way of checksums and digesting. Several features are still under development, including cloud storage of cached objects allowing for sharing between users. Several advanced options are available, see ?reproducibleOptions().

Maintained by Eliot J B McIntire. Last updated 1 months ago.

reproducibility reproducible-research

1.7 match 41 stars 10.52 score 122 scripts 15 dependents
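
The checksum-and-digest idea behind the 'reproducible' entry above can be sketched in a few lines. This is a minimal Python illustration of caching keyed by a digest of the call, not the package's Cache() implementation; the names `cached_call` and `_CACHE` are hypothetical, and the sketch assumes the arguments can be pickled.

```python
import hashlib
import pickle

# Hypothetical in-memory store; a real cache would usually persist to disk.
_CACHE = {}


def cached_call(func, *args, **kwargs):
    """Return func(*args, **kwargs), reusing a stored result for identical calls."""
    # Digest the function name together with its arguments to form the cache key.
    payload = pickle.dumps((func.__qualname__, args, sorted(kwargs.items())))
    key = hashlib.sha256(payload).hexdigest()
    if key not in _CACHE:
        # First call with these inputs: compute and store.
        _CACHE[key] = func(*args, **kwargs)
    return _CACHE[key]
```

The first call with a given set of inputs runs the function; later calls with the same inputs return the stored value without re-running it.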

ghtaranto

scapesClassification:User-Defined Classification of Raster Surfaces

Series of algorithms to translate users' mental models of seascapes, landscapes and, more generally, of geographic features into computer representations (classifications). Spaces and geographic objects are classified with user-defined rules taking into account spatial data as well as spatial relationships among different classes and objects.

Maintained by Gerald H. Taranto. Last updated 3 years ago.

classification-algorithm object-detection raster spatial

4.3 match 1 stars 4.22 score 33 scripts

nflverse

nflfastR:Functions to Efficiently Access NFL Play by Play Data

A set of functions to access National Football League play-by-play data from <https://www.nfl.com/>.

Maintained by Ben Baldwin. Last updated 2 months ago.

american-football football-data nfl nflstats nflverse sports-analytics

1.7 match 442 stars 10.40 score 596 scripts 3 dependents

dpmcsuss

iGraphMatch:Tools for Graph Matching

Versatile tools and data for graph matching analysis with various forms of prior information that supports working with 'igraph' objects, matrix objects, or lists of either.

Maintained by Daniel Sussman. Last updated 10 months ago.

graph-algorithms graph-matching cpp

3.1 match 9 stars 5.65 score 9 scripts

florianhartig

DHARMa:Residual Diagnostics for Hierarchical (Multi-Level / Mixed) Regression Models

The 'DHARMa' package uses a simulation-based approach to create readily interpretable scaled (quantile) residuals for fitted (generalized) linear mixed models. Currently supported are linear and generalized linear (mixed) models from 'lme4' (classes 'lmerMod', 'glmerMod'), 'glmmTMB', 'GLMMadaptive', and 'spaMM'; phylogenetic linear models from 'phylolm' (classes 'phylolm' and 'phyloglm'); generalized additive models ('gam' from 'mgcv'); 'glm' (including 'negbin' from 'MASS', but excluding quasi-distributions) and 'lm' model classes. Moreover, externally created simulations, e.g. posterior predictive simulations from Bayesian software such as 'JAGS', 'STAN', or 'BUGS' can be processed as well. The resulting residuals are standardized to values between 0 and 1 and can be interpreted as intuitively as residuals from a linear regression. The package also provides a number of plot and test functions for typical model misspecification problems, such as over/underdispersion, zero-inflation, and residual spatial, phylogenetic and temporal autocorrelation.

Maintained by Florian Hartig. Last updated 15 days ago.

glmm regression regression-diagnostics residual

1.2 match 226 stars 14.74 score 2.8k scripts 10 dependents
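
As a rough illustration of the scaled (quantile) residuals mentioned in the DHARMa entry above, the sketch below places an observation within values simulated from a fitted model and breaks ties at random. It is a generic randomized-quantile calculation in Python, not DHARMa's code; the function name and arguments are hypothetical.

```python
import random


def scaled_residual(observed, simulated, rng):
    """Position of `observed` within `simulated`, scaled to [0, 1].

    Ties are spread uniformly at random so that discrete responses
    still give approximately uniform residuals under a correct model.
    """
    n = len(simulated)
    below = sum(s < observed for s in simulated)
    ties = sum(s == observed for s in simulated)
    return (below + rng.random() * ties) / n
```

A value near 0 or 1 means the observation lies in the tail of what the model simulates; under a well-specified model the residuals are roughly uniform.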

bioc

GloScope:Population-level Representation on scRNA-Seq data

This package aims at representing and summarizing the entire single-cell profile of a sample. It allows researchers to perform important bioinformatic analyses at the sample level, such as visualization and quality control. The main functions estimate sample distributions, calculate statistical divergence among samples, and visualize the distance matrix through MDS plots.

Maintained by William Torous. Last updated 5 months ago.

datarepresentation qualitycontrol rnaseq sequencing software singlecell

2.8 match 3 stars 6.05 score 84 scripts

mclements

microsimulation:Discrete Event Simulation in R and C++, with Tools for Cost-Effectiveness Analysis

Discrete event simulation using both R and C++ (Karlsson et al 2016; <doi:10.1109/eScience.2016.7870915>). The C++ code is adapted from the SSIM library <https://www.inf.usi.ch/carzaniga/ssim/>, allowing for event-oriented simulation. The code includes a SummaryReport class for reporting events and costs by age and other covariates. The C++ code is available as a static library for linking to other packages. A priority queue implementation is given in C++ together with an S3 closure and a reference class implementation. Finally, some tools are provided for cost-effectiveness analysis.

Maintained by Mark Clements. Last updated 7 months ago.

cpp discrete-event-simulation health-economics openblas cpp

3.9 match 37 stars 4.35 score 20 scripts

mechantrouquin

landsepi:Landscape Epidemiology and Evolution

A stochastic, spatially-explicit, demo-genetic model simulating the spread and evolution of a plant pathogen in a heterogeneous landscape to assess resistance deployment strategies. It is based on a spatial geometry for describing the landscape and allocation of different cultivars, a dispersal kernel for the dissemination of the pathogen, and a SEIR ('Susceptible-Exposed-Infectious-Removed') structure with a discrete time step. It provides a useful tool to assess the performance of a wide range of deployment options with respect to their epidemiological, evolutionary and economic outcomes. Loup Rimbaud, Julien Papaïx, Jean-François Rey, Luke G Barrett, Peter H Thrall (2018) <doi:10.1371/journal.pcbi.1006067>.

Maintained by Jean-François Rey. Last updated 6 months ago.

gsl cpp

4.6 match 3.58 score 18 scripts

arturstat

TPmsm:Estimation of Transition Probabilities in Multistate Models

Estimation of transition probabilities for the illness-death model and/or the three-state progressive model.

Maintained by Artur Araujo. Last updated 1 years ago.

illness-death-model kaplan-meier monte-carlo-simulation multi-state-models openmp-parallelization survival-analysis transition-probabilities openblas openmp

3.7 match 1 stars 4.52 score 22 scripts 1 dependents

cran

crosstalkr:Analysis of Graph-Structured Data with a Focus on Protein-Protein Interaction Networks

Provides a general toolkit for drug target identification. We include functionality to reduce large graphs to subgraphs and prioritize nodes. In addition to being optimized for use with generic graphs, we also provide support for analyzing protein-protein interaction networks from online repositories. For more details on the core method, refer to Weaver et al. (2021) <https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1008755>.

Maintained by Davis Weaver. Last updated 10 months ago.

cpp

6.1 match 2.70 score

cran

seedCCA:Seeded Canonical Correlation Analysis

Functions for dimension reduction through the seeded canonical correlation analysis are provided. Classical canonical correlation analysis (CCA) is one of the useful statistical methods in multivariate data analysis, but it is limited in use due to the matrix inversion for large p small n data. To overcome this, a seeded CCA has been proposed in Im, Gang and Yoo (2015) <doi:10.1002/cem.2691>. The seeded CCA is a two-step procedure. The sets of variables are initially reduced by successively projecting cov(X,Y) or cov(Y,X) onto cov(X) and cov(Y), respectively, without loss of information on canonical correlation analysis, following Cook, Li and Chiaromonte (2007) <doi:10.1093/biomet/asm038> and Lee and Yoo (2014) <doi:10.1111/anzs.12057>. Then, the canonical correlation is finalized with the initially-reduced two sets of variables.

Maintained by Jae Keun Yoo. Last updated 3 years ago.

16.3 match 1.00 score

jpmonteagudo28

despair:Motivational Quotes and Shakespearean Bard–bits for Personal Projects

Generate motivational quotes and Shakespearean word combinations (bard–bits) that a user can consider for their personal projects. Each of the package functions takes two arguments: cat, which defaults to any, and a numeric or character seed to ensure reproducible results.

Maintained by JP Monteagudo. Last updated 3 months ago.

4.2 match 3 stars 3.78 score 5 scripts

laerciojuniosilva

SeedCalc:Seed Germination and Seedling Growth Indexes

Functions to calculate seed germination and seedling emergence and growth indexes. The main indexes for germination and seedling emergence, considering the time for seeds to germinate, are: T10, T50 and T90, in Farooq et al. (2005) <doi:10.1111/j.1744-7909.2005.00031.x>; and MGT, in Labouriau (1983). Considering the germination speed, they are: Germination Speed Index, in Maguire (1962), and Mean Germination Rate, in Labouriau (1983). Considering the homogeneity of germination, they are: Coefficient of Variation of the Germination Time, in Carvalho et al. (2005) <doi:10.1590/S0100-84042005000300018>, and Variance of Germination, in Labouriau (1983); Uncertainty, in Labouriau and Valadares (1976) <ISSN:0001-3765>; and Synchrony, in Primack (1980). The main seedling indexes are Growth, in Sako (2001), Uniformity, in Sako (2001) and Castan et al. (2018) <doi:10.1590/1678-992x-2016-0401>; and Vigour, in Medeiros and Pereira (2018) <doi:10.1590/1983-40632018v4852340>.

Maintained by Laercio Junio da Silva. Last updated 6 years ago.

9.4 match 1.68 score 24 scripts

bioc

DelayedMatrixStats:Functions that Apply to Rows and Columns of 'DelayedMatrix' Objects

A port of the 'matrixStats' API for use with DelayedMatrix objects from the 'DelayedArray' package. High-performing functions operating on rows and columns of DelayedMatrix objects, e.g. col / rowMedians(), col / rowRanks(), and col / rowSds(). Functions are optimized per data type and for subsetted calculations such that both memory usage and processing time are minimized.

Maintained by Peter Hickey. Last updated 2 months ago.

infrastructure datarepresentation software

1.3 match 16 stars 11.86 score 211 scripts 112 dependents

arsilva87

biotools:Tools for Biometry and Applied Statistics in Agricultural Science

Tools designed to perform and evaluate cluster analysis (including Tocher's algorithm), discriminant analysis and path analysis (standard and under collinearity), as well as some useful miscellaneous tools for dealing with sample size and optimum plot size calculations. A test for seed sample heterogeneity is now available. Mantel's permutation test can be found in this package. A new approach for calculating its power is implemented. biotools also contains tests for genetic covariance components. Heuristic approaches for performing non-parametric spatial predictions of generic response variables and spatial gene diversity are implemented.

Maintained by Anderson Rodrigo da Silva. Last updated 3 years ago.

cluster-analysis multivariate-analysis statistics tocher

2.2 match 2 stars 7.11 score 161 scripts 1 dependents

hardikghosh911

HTSeed:Fitting of Hydrotime Model for Seed Germination Time Course

The seed germination process starts with water uptake by the seed and ends with the protrusion of radicle and plumule under varying temperatures and soil water potential. Hydrotime is a way to describe the relationship between water potential and seed germination rates at germination percentages. One important quantity to consider before applying hydrotime modeling of germination percentages is the proportion of viable seeds that could germinate under saturated conditions. This package can be used to apply correction factors at various water potentials before estimating parameters like stress tolerance and uniformity of the hydrotime model. Three different distributions, namely Gaussian, Logistic, and Extreme value, have been considered to fit the model to the seed germination time course. Details can be found in Bradford (2002) <https://www.jstor.org/stable/4046371>, and Bradford and Still (2004) <https://www.jstor.org/stable/23433495>.

Maintained by Dr. Himadri Ghosh. Last updated 12 months ago.

15.7 match 1.00 score

bioc

maSigPro:Significant Gene Expression Profile Differences in Time Course Gene Expression Data

maSigPro is a regression based approach to find genes for which there are significant gene expression profile differences between experimental groups in time course microarray and RNA-Seq experiments.

Maintained by Maria Jose Nueda. Last updated 5 months ago.

microarray rna-seq differential expression timecourse

3.0 match 5.18 score 76 scripts

temp20250212

MultiTraits:Analyzing and Visualizing Multidimensional Plant Traits

Implements analytical methods for multidimensional plant traits, including Competitors-Stress tolerators-Ruderals strategy analysis using leaf traits, Leaf-Height-Seed strategy analysis, Niche Periodicity Table analysis, and Trait Network analysis. Provides functions for data analysis, visualization, and network metrics calculation. Methods are based on Grime (1974) <doi:10.1038/250026a0>, Pierce et al. (2017) <doi:10.1111/1365-2435.12882>, Westoby (1998) <doi:10.1023/A:1004327224729>, Yang et al. (2022) <doi:10.1016/j.foreco.2022.120540>, Winemiller et al. (2015) <doi:10.1111/ele.12462>, He et al. (2020) <doi:10.1016/j.tree.2020.06.003>.

Maintained by Anonymous Author. Last updated 26 days ago.

3.9 match 3.90 score 16 scripts

mlopez-ibanez

irace:Iterated Racing for Automatic Algorithm Configuration

Iterated race is an extension of the Iterated F-race method for the automatic configuration of optimization algorithms, that is, (offline) tuning their parameters by finding the most appropriate settings given a set of instances of an optimization problem. M. López-Ibáñez, J. Dubois-Lacoste, L. Pérez Cáceres, T. Stützle, and M. Birattari (2016) <doi:10.1016/j.orp.2016.09.002>.

Maintained by Manuel López-Ibáñez. Last updated 1 months ago.

algorithm-configuration hyperparameter-tuning irace optimization-algorithms

1.6 match 63 stars 9.22 score 103 scripts 1 dependents

dynverse

dynwrap:Representing and Inferring Single-Cell Trajectories

Provides functionality to infer trajectories from single-cell data, represent them into a common format, and adapt them. Other biological information can also be added, such as cellular grouping, RNA velocity and annotation. Saelens et al. (2019) <doi:10.1038/s41587-019-0071-9>.

Maintained by Robrecht Cannoodt. Last updated 2 years ago.

trajectory-inference

2.0 match 16 stars 7.48 score 159 scripts 1 dependents

chabert-liddell

robber:Using Block Model to Estimate the Robustness of Ecological Network

Implementation of a variety of methods to compute the robustness of ecological interaction networks with binary interactions as described in <doi:10.1002/env.2709>. In particular, using the Stochastic Block Model and its bipartite counterpart, the Latent Block Model to put a parametric model on the network, allows the comparison of the robustness of networks differing in species richness and number of interactions. It also deals with networks that are partially sampled and/or with missing values.

Maintained by Saint-Clair Chabert-Liddell. Last updated 1 years ago.

ecological-network robber robustness

4.0 match 1 stars 3.70 score 4 scripts

nsj3

rioja:Analysis of Quaternary Science Data

Constrained clustering, transfer functions, and other methods for analysing Quaternary science data.

Maintained by Steve Juggins. Last updated 6 months ago.

cpp

2.0 match 10 stars 7.21 score 191 scripts 3 dependents

r-forge

Polychrome:Qualitative Palettes with Many Colors

Tools for creating, viewing, and assessing qualitative palettes with many (20-30 or more) colors. See Coombes and colleagues (2019) <doi:10.18637/jss.v090.c01>.

Maintained by Kevin R. Coombes. Last updated 1 months ago.

1.5 match 9.56 score 1.0k scripts 27 dependents

ropensci

beastier:Call 'BEAST2'

'BEAST2' (<https://www.beast2.org>) is a widely used Bayesian phylogenetic tool, that uses DNA/RNA/protein data and many model priors to create a posterior of jointly estimated phylogenies and parameters. 'BEAST2' is a command-line tool. This package provides a way to call 'BEAST2' from an 'R' function call.

Maintained by Richèl J.C. Bilderbeek. Last updated 25 days ago.

bayesian beast beast2 phylogenetic-inference phylogenetics openjdk

1.8 match 11 stars 7.87 score 47 scripts 4 dependents

shaunpwilkinson

kmer:Fast K-Mer Counting and Clustering for Biological Sequence Analysis

Contains tools for rapidly computing distance matrices and clustering large sequence datasets using fast alignment-free k-mer counting and recursive k-means partitioning. See Vinga and Almeida (2003) <doi:10.1093/bioinformatics/btg005> for a review of k-mer counting methods and applications for biological sequence analysis.

Maintained by Shaun Wilkinson. Last updated 6 years ago.

cpp

1.7 match 27 stars 8.24 score 71 scripts 6 dependents
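
The alignment-free k-mer counting described in the kmer entry above can be illustrated with a short Python sketch. The distance used here (Euclidean distance between k-mer frequency vectors) is one simple choice made for illustration and is an assumption, not necessarily the metric the package computes; the function names are hypothetical.

```python
import math
from collections import Counter


def kmer_counts(seq, k):
    """Count every overlapping substring of length k in seq."""
    if len(seq) < k:
        raise ValueError("sequence is shorter than k")
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def kmer_distance(a, b, k):
    """Euclidean distance between the k-mer frequency profiles of a and b."""
    ca, cb = kmer_counts(a, k), kmer_counts(b, k)
    na, nb = sum(ca.values()), sum(cb.values())
    keys = set(ca) | set(cb)
    # Counter returns 0 for k-mers missing from one of the sequences.
    return math.sqrt(sum((ca[key] / na - cb[key] / nb) ** 2 for key in keys))
```

Because no alignment is computed, the cost grows with sequence length rather than with the product of the two lengths, which is what makes this family of methods attractive for large datasets.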

bioc

alabaster.matrix:Load and Save Artifacts from File

Save matrices, arrays and similar objects into file artifacts, and load them back into memory. This is a more portable alternative to serialization of such objects into RDS files. Each artifact is associated with metadata for further interpretation; downstream applications can enrich this metadata with context-specific properties.

Maintained by Aaron Lun. Last updated 14 days ago.

dataimport datarepresentation cpp

2.0 match 7.05 score 15 scripts 8 dependents

charlie86

spotifyr:R Wrapper for the 'Spotify' Web API

An R wrapper for pulling data from the 'Spotify' Web API <https://developer.spotify.com/documentation/web-api/> in bulk, or post items on a 'Spotify' user's playlist.

Maintained by Daniel Antal. Last updated 5 months ago.

music-information-retrieval spotify

1.7 match 374 stars 8.54 score 936 scripts

bioc

netZooR:Unified methods for the inference and analysis of gene regulatory networks

netZooR unifies the implementations of several Network Zoo methods (netzoo, netzoo.github.io) into a single package by creating interfaces between network inference and network analysis methods. Currently, the package has 3 methods for network inference including PANDA and its optimized implementation OTTER (network reconstruction using multiple lines of biological evidence), LIONESS (single-sample network inference), and EGRET (genotype-specific networks). Network analysis methods include CONDOR (community detection), ALPACA (differential community detection), CRANE (significance estimation of differential modules), and MONSTER (estimation of network transition states). In addition, YARN allows processing of gene expression data for tissue-specific analyses and SAMBAR infers missing mutation data based on pathway information.

Maintained by Tara Eicher. Last updated 1 days ago.

networkinference network generegulation geneexpression transcription microarray graphandnetwork gene-regulatory-network transcription-factors

1.8 match 105 stars 7.98 score

gertvv

hitandrun:"Hit and Run" and "Shake and Bake" for Sampling Uniformly from Convex Shapes

The "Hit and Run" Markov Chain Monte Carlo method for sampling uniformly from convex shapes defined by linear constraints, and the "Shake and Bake" method for sampling from the boundary of such shapes. Includes specialized functions for sampling normalized weights with arbitrary linear constraints. Tervonen, T., van Valkenhoef, G., Basturk, N., and Postmus, D. (2012) <doi:10.1016/j.ejor.2012.08.026>. van Valkenhoef, G., Tervonen, T., and Postmus, D. (2014) <doi:10.1016/j.ejor.2014.06.036>.

Maintained by Gert van Valkenhoef. Last updated 3 years ago.

openblas

2.0 match 16 stars 6.92 score 121 scripts 9 dependents
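
For readers unfamiliar with the "Hit and Run" method named in the hitandrun entry above, the following Python sketch shows the basic step for a bounded region defined by linear constraints A x <= b. It is a textbook-style illustration under that assumption, not the package's implementation; the function name and signature are hypothetical.

```python
import math
import random


def hit_and_run(A, b, x0, n_samples, rng):
    """Sample points from the bounded polytope {x : A x <= b}, starting at interior point x0."""
    x = list(x0)
    samples = []
    for _ in range(n_samples):
        # Draw a direction uniformly on the unit sphere.
        d = [rng.gauss(0.0, 1.0) for _ in x]
        norm = math.sqrt(sum(v * v for v in d))
        d = [v / norm for v in d]
        # Find the segment x + t*d that stays inside every constraint.
        t_lo, t_hi = -math.inf, math.inf
        for row, bi in zip(A, b):
            ad = sum(a * v for a, v in zip(row, d))
            slack = bi - sum(a * v for a, v in zip(row, x))
            if ad > 1e-12:
                t_hi = min(t_hi, slack / ad)
            elif ad < -1e-12:
                t_lo = max(t_lo, slack / ad)
        # Move to a uniformly chosen point on that segment.
        t = rng.uniform(t_lo, t_hi)
        x = [xi + t * di for xi, di in zip(x, d)]
        samples.append(x)
    return samples
```

For example, the unit square can be written with A = [[1, 0], [-1, 0], [0, 1], [0, -1]] and b = [1, 0, 1, 0]; successive samples are correlated, so in practice some thinning of the chain is common.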

blawson-bates

simEd:Simulation Education

Contains various functions to be used for simulation education, including simple Monte Carlo simulation functions, queueing simulation functions, variate generation functions capable of producing independent streams and antithetic variates, functions for illustrating random variate generation for various discrete and continuous distributions, and functions to compute time-persistent statistics. Also contains functions for visualizing: event-driven details of a single-server queue model; a Lehmer random number generator; variate generation via acceptance-rejection; and generation of a non-homogeneous Poisson process via thinning. Also contains two queueing data sets (one fabricated, one real-world) to facilitate input modeling. More details on the use of these functions can be found in Lawson and Leemis (2015) <doi:10.1109/WSC.2017.8248124>, in Kudlay, Lawson, and Leemis (2020) <doi:10.1109/WSC48552.2020.9384010>, and in Lawson and Leemis (2021) <doi:10.1109/WSC52266.2021.9715299>.

Maintained by Barry Lawson. Last updated 1 years ago.

4.1 match 3.35 score 45 scripts
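
The Lehmer random number generator mentioned in the simEd entry above is a multiplicative congruential recurrence, x_next = (a * x) mod m. The Python sketch below uses the classic Park-Miller "minimal standard" constants (a = 16807, m = 2^31 - 1) as an assumption made for illustration; the constants used by simEd itself may differ.

```python
MODULUS = 2**31 - 1   # a Mersenne prime
MULTIPLIER = 16807    # Park-Miller constant; an assumption, not taken from simEd


def lehmer_stream(seed, n):
    """Return the next n integer states of the generator started at seed."""
    x = seed
    states = []
    for _ in range(n):
        x = (MULTIPLIER * x) % MODULUS
        states.append(x)
    return states


def uniforms(seed, n):
    """Map the integer states to floats strictly between 0 and 1."""
    return [x / MODULUS for x in lehmer_stream(seed, n)]
```

The same seed always reproduces the same stream, which is why a fixed seed makes a simulation repeatable.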

unuran

Runuran:R Interface to the 'UNU.RAN' Random Variate Generators

Interface to the 'UNU.RAN' library for Universal Non-Uniform RANdom variate generators. It thus allows building non-uniform random number generators from quite arbitrary distributions. In particular, it provides an algorithm for fast numerical inversion for distributions with a given density function. In addition, the package contains densities, distribution functions and quantiles from a couple of distributions.

Maintained by Josef Leydold. Last updated 6 months ago.

2.0 match 6.87 score 180 scripts 8 dependents

aidenloe

mazeGen:Elithorn Maze Generator

A maze generator that creates the Elithorn Maze (HTML file) and the functions to calculate the associated maze parameters (i.e. Difficulty and Ability).

Maintained by Bao Sheng Loe (Aiden). Last updated 6 years ago.

3.8 match 3.59 score 77 scripts

s3alfisc

fwildclusterboot:Fast Wild Cluster Bootstrap Inference for Linear Models

Implementation of fast algorithms for wild cluster bootstrap inference developed in 'Roodman et al' (2019, 'STATA' Journal, <doi:10.1177/1536867X19830877>) and 'MacKinnon et al' (2022), which makes it feasible to quickly calculate bootstrap test statistics based on a large number of bootstrap draws even for large samples. Multiple bootstrap types as described in 'MacKinnon, Nielsen & Webb' (2022) are supported. Further, 'multiway' clustering, regression weights, bootstrap weights, fixed effects and 'subcluster' bootstrapping are supported. Further, both restricted ('WCR') and unrestricted ('WCU') bootstrap are supported. Methods are provided for a variety of fitted models, including 'lm()', 'feols()' (from package 'fixest') and 'felm()' (from package 'lfe'). Additionally implements a 'heteroskedasticity-robust' ('HC1') wild bootstrap. Last, the package provides an R binding to 'WildBootTests.jl', which provides additional speed gains and functionality, including the 'WRE' bootstrap for instrumental variable models (based on models of type 'ivreg()' from package 'ivreg') and hypotheses with q > 1.

Maintained by Alexander Fischer. Last updated 2 years ago.

clustered-standard-errors linear-regression-models wild-bootstrap wild-cluster-bootstrap openblas cpp openmp

2.0 match 24 stars 6.67 score 109 scripts 2 dependents

bayesiandemography

bage:Bayesian Estimation and Forecasting of Age-Specific Rates

Fast Bayesian estimation and forecasting of age-specific rates, probabilities, and means, based on 'Template Model Builder'.

Maintained by John Bryant. Last updated 2 days ago.

cpp

1.8 match 3 stars 7.41 score 39 scripts

jacobkap

caesar:Encrypts and Decrypts Strings

Encrypts and decrypts strings using either the Caesar cipher or a pseudorandom number generation (using set.seed()) method.

Maintained by Jacob Kaplan. Last updated 5 years ago.

2.2 match 1 stars 6.00 score 26 scripts
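
For reference, the Caesar cipher named in the caesar entry above shifts each letter a fixed number of places along the alphabet. The Python sketch below is a generic version written for illustration, not the package's code; the function name `caesar_shift` is hypothetical.

```python
import string


def caesar_shift(text, shift):
    """Shift ASCII letters by `shift` places, preserving case; leave other characters alone."""
    out = []
    for ch in text:
        if ch in string.ascii_letters:
            base = ord("a") if ch.islower() else ord("A")
            # Wrap around the 26-letter alphabet.
            out.append(chr((ord(ch) - base + shift) % 26 + base))
        else:
            out.append(ch)
    return "".join(out)
```

Decryption is the same operation with the negated shift, for example caesar_shift(caesar_shift("abc", 3), -3) gives back "abc".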

glsnow

TeachingDemos:Demonstrations for Teaching and Learning

Demonstration functions that can be used in a classroom to demonstrate statistical concepts, or on your own to better understand the concepts or the programming.

Maintained by Greg Snow. Last updated 1 years ago.

1.8 match 7.18 score 760 scripts 13 dependents

svmiller

stevemisc:Steve's Miscellaneous Functions

These are miscellaneous functions that I find useful for my research and teaching. The contents include themes for plots, functions for simulating quantities of interest from regression models, functions for simulating various forms of fake data for instructional/research purposes, and many more. All told, the functions provided here are broadly useful for data organization, data presentation, data recoding, and data simulation.

Maintained by Steve Miller. Last updated 8 days ago.

dplyr mixed-effects-models multivariate-normal-distribution tidyverse

1.9 match 10 stars 6.85 score 392 scripts 2 dependents

ludvigolsen

cvms:Cross-Validation for Model Selection

Cross-validate one or multiple regression and classification models and get relevant evaluation metrics in a tidy format. Validate the best model on a test set and compare it to a baseline evaluation. Alternatively, evaluate predictions from an external model. Currently supports regression and classification (binary and multiclass). Described in chp. 5 of Jeyaraman, B. P., Olsen, L. R., & Wambugu M. (2019, ISBN: 9781838550134).

Maintained by Ludvig Renbo Olsen. Last updated 13 days ago.

1.2 match 39 stars 10.31 score 492 scripts 5 dependents

bioc

microRNA:Data and functions for dealing with microRNAs

Different data resources for microRNAs and some functions for manipulating them.

Maintained by "Michael Lawrence". Last updated 1 months ago.

infrastructure genomeannotation sequencematching cpp

3.5 match 3.48 score 7 scripts

bioc

MCbiclust:Massive correlating biclusters for gene expression data and associated methods

Custom-made algorithm and associated methods for finding, visualising and analysing biclusters in large gene expression data sets. The algorithm is based on finding, for a supplied gene set of size n, the maximum-strength correlation matrix containing m samples from the data set.

Maintained by Robert Bentham. Last updated 5 months ago.

immunooncology clustering microarray statisticalmethod software rnaseq geneexpression

3.0 match 4.00 score 2 scripts

lhvanegasp

glmtoolbox:Set of Tools to Data Analysis using Generalized Linear Models

Set of tools for the statistical analysis of data using: (1) normal linear models; (2) generalized linear models; (3) negative binomial regression models as alternative to the Poisson regression models under the presence of overdispersion; (4) beta-binomial and random-clumped binomial regression models as alternative to the binomial regression models under the presence of overdispersion; (5) Zero-inflated and zero-altered regression models to deal with zero-excess in count data; (6) generalized nonlinear models; (7) generalized estimating equations for cluster correlated data.

Maintained by Luis Hernando Vanegas. Last updated 8 months ago.

4.0 match 1 stars 3.00 score 149 scripts

john-d-fox

RcmdrMisc:R Commander Miscellaneous Functions

Various statistical, graphics, and data-management functions used by the Rcmdr package in the R Commander GUI for R.

Maintained by John Fox. Last updated 1 years ago.

1.7 match 1 stars 7.00 score 432 scripts 42 dependents

mingdeyu

dgpsi:Interface to 'dgpsi' for Deep and Linked Gaussian Process Emulations

Interface to the 'python' package 'dgpsi' for Gaussian process, deep Gaussian process, and linked deep Gaussian process emulations of computer models and networks using stochastic imputation (SI). The implementations follow Ming & Guillas (2021) <doi:10.1137/20M1323771> and Ming, Williamson, & Guillas (2023) <doi:10.1080/00401706.2022.2124311> and Ming & Williamson (2023) <doi:10.48550/arXiv.2306.01212>. To get started with the package, see <https://mingdeyu.github.io/dgpsi-R/>.

Maintained by Deyu Ming. Last updated 1 months ago.

deep-gaussian-processes emulation gaussian-processes surrogate-models

2.0 match 5.99 score 76 scripts

bioc

DelayedRandomArray:Delayed Arrays of Random Values

Implements a DelayedArray of random values where the realization of the sampled values is delayed until they are needed. Reproducible sampling within any subarray is achieved by chunking, where each chunk is initialized with a different random seed and stream. The usual distributions in the stats package are supported, along with scalars, vectors and arrays for the parameters.

Maintained by Aaron Lun. Last updated 3 months ago.

datarepresentation cpp

2.3 match 5.26 score 6 scripts 1 dependents
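
The chunking idea in the DelayedRandomArray entry above (one seed per chunk, so any subarray can be regenerated on demand) can be sketched for a one-dimensional case. This Python example is a simplified illustration under assumed names and a made-up chunk length, not the package's algorithm.

```python
import random

CHUNK = 4  # hypothetical chunk length, chosen small for illustration


def chunk_values(seed, chunk_index):
    """Regenerate all values of one chunk from its own derived seed."""
    rng = random.Random(f"{seed}:{chunk_index}")
    return [rng.random() for _ in range(CHUNK)]


def sample_range(seed, start, stop):
    """Return elements start..stop-1 of the virtual array without building the rest."""
    out = []
    for idx in range(start // CHUNK, (stop - 1) // CHUNK + 1):
        values = chunk_values(seed, idx)
        lo = max(start, idx * CHUNK) - idx * CHUNK
        hi = min(stop, (idx + 1) * CHUNK) - idx * CHUNK
        out.extend(values[lo:hi])
    return out
```

Only the chunks that overlap the requested range are generated, yet the values agree with what a full realization of the array would contain.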

tmieno2

r.spatial.workshop.datasets:Collection of spatial datasets

This package provides spatial datasets in various formats. They are used for demonstrating spatial operations and map creation using R spatial packages (e.g., sf, terra, tmap).

Maintained by Taro Mieno. Last updated 6 months ago.

4.0 match 2.96 score 23 scripts

ropensci

phylotaR:Automated Phylogenetic Sequence Cluster Identification from 'GenBank'

A pipeline for the identification, within taxonomic groups, of orthologous sequence clusters from 'GenBank' <https://www.ncbi.nlm.nih.gov/genbank/> as the first step in a phylogenetic analysis. The pipeline depends on a local alignment search tool and is, therefore, not dependent on differences in gene naming conventions and naming errors.

Maintained by Shixiang Wang. Last updated 8 months ago.

blastn genbank peer-reviewed phylogenetics sequence-alignment

2.0 match 23 stars 5.86 score 156 scripts

harrysouthworth

texmex:Statistical Modelling of Extreme Values

Statistical extreme value modelling of threshold excesses, maxima and multivariate extremes. Univariate models for threshold excesses and maxima are the Generalised Pareto, and Generalised Extreme Value model respectively. These models may be fitted by using maximum (optionally penalised-)likelihood, or Bayesian estimation, and both classes of models may be fitted with covariates in any/all model parameters. Model diagnostics support the fitting process. Graphical output for visualising fitted models and return level estimates is provided. For serially dependent sequences, the intervals declustering algorithm of Ferro and Segers (2003) <doi:10.1111/1467-9868.00401> is provided, with diagnostic support to aid selection of threshold and declustering horizon. Multivariate modelling is performed via the conditional approach of Heffernan and Tawn (2004) <doi:10.1111/j.1467-9868.2004.02050.x>, with graphical tools for threshold selection and to diagnose estimation convergence.

Maintained by Harry Southworth. Last updated 1 years ago.

cpp

1.8 match 7 stars 6.44 score 66 scripts 1 dependents

gabybudel

DBHC:Sequence Clustering with Discrete-Output HMMs

Provides an implementation of a mixture of hidden Markov models (HMMs) for discrete sequence data in the Discrete Bayesian HMM Clustering (DBHC) algorithm. The DBHC algorithm is an HMM Clustering algorithm that finds a mixture of discrete-output HMMs while using heuristics based on Bayesian Information Criterion (BIC) to search for the optimal number of HMM states and the optimal number of clusters.

Maintained by Gabriel Budel. Last updated 2 years ago.

4.3 match 1 star 2.70 score 3 scripts

svmiller

codename:Generation of Code Names for Organizations, People, Projects, and Whatever Else

This creates code names that a user can consider for their organizations, their projects, themselves, people in their organizations or projects, or whatever else. The user can also supply a numeric seed (and even a character seed) for maximum reproducibility. Use is simple and the code names produced come in various types too, contingent on what the user may be desiring as a code name or nickname.

Maintained by Steve Miller. Last updated 2 years ago.

codename reproducibility wu-tang

2.5 match 17 stars 4.62 score 49 scripts

cristianetaniguti

onemap:Construction of Genetic Maps in Experimental Crosses

Analysis of molecular marker data from model (backcrosses, F2 and recombinant inbred lines) and non-model systems (i.e. outcrossing species). For the latter, it allows statistical analysis by simultaneously estimating linkage and linkage phases (genetic map construction) according to Wu et al. (2002) <doi:10.1006/tpbi.2002.1577>. All analyses are based on multipoint approaches using hidden Markov models.

Maintained by Cristiane Taniguti. Last updated 2 months ago.

cpp

1.7 match 3 stars 6.58 score 183 scripts

hanase

rlecuyer:R Interface to RNG with Multiple Streams

Provides an interface to the C implementation of the random number generator with multiple independent streams developed by L'Ecuyer et al. (2002). The main purpose of this package is to enable the use of this random number generator in parallel R applications.

Maintained by Hana Sevcikova. Last updated 2 years ago.

2.0 match 2 stars 5.64 score 143 scripts 6 dependents

miraisolutions

rTRNG:Advanced and Parallel Random Number Generation via 'TRNG'

Embeds sources and headers from Tina's Random Number Generator ('TRNG') C++ library. Exposes some functionality for easier access, testing and benchmarking into R. Provides examples of how to use parallel RNG with 'RcppParallel'. The methods and techniques behind 'TRNG' are illustrated in the package vignettes and examples. Full documentation is available in Bauke (2021) <https://github.com/rabauke/trng4/blob/v4.23.1/doc/trng.pdf>.

Maintained by Riccardo Porreca. Last updated 1 year ago.

hpc parallel rcpp trng cpp

2.0 match 19 stars 5.63 score 15 scripts

sdanzige

ADAPTS:Automated Deconvolution Augmentation of Profiles for Tissue Specific Cells

Tools to construct (or add to) cell-type signature matrices using flow sorted or single cell samples and deconvolve bulk gene expression data. Useful for assessing the quality of single cell RNAseq experiments, estimating the accuracy of signature matrices, and determining cell-type spillover. Please cite: Danziger SA et al. (2019) ADAPTS: Automated Deconvolution Augmentation of Profiles for Tissue Specific cells <doi:10.1371/journal.pone.0224693>.

Maintained by Samuel A Danziger. Last updated 3 years ago.

1.7 match 2 stars 6.56 score 40 scripts 1 dependents

bioc

subSeq:Subsampling of high-throughput sequencing count data

Subsampling of high throughput sequencing count data for use in experiment design and analysis.

Maintained by Andrew J. Bass. Last updated 5 months ago.

immunooncology sequencing transcription rnaseq geneexpression differentialexpression

1.8 match 19 stars 6.36 score 20 scripts

ropensci

nlrx:Setup, Run and Analyze 'NetLogo' Model Simulations from 'R' via 'XML'

Set up, run and analyze 'NetLogo' (<https://ccl.northwestern.edu/netlogo/>) model simulations in 'R'. 'nlrx' experiments use a structure similar to that of 'NetLogo' Behavior Space experiments. However, 'nlrx' offers more flexibility and additional tools for running and analyzing complex simulation designs and sensitivity analyses. The user defines all information that is needed in an intuitive framework, using class objects. Experiments are submitted from 'R' to 'NetLogo' via 'XML' files that are dynamically written, based on specifications defined by the user. By nesting model calls in future environments, large simulation designs with many runs can be executed in parallel. This also enables simulating 'NetLogo' experiments on remote high performance computing machines. In order to use this package, 'Java' and 'NetLogo' (>= 5.3.1) need to be available on the executing system.

Maintained by Sebastian Hanss. Last updated 7 months ago.

agent-based-modeling individual-based-modelling netlogo peer-reviewed

1.3 match 78 stars 8.86 score 195 scripts

ludvigolsen

xpectr:Generates Expectations for 'testthat' Unit Testing

Helps systematize and ease the process of building unit tests with the 'testthat' package by providing tools for generating expectations.

Maintained by Ludvig Renbo Olsen. Last updated 14 days ago.

1.8 match 37 stars 6.06 score 62 scripts

guyabel

migest:Methods for the Indirect Estimation of Bilateral Migration

Tools for estimating, measuring and working with migration data.

Maintained by Guy J. Abel. Last updated 1 month ago.

demography migration population

1.9 match 32 stars 5.80 score 86 scripts

bioc

scanMiR:scanMiR

A set of tools for working with miRNA affinity models (KdModels), efficiently scanning for miRNA binding sites, and predicting target repression. It supports scanning using miRNA seeds, full miRNA sequences (enabling 3' alignment) and KdModels, and includes the prediction of slicing and TDMD sites. Finally, it includes utility and plotting functions (e.g. for the visual representation of miRNA-target alignment).

Maintained by Pierre-Luc Germain. Last updated 5 months ago.

mirna sequencematching alignment

1.8 match 5.89 score 52 scripts 1 dependents

prakashvs613

RankAggSIgFUR:Polynomially Bounded Rank Aggregation under Kemeny's Axiomatic Approach

Polynomially bounded algorithms to aggregate complete rankings under Kemeny's axiomatic framework. 'RankAggSIgFUR' (pronounced as rank-agg-cipher) contains two heuristic algorithms: FUR and SIgFUR. For details, please see Badal and Das (2018) <doi:10.1016/j.cor.2018.06.007>.

Maintained by Rakhi Singh. Last updated 2 years ago.

4.0 match 2.70 score 3 scripts

john-d-fox

norm:Analysis of Multivariate Normal Datasets with Missing Values

An integrated set of functions for the analysis of multivariate normal datasets with missing values, including implementation of the EM algorithm, data augmentation, and multiple imputation.

Maintained by John Fox. Last updated 2 years ago.

fortran

1.8 match 5.99 score 106 scripts 33 dependents

brodieg

unitizer:Interactive R Unit Tests

Simplifies regression tests by comparing objects produced by test code with earlier versions of those same objects. If objects are unchanged the tests pass, otherwise execution stops with error details. If in interactive mode, tests can be reviewed through the provided interactive environment.

Maintained by Brodie Gaslam. Last updated 10 months ago.

unit-testing

1.5 match 39 stars 7.16 score 84 scripts

bioc

scMultiSim:Simulation of Multi-Modality Single Cell Data Guided By Gene Regulatory Networks and Cell-Cell Interactions

scMultiSim simulates paired single cell RNA-seq, single cell ATAC-seq and RNA velocity data, while incorporating mechanisms of gene regulatory networks, chromatin accessibility and cell-cell interactions. It allows users to tune various parameters controlling the amount of each biological factor, variation of gene-expression levels, the influence of chromatin accessibility on RNA sequence data, and so on. It can be used to benchmark various computational methods for single cell multi-omics data, and to assist in experimental design of wet-lab experiments.

Maintained by Hechen Li. Last updated 5 months ago.

singlecell transcriptomics geneexpression sequencing experimentaldesign

1.5 match 23 stars 7.08 score 11 scripts

nflverse

nflseedR:Functions to Efficiently Simulate and Evaluate NFL Seasons

A set of functions to simulate National Football League seasons including the sophisticated tie-breaking procedures.

Maintained by Sebastian Carl. Last updated 8 days ago.

football-simulation nfl season-simulations

1.7 match 23 stars 6.32 score 34 scripts 1 dependents

crowding

iterors:Fast, Compact Iterators and Tools

A fresh take on iterators in R. Designed to be cross-compatible with the 'iterators' package, but using the 'nextOr' method will offer better performance as well as more compact code. With batteries included: includes a collection of iterator constructors and combinators ported and refined from the 'iterators', 'itertools', and 'itertools2' packages.

Maintained by Peter Meilstrup. Last updated 2 years ago.

1.8 match 4 stars 6.02 score 21 scripts

skranz

sktools:Helpful functions used in my courses

Several helpful functions that I use in my courses

Maintained by Sebastian Kranz. Last updated 4 years ago.

4.9 match 1 star 2.15 score 28 scripts

canmod

macpan2:Fast and Flexible Compartmental Modelling

Fast and flexible compartmental modelling with Template Model Builder.

Maintained by Steve Walker. Last updated 2 hours ago.

compartmental-models epidemiology forecasting mixed-effects model-fitting optimization simulation simulation-modeling cpp

1.2 match 4 stars 8.89 score 246 scripts 1 dependents

kaneplusplus

basket:Basket Trial Analysis

Implementation of multisource exchangeability models for Bayesian analyses of prespecified subgroups arising in the context of basket trial design and monitoring. The R 'basket' package facilitates implementation of the binary, symmetric multi-source exchangeability model (MEM) with posterior inference arising from both exact computation and Markov chain Monte Carlo sampling. Analysis output includes full posterior samples as well as posterior probabilities, highest posterior density (HPD) interval boundaries, effective sample sizes (ESS), mean and median estimations, posterior exchangeability probability matrices, and maximum a posteriori MEMs. In addition to providing "basketwise" analyses, the package includes similar calculations for "clusterwise" analyses for which subgroups are combined into meta-baskets, or clusters, using graphical clustering algorithms that treat the posterior exchangeability probabilities as edge weights. In addition plotting tools are provided to visualize basket and cluster densities as well as their exchangeability. References include Hyman, D.M., Puzanov, I., Subbiah, V., Faris, J.E., Chau, I., Blay, J.Y., Wolf, J., Raje, N.S., Diamond, E.L., Hollebecque, A. and Gervais, R (2015) <doi:10.1056/NEJMoa1502309>; Hobbs, B.P. and Landin, R. (2018) <doi:10.1002/sim.7893>; Hobbs, B.P., Kane, M.J., Hong, D.S. and Landin, R. (2018) <doi:10.1093/annonc/mdy457>; and Kaizer, A.M., Koopmeiners, J.S. and Hobbs, B.P. (2017) <doi:10.1093/biostatistics/kxx031>.

Maintained by Michael J. Kane. Last updated 2 years ago.

1.9 match 4 stars 5.46 score 36 scripts

liuyu-star

ODRF:Oblique Decision Random Forest for Classification and Regression

The oblique decision tree (ODT) uses linear combinations of predictors as partitioning variables in a decision tree. Oblique Decision Random Forest (ODRF) is an ensemble of multiple ODTs generated by feature bagging. Oblique Decision Boosting Tree (ODBT) applies feature bagging during the training process of ODT-based boosting trees to ensemble multiple boosting trees. All three methods can be used for classification and regression, and ODT and ODRF serve as supplements to the classical CART of Breiman (1984) <DOI:10.1201/9781315139470> and Random Forest of Breiman (2001) <DOI:10.1023/A:1010933404324> respectively.

Maintained by Yu Liu. Last updated 5 months ago.

cpp

2.0 match 7 stars 5.10 score 18 scripts

leoegidi

pivmet:Pivotal Methods for Bayesian Relabelling and k-Means Clustering

Collection of pivotal algorithms for: relabelling the MCMC chains in order to undo the label switching problem in Bayesian mixture models; fitting sparse finite mixtures; initializing the centers of the classical k-means algorithm in order to obtain a better clustering solution. For further details see Egidi, Pappadà, Pauli and Torelli (2018b) <ISBN:9788891910233>.

Maintained by Leonardo Egidi. Last updated 9 months ago.

jags cpp

1.7 match 5 stars 5.94 score 25 scripts

cran

flip:Multivariate Permutation Tests

Implements many univariate and multivariate permutation (and rotation) tests. Allowed tests: one- and two-sample t tests, ANOVA, linear models, Chi Squared test, rank tests (i.e. Wilcoxon, Mann-Whitney, Kruskal-Wallis), Sign test and McNemar. Tests on linear models can also be performed in the presence of covariates (i.e. nuisance parameters). Both the permutation and the rotation methods are available to obtain the null distribution of the test statistics. It also implements methods for multiplicity control such as the Westfall & Young minP procedure, Closed Testing (Marcus, 1976) and k-FWER. Moreover, it allows testing for fixed effects in mixed effects models.

Maintained by Livio Finos. Last updated 7 years ago.

4.5 match 2.26 score 3 dependents

jimclarkatduke

mastif:Mast Inference and Forecasting

Analyzes production and dispersal of seeds dispersed from trees and recovered in seed traps. Motivated by long-term inventory plots where seed collections are used to infer seed production by each individual plant.

Maintained by James S. Clark. Last updated 12 months ago.

openblas cpp

5.1 match 2.00 score

bioc

DaMiRseq:Data Mining for RNA-seq data: normalization, feature selection and classification

The DaMiRseq package offers a tidy pipeline of data mining procedures to identify transcriptional biomarkers and exploit them for both binary and multi-class classification purposes. The package accepts any kind of data presented as a table of raw counts and allows including both continuous and factorial variables that occur with the experimental setting. A series of functions enable the user to clean up the data by filtering genomic features and samples, to adjust data by identifying and removing the unwanted source of variation (i.e. batches and confounding factors) and to select the best predictors for modeling. Finally, a "stacking" ensemble learning technique is applied to build a robust classification model. Every step includes a checkpoint that the user may exploit to assess the effects of data management by looking at diagnostic plots, such as clustering and heatmaps, RLE boxplots, MDS or correlation plot.

Maintained by Mattia Chiesa. Last updated 5 months ago.

sequencing rnaseq classification immunooncology openjdk

1.9 match 5.32 score 7 scripts 1 dependents

reedacartwright

rbedrock:Analysis and Manipulation of Data from Minecraft Bedrock Edition

Implements an interface to Minecraft (Bedrock Edition) worlds. Supports the analysis and management of these worlds and game saves.

Maintained by Reed Cartwright. Last updated 21 days ago.

zlib cpp

1.9 match 43 stars 5.24 score 3 scripts

djnavarro

jasmines:Generative Art

It doesn't do much, really.

Maintained by Danielle Navarro. Last updated 4 years ago.

2.0 match 112 stars 4.90 score 141 scripts

bioc

PICB:piRNA Cluster Builder

piRNAs (short for PIWI-interacting RNAs) and their PIWI protein partners play a key role in fertility and maintaining genome integrity by restricting mobile genetic elements (transposons) in germ cells. piRNAs originate from genomic regions known as piRNA clusters. The piRNA Cluster Builder (PICB) is a versatile toolkit designed to identify genomic regions with a high density of piRNAs. It constructs piRNA clusters through a stepwise integration of unique and multimapping piRNAs and offers wide-ranging parameter settings, supported by an optimization function that allows users to test different parameter combinations to tailor the analysis to their specific piRNA system. The output includes extensive metadata columns, enabling researchers to rank clusters and extract cluster characteristics.

Maintained by Franziska Ahrend. Last updated 1 month ago.

genetics genomeannotation sequencing functionalprediction coverage transcriptomics

1.8 match 5 stars 5.57 score

f-rousset

genepop:Population Genetic Data Analysis Using Genepop

Makes the Genepop software available in R. This software implements a mixture of traditional population genetic methods and some more focused developments: it computes exact tests for Hardy-Weinberg equilibrium, for population differentiation and for genotypic disequilibrium among pairs of loci; it computes estimates of F-statistics, null allele frequencies, allele size-based statistics for microsatellites, etc.; and it performs analyses of isolation by distance from pairwise comparisons of individuals or population samples.

Maintained by François Rousset. Last updated 2 years ago.

cpp

3.5 match 1 star 2.78 score 54 scripts

project-gen3sis

gen3sis:General Engine for Eco-Evolutionary Simulations

Contains an engine for spatially-explicit eco-evolutionary mechanistic models with a modular implementation and several support functions. It allows exploring the consequences of ecological and macroevolutionary processes across realistic or theoretical spatio-temporal landscapes on biodiversity patterns as a general term. Reference: Oskar Hagen, Benjamin Flueck, Fabian Fopp, Juliano S. Cabral, Florian Hartig, Mikael Pontarp, Thiago F. Rangel, Loic Pellissier (2021) "gen3sis: A general engine for eco-evolutionary simulations of the processes that shape Earth's biodiversity" <doi:10.1371/journal.pbio.3001340>.

Maintained by Oskar Hagen. Last updated 1 year ago.

biodiversity ecology evolution mechanistic model modeling simulation cpp

1.3 match 29 stars 7.56 score 69 scripts

uniprjrc

fsdaR:Robust Data Analysis Through Monitoring and Dynamic Visualization

Provides an interface to the 'MATLAB' toolbox 'Flexible Statistical Data Analysis (FSDA)', a comprehensive and computationally efficient software package for robust statistics in regression, multivariate and categorical data analysis. The current R version implements tools for regression (forward search, S- and MM-estimation, least trimmed squares (LTS) and least median of squares (LMS)), for multivariate analysis (forward search, S- and MM-estimation), for cluster analysis and cluster-wise regression. The distinctive feature of our package is the possibility of monitoring the statistics of interest as a function of breakdown point, efficiency or subset size, depending on the estimator. This is accompanied by a rich set of graphical features, such as dynamic brushing and linking, which are particularly useful for exploratory data analysis.

Maintained by Valentin Todorov. Last updated 1 year ago.

openjdk

1.8 match 5 stars 5.37 score 93 scripts

dkyleward

ipfr:List Balancing for Reweighting and Population Synthesis

Performs iterative proportional updating given a seed table and an arbitrary number of marginal distributions. This is commonly used in population synthesis, survey raking, matrix rebalancing, and other applications. For example, a household survey may be weighted to match the known distribution of households by size from the census. An origin/destination trip matrix might be balanced to match traffic counts. The approach used by this package is based on a paper from Arizona State University (Ye, Xin, et al. (2009) <http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.537.723&rep=rep1&type=pdf>). Some enhancements have been made to their work including primary and secondary target balance/importance, general marginal agreement, and weight restriction.

Maintained by Kyle Ward. Last updated 5 years ago.

1.8 match 5 stars 5.06 score 23 scripts
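
The idea of balancing a seed table against marginal targets, as described in the ipfr entry above, can be illustrated with a minimal sketch. This is classic two-dimensional iterative proportional fitting written in plain Python, not the list-balancing algorithm or the API of the 'ipfr' package; the function name `ipf`, its parameters, and the example numbers are all hypothetical.

```python
def ipf(seed, row_targets, col_targets, iterations=100, tol=1e-9):
    """Hypothetical helper: scale a 2-D seed table until its margins match the targets."""
    table = [row[:] for row in seed]  # work on a copy, leave the seed untouched
    for _ in range(iterations):
        # Row step: rescale each row so that it sums to its target.
        for i, target in enumerate(row_targets):
            total = sum(table[i])
            if total > 0:
                factor = target / total
                table[i] = [value * factor for value in table[i]]
        # Column step: rescale each column so that it sums to its target.
        for j, target in enumerate(col_targets):
            total = sum(row[j] for row in table)
            if total > 0:
                factor = target / total
                for row in table:
                    row[j] *= factor
        # Columns now match exactly; stop once the rows also match.
        row_error = max(abs(sum(table[i]) - t) for i, t in enumerate(row_targets))
        if row_error < tol:
            break
    return table


# Made-up example: a 2x2 seed balanced to row totals 40/60 and column totals 30/70.
seed = [[1.0, 2.0], [3.0, 4.0]]
balanced = ipf(seed, row_targets=[40.0, 60.0], col_targets=[30.0, 70.0])
```

The row and column targets must share the same grand total for the procedure to converge. Because every step only multiplies whole rows or whole columns, the association structure of the seed (its cross-product ratio) is carried over to the balanced table.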

joliencremers

bpnreg:Bayesian Projected Normal Regression Models for Circular Data

Fitting Bayesian multiple and mixed-effect regression models for circular data based on the projected normal distribution. Both continuous and categorical predictors can be included. Sampling from the posterior is performed via an MCMC algorithm. Posterior descriptives of all parameters, model fit statistics and Bayes factors for hypothesis tests for inequality constrained hypotheses are provided. See Cremers, Mulder & Klugkist (2018) <doi:10.1111/bmsp.12108> and Nuñez-Antonio & Gutiérrez-Peña (2014) <doi:10.1016/j.csda.2012.07.025>.

Maintained by Jolien Cremers. Last updated 1 year ago.

openblas cpp openmp

1.5 match 14 stars 6.15 score 101 scripts

poissonconsulting

batchr:Batch Process Files

Processes multiple files with a user-supplied function. The key design principle is that only files which were last modified before the directory was configured are processed. A hidden file stores the configuration time, function, etc., while successfully processed files are automatically touched to update their modification date. As a result, batch processing can be stopped and restarted, and any files created (or modified or deleted) during processing are ignored.

Maintained by Joe Thorley. Last updated 2 months ago.

batch-processing

2.0 match 6 stars 4.56 score 8 scripts

adriancorrendo

soiltestcorr:Soil Test Correlation and Calibration

A compilation of functions designed to assist users with the correlation analysis of crop yield and soil test values. It includes functions to estimate crop response patterns to soil nutrient availability and critical soil test values using various approaches such as: 1) the modified arcsine-log calibration curve (Correndo et al. (2017) <doi:10.1071/CP16444>); 2) the graphical Cate-Nelson quadrants analysis (Cate & Nelson (1965)); 3) the statistical Cate-Nelson quadrants analysis (Cate & Nelson (1971) <doi:10.2136/sssaj1971.03615995003500040048x>); 4) the linear-plateau regression (Anderson & Nelson (1975) <doi:10.2307/2529422>); 5) the quadratic-plateau regression (Bullock & Bullock (1994) <doi:10.2134/agronj1994.00021962008600010033x>); and 6) the Mitscherlich-type exponential regression (Melsted & Peck (1977) <doi:10.2134/asaspecpub29.c1>). The package development stemmed from ongoing work with the Fertilizer Recommendation Support Tool (FRST) and Feed the Future Innovation Lab for Collaborative Research on Sustainable Intensification (SIIL) projects.

Maintained by Adrian A. Correndo. Last updated 9 months ago.

1.5 match 7 stars 6.04 score 31 scripts

cran

MLGdata:Datasets for Use with Salvan, Sartori and Pace (2020)

Contains the datasets for use with the book Salvan, Sartori and Pace (2020, ISBN:978-88-470-4002-1) "Modelli Lineari Generalizzati".

Maintained by Nicola Sartori. Last updated 4 years ago.

9.0 match 1.00 score

avi-kenny

SimEngine:A Modular Framework for Statistical Simulations in R

An open-source R package for structuring, maintaining, running, and debugging statistical simulations on both local and cluster-based computing environments. See full documentation at <https://avi-kenny.github.io/SimEngine/>.

Maintained by Avi Kenny. Last updated 26 days ago.

1.3 match 12 stars 7.18 score 50 scripts

jokergoo

sfcurve:2x2, 3x3 and Nxn Space-Filling Curves

Implementation of all possible forms of 2x2 and 3x3 space-filling curves, i.e., the generalized forms of the Hilbert curve <https://en.wikipedia.org/wiki/Hilbert_curve>, the Peano curve <https://en.wikipedia.org/wiki/Peano_curve> and the Peano curve in the meander type (Figure 5 in <https://eudml.org/doc/141086>). It can generate nxn curves expanded from any specific level-1 units. It also implements the H-curve and the three-dimensional Hilbert curve.

Maintained by Zuguang Gu. Last updated 6 months ago.

cpp

1.9 match 1 star 4.66 score 13 scripts

cran

hglm.data:Data for the 'hglm' Package

This data-only package was created for distributing data used in the examples of the 'hglm' package.

Maintained by Xia Shen. Last updated 6 years ago.

3.5 match 2.48 score 1 dependents

jagonzalb

SNSequate:Standard and Nonstandard Statistical Models and Methods for Test Equating

Contains functions implementing various models and methods for test equating (Kolen and Brennan, 2014 <doi:10.1007/978-1-4939-0317-7>; Gonzalez and Wiberg, 2017 <doi:10.1007/978-3-319-51824-4>; von Davier et al., 2004 <doi:10.1007/b97446>). It currently implements the traditional mean, linear and equipercentile equating methods. Both IRT observed-score and true-score equating are also supported, as well as the mean-mean, mean-sigma, Haebara and Stocking-Lord IRT linking methods. It also supports newer methods such as local equating, kernel equating (using Gaussian, logistic, Epanechnikov, uniform and adaptive kernels) with presmoothing, and IRT parameter linking methods based on asymmetric item characteristic functions. Functions to obtain both standard error of equating (SEE) and standard error of equating differences between two equating functions (SEED) are also implemented for the kernel method of equating.

Maintained by Jorge Gonzalez. Last updated 11 months ago.

6.1 match 1.43 score 27 scripts

benkeser

drtmle:Doubly-Robust Nonparametric Estimation and Inference

Targeted minimum loss-based estimators of counterfactual means and causal effects that are doubly-robust with respect both to consistency and asymptotic normality (Benkeser et al. (2017), <doi:10.1093/biomet/asx053>; MJ van der Laan (2014), <doi:10.1515/ijb-2012-0038>).

Maintained by David Benkeser. Last updated 2 years ago.

causal-inference ensemble-learning iptw statistical-inference tmle

1.3 match 19 stars 6.89 score 90 scripts 1 dependents

cran

spuRs:Functions and Datasets for "Introduction to Scientific Programming and Simulation Using R"

Provides functions and datasets from Jones, O.D., R. Maillardet, and A.P. Robinson. 2014. An Introduction to Scientific Programming and Simulation, Using R. 2nd Ed. Chapman And Hall/CRC.

Maintained by Andrew Robinson. Last updated 7 years ago.

8.4 match 1 star 1.00 score

chingchuan-chen

RcppBlaze:'Rcpp' Integration for the 'Blaze' High-Performance 'C++' Math Library

Blaze is an open-source, high-performance 'C++' math library for dense and sparse arithmetic. With its state-of-the-art Smart Expression Template implementation Blaze combines the elegance and ease of use of a domain-specific language with HPC-grade performance, making it one of the most intuitive and fastest 'C++' math libraries available. The 'RcppBlaze' package includes the header files from the 'Blaze' library, with some functionality that links to thread and system libraries disabled so that 'RcppBlaze' is a header-only library. Therefore, users do not need to install 'Blaze'.

Maintained by Ching-Chuan Chen. Last updated 11 months ago.

openblas cpp openmp

1.7 match 16 stars 4.86 score 2 scripts 1 dependents

johnaponte

convdistr:Convolute Probabilistic Distributions

Convolute probabilistic distributions using the random generator function of each distribution. A new random number generator function is created that performs the mathematical operation on the individual random samples from the random generator function of each distribution. See the documentation for examples.

Maintained by Aponte John. Last updated 11 months ago.

1.9 match 2 stars 4.34 score 11 scripts

prabhanjan-tattar

ACSWR:A Companion Package for the Book "A Course in Statistics with R"

A book designed to meet the requirements of master's students. Tattar, P.N., Suresh, R., and Manjunath, B.G. "A Course in Statistics with R", J. Wiley, ISBN 978-1-119-15272-9.

Maintained by Prabhanjan Tattar. Last updated 10 years ago.

4.0 match 2.03 score 106 scripts

tanaylab

tglkmeans:Efficient Implementation of K-Means++ Algorithm

Efficient implementation of the K-Means++ algorithm. For more information see (1) "k-means++: The Advantages of Careful Seeding" by David Arthur and Sergei Vassilvitskii (2007), Proceedings of the eighteenth annual ACM-SIAM symposium on Discrete algorithms, Society for Industrial and Applied Mathematics, Philadelphia, PA, USA, pp. 1027-1035, and (2) "The Effectiveness of Lloyd-Type Methods for the k-Means Problem" by Rafail Ostrovsky, Yuval Rabani, Leonard J. Schulman and Chaitanya Swamy <doi:10.1145/2395116.2395117>.

Maintained by Aviezer Lifshitz. Last updated 2 months ago.

algorithms-implemented kmeans cpp

1.5 match 7 stars 5.35 score 16 scripts
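
The seeding step that gives K-Means++ its name, referenced in the tglkmeans entry above, can be sketched in a few lines. This is a plain-Python illustration of the D² sampling rule from Arthur and Vassilvitskii (2007), not the implementation or the API of the 'tglkmeans' package; the names `sq_dist`, `kmeanspp_seeds`, and `rng_seed` are hypothetical.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeanspp_seeds(points, k, rng_seed=0):
    """Hypothetical helper: pick k initial centers with k-means++ (D^2) sampling."""
    rng = random.Random(rng_seed)  # fixed seed so the choice is reproducible
    centers = [rng.choice(points)]  # first center: uniform at random
    # Squared distance from every point to its nearest chosen center.
    d2 = [sq_dist(p, centers[0]) for p in points]
    while len(centers) < k:
        if sum(d2) == 0:
            # Every point coincides with a center; fall back to uniform choice.
            new_center = rng.choice(points)
        else:
            # Sample the next center with probability proportional to D^2.
            new_center = rng.choices(points, weights=d2, k=1)[0]
        centers.append(new_center)
        d2 = [min(old, sq_dist(p, new_center)) for old, p in zip(d2, points)]
    return centers


# Made-up example: two well-separated groups of points in the plane.
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (10.0, 10.0), (10.1, 10.0), (10.0, 10.1)]
seeds = kmeanspp_seeds(points, k=2, rng_seed=42)
```

Points far from the centers chosen so far are more likely to be picked next, which tends to spread the initial centers across the data before the usual Lloyd iterations start.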

hanjunwei-lab

iPRISM:Intelligent Predicting Response to Cancer Immunotherapy Through Systematic Modeling

Immunotherapy has revolutionized cancer treatment, but predicting patient response remains challenging. Here, we present Intelligent Predicting Response to cancer Immunotherapy through Systematic Modeling (iPRISM), a novel network-based model that integrates multiple data types to predict immunotherapy outcomes. It incorporates gene expression, biological functional network, tumor microenvironment characteristics, immune-related pathways, and clinical data to provide a comprehensive view of factors influencing immunotherapy efficacy. By identifying key genetic and immunological factors, it provides insight into more personalized treatment strategies and combination therapies to overcome resistance mechanisms.

Maintained by Junwei Han. Last updated 8 months ago.

4.0 match 2.00 score

rivolli

utiml:Utilities for Multi-Label Learning

Multi-label learning strategies and other procedures to support multi-label classification in R. The package provides a set of multi-label procedures such as sampling methods, transformation strategies, threshold functions, pre-processing techniques and evaluation metrics. A complete overview of the matter can be seen in Zhang, M. and Zhou, Z. (2014) <doi:10.1109/TKDE.2013.39> and Gibaja, E. and Ventura, S. (2015) A Tutorial on Multi-label Learning.

Maintained by Adriano Rivolli. Last updated 4 years ago.

1.3 match 28 stars 6.39 score 87 scripts

troyhernandez

tinyspotifyr:Tinyverse R Wrapper for the 'Spotify' Web API

An R wrapper for the 'Spotify' Web API <https://developer.spotify.com/web-api/>.

Maintained by Troy Hernandez. Last updated 1 year ago.

1.7 match 13 stars 4.81 score 5 scripts

cran

comsimitv:Flexible Framework for Simulating Community Assembly

Flexible framework for trait-based simulation of community assembly, in which components can be replaced by user-defined functions and traits are allowed to vary within species.

Maintained by Zoltan Botta-Dukat. Last updated 4 years ago.

3.9 match 2.00 score 3 scripts

astamm

fdacluster:Joint Clustering and Alignment of Functional Data

Implementations of the k-means, hierarchical agglomerative and DBSCAN clustering methods for functional data, which allow for jointly aligning and clustering curves. It supports functional data defined on one-dimensional domains but possibly taking values in multivariate codomains. It supports functional data defined in arrays but also via the 'fd' and 'funData' classes for functional data defined in the 'fda' and 'funData' packages respectively. It currently supports shift, dilation and affine warping functions for functional data defined on the real line and uses the SRVF framework to handle boundary-preserving warping for functional data defined on a specific interval. Main reference for the k-means algorithm: Sangalli L.M., Secchi P., Vantini S., Vitelli V. (2010) "k-mean alignment for curve clustering" <doi:10.1016/j.csda.2009.12.008>. Main reference for the SRVF framework: Tucker, J. D., Wu, W., & Srivastava, A. (2013) "Generative models for functional data using phase and amplitude separation" <doi:10.1016/j.csda.2012.12.001>.

Maintained by Aymeric Stamm. Last updated 2 months ago.

openblas cpp openmp

1.3 match 5 stars 6.14 score 31 scripts 1 dependents

prabhanjan-tattar

gpk:100 Data Sets for Statistics Education

Collection of datasets as prepared by Profs. A.P. Gore, S.A. Paranjape, and M.B. Kulkarni of the Department of Statistics, Poona University, India. With their permission, the first letters of their names form the name of this package. The package has been built by me and made available for the benefit of R users. This collection requires a rich class of models and can be a very useful building block for a beginner.

Maintained by Prabhanjan Tattar. Last updated 12 years ago.

4.5 match 1.69 score 49 scripts

cran

mix:Estimation/Multiple Imputation for Mixed Categorical and Continuous Data

Estimation/multiple imputation programs for mixed categorical and continuous data.

Maintained by Brian Ripley. Last updated 3 months ago.

fortran

1.8 match 2 stars 4.21 score 5 dependents

bioc

rqubic:Qualitative biclustering algorithm for expression data analysis in R

This package implements the QUBIC algorithm introduced by Li et al. for the qualitative biclustering with gene expression data.

Maintained by Jitao David Zhang. Last updated 5 months ago.

clustering

2.0 match 3.78 score 4 scripts 1 dependents

cran

GNAR:Methods for Fitting Network Time Series Models

Simulation of, and fitting models for, Generalised Network Autoregressive (GNAR) time series models which take account of network structure, potentially with exogenous variables. Such models are described in Knight et al. (2020) <doi:10.18637/jss.v096.i05> and Nason and Wei (2021) <doi:10.1111/rssa.12875>. Diagnostic tools for GNAR(X) models can be found in Nason et al. (2023) <doi:10.48550/arXiv.2312.00530>.

Maintained by Matt Nunes. Last updated 6 months ago.

5.8 match 2 stars 1.30 score

mikejohnson51

AHGestimation:An R package for Computing Robust, Mass Preserving Hydraulic Geometries and Rating Curves

Compute mass preserving 'At a station Hydraulic Geometry' (AHG) fits from river measurements.

Maintained by Mike Johnson. Last updated 3 months ago.

1.5 match 6 stars 5.02 score 10 scripts

jonathanlees

RSEIS:Seismic Time Series Analysis Tools

Multiple interactive codes to view and analyze seismic data, via spectrum analysis, wavelet transforms, particle motion, hodograms. Includes general time-series tools, plotting, filtering, interactive display.

Maintained by Jonathan M. Lees. Last updated 6 months ago.

1.8 match 3 stars 4.27 score 262 scripts 4 dependents