R-universe search: ranking

schochastics

netrankr:Analyzing Partial Rankings in Networks

Implements methods for centrality-related analyses of networks. While the package includes the possibility to build more than 20 indices, its main focus lies on index-free assessment of centrality via partial rankings obtained by neighborhood-inclusion or positional dominance. These partial rankings can be analyzed with different methods, including probabilistic methods like computing expected node ranks and relative rank probabilities (how likely is it that a node is more central than another?). The methodology is described in depth in the vignettes and in Schoch (2018) <doi:10.1016/j.socnet.2017.12.003>.

Maintained by David Schoch. Last updated 1 month ago.

network-analysis network-centrality openblas cpp openmp

35.8 match 49 stars 9.56 score 91 scripts 2 dependents

hturner

PlackettLuce:Plackett-Luce Models for Rankings

Functions to prepare rankings data and fit the Plackett-Luce model jointly attributed to Plackett (1975) <doi:10.2307/2346567> and Luce (1959, ISBN:0486441369). The standard Plackett-Luce model is generalized to accommodate ties of any order in the ranking. Partial rankings, in which only a subset of items are ranked in each ranking, are also accommodated in the implementation. Disconnected/weakly connected networks implied by the rankings may be handled by adding pseudo-rankings with a hypothetical item. Optionally, a multivariate normal prior may be set on the log-worth parameters and ranker reliabilities may be incorporated as proposed by Raman and Joachims (2014) <doi:10.1145/2623330.2623654>. Maximum a posteriori estimation is used when priors are set. Methods are provided to estimate standard errors or quasi-standard errors for inference as well as to fit Plackett-Luce trees. See the package website or vignette for further details.

Maintained by Heather Turner. Last updated 2 years ago.

plackett-luce-models preferences ranking rankings-data statistical-models

41.9 match 20 stars 7.97 score 86 scripts 3 dependents

statnet

ergm.rank:Fit, Simulate and Diagnose Exponential-Family Models for Rank-Order Relational Data

A set of extensions for the 'ergm' package to fit weighted networks whose edge weights are ranks. See Krivitsky and Butts (2017) <doi:10.1177/0081175017692623> and Krivitsky, Hunter, Morris, and Klumb (2023) <doi:10.18637/jss.v105.i06>.

Maintained by Pavel N. Krivitsky. Last updated 4 months ago.

46.6 match 4 stars 6.98 score 9 scripts 1 dependent

ocbe-uio

BayesMallows:Bayesian Preference Learning with the Mallows Rank Model

An implementation of the Bayesian version of the Mallows rank model (Vitelli et al., Journal of Machine Learning Research, 2018 <https://jmlr.org/papers/v18/15-481.html>; Crispino et al., Annals of Applied Statistics, 2019 <doi:10.1214/18-AOAS1203>; Sorensen et al., R Journal, 2020 <doi:10.32614/RJ-2020-026>; Stein, PhD Thesis, 2023 <https://eprints.lancs.ac.uk/id/eprint/195759>). Both Metropolis-Hastings and sequential Monte Carlo algorithms for estimating the models are available. Cayley, footrule, Hamming, Kendall, Spearman, and Ulam distances are supported in the models. The rank data to be analyzed can be in the form of complete rankings, top-k rankings, partially missing rankings, as well as consistent and inconsistent pairwise preferences. Several functions for plotting and studying the posterior distributions of parameters are provided. The package also provides functions for estimating the partition function (normalizing constant) of the Mallows rank model, both with the importance sampling algorithm of Vitelli et al. and asymptotic approximation with the IPFP algorithm (Mukherjee, Annals of Statistics, 2016 <doi:10.1214/15-AOS1389>).

Maintained by Oystein Sorensen. Last updated 1 month ago.

mallows-model openblas cpp openmp

39.9 match 21 stars 7.91 score 36 scripts 1 dependent

ropensci

taxize:Taxonomic Information from Around the Web

Interacts with a suite of web application programming interfaces (API) for taxonomic tasks, such as getting database specific taxonomic identifiers, verifying species names, getting taxonomic hierarchies, fetching downstream and upstream taxonomic names, getting taxonomic synonyms, converting scientific to common names and vice versa, and more. Some of the services supported include 'NCBI E-utilities' (<https://www.ncbi.nlm.nih.gov/books/NBK25501/>), 'Encyclopedia of Life' (<https://eol.org/docs/what-is-eol/data-services>), 'Global Biodiversity Information Facility' (<https://techdocs.gbif.org/en/openapi/>), and many more. Links to the API documentation for other supported services are available in the documentation for their respective functions in this package.

Maintained by Zachary Foster. Last updated 12 days ago.

taxonomy biology nomenclature json api web api-client identifiers species names api-wrapper biodiversity darwincore data taxize

22.2 match 274 stars 13.63 score 1.6k scripts 23 dependents

snoweye

pbdMPI:R Interface to MPI for HPC Clusters (Programming with Big Data Project)

A simplified, efficient interface to MPI for HPC clusters. It is a derivation and rethinking of the Rmpi package. pbdMPI embraces the prevalent parallel programming style on HPC clusters. Beyond the interface, a collection of functions for global work with distributed data and resource-independent RNG reproducibility is included. It is based on S4 classes and methods.

Maintained by Wei-Chen Chen. Last updated 6 months ago.

openmpi

41.1 match 2 stars 7.11 score 179 scripts 3 dependents

tyee001

VGAM:Vector Generalized Linear and Additive Models

An implementation of about 6 major classes of statistical regression models. The central algorithm is Fisher scoring and iteratively reweighted least squares. At the heart of this package are the vector generalized linear and additive model (VGLM/VGAM) classes. VGLMs can be loosely thought of as multivariate GLMs. VGAMs are data-driven VGLMs that use smoothing. The book "Vector Generalized Linear and Additive Models: With an Implementation in R" (Yee, 2015) <DOI:10.1007/978-1-4939-2818-7> gives details of the statistical framework and the package. Currently only fixed-effects models are implemented. Many (100+) models and distributions are estimated by maximum likelihood estimation (MLE) or penalized MLE. The other classes are RR-VGLMs (reduced-rank VGLMs), quadratic RR-VGLMs, doubly constrained RR-VGLMs, reduced-rank VGAMs, and RCIMs (row-column interaction models); these classes perform constrained and unconstrained quadratic ordination (CQO/UQO) models in ecology, as well as constrained additive ordination (CAO). Hauck-Donner effect detection is implemented. Note that these functions are subject to change; see the NEWS and ChangeLog files for latest changes.

Maintained by Thomas Yee. Last updated 1 month ago.

fortran

19.0 match 10 stars 10.67 score 3.6k scripts 169 dependents

harrelfe

Hmisc:Harrell Miscellaneous

Contains many functions useful for data analysis, high-level graphics, utility operations, functions for computing sample size and power, simulation, importing and annotating datasets, imputing missing values, advanced table making, variable clustering, character string manipulation, conversion of R objects to LaTeX and html code, recoding variables, caching, simplified parallel computing, encrypting and decrypting data using a safe workflow, general moving window statistical estimation, and assistance in interpreting principal component analysis.

Maintained by Frank E Harrell Jr. Last updated 2 days ago.

fortran

11.2 match 210 stars 17.61 score 17k scripts 750 dependents

modal-inria

Rankcluster:Model-Based Clustering for Multivariate Partial Ranking Data

Implementation of a model-based clustering algorithm for ranking data (C. Biernacki, J. Jacques (2013) <doi:10.1016/j.csda.2012.08.008>). Multivariate rankings as well as partial rankings are taken into account. This algorithm is based on an extension of the Insertion Sorting Rank (ISR) model for ranking data, which is a meaningful and effective model parametrized by a position parameter (the modal ranking, denoted by mu) and a dispersion parameter (denoted by pi). The heterogeneity of the rank population is modelled by a mixture of ISR models, whereas a conditional independence assumption is made for multivariate rankings.

Maintained by Quentin Grimonprez. Last updated 2 years ago.

clustering hacktoberfest rank cpp

51.3 match 1 star 3.74 score 37 scripts 1 dependent

andrisignorell

DescTools:Tools for Descriptive Statistics

A collection of miscellaneous basic statistic functions and convenience wrappers for efficiently describing data. The author's intention was to create a toolbox which facilitates the (notoriously time consuming) first descriptive tasks in data analysis, consisting of calculating descriptive statistics, drawing graphical summaries and reporting the results. The package furthermore contains functions to produce documents using MS Word (or PowerPoint) and functions to import data from Excel. Many of the included functions can be found scattered in other packages and other sources written partly by Titans of R. The reason for collecting them here was primarily to have them consolidated in ONE instead of dozens of packages (which themselves might depend on other packages which are not needed at all), and to provide a common and consistent interface as far as function and arguments naming, NA handling, recycling rules etc. are concerned. Google style guides were used as naming rules (in absence of convincing alternatives). The 'BigCamelCase' style was consequently applied to functions borrowed from contributed R packages as well.

Maintained by Andri Signorell. Last updated 9 days ago.

fortran cpp

11.4 match 87 stars 16.68 score 7.7k scripts 99 dependents

cran

PMCMRplus:Calculate Pairwise Multiple Comparisons of Mean Rank Sums Extended

For one-way layout experiments the one-way ANOVA can be performed as an omnibus test. All-pairs multiple comparisons tests (Tukey-Kramer test, Scheffe test, LSD-test) and many-to-one tests (Dunnett test) for normally distributed residuals and equal within variance are available. Furthermore, all-pairs tests (Games-Howell test, Tamhane's T2 test, Dunnett T3 test, Ury-Wiggins-Hochberg test) and many-to-one (Tamhane-Dunnett Test) for normally distributed residuals and heterogeneous variances are provided. Van der Waerden's normal scores test for omnibus, all-pairs and many-to-one tests is provided for non-normally distributed residuals and homogeneous variances. The Kruskal-Wallis, BWS and Anderson-Darling omnibus test and all-pairs tests (Nemenyi test, Dunn test, Conover test, Dwass-Steele-Critchlow-Fligner test) as well as many-to-one (Nemenyi test, Dunn test, U-test) are given for the analysis of variance by ranks. Non-parametric trend tests (Jonckheere test, Cuzick test, Johnson-Mehrotra test, Spearman test) are included. In addition, a Friedman-test for one-way ANOVA with repeated measures on ranks (CRBD) and Skillings-Mack test for unbalanced CRBD is provided with consequent all-pairs tests (Nemenyi test, Siegel test, Miller test, Conover test, Exact test) and many-to-one tests (Nemenyi test, Demsar test, Exact test). A trend can be tested with Page's test. Durbin's test for a two-way balanced incomplete block design (BIBD) is given in this package, as well as Gore's test for CRBD with multiple observations per cell. Outlier tests, Mandel's k- and h-statistics, functions for Type I error and Power analysis, and generic summary, print and plot methods are also provided.

Maintained by Thorsten Pohlert. Last updated 6 months ago.

fortran

25.1 match 6 stars 7.24 score 12 dependents

mjskay

ARTool:Aligned Rank Transform

The aligned rank transform for nonparametric factorial ANOVAs as described by Wobbrock, Findlater, Gergle, and Higgins (2011) <doi:10.1145/1978942.1978963>. Also supports aligned rank transform contrasts as described by Elkin, Kay, Higgins, and Wobbrock (2021) <doi:10.1145/3472749.3474784>.

Maintained by Matthew Kay. Last updated 3 years ago.

20.4 match 60 stars 8.64 score 307 scripts

r-lib

bit64:A S3 Class for Vectors of 64bit Integers

Package 'bit64' provides serializable S3 atomic 64bit (signed) integers. These are useful for handling database keys and exact counting in +-2^63. WARNING: do not use them as a replacement for 32bit integers, integer64 are not supported for subscripting by R-core and they have different semantics when combined with double, e.g. integer64 + double => integer64. Class integer64 can be used in vectors, matrices, arrays and data.frames. Methods are available for coercion from and to logicals, integers, doubles, characters and factors as well as many elementwise and summary functions. Many fast algorithmic operations such as 'match' and 'order' support interactive data exploration and manipulation and optionally leverage caching.

Maintained by Michael Chirico. Last updated 3 days ago.

11.7 match 35 stars 14.91 score 1.5k scripts 3.2k dependents

happma

pseudorank:Pseudo-Ranks

Efficient calculation of pseudo-ranks and (pseudo)-rank based test statistics. In case of equal sample sizes, pseudo-ranks and mid-ranks are equal. When used for inference, mid-ranks may lead to paradoxical results. Pseudo-ranks are in general not affected by such a problem. See Happ et al. (2020, <doi:10.18637/jss.v095.c01>) for details.

Maintained by Martin Happ. Last updated 26 days ago.

nonparametric nonparametric-statistics pseudo-rank pseudo-ranks rank rank-tests trend-test cpp

44.5 match 3 stars 3.71 score 17 scripts
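
The statement in the pseudorank description that pseudo-ranks and mid-ranks coincide for equal sample sizes can be checked with a small sketch. This is not the package's implementation (the package is written for R with C++). It assumes the usual textbook definitions: a mid-rank is 1/2 plus N times the sample-size-weighted mean of the group ECDFs, and a pseudo-rank is 1/2 plus N times their unweighted mean. The function names below are hypothetical.

```python
def _count(diff):
    """Normalized counting function: 0 below, 1/2 at a tie, 1 above."""
    if diff < 0:
        return 0.0
    if diff == 0:
        return 0.5
    return 1.0


def normalized_ecdf(sample, x):
    """Normalized empirical distribution function of one group at x."""
    return sum(_count(x - y) for y in sample) / len(sample)


def pseudo_ranks(groups):
    """Pseudo-ranks: 1/2 + N * (unweighted mean of the group ECDFs).

    Assumed definition, not taken from the package source.
    """
    n_total = sum(len(g) for g in groups)
    n_groups = len(groups)
    return [
        [0.5 + n_total * sum(normalized_ecdf(h, x) for h in groups) / n_groups
         for x in g]
        for g in groups
    ]


def mid_ranks(groups):
    """Mid-ranks computed on the pooled sample (ties share the average rank)."""
    pooled = [y for g in groups for y in g]
    return [[0.5 + sum(_count(x - y) for y in pooled) for x in g] for g in groups]


balanced = [[1, 2], [3, 4]]      # equal group sizes
unbalanced = [[1], [2, 3, 4]]    # unequal group sizes

print(pseudo_ranks(balanced), mid_ranks(balanced))
print(pseudo_ranks(unbalanced), mid_ranks(unbalanced))
```

With the balanced groups both functions give the same values; with the unbalanced groups they differ, because the single observation in the first group carries as much weight as the three observations in the second.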

antdambr

ConsRank:Compute the Median Ranking(s) According to the Kemeny's Axiomatic Approach

Compute the median ranking according to Kemeny's axiomatic approach. Rankings may or may not contain ties, and may be either complete or incomplete. The package contains both branch-and-bound algorithms and recently proposed heuristic solutions. The search space of the solution can either be restricted to the universe of permutations or left unrestricted to include all possible ties. The package also provides some useful utilities for dealing with preference rankings, including both element-weight Kemeny distance and correlation coefficient. This release declares as deprecated some functions that are still in the package for compatibility. The next release will not contain these functions. Please type '?ConsRank-deprecated'. Essential references: Emond, E.J., and Mason, D.W. (2002) <doi:10.1002/mcda.313>; D'Ambrosio, A., Amodio, S., and Iorio, C. (2015) <doi:10.1285/i20705948v8n2p198>; Amodio, S., D'Ambrosio, A., and Siciliano R. (2016) <doi:10.1016/j.ejor.2015.08.048>; D'Ambrosio, A., Mazzeo, G., Iorio, C., and Siciliano, R. (2017) <doi:10.1016/j.cor.2017.01.017>; Albano, A., and Plaia, A. (2021) <doi:10.1285/i20705948v14n1p117>.

Maintained by Antonio DAmbrosio. Last updated 1 year ago.

41.1 match 3 stars 3.97 score 24 scripts 8 dependents
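
To make the idea of a Kemeny median ranking in the ConsRank description concrete, here is a minimal brute-force sketch. It is not ConsRank's API or its branch-and-bound algorithm. It assumes rankings are given as rank vectors (entry i is the rank of item i, equal entries mean a tie), uses the pairwise sign-matrix form of the Kemeny distance, and restricts candidate medians to full permutations. All names are hypothetical, and exhaustive search is only feasible for a handful of items.

```python
from itertools import combinations, permutations


def _sign(x):
    """Return -1, 0 or 1 according to the sign of x."""
    return (x > 0) - (x < 0)


def kemeny_distance(r1, r2):
    """Kemeny distance between two rank vectors (ties allowed).

    Each pair of items contributes |a_ij - b_ij|, where a_ij and b_ij are the
    signs of the rank differences in the two rankings. Under this convention
    a reversed pair contributes 2 and a pair tied in only one ranking
    contributes 1.
    """
    return sum(
        abs(_sign(r1[i] - r1[j]) - _sign(r2[i] - r2[j]))
        for i, j in combinations(range(len(r1)), 2)
    )


def median_rankings(rankings):
    """Exhaustively search full permutations for the minimum total distance.

    Returns (list of all minimizing rank vectors, minimal total distance).
    """
    n_items = len(rankings[0])
    best, best_cost = [], None
    for cand in permutations(range(1, n_items + 1)):
        cost = sum(kemeny_distance(cand, r) for r in rankings)
        if best_cost is None or cost < best_cost:
            best, best_cost = [cand], cost
        elif cost == best_cost:
            best.append(cand)
    return best, best_cost


votes = [[1, 2, 3], [1, 2, 3], [2, 1, 3]]  # three judges, three items
print(median_rankings(votes))
```

The function returns every minimizer because the median ranking need not be unique, which is why the package title speaks of "Median Ranking(s)".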

danielwilhelm

csranks:Statistical Tools for Ranks

Account for uncertainty when working with ranks. Estimate standard errors consistently in linear regression with ranked variables. Construct confidence sets of various kinds for positions of populations in a ranking based on values of a certain feature and their estimation errors. Theory based on Mogstad, Romano, Shaikh, and Wilhelm (2023) <doi:10.1093/restud/rdad006> and Chetverikov and Wilhelm (2023) <arXiv:2310.15512>.

Maintained by Daniel Wilhelm. Last updated 6 months ago.

31.4 match 5 stars 5.18 score 7 scripts

hugaped

MBNMAdose:Dose-Response MBNMA Models

Fits Bayesian dose-response model-based network meta-analysis (MBNMA) models that incorporate multiple doses within an agent by modelling different dose-response functions, as described by Mawdsley et al. (2016) <doi:10.1002/psp4.12091>. By modelling dose-response relationships this can connect networks of evidence that might otherwise be disconnected, and can improve precision on treatment estimates. Several common dose-response functions are provided; others may be added by the user. Various characteristics and assumptions can be flexibly added to the models, such as shared class effects. The consistency of direct and indirect evidence in the network can be assessed using unrelated mean effects models and/or by node-splitting at the treatment level.

Maintained by Hugo Pedder. Last updated 1 month ago.

jags cpp

24.1 match 10 stars 6.60 score

eheinzen

elo:Ranking Teams by Elo Rating and Comparable Methods

A flexible framework for calculating Elo ratings and resulting rankings of any two-team-per-matchup system (chess, sports leagues, 'Go', etc.). This implementation is capable of evaluating a variety of matchups, Elo rating updates, and win probabilities, all based on the basic Elo rating system. It also includes methods to benchmark performance, including logistic regression and Markov chain models.

Maintained by Ethan Heinzen. Last updated 1 year ago.

elo elo-rating logistic-regression markov-chain markov-model ranking sports-analytics cpp

22.3 match 37 stars 7.05 score 153 scripts
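
For readers unfamiliar with the basic Elo rating system that the elo package builds on, the standard update rule can be sketched as follows. This is not the package's API. The function names, the K-factor of 20 and the 400-point scale are illustrative assumptions (400 is the conventional scale in chess).

```python
def expected_score(rating_a, rating_b, scale=400.0):
    """Win probability of A against B under the logistic Elo model."""
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / scale))


def elo_update(rating_a, rating_b, score_a, k=20.0):
    """Return updated (rating_a, rating_b) after one matchup.

    score_a is 1.0 for a win by A, 0.5 for a draw and 0.0 for a loss.
    The K-factor k (assumed value) controls how far one result moves the
    ratings; points gained by one side are lost by the other.
    """
    delta = k * (score_a - expected_score(rating_a, rating_b))
    return rating_a + delta, rating_b - delta


# Two equally rated teams: the winner gains k / 2 points.
print(elo_update(1500.0, 1500.0, score_a=1.0))
```

Sorting teams by their ratings after processing all matchups yields the ranking. A larger K-factor makes ratings react faster to recent results at the cost of more volatility.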

bioc

UCell:Rank-based signature enrichment analysis for single-cell data

UCell is a package for evaluating gene signatures in single-cell datasets. UCell signature scores, based on the Mann-Whitney U statistic, are robust to dataset size and heterogeneity, and their calculation demands less computing time and memory than other available methods, enabling the processing of large datasets in a few minutes even on machines with limited computing power. UCell can be applied to any single-cell data matrix, and includes functions to directly interact with SingleCellExperiment and Seurat objects.

Maintained by Massimo Andreatta. Last updated 5 months ago.

singlecell genesetenrichment transcriptomics geneexpression cellbasedassays

14.4 match 143 stars 10.43 score 454 scripts 2 dependents

hugaped

MBNMAtime:Run Time-Course Model-Based Network Meta-Analysis (MBNMA) Models

Fits Bayesian time-course models for model-based network meta-analysis (MBNMA) that allow inclusion of multiple time-points from studies. Repeated measures over time are accounted for within studies by applying different time-course functions, following the method of Pedder et al. (2019) <doi:10.1002/jrsm.1351>. The method allows synthesis of studies with multiple follow-up measurements and can account for time-course for a single or multiple treatment comparisons. Several general time-course functions are provided; others may be added by the user. Various characteristics can be flexibly added to the models, such as correlation between time points and shared class effects. The consistency of direct and indirect evidence in the network can be assessed using unrelated mean effects models and/or by node-splitting.

Maintained by Hugo Pedder. Last updated 1 month ago.

jags cpp

24.0 match 7 stars 6.10 score

gvalentini58

RANKS:Ranking of Nodes with Kernelized Score Functions

Implementation of Kernelized score functions and other semi-supervised learning algorithms for node label ranking to analyze biomolecular networks. RANKS can be easily applied to a large set of different relevant problems in computational biology, ranging from automatic protein function prediction, to gene disease prioritization and drug repositioning, and more generally to any bioinformatics problem that can be formalized as a node label ranking problem in a graph. The modular nature of the implementation allows users to experiment with different score functions and kernels and to easily compare the results with baseline network-based methods such as label propagation and random walk algorithms, as well as to enlarge the algorithmic scheme by adding novel user-defined score functions and kernels.

Maintained by Giorgio Valentini. Last updated 2 years ago.

76.2 match 1.90 score 7 scripts

nceas

codyn:Community Dynamics Metrics

Univariate and multivariate temporal and spatial diversity indices, rank abundance curves, and community stability measures. The functions implement measures that are either explicitly temporal and include the option to calculate them over multiple replicates, or spatial and include the option to calculate them over multiple time points. Functions fall into five categories: static diversity indices, temporal diversity indices, spatial diversity indices, rank abundance curves, and community stability measures. The diversity indices are temporal and spatial analogs to traditional diversity indices. Specifically, the package includes functions to calculate community richness, evenness and diversity at a given point in space and time. In addition, it contains functions to calculate species turnover, mean rank shifts, and lags in community similarity between two time points. Details of the methods are available in Hallett et al. (2016) <doi:10.1111/2041-210X.12569> and Avolio et al. (2019) <doi:10.1002/ecs2.2881>.

Maintained by Matthew B. Jones. Last updated 4 years ago.

16.0 match 34 stars 9.07 score 230 scripts

neurodata

lolR:Linear Optimal Low-Rank Projection

Supervised learning techniques designed for the situation when the dimensionality exceeds the sample size have a tendency to overfit as the dimensionality of the data increases. To remedy this high-dimensionality, low-sample-size (HDLSS) situation, we attempt to learn a lower-dimensional representation of the data before learning a classifier. That is, we project the data to a situation where the dimensionality is more manageable, and then are able to better apply standard classification or clustering techniques since we will have fewer dimensions to overfit. A number of previous works have focused on how to strategically reduce dimensionality in the unsupervised case, yet in the supervised HDLSS regime, few works have attempted to devise dimensionality reduction techniques that leverage the labels associated with the data. In this package and the associated manuscript Vogelstein et al. (2017) <arXiv:1709.01233>, we provide several methods for feature extraction, some utilizing labels and some not, along with easily extensible utilities to simplify cross-validative efforts to identify the best feature extraction method. Additionally, we include a series of adaptable benchmark simulations to serve as a standard for future investigative efforts into supervised HDLSS. Finally, we produce a comprehensive comparison of the included algorithms across a range of benchmark simulations and real data applications.

Maintained by Eric Bridgeford. Last updated 4 years ago.

19.8 match 20 stars 7.28 score 80 scripts

rstudio

sortable:Drag-and-Drop in 'shiny' Apps with 'SortableJS'

Enables drag-and-drop behaviour in Shiny apps, by exposing the functionality of the 'SortableJS' <https://sortablejs.github.io/Sortable/> JavaScript library as an 'htmlwidget'. You can use this in Shiny apps and widgets, 'learnr' tutorials as well as R Markdown. In addition, provides a custom 'learnr' question type - 'question_rank()' - that allows ranking questions with drag-and-drop.

Maintained by Andrie de Vries. Last updated 6 months ago.

htmlwidget

12.2 match 135 stars 11.62 score 368 scripts 13 dependents

tidyverse

dplyr:A Grammar of Data Manipulation

A fast, consistent tool for working with data frame like objects, both in memory and out of memory.

Maintained by Hadley Wickham. Last updated 12 days ago.

data-manipulation grammar cpp

5.5 match 4.8k stars 24.68 score 659k scripts 7.8k dependents

kassambara

survminer:Drawing Survival Curves using 'ggplot2'

Contains the function 'ggsurvplot()' for easily drawing beautiful and 'ready-to-publish' survival curves with the 'number at risk' table and 'censoring count plot'. Other functions are also available to plot adjusted curves for 'Cox' model and to visually examine 'Cox' model assumptions.

Maintained by Alboukadel Kassambara. Last updated 5 months ago.

8.1 match 524 stars 15.87 score 7.0k scripts 55 dependents

stan-dev

projpred:Projection Predictive Feature Selection

Performs projection predictive feature selection for generalized linear models (Piironen, Paasiniemi, and Vehtari, 2020, <doi:10.1214/20-EJS1711>) with or without multilevel or additive terms (Catalina, Bürkner, and Vehtari, 2022, <https://proceedings.mlr.press/v151/catalina22a.html>), for some ordinal and nominal regression models (Weber, Glass, and Vehtari, 2023, <arXiv:2301.01660>), and for many other regression models (using the latent projection by Catalina, Bürkner, and Vehtari, 2021, <arXiv:2109.04702>, which can also be applied to most of the former models). The package is compatible with the 'rstanarm' and 'brms' packages, but other reference models can also be used. See the vignettes and the documentation for more information and examples.

Maintained by Frank Weber. Last updated 1 month ago.

bayes bayesian bayesian-inference rstanarm stan statistics variable-selection openblas cpp

12.1 match 112 stars 10.08 score 241 scripts

agrdatasci

gosset:Tools for Data Analysis in Experimental Agriculture

Methods to analyse experimental agriculture data, from data synthesis to model selection and visualisation. The package is named after W.S. Gosset aka ‘Student’, a pioneer of modern statistics in small sample experimental design and analysis.

Maintained by Kauê de Sousa. Last updated 3 months ago.

experimental-design rankings-data

18.5 match 6 stars 6.44 score 23 scripts

pearce790

rankrate:Joint Statistical Models for Preference Learning with Rankings and Ratings

Statistical tools for the Mallows-Binomial model, the first joint statistical model for preference learning for rankings and ratings. This project was supported by the National Science Foundation under Grant No. 2019901.

Maintained by Michael Pearce. Last updated 10 months ago.

28.5 match 4.00 score 6 scripts

bluefoxr

COINr:Composite Indicator Construction and Analysis

A comprehensive high-level package, for composite indicator construction and analysis. It is a "development environment" for composite indicators and scoreboards, which includes utilities for construction (indicator selection, denomination, imputation, data treatment, normalisation, weighting and aggregation) and analysis (multivariate analysis, correlation plotting, short cuts for principal component analysis, global sensitivity analysis, and more). A composite indicator is completely encapsulated inside a single hierarchical list called a "coin". This allows a fast and efficient work flow, as well as making quick copies, testing methodological variations and making comparisons. It also includes many plotting options, both statistical (scatter plots, distribution plots) as well as for presenting results.

Maintained by William Becker. Last updated 2 months ago.

12.6 match 26 stars 9.07 score 73 scripts 1 dependent

jwood000

RcppAlgos:High Performance Tools for Combinatorics and Computational Mathematics

Provides optimized functions and flexible iterators implemented in C++ for solving problems in combinatorics and computational mathematics. Handles various combinatorial objects including combinations, permutations, integer partitions and compositions, Cartesian products, unordered Cartesian products, and partition of groups. Utilizes the RMatrix class from 'RcppParallel' for thread safety. The combination and permutation functions contain constraint parameters that allow for generation of all results of a vector meeting specific criteria (e.g. finding all combinations such that the sum is between two bounds). Capable of ranking/unranking combinatorial objects efficiently (e.g. retrieve only the nth lexicographical result) which sets up nicely for parallelization as well as random sampling. Gmp support permits exploration where the total number of results is large (e.g. comboSample(10000, 500, n = 4)). Additionally, there are several high performance number theoretic functions that are useful for problems common in computational mathematics. Some of these functions make use of the fast integer division library 'libdivide'. The primeSieve function is based on the segmented sieve of Eratosthenes implementation by Kim Walisch. It is also efficient for large numbers by using the cache friendly improvements originally developed by Tomás Oliveira. Finally, there is a prime counting function that implements Legendre's formula based on the work of Kim Walisch.

Maintained by Joseph Wood. Last updated 1 month ago.

combinations combinatorics factorization number-theory parallel permutation prime-factorizations primesieve gmp cpp

11.2 match 45 stars 10.04 score 153 scripts 12 dependents

microsoft

wpa:Tools for Analysing and Visualising Viva Insights Data

Opinionated functions that enable easier and faster analysis of Viva Insights data. There are three main types of functions in 'wpa': (1) Standard functions create a 'ggplot' visual or a summary table based on a specific Viva Insights metric; (2) Report Generation functions generate HTML reports on a specific analysis area, e.g. Collaboration; (3) Other miscellaneous functions cover more specific applications (e.g. Subject Line text mining) of Viva Insights data. This package adheres to 'tidyverse' principles and works well with the pipe syntax. 'wpa' is built with the beginner-to-intermediate R users in mind, and is optimised for simplicity.

Maintained by Martin Chan. Last updated 4 months ago.

workplace-analytics

16.7 match 30 stars 6.69 score 39 scripts 1 dependent

pdhoff

amen:Additive and Multiplicative Effects Models for Networks and Relational Data

Analysis of dyadic network and relational data using additive and multiplicative effects (AME) models. The basic model includes regression terms, the covariance structure of the social relations model (Warner, Kenny and Stoto (1979) <DOI:10.1037/0022-3514.37.10.1742>, Wong (1982) <DOI:10.2307/2287296>), and multiplicative factor models (Hoff (2009) <DOI:10.1007/s10588-008-9040-4>). Several different link functions accommodate different relational data structures, including binary/network data, normal relational data, zero-inflated positive outcomes using a tobit model, ordinal relational data and data from fixed-rank nomination schemes. Several of these link functions are discussed in Hoff, Fosdick, Volfovsky and Stovel (2013) <DOI:10.1017/nws.2013.17>. Development of this software was supported in part by NIH grant R01HD067509.

Maintained by Peter Hoff. Last updated 4 years ago.

16.3 match 28 stars 6.81 score 153 scripts

pilaboratory

sads:Maximum Likelihood Models for Species Abundance Distributions

Maximum likelihood tools to fit and compare models of species abundance distributions and of species rank-abundance distributions.

Maintained by Paulo I. Prado. Last updated 1 year ago.

12.8 match 23 stars 8.66 score 244 scripts 3 dependents

bioc

phyloseq:Handling and analysis of high-throughput microbiome census data

phyloseq provides a set of classes and tools to facilitate the import, storage, analysis, and graphical display of microbiome census data.

Maintained by Paul J. McMurdie. Last updated 5 months ago.

immunooncology sequencing microbiome metagenomics clustering classification multiplecomparison geneticvariability

7.8 match 597 stars 13.90 score 8.4k scripts 37 dependents

bioc

PhyloProfile:PhyloProfile

PhyloProfile is a tool for exploring complex phylogenetic profiles. Phylogenetic profiles, presence/absence patterns of genes over a set of species, are commonly used to trace the functional and evolutionary history of genes across species and time. With PhyloProfile we can enrich regular phylogenetic profiles with further data like sequence/structure similarity, to make phylogenetic profiling more meaningful. Besides the interactive visualisation powered by R-Shiny, the package offers a set of further analysis features to gain insights like the gene age estimation or core gene identification.

Maintained by Vinh Tran. Last updated 6 days ago.

software visualization datarepresentation multiplecomparison functionalprediction dimensionreduction bioinformatics heatmap interactive-visualizations orthologs phylogenetic-profile shiny

13.9 match 33 stars 7.77 score 10 scripts

nflverse

nflreadr:Download 'nflverse' Data

A minimal package for downloading data from 'GitHub' repositories of the 'nflverse' project.

Maintained by Tan Ho. Last updated 4 months ago.

nfl nflfastr nflverse sports-data

8.6 match 66 stars 12.46 score 476 scripts 10 dependents

bioc

RankProd:Rank Product method for identifying differentially expressed genes with application in meta-analysis

Non-parametric method for identifying differentially expressed (up- or down- regulated) genes based on the estimated percentage of false predictions (pfp). The method can combine data sets from different origins (meta-analysis) to increase the power of the identification.

Maintained by Francesco Del Carratore. Last updated 5 months ago.

differentialexpression statisticalmethod software researchfield metabolomics lipidomics proteomics systemsbiology geneexpression microarray genesignaling

16.4 match 6.46 score 81 scripts 6 dependents

asalavaty

influential:Identification and Classification of the Most Influential Nodes

Contains functions for the classification and ranking of top candidate features, reconstruction of networks from adjacency matrices and data frames, analysis of the topology of the network and calculation of centrality measures, and identification of the most influential nodes. Also, a function is provided for running the SIRIR model, which is the combination of the leave-one-out cross validation technique and the conventional SIR model, on a network to unsupervisedly rank the true influence of vertices. Additionally, some functions have been provided for the assessment of dependence and correlation of two network centrality measures as well as the conditional probability of deviation from their corresponding means in opposite directions. Fred Viole and David Nawrocki (2013, ISBN:1490523995). Csardi G, Nepusz T (2006). "The igraph software package for complex network research." InterJournal, Complex Systems, 1695. Adopted algorithms and sources are referenced in the function documentation.

Maintained by Adrian Salavaty. Last updated 5 months ago.

centrality-measures classification-model influence-ranking network-analysis priaritization-model

16.0 match 27 stars 6.54 score 43 scripts 1 dependents

braverock

PortfolioAnalytics:Portfolio Analysis, Including Numerical Methods for Optimization of Portfolios

Portfolio optimization and analysis routines and graphics.

Maintained by Brian G. Peterson. Last updated 3 months ago.

9.0 match 81 stars 11.49 score 626 scripts 2 dependents

grunwaldlab

metacoder:Tools for Parsing, Manipulating, and Graphing Taxonomic Abundance Data

Reads, plots, and manipulates large taxonomic data sets, like those generated from modern high-throughput sequencing, such as metabarcoding (i.e. amplification metagenomics, 16S metagenomics, etc). It provides a tree-based visualization called "heat trees" used to depict statistics for every taxon in a taxonomy using color and size. It also provides various functions to do common tasks in microbiome bioinformatics on data in the 'taxmap' format defined by the 'taxa' package. The 'metacoder' package is described in the publication by Foster et al. (2017) <doi:10.1371/journal.pcbi.1005404>.

Maintained by Zachary Foster. Last updated 1 months ago.

community-diversity hierarchical metabarcoding pcr taxonomy trees cpp

10.6 match 140 stars 9.64 score 328 scripts

alexkz

kernlab:Kernel-Based Machine Learning Lab

Kernel-based machine learning methods for classification, regression, clustering, novelty detection, quantile regression and dimensionality reduction. Among other methods 'kernlab' includes Support Vector Machines, Spectral Clustering, Kernel PCA, Gaussian Processes and a QP solver.

Maintained by Alexandros Karatzoglou. Last updated 7 months ago.

openblas cpp

8.3 match 21 stars 12.26 score 7.8k scripts 487 dependents

rkoenker

quantreg:Quantile Regression

Estimation and inference methods for models for conditional quantile functions: Linear and nonlinear parametric and non-parametric (total variation penalized) models for conditional quantiles of a univariate response and several methods for handling censored survival data. Portfolio selection methods based on expected shortfall risk are also now included. See Koenker, R. (2005) Quantile Regression, Cambridge U. Press, <doi:10.1017/CBO9780511754098> and Koenker, R. et al. (2017) Handbook of Quantile Regression, CRC Press, <doi:10.1201/9781315120256>.

Maintained by Roger Koenker. Last updated 6 days ago.

fortran openblas

7.0 match 18 stars 13.93 score 2.6k scripts 1.5k dependents

alanarnholt

BSDA:Basic Statistics and Data Analysis

Data sets for book "Basic Statistics and Data Analysis" by Larry J. Kitchens.

Maintained by Alan T. Arnholt. Last updated 2 years ago.

10.6 match 7 stars 9.11 score 1.3k scripts 6 dependents

talgalili

dendextend:Extending 'dendrogram' Functionality in R

Offers a set of functions for extending 'dendrogram' objects in R, letting you visualize and compare trees of 'hierarchical clusterings'. You can (1) Adjust a tree's graphical parameters - the color, size, type, etc of its branches, nodes and labels. (2) Visually and statistically compare different 'dendrograms' to one another.

Maintained by Tal Galili. Last updated 2 months ago.

5.6 match 154 stars 17.02 score 6.0k scripts 164 dependents

chrisaddy

rrr:Reduced-Rank Regression

Reduced-rank regression, diagnostics and graphics.

Maintained by Chris Addy. Last updated 8 years ago.

18.6 match 10 stars 5.06 score 23 scripts

bioc

clustifyr:Classifier for Single-cell RNA-seq Using Cell Clusters

Package designed to aid in classifying cells from single-cell RNA sequencing data using external reference data (e.g., bulk RNA-seq, scRNA-seq, microarray, gene lists). A variety of correlation based methods and gene list enrichment methods are provided to assist cell type assignment.

Maintained by Rui Fu. Last updated 5 months ago.

singlecell annotation sequencing microarray geneexpression assign-identities clusters marker-genes rna-seq single-cell-rna-seq

9.6 match 119 stars 9.63 score 296 scripts

microsoft

vivainsights:Analyze and Visualize Data from 'Microsoft Viva Insights'

Provides a versatile range of functions, including exploratory data analysis, time-series analysis, organizational network analysis, and data validation, while implementing a set of best practices in analyzing and visualizing data specific to 'Microsoft Viva Insights'.

Maintained by Martin Chan. Last updated 23 days ago.

15.0 match 11 stars 6.12 score 68 scripts

cran

mgcv:Mixed GAM Computation Vehicle with Automatic Smoothness Estimation

Generalized additive (mixed) models, some of their extensions and other generalized ridge regression with multiple smoothing parameter estimation by (Restricted) Marginal Likelihood, Generalized Cross Validation and similar, or using iterated nested Laplace approximation for fully Bayesian inference. See Wood (2017) <doi:10.1201/9781315370279> for an overview. Includes a gam() function, a wide variety of smoothers, 'JAGS' support and distributions beyond the exponential family.

Maintained by Simon Wood. Last updated 1 years ago.

openblas openmp

7.1 match 32 stars 12.71 score 17k scripts 7.8k dependents

bioc

ccfindR:Cancer Clone Finder

A collection of tools for cancer genomic data clustering analyses, including those for single cell RNA-seq. Cell clustering and feature gene selection analysis employ Bayesian (and maximum likelihood) non-negative matrix factorization (NMF) algorithm. Input data set consists of RNA count matrix, gene, and cell bar code annotations. Analysis outputs are factor matrices for multiple ranks and marginal likelihood values for each rank. The package includes utilities for downstream analyses, including meta-gene identification, visualization, and construction of rank-based trees for clusters.

Maintained by Jun Woo. Last updated 5 months ago.

transcriptomics singlecell immunooncology bayesian clustering gsl cpp

22.7 match 4.00 score 9 scripts

igraph

igraph:Network Analysis and Visualization

Routines for simple graphs and network analysis. It can handle large graphs very well and provides functions for generating random and regular graphs, graph visualization, centrality methods and much more.

Maintained by Kirill Müller. Last updated 2 days ago.

complex-networks graph-algorithms graph-theory mathematics network-analysis network-graph fortran libxml2 glpk openblas cpp

4.3 match 581 stars 21.10 score 31k scripts 1.9k dependents

jassler

socialranking:Social Ranking Solutions for Power Relations on Coalitions

The notion of power index has been widely used in the literature to evaluate the influence of individual players (e.g., voters, political parties, nations, stockholders, etc.) involved in a collective decision situation like an electoral system, a parliament, a council, a management board, etc., where players may form coalitions. Traditionally this ranking is determined through numerical evaluation. More often than not, however, only ordinal data between coalitions is known. The package 'socialranking' offers a set of solutions to rank players based on a transitive ranking between coalitions, including through CP-Majority, ordinal Banzhaf or lexicographic excellence solution summarized by Tahar Allouche, Bruno Escoffier, Stefano Moretti and Meltem Öztürk (2020, <doi:10.24963/ijcai.2020/3>).

Maintained by Felix Fritz. Last updated 2 days ago.

17.5 match 6 stars 5.08 score 6 scripts

vegandevs

vegan:Community Ecology Package

Ordination methods, diversity analysis and other functions for community and vegetation ecologists.

Maintained by Jari Oksanen. Last updated 16 days ago.

ecological-modelling ecology ordination fortran openblas

4.6 match 472 stars 19.41 score 15k scripts 440 dependents

matthieustigler

tsDyn:Nonlinear Time Series Models with Regime Switching

Implements nonlinear autoregressive (AR) time series models. For univariate series, a non-parametric approach is available through additive nonlinear AR. Parametric modeling and testing for regime switching dynamics is available when the transition is either direct (TAR: threshold AR) or smooth (STAR: smooth transition AR, LSTAR). For multivariate series, one can estimate a range of TVAR or threshold cointegration TVECM models with two or three regimes. Tests can be conducted for TVAR as well as for TVECM (Hansen and Seo 2002 and Seo 2006).

Maintained by Matthieu Stigler. Last updated 5 months ago.

8.3 match 34 stars 10.56 score 684 scripts 3 dependents

pievos101

TopKSignal:A Convex Optimization Tool for Signal Reconstruction from Multiple Ranked Lists

A mathematical optimization procedure in combination with statistical bootstrap for the estimation of the latent signals (sometimes called scores) informing the global consensus ranking (often named aggregation ranking). To solve mid/large-scale problems, users should install the 'gurobi' optimiser (available from <https://www.gurobi.com/>).

Maintained by Bastian Pfeifer. Last updated 1 years ago.

convex-optimization ranking-data signal-estimation

20.7 match 3 stars 4.18 score 1 scripts

r-hub

pkgsearch:Search and Query CRAN R Packages

Search CRAN metadata about packages by keyword, popularity, recent activity, package name and more. Uses the 'R-hub' search server, see <https://r-pkg.org> and the CRAN metadata database, that contains information about CRAN packages. Note that this is _not_ a CRAN project.

Maintained by Gábor Csárdi. Last updated 2 months ago.

ranking search-engine

10.0 match 109 stars 8.62 score 64 scripts 10 dependents

alexkowa

EnvStats:Package for Environmental Statistics, Including US EPA Guidance

Graphical and statistical analyses of environmental data, with focus on analyzing chemical concentrations and physical parameters, usually in the context of mandated environmental monitoring. Major environmental statistical methods found in the literature and regulatory guidance documents, with extensive help that explains what these methods do, how to use them, and where to find them in the literature. Numerous built-in data sets from regulatory guidance documents and environmental statistics literature. Includes scripts reproducing analyses presented in the book "EnvStats: An R Package for Environmental Statistics" (Millard, 2013, Springer, ISBN 978-1-4614-8455-4, <doi:10.1007/978-1-4614-8456-1>).

Maintained by Alexander Kowarik. Last updated 16 days ago.

6.6 match 26 stars 12.80 score 2.4k scripts 46 dependents

thomasp85

tidygraph:A Tidy API for Graph Manipulation

A graph, while not "tidy" in itself, can be thought of as two tidy data frames describing node and edge data respectively. 'tidygraph' provides an approach to manipulate these two virtual data frames using the API defined in the 'dplyr' package, as well as provides tidy interfaces to a lot of common graph algorithms.

Maintained by Thomas Lin Pedersen. Last updated 1 months ago.

graph-algorithms graph-manipulation igraph network-analysis tidyverse cpp

5.7 match 553 stars 14.74 score 4.6k scripts 136 dependents

ohyoo

rrpack:Reduced-Rank Regression

Multivariate regression methodologies including classical reduced-rank regression (RRR) studied by Anderson (1951) <doi:10.1214/aoms/1177729580> and Reinsel and Velu (1998) <doi:10.1007/978-1-4757-2853-8>, reduced-rank regression via adaptive nuclear norm penalization proposed by Chen et al. (2013) <doi:10.1093/biomet/ast036> and Mukherjee et al. (2015) <doi:10.1093/biomet/asx080>, robust reduced-rank regression (R4) proposed by She and Chen (2017) <doi:10.1093/biomet/asx032>, generalized/mixed-response reduced-rank regression (mRRR) proposed by Luo et al. (2018) <doi:10.1016/j.jmva.2018.04.011>, row-sparse reduced-rank regression (SRRR) proposed by Chen and Huang (2012) <doi:10.1080/01621459.2012.734178>, reduced-rank regression with a sparse singular value decomposition (RSSVD) proposed by Chen et al. (2012) <doi:10.1111/j.1467-9868.2011.01002.x> and sparse and orthogonal factor regression (SOFAR) proposed by Uematsu et al. (2019) <doi:10.1109/TIT.2019.2909889>.

Maintained by Kun Chen. Last updated 3 years ago.

openblas cpp openmp

28.0 match 2 stars 3.00 score 83 scripts 2 dependents

bioc

singscore:Rank-based single-sample gene set scoring method

A simple single-sample gene signature scoring method that uses rank-based statistics to analyze the sample's gene expression profile. It scores the expression activities of gene sets at a single-sample level.

Maintained by Malvika Kharbanda. Last updated 5 months ago.

software geneexpression genesetenrichment bioinformatics

8.3 match 41 stars 10.03 score 124 scripts 4 dependents

salvatoremangiafico

rcompanion:Functions to Support Extension Education Program Evaluation

Functions and datasets to support Summary and Analysis of Extension Program Evaluation in R, and An R Companion for the Handbook of Biological Statistics. Vignettes are available at <https://rcompanion.org>.

Maintained by Salvatore Mangiafico. Last updated 30 days ago.

10.2 match 4 stars 8.01 score 2.4k scripts 5 dependents

bioc

findIPs:Influential Points Detection for Feature Rankings

Feature rankings can be distorted by a single case in the context of high-dimensional data. Cases that exert abnormal influence on feature rankings are called influential points (IPs). The package aims at detecting IPs based on case deletion and quantifies their effects by measuring the rank changes (DOI:10.48550/arXiv.2303.10516). The package applies a novel rank-comparing measure using adaptive weights that stress the top-ranked important features and adjust the weights to ranking properties.

Maintained by Shuo Wang. Last updated 4 months ago.

geneexpression differentialexpression regression survival

17.7 match 4.60 score 4 scripts

rudeboybert

fivethirtyeight:Data and Code Behind the Stories and Interactives at 'FiveThirtyEight'

Datasets and code published by the data journalism website 'FiveThirtyEight' available at <https://github.com/fivethirtyeight/data>. Note that while we received guidance from editors at 'FiveThirtyEight', this package is not officially published by 'FiveThirtyEight'.

Maintained by Albert Y. Kim. Last updated 2 years ago.

data-science datajournalism fivethirtyeight statistics

7.4 match 453 stars 10.98 score 1.7k scripts

gdkrmr

coRanking:Co-Ranking Matrix

Calculates the co-ranking matrix to assess the quality of a dimensionality reduction.

Maintained by Guido Kraemer. Last updated 6 months ago.

dimensionality-reduction manifold-learning quality statistics unsupervised-learning cpp

14.7 match 9 stars 5.43 score 20 scripts 1 dependents

lindbrook

packageRank:Computation and Visualization of Package Download Counts and Percentile Ranks

Compute and visualize package download counts and percentile ranks from Posit/RStudio's CRAN mirror.

Maintained by lindbrook. Last updated 4 days ago.

bioconductor-packages

12.9 match 28 stars 6.13 score 27 scripts

tidyverse

tidyr:Tidy Messy Data

Tools to help to create tidy data, where each column is a variable, each row is an observation, and each cell contains a single value. 'tidyr' contains tools for changing the shape (pivoting) and hierarchy (nesting and 'unnesting') of a dataset, turning deeply nested lists into rectangular data frames ('rectangling'), and extracting values out of string columns. It also includes tools for working with missing values (both implicit and explicit).

Maintained by Hadley Wickham. Last updated 12 days ago.

tidy-data cpp

3.4 match 1.4k stars 22.88 score 168k scripts 5.5k dependents

t-kalinowski

keras:R Interface to 'Keras'

Interface to 'Keras' <https://keras.io>, a high-level neural networks 'API'. 'Keras' was developed with a focus on enabling fast experimentation, supports both convolution based networks and recurrent networks (as well as combinations of the two), and runs seamlessly on both 'CPU' and 'GPU' devices.

Maintained by Tomasz Kalinowski. Last updated 11 months ago.

7.2 match 10.82 score 10k scripts 54 dependents

bioc

ROSeq:Modeling expression ranks for noise-tolerant differential expression analysis of scRNA-Seq data

ROSeq - A rank based approach to modeling gene expression with filtered and normalized read count matrix. ROSeq takes filtered and normalized read matrix and cell-annotation/condition as input and determines the differentially expressed genes between the contrasting groups of single cells. One of the input parameters is the number of cores to be used.

Maintained by Krishan Gupta. Last updated 5 months ago.

geneexpression differentialexpression singlecell count-data gene-expression gene-expression-profiles normalization populations rank tmm tung tung-dataset tutorial vignette

17.5 match 2 stars 4.34 score 11 scripts

bioc

RcisTarget:RcisTarget Identify transcription factor binding motifs enriched on a list of genes or genomic regions

RcisTarget identifies transcription factor binding motifs (TFBS) over-represented on a gene list. In a first step, RcisTarget selects DNA motifs that are significantly over-represented in the surroundings of the transcription start site (TSS) of the genes in the gene-set. This is achieved by using a database that contains genome-wide cross-species rankings for each motif. The motifs are then annotated to TFs, and those that have a high Normalized Enrichment Score (NES) are retained. Finally, for each motif and gene-set, RcisTarget predicts the candidate target genes (i.e. genes in the gene-set that are ranked above the leading edge).

Maintained by Gert Hulselmans. Last updated 5 months ago.

generegulation motifannotation transcriptomics transcription genesetenrichment genetarget

7.9 match 37 stars 9.47 score 191 scripts

kaifenglu

lrstat:Power and Sample Size Calculation for Non-Proportional Hazards and Beyond

Performs power and sample size calculation for non-proportional hazards model using the Fleming-Harrington family of weighted log-rank tests. The sequentially calculated log-rank test score statistics are assumed to have independent increments as characterized in Anastasios A. Tsiatis (1982) <doi:10.1080/01621459.1982.10477898>. The mean and variance of log-rank test score statistics are calculated based on Kaifeng Lu (2021) <doi:10.1002/pst.2069>. The boundary crossing probabilities are calculated using the recursive integration algorithm described in Christopher Jennison and Bruce W. Turnbull (2000, ISBN:0849303168). The package can also be used for continuous, binary, and count data. For continuous data, it can handle missing data through mixed-model for repeated measures (MMRM). In crossover designs, it can estimate direct treatment effects while accounting for carryover effects. For binary data, it can design Simon's 2-stage, modified toxicity probability-2 (mTPI-2), and Bayesian optimal interval (BOIN) trials. For count data, it can design group sequential trials for negative binomial endpoints with censoring. Additionally, it facilitates group sequential equivalence trials for all supported data types. Moreover, it can design adaptive group sequential trials for changes in sample size, error spending function, number and spacing of future looks. Finally, it offers various options for adjusted p-values, including graphical and gatekeeping procedures.

Maintained by Kaifeng Lu. Last updated 3 months ago.

cpp

13.4 match 2 stars 5.58 score 30 scripts

jranke

mkin:Kinetic Evaluation of Chemical Degradation Data

Calculation routines based on the FOCUS Kinetics Report (2006, 2014). Includes a function for conveniently defining differential equation models, model solution based on eigenvalues if possible or using numerical solvers. If a C compiler (on windows: 'Rtools') is installed, differential equation models are solved using automatically generated C functions. Non-constant errors can be taken into account using variance by variable or two-component error models <doi:10.3390/environments6120124>. Hierarchical degradation models can be fitted using nonlinear mixed-effects model packages as a back end <doi:10.3390/environments8080071>. Please note that no warranty is implied for correctness of results or fitness for a particular purpose.

Maintained by Johannes Ranke. Last updated 30 days ago.

degradation focus-kinetics kinetic-models kinetics ode ode-model

9.0 match 11 stars 8.18 score 78 scripts 1 dependents

r-forge

TopKLists:Inference, Aggregation and Visualization for Top-K Ranked Lists

For multiple ranked input lists (full or partial) representing the same set of N objects, the package TopKLists offers (1) statistical inference on the lengths of informative top-k lists, (2) stochastic aggregation of full or partial lists, and (3) graphical tools for the statistical exploration of input lists, and for the visualization of aggregation results.

Maintained by Michael G. Schimek. Last updated 9 years ago.

18.3 match 4.05 score 37 scripts 1 dependents

bioc

scran:Methods for Single-Cell RNA-Seq Data Analysis

Implements miscellaneous functions for interpretation of single-cell RNA-seq data. Methods are provided for assignment of cell cycle phase, detection of highly variable and significantly correlated genes, identification of marker genes, and other common tasks in routine single-cell analysis workflows.

Maintained by Aaron Lun. Last updated 5 months ago.

immunooncology normalization sequencing rnaseq software geneexpression transcriptomics singlecell clustering bioconductor-package human-cell-atlas single-cell-rna-seq openblas cpp

5.6 match 41 stars 13.14 score 7.6k scripts 36 dependents

bioc

limma:Linear Models for Microarray and Omics Data

Data analysis, linear models and differential expression for omics data.

Maintained by Gordon Smyth. Last updated 5 days ago.

exonarray geneexpression transcription alternativesplicing differentialexpression differentialsplicing genesetenrichment dataimport bayesian clustering regression timecourse microarray micrornaarray mrnamicroarray onechannel proprietaryplatforms twochannel sequencing rnaseq batcheffect multiplecomparison normalization preprocessing qualitycontrol biomedicalinformatics cellbiology cheminformatics epigenetics functionalgenomics genetics immunooncology metabolomics proteomics systemsbiology transcriptomics

5.3 match 13.81 score 16k scripts 585 dependents

guido-s

netmeta:Network Meta-Analysis using Frequentist Methods

A comprehensive set of functions providing frequentist methods for network meta-analysis (Balduzzi et al., 2023) <doi:10.18637/jss.v106.i02> and supporting Schwarzer et al. (2015) <doi:10.1007/978-3-319-21416-0>, Chapter 8 "Network Meta-Analysis": - frequentist network meta-analysis following Rücker (2012) <doi:10.1002/jrsm.1058>; - additive network meta-analysis for combinations of treatments (Rücker et al., 2020) <doi:10.1002/bimj.201800167>; - network meta-analysis of binary data using the Mantel-Haenszel or non-central hypergeometric distribution method (Efthimiou et al., 2019) <doi:10.1002/sim.8158>, or penalised logistic regression (Evrenoglou et al., 2022) <doi:10.1002/sim.9562>; - rankograms and ranking of treatments by the Surface under the cumulative ranking curve (SUCRA) (Salanti et al., 2013) <doi:10.1016/j.jclinepi.2010.03.016>; - ranking of treatments using P-scores (frequentist analogue of SUCRAs without resampling) according to Rücker & Schwarzer (2015) <doi:10.1186/s12874-015-0060-8>; - split direct and indirect evidence to check consistency (Dias et al., 2010) <doi:10.1002/sim.3767>, (Efthimiou et al., 2019) <doi:10.1002/sim.8158>; - league table with network meta-analysis results; - 'comparison-adjusted' funnel plot (Chaimani & Salanti, 2012) <doi:10.1002/jrsm.57>; - net heat plot and design-based decomposition of Cochran's Q according to Krahn et al. (2013) <doi:10.1186/1471-2288-13-35>; - measures characterizing the flow of evidence between two treatments by König et al. (2013) <doi:10.1002/sim.6001>; - automated drawing of network graphs described in Rücker & Schwarzer (2016) <doi:10.1002/jrsm.1143>; - partial order of treatment rankings ('poset') and Hasse diagram for 'poset' (Carlsen & Bruggemann, 2014) <doi:10.1002/cem.2569>; (Rücker & Schwarzer, 2017) <doi:10.1002/jrsm.1270>; - contribution matrix as described in Papakonstantinou et al. (2018) <doi:10.12688/f1000research.14770.3> and Davies et al. (2022) <doi:10.1002/sim.9346>; - subgroup network meta-analysis.

Maintained by Guido Schwarzer. Last updated 2 days ago.

meta-analysis network-meta-analysis rstudio

6.2 match 33 stars 11.82 score 199 scripts 10 dependents

bioc

BiocGenerics:S4 generic functions used in Bioconductor

The package defines many S4 generic functions used in Bioconductor.

Maintained by Hervé Pagès. Last updated 1 months ago.

infrastructure bioconductor-package core-package

5.1 match 12 stars 14.22 score 612 scripts 2.2k dependents

finyang

RRRR:Online Robust Reduced-Rank Regression Estimation

Methods for estimating online robust reduced-rank regression. The Gaussian maximum likelihood estimation method is described in Johansen, S. (1991) <doi:10.2307/2938278>. The majorisation-minimisation estimation method is partly described in Zhao, Z., & Palomar, D. P. (2017) <doi:10.1109/GlobalSIP.2017.8309093>. The description of the generic stochastic successive upper-bound minimisation method and the sample average approximation can be found in Razaviyayn, M., Sanjabi, M., & Luo, Z. Q. (2016) <doi:10.1007/s10107-016-1021-7>.

Maintained by Yangzhuoran Fin Yang. Last updated 2 years ago.

17.4 match 3 stars 4.18 score 10 scripts

abichat

yatah:Yet Another TAxonomy Handler

Provides functions to manage taxonomy when lineages are described with strings and ranks separated with special patterns like "|*__" or ";*__".

Maintained by Antoine Bichat. Last updated 11 months ago.

clades taxonomy

15.6 match 6 stars 4.65 score 15 scripts

ffverse

ffsimulator:Simulate Fantasy Football Seasons

Uses bootstrap resampling to run fantasy football season simulations supported by historical rankings and 'nflfastR' data, calculating optimal lineups, and returning aggregated results.

Maintained by Tan Ho. Last updated 5 months ago.

fantasy-football simulation

13.7 match 17 stars 5.17 score 44 scripts

cran

Rfit:Rank-Based Estimation for Linear Models

Rank-based (R) estimation and inference for linear models. Estimation is for general scores and a library of commonly used score functions is included.

Maintained by John Kloke. Last updated 10 months ago.

fortran

16.0 match 4.35 score 9 dependents

hwborchers

pracma:Practical Numerical Math Functions

Provides a large number of functions from numerical analysis and linear algebra, numerical optimization, differential equations, time series, plus some well-known special mathematical functions. Uses 'MATLAB' function names where appropriate to simplify porting.

Maintained by Hans W. Borchers. Last updated 1 years ago.

5.6 match 29 stars 12.34 score 6.6k scripts 931 dependents

bioc

NetPathMiner:NetPathMiner for Biological Network Construction, Path Mining and Visualization

NetPathMiner is a general framework for network path mining using genome-scale networks. It constructs networks from KGML, SBML and BioPAX files, providing three network representations, metabolic, reaction and gene representations. NetPathMiner finds active paths and applies machine learning methods to summarize found paths for easy interpretation. It also provides static and interactive visualizations of networks and paths to aid manual investigation.

Maintained by Ahmed Mohamed. Last updated 4 months ago.

graphandnetwork pathways network clustering classification libsbml libxml2 openblas cpp

10.4 match 9 stars 6.56 score 9 scripts

robinhankin

hyper2:The Hyperdirichlet Distribution, Mark 2

A suite of routines for the hyperdirichlet distribution and reified Bradley-Terry; supersedes the 'hyperdirichlet' package; uses 'disordR' discipline <doi:10.48550/ARXIV.2210.03856>. To cite in publications please use Hankin 2017 <doi:10.32614/rj-2017-061>, and for Generalized Plackett-Luce likelihoods use Hankin 2024 <doi:10.18637/jss.v109.i08>.

Maintained by Robin K. S. Hankin. Last updated 3 days ago.

cpp

11.3 match 5 stars 6.01 score 38 scripts 1 dependents

civilstat

RankingProject:The Ranking Project: Visualizations for Comparing Populations

Functions to generate plots and tables for comparing independently-sampled populations. Companion package to "A Primer on Visualizations for Comparing Populations, Including the Issue of Overlapping Confidence Intervals" by Wright, Klein, and Wieczorek (2019) <DOI:10.1080/00031305.2017.1392359> and "A Joint Confidence Region for an Overall Ranking of Populations" by Klein, Wright, and Wieczorek (2020) <DOI:10.1111/rssc.12402>.

Maintained by Jerzy Wieczorek. Last updated 3 years ago.

13.4 match 7 stars 5.02 score 10 scripts

r-forge

mlogit:Multinomial Logit Models

Maximum Likelihood estimation of random utility discrete choice models, as described in Kenneth Train (2009) Discrete Choice Methods with Simulation <doi:10.1017/CBO9780511805271>.

Maintained by Yves Croissant. Last updated 5 years ago.

6.9 match 9.81 score 1.2k scripts 14 dependents

cmollica

PLMIX:Bayesian Analysis of Finite Mixture of Plackett-Luce Models

Fit finite mixtures of Plackett-Luce models for partial top rankings/orderings within the Bayesian framework. It provides MAP point estimates via the EM algorithm and posterior MCMC simulations via Gibbs Sampling. It also fits MLE as a special case of the noninformative Bayesian analysis with vague priors. In addition to inferential techniques, the package assists other fundamental phases of a model-based analysis for partial rankings/orderings, by including functions for data manipulation, simulation, descriptive summary, model selection and goodness-of-fit evaluation. Main references on the methods are Mollica and Tardella (2017) <doi:10.1007/s11336-016-9530-0> and Mollica and Tardella (2014) <doi:10.1002/sim.6224>.

Maintained by Cristina Mollica. Last updated 4 years ago.

cpp

21.4 match 3.15 score 28 scripts

rfastofficial

Rfast:A Collection of Efficient and Extremely Fast R Functions

A collection of fast (utility) functions for data analysis. Column and row wise means, medians, variances, minimums, maximums, many t, F and G-square tests, many regressions (normal, logistic, Poisson), are some of the many fast functions. References: a) Tsagris M., Papadakis M. (2018). Taking R to its limits: 70+ tips. PeerJ Preprints 6:e26605v1 <doi:10.7287/peerj.preprints.26605v1>. b) Tsagris M. and Papadakis M. (2018). Forward regression in R: from the extreme slow to the extreme fast. Journal of Data Science, 16(4): 771--780. <doi:10.6339/JDS.201810_16(4).00006>. c) Chatzipantsiou C., Dimitriadis M., Papadakis M. and Tsagris M. (2020). Extremely Efficient Permutation and Bootstrap Hypothesis Tests Using R. Journal of Modern Applied Statistical Methods, 18(2), eP2898. <doi:10.48550/arXiv.1806.10947>. d) Tsagris M., Papadakis M., Alenazi A. and Alzeley O. (2024). Computationally Efficient Outlier Detection for High-Dimensional Data Using the MDP Algorithm. Computation, 12(9): 185. <doi:10.3390/computation12090185>. e) Tsagris M. and Papadakis M. (2025). Fast and light-weight energy statistics using the R package Rfast. <doi:10.48550/arXiv.2501.02849>.

Maintained by Manos Papadakis. Last updated 17 days ago.

openblas cpp openmp

5.2 match 147 stars 12.54 score 1.2k scripts 166 dependents

rivolli

utiml:Utilities for Multi-Label Learning

Multi-label learning strategies and other procedures to support multi-label classification in R. The package provides a set of multi-label procedures such as sampling methods, transformation strategies, threshold functions, pre-processing techniques and evaluation metrics. A complete overview of the matter can be seen in Zhang, M. and Zhou, Z. (2014) <doi:10.1109/TKDE.2013.39> and Gibaja, E. and Ventura, S. (2015) A Tutorial on Multi-label Learning.

Maintained by Adriano Rivolli. Last updated 4 years ago.

10.1 match 28 stars 6.39 score 87 scripts

mlverse

torch:Tensors and Neural Networks with 'GPU' Acceleration

Provides functionality to define and train neural networks similar to 'PyTorch' by Paszke et al (2019) <doi:10.48550/arXiv.1912.01703> but written entirely in R using the 'libtorch' library. Also supports low-level tensor operations and 'GPU' acceleration.

Maintained by Daniel Falbel. Last updated 5 days ago.

autograd deep-learning torch cpp

3.9 match 520 stars 16.52 score 1.4k scripts 38 dependents

kjhealy

gssrdoc:Document General Social Survey Variable

The General Social Survey (GSS) is a long-running, mostly annual survey of US households. It is administered by the National Opinion Research Center (NORC). This package contains a tibble with information on the survey variables, together with every variable documented as an R help page. For more information on the GSS see <http://gss.norc.org>.

Maintained by Kieran Healy. Last updated 11 months ago.

27.9 match 2.28 score 38 scripts

lineupjs

lineupjs:'HTMLWidget' Wrapper of 'LineUp' for Visual Analysis of Multi-Attribute Rankings

'LineUp' is an interactive technique designed to create, visualize and explore rankings of items based on a set of heterogeneous attributes. This is a 'htmlwidget' wrapper around the JavaScript library 'LineUp.js'. It is designed to be used in 'R Shiny' apps and 'R Markdown' files. Due to an outdated 'webkit' version of 'RStudio' it won't work in the integrated viewer.

Maintained by Samuel Gratzl. Last updated 3 years ago.

crosstalk htmlwidget htmlwidgets lineup ranking shiny

13.3 match 55 stars 4.76 score 21 scripts

cran

ICSNP:Tools for Multivariate Nonparametrics

Tools for multivariate nonparametrics, such as location tests based on marginal ranks, spatial median and spatial signs computation, Hotelling's T-test, and estimates of shape.

Maintained by Klaus Nordhausen. Last updated 1 years ago.

cpp

16.5 match 3.84 score 14 dependents

bioc

mixOmics:Omics Data Integration Project

Multivariate methods are well suited to large omics data sets where the number of variables (e.g. genes, proteins, metabolites) is much larger than the number of samples (patients, cells, mice). They have the appealing properties of reducing the dimension of the data by using instrumental variables (components), which are defined as combinations of all variables. Those components are then used to produce useful graphical outputs that enable better understanding of the relationships and correlation structures between the different data sets that are integrated. mixOmics offers a wide range of multivariate methods for the exploration and integration of biological datasets with a particular focus on variable selection. The package proposes several sparse multivariate models we have developed to identify the key variables that are highly correlated, and/or explain the biological outcome of interest. The data that can be analysed with mixOmics may come from high throughput sequencing technologies, such as omics data (transcriptomics, metabolomics, proteomics, metagenomics etc) but also beyond the realm of omics (e.g. spectral imaging). The methods implemented in mixOmics can also handle missing values without having to delete entire rows with missing data. A non-exhaustive list of methods includes variants of generalised Canonical Correlation Analysis, sparse Partial Least Squares and sparse Discriminant Analysis. Recently we implemented integrative methods to combine multiple data sets: N-integration with variants of Generalised Canonical Correlation Analysis and P-integration with variants of multi-group Partial Least Squares.

Maintained by Eva Hamrud. Last updated 3 days ago.

immunooncology microarray sequencing metabolomics metagenomics proteomics geneprediction multiplecomparison classification regression bioconductor genomics genomics-data genomics-visualization multivariate-analysis multivariate-statistics omics r-pkg r-project

4.5 match 182 stars 13.71 score 1.3k scripts 22 dependents

r-forge

TailRank:The Tail-Rank Statistic

Implements the tail-rank statistic for selecting biomarkers from a microarray data set, an efficient nonparametric test focused on the distributional tails. See <https://gitlab.com/krcoombes/coombeslab/-/blob/master/doc/papers/tolstoy-new.pdf>.

Maintained by Kevin R. Coombes. Last updated 1 months ago.

9.5 match 6.38 score 37 scripts 3 dependents

ubod

rococo:Robust Rank Correlation Coefficient and Test

Provides the robust gamma rank correlation coefficient as introduced by Bodenhofer, Krone, and Klawonn (2013) <DOI:10.1016/j.ins.2012.11.026> along with a permutation-based rank correlation test. The rank correlation coefficient and the test are explicitly designed for dealing with noisy numerical data.

Maintained by Ulrich Bodenhofer. Last updated 11 months ago.

cpp

14.1 match 4.32 score 21 scripts

kurthornik

relations:Data Structures and Algorithms for Relations

Data structures and algorithms for k-ary relations with arbitrary domains, featuring relational algebra, predicate functions, and fitters for consensus relations.

Maintained by Kurt Hornik. Last updated 25 days ago.

12.0 match 5.04 score 58 scripts 14 dependents

aebilgrau

GMCM:Fast Estimation of Gaussian Mixture Copula Models

Unsupervised Clustering and Meta-analysis using Gaussian Mixture Copula Models.

Maintained by Anders Ellern Bilgrau. Last updated 3 years ago.

clustering gaussian-mixture-models meta-analysis rank unsupervised-cluster-analysis openblas cpp

13.0 match 15 stars 4.62 score 56 scripts

bioc

msImpute:Imputation of label-free mass spectrometry peptides

MsImpute is a package for imputation of peptide intensity in proteomics experiments. It additionally contains tools for MAR/MNAR diagnosis and assessment of distortions to the probability distribution of the data post imputation. The missing values are imputed by low-rank approximation of the underlying data matrix if they are MAR (method = "v2"), by Barycenter approach if missingness is MNAR ("v2-mnar"), or by Peptide Identity Propagation (PIP).

Maintained by Soroor Hediyeh-zadeh. Last updated 5 months ago.

massspectrometry proteomics software label-free-proteomics low-rank-approximation

11.7 match 14 stars 5.15 score 7 scripts

merck

gsDesign2:Group Sequential Design with Non-Constant Effect

The goal of 'gsDesign2' is to enable fixed or group sequential design under non-proportional hazards. To enable highly flexible enrollment, time-to-event and time-to-dropout assumptions, 'gsDesign2' offers piecewise constant enrollment, failure rates, and dropout rates for a stratified population. This package includes three methods for designs: average hazard ratio, weighted logrank tests in Yung and Liu (2019) <doi:10.1111/biom.13196>, and MaxCombo tests. Substantial flexibility on top of what is in the 'gsDesign' package is intended for selecting boundaries.

Maintained by Yujie Zhao. Last updated 2 days ago.

cpp

6.7 match 22 stars 8.91 score 186 scripts

esteban-alfaro

adabag:Applies Multiclass AdaBoost.M1, SAMME and Bagging

It implements Freund and Schapire's Adaboost.M1 algorithm and Breiman's Bagging algorithm using classification trees as individual classifiers. Once these classifiers have been trained, they can be used to predict on new data. Also, cross validation estimation of the error can be done. Since version 2.0 the function margins() is available to calculate the margins for these classifiers. Greater flexibility is also achieved by giving access to the rpart.control() argument of 'rpart'. Four important new features were introduced in version 3.0: AdaBoost-SAMME (Zhu et al., 2009) is implemented and a new function errorevol() shows the error of the ensembles as a function of the number of iterations. In addition, the ensembles can be pruned using the option 'newmfinal' in the predict.bagging() and predict.boosting() functions and the posterior probability of each class for observations can be obtained. Version 3.1 modifies the relative importance measure to take into account the gain of the Gini index given by a variable in each tree and the weights of these trees. Version 4.0 includes the margin-based ordered aggregation for Bagging pruning (Guo and Boukir, 2013) and a function to auto prune the 'rpart' tree. Moreover, three new plots are also available: importanceplot(), plot.errorevol() and plot.margins(). Version 4.1 allows prediction on unlabeled data. Version 4.2 includes the parallel computation option for some of the functions. Version 5.0 includes the Boosting and Bagging algorithms for label ranking (Albano, Sciandra and Plaia, 2023).

Maintained by Esteban Alfaro. Last updated 2 years ago.

9.6 match 5 stars 6.27 score 720 scripts 6 dependents

ludovikcoba

rrecsys:Environment for Evaluating Recommender Systems

Processes standard recommendation datasets (e.g., a user-item rating matrix) as input and generates rating predictions and lists of recommended items. Standard algorithm implementations which are included in this package are the following: Global/Item/User-Average baselines, Weighted Slope One, Item-Based KNN, User-Based KNN, FunkSVD, BPR and weighted ALS. They can be assessed according to the standard offline evaluation methodology (Shani, et al. (2011) <doi:10.1007/978-0-387-85820-3_8>) for recommender systems using measures such as MAE, RMSE, Precision, Recall, F1, AUC, NDCG, RankScore and coverage measures. The package (Coba, et al. (2017) <doi:10.1007/978-3-319-60042-0_36>) is intended for rapid prototyping of recommendation algorithms and education purposes.

Maintained by Ludovik Çoba. Last updated 3 years ago.

cpp

8.7 match 23 stars 6.84 score 25 scripts

zrmacc

RNOmni:Rank Normal Transformation Omnibus Test

Inverse normal transformation (INT) based genetic association testing. These tests are recommended for continuous traits with non-normally distributed residuals. INT-based tests robustly control the type I error in settings where standard linear regression does not, as when the residual distribution exhibits excess skew or kurtosis. Moreover, INT-based tests outperform standard linear regression in terms of power. These tests may be classified into two types. In direct INT (D-INT), the phenotype is itself transformed. In indirect INT (I-INT), phenotypic residuals are transformed. The omnibus test (O-INT) adaptively combines D-INT and I-INT into a single robust and statistically powerful approach. See McCaw ZR, Lane JM, Saxena R, Redline S, Lin X. "Operating characteristics of the rank-based inverse normal transformation for quantitative trait analysis in genome-wide association studies" <doi:10.1111/biom.13214>.

Maintained by Zachary McCaw. Last updated 4 months ago.

openblas cpp

8.8 match 6.80 score 303 scripts 3 dependents

jranke

chemCal:Calibration Functions for Analytical Chemistry

Simple functions for plotting linear calibration functions and estimating standard errors for measurements according to the Handbook of Chemometrics and Qualimetrics: Part A by Massart et al. (1997). There are also functions for estimating the limit of detection (LOD) and limit of quantification (LOQ). The functions work on model objects from (optionally weighted) linear regression (lm) or robust linear regression ('rlm' from the 'MASS' package).

Maintained by Johannes Ranke. Last updated 2 months ago.

9.1 match 6 stars 6.52 score 55 scripts

tvganesh

yorkr:Analyze Cricket Performances Based on Data from Cricsheet

Analyzing performances of cricketers and cricket teams based on 'yaml' match data from Cricsheet <https://cricsheet.org/>.

Maintained by Tinniam V Ganesh. Last updated 2 years ago.

11.8 match 17 stars 5.00 score 118 scripts

thothorn

exactRankTests:Exact Distributions for Rank and Permutation Tests

Computes exact conditional p-values and quantiles using an implementation of the Shift-Algorithm by Streitberg & Roehmel.

Maintained by Torsten Hothorn. Last updated 3 years ago.

8.3 match 1 stars 7.13 score 276 scripts 65 dependents

bioc

Biostrings:Efficient manipulation of biological strings

Memory efficient string containers, string matching algorithms, and other utilities, for fast manipulation of large biological sequences or sets of sequences.

Maintained by Hervé Pagès. Last updated 23 days ago.

sequencematching alignment sequencing genetics dataimport datarepresentation infrastructure bioconductor-package core-package

3.3 match 61 stars 17.83 score 8.6k scripts 1.2k dependents

bioc

GenomicRanges:Representation and manipulation of genomic intervals

The ability to efficiently represent and manipulate genomic annotations and alignments is playing a central role when it comes to analyzing high-throughput sequencing data (a.k.a. NGS data). The GenomicRanges package defines general purpose containers for storing and manipulating genomic intervals and variables defined along a genome. More specialized containers for representing and manipulating short alignments against a reference genome, or a matrix-like summarization of an experiment, are defined in the GenomicAlignments and SummarizedExperiment packages, respectively. Both packages build on top of the GenomicRanges infrastructure.

Maintained by Hervé Pagès. Last updated 4 months ago.

genetics infrastructure datarepresentation sequencing annotation genomeannotation coverage bioconductor-package core-package

3.3 match 44 stars 17.75 score 13k scripts 1.3k dependents

easystats

effectsize:Indices of Effect Size

Provide utilities to work with indices of effect size for a wide variety of models and hypothesis tests (see list of supported models using the function 'insight::supported_models()'), allowing computation of and conversion between indices such as Cohen's d, r, odds, etc. References: Ben-Shachar et al. (2020) <doi:10.21105/joss.02815>.

Maintained by Mattan S. Ben-Shachar. Last updated 1 months ago.

anova cohens-d compute conversion correlation effect-size effectsize hacktoberfest hedges-g interpretation standardization standardized statistics

3.5 match 344 stars 16.38 score 1.8k scripts 29 dependents

n8thangreen

BCEA:Bayesian Cost Effectiveness Analysis

Produces an economic evaluation of a sample of suitable variables of cost and effectiveness / utility for two or more interventions, e.g. from a Bayesian model in the form of MCMC simulations. This package computes the most cost-effective alternative and produces graphical summaries and probabilistic sensitivity analysis, see Baio et al (2017) <doi:10.1007/978-3-319-55718-2>.

Maintained by Gianluca Baio. Last updated 2 months ago.

bayesian cost-effectiveness

5.8 match 3 stars 9.90 score 243 scripts 3 dependents

agroscope-ch

srppp:Read the Swiss Register of Plant Protection Products

Generate data objects from XML versions of the Swiss Register of Plant Protection Products. An online version of the register can be accessed at <https://www.psm.admin.ch/de/produkte>. There is no guarantee of correspondence of the data read in using this package with that online version, or with the original registration documents. Also, the Federal Food Safety and Veterinary Office, coordinating the authorisation of plant protection products in Switzerland, does not answer requests regarding this package.

Maintained by Johannes Ranke. Last updated 24 days ago.

9.0 match 6.31 score 20 scripts 1 dependents

cmmr

rbiom:Read/Write, Analyze, and Visualize 'BIOM' Data

A toolkit for working with Biological Observation Matrix ('BIOM') files. Read/write all 'BIOM' formats. Compute rarefaction, alpha diversity, and beta diversity (including 'UniFrac'). Summarize counts by taxonomic level. Subset based on metadata. Generate visualizations and statistical analyses. CPU intensive operations are coded in C for speed.

Maintained by Daniel P. Smith. Last updated 6 days ago.

6.2 match 15 stars 9.02 score 117 scripts 6 dependents

wenjie2wang

clusrank:Wilcoxon Rank Tests for Clustered Data

Non-parametric tests (Wilcoxon rank sum test and Wilcoxon signed rank test) for clustered data documented in Jiang et al. (2020) <doi:10.18637/jss.v096.i06>.

Maintained by Wenjie Wang. Last updated 1 years ago.

cpp

12.7 match 3 stars 4.39 score 17 scripts

schochastics

networkdata:Repository of Network Datasets

The package contains a large collection of network datasets from different contexts. This includes social networks, animal networks and movie networks. All datasets are in 'igraph' format.

Maintained by David Schoch. Last updated 12 months ago.

dataset network-analysis

11.1 match 143 stars 5.01 score 143 scripts

bioc

SummarizedExperiment:A container (S4 class) for matrix-like assays

The SummarizedExperiment container contains one or more assays, each represented by a matrix-like object of numeric or other mode. The rows typically represent genomic ranges of interest and the columns represent samples.

Maintained by Hervé Pagès. Last updated 5 months ago.

genetics infrastructure sequencing annotation coverage genomeannotation bioconductor-package core-package

3.3 match 34 stars 16.85 score 8.6k scripts 1.2k dependents

gagolews

stringi:Fast and Portable Character String Processing Facilities

A collection of character string/text/natural language processing tools for pattern searching (e.g., with 'Java'-like regular expressions or the 'Unicode' collation algorithm), random string generation, case mapping, string transliteration, concatenation, sorting, padding, wrapping, Unicode normalisation, date-time formatting and parsing, and many more. They are fast, consistent, convenient, and - thanks to 'ICU' (International Components for Unicode) - portable across all locales and platforms. Documentation about 'stringi' is provided via its website at <https://stringi.gagolewski.com/> and the paper by Gagolewski (2022, <doi:10.18637/jss.v103.i02>).

Maintained by Marek Gagolewski. Last updated 1 months ago.

icu icu4c natural-language-processing nlp regex regexp string-manipulation stringi stringr text text-processing tidy-data unicode cpp

3.0 match 309 stars 18.31 score 10k scripts 8.6k dependents

echasnovski

comperank:Ranking Methods for Competition Results

Compute ranking and rating based on competition results. Methods of different nature are implemented: with fixed Head-to-Head structure, with variable Head-to-Head structure and with iterative nature. All algorithms are taken from the book 'Who’s #1?: The science of rating and ranking' by Amy N. Langville and Carl D. Meyer (2012, ISBN:978-0-691-15422-0).

Maintained by Evgeni Chasnovski. Last updated 2 years ago.

cpp

9.7 match 24 stars 5.65 score 37 scripts

cmollica

MSmix:Finite Mixtures of Mallows Models with Spearman Distance for Full and Partial Rankings

Fit and analysis of finite Mixtures of Mallows models with Spearman Distance for full and partial rankings with arbitrary missing positions. Inference is conducted within the maximum likelihood framework via Expectation-Maximization algorithms. Estimation uncertainty is tackled via diverse versions of bootstrapping as well as via Hessian-based standard errors calculations. The most relevant reference of the methods is Crispino, Mollica, Astuti and Tardella (2023) <doi:10.1007/s11222-023-10266-8>.

Maintained by Cristina Mollica. Last updated 9 months ago.

cpp

36.9 match 1.48 score

tim-tu

weibulltools:Statistical Methods for Life Data Analysis

Provides statistical methods and visualizations that are often used in reliability engineering. Comprises a compact and easily accessible set of methods and visualization tools that make the examination and adjustment as well as the analysis and interpretation of field data (and bench tests) as simple as possible. Non-parametric estimators like Median Ranks, Kaplan-Meier (Abernethy, 2006, <ISBN:978-0-9653062-3-2>), Johnson (Johnson, 1964, <ISBN:978-0444403223>), and Nelson-Aalen for failure probability estimation within samples that contain failures as well as censored data are included. The package supports methods like Maximum Likelihood and Rank Regression (Genschel and Meeker, 2010, <DOI:10.1080/08982112.2010.503447>) for the estimation of multiple parametric lifetime distributions, as well as the computation of confidence intervals of quantiles and probabilities using the delta method related to Fisher's confidence intervals (Meeker and Escobar, 1998, <ISBN:9780471673279>) and the beta-binomial confidence bounds. If desired, mixture model analysis can be done with segmented regression and the EM algorithm. Besides the well-known Weibull analysis, the package also contains Monte Carlo methods for the correction and completion of imprecisely recorded or unknown lifetime characteristics (Verband der Automobilindustrie e.V. (VDA), 2016, <ISSN:0943-9412>). Plots are created statically ('ggplot2') or interactively ('plotly') and can be customized with functions of the respective visualization package. The graphical technique of probability plotting as well as the addition of regression lines and confidence bounds to existing plots are supported.

Maintained by Tim-Gunnar Hensel. Last updated 2 years ago.

field-data-analysis interactive-visualizations plotly reliability-analysis weibull-analysis weibulltools openblas cpp

8.8 match 13 stars 6.15 score 54 scripts

myllym

GET:Global Envelopes

Implementation of global envelopes for a set of general d-dimensional vectors T in various applications. A 100(1-alpha)% global envelope is a band bounded by two vectors such that the probability that T falls outside this envelope in any of the d points is equal to alpha. Global means that the probability is controlled simultaneously for all the d elements of the vectors. The global envelopes can be used for graphical Monte Carlo and permutation tests where the test statistic is a multivariate vector or function (e.g. goodness-of-fit testing for point patterns and random sets, functional analysis of variance, functional general linear model, n-sample test of correspondence of distribution functions), for central regions of functional or multivariate data (e.g. outlier detection, functional boxplot) and for global confidence and prediction bands (e.g. confidence band in polynomial regression, Bayesian posterior prediction). See Myllymäki and Mrkvička (2024) <doi:10.18637/jss.v111.i03>, Myllymäki et al. (2017) <doi:10.1111/rssb.12172>, Mrkvička and Myllymäki (2023) <doi:10.1007/s11222-023-10275-7>, Mrkvička et al. (2016) <doi:10.1016/j.spasta.2016.04.005>, Mrkvička et al. (2017) <doi:10.1007/s11222-016-9683-9>, Mrkvička et al. (2020) <doi:10.14736/kyb-2020-3-0432>, Mrkvička et al. (2021) <doi:10.1007/s11009-019-09756-y>, Myllymäki et al. (2021) <doi:10.1016/j.spasta.2020.100436>, Mrkvička et al. (2022) <doi:10.1002/sim.9236>, Dai et al. (2022) <doi:10.5772/intechopen.100124>, Dvořák and Mrkvička (2022) <doi:10.1007/s00180-021-01134-y>, Mrkvička et al. (2023) <doi:10.48550/arXiv.2309.04746>, and Konstantinou et al. (2024) <doi:10.1007/s00180-024-01569-z>.

Maintained by Mari Myllymäki. Last updated 4 months ago.

5.8 match 11 stars 9.33 score 46 scripts 5 dependents

bioc

S4Vectors:Foundation of vector-like and list-like containers in Bioconductor

The S4Vectors package defines the Vector and List virtual classes and a set of generic functions that extend the semantics of ordinary vectors and lists in R. Package developers can easily implement vector-like or list-like objects as concrete subclasses of Vector or List. In addition, a few low-level concrete subclasses of general interest (e.g. DataFrame, Rle, Factor, and Hits) are implemented in the S4Vectors package itself (many more are implemented in the IRanges package and in other Bioconductor infrastructure packages).

Maintained by Hervé Pagès. Last updated 1 months ago.

infrastructure datarepresentation bioconductor-package core-package

3.3 match 18 stars 16.05 score 1.0k scripts 1.9k dependents

rdatatable

data.table:Extension of `data.frame`

Fast aggregation of large data (e.g. 100GB in RAM), fast ordered joins, fast add/modify/delete of columns by group using no copies at all, list columns, friendly and fast character-separated-value read/write. Offers a natural and flexible syntax, for faster development.

Maintained by Tyson Barrett. Last updated 2 days ago.

2.3 match 3.7k stars 23.53 score 230k scripts 4.6k dependents

knausb

vcfR:Manipulate and Visualize VCF Data

Facilitates easy manipulation of variant call format (VCF) data. Functions are provided to rapidly read from and write to VCF files. Once VCF data is read into R a parser function extracts matrices of data. This information can then be used for quality control or other purposes. Additional functions provide visualization of genomic data. Once processing is complete data may be written to a VCF file (*.vcf.gz). It also may be converted into other popular R objects (e.g., genlight, DNAbin). VcfR provides a link between VCF data and familiar R software.

Maintained by Brian J. Knaus. Last updated 22 days ago.

genomics population-genetics population-genomics rcpp vcf-data visualization zlib cpp

3.9 match 254 stars 13.59 score 3.1k scripts 19 dependents

sdctools

sdcMicro:Statistical Disclosure Control Methods for Anonymization of Data and Risk Estimation

Data from statistical agencies and other institutions are mostly confidential. This package, introduced in Templ, Kowarik and Meindl (2017) <doi:10.18637/jss.v067.i04>, can be used for the generation of anonymized (micro)data, i.e. for the creation of public- and scientific-use files. The theoretical basis for the methods implemented can be found in Templ (2017) <doi:10.1007/978-3-319-50272-4>. Various risk estimation and anonymization methods are included. Note that the package includes a graphical user interface, published in Meindl and Templ (2019) <doi:10.3390/a12090191>, that gives access to various methods of this package.

Maintained by Matthias Templ. Last updated 26 days ago.

cpp

5.3 match 83 stars 9.89 score 258 scripts

bioc

rhdf5:R Interface to HDF5

This package provides an interface between HDF5 and R. HDF5's main features are the ability to store and access very large and/or complex datasets and a wide variety of metadata on mass storage (disk) through a completely portable file format. The rhdf5 package is thus suited for the exchange of large and/or complex datasets between R and other software packages, and for letting R applications work on datasets that are larger than the available RAM.

Maintained by Mike Smith. Last updated 2 months ago.

infrastructure dataimport hdf5 rhdf5 openssl curl zlib cpp

3.3 match 62 stars 15.93 score 4.2k scripts 232 dependents

cran

timeSeries:Financial Time Series Objects (Rmetrics)

'S4' classes and various tools for financial time series: Basic functions such as scaling and sorting, subsetting, mathematical operations and statistical functions.

Maintained by Georgi N. Boshnakov. Last updated 6 months ago.

5.2 match 2 stars 9.90 score 1.3k scripts 145 dependents

stanlazic

desiR:Desirability Functions for Ranking, Selecting, and Integrating Data

Functions for (1) ranking, selecting, and prioritising genes, proteins, and metabolites from high dimensional biology experiments, (2) multivariate hit calling in high content screens, and (3) combining data from diverse sources.

Maintained by Stanley E. Lazic. Last updated 4 years ago.

11.8 match 3 stars 4.26 score 12 scripts

sherrillmix

taxonomizr:Functions to Work with NCBI Accessions and Taxonomy

Functions for assigning taxonomy to NCBI accession numbers and taxon IDs based on NCBI's accession2taxid and taxdump files. This package allows the user to download NCBI data dumps and create a local database for fast and local taxonomic assignment.

Maintained by Scott Sherrill-Mix. Last updated 3 days ago.

5.6 match 72 stars 8.85 score 255 scripts 2 dependents

bioc

TREG:Tools for finding Total RNA Expression Genes in single nucleus RNA-seq data

RNA abundance and cell size parameters could improve RNA-seq deconvolution algorithms to more accurately estimate cell type proportions given the different cell type transcription activity levels. A Total RNA Expression Gene (TREG) can facilitate estimating total RNA content using single molecule fluorescent in situ hybridization (smFISH). We developed a data-driven approach using a measure of expression invariance to find candidate TREGs in postmortem human brain single nucleus RNA-seq. This R package implements the method for identifying candidate TREGs from snRNA-seq data.

Maintained by Louise Huuki-Myers. Last updated 3 months ago.

software singlecell rnaseq geneexpression transcriptomics transcription sequencing bioconductor deconvolution rnascope scrna-seq smfish snrna-seq treg

9.5 match 4 stars 5.20 score 5 scripts

r-forge

robustbase:Basic Robust Statistics

"Essential" Robust Statistics. Tools allowing to analyze data with robust methods. This includes regression methodology including model selections and multivariate statistics where we strive to cover the book "Robust Statistics, Theory and Methods" by 'Maronna, Martin and Yohai'; Wiley 2006.

Maintained by Martin Maechler. Last updated 4 months ago.

fortran openblas

3.7 match 13.33 score 1.7k scripts 480 dependents

klauschn

SpatialNP:Multivariate Nonparametric Methods Based on Spatial Signs and Ranks

Tests and estimates of location, tests of independence, tests of sphericity and several estimates of shape, all based on spatial signs, symmetrized signs, ranks and signed ranks. For details, see Oja and Randles (2004) <doi:10.1214/088342304000000558> and Oja (2010) <doi:10.1007/978-1-4419-0468-3>.

Maintained by Klaus Nordhausen. Last updated 3 years ago.

cpp

22.6 match 2.18 score 10 scripts 5 dependents

pweidemueller

fullRankMatrix:Generation of Full Rank Design Matrix

Creates a full rank matrix out of a given matrix. The intended use is for one-hot encoded design matrices that should be used in linear models to ensure that significant associations can be correctly interpreted. However, 'fullRankMatrix' can be applied to any matrix to make it full rank. It removes columns with only 0's, merges duplicated columns and discovers linearly dependent columns and replaces them with linearly independent columns that span the space of the original columns. Columns are renamed to reflect those modifications. This results in a full rank matrix that can be used as a design matrix in linear models. The algorithm and some functions are inspired by Kuhn, M. (2008) <doi:10.18637/jss.v028.i05>.

Maintained by Paula Weidemueller. Last updated 9 months ago.

8.7 match 14 stars 5.62 score 6 scripts
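
The three steps named in the fullRankMatrix description (drop all-zero columns, merge duplicated columns, resolve linearly dependent columns) can be illustrated with a minimal Python sketch. This is not the package's code or API: the function name `make_full_rank` and the `_AND_` naming scheme are hypothetical, and for simplicity the sketch drops each column that depends on earlier ones instead of replacing it, which still leaves a set of columns spanning the original column space.

```python
from fractions import Fraction


def make_full_rank(columns):
    """Return a linearly independent subset of named columns.

    `columns` maps a column name to a list of numbers (all the same length).
    Hypothetical helper for illustration only.
    """
    # Step 1: remove columns that contain only zeros.
    nonzero = {name: col for name, col in columns.items()
               if any(v != 0 for v in col)}

    # Step 2: merge duplicated columns and record the merge in the name.
    merged = {}
    for name, col in nonzero.items():
        merged.setdefault(tuple(col), []).append(name)
    candidates = [("_AND_".join(names), list(values))
                  for values, names in merged.items()]

    # Step 3: keep a column only if it is independent of the columns kept
    # so far (exact Gaussian elimination with rational arithmetic).
    basis = []   # list of (pivot_index, reduced_vector)
    result = {}
    for name, col in candidates:
        vec = [Fraction(v) for v in col]
        for pivot, bvec in basis:
            if vec[pivot] != 0:
                factor = vec[pivot] / bvec[pivot]
                vec = [a - factor * b for a, b in zip(vec, bvec)]
        pivot = next((i for i, v in enumerate(vec) if v != 0), None)
        if pivot is not None:   # a nonzero remainder means the column is independent
            basis.append((pivot, vec))
            result[name] = col
    return result


design = {
    "a": [1, 0, 1, 0],
    "b": [0, 1, 0, 1],
    "zero": [0, 0, 0, 0],
    "a_dup": [1, 0, 1, 0],
    "ones": [1, 1, 1, 1],   # equals a + b, so it is linearly dependent
}
full_rank = make_full_rank(design)
print(list(full_rank))
```

Exact fractions are used here so that the dependence check does not rely on a floating-point tolerance; a numerical implementation would more likely use a QR or SVD-based rank test.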

sportsdataverse

hoopR:Access Men's Basketball Play by Play Data

A utility to quickly obtain clean and tidy men's basketball play by play data. Provides functions to access live play by play and box score data from ESPN <https://www.espn.com> with shot locations when available. It is also a full NBA Stats API <https://www.nba.com/stats/> wrapper and a scraping and aggregating interface for Ken Pomeroy's men's college basketball statistics website <https://kenpom.com>. It provides users with an active subscription the capability to scrape the website tables and analyze the data for themselves.

Maintained by Saiem Gilani. Last updated 1 year ago.

basketball college-basketball espn kenpom nba nba-analytics nba-api nba-data nba-statistics nba-stats nba-stats-api ncaa ncaa-basketball ncaa-bracket ncaa-players ncaa-ratings ncaam sportsdataverse

7.0 match 91 stars 6.93 score 261 scripts

gertvv

gemtc:Network Meta-Analysis Using Bayesian Methods

Network meta-analyses (mixed treatment comparisons) in the Bayesian framework using JAGS. Includes methods to assess heterogeneity and inconsistency, and a number of standard visualizations.

Maintained by Gert van Valkenhoef. Last updated 5 years ago.

jags cpp

6.5 match 44 stars 7.48 score 71 scripts 1 dependent

jtfeld

EloOptimized:Optimized Elo Rating Method for Obtaining Dominance Ranks

Provides an implementation of the maximum likelihood methods for deriving Elo scores as published in Foerster, Franz et al. (2016) <DOI:10.1038/srep35404>.

Maintained by Joseph Feldblum. Last updated 10 months ago.

elo elo-rating maximum-likelihood-estimation

11.5 match 3 stars 4.18 score 9 scripts

hosseinazari

StatRank:Statistical Rank Aggregation: Inference, Evaluation, and Visualization

A set of methods to implement Generalized Method of Moments and Maximum Likelihood methods for Random Utility Models. These methods are meant to provide inference on rank comparison data. They accept full, partial, and pairwise rankings, and provide methods to break down full or partial rankings into their pairwise components. Please see Generalized Method-of-Moments for Rank Aggregation from NIPS 2013 for a description of some of our methods.

Maintained by Hossein Azari Soufiani. Last updated 10 years ago.

14.8 match 3.24 score 58 scripts 2 dependents

petersonr

sparseR:Variable Selection under Ranked Sparsity Principles for Interactions and Polynomials

An implementation of ranked sparsity methods, including penalized regression methods such as the sparsity-ranked lasso, its non-convex alternatives, and elastic net, as well as the sparsity-ranked Bayesian Information Criterion. As described in Peterson and Cavanaugh (2022) <doi:10.1007/s10182-021-00431-7>, ranked sparsity is a philosophy with methods primarily useful for variable selection in the presence of prior informational asymmetry, which occurs in the context of trying to perform variable selection in the presence of interactions and/or polynomials. Ultimately, this package attempts to facilitate dealing with cumbersome interactions and polynomials while not avoiding them entirely. Typically, models selected under ranked sparsity principles will also be more transparent, having fewer falsely selected interactions and polynomials than other methods.

Maintained by Ryan Andrew Peterson. Last updated 8 months ago.

9.0 match 6 stars 5.23 score 14 scripts

spholmes

XICOR:Association Measurement Through Cross Rank Increments

Computes robust association measures that do not presuppose linearity. The xi correlation (xicor) is based on cross correlation between ranked increments. The reference for the methods implemented here is Chatterjee, Sourav (2020) <arXiv:1909.10140>. This package includes the Galton peas example.

Maintained by Susan Holmes. Last updated 3 months ago.

10.3 match 1 star 4.56 score 61 scripts 2 dependents
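
As a rough illustration of the "ranked increments" idea behind the xi correlation in the XICOR entry, the following Python sketch computes Chatterjee's coefficient for data without ties, using the formula 1 - 3 * sum(|r[i+1] - r[i]|) / (n^2 - 1), where r holds the ranks of y after the pairs are sorted by x. It is not the XICOR package's implementation; the function name `xi_correlation` is hypothetical and ties are assumed absent.

```python
def xi_correlation(x, y):
    """Chatterjee's xi coefficient, assuming no ties in x or y."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length of at least 2")

    # Sort the pairs by x and keep the y values in that order.
    y_by_x = [yi for _, yi in sorted(zip(x, y))]

    # Rank the reordered y values from 1 (smallest) to n (largest).
    order = sorted(range(n), key=lambda i: y_by_x[i])
    ranks = [0] * n
    for rank, i in enumerate(order, start=1):
        ranks[i] = rank

    # Sum the absolute increments between consecutive ranks.
    total = sum(abs(ranks[i + 1] - ranks[i]) for i in range(n - 1))
    return 1 - 3 * total / (n * n - 1)


monotone = xi_correlation([1, 2, 3, 4, 5], [1, 2, 3, 4, 5])
scrambled = xi_correlation([1, 2, 3, 4], [1, 3, 2, 4])
print(monotone, scrambled)
```

For a perfectly monotone relationship the finite-sample value is 1 - 3 / (n + 1), which approaches 1 only as n grows; with five points it is 0.5.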

transbiozi

RMTL:Regularized Multi-Task Learning

Efficient solvers for 10 regularized multi-task learning algorithms applicable for regression, classification, joint feature selection, task clustering, low-rank learning, sparse learning and network incorporation. Based on the accelerated gradient descent method, the algorithms feature a state-of-the-art computational complexity O(1/k^2). Sparse model structure is induced by solving the proximal operator. The details of the package are described in the paper of Han Cao and Emanuel Schwarz (2018) <doi:10.1093/bioinformatics/bty831>.

Maintained by Han Cao. Last updated 6 years ago.

low-rank-representaion multi-task-learning regularization sparse-coding

8.4 match 19 stars 5.60 score 21 scripts

egeulgen

driveR:Prioritizing Cancer Driver Genes Using Genomics Data

Cancer genomes contain large numbers of somatic alterations but few genes drive tumor development. Identifying cancer driver genes is critical for precision oncology. Most current approaches identify driver genes either based on mutational recurrence or using estimated scores predicting the functional consequences of mutations. 'driveR' is a tool for personalized or batch analysis of genomic data for driver gene prioritization by combining genomic information and prior biological knowledge. As features, 'driveR' uses coding impact metaprediction scores, non-coding impact scores, somatic copy number alteration scores, hotspot gene/double-hit gene condition, 'phenolyzer' gene scores and memberships to cancer-related KEGG pathways. It uses these features to estimate the cancer-type-specific probability of each gene being a cancer driver using the related task of a multi-task learning classification model. The method is described in detail in Ulgen E, Sezerman OU. 2021. driveR: a novel method for prioritizing cancer driver genes using somatic genomics data. BMC Bioinformatics <doi:10.1186/s12859-021-04203-7>.

Maintained by Ege Ulgen. Last updated 2 years ago.

cancer-driverness driver driver-gene-prioritization identify-driver-genes ranking-genes scoring

7.5 match 15 stars 6.29 score 260 scripts

bioc

GSEABenchmarkeR:Reproducible GSEA Benchmarking

The GSEABenchmarkeR package implements an extendable framework for reproducible evaluation of set- and network-based methods for enrichment analysis of gene expression data. This includes support for the efficient execution of these methods on comprehensive real data compendia (microarray and RNA-seq) using parallel computation on standard workstations and institutional computer grids. Methods can then be assessed with respect to runtime, statistical significance, and relevance of the results for the phenotypes investigated.

Maintained by Ludwig Geistlinger. Last updated 5 months ago.

immunooncology microarray rnaseq geneexpression differentialexpression pathways graphandnetwork network genesetenrichment networkenrichment visualization reportwriting bioconductor-package u24ca289073

7.2 match 13 stars 6.55 score 23 scripts

ropensci

phylotaR:Automated Phylogenetic Sequence Cluster Identification from 'GenBank'

A pipeline for the identification, within taxonomic groups, of orthologous sequence clusters from 'GenBank' <https://www.ncbi.nlm.nih.gov/genbank/> as the first step in a phylogenetic analysis. The pipeline depends on a local alignment search tool and is, therefore, not dependent on differences in gene naming conventions and naming errors.

Maintained by Shixiang Wang. Last updated 8 months ago.

blastn genbank peer-reviewed phylogenetics sequence-alignment

8.0 match 23 stars 5.86 score 156 scripts

insightsengineering

cardx:Extra Analysis Results Data Utilities

Create extra Analysis Results Data (ARD) summary objects. The package supplements the simple ARD functions from the 'cards' package, exporting functions to put statistical results in the ARD format. These objects are used and re-used to construct summary tables, visualizations, and written reports.

Maintained by Daniel D. Sjoberg. Last updated 19 days ago.

5.4 match 19 stars 8.46 score 50 scripts

ehrlinger

ggRandomForests:Visually Exploring Random Forests

Graphic elements for exploring Random Forests using the 'randomForest' or 'randomForestSRC' package for survival, regression and classification forests and 'ggplot2' package plotting.

Maintained by John Ehrlinger. Last updated 5 days ago.

5.1 match 148 stars 8.94 score 197 scripts

bioc

ELMER:Inferring Regulatory Element Landscapes and Transcription Factor Networks Using Cancer Methylomes

ELMER is designed to use DNA methylation and gene expression from a large number of samples to infer the regulatory element landscape and transcription factor network in primary tissue.

Maintained by Tiago Chedraoui Silva. Last updated 5 months ago.

dnamethylation geneexpression motifannotation software generegulation transcription network

6.1 match 7.42 score 176 scripts

yongkai17

spFSR:Feature Selection and Ranking by Simultaneous Perturbation Stochastic Approximation

An implementation of feature selection and ranking via simultaneous perturbation stochastic approximation (SPSA-FSR) based on works by V. Aksakalli and M. Malekipirbazari (2015) <arXiv:1508.07630> and Zeren D. Yenice et al. (2018) <arXiv:1804.05589>. The SPSA-FSR algorithm searches for a locally optimal set of features that yields the best predictive performance using a specified error measure such as mean squared error (for regression problems) or accuracy rate (for classification problems). This package requires an object of class 'task' and an object of class 'Learner' from the 'mlr' package.

Maintained by Vural Aksakalli. Last updated 6 years ago.

11.3 match 2 stars 4.00 score 8 scripts

bioc

GenomicAlignments:Representation and manipulation of short genomic alignments

Provides efficient containers for storing and manipulating short genomic alignments (typically obtained by aligning short reads to a reference genome). This includes read counting, computing the coverage, junction detection, and working with the nucleotide content of the alignments.

Maintained by Hervé Pagès. Last updated 5 months ago.

infrastructure dataimport genetics sequencing rnaseq snp coverage alignment immunooncology bioconductor-package core-package

3.3 match 10 stars 13.61 score 3.1k scripts 529 dependents

bioc

AUCell:AUCell: Analysis of 'gene set' activity in single-cell RNA-seq data (e.g. identify cells with specific gene signatures)

AUCell identifies cells with active gene sets (e.g. signatures, gene modules...) in single-cell RNA-seq data. AUCell uses the "Area Under the Curve" (AUC) to calculate whether a critical subset of the input gene set is enriched within the expressed genes for each cell. The distribution of AUC scores across all the cells allows exploring the relative expression of the signature. Since the scoring method is ranking-based, AUCell is independent of the gene expression units and the normalization procedure. In addition, since the cells are evaluated individually, it can easily be applied to bigger datasets, subsetting the expression matrix if needed.

Maintained by Gert Hulselmans. Last updated 5 months ago.

singlecell genesetenrichment transcriptomics transcription geneexpression workflowstep normalization

5.2 match 8.59 score 860 scripts 4 dependents

mikejseo

bnma:Bayesian Network Meta-Analysis using 'JAGS'

Network meta-analyses using a Bayesian framework following Dias et al. (2013) <DOI:10.1177/0272989X12458724>. Based on the data input, creates the prior, model file, and initial values needed to run models in 'rjags'. Able to handle binomial, normal and multinomial arm-level data. Can handle multi-arm trials and includes methods to incorporate covariate and baseline risk effects. Includes standard diagnostics and visualization tools to evaluate the results.

Maintained by Michael Seo. Last updated 1 year ago.

jags cpp

9.7 match 7 stars 4.54 score 7 scripts

jburns88

RankAggregator:Aggregation of (Partial) Ordinal Rankings

Easily compute an aggregate ranking (also called a median ranking or a consensus ranking) according to the axiomatic approach presented by Cook et al. (2007). This approach minimises the number of violations between all candidate consensus rankings and all input (partial) rankings, and draws on a branch and bound algorithm and a heuristic algorithm to drastically improve speed. The package also provides an option to bootstrap a consensus ranking based on resampling input rankings (with replacement). Input rankings can be either incomplete (partial) or complete. Reference: Cook, W.D., Golany, B., Penn, M. and Raviv, T. (2007) <doi:10.1016/j.cor.2005.05.030>.

Maintained by Jay Burns. Last updated 5 years ago.

16.2 match 2.70 score

ropensci

taxa:Classes for Storing and Manipulating Taxonomic Data

Provides classes for storing and manipulating taxonomic data. Most of the classes can be treated like base R vectors (e.g. can be used in tables as columns and can be named). Vectorized classes can store taxon names and authorities, taxon IDs from databases, taxon ranks, and other types of information. More complex classes are provided to store taxonomic trees and user-defined data associated with them.

Maintained by Zachary Foster. Last updated 1 year ago.

taxonomy biology hierarchy data-cleaning taxon

6.4 match 48 stars 6.80 score 217 scripts

mflores72000

qcr:Quality Control Review

Univariate and multivariate SQC tools that complete and extend the SQC techniques available in R. Apart from integrating different R packages devoted to SQC ('qcc','MSQC'), provides nonparametric tools that are highly useful when the Gaussian assumption is not met. This package computes standard univariate control charts for individual measurements, X-bar, S, R, p, np, c, u, EWMA and CUSUM. In addition, it includes functions to perform multivariate control charts such as Hotelling T2, MEWMA and MCUSUM. As a representative feature, multivariate nonparametric alternatives based on data depth are implemented in this package: r, Q and S control charts. In addition, Phase I and II control charts for functional data are included. This package also allows the estimation of the most complete set of capability indices from first to fourth generation, covering the nonparametric alternatives, and performing the corresponding capability analysis graphical outputs, including the process capability plots.

Maintained by Miguel Flores. Last updated 2 years ago.

9.2 match 4.71 score 172 scripts 1 dependent

klainfo

ScottKnottESD:The Non-Parametric Scott-Knott Effect Size Difference (ESD) Test

The Non-Parametric Scott-Knott Effect Size Difference (ESD) test is a mean comparison approach that leverages a hierarchical clustering to partition the set of treatment means (e.g., means of variable importance scores, means of model performance) into statistically distinct groups with non-negligible difference [Tantithamthavorn et al., (2018) <doi:10.1109/TSE.2018.2794977>].

Maintained by Chakkrit Tantithamthavorn. Last updated 2 years ago.

defect-prediction-models effect-size multiple-comparisons ranking-algorithm scott-knott statistical-tests

7.5 match 43 stars 5.77 score 68 scripts

robjhyndman

rcademy:Tools to assist with academic promotions

Ideas and tools to help with preparing documentation for promotions at universities.

Maintained by Rob Hyndman. Last updated 6 months ago.

10.2 match 14 stars 4.23 score 9 scripts

bioc

target:Predict Combined Function of Transcription Factors

Implements the BETA algorithm for inferring direct target genes from DNA-binding and perturbation expression data, Wang et al. (2013) <doi:10.1038/nprot.2013.150>. Extends the algorithm to predict the combined function of two DNA-binding elements from comparable binding and expression data.

Maintained by Mahmoud Ahmed. Last updated 5 months ago.

software statisticalmethod transcription algorithm chip-seq dna-binding gene-regulation transcription-factors

5.5 match 4 stars 7.79 score 1.3k scripts

mikejareds

hermiter:Efficient Sequential and Batch Estimation of Univariate and Bivariate Probability Density Functions and Cumulative Distribution Functions along with Quantiles (Univariate) and Nonparametric Correlation (Bivariate)

Facilitates estimation of full univariate and bivariate probability density functions and cumulative distribution functions along with full quantile functions (univariate) and nonparametric correlation (bivariate) using Hermite series based estimators. These estimators are particularly useful in the sequential setting (both stationary and non-stationary) and one-pass batch estimation setting for large data sets. Based on: Stephanou, Michael, Varughese, Melvin and Macdonald, Iain. "Sequential quantiles via Hermite series density estimation." Electronic Journal of Statistics 11.1 (2017): 570-607 <doi:10.1214/17-EJS1245>, Stephanou, Michael and Varughese, Melvin. "On the properties of Hermite series based distribution function estimators." Metrika (2020) <doi:10.1007/s00184-020-00785-z> and Stephanou, Michael and Varughese, Melvin. "Sequential estimation of Spearman rank correlation using Hermite series estimators." Journal of Multivariate Analysis (2021) <doi:10.1016/j.jmva.2021.104783>.

Maintained by Michael Stephanou. Last updated 7 months ago.

cumulative-distribution-function kendall-correlation-coefficient online-algorithms probability-density-function quantile spearman-correlation-coefficient statistics streaming-algorithms streaming-data cpp

7.7 match 15 stars 5.58 score 17 scripts

r-lib

vctrs:Vector Helpers

Defines new notions of prototype and size that are used to provide tools for consistent and well-founded type-coercion and size-recycling, and are in turn connected to ideas of type- and size-stability useful for analysing function interfaces.

Maintained by Davis Vaughan. Last updated 5 months ago.

s3-vectors

2.3 match 290 stars 18.97 score 1.1k scripts 13k dependents

wanchanglin

mt:Metabolomics Data Analysis Toolbox

Functions for metabolomics data analysis: data preprocessing, orthogonal signal correction, PCA analysis, PCA-DA analysis, PLS-DA analysis, classification, feature selection, correlation analysis, data visualisation and re-sampling strategies.

Maintained by Wanchang Lin. Last updated 1 year ago.

9.2 match 3 stars 4.57 score 50 scripts

thothorn

maxstat:Maximally Selected Rank Statistics

Maximally selected rank statistics with several p-value approximations.

Maintained by Torsten Hothorn. Last updated 8 years ago.

5.6 match 1 star 7.58 score 107 scripts 56 dependents

fk83

scoringRules:Scoring Rules for Parametric and Simulated Distribution Forecasts

Dictionary-like reference for computing scoring rules in a wide range of situations. Covers both parametric forecast distributions (such as mixtures of Gaussians) and distributions generated via simulation. Further details can be found in the package vignettes <doi:10.18637/jss.v090.i12>, <doi:10.18637/jss.v110.i08>.

Maintained by Fabian Krueger. Last updated 6 months ago.

openblas cpp

3.7 match 59 stars 11.33 score 408 scripts 13 dependents

epiforecasts

scoringutils:Utilities for Scoring and Assessing Predictions

Facilitates the evaluation of forecasts in a convenient framework based on data.table. It allows users to check their forecasts and diagnose issues, to visualise forecasts and missing data, to transform data before scoring, to handle missing forecasts, to aggregate scores, and to visualise the results of the evaluation. The package mostly focuses on the evaluation of probabilistic forecasts and allows evaluating several different forecast types and input formats. Find more information about the package in the Vignettes as well as in the accompanying paper, <doi:10.48550/arXiv.2205.07090>.

Maintained by Nikos Bosse. Last updated 13 days ago.

forecast-evaluation forecasting

3.7 match 52 stars 11.37 score 326 scripts 7 dependents

bioc

siggenes:Multiple Testing using SAM and Efron's Empirical Bayes Approaches

Identification of differentially expressed genes and estimation of the False Discovery Rate (FDR) using both the Significance Analysis of Microarrays (SAM) and the Empirical Bayes Analyses of Microarrays (EBAM).

Maintained by Holger Schwender. Last updated 5 months ago.

multiplecomparison microarray geneexpression snp exonarray differentialexpression

5.3 match 7.86 score 74 scripts 33 dependents

nliulab

AutoScore:An Interpretable Machine Learning-Based Automatic Clinical Score Generator

A novel interpretable machine learning-based framework to automate the development of a clinical scoring model for predefined outcomes. Our novel framework consists of six modules: variable ranking with machine learning, variable transformation, score derivation, model selection, domain knowledge-based score fine-tuning, and performance evaluation. The details are described in our research paper <doi:10.2196/21798>. Users or clinicians could seamlessly generate parsimonious sparse-score risk models (i.e., risk scores), which can be easily implemented and validated in clinical practice. We hope to see its application in various medical case studies.

Maintained by Feng Xie. Last updated 14 days ago.

5.4 match 32 stars 7.70 score 30 scripts

bioc

ORFik:Open Reading Frames in Genomics

R package for analysis of transcript and translation features through manipulation of sequence data and NGS data like Ribo-Seq, RNA-Seq, TCP-Seq and CAGE. It is generalized in the sense that any transcript region can be analysed, although, as the name hints, it was made with the investigation of ribosomal patterns over Open Reading Frames (ORFs) as its primary use case. ORFik is extremely fast through use of C++, data.table and GenomicRanges. The package supports reassigning transcript starts using CAGE-Seq data, automatic shifting of RiboSeq reads, finding Open Reading Frames for whole genomes and much more.

Maintained by Haakon Tjeldnes. Last updated 27 days ago.

immunooncology software sequencing riboseq rnaseq functionalgenomics coverage alignment dataimport cpp

3.9 match 33 stars 10.63 score 115 scripts 2 dependents

gertvv

smaa:Stochastic Multi-Criteria Acceptability Analysis

Implementation of the Stochastic Multi-Criteria Acceptability Analysis (SMAA) family of Multiple Criteria Decision Analysis (MCDA) methods. Tervonen, T. and Figueira, J. R. (2008) <doi:10.1002/mcda.407>.

Maintained by Gert van Valkenhoef. Last updated 2 years ago.

openblas

8.5 match 10 stars 4.84 score 23 scripts 1 dependent

brianaronson

birankr:Ranking Nodes in Bipartite and Weighted Networks

Highly efficient functions for estimating various rank (centrality) measures of nodes in bipartite graphs (two-mode networks). Includes methods for estimating HITS, CoHITS, BGRM, and BiRank with implementation primarily inspired by He et al. (2016) <doi:10.1109/TKDE.2016.2611584>. Also provides easy-to-use tools for efficiently estimating PageRank in one-mode graphs, incorporating or removing edge-weights during rank estimation, projecting two-mode graphs to one-mode, and for converting edgelists and matrices to sparseMatrix format. Best of all, the package's rank estimators can work directly with common formats of network data including edgelists (class data.frame, data.table, or tbl_df) and adjacency matrices (class matrix or dgCMatrix).

Maintained by Brian Aronson. Last updated 3 years ago.

8.4 match 37 stars 4.87 score 9 scripts

hputter

mstate:Data Preparation, Estimation and Prediction in Multi-State Models

Contains functions for data preparation, descriptives, hazard estimation and prediction with Aalen-Johansen or simulation in competing risks and multi-state models, see Putter, Fiocco, Geskus (2007) <doi:10.1002/sim.2712>.

Maintained by Hein Putter. Last updated 28 days ago.

3.4 match 11 stars 12.13 score 322 scripts 55 dependents

welch-lab

cytosignal:What the Package Does (One Line, Title Case)

What the package does (one paragraph).

Maintained by Jialin Liu. Last updated 6 days ago.

openblas cpp

6.8 match 16 stars 5.95 score 6 scripts

ncss-tech

aqp:Algorithms for Quantitative Pedology

The Algorithms for Quantitative Pedology (AQP) project was started in 2009 to organize a loosely-related set of concepts and source code on the topic of soil profile visualization, aggregation, and classification into this package (aqp). Over the past 8 years, the project has grown into a suite of related R packages that enhance and simplify the quantitative analysis of soil profile data. Central to the AQP project is a new vocabulary of specialized functions and data structures that can accommodate the inherent complexity of soil profile information, freeing the scientist to focus on ideas rather than boilerplate data processing tasks <doi:10.1016/j.cageo.2012.10.020>. These functions and data structures have been extensively tested and documented, applied to projects involving hundreds of thousands of soil profiles, and deeply integrated into widely used tools such as SoilWeb <https://casoilresource.lawr.ucdavis.edu/soilweb-apps>. Components of the AQP project (aqp, soilDB, sharpshootR, soilReports packages) serve an important role in routine data analysis within the USDA-NRCS Soil Science Division. The AQP suite of R packages offers a convenient platform for bridging the gap between pedometric theory and practice.

Maintained by Dylan Beaudette. Last updated 28 days ago.

digital-soil-mapping ncss-tech nrcs pedology pedometrics soil soil-survey usda

3.4 match 55 stars 11.77 score 1.2k scripts 2 dependents

maiermarco

prefmod:Utilities to Fit Paired Comparison Models for Preferences

Generates a design matrix for analysing real paired comparisons and derived paired comparison data (Likert type items/ratings or rankings) using a loglinear approach. Fits the loglinear Bradley-Terry model (LLBT) exploiting an eliminate feature. Computes pattern models for paired comparisons, rankings, and ratings. Some treatment of missing values (MCAR and MNAR). Fits latent class (mixture) models for paired comparison, rating and ranking patterns using a non-parametric ML approach.

Maintained by Marco Johannes Maier. Last updated 1 year ago.

fortran

16.5 match 2.41 score 53 scripts 1 dependent

ampl-psych

EMC2:Bayesian Hierarchical Analysis of Cognitive Models of Choice

Fit Bayesian (hierarchical) cognitive models using a linear modeling language interface using particle Metropolis Markov chain Monte Carlo sampling with Gibbs steps. The diffusion decision model (DDM), linear ballistic accumulator model (LBA), racing diffusion model (RDM), and the lognormal race model (LNR) are supported. Additionally, users can specify their own likelihood function and/or choose for non-hierarchical estimation, as well as for a diagonal, blocked or full multivariate normal group-level distribution to test individual differences. Prior specification is facilitated through methods that visualize the (implied) prior. A wide range of plotting functions assist in assessing model convergence and posterior inference. Models can be easily evaluated using functions that plot posterior predictions or using relative model comparison metrics such as information criteria or Bayes factors. References: Stevenson et al. (2024) <doi:10.31234/osf.io/2e4dq>.

Maintained by Niek Stevenson. Last updated 6 days ago.

cpp

4.8 match 13 stars 8.25 score 392 scripts

tidyverse

stringr:Simple, Consistent Wrappers for Common String Operations

A consistent, simple and easy to use set of wrappers around the fantastic 'stringi' package. All function and argument names (and positions) are consistent, all functions deal with "NA"'s and zero length vectors in the same way, and the output from one function is easy to feed into the input of another.

Maintained by Hadley Wickham. Last updated 7 months ago.

regular-expression strings

1.8 match 622 stars 21.97 score 164k scripts 8.2k dependents

f-caeiro

randtests:Testing Randomness in R

Provides several nonparametric randomness tests for numeric sequences.

Maintained by Frederico Caeiro. Last updated 11 months ago.

9.9 match 4.00 score 235 scripts 3 dependents

pmartr

pmartR:Panomics Marketplace - Quality Control and Statistical Analysis for Panomics Data

Provides functionality for quality control processing and statistical analysis of mass spectrometry (MS) omics data, in particular proteomic (either at the peptide or the protein level), lipidomic, and metabolomic data, as well as RNA-seq based count data and nuclear magnetic resonance (NMR) data. This includes data transformation, specification of groups that are to be compared against each other, filtering of features and/or samples, data normalization, data summarization (correlation, PCA), and statistical comparisons between defined groups. Implements methods described in: Webb-Robertson et al. (2014) <doi:10.1074/mcp.M113.030932>. Webb-Robertson et al. (2011) <doi:10.1002/pmic.201100078>. Matzke et al. (2011) <doi:10.1093/bioinformatics/btr479>. Matzke et al. (2013) <doi:10.1002/pmic.201200269>. Polpitiya et al. (2008) <doi:10.1093/bioinformatics/btn217>. Webb-Robertson et al. (2010) <doi:10.1021/pr1005247>.

Maintained by Lisa Bramer. Last updated 3 days ago.

data-summarization lipids mass-spectrometry metabolites metabolomics-data peptides proteins rna-seq-analysis openblas cpp

5.1 match 40 stars 7.69 score 144 scripts

bioc

PDATK:Pancreatic Ductal Adenocarcinoma Tool-Kit

Pancreatic ductal adenocarcinoma (PDA) has a relatively poor prognosis and is one of the most lethal cancers. Molecular classification of gene expression profiles holds the potential to identify meaningful subtypes which can inform therapeutic strategy in the clinical setting. The Pancreatic Cancer Adenocarcinoma Tool-Kit (PDATK) provides an S4 class-based interface for performing unsupervised subtype discovery, cross-cohort meta-clustering, gene-expression-based classification, and subsequent survival analysis to identify prognostically useful subtypes in pancreatic cancer and beyond. Two novel methods, Consensus Subtypes in Pancreatic Cancer (CSPC) and Pancreatic Cancer Overall Survival Predictor (PCOSP) are included for consensus-based meta-clustering and overall-survival prediction, respectively. Additionally, four published subtype classifiers and three published prognostic gene signatures are included to allow users to easily recreate published results, apply existing classifiers to new data, and benchmark the relative performance of new methods. The use of existing Bioconductor classes as input to all PDATK classes and methods enables integration with existing Bioconductor datasets, including the 21 pancreatic cancer patient cohorts available in the MetaGxPancreas data package. PDATK has been used to replicate results from Sandhu et al (2019) [https://doi.org/10.1200/cci.18.00102] and an additional paper is in the works using CSPC to validate subtypes from the included published classifiers, both of which use the data available in MetaGxPancreas. The inclusion of subtype centroids and prognostic gene signatures from these and other publications will enable researchers and clinicians to classify novel patient gene expression data, allowing the direct clinical application of the classifiers included in PDATK. 
Overall, PDATK provides a rich set of tools to identify and validate useful prognostic and molecular subtypes based on gene-expression data, benchmark new classifiers against existing ones, and apply discovered classifiers on novel patient data to inform clinical decision making.

Maintained by Benjamin Haibe-Kains. Last updated 5 months ago.

geneexpression pharmacogenetics pharmacogenomics software classification survival clustering geneprediction

9.1 match 1 star 4.31 score 17 scripts

hanettools

Perc:Using Percolation and Conductance to Find Information Flow Certainty in a Direct Network

Finds the certainty of dominance interactions while taking indirect interactions into account.

Maintained by Jessica Vandeleest. Last updated 4 years ago.

6.7 match 5.88 score 38 scripts

tomasfryda

h2o:R Interface for the 'H2O' Scalable Machine Learning Platform

R interface for 'H2O', the scalable open source machine learning platform that offers parallelized implementations of many supervised and unsupervised machine learning algorithms such as Generalized Linear Models (GLM), Gradient Boosting Machines (including XGBoost), Random Forests, Deep Neural Networks (Deep Learning), Stacked Ensembles, Naive Bayes, Generalized Additive Models (GAM), ANOVA GLM, Cox Proportional Hazards, K-Means, PCA, ModelSelection, Word2Vec, as well as a fully automatic machine learning algorithm (H2O AutoML).

Maintained by Tomas Fryda. Last updated 1 year ago.

4.8 match 3 stars 8.20 score 7.8k scripts 11 dependents

vladimirholy

gasmodel:Generalized Autoregressive Score Models

Estimation, forecasting, and simulation of generalized autoregressive score (GAS) models of Creal, Koopman, and Lucas (2013) <doi:10.1002/jae.1279> and Harvey (2013) <doi:10.1017/cbo9781139540933>. Model specification allows for various data types and distributions, different parametrizations, exogenous variables, joint and separate modeling of exogenous variables and dynamics, higher score and autoregressive orders, custom and unconditional initial values of time-varying parameters, fixed and bounded values of coefficients, and missing values. Model estimation is performed by the maximum likelihood method.

Maintained by Vladimír Holý. Last updated 1 year ago.

dcs gas time-series

7.2 match 14 stars 5.45 score 2 scripts

r-forge

CHNOSZ:Thermodynamic Calculations and Diagrams for Geochemistry

An integrated set of tools for thermodynamic calculations in aqueous geochemistry and geobiochemistry. Functions are provided for writing balanced reactions to form species from user-selected basis species and for calculating the standard molal properties of species and reactions, including the standard Gibbs energy and equilibrium constant. Calculations of the non-equilibrium chemical affinity and equilibrium chemical activity of species can be portrayed on diagrams as a function of temperature, pressure, or activity of basis species; in two dimensions, this gives a maximum affinity or predominance diagram. The diagrams have formatted chemical formulas and axis labels, and water stability limits can be added to Eh-pH, oxygen fugacity-temperature, and other diagrams with a redox variable. The package has been developed to handle common calculations in aqueous geochemistry, such as solubility due to complexation of metal ions, mineral buffers of redox or pH, and changing the basis species across a diagram ("mosaic diagrams"). CHNOSZ also implements a group additivity algorithm for the standard thermodynamic properties of proteins.

Maintained by Jeffrey Dick. Last updated 8 days ago.

fortran

4.1 match 9.46 score 238 scripts 4 dependents

albertocalore

MARMoT:Matching on Poset-Based Average Rank for Multiple Treatments (MARMoT)

Contains a function to apply the MARMoT balancing technique discussed in Silan, Boccuzzo, Arpino (2021) <DOI:10.1002/sim.9192> and Silan, Belloni, Boccuzzo (2023) <DOI:10.1007/s10260-023-00695-0>. It also contains a function for computing Deloof's approximation of the average rank (with a parallelized version) and a function to compute the Absolute Standardized Bias.

Maintained by Alberto Calore. Last updated 7 months ago.

asb average-rank poset propensity-score propensity-scores template-matching

14.4 match 2.70 score 6 scripts

alexanderjwhite

mixedLSR:Mixed, Low-Rank, and Sparse Multivariate Regression on High-Dimensional Data

Mixed, low-rank, and sparse multivariate regression ('mixedLSR') provides tools for performing mixture regression when the coefficient matrix is low-rank and sparse. 'mixedLSR' allows subgroup identification by alternating optimization with simulated annealing to encourage global optimum convergence. This method is data-adaptive, automatically performing parameter selection to identify low-rank substructures in the coefficient matrix.

Maintained by Alexander White. Last updated 2 years ago.

10.5 match 3.70 score 2 scripts

bioc

DESeq2:Differential gene expression analysis based on the negative binomial distribution

Estimate variance-mean dependence in count data from high-throughput sequencing assays and test for differential expression based on a model using the negative binomial distribution.

Maintained by Michael Love. Last updated 11 days ago.

sequencing rnaseq chipseq geneexpression transcription normalization differentialexpression bayesian regression principalcomponent clustering immunooncology openblas cpp

2.4 match 375 stars 16.11 score 17k scripts 115 dependents

bioc

GeneExpressionSignature:Gene Expression Signature based Similarity Metric

This package implements gene expression signatures and a distance between them. A gene expression signature is represented as a list of genes whose expression is correlated with a biological state of interest. The distance between signatures is defined using a nonparametric, rank-based pattern-matching strategy based on the Kolmogorov-Smirnov statistic. Signatures and their distances can be used to detect similarities among the signatures of drugs, diseases, and biological states of interest.

Maintained by Yang Cao. Last updated 5 months ago.

geneexpression

7.8 match 1 star 5.00 score 5 scripts

zebang

tensorTS:Factor and Autoregressive Models for Tensor Time Series

Factor and autoregressive models for matrix and tensor valued time series. We provide functions for estimation, simulation and prediction. The models are discussed in Li et al (2021) <doi:10.48550/arXiv.2110.00928>, Chen et al (2020) <DOI:10.1080/01621459.2021.1912757>, Chen et al (2020) <DOI:10.1016/j.jeconom.2020.07.015>, and Xiao et al (2020) <doi:10.48550/arXiv.2006.02611>.

Maintained by Zebang Li. Last updated 13 days ago.

7.4 match 123 stars 5.27 score 4 scripts

r-forge

Matrix:Sparse and Dense Matrix Classes and Methods

A rich hierarchy of sparse and dense matrix classes, including general, symmetric, triangular, and diagonal matrices with numeric, logical, or pattern entries. Efficient methods for operating on such matrices, often wrapping the 'BLAS', 'LAPACK', and 'SuiteSparse' libraries.

Maintained by Martin Maechler. Last updated 6 days ago.

openblas

2.3 match 1 star 17.23 score 33k scripts 12k dependents

leoegidi

footBayes:Fitting Bayesian and MLE Football Models

This is the first package allowing for the estimation, visualization and prediction of the most well-known football models: double Poisson, bivariate Poisson, Skellam, student_t, diagonal-inflated bivariate Poisson, and zero-inflated Skellam. It supports both maximum likelihood estimation (MLE, for 'static' models only) and Bayesian inference. For Bayesian methods, it incorporates several techniques: MCMC sampling with Hamiltonian Monte Carlo, variational inference using either the Pathfinder algorithm or Automatic Differentiation Variational Inference (ADVI), and the Laplace approximation. The package compiles all the 'CmdStan' models once during installation using the 'instantiate' package. The model construction relies on the most well-known football references, such as Dixon and Coles (1997) <doi:10.1111/1467-9876.00065>, Karlis and Ntzoufras (2003) <doi:10.1111/1467-9884.00366> and Egidi, Pauli and Torelli (2018) <doi:10.1177/1471082X18798414>.

Maintained by Leonardo Egidi. Last updated 8 hours ago.

5.2 match 42 stars 7.43 score 53 scripts

covaruber

sommer:Solving Mixed Model Equations in R

Structural multivariate-univariate linear mixed model solver for estimation of multiple random effects with unknown variance-covariance structures (e.g., heterogeneous and unstructured) and known covariance among levels of random effects (e.g., pedigree and genomic relationship matrices) (Covarrubias-Pazaran, 2016 <doi:10.1371/journal.pone.0156744>; Maier et al., 2015 <doi:10.1016/j.ajhg.2014.12.006>; Jensen et al., 1997). REML estimates can be obtained using the Direct-Inversion Newton-Raphson and Direct-Inversion Average Information algorithms for the problems r x r (r being the number of records) or using the Henderson-based average information algorithm for the problem c x c (c being the number of coefficients to estimate). Spatial models can also be fitted using the two-dimensional spline functionality available.

Maintained by Giovanny Covarrubias-Pazaran. Last updated 21 days ago.

average-information mixed-models rcpparmadillo openblas cpp openmp

3.0 match 43 stars 12.70 score 300 scripts 9 dependents

peiliangbai92

LSVAR:Estimation of Low Rank Plus Sparse Structured Vector Auto-Regressive (VAR) Model

Implements an estimation algorithm for the low rank plus sparse structured VAR model using the Fast Iterative Shrinkage-Thresholding Algorithm (FISTA). It relates to the algorithm in Sumanta, Li, and Michailidis (2019) <doi:10.1109/TSP.2018.2887401>.

Maintained by Peiliang Bai. Last updated 4 years ago.

8.4 match 4 stars 4.60 score 5 scripts

dominicmagirr

nphRCT:Non-Proportional Hazards in Randomized Controlled Trials

Perform a stratified weighted log-rank test in a randomized controlled trial. Tests can be visualized as a difference in average score on the two treatment arms. These methods are described in Magirr and Burman (2018) <doi:10.48550/arXiv.1807.11097>, Magirr (2020) <doi:10.48550/arXiv.2007.04767>, and Magirr and Jimenez (2022) <doi:10.48550/arXiv.2201.10445>.

Maintained by Dominic Magirr. Last updated 9 months ago.

9.3 match 4.16 score 24 scripts 1 dependent

sieste

SpecsVerification:Forecast Verification Routines for Ensemble Forecasts of Weather and Climate

A collection of forecast verification routines developed for the SPECS FP7 project. The emphasis is on comparative verification of ensemble forecasts of weather and climate.

Maintained by Stefan Siegert. Last updated 5 years ago.

cpp

12.2 match 1 star 3.12 score 44 scripts 5 dependents

bioc

epiNEM:epiNEM

epiNEM is an extension of the original Nested Effects Models (NEM). epiNEM can take double knockouts into account and infer more complex network signalling pathways. It is tailored towards large-scale double knock-out screens.

Maintained by Martin Pirkl. Last updated 5 months ago.

pathways systemsbiology networkinference network

6.5 match 1 star 5.83 score 1 script 3 dependents

bioc

XVector:Foundation of external vector representation and manipulation in Bioconductor

Provides memory efficient S4 classes for storing sequences "externally" (e.g. behind an R external pointer, or on disk).

Maintained by Hervé Pagès. Last updated 2 months ago.

infrastructure datarepresentation bioconductor-package core-package zlib

3.3 match 2 stars 11.36 score 67 scripts 1.7k dependents

bcmullins

mobilityIndexR:Calculates Transition Matrices and Mobility Indices

Measures mobility in a population through transition matrices and mobility indices. Relative, mixed, and absolute transition matrices are supported. The Prais-Bibby, Absolute Movement, Origin Specific, and Weighted Group Mobility indices are supported. Example income and grade data are included.

Maintained by Brett Mullins. Last updated 3 years ago.

ranks transition-matrices

10.0 match 3.70 score 4 scripts

gobbios

EloRating:Animal Dominance Hierarchies by Elo Rating

Provides functions to quantify animal dominance hierarchies. The major focus is on Elo rating and its ability to deal with temporal dynamics in dominance interaction sequences. For static data, David's score and de Vries' I&SI are also implemented. In addition, the package provides functions to assess transitivity, linearity and stability of dominance networks. See Neumann et al (2011) <doi:10.1016/j.anbehav.2011.07.016> for an introduction.

Maintained by Christof Neumann. Last updated 8 months ago.

cpp

5.4 match 4 stars 6.86 score 61 scripts 1 dependent
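
For readers unfamiliar with Elo rating on dominance interaction sequences, the following is a minimal Python sketch of the textbook logistic Elo update applied to an ordered list of (winner, loser) pairs. It is not the EloRating package's code or API: the function names, the `k` factor of 100, the starting value of 1000, and the logistic scale of 400 are all illustrative assumptions, and the package may use a different winning-probability function.

```python
def expected_score(rating_a, rating_b, scale=400.0):
    """Logistic probability that A wins against B (textbook Elo; assumed form)."""
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / scale))


def elo_update(winner, loser, ratings, k=100.0, start=1000.0):
    """Update `ratings` in place after one decided interaction."""
    r_win = ratings.get(winner, start)
    r_lose = ratings.get(loser, start)
    # The winner gains more when the win was unexpected.
    delta = k * (1.0 - expected_score(r_win, r_lose))
    ratings[winner] = r_win + delta
    ratings[loser] = r_lose - delta
    return ratings


def rate_sequence(interactions, k=100.0, start=1000.0):
    """Process (winner, loser) pairs in temporal order and return final ratings."""
    ratings = {}
    for winner, loser in interactions:
        elo_update(winner, loser, ratings, k=k, start=start)
    return ratings


if __name__ == "__main__":
    # Hypothetical interaction sequence among three individuals.
    print(rate_sequence([("a", "b"), ("a", "c"), ("b", "c")]))
```

Because each update moves the same amount from loser to winner, the sum of ratings stays constant, and the order of interactions matters, which is what lets this kind of rating track temporal dynamics.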

rstudio

keras3:R Interface to 'Keras'

Interface to 'Keras' <https://keras.io>, a high-level neural networks API. 'Keras' was developed with a focus on enabling fast experimentation, supports both convolution based networks and recurrent networks (as well as combinations of the two), and runs seamlessly on both CPU and GPU devices.

Maintained by Tomasz Kalinowski. Last updated 4 days ago.

2.7 match 845 stars 13.57 score 264 scripts 2 dependents

bioc

GSAR:Gene Set Analysis in R

Gene set analysis using specific alternative hypotheses. Tests for differential expression, scale and net correlation structure.

Maintained by Yasir Rahmatallah. Last updated 5 months ago.

software statisticalmethod differentialexpression

8.3 match 4.38 score 7 scripts

angusian

Kendall:Kendall Rank Correlation and Mann-Kendall Trend Test

Computes the Kendall rank correlation and Mann-Kendall trend test. See documentation for use of block bootstrap when there is autocorrelation.

Maintained by A.I. McLeod. Last updated 3 years ago.

fortran

5.4 match 6.74 score 864 scripts 25 dependents
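
As a rough illustration of the two statistics named in this entry, here is a small Python sketch of Kendall's tau-a and the Mann-Kendall S statistic computed by direct pair counting. It is not the Kendall package's implementation: the function names are hypothetical, tau-a makes no adjustment for ties, and no p-value or block bootstrap is computed.

```python
from itertools import combinations


def _sign(value):
    """Return -1, 0, or 1 according to the sign of `value`."""
    return (value > 0) - (value < 0)


def kendall_tau_a(x, y):
    """Kendall's tau-a: (concordant - discordant) / total number of pairs."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have equal length of at least 2")
    n = len(x)
    s = sum(
        _sign(x[i] - x[j]) * _sign(y[i] - y[j])
        for i, j in combinations(range(n), 2)
    )
    return s / (n * (n - 1) / 2)


def mann_kendall_s(series):
    """Mann-Kendall S: sum of signs of all later-minus-earlier differences."""
    return sum(
        _sign(series[j] - series[i])
        for i, j in combinations(range(len(series)), 2)
    )


if __name__ == "__main__":
    print(kendall_tau_a([1, 2, 3], [1, 3, 2]))
    print(mann_kendall_s([1, 2, 3, 4]))
```

The pair-counting approach is quadratic in the series length, which is fine for a sketch but slower than the sorting-based algorithms that production implementations typically use.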

corybrunson

ordr:A Tidyverse Extension for Ordinations and Biplots

Ordination comprises several multivariate exploratory and explanatory techniques with theoretical foundations in geometric data analysis; see Podani (2000, ISBN:90-5782-067-6) for techniques and applications and Le Roux & Rouanet (2005) <doi:10.1007/1-4020-2236-0> for foundations. Greenacre (2010, ISBN:978-84-923846) shows how the most established of these, including principal components analysis, correspondence analysis, multidimensional scaling, factor analysis, and discriminant analysis, rely on eigen-decompositions or singular value decompositions of pre-processed numeric matrix data. These decompositions give rise to a set of shared coordinates along which the row and column elements can be measured. The overlay of their scatterplots on these axes, introduced by Gabriel (1971) <doi:10.1093/biomet/58.3.453>, is called a biplot. 'ordr' provides inspection, extraction, manipulation, and visualization tools for several popular ordination classes supported by a set of recovery methods. It is inspired by and designed to integrate into 'tidyverse' workflows provided by Wickham et al (2019) <doi:10.21105/joss.01686>.

Maintained by Jason Cory Brunson. Last updated 12 days ago.

biplot data-visualization dimension-reduction geometric-data-analysis grammar-of-graphics log-ratio-analysis multivariate-analysis multivariate-statistics ordination tidymodels tidyverse

5.0 match 24 stars 7.26 score 28 scripts

adamlilith

statisfactory:Statistical and Geometrical Tools

A collection of statistical and geometrical tools including the aligned rank transform (ART; Higgins et al. 1990 <doi:10.4148/2475-7772.1443>; Peterson 2002 <doi:10.22237/jmasm/1020255240>; Wobbrock et al. 2011 <doi:10.1145/1978942.1978963>), 2-D histograms and histograms with overlapping bins, a function for making all possible formulae within a set of constraints, amongst others.

Maintained by Adam B. Smith. Last updated 5 months ago.

2d-histograms aligned-rank-transform sampling

10.7 match 3.38 score 16 scripts 1 dependent

bioc

surfaltr:Rapid Comparison of Surface Protein Isoform Membrane Topologies Through surfaltr

Cell surface proteins form a major fraction of the druggable proteome and can be used for tissue-specific delivery of oligonucleotide/cell-based therapeutics. Alternatively spliced surface protein isoforms have been shown to differ in their subcellular localization and/or their transmembrane (TM) topology. Surface proteins are hydrophobic and remain difficult to study thereby necessitating the use of TM topology prediction methods such as TMHMM and Phobius. However, there exists a need for bioinformatic approaches to streamline batch processing of isoforms for comparing and visualizing topologies. To address this gap, we have developed an R package, surfaltr. It pairs inputted isoforms, either known alternatively spliced or novel, with their APPRIS annotated principal counterparts, predicts their TM topologies using TMHMM or Phobius, and generates a customizable graphical output. Further, surfaltr facilitates the prioritization of biologically diverse isoform pairs through the incorporation of three different ranking metrics and through protein alignment functions. Citations for programs mentioned here can be found in the vignette.

Maintained by Pooja Gangras. Last updated 5 months ago.

software visualization datarepresentation splicedalignment alignment multiplesequencealignment multiplecomparison

9.1 match 4.00 score 2 scripts

bioc

cmapR:CMap Tools in R

The Connectivity Map (CMap) is a massive resource of perturbational gene expression profiles built by researchers at the Broad Institute and funded by the NIH Library of Integrated Network-Based Cellular Signatures (LINCS) program. Please visit https://clue.io for more information. The cmapR package implements methods to parse, manipulate, and write common CMap data objects, such as annotated matrices and collections of gene sets.

Maintained by Ted Natoli. Last updated 5 months ago.

dataimport datarepresentation geneexpression bioconductor bioinformatics cmap

4.0 match 89 stars 8.85 score 298 scripts