Showing 200 of total 1256 results (show query)
schochastics
netrankr:Analyzing Partial Rankings in Networks
Implements methods for centrality related analyses of networks. While the package includes the possibility to build more than 20 indices, its main focus lies on index-free assessment of centrality via partial rankings obtained by neighborhood-inclusion or positional dominance. These partial rankings can be analyzed with different methods, including probabilistic methods like computing expected node ranks and relative rank probabilities (how likely is it that a node is more central than another?). The methodology is described in depth in the vignettes and in Schoch (2018) <doi:10.1016/j.socnet.2017.12.003>.
Maintained by David Schoch. Last updated 1 months ago.
network-analysisnetwork-centralityopenblascppopenmp
35.8 match 49 stars 9.56 score 91 scripts 2 dependentshturner
PlackettLuce:Plackett-Luce Models for Rankings
Functions to prepare rankings data and fit the Plackett-Luce model jointly attributed to Plackett (1975) <doi:10.2307/2346567> and Luce (1959, ISBN:0486441369). The standard Plackett-Luce model is generalized to accommodate ties of any order in the ranking. Partial rankings, in which only a subset of items are ranked in each ranking, are also accommodated in the implementation. Disconnected/weakly connected networks implied by the rankings may be handled by adding pseudo-rankings with a hypothetical item. Optionally, a multivariate normal prior may be set on the log-worth parameters and ranker reliabilities may be incorporated as proposed by Raman and Joachims (2014) <doi:10.1145/2623330.2623654>. Maximum a posteriori estimation is used when priors are set. Methods are provided to estimate standard errors or quasi-standard errors for inference as well as to fit Plackett-Luce trees. See the package website or vignette for further details.
Maintained by Heather Turner. Last updated 2 years ago.
plackett-luce-modelspreferencesrankingrankings-datastatistical-models
41.9 match 20 stars 7.97 score 86 scripts 3 dependentsstatnet
ergm.rank:Fit, Simulate and Diagnose Exponential-Family Models for Rank-Order Relational Data
A set of extensions for the 'ergm' package to fit weighted networks whose edge weights are ranks. See Krivitsky and Butts (2017) <doi:10.1177/0081175017692623> and Krivitsky, Hunter, Morris, and Klumb (2023) <doi:10.18637/jss.v105.i06>.
Maintained by Pavel N. Krivitsky. Last updated 4 months ago.
46.6 match 4 stars 6.98 score 9 scripts 1 dependentsocbe-uio
BayesMallows:Bayesian Preference Learning with the Mallows Rank Model
An implementation of the Bayesian version of the Mallows rank model (Vitelli et al., Journal of Machine Learning Research, 2018 <https://jmlr.org/papers/v18/15-481.html>; Crispino et al., Annals of Applied Statistics, 2019 <doi:10.1214/18-AOAS1203>; Sorensen et al., R Journal, 2020 <doi:10.32614/RJ-2020-026>; Stein, PhD Thesis, 2023 <https://eprints.lancs.ac.uk/id/eprint/195759>). Both Metropolis-Hastings and sequential Monte Carlo algorithms for estimating the models are available. Cayley, footrule, Hamming, Kendall, Spearman, and Ulam distances are supported in the models. The rank data to be analyzed can be in the form of complete rankings, top-k rankings, partially missing rankings, as well as consistent and inconsistent pairwise preferences. Several functions for plotting and studying the posterior distributions of parameters are provided. The package also provides functions for estimating the partition function (normalizing constant) of the Mallows rank model, both with the importance sampling algorithm of Vitelli et al. and asymptotic approximation with the IPFP algorithm (Mukherjee, Annals of Statistics, 2016 <doi:10.1214/15-AOS1389>).
Maintained by Oystein Sorensen. Last updated 1 months ago.
mallows-modelopenblascppopenmp
39.9 match 21 stars 7.91 score 36 scripts 1 dependentsropensci
taxize:Taxonomic Information from Around the Web
Interacts with a suite of web application programming interfaces (API) for taxonomic tasks, such as getting database specific taxonomic identifiers, verifying species names, getting taxonomic hierarchies, fetching downstream and upstream taxonomic names, getting taxonomic synonyms, converting scientific to common names and vice versa, and more. Some of the services supported include 'NCBI E-utilities' (<https://www.ncbi.nlm.nih.gov/books/NBK25501/>), 'Encyclopedia of Life' (<https://eol.org/docs/what-is-eol/data-services>), 'Global Biodiversity Information Facility' (<https://techdocs.gbif.org/en/openapi/>), and many more. Links to the API documentation for other supported services are available in the documentation for their respective functions in this package.
Maintained by Zachary Foster. Last updated 12 days ago.
taxonomybiologynomenclaturejsonapiwebapi-clientidentifiersspeciesnamesapi-wrapperbiodiversitydarwincoredatataxize
22.2 match 274 stars 13.63 score 1.6k scripts 23 dependentssnoweye
pbdMPI:R Interface to MPI for HPC Clusters (Programming with Big Data Project)
A simplified, efficient, interface to MPI for HPC clusters. It is a derivation and rethinking of the Rmpi package. pbdMPI embraces the prevalent parallel programming style on HPC clusters. Beyond the interface, a collection of functions for global work with distributed data and resource-independent RNG reproducibility is included. It is based on S4 classes and methods.
Maintained by Wei-Chen Chen. Last updated 6 months ago.
41.1 match 2 stars 7.11 score 179 scripts 3 dependentsharrelfe
Hmisc:Harrell Miscellaneous
Contains many functions useful for data analysis, high-level graphics, utility operations, functions for computing sample size and power, simulation, importing and annotating datasets, imputing missing values, advanced table making, variable clustering, character string manipulation, conversion of R objects to LaTeX and html code, recoding variables, caching, simplified parallel computing, encrypting and decrypting data using a safe workflow, general moving window statistical estimation, and assistance in interpreting principal component analysis.
Maintained by Frank E Harrell Jr. Last updated 2 days ago.
11.2 match 210 stars 17.61 score 17k scripts 750 dependentsmodal-inria
Rankcluster:Model-Based Clustering for Multivariate Partial Ranking Data
Implementation of a model-based clustering algorithm for ranking data (C. Biernacki, J. Jacques (2013) <doi:10.1016/j.csda.2012.08.008>). Multivariate rankings as well as partial rankings are taken into account. This algorithm is based on an extension of the Insertion Sorting Rank (ISR) model for ranking data, which is a meaningful and effective model parametrized by a position parameter (the modal ranking, quoted by mu) and a dispersion parameter (quoted by pi). The heterogeneity of the rank population is modelled by a mixture of ISR, whereas conditional independence assumption is considered for multivariate rankings.
Maintained by Quentin Grimonprez. Last updated 2 years ago.
clusteringhacktoberfestrankcpp
51.3 match 1 stars 3.74 score 37 scripts 1 dependentsmjskay
ARTool:Aligned Rank Transform
The aligned rank transform for nonparametric factorial ANOVAs as described by Wobbrock, Findlater, Gergle, and Higgins (2011) <doi:10.1145/1978942.1978963>. Also supports aligned rank transform contrasts as described by Elkin, Kay, Higgins, and Wobbrock (2021) <doi:10.1145/3472749.3474784>.
Maintained by Matthew Kay. Last updated 3 years ago.
20.4 match 60 stars 8.64 score 307 scriptsr-lib
bit64:A S3 Class for Vectors of 64bit Integers
Package 'bit64' provides serializable S3 atomic 64bit (signed) integers. These are useful for handling database keys and exact counting in +-2^63. WARNING: do not use them as replacement for 32bit integers, integer64 are not supported for subscripting by R-core and they have different semantics when combined with double, e.g. integer64 + double => integer64. Class integer64 can be used in vectors, matrices, arrays and data.frames. Methods are available for coercion from and to logicals, integers, doubles, characters and factors as well as many elementwise and summary functions. Many fast algorithmic operations such as 'match' and 'order' support inter- active data exploration and manipulation and optionally leverage caching.
Maintained by Michael Chirico. Last updated 3 days ago.
11.7 match 35 stars 14.91 score 1.5k scripts 3.2k dependentshappma
pseudorank:Pseudo-Ranks
Efficient calculation of pseudo-ranks and (pseudo)-rank based test statistics. In case of equal sample sizes, pseudo-ranks and mid-ranks are equal. When used for inference mid-ranks may lead to paradoxical results. Pseudo-ranks are in general not affected by such a problem. See Happ et al. (2020, <doi:10.18637/jss.v095.c01>) for details.
Maintained by Martin Happ. Last updated 26 days ago.
cppnonparametricnonparametric-statisticspseudo-rankpseudo-ranksrankrank-teststrend-testcpp
44.5 match 3 stars 3.71 score 17 scriptsdanielwilhelm
csranks:Statistical Tools for Ranks
Account for uncertainty when working with ranks. Estimate standard errors consistently in linear regression with ranked variables. Construct confidence sets of various kinds for positions of populations in a ranking based on values of a certain feature and their estimation errors. Theory based on Mogstad, Romano, Shaikh, and Wilhelm (2023)<doi:10.1093/restud/rdad006> and Chetverikov and Wilhelm (2023) <arXiv:2310.15512>.
Maintained by Daniel Wilhelm. Last updated 6 months ago.
31.4 match 5 stars 5.18 score 7 scriptshugaped
MBNMAdose:Dose-Response MBNMA Models
Fits Bayesian dose-response model-based network meta-analysis (MBNMA) that incorporate multiple doses within an agent by modelling different dose-response functions, as described by Mawdsley et al. (2016) <doi:10.1002/psp4.12091>. By modelling dose-response relationships this can connect networks of evidence that might otherwise be disconnected, and can improve precision on treatment estimates. Several common dose-response functions are provided; others may be added by the user. Various characteristics and assumptions can be flexibly added to the models, such as shared class effects. The consistency of direct and indirect evidence in the network can be assessed using unrelated mean effects models and/or by node-splitting at the treatment level.
Maintained by Hugo Pedder. Last updated 1 months ago.
24.1 match 10 stars 6.60 scoreeheinzen
elo:Ranking Teams by Elo Rating and Comparable Methods
A flexible framework for calculating Elo ratings and resulting rankings of any two-team-per-matchup system (chess, sports leagues, 'Go', etc.). This implementation is capable of evaluating a variety of matchups, Elo rating updates, and win probabilities, all based on the basic Elo rating system. It also includes methods to benchmark performance, including logistic regression and Markov chain models.
Maintained by Ethan Heinzen. Last updated 1 years ago.
eloelo-ratinglogistic-regressionmarkov-chainmarkov-modelrankingsports-analyticscpp
22.3 match 37 stars 7.05 score 153 scriptsbioc
UCell:Rank-based signature enrichment analysis for single-cell data
UCell is a package for evaluating gene signatures in single-cell datasets. UCell signature scores, based on the Mann-Whitney U statistic, are robust to dataset size and heterogeneity, and their calculation demands less computing time and memory than other available methods, enabling the processing of large datasets in a few minutes even on machines with limited computing power. UCell can be applied to any single-cell data matrix, and includes functions to directly interact with SingleCellExperiment and Seurat objects.
Maintained by Massimo Andreatta. Last updated 5 months ago.
singlecellgenesetenrichmenttranscriptomicsgeneexpressioncellbasedassays
14.4 match 143 stars 10.43 score 454 scripts 2 dependentsgvalentini58
RANKS:Ranking of Nodes with Kernelized Score Functions
Implementation of Kernelized score functions and other semi-supervised learning algorithms for node label ranking to analyze biomolecular networks. RANKS can be easily applied to a large set of different relevant problems in computational biology, ranging from automatic protein function prediction, to gene disease prioritization and drug repositioning, and more in general to any bioinformatics problem that can be formalized as a node label ranking problem in a graph. The modular nature of the implementation allows to experiment with different score functions and kernels and to easily compare the results with baseline network-based methods such as label propagation and random walk algorithms, as well as to enlarge the algorithmic scheme by adding novel user-defined score functions and kernels.
Maintained by Giorgio Valentini. Last updated 2 years ago.
76.2 match 1.90 score 7 scriptsrstudio
sortable:Drag-and-Drop in 'shiny' Apps with 'SortableJS'
Enables drag-and-drop behaviour in Shiny apps, by exposing the functionality of the 'SortableJS' <https://sortablejs.github.io/Sortable/> JavaScript library as an 'htmlwidget'. You can use this in Shiny apps and widgets, 'learnr' tutorials as well as R Markdown. In addition, provides a custom 'learnr' question type - 'question_rank()' - that allows ranking questions with drag-and-drop.
Maintained by Andrie de Vries. Last updated 6 months ago.
12.2 match 135 stars 11.62 score 368 scripts 13 dependentstidyverse
dplyr:A Grammar of Data Manipulation
A fast, consistent tool for working with data frame like objects, both in memory and out of memory.
Maintained by Hadley Wickham. Last updated 12 days ago.
5.5 match 4.8k stars 24.68 score 659k scripts 7.8k dependentskassambara
survminer:Drawing Survival Curves using 'ggplot2'
Contains the function 'ggsurvplot()' for drawing easily beautiful and 'ready-to-publish' survival curves with the 'number at risk' table and 'censoring count plot'. Other functions are also available to plot adjusted curves for `Cox` model and to visually examine 'Cox' model assumptions.
Maintained by Alboukadel Kassambara. Last updated 5 months ago.
8.1 match 524 stars 15.87 score 7.0k scripts 55 dependentsstan-dev
projpred:Projection Predictive Feature Selection
Performs projection predictive feature selection for generalized linear models (Piironen, Paasiniemi, and Vehtari, 2020, <doi:10.1214/20-EJS1711>) with or without multilevel or additive terms (Catalina, Bürkner, and Vehtari, 2022, <https://proceedings.mlr.press/v151/catalina22a.html>), for some ordinal and nominal regression models (Weber, Glass, and Vehtari, 2023, <arXiv:2301.01660>), and for many other regression models (using the latent projection by Catalina, Bürkner, and Vehtari, 2021, <arXiv:2109.04702>, which can also be applied to most of the former models). The package is compatible with the 'rstanarm' and 'brms' packages, but other reference models can also be used. See the vignettes and the documentation for more information and examples.
Maintained by Frank Weber. Last updated 1 months ago.
bayesbayesianbayesian-inferencerstanarmstanstatisticsvariable-selectionopenblascpp
12.1 match 112 stars 10.08 score 241 scriptsagrdatasci
gosset:Tools for Data Analysis in Experimental Agriculture
Methods to analyse experimental agriculture data, from data synthesis to model selection and visualisation. The package is named after W.S. Gosset aka ‘Student’, a pioneer of modern statistics in small sample experimental design and analysis.
Maintained by Kauê de Sousa. Last updated 3 months ago.
experimental-designrankings-data
18.5 match 6 stars 6.44 score 23 scriptspearce790
rankrate:Joint Statistical Models for Preference Learning with Rankings and Ratings
Statistical tools for the Mallows-Binomial model, the first joint statistical model for preference learning for rankings and ratings. This project was supported by the National Science Foundation under Grant No. 2019901.
Maintained by Michael Pearce. Last updated 10 months ago.
28.5 match 4.00 score 6 scriptsbluefoxr
COINr:Composite Indicator Construction and Analysis
A comprehensive high-level package, for composite indicator construction and analysis. It is a "development environment" for composite indicators and scoreboards, which includes utilities for construction (indicator selection, denomination, imputation, data treatment, normalisation, weighting and aggregation) and analysis (multivariate analysis, correlation plotting, short cuts for principal component analysis, global sensitivity analysis, and more). A composite indicator is completely encapsulated inside a single hierarchical list called a "coin". This allows a fast and efficient work flow, as well as making quick copies, testing methodological variations and making comparisons. It also includes many plotting options, both statistical (scatter plots, distribution plots) as well as for presenting results.
Maintained by William Becker. Last updated 2 months ago.
12.6 match 26 stars 9.07 score 73 scripts 1 dependentsjwood000
RcppAlgos:High Performance Tools for Combinatorics and Computational Mathematics
Provides optimized functions and flexible iterators implemented in C++ for solving problems in combinatorics and computational mathematics. Handles various combinatorial objects including combinations, permutations, integer partitions and compositions, Cartesian products, unordered Cartesian products, and partition of groups. Utilizes the RMatrix class from 'RcppParallel' for thread safety. The combination and permutation functions contain constraint parameters that allow for generation of all results of a vector meeting specific criteria (e.g. finding all combinations such that the sum is between two bounds). Capable of ranking/unranking combinatorial objects efficiently (e.g. retrieve only the nth lexicographical result) which sets up nicely for parallelization as well as random sampling. Gmp support permits exploration where the total number of results is large (e.g. comboSample(10000, 500, n = 4)). Additionally, there are several high performance number theoretic functions that are useful for problems common in computational mathematics. Some of these functions make use of the fast integer division library 'libdivide'. The primeSieve function is based on the segmented sieve of Eratosthenes implementation by Kim Walisch. It is also efficient for large numbers by using the cache friendly improvements originally developed by Tomás Oliveira. Finally, there is a prime counting function that implements Legendre's formula based on the work of Kim Walisch.
Maintained by Joseph Wood. Last updated 1 months ago.
combinationscombinatoricsfactorizationnumber-theoryparallelpermutationprime-factorizationsprimesievegmpcpp
11.2 match 45 stars 10.04 score 153 scripts 12 dependentsmicrosoft
wpa:Tools for Analysing and Visualising Viva Insights Data
Opinionated functions that enable easier and faster analysis of Viva Insights data. There are three main types of functions in 'wpa': (i) Standard functions create a 'ggplot' visual or a summary table based on a specific Viva Insights metric; (2) Report Generation functions generate HTML reports on a specific analysis area, e.g. Collaboration; (3) Other miscellaneous functions cover more specific applications (e.g. Subject Line text mining) of Viva Insights data. This package adheres to 'tidyverse' principles and works well with the pipe syntax. 'wpa' is built with the beginner-to-intermediate R users in mind, and is optimised for simplicity.
Maintained by Martin Chan. Last updated 4 months ago.
16.7 match 30 stars 6.69 score 39 scripts 1 dependentspilaboratory
sads:Maximum Likelihood Models for Species Abundance Distributions
Maximum likelihood tools to fit and compare models of species abundance distributions and of species rank-abundance distributions.
Maintained by Paulo I. Prado. Last updated 1 years ago.
12.8 match 23 stars 8.66 score 244 scripts 3 dependentsbioc
phyloseq:Handling and analysis of high-throughput microbiome census data
phyloseq provides a set of classes and tools to facilitate the import, storage, analysis, and graphical display of microbiome census data.
Maintained by Paul J. McMurdie. Last updated 5 months ago.
immunooncologysequencingmicrobiomemetagenomicsclusteringclassificationmultiplecomparisongeneticvariability
7.8 match 597 stars 13.90 score 8.4k scripts 37 dependentsbioc
PhyloProfile:PhyloProfile
PhyloProfile is a tool for exploring complex phylogenetic profiles. Phylogenetic profiles, presence/absence patterns of genes over a set of species, are commonly used to trace the functional and evolutionary history of genes across species and time. With PhyloProfile we can enrich regular phylogenetic profiles with further data like sequence/structure similarity, to make phylogenetic profiling more meaningful. Besides the interactive visualisation powered by R-Shiny, the package offers a set of further analysis features to gain insights like the gene age estimation or core gene identification.
Maintained by Vinh Tran. Last updated 6 days ago.
softwarevisualizationdatarepresentationmultiplecomparisonfunctionalpredictiondimensionreductionbioinformaticsheatmapinteractive-visualizationsorthologsphylogenetic-profileshiny
13.9 match 33 stars 7.77 score 10 scriptsnflverse
nflreadr:Download 'nflverse' Data
A minimal package for downloading data from 'GitHub' repositories of the 'nflverse' project.
Maintained by Tan Ho. Last updated 4 months ago.
nflnflfastrnflversesports-data
8.6 match 66 stars 12.46 score 476 scripts 10 dependentsbioc
RankProd:Rank Product method for identifying differentially expressed genes with application in meta-analysis
Non-parametric method for identifying differentially expressed (up- or down- regulated) genes based on the estimated percentage of false predictions (pfp). The method can combine data sets from different origins (meta-analysis) to increase the power of the identification.
Maintained by Francesco Del Carratore. Last updated 5 months ago.
differentialexpressionstatisticalmethodsoftwareresearchfieldmetabolomicslipidomicsproteomicssystemsbiologygeneexpressionmicroarraygenesignaling
16.4 match 6.46 score 81 scripts 6 dependentsasalavaty
influential:Identification and Classification of the Most Influential Nodes
Contains functions for the classification and ranking of top candidate features, reconstruction of networks from adjacency matrices and data frames, analysis of the topology of the network and calculation of centrality measures, and identification of the most influential nodes. Also, a function is provided for running SIRIR model, which is the combination of leave-one-out cross validation technique and the conventional SIR model, on a network to unsupervisedly rank the true influence of vertices. Additionally, some functions have been provided for the assessment of dependence and correlation of two network centrality measures as well as the conditional probability of deviation from their corresponding means in opposite direction. Fred Viole and David Nawrocki (2013, ISBN:1490523995). Csardi G, Nepusz T (2006). "The igraph software package for complex network research." InterJournal, Complex Systems, 1695. Adopted algorithms and sources are referenced in function document.
Maintained by Adrian Salavaty. Last updated 5 months ago.
centrality-measuresclassification-modelinfluence-rankingnetwork-analysispriaritization-model
16.0 match 27 stars 6.54 score 43 scripts 1 dependentsbraverock
PortfolioAnalytics:Portfolio Analysis, Including Numerical Methods for Optimization of Portfolios
Portfolio optimization and analysis routines and graphics.
Maintained by Brian G. Peterson. Last updated 3 months ago.
9.0 match 81 stars 11.49 score 626 scripts 2 dependentsgrunwaldlab
metacoder:Tools for Parsing, Manipulating, and Graphing Taxonomic Abundance Data
Reads, plots, and manipulates large taxonomic data sets, like those generated from modern high-throughput sequencing, such as metabarcoding (i.e. amplification metagenomics, 16S metagenomics, etc). It provides a tree-based visualization called "heat trees" used to depict statistics for every taxon in a taxonomy using color and size. It also provides various functions to do common tasks in microbiome bioinformatics on data in the 'taxmap' format defined by the 'taxa' package. The 'metacoder' package is described in the publication by Foster et al. (2017) <doi:10.1371/journal.pcbi.1005404>.
Maintained by Zachary Foster. Last updated 1 months ago.
community-diversityhierarchicalmetabarcodingpcrtaxonomytreescpp
10.6 match 140 stars 9.64 score 328 scriptsalexkz
kernlab:Kernel-Based Machine Learning Lab
Kernel-based machine learning methods for classification, regression, clustering, novelty detection, quantile regression and dimensionality reduction. Among other methods 'kernlab' includes Support Vector Machines, Spectral Clustering, Kernel PCA, Gaussian Processes and a QP solver.
Maintained by Alexandros Karatzoglou. Last updated 7 months ago.
8.3 match 21 stars 12.26 score 7.8k scripts 487 dependentsrkoenker
quantreg:Quantile Regression
Estimation and inference methods for models for conditional quantile functions: Linear and nonlinear parametric and non-parametric (total variation penalized) models for conditional quantiles of a univariate response and several methods for handling censored survival data. Portfolio selection methods based on expected shortfall risk are also now included. See Koenker, R. (2005) Quantile Regression, Cambridge U. Press, <doi:10.1017/CBO9780511754098> and Koenker, R. et al. (2017) Handbook of Quantile Regression, CRC Press, <doi:10.1201/9781315120256>.
Maintained by Roger Koenker. Last updated 6 days ago.
7.0 match 18 stars 13.93 score 2.6k scripts 1.5k dependentsalanarnholt
BSDA:Basic Statistics and Data Analysis
Data sets for book "Basic Statistics and Data Analysis" by Larry J. Kitchens.
Maintained by Alan T. Arnholt. Last updated 2 years ago.
10.6 match 7 stars 9.11 score 1.3k scripts 6 dependentstalgalili
dendextend:Extending 'dendrogram' Functionality in R
Offers a set of functions for extending 'dendrogram' objects in R, letting you visualize and compare trees of 'hierarchical clusterings'. You can (1) Adjust a tree's graphical parameters - the color, size, type, etc of its branches, nodes and labels. (2) Visually and statistically compare different 'dendrograms' to one another.
Maintained by Tal Galili. Last updated 2 months ago.
5.6 match 154 stars 17.02 score 6.0k scripts 164 dependentschrisaddy
rrr:Reduced-Rank Regression
Reduced-rank regression, diagnostics and graphics.
Maintained by Chris Addy. Last updated 8 years ago.
18.6 match 10 stars 5.06 score 23 scriptsbioc
clustifyr:Classifier for Single-cell RNA-seq Using Cell Clusters
Package designed to aid in classifying cells from single-cell RNA sequencing data using external reference data (e.g., bulk RNA-seq, scRNA-seq, microarray, gene lists). A variety of correlation based methods and gene list enrichment methods are provided to assist cell type assignment.
Maintained by Rui Fu. Last updated 5 months ago.
singlecellannotationsequencingmicroarraygeneexpressionassign-identitiesclustersmarker-genesrna-seqsingle-cell-rna-seq
9.6 match 119 stars 9.63 score 296 scriptsmicrosoft
vivainsights:Analyze and Visualize Data from 'Microsoft Viva Insights'
Provides a versatile range of functions, including exploratory data analysis, time-series analysis, organizational network analysis, and data validation, whilst at the same time implements a set of best practices in analyzing and visualizing data specific to 'Microsoft Viva Insights'.
Maintained by Martin Chan. Last updated 23 days ago.
15.0 match 11 stars 6.12 score 68 scriptscran
mgcv:Mixed GAM Computation Vehicle with Automatic Smoothness Estimation
Generalized additive (mixed) models, some of their extensions and other generalized ridge regression with multiple smoothing parameter estimation by (Restricted) Marginal Likelihood, Generalized Cross Validation and similar, or using iterated nested Laplace approximation for fully Bayesian inference. See Wood (2017) <doi:10.1201/9781315370279> for an overview. Includes a gam() function, a wide variety of smoothers, 'JAGS' support and distributions beyond the exponential family.
Maintained by Simon Wood. Last updated 1 years ago.
7.1 match 32 stars 12.71 score 17k scripts 7.8k dependentsbioc
ccfindR:Cancer Clone Finder
A collection of tools for cancer genomic data clustering analyses, including those for single cell RNA-seq. Cell clustering and feature gene selection analysis employ Bayesian (and maximum likelihood) non-negative matrix factorization (NMF) algorithm. Input data set consists of RNA count matrix, gene, and cell bar code annotations. Analysis outputs are factor matrices for multiple ranks and marginal likelihood values for each rank. The package includes utilities for downstream analyses, including meta-gene identification, visualization, and construction of rank-based trees for clusters.
Maintained by Jun Woo. Last updated 5 months ago.
transcriptomicssinglecellimmunooncologybayesianclusteringgslcpp
22.7 match 4.00 score 9 scriptsigraph
igraph:Network Analysis and Visualization
Routines for simple graphs and network analysis. It can handle large graphs very well and provides functions for generating random and regular graphs, graph visualization, centrality methods and much more.
Maintained by Kirill Müller. Last updated 2 days ago.
complex-networksgraph-algorithmsgraph-theorymathematicsnetwork-analysisnetwork-graphfortranlibxml2glpkopenblascpp
4.3 match 581 stars 21.10 score 31k scripts 1.9k dependentsjassler
socialranking:Social Ranking Solutions for Power Relations on Coalitions
The notion of power index has been widely used in literature to evaluate the influence of individual players (e.g., voters, political parties, nations, stockholders, etc.) involved in a collective decision situation like an electoral system, a parliament, a council, a management board, etc., where players may form coalitions. Traditionally this ranking is determined through numerical evaluation. More often than not however only ordinal data between coalitions is known. The package 'socialranking' offers a set of solutions to rank players based on a transitive ranking between coalitions, including through CP-Majority, ordinal Banzhaf or lexicographic excellence solution summarized by Tahar Allouche, Bruno Escoffier, Stefano Moretti and Meltem Öztürk (2020, <doi:10.24963/ijcai.2020/3>).
Maintained by Felix Fritz. Last updated 2 days ago.
17.5 match 6 stars 5.08 score 6 scriptsvegandevs
vegan:Community Ecology Package
Ordination methods, diversity analysis and other functions for community and vegetation ecologists.
Maintained by Jari Oksanen. Last updated 16 days ago.
ecological-modellingecologyordinationfortranopenblas
4.6 match 472 stars 19.41 score 15k scripts 440 dependentsmatthieustigler
tsDyn:Nonlinear Time Series Models with Regime Switching
Implements nonlinear autoregressive (AR) time series models. For univariate series, a non-parametric approach is available through additive nonlinear AR. Parametric modeling and testing for regime switching dynamics is available when the transition is either direct (TAR: threshold AR) or smooth (STAR: smooth transition AR, LSTAR). For multivariate series, one can estimate a range of TVAR or threshold cointegration TVECM models with two or three regimes. Tests can be conducted for TVAR as well as for TVECM (Hansen and Seo 2002 and Seo 2006).
Maintained by Matthieu Stigler. Last updated 5 months ago.
8.3 match 34 stars 10.56 score 684 scripts 3 dependentspievos101
TopKSignal:A Convex Optimization Tool for Signal Reconstruction from Multiple Ranked Lists
A mathematical optimization procedure in combination with statistical bootstrap for the estimation of the latent signals (sometimes called scores) informing the global consensus ranking (often named aggregation ranking). To solve mid/large-scale problems, users should install the 'gurobi' optimiser (available from <https://www.gurobi.com/>).
Maintained by Bastian Pfeifer Maintainer. Last updated 1 years ago.
convex-optimizationranking-datasignal-estimation
20.7 match 3 stars 4.18 score 1 scriptsr-hub
pkgsearch:Search and Query CRAN R Packages
Search CRAN metadata about packages by keyword, popularity, recent activity, package name and more. Uses the 'R-hub' search server, see <https://r-pkg.org> and the CRAN metadata database, that contains information about CRAN packages. Note that this is _not_ a CRAN project.
Maintained by Gábor Csárdi. Last updated 2 months ago.
10.0 match 109 stars 8.62 score 64 scripts 10 dependentsalexkowa
EnvStats:Package for Environmental Statistics, Including US EPA Guidance
Graphical and statistical analyses of environmental data, with focus on analyzing chemical concentrations and physical parameters, usually in the context of mandated environmental monitoring. Major environmental statistical methods found in the literature and regulatory guidance documents, with extensive help that explains what these methods do, how to use them, and where to find them in the literature. Numerous built-in data sets from regulatory guidance documents and environmental statistics literature. Includes scripts reproducing analyses presented in the book "EnvStats: An R Package for Environmental Statistics" (Millard, 2013, Springer, ISBN 978-1-4614-8455-4, <doi:10.1007/978-1-4614-8456-1>).
Maintained by Alexander Kowarik. Last updated 16 days ago.
6.6 match 26 stars 12.80 score 2.4k scripts 46 dependentsthomasp85
tidygraph:A Tidy API for Graph Manipulation
A graph, while not "tidy" in itself, can be thought of as two tidy data frames describing node and edge data respectively. 'tidygraph' provides an approach to manipulate these two virtual data frames using the API defined in the 'dplyr' package, as well as provides tidy interfaces to a lot of common graph algorithms.
Maintained by Thomas Lin Pedersen. Last updated 1 months ago.
graph-algorithmsgraph-manipulationigraphnetwork-analysistidyversecpp
5.7 match 553 stars 14.74 score 4.6k scripts 136 dependentsbioc
singscore:Rank-based single-sample gene set scoring method
A simple single-sample gene signature scoring method that uses rank-based statistics to analyze the sample's gene expression profile. It scores the expression activities of gene sets at a single-sample level.
Maintained by Malvika Kharbanda. Last updated 5 months ago.
softwaregeneexpressiongenesetenrichmentbioinformatics
8.3 match 41 stars 10.03 score 124 scripts 4 dependentssalvatoremangiafico
rcompanion:Functions to Support Extension Education Program Evaluation
Functions and datasets to support Summary and Analysis of Extension Program Evaluation in R, and An R Companion for the Handbook of Biological Statistics. Vignettes are available at <https://rcompanion.org>.
Maintained by Salvatore Mangiafico. Last updated 30 days ago.
10.2 match 4 stars 8.01 score 2.4k scripts 5 dependentsbioc
findIPs:Influential Points Detection for Feature Rankings
Feature rankings can be distorted by a single case in the context of high-dimensional data. The cases exerts abnormal influence on feature rankings are called influential points (IPs). The package aims at detecting IPs based on case deletion and quantifies their effects by measuring the rank changes (DOI:10.48550/arXiv.2303.10516). The package applies a novel rank comparing measure using the adaptive weights that stress the top-ranked important features and adjust the weights to ranking properties.
Maintained by Shuo Wang. Last updated 4 months ago.
geneexpressiondifferentialexpressionregressionsurvival
17.7 match 4.60 score 4 scriptsrudeboybert
fivethirtyeight:Data and Code Behind the Stories and Interactives at 'FiveThirtyEight'
Datasets and code published by the data journalism website 'FiveThirtyEight' available at <https://github.com/fivethirtyeight/data>. Note that while we received guidance from editors at 'FiveThirtyEight', this package is not officially published by 'FiveThirtyEight'.
Maintained by Albert Y. Kim. Last updated 2 years ago.
data-sciencedatajournalismfivethirtyeightstatistics
7.4 match 453 stars 10.98 score 1.7k scriptsgdkrmr
coRanking:Co-Ranking Matrix
Calculates the co-ranking matrix to assess the quality of a dimensionality reduction.
Maintained by Guido Kraemer. Last updated 6 months ago.
dimensionality-reductionmanifold-learningqualitystatisticsunsupervised-learningcpp
14.7 match 9 stars 5.43 score 20 scripts 1 dependentslindbrook
packageRank:Computation and Visualization of Package Download Counts and Percentile Ranks
Compute and visualize package download counts and percentile ranks from Posit/RStudio's CRAN mirror.
Maintained by lindbrook. Last updated 4 days ago.
12.9 match 28 stars 6.13 score 27 scriptstidyverse
tidyr:Tidy Messy Data
Tools to help to create tidy data, where each column is a variable, each row is an observation, and each cell contains a single value. 'tidyr' contains tools for changing the shape (pivoting) and hierarchy (nesting and 'unnesting') of a dataset, turning deeply nested lists into rectangular data frames ('rectangling'), and extracting values out of string columns. It also includes tools for working with missing values (both implicit and explicit).
Maintained by Hadley Wickham. Last updated 12 days ago.
3.4 match 1.4k stars 22.88 score 168k scripts 5.5k dependentst-kalinowski
keras:R Interface to 'Keras'
Interface to 'Keras' <https://keras.io>, a high-level neural networks 'API'. 'Keras' was developed with a focus on enabling fast experimentation, supports both convolution based networks and recurrent networks (as well as combinations of the two), and runs seamlessly on both 'CPU' and 'GPU' devices.
Maintained by Tomasz Kalinowski. Last updated 11 months ago.
7.2 match 10.82 score 10k scripts 54 dependentsbioc
ROSeq:Modeling expression ranks for noise-tolerant differential expression analysis of scRNA-Seq data
ROSeq - A rank based approach to modeling gene expression with filtered and normalized read count matrix. ROSeq takes filtered and normalized read matrix and cell-annotation/condition as input and determines the differentially expressed genes between the contrasting groups of single cells. One of the input parameters is the number of cores to be used.
Maintained by Krishan Gupta. Last updated 5 months ago.
geneexpressiondifferentialexpressionsinglecellcount-datagene-expressiongene-expression-profilesnormalizationpopulationsranktmmtungtung-datasettutorialvignette
17.5 match 2 stars 4.34 score 11 scriptsbioc
RcisTarget:RcisTarget Identify transcription factor binding motifs enriched on a list of genes or genomic regions
RcisTarget identifies transcription factor binding motifs (TFBS) over-represented on a gene list. In a first step, RcisTarget selects DNA motifs that are significantly over-represented in the surroundings of the transcription start site (TSS) of the genes in the gene-set. This is achieved by using a database that contains genome-wide cross-species rankings for each motif. The motifs that are then annotated to TFs and those that have a high Normalized Enrichment Score (NES) are retained. Finally, for each motif and gene-set, RcisTarget predicts the candidate target genes (i.e. genes in the gene-set that are ranked above the leading edge).
Maintained by Gert Hulselmans. Last updated 5 months ago.
generegulationmotifannotationtranscriptomicstranscriptiongenesetenrichmentgenetarget
7.9 match 37 stars 9.47 score 191 scriptsjranke
mkin:Kinetic Evaluation of Chemical Degradation Data
Calculation routines based on the FOCUS Kinetics Report (2006, 2014). Includes a function for conveniently defining differential equation models, model solution based on eigenvalues if possible or using numerical solvers. If a C compiler (on windows: 'Rtools') is installed, differential equation models are solved using automatically generated C functions. Non-constant errors can be taken into account using variance by variable or two-component error models <doi:10.3390/environments6120124>. Hierarchical degradation models can be fitted using nonlinear mixed-effects model packages as a back end <doi:10.3390/environments8080071>. Please note that no warranty is implied for correctness of results or fitness for a particular purpose.
Maintained by Johannes Ranke. Last updated 30 days ago.
degradationfocus-kineticskinetic-modelskineticsodeode-model
9.0 match 11 stars 8.18 score 78 scripts 1 dependentsr-forge
TopKLists:Inference, Aggregation and Visualization for Top-K Ranked Lists
For multiple ranked input lists (full or partial) representing the same set of N objects, the package TopKLists offers (1) statistical inference on the lengths of informative top-k lists, (2) stochastic aggregation of full or partial lists, and (3) graphical tools for the statistical exploration of input lists, and for the visualization of aggregation results.
Maintained by Michael G. Schimek. Last updated 9 years ago.
18.3 match 4.05 score 37 scripts 1 dependentsbioc
scran:Methods for Single-Cell RNA-Seq Data Analysis
Implements miscellaneous functions for interpretation of single-cell RNA-seq data. Methods are provided for assignment of cell cycle phase, detection of highly variable and significantly correlated genes, identification of marker genes, and other common tasks in routine single-cell analysis workflows.
Maintained by Aaron Lun. Last updated 5 months ago.
immunooncologynormalizationsequencingrnaseqsoftwaregeneexpressiontranscriptomicssinglecellclusteringbioconductor-packagehuman-cell-atlassingle-cell-rna-seqopenblascpp
5.6 match 41 stars 13.14 score 7.6k scripts 36 dependentsbioc
limma:Linear Models for Microarray and Omics Data
Data analysis, linear models and differential expression for omics data.
Maintained by Gordon Smyth. Last updated 5 days ago.
exonarraygeneexpressiontranscriptionalternativesplicingdifferentialexpressiondifferentialsplicinggenesetenrichmentdataimportbayesianclusteringregressiontimecoursemicroarraymicrornaarraymrnamicroarrayonechannelproprietaryplatformstwochannelsequencingrnaseqbatcheffectmultiplecomparisonnormalizationpreprocessingqualitycontrolbiomedicalinformaticscellbiologycheminformaticsepigeneticsfunctionalgenomicsgeneticsimmunooncologymetabolomicsproteomicssystemsbiologytranscriptomics
5.3 match 13.81 score 16k scripts 585 dependentsguido-s
netmeta:Network Meta-Analysis using Frequentist Methods
A comprehensive set of functions providing frequentist methods for network meta-analysis (Balduzzi et al., 2023) <doi:10.18637/jss.v106.i02> and supporting Schwarzer et al. (2015) <doi:10.1007/978-3-319-21416-0>, Chapter 8 "Network Meta-Analysis": - frequentist network meta-analysis following Rücker (2012) <doi:10.1002/jrsm.1058>; - additive network meta-analysis for combinations of treatments (Rücker et al., 2020) <doi:10.1002/bimj.201800167>; - network meta-analysis of binary data using the Mantel-Haenszel or non-central hypergeometric distribution method (Efthimiou et al., 2019) <doi:10.1002/sim.8158>, or penalised logistic regression (Evrenoglou et al., 2022) <doi:10.1002/sim.9562>; - rankograms and ranking of treatments by the Surface under the cumulative ranking curve (SUCRA) (Salanti et al., 2013) <doi:10.1016/j.jclinepi.2010.03.016>; - ranking of treatments using P-scores (frequentist analogue of SUCRAs without resampling) according to Rücker & Schwarzer (2015) <doi:10.1186/s12874-015-0060-8>; - split direct and indirect evidence to check consistency (Dias et al., 2010) <doi:10.1002/sim.3767>, (Efthimiou et al., 2019) <doi:10.1002/sim.8158>; - league table with network meta-analysis results; - 'comparison-adjusted' funnel plot (Chaimani & Salanti, 2012) <doi:10.1002/jrsm.57>; - net heat plot and design-based decomposition of Cochran's Q according to Krahn et al. (2013) <doi:10.1186/1471-2288-13-35>; - measures characterizing the flow of evidence between two treatments by König et al. (2013) <doi:10.1002/sim.6001>; - automated drawing of network graphs described in Rücker & Schwarzer (2016) <doi:10.1002/jrsm.1143>; - partial order of treatment rankings ('poset') and Hasse diagram for 'poset' (Carlsen & Bruggemann, 2014) <doi:10.1002/cem.2569>; (Rücker & Schwarzer, 2017) <doi:10.1002/jrsm.1270>; - contribution matrix as described in Papakonstantinou et al. (2018) <doi:10.12688/f1000research.14770.3> and Davies et al. (2022) <doi:10.1002/sim.9346>; - subgroup network meta-analysis.
Maintained by Guido Schwarzer. Last updated 2 days ago.
meta-analysisnetwork-meta-analysisrstudio
6.2 match 33 stars 11.82 score 199 scripts 10 dependentsbioc
BiocGenerics:S4 generic functions used in Bioconductor
The package defines many S4 generic functions used in Bioconductor.
Maintained by Hervé Pagès. Last updated 1 months ago.
infrastructurebioconductor-packagecore-package
5.1 match 12 stars 14.22 score 612 scripts 2.2k dependentsfinyang
RRRR:Online Robust Reduced-Rank Regression Estimation
Methods for estimating online robust reduced-rank regression. The Gaussian maximum likelihood estimation method is described in Johansen, S. (1991) <doi:10.2307/2938278>. The majorisation-minimisation estimation method is partly described in Zhao, Z., & Palomar, D. P. (2017) <doi:10.1109/GlobalSIP.2017.8309093>. The description of the generic stochastic successive upper-bound minimisation method and the sample average approximation can be found in Razaviyayn, M., Sanjabi, M., & Luo, Z. Q. (2016) <doi:10.1007/s10107-016-1021-7>.
Maintained by Yangzhuoran Fin Yang. Last updated 2 years ago.
17.4 match 3 stars 4.18 score 10 scriptsabichat
yatah:Yet Another TAxonomy Handler
Provides functions to manage taxonomy when lineages are described with strings and ranks separated with special patterns like "|*__" or ";*__".
Maintained by Antoine Bichat. Last updated 11 months ago.
15.6 match 6 stars 4.65 score 15 scriptsffverse
ffsimulator:Simulate Fantasy Football Seasons
Uses bootstrap resampling to run fantasy football season simulations supported by historical rankings and 'nflfastR' data, calculating optimal lineups, and returning aggregated results.
Maintained by Tan Ho. Last updated 5 months ago.
13.7 match 17 stars 5.17 score 44 scriptscran
Rfit:Rank-Based Estimation for Linear Models
Rank-based (R) estimation and inference for linear models. Estimation is for general scores and a library of commonly used score functions is included.
Maintained by John Kloke. Last updated 10 months ago.
16.0 match 4.35 score 9 dependentshwborchers
pracma:Practical Numerical Math Functions
Provides a large number of functions from numerical analysis and linear algebra, numerical optimization, differential equations, time series, plus some well-known special mathematical functions. Uses 'MATLAB' function names where appropriate to simplify porting.
Maintained by Hans W. Borchers. Last updated 1 years ago.
5.6 match 29 stars 12.34 score 6.6k scripts 931 dependentsbioc
NetPathMiner:NetPathMiner for Biological Network Construction, Path Mining and Visualization
NetPathMiner is a general framework for network path mining using genome-scale networks. It constructs networks from KGML, SBML and BioPAX files, providing three network representations, metabolic, reaction and gene representations. NetPathMiner finds active paths and applies machine learning methods to summarize found paths for easy interpretation. It also provides static and interactive visualizations of networks and paths to aid manual investigation.
Maintained by Ahmed Mohamed. Last updated 4 months ago.
graphandnetworkpathwaysnetworkclusteringclassificationlibsbmllibxml2openblascpp
10.4 match 9 stars 6.56 score 9 scriptsrobinhankin
hyper2:The Hyperdirichlet Distribution, Mark 2
A suite of routines for the hyperdirichlet distribution and reified Bradley-Terry; supersedes the 'hyperdirichlet' package; uses 'disordR' discipline <doi:10.48550/ARXIV.2210.03856>. To cite in publications please use Hankin 2017 <doi:10.32614/rj-2017-061>, and for Generalized Plackett-Luce likelihoods use Hankin 2024 <doi:10.18637/jss.v109.i08>.
Maintained by Robin K. S. Hankin. Last updated 3 days ago.
11.3 match 5 stars 6.01 score 38 scripts 1 dependentscivilstat
RankingProject:The Ranking Project: Visualizations for Comparing Populations
Functions to generate plots and tables for comparing independently-sampled populations. Companion package to "A Primer on Visualizations for Comparing Populations, Including the Issue of Overlapping Confidence Intervals" by Wright, Klein, and Wieczorek (2019) <DOI:10.1080/00031305.2017.1392359> and "A Joint Confidence Region for an Overall Ranking of Populations" by Klein, Wright, and Wieczorek (2020) <DOI:10.1111/rssc.12402>.
Maintained by Jerzy Wieczorek. Last updated 3 years ago.
13.4 match 7 stars 5.02 score 10 scriptsr-forge
mlogit:Multinomial Logit Models
Maximum Likelihood estimation of random utility discrete choice models, as described in Kenneth Train (2009) Discrete Choice Methods with Simulations <doi:10.1017/CBO9780511805271>.
Maintained by Yves Croissant. Last updated 5 years ago.
6.9 match 9.81 score 1.2k scripts 14 dependentscmollica
PLMIX:Bayesian Analysis of Finite Mixture of Plackett-Luce Models
Fit finite mixtures of Plackett-Luce models for partial top rankings/orderings within the Bayesian framework. It provides MAP point estimates via EM algorithm and posterior MCMC simulations via Gibbs Sampling. It also fits MLE as a special case of the noninformative Bayesian analysis with vague priors. In addition to inferential techniques, the package assists other fundamental phases of a model-based analysis for partial rankings/orderings, by including functions for data manipulation, simulation, descriptive summary, model selection and goodness-of-fit evaluation. Main references on the methods are Mollica and Tardella (2017) <doi.org/10.1007/s11336-016-9530-0> and Mollica and Tardella (2014) <doi/10.1002/sim.6224>.
Maintained by Cristina Mollica. Last updated 4 years ago.
21.4 match 3.15 score 28 scriptsrivolli
utiml:Utilities for Multi-Label Learning
Multi-label learning strategies and others procedures to support multi- label classification in R. The package provides a set of multi-label procedures such as sampling methods, transformation strategies, threshold functions, pre-processing techniques and evaluation metrics. A complete overview of the matter can be seen in Zhang, M. and Zhou, Z. (2014) <doi:10.1109/TKDE.2013.39> and Gibaja, E. and Ventura, S. (2015) A Tutorial on Multi-label Learning.
Maintained by Adriano Rivolli. Last updated 4 years ago.
10.1 match 28 stars 6.39 score 87 scriptsmlverse
torch:Tensors and Neural Networks with 'GPU' Acceleration
Provides functionality to define and train neural networks similar to 'PyTorch' by Paszke et al (2019) <doi:10.48550/arXiv.1912.01703> but written entirely in R using the 'libtorch' library. Also supports low-level tensor operations and 'GPU' acceleration.
Maintained by Daniel Falbel. Last updated 5 days ago.
3.9 match 520 stars 16.52 score 1.4k scripts 38 dependentskjhealy
gssrdoc:Document General Social Survey Variable
The General Social Survey (GSS) is a long-running, mostly annual survey of US households. It is administered by the National Opinion Research Center (NORC). This package contains the a tibble with information on the survey variables, together with every variable documented as an R help page. For more information on the GSS see \url{http://gss.norc.org}.
Maintained by Kieran Healy. Last updated 11 months ago.
27.9 match 2.28 score 38 scriptslineupjs
lineupjs:'HTMLWidget' Wrapper of 'LineUp' for Visual Analysis of Multi-Attribute Rankings
'LineUp' is an interactive technique designed to create, visualize and explore rankings of items based on a set of heterogeneous attributes. This is a 'htmlwidget' wrapper around the JavaScript library 'LineUp.js'. It is designed to be used in 'R Shiny' apps and 'R Markddown' files. Due to an outdated 'webkit' version of 'RStudio' it won't work in the integrated viewer.
Maintained by Samuel Gratzl. Last updated 3 years ago.
crosstalkhtmlwidgethtmlwidgetslineuprankingshiny
13.3 match 55 stars 4.76 score 21 scriptscran
ICSNP:Tools for Multivariate Nonparametrics
Tools for multivariate nonparametrics, as location tests based on marginal ranks, spatial median and spatial signs computation, Hotelling's T-test, estimates of shape are implemented.
Maintained by Klaus Nordhausen. Last updated 1 years ago.
16.5 match 3.84 score 14 dependentsbioc
mixOmics:Omics Data Integration Project
Multivariate methods are well suited to large omics data sets where the number of variables (e.g. genes, proteins, metabolites) is much larger than the number of samples (patients, cells, mice). They have the appealing properties of reducing the dimension of the data by using instrumental variables (components), which are defined as combinations of all variables. Those components are then used to produce useful graphical outputs that enable better understanding of the relationships and correlation structures between the different data sets that are integrated. mixOmics offers a wide range of multivariate methods for the exploration and integration of biological datasets with a particular focus on variable selection. The package proposes several sparse multivariate models we have developed to identify the key variables that are highly correlated, and/or explain the biological outcome of interest. The data that can be analysed with mixOmics may come from high throughput sequencing technologies, such as omics data (transcriptomics, metabolomics, proteomics, metagenomics etc) but also beyond the realm of omics (e.g. spectral imaging). The methods implemented in mixOmics can also handle missing values without having to delete entire rows with missing data. A non exhaustive list of methods include variants of generalised Canonical Correlation Analysis, sparse Partial Least Squares and sparse Discriminant Analysis. Recently we implemented integrative methods to combine multiple data sets: N-integration with variants of Generalised Canonical Correlation Analysis and P-integration with variants of multi-group Partial Least Squares.
Maintained by Eva Hamrud. Last updated 3 days ago.
immunooncologymicroarraysequencingmetabolomicsmetagenomicsproteomicsgenepredictionmultiplecomparisonclassificationregressionbioconductorgenomicsgenomics-datagenomics-visualizationmultivariate-analysismultivariate-statisticsomicsr-pkgr-project
4.5 match 182 stars 13.71 score 1.3k scripts 22 dependentsr-forge
TailRank:The Tail-Rank Statistic
Implements the tail-rank statistic for selecting biomarkers from a microarray data set, an efficient nonparametric test focused on the distributional tails. See <https://gitlab.com/krcoombes/coombeslab/-/blob/master/doc/papers/tolstoy-new.pdf>.
Maintained by Kevin R. Coombes. Last updated 1 months ago.
9.5 match 6.38 score 37 scripts 3 dependentsubod
rococo:Robust Rank Correlation Coefficient and Test
Provides the robust gamma rank correlation coefficient as introduced by Bodenhofer, Krone, and Klawonn (2013) <DOI:10.1016/j.ins.2012.11.026> along with a permutation-based rank correlation test. The rank correlation coefficient and the test are explicitly designed for dealing with noisy numerical data.
Maintained by Ulrich Bodenhofer. Last updated 11 months ago.
14.1 match 4.32 score 21 scriptskurthornik
relations:Data Structures and Algorithms for Relations
Data structures and algorithms for k-ary relations with arbitrary domains, featuring relational algebra, predicate functions, and fitters for consensus relations.
Maintained by Kurt Hornik. Last updated 25 days ago.
12.0 match 5.04 score 58 scripts 14 dependentsaebilgrau
GMCM:Fast Estimation of Gaussian Mixture Copula Models
Unsupervised Clustering and Meta-analysis using Gaussian Mixture Copula Models.
Maintained by Anders Ellern Bilgrau. Last updated 3 years ago.
clusteringgaussian-mixture-modelsmeta-analysisrankunsupervised-cluster-analysisopenblascpp
13.0 match 15 stars 4.62 score 56 scriptsbioc
msImpute:Imputation of label-free mass spectrometry peptides
MsImpute is a package for imputation of peptide intensity in proteomics experiments. It additionally contains tools for MAR/MNAR diagnosis and assessment of distortions to the probability distribution of the data post imputation. The missing values are imputed by low-rank approximation of the underlying data matrix if they are MAR (method = "v2"), by Barycenter approach if missingness is MNAR ("v2-mnar"), or by Peptide Identity Propagation (PIP).
Maintained by Soroor Hediyeh-zadeh. Last updated 5 months ago.
massspectrometryproteomicssoftwarelabel-free-proteomicslow-rank-approximation
11.7 match 14 stars 5.15 score 7 scriptsmerck
gsDesign2:Group Sequential Design with Non-Constant Effect
The goal of 'gsDesign2' is to enable fixed or group sequential design under non-proportional hazards. To enable highly flexible enrollment, time-to-event and time-to-dropout assumptions, 'gsDesign2' offers piecewise constant enrollment, failure rates, and dropout rates for a stratified population. This package includes three methods for designs: average hazard ratio, weighted logrank tests in Yung and Liu (2019) <doi:10.1111/biom.13196>, and MaxCombo tests. Substantial flexibility on top of what is in the 'gsDesign' package is intended for selecting boundaries.
Maintained by Yujie Zhao. Last updated 2 days ago.
6.7 match 22 stars 8.91 score 186 scriptsludovikcoba
rrecsys:Environment for Evaluating Recommender Systems
Processes standard recommendation datasets (e.g., a user-item rating matrix) as input and generates rating predictions and lists of recommended items. Standard algorithm implementations which are included in this package are the following: Global/Item/User-Average baselines, Weighted Slope One, Item-Based KNN, User-Based KNN, FunkSVD, BPR and weighted ALS. They can be assessed according to the standard offline evaluation methodology (Shani, et al. (2011) <doi:10.1007/978-0-387-85820-3_8>) for recommender systems using measures such as MAE, RMSE, Precision, Recall, F1, AUC, NDCG, RankScore and coverage measures. The package (Coba, et al.(2017) <doi: 10.1007/978-3-319-60042-0_36>) is intended for rapid prototyping of recommendation algorithms and education purposes.
Maintained by Ludovik Çoba. Last updated 3 years ago.
8.7 match 23 stars 6.84 score 25 scriptsjranke
chemCal:Calibration Functions for Analytical Chemistry
Simple functions for plotting linear calibration functions and estimating standard errors for measurements according to the Handbook of Chemometrics and Qualimetrics: Part A by Massart et al. (1997) There are also functions estimating the limit of detection (LOD) and limit of quantification (LOQ). The functions work on model objects from - optionally weighted - linear regression (lm) or robust linear regression ('rlm' from the 'MASS' package).
Maintained by Johannes Ranke. Last updated 2 months ago.
9.1 match 6 stars 6.52 score 55 scriptstvganesh
yorkr:Analyze Cricket Performances Based on Data from Cricsheet
Analyzing performances of cricketers and cricket teams based on 'yaml' match data from Cricsheet <https://cricsheet.org/>.
Maintained by Tinniam V Ganesh. Last updated 2 years ago.
11.8 match 17 stars 5.00 score 118 scriptsthothorn
exactRankTests:Exact Distributions for Rank and Permutation Tests
Computes exact conditional p-values and quantiles using an implementation of the Shift-Algorithm by Streitberg & Roehmel.
Maintained by Torsten Hothorn. Last updated 3 years ago.
8.3 match 1 stars 7.13 score 276 scripts 65 dependentsbioc
Biostrings:Efficient manipulation of biological strings
Memory efficient string containers, string matching algorithms, and other utilities, for fast manipulation of large biological sequences or sets of sequences.
Maintained by Hervé Pagès. Last updated 23 days ago.
sequencematchingalignmentsequencinggeneticsdataimportdatarepresentationinfrastructurebioconductor-packagecore-package
3.3 match 61 stars 17.83 score 8.6k scripts 1.2k dependentsbioc
GenomicRanges:Representation and manipulation of genomic intervals
The ability to efficiently represent and manipulate genomic annotations and alignments is playing a central role when it comes to analyzing high-throughput sequencing data (a.k.a. NGS data). The GenomicRanges package defines general purpose containers for storing and manipulating genomic intervals and variables defined along a genome. More specialized containers for representing and manipulating short alignments against a reference genome, or a matrix-like summarization of an experiment, are defined in the GenomicAlignments and SummarizedExperiment packages, respectively. Both packages build on top of the GenomicRanges infrastructure.
Maintained by Hervé Pagès. Last updated 4 months ago.
geneticsinfrastructuredatarepresentationsequencingannotationgenomeannotationcoveragebioconductor-packagecore-package
3.3 match 44 stars 17.75 score 13k scripts 1.3k dependentseasystats
effectsize:Indices of Effect Size
Provide utilities to work with indices of effect size for a wide variety of models and hypothesis tests (see list of supported models using the function 'insight::supported_models()'), allowing computation of and conversion between indices such as Cohen's d, r, odds, etc. References: Ben-Shachar et al. (2020) <doi:10.21105/joss.02815>.
Maintained by Mattan S. Ben-Shachar. Last updated 1 months ago.
anovacohens-dcomputeconversioncorrelationeffect-sizeeffectsizehacktoberfesthedges-ginterpretationstandardizationstandardizedstatistics
3.5 match 344 stars 16.38 score 1.8k scripts 29 dependentsn8thangreen
BCEA:Bayesian Cost Effectiveness Analysis
Produces an economic evaluation of a sample of suitable variables of cost and effectiveness / utility for two or more interventions, e.g. from a Bayesian model in the form of MCMC simulations. This package computes the most cost-effective alternative and produces graphical summaries and probabilistic sensitivity analysis, see Baio et al (2017) <doi:10.1007/978-3-319-55718-2>.
Maintained by Gianluca Baio. Last updated 2 months ago.
5.8 match 3 stars 9.90 score 243 scripts 3 dependentsagroscope-ch
srppp:Read the Swiss Register of Plant Protection Products
Generate data objects from XML versions of the Swiss Register of Plant Protection Products. An online version of the register can be accessed at <https://www.psm.admin.ch/de/produkte>. There is no guarantee of correspondence of the data read in using this package with that online version, or with the original registration documents. Also, the Federal Food Safety and Veterinary Office, coordinating the authorisation of plant protection products in Switzerland, does not answer requests regarding this package.
Maintained by Johannes Ranke. Last updated 24 days ago.
9.0 match 6.31 score 20 scripts 1 dependentscmmr
rbiom:Read/Write, Analyze, and Visualize 'BIOM' Data
A toolkit for working with Biological Observation Matrix ('BIOM') files. Read/write all 'BIOM' formats. Compute rarefaction, alpha diversity, and beta diversity (including 'UniFrac'). Summarize counts by taxonomic level. Subset based on metadata. Generate visualizations and statistical analyses. CPU intensive operations are coded in C for speed.
Maintained by Daniel P. Smith. Last updated 6 days ago.
6.2 match 15 stars 9.02 score 117 scripts 6 dependentswenjie2wang
clusrank:Wilcoxon Rank Tests for Clustered Data
Non-parametric tests (Wilcoxon rank sum test and Wilcoxon signed rank test) for clustered data documented in Jiang et. al (2020) <doi:10.18637/jss.v096.i06>.
Maintained by Wenjie Wang. Last updated 1 years ago.
12.7 match 3 stars 4.39 score 17 scriptsschochastics
networkdata:Repository of Network Datasets
The package contains a large collection of network dataset with different context. This includes social networks, animal networks and movie networks. All datasets are in 'igraph' format.
Maintained by David Schoch. Last updated 12 months ago.
11.1 match 143 stars 5.01 score 143 scriptsbioc
SummarizedExperiment:A container (S4 class) for matrix-like assays
The SummarizedExperiment container contains one or more assays, each represented by a matrix-like object of numeric or other mode. The rows typically represent genomic ranges of interest and the columns represent samples.
Maintained by Hervé Pagès. Last updated 5 months ago.
geneticsinfrastructuresequencingannotationcoveragegenomeannotationbioconductor-packagecore-package
3.3 match 34 stars 16.85 score 8.6k scripts 1.2k dependentsgagolews
stringi:Fast and Portable Character String Processing Facilities
A collection of character string/text/natural language processing tools for pattern searching (e.g., with 'Java'-like regular expressions or the 'Unicode' collation algorithm), random string generation, case mapping, string transliteration, concatenation, sorting, padding, wrapping, Unicode normalisation, date-time formatting and parsing, and many more. They are fast, consistent, convenient, and - thanks to 'ICU' (International Components for Unicode) - portable across all locales and platforms. Documentation about 'stringi' is provided via its website at <https://stringi.gagolewski.com/> and the paper by Gagolewski (2022, <doi:10.18637/jss.v103.i02>).
Maintained by Marek Gagolewski. Last updated 1 months ago.
icuicu4cnatural-language-processingnlpregexregexpstring-manipulationstringistringrtexttext-processingtidy-dataunicodecpp
3.0 match 309 stars 18.31 score 10k scripts 8.6k dependentsechasnovski
comperank:Ranking Methods for Competition Results
Compute ranking and rating based on competition results. Methods of different nature are implemented: with fixed Head-to-Head structure, with variable Head-to-Head structure and with iterative nature. All algorithms are taken from the book 'Who’s #1?: The science of rating and ranking' by Amy N. Langville and Carl D. Meyer (2012, ISBN:978-0-691-15422-0).
Maintained by Evgeni Chasnovski. Last updated 2 years ago.
9.7 match 24 stars 5.65 score 37 scriptscmollica
MSmix:Finite Mixtures of Mallows Models with Spearman Distance for Full and Partial Rankings
Fit and analysis of finite Mixtures of Mallows models with Spearman Distance for full and partial rankings with arbitrary missing positions. Inference is conducted within the maximum likelihood framework via Expectation-Maximization algorithms. Estimation uncertainty is tackled via diverse versions of bootstrapping as well as via Hessian-based standard errors calculations. The most relevant reference of the methods is Crispino, Mollica, Astuti and Tardella (2023) <doi:10.1007/s11222-023-10266-8>.
Maintained by Cristina Mollica. Last updated 9 months ago.
36.9 match 1.48 scoretim-tu
weibulltools:Statistical Methods for Life Data Analysis
Provides statistical methods and visualizations that are often used in reliability engineering. Comprises a compact and easily accessible set of methods and visualization tools that make the examination and adjustment as well as the analysis and interpretation of field data (and bench tests) as simple as possible. Non-parametric estimators like Median Ranks, Kaplan-Meier (Abernethy, 2006, <ISBN:978-0-9653062-3-2>), Johnson (Johnson, 1964, <ISBN:978-0444403223>), and Nelson-Aalen for failure probability estimation within samples that contain failures as well as censored data are included. The package supports methods like Maximum Likelihood and Rank Regression, (Genschel and Meeker, 2010, <DOI:10.1080/08982112.2010.503447>) for the estimation of multiple parametric lifetime distributions, as well as the computation of confidence intervals of quantiles and probabilities using the delta method related to Fisher's confidence intervals (Meeker and Escobar, 1998, <ISBN:9780471673279>) and the beta-binomial confidence bounds. If desired, mixture model analysis can be done with segmented regression and the EM algorithm. Besides the well-known Weibull analysis, the package also contains Monte Carlo methods for the correction and completion of imprecisely recorded or unknown lifetime characteristics. (Verband der Automobilindustrie e.V. (VDA), 2016, <ISSN:0943-9412>). Plots are created statically ('ggplot2') or interactively ('plotly') and can be customized with functions of the respective visualization package. The graphical technique of probability plotting as well as the addition of regression lines and confidence bounds to existing plots are supported.
Maintained by Tim-Gunnar Hensel. Last updated 2 years ago.
field-data-analysisinteractive-visualizationsplotlyreliability-analysisweibull-analysisweibulltoolsopenblascpp
8.8 match 13 stars 6.15 score 54 scriptsbioc
S4Vectors:Foundation of vector-like and list-like containers in Bioconductor
The S4Vectors package defines the Vector and List virtual classes and a set of generic functions that extend the semantic of ordinary vectors and lists in R. Package developers can easily implement vector-like or list-like objects as concrete subclasses of Vector or List. In addition, a few low-level concrete subclasses of general interest (e.g. DataFrame, Rle, Factor, and Hits) are implemented in the S4Vectors package itself (many more are implemented in the IRanges package and in other Bioconductor infrastructure packages).
Maintained by Hervé Pagès. Last updated 1 months ago.
infrastructuredatarepresentationbioconductor-packagecore-package
3.3 match 18 stars 16.05 score 1.0k scripts 1.9k dependentsrdatatable
data.table:Extension of `data.frame`
Fast aggregation of large data (e.g. 100GB in RAM), fast ordered joins, fast add/modify/delete of columns by group using no copies at all, list columns, friendly and fast character-separated-value read/write. Offers a natural and flexible syntax, for faster development.
Maintained by Tyson Barrett. Last updated 2 days ago.
2.3 match 3.7k stars 23.53 score 230k scripts 4.6k dependentsknausb
vcfR:Manipulate and Visualize VCF Data
Facilitates easy manipulation of variant call format (VCF) data. Functions are provided to rapidly read from and write to VCF files. Once VCF data is read into R a parser function extracts matrices of data. This information can then be used for quality control or other purposes. Additional functions provide visualization of genomic data. Once processing is complete data may be written to a VCF file (*.vcf.gz). It also may be converted into other popular R objects (e.g., genlight, DNAbin). VcfR provides a link between VCF data and familiar R software.
Maintained by Brian J. Knaus. Last updated 22 days ago.
genomicspopulation-geneticspopulation-genomicsrcppvcf-datavisualizationzlibcpp
3.9 match 254 stars 13.59 score 3.1k scripts 19 dependentssdctools
sdcMicro:Statistical Disclosure Control Methods for Anonymization of Data and Risk Estimation
Data from statistical agencies and other institutions are mostly confidential. This package, introduced in Templ, Kowarik and Meindl (2017) <doi:10.18637/jss.v067.i04>, can be used for the generation of anonymized (micro)data, i.e. for the creation of public- and scientific-use files. The theoretical basis for the methods implemented can be found in Templ (2017) <doi:10.1007/978-3-319-50272-4>. Various risk estimation and anonymization methods are included. Note that the package includes a graphical user interface published in Meindl and Templ (2019) <doi:10.3390/a12090191> that allows to use various methods of this package.
Maintained by Matthias Templ. Last updated 26 days ago.
5.3 match 83 stars 9.89 score 258 scriptsbioc
rhdf5:R Interface to HDF5
This package provides an interface between HDF5 and R. HDF5's main features are the ability to store and access very large and/or complex datasets and a wide variety of metadata on mass storage (disk) through a completely portable file format. The rhdf5 package is thus suited for the exchange of large and/or complex datasets between R and other software package, and for letting R applications work on datasets that are larger than the available RAM.
Maintained by Mike Smith. Last updated 2 months ago.
infrastructuredataimporthdf5rhdf5opensslcurlzlibcpp
3.3 match 62 stars 15.93 score 4.2k scripts 232 dependentscran
timeSeries:Financial Time Series Objects (Rmetrics)
'S4' classes and various tools for financial time series: Basic functions such as scaling and sorting, subsetting, mathematical operations and statistical functions.
Maintained by Georgi N. Boshnakov. Last updated 6 months ago.
5.2 match 2 stars 9.90 score 1.3k scripts 145 dependentsstanlazic
desiR:Desirability Functions for Ranking, Selecting, and Integrating Data
Functions for (1) ranking, selecting, and prioritising genes, proteins, and metabolites from high dimensional biology experiments, (2) multivariate hit calling in high content screens, and (3) combining data from diverse sources.
Maintained by Stanley E. Lazic. Last updated 4 years ago.
11.8 match 3 stars 4.26 score 12 scriptssherrillmix
taxonomizr:Functions to Work with NCBI Accessions and Taxonomy
Functions for assigning taxonomy to NCBI accession numbers and taxon IDs based on NCBI's accession2taxid and taxdump files. This package allows the user to download NCBI data dumps and create a local database for fast and local taxonomic assignment.
Maintained by Scott Sherrill-Mix. Last updated 3 days ago.
5.6 match 72 stars 8.85 score 255 scripts 2 dependentsbioc
TREG:Tools for finding Total RNA Expression Genes in single nucleus RNA-seq data
RNA abundance and cell size parameters could improve RNA-seq deconvolution algorithms to more accurately estimate cell type proportions given the different cell type transcription activity levels. A Total RNA Expression Gene (TREG) can facilitate estimating total RNA content using single molecule fluorescent in situ hybridization (smFISH). We developed a data-driven approach using a measure of expression invariance to find candidate TREGs in postmortem human brain single nucleus RNA-seq. This R package implements the method for identifying candidate TREGs from snRNA-seq data.
Maintained by Louise Huuki-Myers. Last updated 3 months ago.
softwaresinglecellrnaseqgeneexpressiontranscriptomicstranscriptionsequencingbioconductordeconvolutionrnascopescrna-seqsmfishsnrna-seqtreg
9.5 match 4 stars 5.20 score 5 scriptsr-forge
robustbase:Basic Robust Statistics
"Essential" Robust Statistics. Tools allowing to analyze data with robust methods. This includes regression methodology including model selections and multivariate statistics where we strive to cover the book "Robust Statistics, Theory and Methods" by 'Maronna, Martin and Yohai'; Wiley 2006.
Maintained by Martin Maechler. Last updated 4 months ago.
3.7 match 13.33 score 1.7k scripts 480 dependentsklauschn
SpatialNP:Multivariate Nonparametric Methods Based on Spatial Signs and Ranks
Test and estimates of location, tests of independence, tests of sphericity and several estimates of shape all based on spatial signs, symmetrized signs, ranks and signed ranks. For details, see Oja and Randles (2004) <doi:10.1214/088342304000000558> and Oja (2010) <doi:10.1007/978-1-4419-0468-3>.
Maintained by Klaus Nordhausen. Last updated 3 years ago.
22.6 match 2.18 score 10 scripts 5 dependentspweidemueller
fullRankMatrix:Generation of Full Rank Design Matrix
Creates a full rank matrix out of a given matrix. The intended use is for one-hot encoded design matrices that should be used in linear models to ensure that significant associations can be correctly interpreted. However, 'fullRankMatrix' can be applied to any matrix to make it full rank. It removes columns with only 0's, merges duplicated columns and discovers linearly dependent columns and replaces them with linearly independent columns that span the space of the original columns. Columns are renamed to reflect those modifications. This results in a full rank matrix that can be used as a design matrix in linear models. The algorithm and some functions are inspired by Kuhn, M. (2008) <doi:10.18637/jss.v028.i05>.
Maintained by Paula Weidemueller. Last updated 9 months ago.
8.7 match 14 stars 5.62 score 6 scriptssportsdataverse
hoopR:Access Men's Basketball Play by Play Data
A utility to quickly obtain clean and tidy men's basketball play by play data. Provides functions to access live play by play and box score data from ESPN<https://www.espn.com> with shot locations when available. It is also a full NBA Stats API<https://www.nba.com/stats/> wrapper. It is also a scraping and aggregating interface for Ken Pomeroy's men's college basketball statistics website<https://kenpom.com>. It provides users with an active subscription the capability to scrape the website tables and analyze the data for themselves.
Maintained by Saiem Gilani. Last updated 1 years ago.
basketballcollege-basketballespnkenpomnbanba-analyticsnba-apinba-datanba-statisticsnba-statsnba-stats-apincaancaa-basketballncaa-bracketncaa-playersncaa-ratingsncaamsportsdataverse
7.0 match 91 stars 6.93 score 261 scriptsgertvv
gemtc:Network Meta-Analysis Using Bayesian Methods
Network meta-analyses (mixed treatment comparisons) in the Bayesian framework using JAGS. Includes methods to assess heterogeneity and inconsistency, and a number of standard visualizations.
Maintained by Gert van Valkenhoef. Last updated 5 years ago.
6.5 match 44 stars 7.48 score 71 scripts 1 dependentsjtfeld
EloOptimized:Optimized Elo Rating Method for Obtaining Dominance Ranks
Provides an implementation of the maximum likelihood methods for deriving Elo scores as published in Foerster, Franz et al. (2016) <DOI:10.1038/srep35404>.
Maintained by Joseph Feldblum. Last updated 10 months ago.
eloelo-ratingmaximum-likelihood-estimation
11.5 match 3 stars 4.18 score 9 scriptshosseinazari
StatRank:Statistical Rank Aggregation: Inference, Evaluation, and Visualization
A set of methods to implement Generalized Method of Moments and Maximal Likelihood methods for Random Utility Models. These methods are meant to provide inference on rank comparison data. These methods accept full, partial, and pairwise rankings, and provides methods to break down full or partial rankings into their pairwise components. Please see Generalized Method-of-Moments for Rank Aggregation from NIPS 2013 for a description of some of our methods.
Maintained by Hossein Azari Soufiani. Last updated 10 years ago.
14.8 match 3.24 score 58 scripts 2 dependentspetersonr
sparseR:Variable Selection under Ranked Sparsity Principles for Interactions and Polynomials
An implementation of ranked sparsity methods, including penalized regression methods such as the sparsity-ranked lasso, its non-convex alternatives, and elastic net, as well as the sparsity-ranked Bayesian Information Criterion. As described in Peterson and Cavanaugh (2022) <doi:10.1007/s10182-021-00431-7>, ranked sparsity is a philosophy with methods primarily useful for variable selection in the presence of prior informational asymmetry, which occurs in the context of trying to perform variable selection in the presence of interactions and/or polynomials. Ultimately, this package attempts to facilitate dealing with cumbersome interactions and polynomials while not avoiding them entirely. Typically, models selected under ranked sparsity principles will also be more transparent, having fewer falsely selected interactions and polynomials than other methods.
Maintained by Ryan Andrew Peterson. Last updated 8 months ago.
9.0 match 6 stars 5.23 score 14 scriptsspholmes
XICOR:Association Measurement Through Cross Rank Increments
Computes robust association measures that do not presuppose linearity. The xi correlation (xicor) is based on cross correlation between ranked increments. The reference for the methods implemented here is Chatterjee, Sourav (2020) <arXiv:1909.10140> This package includes the Galton peas example.
Maintained by Susan Holmes. Last updated 3 months ago.
10.3 match 1 stars 4.56 score 61 scripts 2 dependentstransbiozi
RMTL:Regularized Multi-Task Learning
Efficient solvers for 10 regularized multi-task learning algorithms applicable for regression, classification, joint feature selection, task clustering, low-rank learning, sparse learning and network incorporation. Based on the accelerated gradient descent method, the algorithms feature a state-of-art computational complexity O(1/k^2). Sparse model structure is induced by the solving the proximal operator. The detail of the package is described in the paper of Han Cao and Emanuel Schwarz (2018) <doi:10.1093/bioinformatics/bty831>.
Maintained by Han Cao. Last updated 6 years ago.
low-rank-representaionmulti-task-learningregularizationsparse-coding
8.4 match 19 stars 5.60 score 21 scriptsegeulgen
driveR:Prioritizing Cancer Driver Genes Using Genomics Data
Cancer genomes contain large numbers of somatic alterations but few genes drive tumor development. Identifying cancer driver genes is critical for precision oncology. Most of current approaches either identify driver genes based on mutational recurrence or using estimated scores predicting the functional consequences of mutations. 'driveR' is a tool for personalized or batch analysis of genomic data for driver gene prioritization by combining genomic information and prior biological knowledge. As features, 'driveR' uses coding impact metaprediction scores, non-coding impact scores, somatic copy number alteration scores, hotspot gene/double-hit gene condition, 'phenolyzer' gene scores and memberships to cancer-related KEGG pathways. It uses these features to estimate cancer-type-specific probability for each gene of being a cancer driver using the related task of a multi-task learning classification model. The method is described in detail in Ulgen E, Sezerman OU. 2021. driveR: driveR: a novel method for prioritizing cancer driver genes using somatic genomics data. BMC Bioinformatics <doi:10.1186/s12859-021-04203-7>.
Maintained by Ege Ulgen. Last updated 2 years ago.
cancer-drivernessdriverdriver-gene-prioritizationidentify-driver-genesranking-genesscoring
7.5 match 15 stars 6.29 score 260 scriptsbioc
GSEABenchmarkeR:Reproducible GSEA Benchmarking
The GSEABenchmarkeR package implements an extendable framework for reproducible evaluation of set- and network-based methods for enrichment analysis of gene expression data. This includes support for the efficient execution of these methods on comprehensive real data compendia (microarray and RNA-seq) using parallel computation on standard workstations and institutional computer grids. Methods can then be assessed with respect to runtime, statistical significance, and relevance of the results for the phenotypes investigated.
Maintained by Ludwig Geistlinger. Last updated 5 months ago.
immunooncologymicroarrayrnaseqgeneexpressiondifferentialexpressionpathwaysgraphandnetworknetworkgenesetenrichmentnetworkenrichmentvisualizationreportwritingbioconductor-packageu24ca289073
7.2 match 13 stars 6.55 score 23 scriptsropensci
phylotaR:Automated Phylogenetic Sequence Cluster Identification from 'GenBank'
A pipeline for the identification, within taxonomic groups, of orthologous sequence clusters from 'GenBank' <https://www.ncbi.nlm.nih.gov/genbank/> as the first step in a phylogenetic analysis. The pipeline depends on a local alignment search tool and is, therefore, not dependent on differences in gene naming conventions and naming errors.
Maintained by Shixiang Wang. Last updated 8 months ago.
blastngenbankpeer-reviewedphylogeneticssequence-alignment
8.0 match 23 stars 5.86 score 156 scriptsinsightsengineering
cardx:Extra Analysis Results Data Utilities
Create extra Analysis Results Data (ARD) summary objects. The package supplements the simple ARD functions from the 'cards' package, exporting functions to put statistical results in the ARD format. These objects are used and re-used to construct summary tables, visualizations, and written reports.
Maintained by Daniel D. Sjoberg. Last updated 19 days ago.
5.4 match 19 stars 8.46 score 50 scriptsehrlinger
ggRandomForests:Visually Exploring Random Forests
Graphic elements for exploring Random Forests using the 'randomForest' or 'randomForestSRC' package for survival, regression and classification forests and 'ggplot2' package plotting.
Maintained by John Ehrlinger. Last updated 5 days ago.
5.1 match 148 stars 8.94 score 197 scriptsbioc
ELMER:Inferring Regulatory Element Landscapes and Transcription Factor Networks Using Cancer Methylomes
ELMER is designed to use DNA methylation and gene expression from a large number of samples to infere regulatory element landscape and transcription factor network in primary tissue.
Maintained by Tiago Chedraoui Silva. Last updated 5 months ago.
dnamethylationgeneexpressionmotifannotationsoftwaregeneregulationtranscriptionnetwork
6.1 match 7.42 score 176 scriptsyongkai17
spFSR:Feature Selection and Ranking by Simultaneous Perturbation Stochastic Approximation
An implementation of feature selection and ranking via simultaneous perturbation stochastic approximation (SPSA-FSR) based on works by V. Aksakalli and M. Malekipirbazari (2015) <arXiv:1508.07630> and Zeren D. Yenice and et al. (2018) <arXiv:1804.05589>. The SPSA-FSR algorithm searches for a locally optimal set of features that yield the best predictive performance using a specified error measure such as mean squared error (for regression problems) and accuracy rate (for classification problems). This package requires an object of class 'task' and an object of class 'Learner' from the 'mlr' package.
Maintained by Vural Aksakalli. Last updated 6 years ago.
11.3 match 2 stars 4.00 score 8 scriptsbioc
GenomicAlignments:Representation and manipulation of short genomic alignments
Provides efficient containers for storing and manipulating short genomic alignments (typically obtained by aligning short reads to a reference genome). This includes read counting, computing the coverage, junction detection, and working with the nucleotide content of the alignments.
Maintained by Hervé Pagès. Last updated 5 months ago.
infrastructuredataimportgeneticssequencingrnaseqsnpcoveragealignmentimmunooncologybioconductor-packagecore-package
3.3 match 10 stars 13.61 score 3.1k scripts 529 dependentsbioc
AUCell:AUCell: Analysis of 'gene set' activity in single-cell RNA-seq data (e.g. identify cells with specific gene signatures)
AUCell allows to identify cells with active gene sets (e.g. signatures, gene modules...) in single-cell RNA-seq data. AUCell uses the "Area Under the Curve" (AUC) to calculate whether a critical subset of the input gene set is enriched within the expressed genes for each cell. The distribution of AUC scores across all the cells allows exploring the relative expression of the signature. Since the scoring method is ranking-based, AUCell is independent of the gene expression units and the normalization procedure. In addition, since the cells are evaluated individually, it can easily be applied to bigger datasets, subsetting the expression matrix if needed.
Maintained by Gert Hulselmans. Last updated 5 months ago.
singlecellgenesetenrichmenttranscriptomicstranscriptiongeneexpressionworkflowstepnormalization
5.2 match 8.59 score 860 scripts 4 dependentsmikejseo
bnma:Bayesian Network Meta-Analysis using 'JAGS'
Network meta-analyses using Bayesian framework following Dias et al. (2013) <DOI:10.1177/0272989X12458724>. Based on the data input, creates prior, model file, and initial values needed to run models in 'rjags'. Able to handle binomial, normal and multinomial arm-level data. Can handle multi-arm trials and includes methods to incorporate covariate and baseline risk effects. Includes standard diagnostics and visualization tools to evaluate the results.
Maintained by Michael Seo. Last updated 1 years ago.
9.7 match 7 stars 4.54 score 7 scriptsropensci
taxa:Classes for Storing and Manipulating Taxonomic Data
Provides classes for storing and manipulating taxonomic data. Most of the classes can be treated like base R vectors (e.g. can be used in tables as columns and can be named). Vectorized classes can store taxon names and authorities, taxon IDs from databases, taxon ranks, and other types of information. More complex classes are provided to store taxonomic trees and user-defined data associated with them.
Maintained by Zachary Foster. Last updated 1 years ago.
taxonomybiologyhierarchydata-cleaningtaxon
6.4 match 48 stars 6.80 score 217 scriptsklainfo
ScottKnottESD:The Non-Parametric Scott-Knott Effect Size Difference (ESD) Test
The Non-Parametric Scott-Knott Effect Size Difference (ESD) test is a mean comparison approach that leverages a hierarchical clustering to partition the set of treatment means (e.g., means of variable importance scores, means of model performance) into statistically distinct groups with non-negligible difference [Tantithamthavorn et al., (2018) <doi:10.1109/TSE.2018.2794977>].
Maintained by Chakkrit Tantithamthavorn. Last updated 2 years ago.
defect-prediction-modelseffect-sizemultiple-comparisonsranking-algorithmscott-knottstatistical-tests
7.5 match 43 stars 5.77 score 68 scriptsrobjhyndman
rcademy:Tools to assist with academic promotions
Ideas and tools to help with preparing documentation for promotions at universities.
Maintained by Rob Hyndman. Last updated 6 months ago.
10.2 match 14 stars 4.23 score 9 scriptsbioc
target:Predict Combined Function of Transcription Factors
Implement the BETA algorithm for infering direct target genes from DNA-binding and perturbation expression data Wang et al. (2013) <doi: 10.1038/nprot.2013.150>. Extend the algorithm to predict the combined function of two DNA-binding elements from comprable binding and expression data.
Maintained by Mahmoud Ahmed. Last updated 5 months ago.
softwarestatisticalmethodtranscriptionalgorithmchip-seqdna-bindinggene-regulationtranscription-factors
5.5 match 4 stars 7.79 score 1.3k scriptsmikejareds
hermiter:Efficient Sequential and Batch Estimation of Univariate and Bivariate Probability Density Functions and Cumulative Distribution Functions along with Quantiles (Univariate) and Nonparametric Correlation (Bivariate)
Facilitates estimation of full univariate and bivariate probability density functions and cumulative distribution functions along with full quantile functions (univariate) and nonparametric correlation (bivariate) using Hermite series based estimators. These estimators are particularly useful in the sequential setting (both stationary and non-stationary) and one-pass batch estimation setting for large data sets. Based on: Stephanou, Michael, Varughese, Melvin and Macdonald, Iain. "Sequential quantiles via Hermite series density estimation." Electronic Journal of Statistics 11.1 (2017): 570-607 <doi:10.1214/17-EJS1245>, Stephanou, Michael and Varughese, Melvin. "On the properties of Hermite series based distribution function estimators." Metrika (2020) <doi:10.1007/s00184-020-00785-z> and Stephanou, Michael and Varughese, Melvin. "Sequential estimation of Spearman rank correlation using Hermite series estimators." Journal of Multivariate Analysis (2021) <doi:10.1016/j.jmva.2021.104783>.
Maintained by Michael Stephanou. Last updated 7 months ago.
cumulative-distribution-functionkendall-correlation-coefficientonline-algorithmsprobability-density-functionquantilespearman-correlation-coefficientstatisticsstreaming-algorithmsstreaming-datacpp
7.7 match 15 stars 5.58 score 17 scriptsr-lib
vctrs:Vector Helpers
Defines new notions of prototype and size that are used to provide tools for consistent and well-founded type-coercion and size-recycling, and are in turn connected to ideas of type- and size-stability useful for analysing function interfaces.
Maintained by Davis Vaughan. Last updated 5 months ago.
2.3 match 290 stars 18.97 score 1.1k scripts 13k dependentswanchanglin
mt:Metabolomics Data Analysis Toolbox
Functions for metabolomics data analysis: data preprocessing, orthogonal signal correction, PCA analysis, PCA-DA analysis, PLS-DA analysis, classification, feature selection, correlation analysis, data visualisation and re-sampling strategies.
Maintained by Wanchang Lin. Last updated 1 years ago.
9.2 match 3 stars 4.57 score 50 scriptsthothorn
maxstat:Maximally Selected Rank Statistics
Maximally selected rank statistics with several p-value approximations.
Maintained by Torsten Hothorn. Last updated 8 years ago.
5.6 match 1 stars 7.58 score 107 scripts 56 dependentsfk83
scoringRules:Scoring Rules for Parametric and Simulated Distribution Forecasts
Dictionary-like reference for computing scoring rules in a wide range of situations. Covers both parametric forecast distributions (such as mixtures of Gaussians) and distributions generated via simulation. Further details can be found in the package vignettes <doi:10.18637/jss.v090.i12>, <doi:10.18637/jss.v110.i08>.
Maintained by Fabian Krueger. Last updated 6 months ago.
3.7 match 59 stars 11.33 score 408 scripts 13 dependentsepiforecasts
scoringutils:Utilities for Scoring and Assessing Predictions
Facilitate the evaluation of forecasts in a convenient framework based on data.table. It allows user to to check their forecasts and diagnose issues, to visualise forecasts and missing data, to transform data before scoring, to handle missing forecasts, to aggregate scores, and to visualise the results of the evaluation. The package mostly focuses on the evaluation of probabilistic forecasts and allows evaluating several different forecast types and input formats. Find more information about the package in the Vignettes as well as in the accompanying paper, <doi:10.48550/arXiv.2205.07090>.
Maintained by Nikos Bosse. Last updated 13 days ago.
forecast-evaluationforecasting
3.7 match 52 stars 11.37 score 326 scripts 7 dependentsbioc
siggenes:Multiple Testing using SAM and Efron's Empirical Bayes Approaches
Identification of differentially expressed genes and estimation of the False Discovery Rate (FDR) using both the Significance Analysis of Microarrays (SAM) and the Empirical Bayes Analyses of Microarrays (EBAM).
Maintained by Holger Schwender. Last updated 5 months ago.
multiplecomparisonmicroarraygeneexpressionsnpexonarraydifferentialexpression
5.3 match 7.86 score 74 scripts 33 dependentsnliulab
AutoScore:An Interpretable Machine Learning-Based Automatic Clinical Score Generator
A novel interpretable machine learning-based framework to automate the development of a clinical scoring model for predefined outcomes. Our novel framework consists of six modules: variable ranking with machine learning, variable transformation, score derivation, model selection, domain knowledge-based score fine-tuning, and performance evaluation.The details are described in our research paper<doi:10.2196/21798>. Users or clinicians could seamlessly generate parsimonious sparse-score risk models (i.e., risk scores), which can be easily implemented and validated in clinical practice. We hope to see its application in various medical case studies.
Maintained by Feng Xie. Last updated 14 days ago.
5.4 match 32 stars 7.70 score 30 scriptsbioc
ORFik:Open Reading Frames in Genomics
R package for analysis of transcript and translation features through manipulation of sequence data and NGS data like Ribo-Seq, RNA-Seq, TCP-Seq and CAGE. It is generalized in the sense that any transcript region can be analysed, as the name hints to it was made with investigation of ribosomal patterns over Open Reading Frames (ORFs) as it's primary use case. ORFik is extremely fast through use of C++, data.table and GenomicRanges. Package allows to reassign starts of the transcripts with the use of CAGE-Seq data, automatic shifting of RiboSeq reads, finding of Open Reading Frames for whole genomes and much more.
Maintained by Haakon Tjeldnes. Last updated 27 days ago.
immunooncologysoftwaresequencingriboseqrnaseqfunctionalgenomicscoveragealignmentdataimportcpp
3.9 match 33 stars 10.63 score 115 scripts 2 dependentsgertvv
smaa:Stochastic Multi-Criteria Acceptability Analysis
Implementation of the Stochastic Multi-Criteria Acceptability Analysis (SMAA) family of Multiple Criteria Decision Analysis (MCDA) methods. Tervonen, T. and Figueira, J. R. (2008) <doi:10.1002/mcda.407>.
Maintained by Gert van Valkenhoef. Last updated 2 years ago.
8.5 match 10 stars 4.84 score 23 scripts 1 dependentsbrianaronson
birankr:Ranking Nodes in Bipartite and Weighted Networks
Highly efficient functions for estimating various rank (centrality) measures of nodes in bipartite graphs (two-mode networks). Includes methods for estimating HITS, CoHITS, BGRM, and BiRank with implementation primarily inspired by He et al. (2016) <doi:10.1109/TKDE.2016.2611584>. Also provides easy-to-use tools for efficiently estimating PageRank in one-mode graphs, incorporating or removing edge-weights during rank estimation, projecting two-mode graphs to one-mode, and for converting edgelists and matrices to sparseMatrix format. Best of all, the package's rank estimators can work directly with common formats of network data including edgelists (class data.frame, data.table, or tbl_df) and adjacency matrices (class matrix or dgCMatrix).
Maintained by Brian Aronson. Last updated 3 years ago.
8.4 match 37 stars 4.87 score 9 scriptshputter
mstate:Data Preparation, Estimation and Prediction in Multi-State Models
Contains functions for data preparation, descriptives, hazard estimation and prediction with Aalen-Johansen or simulation in competing risks and multi-state models, see Putter, Fiocco, Geskus (2007) <doi:10.1002/sim.2712>.
Maintained by Hein Putter. Last updated 28 days ago.
3.4 match 11 stars 12.13 score 322 scripts 55 dependentswelch-lab
cytosignal:What the Package Does (One Line, Title Case)
What the package does (one paragraph).
Maintained by Jialin Liu. Last updated 6 days ago.
6.8 match 16 stars 5.95 score 6 scriptsncss-tech
aqp:Algorithms for Quantitative Pedology
The Algorithms for Quantitative Pedology (AQP) project was started in 2009 to organize a loosely-related set of concepts and source code on the topic of soil profile visualization, aggregation, and classification into this package (aqp). Over the past 8 years, the project has grown into a suite of related R packages that enhance and simplify the quantitative analysis of soil profile data. Central to the AQP project is a new vocabulary of specialized functions and data structures that can accommodate the inherent complexity of soil profile information; freeing the scientist to focus on ideas rather than boilerplate data processing tasks <doi:10.1016/j.cageo.2012.10.020>. These functions and data structures have been extensively tested and documented, applied to projects involving hundreds of thousands of soil profiles, and deeply integrated into widely used tools such as SoilWeb <https://casoilresource.lawr.ucdavis.edu/soilweb-apps>. Components of the AQP project (aqp, soilDB, sharpshootR, soilReports packages) serve an important role in routine data analysis within the USDA-NRCS Soil Science Division. The AQP suite of R packages offer a convenient platform for bridging the gap between pedometric theory and practice.
Maintained by Dylan Beaudette. Last updated 28 days ago.
digital-soil-mappingncss-technrcspedologypedometricssoilsoil-surveyusda
3.4 match 55 stars 11.77 score 1.2k scripts 2 dependentsmaiermarco
prefmod:Utilities to Fit Paired Comparison Models for Preferences
Generates design matrix for analysing real paired comparisons and derived paired comparison data (Likert type items/ratings or rankings) using a loglinear approach. Fits loglinear Bradley-Terry model (LLBT) exploiting an eliminate feature. Computes pattern models for paired comparisons, rankings, and ratings. Some treatment of missing values (MCAR and MNAR). Fits latent class (mixture) models for paired comparison, rating and ranking patterns using a non-parametric ML approach.
Maintained by Marco Johannes Maier. Last updated 1 years ago.
16.5 match 2.41 score 53 scripts 1 dependentstidyverse
stringr:Simple, Consistent Wrappers for Common String Operations
A consistent, simple and easy to use set of wrappers around the fantastic 'stringi' package. All function and argument names (and positions) are consistent, all functions deal with "NA"'s and zero length vectors in the same way, and the output from one function is easy to feed into the input of another.
Maintained by Hadley Wickham. Last updated 7 months ago.
1.8 match 622 stars 21.97 score 164k scripts 8.2k dependentsf-caeiro
randtests:Testing Randomness in R
Provides several non parametric randomness tests for numeric sequences.
Maintained by Frederico Caeiro. Last updated 11 months ago.
9.9 match 4.00 score 235 scripts 3 dependentspmartr
pmartR:Panomics Marketplace - Quality Control and Statistical Analysis for Panomics Data
Provides functionality for quality control processing and statistical analysis of mass spectrometry (MS) omics data, in particular proteomic (either at the peptide or the protein level), lipidomic, and metabolomic data, as well as RNA-seq based count data and nuclear magnetic resonance (NMR) data. This includes data transformation, specification of groups that are to be compared against each other, filtering of features and/or samples, data normalization, data summarization (correlation, PCA), and statistical comparisons between defined groups. Implements methods described in: Webb-Robertson et al. (2014) <doi:10.1074/mcp.M113.030932>. Webb-Robertson et al. (2011) <doi:10.1002/pmic.201100078>. Matzke et al. (2011) <doi:10.1093/bioinformatics/btr479>. Matzke et al. (2013) <doi:10.1002/pmic.201200269>. Polpitiya et al. (2008) <doi:10.1093/bioinformatics/btn217>. Webb-Robertson et al. (2010) <doi:10.1021/pr1005247>.
Maintained by Lisa Bramer. Last updated 3 days ago.
data-summarizationlipidsmass-spectrometrymetabolitesmetabolomics-datapeptidesproteinsrna-seq-analysisopenblascpp
5.1 match 40 stars 7.69 score 144 scriptsbioc
PDATK:Pancreatic Ductal Adenocarcinoma Tool-Kit
Pancreatic ductal adenocarcinoma (PDA) has a relatively poor prognosis and is one of the most lethal cancers. Molecular classification of gene expression profiles holds the potential to identify meaningful subtypes which can inform therapeutic strategy in the clinical setting. The Pancreatic Cancer Adenocarcinoma Tool-Kit (PDATK) provides an S4 class-based interface for performing unsupervised subtype discovery, cross-cohort meta-clustering, gene-expression-based classification, and subsequent survival analysis to identify prognostically useful subtypes in pancreatic cancer and beyond. Two novel methods, Consensus Subtypes in Pancreatic Cancer (CSPC) and Pancreatic Cancer Overall Survival Predictor (PCOSP) are included for consensus-based meta-clustering and overall-survival prediction, respectively. Additionally, four published subtype classifiers and three published prognostic gene signatures are included to allow users to easily recreate published results, apply existing classifiers to new data, and benchmark the relative performance of new methods. The use of existing Bioconductor classes as input to all PDATK classes and methods enables integration with existing Bioconductor datasets, including the 21 pancreatic cancer patient cohorts available in the MetaGxPancreas data package. PDATK has been used to replicate results from Sandhu et al (2019) [https://doi.org/10.1200/cci.18.00102] and an additional paper is in the works using CSPC to validate subtypes from the included published classifiers, both of which use the data available in MetaGxPancreas. The inclusion of subtype centroids and prognostic gene signatures from these and other publications will enable researchers and clinicians to classify novel patient gene expression data, allowing the direct clinical application of the classifiers included in PDATK. Overall, PDATK provides a rich set of tools to identify and validate useful prognostic and molecular subtypes based on gene-expression data, benchmark new classifiers against existing ones, and apply discovered classifiers on novel patient data to inform clinical decision making.
Maintained by Benjamin Haibe-Kains. Last updated 5 months ago.
geneexpressionpharmacogeneticspharmacogenomicssoftwareclassificationsurvivalclusteringgeneprediction
9.1 match 1 stars 4.31 score 17 scriptshanettools
Perc:Using Percolation and Conductance to Find Information Flow Certainty in a Direct Network
To find the certainty of dominance interactions with indirect interactions being considered.
Maintained by Jessica Vandeleest. Last updated 4 years ago.
6.7 match 5.88 score 38 scriptstomasfryda
h2o:R Interface for the 'H2O' Scalable Machine Learning Platform
R interface for 'H2O', the scalable open source machine learning platform that offers parallelized implementations of many supervised and unsupervised machine learning algorithms such as Generalized Linear Models (GLM), Gradient Boosting Machines (including XGBoost), Random Forests, Deep Neural Networks (Deep Learning), Stacked Ensembles, Naive Bayes, Generalized Additive Models (GAM), ANOVA GLM, Cox Proportional Hazards, K-Means, PCA, ModelSelection, Word2Vec, as well as a fully automatic machine learning algorithm (H2O AutoML).
Maintained by Tomas Fryda. Last updated 1 years ago.
4.8 match 3 stars 8.20 score 7.8k scripts 11 dependentsvladimirholy
gasmodel:Generalized Autoregressive Score Models
Estimation, forecasting, and simulation of generalized autoregressive score (GAS) models of Creal, Koopman, and Lucas (2013) <doi:10.1002/jae.1279> and Harvey (2013) <doi:10.1017/cbo9781139540933>. Model specification allows for various data types and distributions, different parametrizations, exogenous variables, joint and separate modeling of exogenous variables and dynamics, higher score and autoregressive orders, custom and unconditional initial values of time-varying parameters, fixed and bounded values of coefficients, and missing values. Model estimation is performed by the maximum likelihood method.
Maintained by Vladimír Holý. Last updated 1 years ago.
7.2 match 14 stars 5.45 score 2 scriptsalbertocalore
MARMoT:Matching on Poset-Based Average Rank for Multiple Treatments (MARMoT)
It contains the function to apply MARMoT balancing technique discussed in: Silan, Boccuzzo, Arpino (2021) <DOI:10.1002/sim.9192>, Silan, Belloni, Boccuzzo, (2023) <DOI:10.1007/s10260-023-00695-0>; furthermore it contains a function for computing the Deloof's approximation of the average rank (and also a parallelized version) and a function to compute the Absolute Standardized Bias.
Maintained by Alberto Calore. Last updated 7 months ago.
asbaverage-rankposetpropensity-scorepropensity-scorestemplate-matching
14.4 match 2.70 score 6 scriptsalexanderjwhite
mixedLSR:Mixed, Low-Rank, and Sparse Multivariate Regression on High-Dimensional Data
Mixed, low-rank, and sparse multivariate regression ('mixedLSR') provides tools for performing mixture regression when the coefficient matrix is low-rank and sparse. 'mixedLSR' allows subgroup identification by alternating optimization with simulated annealing to encourage global optimum convergence. This method is data-adaptive, automatically performing parameter selection to identify low-rank substructures in the coefficient matrix.
Maintained by Alexander White. Last updated 2 years ago.
10.5 match 3.70 score 2 scriptsbioc
DESeq2:Differential gene expression analysis based on the negative binomial distribution
Estimate variance-mean dependence in count data from high-throughput sequencing assays and test for differential expression based on a model using the negative binomial distribution.
Maintained by Michael Love. Last updated 11 days ago.
sequencingrnaseqchipseqgeneexpressiontranscriptionnormalizationdifferentialexpressionbayesianregressionprincipalcomponentclusteringimmunooncologyopenblascpp
2.4 match 375 stars 16.11 score 17k scripts 115 dependentsbioc
GeneExpressionSignature:Gene Expression Signature based Similarity Metric
This package gives the implementations of the gene expression signature and its distance to each. Gene expression signature is represented as a list of genes whose expression is correlated with a biological state of interest. And its distance is defined using a nonparametric, rank-based pattern-matching strategy based on the Kolmogorov-Smirnov statistic. Gene expression signature and its distance can be used to detect similarities among the signatures of drugs, diseases, and biological states of interest.
Maintained by Yang Cao. Last updated 5 months ago.
7.8 match 1 stars 5.00 score 5 scriptszebang
tensorTS:Factor and Autoregressive Models for Tensor Time Series
Factor and autoregressive models for matrix and tensor valued time series. We provide functions for estimation, simulation and prediction. The models are discussed in Li et al (2021) <doi:10.48550/arXiv.2110.00928>, Chen et al (2020) <DOI:10.1080/01621459.2021.1912757>, Chen et al (2020) <DOI:10.1016/j.jeconom.2020.07.015>, and Xiao et al (2020) <doi:10.48550/arXiv.2006.02611>.
Maintained by Zebang Li. Last updated 13 days ago.
7.4 match 123 stars 5.27 score 4 scriptsr-forge
Matrix:Sparse and Dense Matrix Classes and Methods
A rich hierarchy of sparse and dense matrix classes, including general, symmetric, triangular, and diagonal matrices with numeric, logical, or pattern entries. Efficient methods for operating on such matrices, often wrapping the 'BLAS', 'LAPACK', and 'SuiteSparse' libraries.
Maintained by Martin Maechler. Last updated 6 days ago.
2.3 match 1 stars 17.23 score 33k scripts 12k dependentscovaruber
sommer:Solving Mixed Model Equations in R
Structural multivariate-univariate linear mixed model solver for estimation of multiple random effects with unknown variance-covariance structures (e.g., heterogeneous and unstructured) and known covariance among levels of random effects (e.g., pedigree and genomic relationship matrices) (Covarrubias-Pazaran, 2016 <doi:10.1371/journal.pone.0156744>; Maier et al., 2015 <doi:10.1016/j.ajhg.2014.12.006>; Jensen et al., 1997). REML estimates can be obtained using the Direct-Inversion Newton-Raphson and Direct-Inversion Average Information algorithms for the problems r x r (r being the number of records) or using the Henderson-based average information algorithm for the problem c x c (c being the number of coefficients to estimate). Spatial models can also be fitted using the two-dimensional spline functionality available.
Maintained by Giovanny Covarrubias-Pazaran. Last updated 21 days ago.
average-informationmixed-modelsrcpparmadilloopenblascppopenmp
3.0 match 43 stars 12.70 score 300 scripts 9 dependentspeiliangbai92
LSVAR:Estimation of Low Rank Plus Sparse Structured Vector Auto-Regressive (VAR) Model
Implementations of estimation algorithm of low rank plus sparse structured VAR model by using Fast Iterative Shrinkage-Thresholding Algorithm (FISTA). It relates to the algorithm in Sumanta, Li, and Michailidis (2019) <doi:10.1109/TSP.2018.2887401>.
Maintained by Peiliang Bai. Last updated 4 years ago.
8.4 match 4 stars 4.60 score 5 scriptsdominicmagirr
nphRCT:Non-Proportional Hazards in Randomized Controlled Trials
Perform a stratified weighted log-rank test in a randomized controlled trial. Tests can be visualized as a difference in average score on the two treatment arms. These methods are described in Magirr and Burman (2018) <doi:10.48550/arXiv.1807.11097>, Magirr (2020) <doi:10.48550/arXiv.2007.04767>, and Magirr and Jimenez (2022) <doi:10.48550/arXiv.2201.10445>.
Maintained by Dominic Magirr. Last updated 9 months ago.
9.3 match 4.16 score 24 scripts 1 dependentssieste
SpecsVerification:Forecast Verification Routines for Ensemble Forecasts of Weather and Climate
A collection of forecast verification routines developed for the SPECS FP7 project. The emphasis is on comparative verification of ensemble forecasts of weather and climate.
Maintained by Stefan Siegert. Last updated 5 years ago.
12.2 match 1 stars 3.12 score 44 scripts 5 dependentsbioc
epiNEM:epiNEM
epiNEM is an extension of the original Nested Effects Models (NEM). EpiNEM is able to take into account double knockouts and infer more complex network signalling pathways. It is tailored towards large scale double knock-out screens.
Maintained by Martin Pirkl. Last updated 5 months ago.
pathwayssystemsbiologynetworkinferencenetwork
6.5 match 1 stars 5.83 score 1 scripts 3 dependentsbioc
XVector:Foundation of external vector representation and manipulation in Bioconductor
Provides memory efficient S4 classes for storing sequences "externally" (e.g. behind an R external pointer, or on disk).
Maintained by Hervé Pagès. Last updated 2 months ago.
infrastructuredatarepresentationbioconductor-packagecore-packagezlib
3.3 match 2 stars 11.36 score 67 scripts 1.7k dependentsbcmullins
mobilityIndexR:Calculates Transition Matrices and Mobility Indices
Measures mobility in a population through transition matrices and mobility indices. Relative, mixed, and absolute transition matrices are supported. The Prais-Bibby, Absolute Movement, Origin Specific, and Weighted Group Mobility indices are supported. Example income and grade data are included.
Maintained by Brett Mullins. Last updated 3 years ago.
10.0 match 3.70 score 4 scriptsgobbios
EloRating:Animal Dominance Hierarchies by Elo Rating
Provides functions to quantify animal dominance hierarchies. The major focus is on Elo rating and its ability to deal with temporal dynamics in dominance interaction sequences. For static data, David's score and de Vries' I&SI are also implemented. In addition, the package provides functions to assess transitivity, linearity and stability of dominance networks. See Neumann et al (2011) <doi:10.1016/j.anbehav.2011.07.016> for an introduction.
Maintained by Christof Neumann. Last updated 8 months ago.
5.4 match 4 stars 6.86 score 61 scripts 1 dependentsrstudio
keras3:R Interface to 'Keras'
Interface to 'Keras' <https://keras.io>, a high-level neural networks API. 'Keras' was developed with a focus on enabling fast experimentation, supports both convolution based networks and recurrent networks (as well as combinations of the two), and runs seamlessly on both CPU and GPU devices.
Maintained by Tomasz Kalinowski. Last updated 4 days ago.
2.7 match 845 stars 13.57 score 264 scripts 2 dependentsbioc
GSAR:Gene Set Analysis in R
Gene set analysis using specific alternative hypotheses. Tests for differential expression, scale and net correlation structure.
Maintained by Yasir Rahmatallah. Last updated 5 months ago.
softwarestatisticalmethoddifferentialexpression
8.3 match 4.38 score 7 scriptsangusian
Kendall:Kendall Rank Correlation and Mann-Kendall Trend Test
Computes the Kendall rank correlation and Mann-Kendall trend test. See documentation for use of block bootstrap when there is autocorrelation.
Maintained by A.I. McLeod. Last updated 3 years ago.
5.4 match 6.74 score 864 scripts 25 dependentscorybrunson
ordr:A Tidyverse Extension for Ordinations and Biplots
Ordination comprises several multivariate exploratory and explanatory techniques with theoretical foundations in geometric data analysis; see Podani (2000, ISBN:90-5782-067-6) for techniques and applications and Le Roux & Rouanet (2005) <doi:10.1007/1-4020-2236-0> for foundations. Greenacre (2010, ISBN:978-84-923846) shows how the most established of these, including principal components analysis, correspondence analysis, multidimensional scaling, factor analysis, and discriminant analysis, rely on eigen-decompositions or singular value decompositions of pre-processed numeric matrix data. These decompositions give rise to a set of shared coordinates along which the row and column elements can be measured. The overlay of their scatterplots on these axes, introduced by Gabriel (1971) <doi:10.1093/biomet/58.3.453>, is called a biplot. 'ordr' provides inspection, extraction, manipulation, and visualization tools for several popular ordination classes supported by a set of recovery methods. It is inspired by and designed to integrate into 'tidyverse' workflows provided by Wickham et al (2019) <doi:10.21105/joss.01686>.
Maintained by Jason Cory Brunson. Last updated 12 days ago.
biplotdata-visualizationdimension-reductiongeometric-data-analysisgrammar-of-graphicslog-ratio-analysismultivariate-analysismultivariate-statisticsordinationtidymodelstidyverse
5.0 match 24 stars 7.26 score 28 scriptsadamlilith
statisfactory:Statistical and Geometrical Tools
A collection of statistical and geometrical tools including the aligned rank transform (ART; Higgins et al. 1990 <doi:10.4148/2475-7772.1443>; Peterson 2002 <doi:10.22237/jmasm/1020255240>; Wobbrock et al. 2011 <doi:10.1145/1978942.1978963>), 2-D histograms and histograms with overlapping bins, a function for making all possible formulae within a set of constraints, amongst others.
Maintained by Adam B. Smith. Last updated 5 months ago.
2d-histogramsaligned-rank-transformsampling
10.7 match 3.38 score 16 scripts 1 dependentsbioc
surfaltr:Rapid Comparison of Surface Protein Isoform Membrane Topologies Through surfaltr
Cell surface proteins form a major fraction of the druggable proteome and can be used for tissue-specific delivery of oligonucleotide/cell-based therapeutics. Alternatively spliced surface protein isoforms have been shown to differ in their subcellular localization and/or their transmembrane (TM) topology. Surface proteins are hydrophobic and remain difficult to study thereby necessitating the use of TM topology prediction methods such as TMHMM and Phobius. However, there exists a need for bioinformatic approaches to streamline batch processing of isoforms for comparing and visualizing topologies. To address this gap, we have developed an R package, surfaltr. It pairs inputted isoforms, either known alternatively spliced or novel, with their APPRIS annotated principal counterparts, predicts their TM topologies using TMHMM or Phobius, and generates a customizable graphical output. Further, surfaltr facilitates the prioritization of biologically diverse isoform pairs through the incorporation of three different ranking metrics and through protein alignment functions. Citations for programs mentioned here can be found in the vignette.
Maintained by Pooja Gangras. Last updated 5 months ago.
softwarevisualizationdatarepresentationsplicedalignmentalignmentmultiplesequencealignmentmultiplecomparison
9.1 match 4.00 score 2 scriptsbioc
cmapR:CMap Tools in R
The Connectivity Map (CMap) is a massive resource of perturbational gene expression profiles built by researchers at the Broad Institute and funded by the NIH Library of Integrated Network-Based Cellular Signatures (LINCS) program. Please visit https://clue.io for more information. The cmapR package implements methods to parse, manipulate, and write common CMap data objects, such as annotated matrices and collections of gene sets.
Maintained by Ted Natoli. Last updated 5 months ago.
dataimportdatarepresentationgeneexpressionbioconductorbioinformaticscmap
4.0 match 89 stars 8.85 score 298 scripts