Showing 200 of total 1207 results (show query)
insightsengineering
rtables:Reporting Tables
Reporting tables often have structure that goes beyond simple rectangular data. The 'rtables' package provides a framework for declaring complex multi-level tabulations and then applying them to data. This framework models both tabulation and the resulting tables as hierarchical, tree-like objects which support sibling sub-tables, arbitrary splitting or grouping of data in row and column dimensions, cells containing multiple values, and the concept of contextual summary computations. A convenient pipe-able interface is provided for declaring table layouts and the corresponding computations, and then applying them to data.
Maintained by Joe Zhu. Last updated 2 months ago.
49.4 match 232 stars 13.65 score 238 scripts 17 dependentskwstat
agridat:Agricultural Datasets
Datasets from books, papers, and websites related to agriculture. Example graphics and analyses are included. Data come from small-plot trials, multi-environment trials, uniformity trials, yield monitors, and more.
Maintained by Kevin Wright. Last updated 28 days ago.
54.3 match 125 stars 11.02 score 1.7k scripts 2 dependentshadley
plyr:Tools for Splitting, Applying and Combining Data
A set of tools that solves a common set of problems: you need to break a big problem down into manageable pieces, operate on each piece and then put all the pieces back together. For example, you might want to fit a model to each spatial location or time point in your study, summarise data by panels or collapse high-dimensional arrays to simpler summary statistics. The development of 'plyr' has been generously supported by 'Becton Dickinson'.
Maintained by Hadley Wickham. Last updated 4 months ago.
27.0 match 500 stars 18.16 score 83k scripts 3.3k dependentsklausvigo
phangorn:Phylogenetic Reconstruction and Analysis
Allows for estimation of phylogenetic trees and networks using Maximum Likelihood, Maximum Parsimony, distance methods and Hadamard conjugation (Schliep 2011). Offers methods for tree comparison, model selection and visualization of phylogenetic networks as described in Schliep et al. (2017).
Maintained by Klaus Schliep. Last updated 1 months ago.
softwaretechnologyqualitycontrolphylogenetic-analysisphylogeneticsopenblascpp
26.6 match 206 stars 16.69 score 2.5k scripts 135 dependentsms609
TreeTools:Create, Modify and Analyse Phylogenetic Trees
Efficient implementations of functions for the creation, modification and analysis of phylogenetic trees. Applications include: generation of trees with specified shapes; tree rearrangement; analysis of tree shape; rooting of trees and extraction of subtrees; calculation and depiction of split support; plotting the position of rogue taxa (Klopfstein & Spasojevic 2019) <doi:10.1371/journal.pone.0212942>; calculation of ancestor-descendant relationships, of 'stemwardness' (Asher & Smith, 2022) <doi:10.1093/sysbio/syab072>, and of tree balance (Mir et al. 2013, Lemant et al. 2022) <doi:10.1016/j.mbs.2012.10.005>, <doi:10.1093/sysbio/syac027>; artificial extinction (Asher & Smith, 2022) <doi:10.1093/sysbio/syab072>; import and export of trees from Newick, Nexus (Maddison et al. 1997) <doi:10.1093/sysbio/46.4.590>, and TNT <https://www.lillo.org.ar/phylogeny/tnt/> formats; and analysis of splits and cladistic information.
Maintained by Martin R. Smith. Last updated 1 months ago.
evolutionary-biologyphylogenetic-treesphylogeneticscpp
31.0 match 21 stars 9.92 score 124 scripts 10 dependentsmrdwab
splitstackshape:Stack and Reshape Datasets After Splitting Concatenated Values
Online data collection tools like Google Forms often export multiple-response questions with data concatenated in cells. The concat.split (cSplit) family of functions splits such data into separate cells. The package also includes functions to stack groups of columns and to reshape wide data, even when the data are "unbalanced"---something which reshape (from base R) does not handle, and which melt and dcast from reshape2 do not easily handle.
Maintained by Ananda Mahto. Last updated 6 years ago.
26.5 match 59 stars 10.25 score 2.1k scripts 21 dependentsasgr
imager:Image Processing Library Based on 'CImg'
Fast image processing for images in up to 4 dimensions (two spatial dimensions, one time/depth dimension, one colour dimension). Provides most traditional image processing tools (filtering, morphology, transformations, etc.) as well as various functions for easily analysing image data using R. The package wraps 'CImg', <http://cimg.eu>, a simple, modern C++ library for image processing.
Maintained by Aaron Robotham. Last updated 27 days ago.
17.8 match 17 stars 13.62 score 2.4k scripts 45 dependentsms609
TreeDist:Calculate and Map Distances Between Phylogenetic Trees
Implements measures of tree similarity, including information-based generalized Robinson-Foulds distances (Phylogenetic Information Distance, Clustering Information Distance, Matching Split Information Distance; Smith 2020) <doi:10.1093/bioinformatics/btaa614>; Jaccard-Robinson-Foulds distances (Bocker et al. 2013) <doi:10.1007/978-3-642-40453-5_13>, including the Nye et al. (2006) metric <doi:10.1093/bioinformatics/bti720>; the Matching Split Distance (Bogdanowicz & Giaro 2012) <doi:10.1109/TCBB.2011.48>; Maximum Agreement Subtree distances; the Kendall-Colijn (2016) distance <doi:10.1093/molbev/msw124>, and the Nearest Neighbour Interchange (NNI) distance, approximated per Li et al. (1996) <doi:10.1007/3-540-61332-3_168>. Includes tools for visualizing mappings of tree space (Smith 2022) <doi:10.1093/sysbio/syab100>, for identifying islands of trees (Silva and Wilkinson 2021) <doi:10.1093/sysbio/syab015>, for calculating the median of sets of trees, and for computing the information content of trees and splits.
Maintained by Martin R. Smith. Last updated 1 months ago.
phylogeneticstree-distancephylogenetic-treestree-distancestreescpp
23.4 match 32 stars 10.32 score 97 scripts 5 dependentsspatstat
spatstat.geom:Geometrical Functionality of the 'spatstat' Family
Defines spatial data types and supports geometrical operations on them. Data types include point patterns, windows (domains), pixel images, line segment patterns, tessellations and hyperframes. Capabilities include creation and manipulation of data (using command line or graphical interaction), plotting, geometrical operations (rotation, shift, rescale, affine transformation), convex hull, discretisation and pixellation, Dirichlet tessellation, Delaunay triangulation, pairwise distances, nearest-neighbour distances, distance transform, morphological operations (erosion, dilation, closing, opening), quadrat counting, geometrical measurement, geometrical covariance, colour maps, calculus on spatial domains, Gaussian blur, level sets of images, transects of images, intersections between objects, minimum distance matching. (Excludes spatial data on a network, which are supported by the package 'spatstat.linnet'.)
Maintained by Adrian Baddeley. Last updated 2 days ago.
classes-and-objectsdistance-calculationgeometrygeometry-processingimagesmensurationplottingpoint-patternsspatial-dataspatial-data-analysis
17.6 match 7 stars 12.11 score 241 scripts 227 dependentstrinker
textshape:Tools for Reshaping Text
Tools that can be used to reshape and restructure text data.
Maintained by Tyler Rinker. Last updated 12 months ago.
data-reshapingmanipulationsentence-boundary-detectiontext-datatext-formatingtidy
21.6 match 50 stars 9.18 score 266 scripts 34 dependentstrinker
qdap:Bridging the Gap Between Qualitative Data and Quantitative Analysis
Automates many of the tasks associated with quantitative discourse analysis of transcripts containing discourse including frequency counts of sentence types, words, sentences, turns of talk, syllables and other assorted analysis tasks. The package provides parsing tools for preparing transcript data. Many functions enable the user to aggregate data by any number of grouping variables, providing analysis and seamless integration with other R packages that undertake higher level analysis and visualization of text. This affords the user a more efficient and targeted analysis. 'qdap' is designed for transcript analysis, however, many functions are applicable to other areas of Text Mining/ Natural Language Processing.
Maintained by Tyler Rinker. Last updated 4 years ago.
qdapquantitative-discourse-analysistext-analysistext-miningtext-plottingopenjdk
20.3 match 176 stars 9.61 score 1.3k scripts 3 dependentstpronk
splithalfr:Estimate Split-Half Reliabilities
Estimates split-half reliabilities for scoring algorithms of cognitive tasks and questionnaires. The 'splithalfr' supports researcher-provided scoring algorithms, with six vignettes illustrating how on included datasets. The package provides four splitting methods (first-second, odd-even, permutated, Monte Carlo), the option to stratify splits by task design, a number of reliability coefficients, and the option to sub-sample data.
Maintained by Thomas Pronk. Last updated 1 years ago.
31.8 match 5 stars 5.86 score 41 scriptsalarm-redist
redistmetrics:Redistricting Metrics
Reliable and flexible tools for scoring redistricting plans using common measures and metrics. These functions provide key direct access to tools useful for non-simulation analyses of redistricting plans, such as for measuring compactness or partisan fairness. Tools are designed to work with the 'redist' package seamlessly.
Maintained by Christopher T. Kenny. Last updated 9 months ago.
21.8 match 10 stars 7.57 score 23 scripts 2 dependentsdanlwarren
rwty:R We There Yet? Visualizing MCMC Convergence in Phylogenetics
Implements various tests, visualizations, and metrics for diagnosing convergence of MCMC chains in phylogenetics. It implements and automates many of the functions of the AWTY package in the R environment, as well as a host of other functions. Warren, Geneva, and Lanfear (2017), <doi:10.1093/molbev/msw279>.
Maintained by Dan Warren. Last updated 4 years ago.
22.2 match 30 stars 7.32 score 117 scriptsalarm-redist
redist:Simulation Methods for Legislative Redistricting
Enables researchers to sample redistricting plans from a pre-specified target distribution using Sequential Monte Carlo and Markov Chain Monte Carlo algorithms. The package allows for the implementation of various constraints in the redistricting process such as geographic compactness and population parity requirements. Tools for analysis such as computation of various summary statistics and plotting functionality are also included. The package implements the SMC algorithm of McCartan and Imai (2023) <doi:10.1214/23-AOAS1763>, the enumeration algorithm of Fifield, Imai, Kawahara, and Kenny (2020) <doi:10.1080/2330443X.2020.1791773>, the Flip MCMC algorithm of Fifield, Higgins, Imai and Tarr (2020) <doi:10.1080/10618600.2020.1739532>, the Merge-split/Recombination algorithms of Carter et al. (2019) <arXiv:1911.01503> and DeFord et al. (2021) <doi:10.1162/99608f92.eb30390f>, and the Short-burst optimization algorithm of Cannon et al. (2020) <arXiv:2011.02288>.
Maintained by Christopher T. Kenny. Last updated 2 months ago.
geospatialgerrymanderingredistrictingsamplingopenblascppopenmp
17.5 match 68 stars 9.17 score 259 scriptsohdsi
PatientLevelPrediction:Develop Clinical Prediction Models Using the Common Data Model
A user friendly way to create patient level prediction models using the Observational Medical Outcomes Partnership Common Data Model. Given a cohort of interest and an outcome of interest, the package can use data in the Common Data Model to build a large set of features. These features can then be used to fit a predictive model with a number of machine learning algorithms. This is further described in Reps (2017) <doi:10.1093/jamia/ocy032>.
Maintained by Egill Fridgeirsson. Last updated 9 days ago.
14.8 match 190 stars 10.85 score 297 scriptsrichfitz
diversitree:Comparative 'Phylogenetic' Analyses of Diversification
Contains a number of comparative 'phylogenetic' methods, mostly focusing on analysing diversification and character evolution. Contains implementations of 'BiSSE' (Binary State 'Speciation' and Extinction) and its unresolved tree extensions, 'MuSSE' (Multiple State 'Speciation' and Extinction), 'QuaSSE', 'GeoSSE', and 'BiSSE-ness' Other included methods include Markov models of discrete and continuous trait evolution and constant rate 'speciation' and extinction.
Maintained by Richard G. FitzJohn. Last updated 6 months ago.
18.6 match 33 stars 8.51 score 524 scripts 4 dependentsgpilgrim2670
SwimmeR:Data Import, Cleaning, and Conversions for Swimming Results
The goal of the 'SwimmeR' package is to provide means of acquiring, and then analyzing, data from swimming (and diving) competitions. To that end 'SwimmeR' allows results to be read in from .html sources, like 'Hy-Tek' real time results pages, '.pdf' files, 'ISL' results, 'Omega' results, and (on a development basis) '.hy3' files. Once read in, 'SwimmeR' can convert swimming times (performances) between the computationally useful format of seconds reported to the '100ths' place (e.g. 95.37), and the conventional reporting format (1:35.37) used in the swimming community. 'SwimmeR' can also score meets in a variety of formats with user defined point values, convert times between courses ('LCM', 'SCM', 'SCY') and draw single elimination brackets, as well as providing a suite of tools for working cleaning swimming data. This is a developmental package, not yet mature.
Maintained by Greg Pilgrim. Last updated 2 years ago.
34.2 match 4 stars 4.53 score 17 scriptsandybega
spduration:Split-Population Duration (Cure) Regression
An implementation of split-population duration regression models. Unlike regular duration models, split-population duration models are mixture models that accommodate the presence of a sub-population that is not at risk for failure, e.g. cancer patients who have been cured by treatment. This package implements Weibull and Loglogistic forms for the duration component, and focuses on data with time-varying covariates. These models were originally formulated in Boag (1949) and Berkson and Gage (1952), and extended in Schmidt and Witte (1989).
Maintained by Andreas Beger. Last updated 1 years ago.
mixture-modelregressionsplit-populationsurvival-analysiscpp
27.9 match 4 stars 5.38 score 40 scriptstidymodels
rsample:General Resampling Infrastructure
Classes and functions to create and summarize different types of resampling objects (e.g. bootstrap, cross-validation).
Maintained by Hannah Frick. Last updated 5 days ago.
8.9 match 341 stars 16.72 score 5.2k scripts 79 dependentskkholst
mets:Analysis of Multivariate Event Times
Implementation of various statistical models for multivariate event history data <doi:10.1007/s10985-013-9244-x>. Including multivariate cumulative incidence models <doi:10.1002/sim.6016>, and bivariate random effects probit models (Liability models) <doi:10.1016/j.csda.2015.01.014>. Modern methods for survival analysis, including regression modelling (Cox, Fine-Gray, Ghosh-Lin, Binomial regression) with fast computation of influence functions.
Maintained by Klaus K. Holst. Last updated 3 days ago.
multivariate-time-to-eventsurvival-analysistime-to-eventfortranopenblascpp
9.5 match 14 stars 13.47 score 236 scripts 42 dependentsludvigolsen
groupdata2:Creating Groups from Data
Methods for dividing data into groups. Create balanced partitions and cross-validation folds. Perform time series windowing and general grouping and splitting of data. Balance existing groups with up- and downsampling or collapse them to fewer groups.
Maintained by Ludvig Renbo Olsen. Last updated 3 months ago.
balancecross-validationdatadata-framefoldgroup-factorgroupsparticipantspartitionsplitstaircase
13.4 match 27 stars 9.36 score 338 scripts 7 dependentsinsightsengineering
tern:Create Common TLGs Used in Clinical Trials
Table, Listings, and Graphs (TLG) library for common outputs used in clinical trials.
Maintained by Joe Zhu. Last updated 2 months ago.
clinical-trialsgraphslistingsnestoutputstables
9.8 match 79 stars 12.62 score 186 scripts 9 dependentscran
nlme:Linear and Nonlinear Mixed Effects Models
Fit and compare Gaussian linear and nonlinear mixed-effects models.
Maintained by R Core Team. Last updated 2 months ago.
9.4 match 6 stars 13.00 score 13k scripts 8.7k dependentstomkellygenetics
vioplot:Violin Plot
A violin plot is a combination of a box plot and a kernel density plot. This package allows extensive customisation of violin plots.
Maintained by S. Thomas Kelly. Last updated 21 days ago.
boxplotcolourscustomisationdatavizformulaplottingviolin-plotviolinplotvioplot
9.9 match 26 stars 12.32 score 2.0k scripts 8 dependentsncss-tech
aqp:Algorithms for Quantitative Pedology
The Algorithms for Quantitative Pedology (AQP) project was started in 2009 to organize a loosely-related set of concepts and source code on the topic of soil profile visualization, aggregation, and classification into this package (aqp). Over the past 8 years, the project has grown into a suite of related R packages that enhance and simplify the quantitative analysis of soil profile data. Central to the AQP project is a new vocabulary of specialized functions and data structures that can accommodate the inherent complexity of soil profile information; freeing the scientist to focus on ideas rather than boilerplate data processing tasks <doi:10.1016/j.cageo.2012.10.020>. These functions and data structures have been extensively tested and documented, applied to projects involving hundreds of thousands of soil profiles, and deeply integrated into widely used tools such as SoilWeb <https://casoilresource.lawr.ucdavis.edu/soilweb-apps>. Components of the AQP project (aqp, soilDB, sharpshootR, soilReports packages) serve an important role in routine data analysis within the USDA-NRCS Soil Science Division. The AQP suite of R packages offer a convenient platform for bridging the gap between pedometric theory and practice.
Maintained by Dylan Beaudette. Last updated 29 days ago.
digital-soil-mappingncss-technrcspedologypedometricssoilsoil-surveyusda
9.9 match 55 stars 11.77 score 1.2k scripts 2 dependentssportsdataverse
hoopR:Access Men's Basketball Play by Play Data
A utility to quickly obtain clean and tidy men's basketball play by play data. Provides functions to access live play by play and box score data from ESPN<https://www.espn.com> with shot locations when available. It is also a full NBA Stats API<https://www.nba.com/stats/> wrapper. It is also a scraping and aggregating interface for Ken Pomeroy's men's college basketball statistics website<https://kenpom.com>. It provides users with an active subscription the capability to scrape the website tables and analyze the data for themselves.
Maintained by Saiem Gilani. Last updated 1 years ago.
basketballcollege-basketballespnkenpomnbanba-analyticsnba-apinba-datanba-statisticsnba-statsnba-stats-apincaancaa-basketballncaa-bracketncaa-playersncaa-ratingsncaamsportsdataverse
16.8 match 91 stars 6.93 score 261 scriptsstan-dev
rstanarm:Bayesian Applied Regression Modeling via Stan
Estimates previously compiled regression models using the 'rstan' package, which provides the R interface to the Stan C++ library for Bayesian estimation. Users specify models via the customary R syntax with a formula and data.frame plus some additional arguments for priors.
Maintained by Ben Goodrich. Last updated 9 months ago.
bayesianbayesian-data-analysisbayesian-inferencebayesian-methodsbayesian-statisticsmultilevel-modelsrstanrstanarmstanstatistical-modelingcpp
7.4 match 393 stars 15.68 score 5.0k scripts 13 dependentsbioc
YAPSA:Yet Another Package for Signature Analysis
This package provides functions and routines for supervised analyses of mutational signatures (i.e., the signatures have to be known, cf. L. Alexandrov et al., Nature 2013 and L. Alexandrov et al., Bioaxiv 2018). In particular, the family of functions LCD (LCD = linear combination decomposition) can use optimal signature-specific cutoffs which takes care of different detectability of the different signatures. Moreover, the package provides different sets of mutational signatures, including the COSMIC and PCAWG SNV signatures and the PCAWG Indel signatures; the latter infering that with YAPSA, the concept of supervised analysis of mutational signatures is extended to Indel signatures. YAPSA also provides confidence intervals as computed by profile likelihoods and can perform signature analysis on a stratified mutational catalogue (SMC = stratify mutational catalogue) in order to analyze enrichment and depletion patterns for the signatures in different strata.
Maintained by Zuguang Gu. Last updated 5 months ago.
sequencingdnaseqsomaticmutationvisualizationclusteringgenomicvariationstatisticalmethodbiologicalquestion
17.9 match 6.41 score 57 scriptsbethatkinson
rpart:Recursive Partitioning and Regression Trees
Recursive partitioning for classification, regression and survival trees. An implementation of most of the functionality of the 1984 book by Breiman, Friedman, Olshen and Stone.
Maintained by Beth Atkinson. Last updated 8 months ago.
6.7 match 52 stars 16.59 score 18k scripts 1.6k dependentsbioc
Spectra:Spectra Infrastructure for Mass Spectrometry Data
The Spectra package defines an efficient infrastructure for storing and handling mass spectrometry spectra and functionality to subset, process, visualize and compare spectra data. It provides different implementations (backends) to store mass spectrometry data. These comprise backends tuned for fast data access and processing and backends for very large data sets ensuring a small memory footprint.
Maintained by RforMassSpectrometry Package Maintainer. Last updated 10 days ago.
infrastructureproteomicsmassspectrometrymetabolomicsbioconductorhacktoberfestmass-spectrometry
8.5 match 41 stars 13.01 score 254 scripts 35 dependentswinvector
vtreat:A Statistically Sound 'data.frame' Processor/Conditioner
A 'data.frame' processor/conditioner that prepares real-world data for predictive modeling in a statistically sound manner. 'vtreat' prepares variables so that data has fewer exceptional cases, making it easier to safely use models in production. Common problems 'vtreat' defends against: 'Inf', 'NA', too many categorical levels, rare categorical levels, and new categorical levels (levels seen during application, but not during training). Reference: "'vtreat': a data.frame Processor for Predictive Modeling", Zumel, Mount, 2016, <DOI:10.5281/zenodo.1173313>.
Maintained by John Mount. Last updated 2 months ago.
categorical-variablesmachine-learning-algorithmsnested-modelsprepare-data
9.7 match 285 stars 11.19 score 328 scripts 1 dependentsdarwin-eu
omopgenerics:Methods and Classes for the OMOP Common Data Model
Provides definitions of core classes and methods used by analytic pipelines that query the OMOP (Observational Medical Outcomes Partnership) common data model.
Maintained by Martí Català. Last updated 10 days ago.
10.8 match 9.97 score 193 scripts 16 dependentstidyverse
stringr:Simple, Consistent Wrappers for Common String Operations
A consistent, simple and easy to use set of wrappers around the fantastic 'stringi' package. All function and argument names (and positions) are consistent, all functions deal with "NA"'s and zero length vectors in the same way, and the output from one function is easy to feed into the input of another.
Maintained by Hadley Wickham. Last updated 7 months ago.
4.8 match 622 stars 21.97 score 164k scripts 8.2k dependentssebkrantz
collapse:Advanced and Fast Data Transformation
A C/C++ based package for advanced data transformation and statistical computing in R that is extremely fast, class-agnostic, robust and programmer friendly. Core functionality includes a rich set of S3 generic grouped and weighted statistical functions for vectors, matrices and data frames, which provide efficient low-level vectorizations, OpenMP multithreading, and skip missing values by default. These are integrated with fast grouping and ordering algorithms (also callable from C), and efficient data manipulation functions. The package also provides a flexible and rigorous approach to time series and panel data in R. It further includes fast functions for common statistical procedures, detailed (grouped, weighted) summary statistics, powerful tools to work with nested data, fast data object conversions, functions for memory efficient R programming, and helpers to effectively deal with variable labels, attributes, and missing data. It is well integrated with base R classes, 'dplyr'/'tibble', 'data.table', 'sf', 'units', 'plm' (panel-series and data frames), and 'xts'/'zoo'.
Maintained by Sebastian Krantz. Last updated 6 days ago.
data-aggregationdata-analysisdata-manipulationdata-processingdata-sciencedata-transformationeconometricshigh-performancepanel-datascientific-computingstatisticstime-seriesweightedweightscppopenmp
6.2 match 672 stars 16.63 score 708 scripts 97 dependentsrpolars
polars:Lightning-Fast 'DataFrame' Library
Lightning-fast 'DataFrame' library written in 'Rust'. Convert R data to 'Polars' data and vice versa. Perform fast, lazy, larger-than-memory and optimized data queries. 'Polars' is interoperable with the package 'arrow', as both are based on the 'Apache Arrow' Columnar Format.
Maintained by Soren Welling. Last updated 4 days ago.
8.6 match 499 stars 12.01 score 1.0k scripts 2 dependentsgagolews
stringi:Fast and Portable Character String Processing Facilities
A collection of character string/text/natural language processing tools for pattern searching (e.g., with 'Java'-like regular expressions or the 'Unicode' collation algorithm), random string generation, case mapping, string transliteration, concatenation, sorting, padding, wrapping, Unicode normalisation, date-time formatting and parsing, and many more. They are fast, consistent, convenient, and - thanks to 'ICU' (International Components for Unicode) - portable across all locales and platforms. Documentation about 'stringi' is provided via its website at <https://stringi.gagolewski.com/> and the paper by Gagolewski (2022, <doi:10.18637/jss.v103.i02>).
Maintained by Marek Gagolewski. Last updated 1 months ago.
icuicu4cnatural-language-processingnlpregexregexpstring-manipulationstringistringrtexttext-processingtidy-dataunicodecpp
5.6 match 309 stars 18.31 score 10k scripts 8.6k dependentscran
compositions:Compositional Data Analysis
Provides functions for the consistent analysis of compositional data (e.g. portions of substances) and positive numbers (e.g. concentrations) in the way proposed by J. Aitchison and V. Pawlowsky-Glahn.
Maintained by K. Gerald van den Boogaart. Last updated 1 years ago.
15.3 match 1 stars 6.35 score 36 dependentsrspatial
terra:Spatial Data Analysis
Methods for spatial data analysis with vector (points, lines, polygons) and raster (grid) data. Methods for vector data include geometric operations such as intersect and buffer. Raster methods include local, focal, global, zonal and geometric operations. The predict and interpolate methods facilitate the use of regression type (interpolation, machine learning) models for spatial prediction, including with satellite remote sensing data. Processing of very large files is supported. See the manual and tutorials on <https://rspatial.org/> to get started. 'terra' replaces the 'raster' package ('terra' can do more, and it is faster and easier to use).
Maintained by Robert J. Hijmans. Last updated 2 hours ago.
geospatialrasterspatialvectoronetbbprojgdalgeoscpp
5.3 match 559 stars 17.64 score 17k scripts 851 dependentsddalthorp
GenEst:Generalized Mortality Estimator
Command-line and 'shiny' GUI implementation of the GenEst models for estimating bird and bat mortality at wind and solar power facilities, following Dalthorp, et al. (2018) <doi:10.3133/tm7A2>.
Maintained by Daniel Dalthorp. Last updated 2 years ago.
11.8 match 7 stars 7.81 score 55 scripts 2 dependentsdidiermurillof
FielDHub:A Shiny App for Design of Experiments in Life Sciences
A shiny design of experiments (DOE) app that aids in the creation of traditional, un-replicated, augmented and partially-replicated designs applied to agriculture, plant breeding, forestry, animal and biological sciences.
Maintained by Didier Murillo. Last updated 8 months ago.
agriculturalbreedingdesigndoeexperimentalplantbreedingshiny
10.1 match 48 stars 9.10 score 70 scripts 1 dependentsjohnjsl7
daewr:Design and Analysis of Experiments with R
Contains Data frames and functions used in the book "Design and Analysis of Experiments with R", Lawson(2015) ISBN-13:978-1-4398-6813-3.
Maintained by John Lawson. Last updated 2 years ago.
23.9 match 3 stars 3.83 score 217 scripts 3 dependentstidyverse
tidyr:Tidy Messy Data
Tools to help to create tidy data, where each column is a variable, each row is an observation, and each cell contains a single value. 'tidyr' contains tools for changing the shape (pivoting) and hierarchy (nesting and 'unnesting') of a dataset, turning deeply nested lists into rectangular data frames ('rectangling'), and extracting values out of string columns. It also includes tools for working with missing values (both implicit and explicit).
Maintained by Hadley Wickham. Last updated 13 days ago.
4.0 match 1.4k stars 22.88 score 168k scripts 5.5k dependentssportsdataverse
wehoop:Access Women's Basketball Play by Play Data
A utility for working with women's basketball data. A scraping and aggregating interface for the WNBA Stats API <https://stats.wnba.com/> and ESPN's <https://www.espn.com> women's college basketball and WNBA statistics. It provides users with the capability to access the game play-by-plays, box scores, standings and results to analyze the data for themselves.
Maintained by Saiem Gilani. Last updated 8 months ago.
college-basketballespnespn-statsncaancaa-basketballprofessional-basketball-datasportsdataversewnbawnba-playerswnba-statswomens-basketball
16.8 match 28 stars 5.36 score 54 scriptsyangjasp
optimall:Allocate Samples Among Strata
Functions for the design process of survey sampling, with specific tools for multi-wave and multi-phase designs. Perform optimum allocation using Neyman (1934) <doi:10.2307/2342192> or Wright (2012) <doi:10.1080/00031305.2012.733679> allocation, split strata based on quantiles or values of known variables, randomly select samples from strata, allocate sampling waves iteratively, and organize a complex survey design. Also includes a Shiny application for observing the effects of different strata splits.
Maintained by Jasper Yang. Last updated 21 days ago.
13.4 match 5 stars 6.59 score 39 scriptstomasfryda
h2o:R Interface for the 'H2O' Scalable Machine Learning Platform
R interface for 'H2O', the scalable open source machine learning platform that offers parallelized implementations of many supervised and unsupervised machine learning algorithms such as Generalized Linear Models (GLM), Gradient Boosting Machines (including XGBoost), Random Forests, Deep Neural Networks (Deep Learning), Stacked Ensembles, Naive Bayes, Generalized Additive Models (GAM), ANOVA GLM, Cox Proportional Hazards, K-Means, PCA, ModelSelection, Word2Vec, as well as a fully automatic machine learning algorithm (H2O AutoML).
Maintained by Tomas Fryda. Last updated 1 years ago.
10.8 match 3 stars 8.20 score 7.8k scripts 11 dependentsropensci
stplanr:Sustainable Transport Planning
Tools for transport planning with an emphasis on spatial transport data and non-motorized modes. The package was originally developed to support the 'Propensity to Cycle Tool', a publicly available strategic cycle network planning tool (Lovelace et al. 2017) <doi:10.5198/jtlu.2016.862>, but has since been extended to support public transport routing and accessibility analysis (Moreno-Monroy et al. 2017) <doi:10.1016/j.jtrangeo.2017.08.012> and routing with locally hosted routing engines such as 'OSRM' (Lowans et al. 2023) <doi:10.1016/j.enconman.2023.117337>. The main functions are for creating and manipulating geographic "desire lines" from origin-destination (OD) data (building on the 'od' package); calculating routes on the transport network locally and via interfaces to routing services such as <https://cyclestreets.net/> (Desjardins et al. 2021) <doi:10.1007/s11116-021-10197-1>; and calculating route segment attributes such as bearing. The package implements the 'travel flow aggregration' method described in Morgan and Lovelace (2020) <doi:10.1177/2399808320942779> and the 'OD jittering' method described in Lovelace et al. (2022) <doi:10.32866/001c.33873>. Further information on the package's aim and scope can be found in the vignettes and in a paper in the R Journal (Lovelace and Ellison 2018) <doi:10.32614/RJ-2018-053>, and in a paper outlining the landscape of open source software for geographic methods in transport planning (Lovelace, 2021) <doi:10.1007/s10109-020-00342-2>.
Maintained by Robin Lovelace. Last updated 7 months ago.
cyclecyclingdesire-linesorigin-destinationpeer-reviewedpubic-transportroute-networkroutesroutingspatialtransporttransport-planningtransportationwalking
7.1 match 427 stars 12.31 score 684 scripts 3 dependentsbioc
DelayedArray:A unified framework for working transparently with on-disk and in-memory array-like datasets
Wrapping an array-like object (typically an on-disk object) in a DelayedArray object allows one to perform common array operations on it without loading the object in memory. In order to reduce memory usage and optimize performance, operations on the object are either delayed or executed using a block processing mechanism. Note that this also works on in-memory array-like objects like DataFrame objects (typically with Rle columns), Matrix objects, ordinary arrays and, data frames.
Maintained by Hervé Pagès. Last updated 1 months ago.
infrastructuredatarepresentationannotationgenomeannotationbioconductor-packagecore-packageu24ca289073
5.6 match 27 stars 15.59 score 538 scripts 1.2k dependentsemmanuelparadis
ape:Analyses of Phylogenetics and Evolution
Functions for reading, writing, plotting, and manipulating phylogenetic trees, analyses of comparative data in a phylogenetic framework, ancestral character analyses, analyses of diversification and macroevolution, computing distances from DNA sequences, reading and writing nucleotide sequences as well as importing from BioConductor, and several tools such as Mantel's test, generalized skyline plots, graphical exploration of phylogenetic data (alex, trex, kronoviz), estimation of absolute evolutionary rates and clock-like trees using mean path lengths and penalized likelihood, dating trees with non-contemporaneous sequences, translating DNA into AA sequences, and assessing sequence alignments. Phylogeny estimation can be done with the NJ, BIONJ, ME, MVR, SDM, and triangle methods, and several methods handling incomplete distance matrices (NJ*, BIONJ*, MVR*, and the corresponding triangle method). Some functions call external applications (PhyML, Clustal, T-Coffee, Muscle) whose results are returned into R.
Maintained by Emmanuel Paradis. Last updated 21 hours ago.
5.0 match 64 stars 17.22 score 13k scripts 599 dependentsropensci
osmdata:Import 'OpenStreetMap' Data as Simple Features or Spatial Objects
Download and import of 'OpenStreetMap' ('OSM') data as 'sf' or 'sp' objects. 'OSM' data are extracted from the 'Overpass' web server (<https://overpass-api.de/>) and processed with very fast 'C++' routines for return to 'R'.
Maintained by Mark Padgham. Last updated 1 months ago.
open0street0mapopenstreetmapoverpass0apiosmcpposm-dataoverpass-apipeer-reviewedcpp
5.9 match 322 stars 14.53 score 2.8k scripts 14 dependentsyouyifong
kyotil:Utility Functions for Statistical Analysis Report Generation and Monte Carlo Studies
Helper functions for creating formatted summary of regression models, writing publication-ready tables to latex files, and running Monte Carlo experiments.
Maintained by Youyi Fong. Last updated 8 days ago.
10.8 match 7.87 score 236 scripts 7 dependentshojsgaard
doBy:Groupwise Statistics, LSmeans, Linear Estimates, Utilities
Utility package containing: 1) Facilities for working with grouped data: 'do' something to data stratified 'by' some variables. 2) LSmeans (least-squares means), general linear estimates. 3) Restrict functions to a smaller domain. 4) Miscellaneous other utilities.
Maintained by Søren Højsgaard. Last updated 5 days ago.
5.6 match 1 stars 14.94 score 3.2k scripts 939 dependentseahouseman
RPMM:Recursively Partitioned Mixture Model
Recursively Partitioned Mixture Model for Beta and Gaussian Mixtures. This is a model-based clustering algorithm that returns a hierarchy of classes, similar to hierarchical clustering, but also similar to finite mixture models.
Maintained by E. Andres Houseman. Last updated 8 years ago.
19.3 match 4.34 score 78 scripts 7 dependentscran
agricolae:Statistical Procedures for Agricultural Research
Original idea was presented in the thesis "A statistical analysis tool for agricultural research" to obtain the degree of Master on science, National Engineering University (UNI), Lima-Peru. Some experimental data for the examples come from the CIP and others research. Agricolae offers extensive functionality on experimental design especially for agricultural and plant breeding experiments, which can also be useful for other purposes. It supports planning of lattice, Alpha, Cyclic, Complete Block, Latin Square, Graeco-Latin Squares, augmented block, factorial, split and strip plot designs. There are also various analysis facilities for experimental data, e.g. treatment comparison procedures and several non-parametric tests comparison, biodiversity indexes and consensus cluster.
Maintained by Felipe de Mendiburu. Last updated 1 years ago.
11.9 match 7 stars 7.01 score 15 dependentsbioc
eisaR:Exon-Intron Split Analysis (EISA) in R
Exon-intron split analysis (EISA) uses ordinary RNA-seq data to measure changes in mature RNA and pre-mRNA reads across different experimental conditions to quantify transcriptional and post-transcriptional regulation of gene expression. For details see Gaidatzis et al., Nat Biotechnol 2015. doi: 10.1038/nbt.3269. eisaR implements the major steps of EISA in R.
Maintained by Michael Stadler. Last updated 2 months ago.
transcriptiongeneexpressiongeneregulationfunctionalgenomicstranscriptomicsregressionrnaseq
11.0 match 16 stars 7.48 score 63 scriptsopengeos
whitebox:'WhiteboxTools' R Frontend
An R frontend for the 'WhiteboxTools' library, which is an advanced geospatial data analysis platform developed by Prof. John Lindsay at the University of Guelph's Geomorphometry and Hydrogeomatics Research Group. 'WhiteboxTools' can be used to perform common geographical information systems (GIS) analysis operations, such as cost-distance analysis, distance buffering, and raster reclassification. Remote sensing and image processing tasks include image enhancement (e.g. panchromatic sharpening, contrast adjustments), image mosaicing, numerous filtering operations, simple classification (k-means), and common image transformations. 'WhiteboxTools' also contains advanced tooling for spatial hydrological analysis (e.g. flow-accumulation, watershed delineation, stream network analysis, sink removal), terrain analysis (e.g. common terrain indices such as slope, curvatures, wetness index, hillshading; hypsometric analysis; multi-scale topographic position analysis), and LiDAR data processing. Suggested citation: Lindsay (2016) <doi:10.1016/j.cageo.2016.07.003>.
Maintained by Andrew Brown. Last updated 5 months ago.
geomorphometrygeoprocessinggeospatialgishydrologyremote-sensingrstudio
8.5 match 173 stars 9.65 score 203 scripts 2 dependentsbioc
GenomicRanges:Representation and manipulation of genomic intervals
The ability to efficiently represent and manipulate genomic annotations and alignments is playing a central role when it comes to analyzing high-throughput sequencing data (a.k.a. NGS data). The GenomicRanges package defines general purpose containers for storing and manipulating genomic intervals and variables defined along a genome. More specialized containers for representing and manipulating short alignments against a reference genome, or a matrix-like summarization of an experiment, are defined in the GenomicAlignments and SummarizedExperiment packages, respectively. Both packages build on top of the GenomicRanges infrastructure.
Maintained by Hervé Pagès. Last updated 4 months ago.
geneticsinfrastructuredatarepresentationsequencingannotationgenomeannotationcoveragebioconductor-packagecore-package
4.6 match 44 stars 17.75 score 13k scripts 1.3k dependentscsids
csmaps:Preformatted Maps of Norway that Don't Need Geolibraries
Provides datasets containing preformatted maps of Norway at the county, municipality, and ward (Oslo only) level for redistricting in 2024, 2020, 2018, and 2017. Multiple layouts are provided (normal, split, and with an insert for Oslo), allowing the user to rapidly create choropleth maps of Norway without any geolibraries.
Maintained by Richard Aubrey White. Last updated 6 months ago.
15.7 match 3 stars 5.08 score 20 scriptsr-hyperspec
hyperSpec:Work with Hyperspectral Data, i.e. Spectra + Meta Information (Spatial, Time, Concentration, ...)
Comfortable ways to work with hyperspectral data sets, i.e. spatially or time-resolved spectra, or spectra with any other kind of information associated with each of the spectra. The spectra can be data as obtained in XRF, UV/VIS, Fluorescence, AES, NIR, IR, Raman, NMR, MS, etc. More generally, any data that is recorded over a discretized variable, e.g. absorbance = f(wavelength), stored as a vector of absorbance values for discrete wavelengths is suitable.
Maintained by Claudia Beleites. Last updated 10 months ago.
data-wranglinghyperspectralimaginginfrarednmrramanspectroscopyuv-visxrf
9.8 match 16 stars 8.13 score 233 scripts 2 dependentslrberge
stringmagic:Character String Operations and Interpolation, Magic Edition
Performs complex string operations compactly and efficiently. Supports string interpolation jointly with over 50 string operations. Also enhances regular string functions (like grep() and co). See an introduction at <https://lrberge.github.io/stringmagic/>.
Maintained by Laurent R Berge. Last updated 7 months ago.
7.3 match 15 stars 10.56 score 37 scripts 33 dependentshighlanderlab
SIMplyBee:'AlphaSimR' Extension for Simulating Honeybee Populations and Breeding Programmes
An extension of the 'AlphaSimR' package (<https://cran.r-project.org/package=AlphaSimR>) for stochastic simulations of honeybee populations and breeding programmes. 'SIMplyBee' enables simulation of individual bees that form a colony, which includes a queen, fathers (drones the queen mated with), virgin queens, workers, and drones. Multiple colony can be merged into a population of colonies, such as an apiary or a whole country of colonies. Functions enable operations on castes, colony, or colonies, to ease 'R' scripting of whole populations. All 'AlphaSimR' functionality with respect to genomes and genetic and phenotype values is available and further extended for honeybees, including haplo-diploidy, complementary sex determiner locus, colony events (swarming, supersedure, etc.), and colony phenotype values.
Maintained by Jana Obšteter. Last updated 6 months ago.
12.2 match 2 stars 6.24 score 18 scriptsrmi-pacta
pacta.multi.loanbook:Run 'PACTA' on Multiple Loan Books Easily
Run Paris Agreement Capital Transition Assessment ('PACTA') analyses on multiple loan books in a structured way. Provides access to standard 'PACTA' metrics and additional 'PACTA'-related metrics for multiple loan books. Results take the form of 'csv' files and plots and are exported to user-specified project paths.
Maintained by Jacob Kastl. Last updated 3 days ago.
climate-changepactapactaversesustainable-finance
11.7 match 6.48 score 4 scriptsbioc
celda:CEllular Latent Dirichlet Allocation
Celda is a suite of Bayesian hierarchical models for clustering single-cell RNA-sequencing (scRNA-seq) data. It is able to perform "bi-clustering" and simultaneously cluster genes into gene modules and cells into cell subpopulations. It also contains DecontX, a novel Bayesian method to computationally estimate and remove RNA contamination in individual cells without empty droplet information. A variety of scRNA-seq data visualization functions is also included.
Maintained by Joshua Campbell. Last updated 28 days ago.
singlecellgeneexpressionclusteringsequencingbayesianimmunooncologydataimportcppopenmp
7.2 match 147 stars 10.47 score 256 scripts 2 dependentsmrcieu
TwoSampleMR:Two Sample MR Functions and Interface to MRC Integrative Epidemiology Unit OpenGWAS Database
A package for performing Mendelian randomization using GWAS summary data. It uses the IEU OpenGWAS database <https://gwas.mrcieu.ac.uk/> to automatically obtain data, and a wide range of methods to run the analysis.
Maintained by Gibran Hemani. Last updated 11 days ago.
6.7 match 467 stars 11.23 score 1.7k scripts 1 dependentsbioc
ReactomeGSA:Client for the Reactome Analysis Service for comparative multi-omics gene set analysis
The ReactomeGSA packages uses Reactome's online analysis service to perform a multi-omics gene set analysis. The main advantage of this package is, that the retrieved results can be visualized using REACTOME's powerful webapplication. Since Reactome's analysis service also uses R to perfrom the actual gene set analysis you will get similar results when using the same packages (such as limma and edgeR) locally. Therefore, if you only require a gene set analysis, different packages are more suited.
Maintained by Johannes Griss. Last updated 4 months ago.
genesetenrichmentproteomicstranscriptomicssystemsbiologygeneexpressionreactome
9.1 match 23 stars 8.05 score 67 scriptsevolecolgroup
tidysdm:Species Distribution Models with Tidymodels
Fit species distribution models (SDMs) using the 'tidymodels' framework, which provides a standardised interface to define models and process their outputs. 'tidysdm' expands 'tidymodels' by providing methods for spatial objects, models and metrics specific to SDMs, as well as a number of specialised functions to process occurrences for contemporary and palaeo datasets. The full functionalities of the package are described in Leonardi et al. (2023) <doi:10.1101/2023.07.24.550358>.
Maintained by Andrea Manica. Last updated 10 days ago.
species-distribution-modellingtidymodels
8.3 match 31 stars 8.82 score 51 scriptsstan-dev
loo:Efficient Leave-One-Out Cross-Validation and WAIC for Bayesian Models
Efficient approximate leave-one-out cross-validation (LOO) for Bayesian models fit using Markov chain Monte Carlo, as described in Vehtari, Gelman, and Gabry (2017) <doi:10.1007/s11222-016-9696-4>. The approximation uses Pareto smoothed importance sampling (PSIS), a new procedure for regularizing importance weights. As a byproduct of the calculations, we also obtain approximate standard errors for estimated predictive errors and for the comparison of predictive errors between models. The package also provides methods for using stacking and other model weighting techniques to average Bayesian predictive distributions.
Maintained by Jonah Gabry. Last updated 3 days ago.
bayesbayesianbayesian-data-analysisbayesian-inferencebayesian-methodsbayesian-statisticscross-validationinformation-criterionmodel-comparisonstan
4.2 match 152 stars 17.30 score 2.6k scripts 297 dependentsbioc
IRanges:Foundation of integer range manipulation in Bioconductor
Provides efficient low-level and highly reusable S4 classes for storing, manipulating and aggregating over annotated ranges of integers. Implements an algebra of range operations, including efficient algorithms for finding overlaps and nearest neighbors. Defines efficient list-like classes for storing, transforming and aggregating large grouped data, i.e., collections of atomic vectors and DataFrames.
Maintained by Hervé Pagès. Last updated 1 months ago.
infrastructuredatarepresentationbioconductor-packagecore-package
4.8 match 22 stars 15.09 score 2.1k scripts 1.8k dependentsplangfelder
WGCNA:Weighted Correlation Network Analysis
Functions necessary to perform Weighted Correlation Network Analysis on high-dimensional data as originally described in Horvath and Zhang (2005) <doi:10.2202/1544-6115.1128> and Langfelder and Horvath (2008) <doi:10.1186/1471-2105-9-559>. Includes functions for rudimentary data cleaning, construction of correlation networks, module identification, summarization, and relating of variables and modules to sample traits. Also includes a number of utility functions for data manipulation and visualization.
Maintained by Peter Langfelder. Last updated 6 months ago.
7.5 match 54 stars 9.65 score 5.3k scripts 32 dependentschoi-phd
TestDesign:Optimal Test Design Approach to Fixed and Adaptive Test Construction
Uses the optimal test design approach by Birnbaum (1968, ISBN:9781593119348) and van der Linden (2018) <doi:10.1201/9781315117430> to construct fixed, adaptive, and parallel tests. Supports the following mixed-integer programming (MIP) solver packages: 'Rsymphony', 'highs', 'gurobi', 'lpSolve', and 'Rglpk'. The 'gurobi' package is not available from CRAN; see <https://www.gurobi.com/downloads/>.
Maintained by Seung W. Choi. Last updated 6 months ago.
9.8 match 3 stars 7.34 score 37 scripts 2 dependentsbioc
MAST:Model-based Analysis of Single Cell Transcriptomics
Methods and models for handling zero-inflated single cell assay data.
Maintained by Andrew McDavid. Last updated 5 months ago.
geneexpressiondifferentialexpressiongenesetenrichmentrnaseqtranscriptomicssinglecell
5.6 match 230 stars 12.75 score 1.8k scripts 5 dependentsbioc
flowCore:flowCore: Basic structures for flow cytometry data
Provides S4 data structures and basic functions to deal with flow cytometry data.
Maintained by Mike Jiang. Last updated 5 months ago.
immunooncologyinfrastructureflowcytometrycellbasedassayscpp
6.8 match 10.34 score 1.7k scripts 59 dependentscran
SPlit:Split a Dataset for Training and Testing
Procedure to optimally split a dataset for training and testing. 'SPlit' is based on the method of support points, which is independent of modeling methods. Please see Joseph and Vakayil (2021) <doi:10.1080/00401706.2021.1921037> for details. This work is supported by U.S. National Science Foundation grant DMREF-1921873.
Maintained by Akhil Vakayil. Last updated 3 years ago.
69.4 match 1 stars 1.00 scorejulianfaraway
faraway:Datasets and Functions for Books by Julian Faraway
Books are "Linear Models with R" published 1st Ed. August 2004, 2nd Ed. July 2014, 3rd Ed. February 2025 by CRC press, ISBN 9781439887332, and "Extending the Linear Model with R" published by CRC press in 1st Ed. December 2005 and 2nd Ed. March 2016, ISBN 9781584884248 and "Practical Regression and ANOVA in R" contributed documentation on CRAN (now very dated).
Maintained by Julian Faraway. Last updated 1 months ago.
7.3 match 29 stars 9.43 score 1.7k scripts 1 dependentsrorynolan
strex:Extra String Manipulation Functions
There are some things that I wish were easier with the 'stringr' or 'stringi' packages. The foremost of these is the extraction of numbers from strings. 'stringr' and 'stringi' make you figure out the regular expression for yourself; 'strex' takes care of this for you. There are many other handy functionalities in 'strex'. Contributions to this package are encouraged; it is intended as a miscellany of string manipulation functions that cannot be found in 'stringi' or 'stringr'.
Maintained by Rory Nolan. Last updated 6 months ago.
6.4 match 41 stars 10.59 score 1.2k scripts 18 dependentspakillo
lump.split.pool.ENM:Comparing different ENM approaches
Facilitate running simulations to compare different approaches for ecological niche modelling, namely splitting, lumping, and fit models with partial pooling. See https://doi.org/10.1016/j.tree.2018.10.012 for more information.
Maintained by Francisco Rodriguez-Sanchez. Last updated 5 years ago.
33.8 match 2 stars 2.00 scorearbuzovv
rusquant:Quantitative Trading Framework
Collection of functions to retrieve financial data from various sources, including brokerage and exchange platforms, financial websites, and data providers. Includes functions to retrieve account information, portfolio information, and place/cancel orders from different brokers. Additionally, allows users to download historical data such as earnings, dividends, stock splits.
Maintained by Vyacheslav Arbuzov. Last updated 10 months ago.
cryptocurrencydata-sciencedatasciencedatascrapingdatasetdatasourcedividendsearningsfinamfinanceinvestinginvesting-apiipoquantquantitative-financesplitsstockstrading
12.2 match 46 stars 5.51 score 47 scriptsbioc
BASiCS:Bayesian Analysis of Single-Cell Sequencing data
Single-cell mRNA sequencing can uncover novel cell-to-cell heterogeneity in gene expression levels in seemingly homogeneous populations of cells. However, these experiments are prone to high levels of technical noise, creating new challenges for identifying genes that show genuine heterogeneous expression within the population of cells under study. BASiCS (Bayesian Analysis of Single-Cell Sequencing data) is an integrated Bayesian hierarchical model to perform statistical analyses of single-cell RNA sequencing datasets in the context of supervised experiments (where the groups of cells of interest are known a priori, e.g. experimental conditions or cell types). BASiCS performs built-in data normalisation (global scaling) and technical noise quantification (based on spike-in genes). BASiCS provides an intuitive detection criterion for highly (or lowly) variable genes within a single group of cells. Additionally, BASiCS can compare gene expression patterns between two or more pre-specified groups of cells. Unlike traditional differential expression tools, BASiCS quantifies changes in expression that lie beyond comparisons of means, also allowing the study of changes in cell-to-cell heterogeneity. The latter can be quantified via a biological over-dispersion parameter that measures the excess of variability that is observed with respect to Poisson sampling noise, after normalisation and technical noise removal. Due to the strong mean/over-dispersion confounding that is typically observed for scRNA-seq datasets, BASiCS also tests for changes in residual over-dispersion, defined by residual values with respect to a global mean/over-dispersion trend.
Maintained by Catalina Vallejos. Last updated 5 months ago.
immunooncologynormalizationsequencingrnaseqsoftwaregeneexpressiontranscriptomicssinglecelldifferentialexpressionbayesiancellbiologybioconductor-packagegene-expressionrcpprcpparmadilloscrna-seqsingle-cellopenblascppopenmp
6.5 match 83 stars 10.26 score 368 scripts 1 dependentsmlr-org
mlr3pipelines:Preprocessing Operators and Pipelines for 'mlr3'
Dataflow programming toolkit that enriches 'mlr3' with a diverse set of pipelining operators ('PipeOps') that can be composed into graphs. Operations exist for data preprocessing, model fitting, and ensemble learning. Graphs can themselves be treated as 'mlr3' 'Learners' and can therefore be resampled, benchmarked, and tuned.
Maintained by Martin Binder. Last updated 9 days ago.
baggingdata-sciencedataflow-programmingensemble-learningmachine-learningmlr3pipelinespreprocessingstacking
5.3 match 141 stars 12.36 score 448 scripts 7 dependentsjcfaria
TukeyC:Conventional Tukey Test
Perform the conventional Tukey test from formula, lm, aov, aovlist and lmer objects.
Maintained by Ivan Bezerra Allaman. Last updated 2 years ago.
13.7 match 4 stars 4.80 score 45 scriptsyihui
xfun:Supporting Functions for Packages Maintained by 'Yihui Xie'
Miscellaneous functions commonly used in other packages maintained by 'Yihui Xie'.
Maintained by Yihui Xie. Last updated 4 days ago.
3.6 match 145 stars 18.18 score 916 scripts 4.4k dependentscomputationalstylistics
stylo:Stylometric Multivariate Analyses
Supervised and unsupervised multivariate methods, supplemented by GUI and some visualizations, to perform various analyses in the field of computational stylistics, authorship attribution, etc. For further reference, see Eder et al. (2016), <https://journal.r-project.org/archive/2016/RJ-2016-007/index.html>. You are also encouraged to visit the Computational Stylistics Group's website <https://computationalstylistics.github.io/>, where a reasonable amount of information about the package and related projects are provided.
Maintained by Maciej Eder. Last updated 2 months ago.
7.6 match 187 stars 8.58 score 462 scriptss-u
iotools:I/O Tools for Streaming
Basic I/O tools for streaming and data parsing.
Maintained by Simon Urbanek. Last updated 1 years ago.
8.8 match 48 stars 7.35 score 60 scripts 10 dependentsfinnishcancerregistry
popEpi:Functions for Epidemiological Analysis using Population Data
Enables computation of epidemiological statistics, including those where counts or mortality rates of the reference population are used. Currently supported: excess hazard models (Dickman, Sloggett, Hills, and Hakulinen (2012) <doi:10.1002/sim.1597>), rates, mean survival times, relative/net survival (in particular the Ederer II (Ederer and Heise (1959)) and Pohar Perme (Pohar Perme, Stare, and Esteve (2012) <doi:10.1111/j.1541-0420.2011.01640.x>) estimators), and standardized incidence and mortality ratios, all of which can be easily adjusted for by covariates such as age. Fast splitting and aggregation of 'Lexis' objects (from package 'Epi') and other computations achieved using 'data.table'.
Maintained by Joonas Miettinen. Last updated 2 months ago.
adjust-estimatesage-adjustingdirect-adjustingepidemiologyindirect-adjustingsurvival
8.0 match 8 stars 8.05 score 117 scripts 1 dependentsgdemin
expss:Tables, Labels and Some Useful Functions from Spreadsheets and 'SPSS' Statistics
Package computes and displays tables with support for 'SPSS'-style labels, multiple and nested banners, weights, multiple-response variables and significance testing. There are facilities for nice output of tables in 'knitr', 'Shiny', '*.xlsx' files, R and 'Jupyter' notebooks. Methods for labelled variables add value labels support to base R functions and to some functions from other packages. Additionally, the package brings popular data transformation functions from 'SPSS' Statistics and 'Excel': 'RECODE', 'COUNT', 'COUNTIF', 'VLOOKUP' and etc. These functions are very useful for data processing in marketing research surveys. Package intended to help people to move data processing from 'Excel' and 'SPSS' to R.
Maintained by Gregory Demin. Last updated 11 months ago.
excellabelslabels-supportmsexcelpivot-tablesrecodespssspss-statisticstablesvariable-labelsvlookup
5.8 match 84 stars 11.00 score 1.8k scripts 4 dependentsinlabru-org
fmesher:Triangle Meshes and Related Geometry Tools
Generate planar and spherical triangle meshes, compute finite element calculations for 1- and 2-dimensional flat and curved manifolds with associated basis function spaces, methods for lines and polygons, and transparent handling of coordinate reference systems and coordinate transformation, including 'sf' and 'sp' geometries. The core 'fmesher' library code was originally part of the 'INLA' package, and implements parts of "Triangulations and Applications" by Hjelle and Daehlen (2006) <doi:10.1007/3-540-33261-8>.
Maintained by Finn Lindgren. Last updated 3 days ago.
5.6 match 16 stars 11.18 score 261 scripts 26 dependentsmatthiaspucher
staRdom:PARAFAC Analysis of EEMs from DOM
'This is a user-friendly way to run a parallel factor (PARAFAC) analysis (Harshman, 1971) <doi:10.1121/1.1977523> on excitation emission matrix (EEM) data from dissolved organic matter (DOM) samples (Murphy et al., 2013) <doi:10.1039/c3ay41160e>. The analysis includes profound methods for model validation. Some additional functions allow the calculation of absorbance slope parameters and create beautiful plots.'
Maintained by Matthias Pucher. Last updated 4 months ago.
10.4 match 21 stars 6.03 score 86 scriptsivanalaman
ScottKnott:The ScottKnott Clustering Algorithm
Perform the balanced (Scott and Knott, 1974) and unbalanced <doi:10.1590/1984-70332017v17n1a1> Scott & Knott algorithm.
Maintained by Ivan Bezerra Allaman. Last updated 2 years ago.
13.7 match 1 stars 4.58 score 42 scripts 1 dependentsjensharbers
agricolaeplotr:Visualization of Design of Experiments from the 'agricolae' Package
Visualization of Design of Experiments from the 'agricolae' package with 'ggplot2' framework The user provides an experiment design from the 'agricolae' package, calls the corresponding function and will receive a visualization with 'ggplot2' based functions that are specific for each design. As there are many different designs, each design is tested on its type. The output can be modified with standard 'ggplot2' commands or with other packages with 'ggplot2' function extensions.
Maintained by Jens Harbers. Last updated 2 months ago.
9.8 match 8 stars 6.27 score 78 scriptsosqp
osqp:Quadratic Programming Solver using the 'OSQP' Library
Provides bindings to the 'OSQP' solver. The 'OSQP' solver is a numerical optimization package or solving convex quadratic programs written in 'C' and based on the alternating direction method of multipliers. See <doi:10.48550/arXiv.1711.08013> for details.
Maintained by Balasubramanian Narasimhan. Last updated 9 months ago.
admmconvex-optimizationlassomachine-learningoperator-splittingquadratic-programmingcpp
7.5 match 12 stars 8.21 score 47 scripts 65 dependentsedzer
sp:Classes and Methods for Spatial Data
Classes and methods for spatial data; the classes document where the spatial location information resides, for 2D or 3D data. Utility functions are provided, e.g. for plotting data as maps, spatial selection, as well as methods for retrieving coordinates, for subsetting, print, summary, etc. From this version, 'rgdal', 'maptools', and 'rgeos' are no longer used at all, see <https://r-spatial.org/r/2023/05/15/evolution4.html> for details.
Maintained by Edzer Pebesma. Last updated 2 months ago.
3.3 match 127 stars 18.63 score 35k scripts 1.3k dependentsberndbischl
BBmisc:Miscellaneous Helper Functions for B. Bischl
Miscellaneous helper functions for and from B. Bischl and some other guys, mainly for package development.
Maintained by Bernd Bischl. Last updated 2 years ago.
5.8 match 20 stars 10.59 score 980 scripts 69 dependentsbioc
memes:motif matching, comparison, and de novo discovery using the MEME Suite
A seamless interface to the MEME Suite family of tools for motif analysis. 'memes' provides data aware utilities for using GRanges objects as entrypoints to motif analysis, data structures for examining & editing motif lists, and novel data visualizations. 'memes' functions and data structures are amenable to both base R and tidyverse workflows.
Maintained by Spencer Nystrom. Last updated 5 months ago.
dataimportfunctionalgenomicsgeneregulationmotifannotationmotifdiscoverysequencematchingsoftware
7.0 match 49 stars 8.68 score 117 scripts 1 dependentssatijalab
Seurat:Tools for Single Cell Genomics
A toolkit for quality control, analysis, and exploration of single cell RNA sequencing data. 'Seurat' aims to enable users to identify and interpret sources of heterogeneity from single cell transcriptomic measurements, and to integrate diverse types of single cell data. See Satija R, Farrell J, Gennert D, et al (2015) <doi:10.1038/nbt.3192>, Macosko E, Basu A, Satija R, et al (2015) <doi:10.1016/j.cell.2015.05.002>, Stuart T, Butler A, et al (2019) <doi:10.1016/j.cell.2019.05.031>, and Hao, Hao, et al (2020) <doi:10.1101/2020.10.12.335331> for more details.
Maintained by Paul Hoffman. Last updated 1 years ago.
human-cell-atlassingle-cell-genomicssingle-cell-rna-seqcpp
3.5 match 2.4k stars 16.86 score 50k scripts 73 dependentscran
timeSeries:Financial Time Series Objects (Rmetrics)
'S4' classes and various tools for financial time series: Basic functions such as scaling and sorting, subsetting, mathematical operations and statistical functions.
Maintained by Georgi N. Boshnakov. Last updated 6 months ago.
6.0 match 2 stars 9.90 score 1.3k scripts 145 dependentsbioc
LOLA:Locus overlap analysis for enrichment of genomic ranges
Provides functions for testing overlap of sets of genomic regions with public and custom region set (genomic ranges) databases. This makes it possible to do automated enrichment analysis for genomic region sets, thus facilitating interpretation of functional genomics and epigenomics data.
Maintained by Nathan Sheffield. Last updated 5 months ago.
genesetenrichmentgeneregulationgenomeannotationsystemsbiologyfunctionalgenomicschipseqmethylseqsequencing
6.4 match 76 stars 9.34 score 160 scriptsthomasp85
ggforce:Accelerating 'ggplot2'
The aim of 'ggplot2' is to aid in visual data investigations. This focus has led to a lack of facilities for composing specialised plots. 'ggforce' aims to be a collection of mainly new stats and geoms that fills this gap. All additional functionality is aimed to come through the official extension system so using 'ggforce' should be a stable experience.
Maintained by Thomas Lin Pedersen. Last updated 1 years ago.
ggplot-extensionggplot2visualizationcpp
3.8 match 920 stars 15.83 score 9.3k scripts 293 dependentsatkinsjeff
forestr:Ecosystem and Canopy Structural Complexity Metrics from LiDAR
Provides a toolkit for calculating forest and canopy structural complexity metrics from terrestrial LiDAR (light detection and ranging). References: Atkins et al. 2018 <doi:10.1111/2041-210X.13061>; Hardiman et al. 2013 <doi:10.3390/f4030537>; Parker et al. 2004 <doi:10.1111/j.0021-8901.2004.00925.x>.
Maintained by Jeff Atkins. Last updated 1 years ago.
12.0 match 29 stars 4.95 score 31 scriptsjoshuaulrich
quantmod:Quantitative Financial Modelling Framework
Specify, build, trade, and analyse quantitative financial trading strategies.
Maintained by Joshua M. Ulrich. Last updated 15 days ago.
algorithmic-tradingchartingdata-importfinancetime-series
3.6 match 839 stars 16.17 score 8.1k scripts 343 dependentsinsightsengineering
teal.modules.clinical:'teal' Modules for Standard Clinical Outputs
Provides user-friendly tools for creating and customizing clinical trial reports. By leveraging the 'teal' framework, this package provides 'teal' modules to easily create an interactive panel that allows for seamless adjustments to data presentation, thereby streamlining the creation of detailed and accurate reports.
Maintained by Dawid Kaledkowski. Last updated 17 days ago.
clinical-trialsmodulesnestoutputsshiny
5.5 match 34 stars 10.25 score 149 scriptshugaped
MBNMAdose:Dose-Response MBNMA Models
Fits Bayesian dose-response model-based network meta-analysis (MBNMA) that incorporate multiple doses within an agent by modelling different dose-response functions, as described by Mawdsley et al. (2016) <doi:10.1002/psp4.12091>. By modelling dose-response relationships this can connect networks of evidence that might otherwise be disconnected, and can improve precision on treatment estimates. Several common dose-response functions are provided; others may be added by the user. Various characteristics and assumptions can be flexibly added to the models, such as shared class effects. The consistency of direct and indirect evidence in the network can be assessed using unrelated mean effects models and/or by node-splitting at the treatment level.
Maintained by Hugo Pedder. Last updated 1 months ago.
8.5 match 10 stars 6.60 scoreropensci
qpdf:Split, Combine and Compress PDF Files
Content-preserving transformations transformations of PDF files such as split, combine, and compress. This package interfaces directly to the 'qpdf' C++ library <https://qpdf.sourceforge.io/> and does not require any command line utilities. Note that 'qpdf' does not read actual content from PDF files: to extract text and data you need the 'pdftools' package.
Maintained by Jeroen Ooms. Last updated 5 months ago.
5.3 match 57 stars 10.51 score 203 scripts 75 dependentsmurrayefford
secr:Spatially Explicit Capture-Recapture
Functions to estimate the density and size of a spatially distributed animal population sampled with an array of passive detectors, such as traps, or by searching polygons or transects. Models incorporating distance-dependent detection are fitted by maximizing the likelihood. Tools are included for data manipulation and model selection.
Maintained by Murray Efford. Last updated 3 hours ago.
5.5 match 3 stars 10.16 score 410 scripts 5 dependentsprivefl
bigparallelr:Easy Parallel Tools
Utility functions for easy parallelism in R. Include some reexports from other packages, utility functions for splitting and parallelizing over blocks, and choosing and setting the number of cores used.
Maintained by Florian Privé. Last updated 5 months ago.
8.5 match 4 stars 6.44 score 76 scripts 19 dependentsmlverse
torch:Tensors and Neural Networks with 'GPU' Acceleration
Provides functionality to define and train neural networks similar to 'PyTorch' by Paszke et al (2019) <doi:10.48550/arXiv.1912.01703> but written entirely in R using the 'libtorch' library. Also supports low-level tensor operations and 'GPU' acceleration.
Maintained by Daniel Falbel. Last updated 6 days ago.
3.3 match 520 stars 16.52 score 1.4k scripts 38 dependentskogalur
randomForestSRC:Fast Unified Random Forests for Survival, Regression, and Classification (RF-SRC)
Fast OpenMP parallel computing of Breiman's random forests for univariate, multivariate, unsupervised, survival, competing risks, class imbalanced classification and quantile regression. New Mahalanobis splitting for correlated outcomes. Extreme random forests and randomized splitting. Suite of imputation methods for missing data. Fast random forests using subsampling. Confidence regions and standard errors for variable importance. New improved holdout importance. Case-specific importance. Minimal depth variable importance. Visualize trees on your Safari or Google Chrome browser. Anonymous random forests for data privacy.
Maintained by Udaya B. Kogalur. Last updated 2 months ago.
6.9 match 10 stars 7.90 score 1.2k scripts 12 dependentstrackage
trip:Tracking Data
Access and manipulate spatial tracking data, with straightforward coercion from and to other formats. Filter for speed and create time spent maps from tracking data. There are coercion methods to convert between 'trip' and 'ltraj' from 'adehabitatLT', and between 'trip' and 'psp' and 'ppp' from 'spatstat'. Trip objects can be created from raw or grouped data frames, and from types in the 'sp', sf', 'amt', 'trackeR', 'mousetrap', and other packages, Sumner, MD (2011) <https://figshare.utas.edu.au/articles/thesis/The_tag_location_problem/23209538>.
Maintained by Michael D. Sumner. Last updated 8 months ago.
7.0 match 13 stars 7.72 score 137 scripts 1 dependentsquanteda
quanteda:Quantitative Analysis of Textual Data
A fast, flexible, and comprehensive framework for quantitative text analysis in R. Provides functionality for corpus management, creating and manipulating tokens and n-grams, exploring keywords in context, forming and manipulating sparse matrices of documents by features and feature co-occurrences, analyzing keywords, computing feature similarities and distances, applying content dictionaries, applying supervised and unsupervised machine learning, visually representing text and text analyses, and more.
Maintained by Kenneth Benoit. Last updated 2 months ago.
corpusnatural-language-processingquantedatext-analyticsonetbbcpp
3.2 match 851 stars 16.68 score 5.4k scripts 51 dependentsarcadeantics
this.path:Get Executing Script's Path
Determine the path of the executing script. Compatible with several popular GUIs: 'Rgui', 'RStudio', 'Positron', 'VSCode', 'Jupyter', 'Emacs', and 'Rscript' (shell). Compatible with several functions and packages: 'source()', 'sys.source()', 'debugSource()' in 'RStudio', 'compiler::loadcmp()', 'utils::Sweave()', 'box::use()', 'knitr::knit()', 'plumber::plumb()', 'shiny::runApp()', 'package:targets', and 'testthat::source_file()'.
Maintained by Iris Simmons. Last updated 15 days ago.
6.0 match 49 stars 8.84 score 388 scripts 3 dependentsms609
Quartet:Comparison of Phylogenetic Trees Using Quartet and Split Measures
Calculates the number of four-taxon subtrees consistent with a pair of cladograms, calculating the symmetric quartet distance of Bandelt & Dress (1986), Reconstructing the shape of a tree from observed dissimilarity data, Advances in Applied Mathematics, 7, 309-343 <doi:10.1016/0196-8858(86)90038-2>, and using the tqDist algorithm of Sand et al. (2014), tqDist: a library for computing the quartet and triplet distances between binary or general trees, Bioinformatics, 30, 2079–2080 <doi:10.1093/bioinformatics/btu157> for pairs of binary trees.
Maintained by Martin R. Smith. Last updated 2 months ago.
bioinformaticscomparisonphylogenetic-treesphylogeneticsquartetquartet-distanceresearch-tooltreecpp
6.7 match 14 stars 8.00 score 40 scriptsbioc
S4Vectors:Foundation of vector-like and list-like containers in Bioconductor
The S4Vectors package defines the Vector and List virtual classes and a set of generic functions that extend the semantic of ordinary vectors and lists in R. Package developers can easily implement vector-like or list-like objects as concrete subclasses of Vector or List. In addition, a few low-level concrete subclasses of general interest (e.g. DataFrame, Rle, Factor, and Hits) are implemented in the S4Vectors package itself (many more are implemented in the IRanges package and in other Bioconductor infrastructure packages).
Maintained by Hervé Pagès. Last updated 1 months ago.
infrastructuredatarepresentationbioconductor-packagecore-package
3.3 match 18 stars 16.05 score 1.0k scripts 1.9k dependentstylermorganwall
skpr:Design of Experiments Suite: Generate and Evaluate Optimal Designs
Generates and evaluates D, I, A, Alias, E, T, and G optimal designs. Supports generation and evaluation of blocked and split/split-split/.../N-split plot designs. Includes parametric and Monte Carlo power evaluation functions, and supports calculating power for censored responses. Provides a framework to evaluate power using functions provided in other packages or written by the user. Includes a Shiny graphical user interface that displays the underlying code used to create and evaluate the design to improve ease-of-use and make analyses more reproducible. For details, see Morgan-Wall et al. (2021) <doi:10.18637/jss.v099.i01>.
Maintained by Tyler Morgan-Wall. Last updated 9 days ago.
design-of-experimentslinear-modelslinear-regressionmonte-carlooptimal-designspowersplit-plot-designssurvival-analysiscpp
7.7 match 118 stars 6.89 score 35 scriptsrpahl
pipeflow:Lightweight, General-Purpose Data Analysis Pipelines
A lightweight yet powerful framework for building robust data analysis pipelines. With 'pipeflow', you initialize a pipeline with your dataset and construct workflows step by step by adding R functions. You can modify, remove, or insert steps and parameters at any stage, while 'pipeflow' ensures the pipeline's integrity. Overall, this package offers a beginner-friendly framework that simplifies and streamlines the development of data analysis pipelines by making them modular, intuitive, and adaptable.
Maintained by Roman Pahl. Last updated 2 months ago.
pipeline-toolsreproducible-research
8.3 match 13 stars 6.35 score 19 scriptsspatstat
spatstat.explore:Exploratory Data Analysis for the 'spatstat' Family
Functionality for exploratory data analysis and nonparametric analysis of spatial data, mainly spatial point patterns, in the 'spatstat' family of packages. (Excludes analysis of spatial data on a linear network, which is covered by the separate package 'spatstat.linnet'.) Methods include quadrat counts, K-functions and their simulation envelopes, nearest neighbour distance and empty space statistics, Fry plots, pair correlation function, kernel smoothed intensity, relative risk estimation with cross-validated bandwidth selection, mark correlation functions, segregation indices, mark dependence diagnostics, and kernel estimates of covariate effects. Formal hypothesis tests of random pattern (chi-squared, Kolmogorov-Smirnov, Monte Carlo, Diggle-Cressie-Loosmore-Ford, Dao-Genton, two-stage Monte Carlo) and tests for covariate effects (Cox-Berman-Waller-Lawson, Kolmogorov-Smirnov, ANOVA) are also supported.
Maintained by Adrian Baddeley. Last updated 4 hours ago.
cluster-detectionconfidence-intervalshypothesis-testingk-functionroc-curvesscan-statisticssignificance-testingsimulation-envelopesspatial-analysisspatial-data-analysisspatial-sharpeningspatial-smoothingspatial-statistics
5.1 match 1 stars 10.18 score 67 scripts 149 dependentseleonoraarnone
fdaPDE:Physics-Informed Spatial and Functional Data Analysis
An implementation of regression models with partial differential regularizations, making use of the Finite Element Method. The models efficiently handle data distributed over irregularly shaped domains and can comply with various conditions at the boundaries of the domain. A priori information about the spatial structure of the phenomenon under study can be incorporated in the model via the differential regularization. See Sangalli, L. M. (2021) <doi:10.1111/insr.12444> "Spatial Regression With Partial Differential Equation Regularisation" for an overview. The release 1.1-9 requires R (>= 4.2.0) to be installed on windows machines.
Maintained by Eleonora Arnone. Last updated 2 months ago.
13.9 match 1 stars 3.73 score 267 scriptskenaho1
asbio:A Collection of Statistical Tools for Biologists
Contains functions from: Aho, K. (2014) Foundational and Applied Statistics for Biologists using R. CRC/Taylor and Francis, Boca Raton, FL, ISBN: 978-1-4398-7338-0.
Maintained by Ken Aho. Last updated 2 months ago.
7.1 match 5 stars 7.32 score 310 scripts 3 dependentsjackdunnnz
iai:Interface to 'Interpretable AI' Modules
An interface to the algorithms of 'Interpretable AI' <https://www.interpretable.ai> from the R programming language. 'Interpretable AI' provides various modules, including 'Optimal Trees' for classification, regression, prescription and survival analysis, 'Optimal Imputation' for missing data imputation and outlier detection, and 'Optimal Feature Selection' for exact sparse regression. The 'iai' package is an open-source project. The 'Interpretable AI' software modules are proprietary products, but free academic and evaluation licenses are available.
Maintained by Jack Dunn. Last updated 5 months ago.
25.7 match 1 stars 2.00 score 7 scriptsgleon
rLakeAnalyzer:Lake Physics Tools
Standardized methods for calculating common important derived physical features of lakes including water density based based on temperature, thermal layers, thermocline depth, lake number, Wedderburn number, Schmidt stability and others.
Maintained by Luke Winslow. Last updated 4 years ago.
5.7 match 45 stars 9.05 score 280 scripts 1 dependentsdmphillippo
multinma:Bayesian Network Meta-Analysis of Individual and Aggregate Data
Network meta-analysis and network meta-regression models for aggregate data, individual patient data, and mixtures of both individual and aggregate data using multilevel network meta-regression as described by Phillippo et al. (2020) <doi:10.1111/rssa.12579>. Models are estimated in a Bayesian framework using 'Stan'.
Maintained by David M. Phillippo. Last updated 3 days ago.
5.5 match 35 stars 9.11 score 163 scriptscran
AssocBin:Measuring Association with Recursive Binning
An iterative implementation of a recursive binary partitioning algorithm to measure pairwise dependence with a modular design that allows user specification of the splitting logic and stop criteria. Helper functions provide suggested versions of both and support visualization and the computation of summary statistics on final binnings. For a complete description of the functionality and algorithm, see Salahub and Oldford (2023) <doi:10.48550/arXiv.2311.08561>.
Maintained by Chris Salahub. Last updated 3 months ago.
19.4 match 2.60 scoremayer79
splitTools:Tools for Data Splitting
Fast, lightweight toolkit for data splitting. Data sets can be partitioned into disjoint groups (e.g. into training, validation, and test) or into (repeated) k-folds for subsequent cross-validation. Besides basic splits, the package supports stratified, grouped as well as blocked splitting. Furthermore, cross-validation folds for time series data can be created. See e.g. Hastie et al. (2001) <doi:10.1007/978-0-387-84858-7> for the basic background on data partitioning and cross-validation.
Maintained by Michael Mayer. Last updated 30 days ago.
cross-validationmachine-learningtime-seriesvalidation
6.3 match 13 stars 8.03 score 169 scripts 4 dependentsaphalo
photobiology:Photobiological Calculations
Definitions of classes, methods, operators and functions for use in photobiology and radiation meteorology and climatology. Calculation of effective (weighted) and not-weighted irradiances/doses, fluence rates, transmittance, reflectance, absorptance, absorbance and diverse ratios and other derived quantities from spectral data. Local maxima and minima: peaks, valleys and spikes. Conversion between energy-and photon-based units. Wavelength interpolation. Astronomical calculations related solar angles and day length. Colours and vision. This package is part of the 'r4photobiology' suite, Aphalo, P. J. (2015) <doi:10.19232/uv4pb.2015.1.14>.
Maintained by Pedro J. Aphalo. Last updated 3 days ago.
lightphotobiologyquantificationr4photobiology-suiteradiationspectrasun-position
5.3 match 4 stars 9.35 score 604 scripts 12 dependentsjuba
questionr:Functions to Make Surveys Processing Easier
Set of functions to make the processing and analysis of surveys easier : interactive shiny apps and addins for data recoding, contingency tables, dataset metadata handling, and several convenience functions.
Maintained by Julien Barnier. Last updated 1 days ago.
4.0 match 83 stars 12.62 score 1.1k scripts 19 dependentsr-spatialecology
landscapemetrics:Landscape Metrics for Categorical Map Patterns
Calculates landscape metrics for categorical landscape patterns in a tidy workflow. 'landscapemetrics' reimplements the most common metrics from 'FRAGSTATS' (<https://www.fragstats.org/>) and new ones from the current literature on landscape metrics. This package supports 'terra' SpatRaster objects as input arguments. It further provides utility functions to visualize patches, select metrics and building blocks to develop new metrics.
Maintained by Maximilian H.K. Hesselbarth. Last updated 1 months ago.
landscape-ecologylandscape-metricsrasterspatialcpp
4.0 match 240 stars 12.47 score 584 scripts 4 dependentstrinker
qdapTools:Tools for the 'qdap' Package
A collection of tools associated with the 'qdap' package that may be useful outside of the context of text analysis.
Maintained by Tyler Rinker. Last updated 2 years ago.
7.0 match 16 stars 7.04 score 408 scripts 5 dependentsrstudio
keras3:R Interface to 'Keras'
Interface to 'Keras' <https://keras.io>, a high-level neural networks API. 'Keras' was developed with a focus on enabling fast experimentation, supports both convolution based networks and recurrent networks (as well as combinations of the two), and runs seamlessly on both CPU and GPU devices.
Maintained by Tomasz Kalinowski. Last updated 4 hours ago.
3.7 match 845 stars 13.60 score 264 scripts 2 dependentsdselivanov
text2vec:Modern Text Mining Framework for R
Fast and memory-friendly tools for text vectorization, topic modeling (LDA, LSA), word embeddings (GloVe), similarities. This package provides a source-agnostic streaming API, which allows researchers to perform analysis of collections of documents which are larger than available RAM. All core functions are parallelized to benefit from multicore machines.
Maintained by Dmitriy Selivanov. Last updated 7 months ago.
glovelatent-dirichlet-allocationnatural-language-processingtext-miningtopic-modelingvectorizationword-embeddingsword2veccpp
3.7 match 860 stars 13.48 score 1.3k scripts 23 dependentsdpc10ster
RJafroc:Artificial Intelligence Systems and Observer Performance
Analyzing the performance of artificial intelligence (AI) systems/algorithms characterized by a 'search-and-report' strategy. Historically observer performance has dealt with measuring radiologists' performances in search tasks, e.g., searching for lesions in medical images and reporting them, but the implicit location information has been ignored. The implemented methods apply to analyzing the absolute and relative performances of AI systems, comparing AI performance to a group of human readers or optimizing the reporting threshold of an AI system. In addition to performing historical receiver operating receiver operating characteristic (ROC) analysis (localization information ignored), the software also performs free-response receiver operating characteristic (FROC) analysis, where lesion localization information is used. A book using the software has been published: Chakraborty DP: Observer Performance Methods for Diagnostic Imaging - Foundations, Modeling, and Applications with R-Based Examples, Taylor-Francis LLC; 2017: <https://www.routledge.com/Observer-Performance-Methods-for-Diagnostic-Imaging-Foundations-Modeling/Chakraborty/p/book/9781482214840>. Online updates to this book, which use the software, are at <https://dpc10ster.github.io/RJafrocQuickStart/>, <https://dpc10ster.github.io/RJafrocRocBook/> and at <https://dpc10ster.github.io/RJafrocFrocBook/>. Supported data collection paradigms are the ROC, FROC and the location ROC (LROC). ROC data consists of single ratings per images, where a rating is the perceived confidence level that the image is that of a diseased patient. An ROC curve is a plot of true positive fraction vs. false positive fraction. FROC data consists of a variable number (zero or more) of mark-rating pairs per image, where a mark is the location of a reported suspicious region and the rating is the confidence level that it is a real lesion. LROC data consists of a rating and a location of the most suspicious region, for every image. Four models of observer performance, and curve-fitting software, are implemented: the binormal model (BM), the contaminated binormal model (CBM), the correlated contaminated binormal model (CORCBM), and the radiological search model (RSM). Unlike the binormal model, CBM, CORCBM and RSM predict 'proper' ROC curves that do not inappropriately cross the chance diagonal. Additionally, RSM parameters are related to search performance (not measured in conventional ROC analysis) and classification performance. Search performance refers to finding lesions, i.e., true positives, while simultaneously not finding false positive locations. Classification performance measures the ability to distinguish between true and false positive locations. Knowing these separate performances allows principled optimization of reader or AI system performance. This package supersedes Windows JAFROC (jackknife alternative FROC) software V4.2.1, <https://github.com/dpc10ster/WindowsJafroc>. Package functions are organized as follows. Data file related function names are preceded by 'Df', curve fitting functions by 'Fit', included data sets by 'dataset', plotting functions by 'Plot', significance testing functions by 'St', sample size related functions by 'Ss', data simulation functions by 'Simulate' and utility functions by 'Util'. Implemented are figures of merit (FOMs) for quantifying performance and functions for visualizing empirical or fitted operating characteristics: e.g., ROC, FROC, alternative FROC (AFROC) and weighted AFROC (wAFROC) curves. For fully crossed study designs significance testing of reader-averaged FOM differences between modalities is implemented via either Dorfman-Berbaum-Metz or the Obuchowski-Rockette methods. Also implemented is single modality analysis, which allows comparison of performance of a group of radiologists to a specified value, or comparison of AI to a group of radiologists interpreting the same cases. Crossed-modality analysis is implemented wherein there are two crossed modality factors and the aim is to determined performance in each modality factor averaged over all levels of the second factor. Sample size estimation tools are provided for ROC and FROC studies; these use estimates of the relevant variances from a pilot study to predict required numbers of readers and cases in a pivotal study to achieve the desired power. Utility and data file manipulation functions allow data to be read in any of the currently used input formats, including Excel, and the results of the analysis can be viewed in text or Excel output files. The methods are illustrated with several included datasets from the author's collaborations. This update includes improvements to the code, some as a result of user-reported bugs and new feature requests, and others discovered during ongoing testing and code simplification.
Maintained by Dev Chakraborty. Last updated 5 months ago.
ai-optimizationartificial-intelligence-algorithmscomputer-aided-diagnosisfroc-analysisroc-analysistarget-classificationtarget-localizationcpp
8.6 match 19 stars 5.69 score 65 scriptsmschubert
narray:Subset- And Name-Aware Array Utility Functions
Stacking arrays according to dimension names, subset-aware splitting and mapping of functions, intersecting along arbitrary dimensions, converting to and from data.frames, and many other helper functions.
Maintained by Michael Schubert. Last updated 2 months ago.
7.0 match 27 stars 6.91 score 10 scripts 10 dependentskkholst
lava:Latent Variable Models
A general implementation of Structural Equation Models with latent variables (MLE, 2SLS, and composite likelihood estimators) with both continuous, censored, and ordinal outcomes (Holst and Budtz-Joergensen (2013) <doi:10.1007/s00180-012-0344-y>). Mixture latent variable models and non-linear latent variable models (Holst and Budtz-Joergensen (2020) <doi:10.1093/biostatistics/kxy082>). The package also provides methods for graph exploration (d-separation, back-door criterion), simulation of general non-linear latent variable models, and estimation of influence functions for a broad range of statistical models.
Maintained by Klaus K. Holst. Last updated 2 months ago.
latent-variable-modelssimulationstatisticsstructural-equation-models
3.8 match 33 stars 12.85 score 610 scripts 476 dependentsedzer
intervals:Tools for Working with Points and Intervals
Tools for working with and comparing sets of points and intervals.
Maintained by Edzer Pebesma. Last updated 7 months ago.
5.1 match 11 stars 9.40 score 122 scripts 90 dependentspachadotdev
cpp11qpdf:Split, Combine and Compress PDF Files
Bindings to 'qpdf': 'qpdf' (<https://qpdf.sourceforge.io/>) is a an open-source PDF rendering library that allows to conduct content-preserving transformations of PDF files such as split, combine, and compress PDF files.
Maintained by Mauricio Vargas Sepulveda. Last updated 2 months ago.
8.6 match 3 stars 5.56 score 4 scriptsdfsp-spirit
fsbrain:Managing and Visualizing Brain Surface Data
Provides high-level access to neuroimaging data from standard software packages like 'FreeSurfer' <http://freesurfer.net/> on the level of subjects and groups. Load morphometry data, surfaces and brain parcellations based on atlases. Mask data using labels, load data for specific atlas regions only, and visualize data and statistical results directly in 'R'.
Maintained by Tim Schäfer. Last updated 4 months ago.
3dbraindtifreesurfermeshmrineuroimagingresearchsurfacevisualizationvoxel
7.4 match 66 stars 6.47 score 15 scriptsrstudio
shiny:Web Application Framework for R
Makes it incredibly easy to build interactive web applications with R. Automatic "reactive" binding between inputs and outputs and extensive prebuilt widgets make it possible to build beautiful, responsive, and powerful applications with minimal effort.
Maintained by Winston Chang. Last updated 14 days ago.
reactiverstudioshinyweb-appweb-development
2.3 match 5.4k stars 21.28 score 108k scripts 1.8k dependentsbioc
tidytof:Analyze High-dimensional Cytometry Data Using Tidy Data Principles
This package implements an interactive, scientific analysis pipeline for high-dimensional cytometry data built using tidy data principles. It is specifically designed to play well with both the tidyverse and Bioconductor software ecosystems, with functionality for reading/writing data files, data cleaning, preprocessing, clustering, visualization, modeling, and other quality-of-life functions. tidytof implements a "grammar" of high-dimensional cytometry data analysis.
Maintained by Timothy Keyes. Last updated 5 months ago.
singlecellflowcytometrybioinformaticscytometrydata-sciencesingle-celltidyversecpp
6.5 match 19 stars 7.26 score 35 scriptsbioc
xcms:LC-MS and GC-MS Data Analysis
Framework for processing and visualization of chromatographically separated and single-spectra mass spectral data. Imports from AIA/ANDI NetCDF, mzXML, mzData and mzML files. Preprocesses data for high-throughput, untargeted analyte profiling.
Maintained by Steffen Neumann. Last updated 3 days ago.
immunooncologymassspectrometrymetabolomicsbioconductorfeature-detectionmass-spectrometrypeak-detectioncpp
3.3 match 196 stars 14.31 score 984 scripts 11 dependentsbioc
nethet:A bioconductor package for high-dimensional exploration of biological network heterogeneity
Package nethet is an implementation of statistical solid methodology enabling the analysis of network heterogeneity from high-dimensional data. It combines several implementations of recent statistical innovations useful for estimation and comparison of networks in a heterogeneous, high-dimensional setting. In particular, we provide code for formal two-sample testing in Gaussian graphical models (differential network and GGM-GSA; Stadler and Mukherjee, 2013, 2014) and make a novel network-based clustering algorithm available (mixed graphical lasso, Stadler and Mukherjee, 2013).
Maintained by Nicolas Staedler. Last updated 5 months ago.
10.9 match 4.30 score 7 scriptsmichaellli
evalITR:Evaluating Individualized Treatment Rules
Provides various statistical methods for evaluating Individualized Treatment Rules under randomized data. The provided metrics include Population Average Value (PAV), Population Average Prescription Effect (PAPE), Area Under Prescription Effect Curve (AUPEC). It also provides the tools to analyze Individualized Treatment Rules under budget constraints. Detailed reference in Imai and Li (2019) <arXiv:1905.05389>.
Maintained by Michael Lingzhi Li. Last updated 2 years ago.
6.9 match 14 stars 6.78 score 36 scriptsbioc
SIAMCAT:Statistical Inference of Associations between Microbial Communities And host phenoTypes
Pipeline for Statistical Inference of Associations between Microbial Communities And host phenoTypes (SIAMCAT). A primary goal of analyzing microbiome data is to determine changes in community composition that are associated with environmental factors. In particular, linking human microbiome composition to host phenotypes such as diseases has become an area of intense research. For this, robust statistical modeling and biomarker extraction toolkits are crucially needed. SIAMCAT provides a full pipeline supporting data preprocessing, statistical association testing, statistical modeling (LASSO logistic regression) including tools for evaluation and interpretation of these models (such as cross validation, parameter selection, ROC analysis and diagnostic model plots).
Maintained by Jakob Wirbel. Last updated 5 months ago.
immunooncologymetagenomicsclassificationmicrobiomesequencingpreprocessingclusteringfeatureextractiongeneticvariabilitymultiplecomparisonregression
6.8 match 6.72 score 147 scriptscovaruber
sommer:Solving Mixed Model Equations in R
Structural multivariate-univariate linear mixed model solver for estimation of multiple random effects with unknown variance-covariance structures (e.g., heterogeneous and unstructured) and known covariance among levels of random effects (e.g., pedigree and genomic relationship matrices) (Covarrubias-Pazaran, 2016 <doi:10.1371/journal.pone.0156744>; Maier et al., 2015 <doi:10.1016/j.ajhg.2014.12.006>; Jensen et al., 1997). REML estimates can be obtained using the Direct-Inversion Newton-Raphson and Direct-Inversion Average Information algorithms for the problems r x r (r being the number of records) or using the Henderson-based average information algorithm for the problem c x c (c being the number of coefficients to estimate). Spatial models can also be fitted using the two-dimensional spline functionality available.
Maintained by Giovanny Covarrubias-Pazaran. Last updated 22 days ago.
average-informationmixed-modelsrcpparmadilloopenblascppopenmp
3.6 match 43 stars 12.70 score 300 scripts 9 dependentscoffeemuggler
caTools:Tools: Moving Window Statistics, GIF, Base64, ROC AUC, etc
Contains several basic utility functions including: moving (rolling, running) window statistic functions, read/write for GIF and ENVI binary files, fast calculation of AUC, LogitBoost classifier, base64 encoder/decoder, round-off-error-free sum and cumsum, etc.
Maintained by Michael Dietze. Last updated 6 months ago.
4.0 match 8 stars 11.17 score 9.1k scripts 566 dependentsbart1
move:Visualizing and Analyzing Animal Track Data
Contains functions to access movement data stored in 'movebank.org' as well as tools to visualize and statistically analyze animal movement data, among others functions to calculate dynamic Brownian Bridge Movement Models. Move helps addressing movement ecology questions.
Maintained by Bart Kranstauber. Last updated 4 months ago.
5.2 match 8.74 score 690 scripts 3 dependentsgamlss-dev
gamlss:Generalized Additive Models for Location Scale and Shape
Functions for fitting the Generalized Additive Models for Location Scale and Shape introduced by Rigby and Stasinopoulos (2005), <doi:10.1111/j.1467-9876.2005.00510.x>. The models use a distributional regression approach where all the parameters of the conditional distribution of the response variable are modelled using explanatory variables.
Maintained by Mikis Stasinopoulos. Last updated 4 months ago.
4.0 match 16 stars 11.23 score 2.0k scripts 49 dependentssteveweston
itertools:Iterator Tools
Various tools for creating iterators, many patterned after functions in the Python itertools module, and others patterned after functions in the 'snow' package.
Maintained by Steve Weston. Last updated 11 years ago.
5.4 match 8.26 score 8.3k scripts 65 dependentsanthonychristidis
splitSelect:Best Split Selection Modeling for Low-Dimensional Data
Functions to generate or sample from all possible splits of features or variables into a number of specified groups. Also computes the best split selection estimator (for low-dimensional data) as defined in Christidis, Van Aelst and Zamar (2019) <arXiv:1812.05678>.
Maintained by Anthony Christidis. Last updated 7 months ago.
16.4 match 2.70 score 4 scriptsbioc
SOMNiBUS:Smooth modeling of bisulfite sequencing
This package aims to analyse count-based methylation data on predefined genomic regions, such as those obtained by targeted sequencing, and thus to identify differentially methylated regions (DMRs) that are associated with phenotypes or traits. The method is built a rich flexible model that allows for the effects, on the methylation levels, of multiple covariates to vary smoothly along genomic regions. At the same time, this method also allows for sequencing errors and can adjust for variability in cell type mixture.
Maintained by Kathleen Klein. Last updated 3 months ago.
dnamethylationregressionepigeneticsdifferentialmethylationsequencingfunctionalprediction
10.3 match 1 stars 4.30 score 3 scriptsmarkfairbanks
tidytable:Tidy Interface to 'data.table'
A tidy interface to 'data.table', giving users the speed of 'data.table' while using tidyverse-like syntax.
Maintained by Mark Fairbanks. Last updated 2 months ago.
3.9 match 458 stars 11.41 score 732 scripts 10 dependentsflr
FLCore:Core Package of FLR, Fisheries Modelling in R
Core classes and methods for FLR, a framework for fisheries modelling and management strategy simulation in R. Developed by a team of fisheries scientists in various countries. More information can be found at <http://flr-project.org/>.
Maintained by Iago Mosqueira. Last updated 10 days ago.
fisheriesflrfisheries-modelling
5.0 match 16 stars 8.78 score 956 scripts 23 dependentsr-spatial
lwgeom:Bindings to Selected 'liblwgeom' Functions for Simple Features
Access to selected functions found in 'liblwgeom' <https://github.com/postgis/postgis/tree/master/liblwgeom>, the light-weight geometry library used by 'PostGIS' <http://postgis.net/>.
Maintained by Edzer Pebesma. Last updated 1 months ago.
3.4 match 61 stars 12.95 score 1.7k scripts 66 dependentsbioc
Gviz:Plotting data and annotation information along genomic coordinates
Genomic data analyses requires integrated visualization of known genomic information and new experimental data. Gviz uses the biomaRt and the rtracklayer packages to perform live annotation queries to Ensembl and UCSC and translates this to e.g. gene/transcript structures in viewports of the grid graphics package. This results in genomic information plotted together with your data.
Maintained by Robert Ivanek. Last updated 5 months ago.
visualizationmicroarraysequencing
3.3 match 79 stars 13.08 score 1.4k scripts 48 dependentshegghammer
daiR:Interface with Google Cloud Document AI API
R interface for the Google Cloud Services 'Document AI API' <https://cloud.google.com/document-ai/> with additional tools for output file parsing and text reconstruction. 'Document AI' is a powerful server-based OCR service that extracts text and tables from images and PDF files with high accuracy. 'daiR' gives R users programmatic access to this service and additional tools to handle and visualize the output. See the package website <https://dair.info/> for more information and examples.
Maintained by Thomas Hegghammer. Last updated 4 months ago.
6.4 match 42 stars 6.77 score 40 scriptsbioc
SPIAT:Spatial Image Analysis of Tissues
SPIAT (**Sp**atial **I**mage **A**nalysis of **T**issues) is an R package with a suite of data processing, quality control, visualization and data analysis tools. SPIAT is compatible with data generated from single-cell spatial proteomics platforms (e.g. OPAL, CODEX, MIBI, cellprofiler). SPIAT reads spatial data in the form of X and Y coordinates of cells, marker intensities and cell phenotypes. SPIAT includes six analysis modules that allow visualization, calculation of cell colocalization, categorization of the immune microenvironment relative to tumor areas, analysis of cellular neighborhoods, and the quantification of spatial heterogeneity, providing a comprehensive toolkit for spatial data analysis.
Maintained by Yuzhou Feng. Last updated 22 hours ago.
biomedicalinformaticscellbiologyspatialclusteringdataimportimmunooncologyqualitycontrolsinglecellsoftwarevisualization
5.0 match 22 stars 8.59 score 69 scriptsjuba
rainette:The Reinert Method for Textual Data Clustering
An R implementation of the Reinert text clustering method. For more details about the algorithm see the included vignettes or Reinert (1990) <doi:10.1177/075910639002600103>.
Maintained by Julien Barnier. Last updated 11 months ago.
text-analysistext-classificationcpp
6.2 match 55 stars 6.90 score 24 scriptsrdatatable
data.table:Extension of `data.frame`
Fast aggregation of large data (e.g. 100GB in RAM), fast ordered joins, fast add/modify/delete of columns by group using no copies at all, list columns, friendly and fast character-separated-value read/write. Offers a natural and flexible syntax, for faster development.
Maintained by Tyson Barrett. Last updated 7 hours ago.
1.8 match 3.7k stars 23.52 score 230k scripts 4.6k dependentsprivefl
bigstatsr:Statistical Tools for Filebacked Big Matrices
Easy-to-use, efficient, flexible and scalable statistical tools. Package bigstatsr provides and uses Filebacked Big Matrices via memory-mapping. It provides for instance matrix operations, Principal Component Analysis, sparse linear supervised models, utility functions and more <doi:10.1093/bioinformatics/bty185>.
Maintained by Florian Privé. Last updated 6 months ago.
big-datalarge-matricesmemory-mapped-fileparallel-computingstatistical-methodsopenblascppopenmp
4.0 match 180 stars 10.59 score 394 scripts 16 dependentsbioc
MSnbase:Base Functions and Classes for Mass Spectrometry and Proteomics
MSnbase provides infrastructure for manipulation, processing and visualisation of mass spectrometry and proteomics data, ranging from raw to quantitative and annotated data.
Maintained by Laurent Gatto. Last updated 3 days ago.
immunooncologyinfrastructureproteomicsmassspectrometryqualitycontroldataimportbioconductorbioinformaticsmass-spectrometryproteomics-datavisualisationcpp
3.3 match 130 stars 12.81 score 772 scripts 36 dependentswinvector
wrapr:Wrap R Tools for Debugging and Parametric Programming
Tools for writing and debugging R code. Provides: '%.>%' dot-pipe (an 'S3' configurable pipe), unpack/to (R style multiple assignment/return), 'build_frame()'/'draw_frame()' ('data.frame' example tools), 'qc()' (quoting concatenate), ':=' (named map builder), 'let()' (converts non-standard evaluation interfaces to parametric standard evaluation interfaces, inspired by 'gtools::strmacro()' and 'base::bquote()'), and more.
Maintained by John Mount. Last updated 2 years ago.
3.8 match 137 stars 11.11 score 390 scripts 12 dependentsbarbarabodinier
sharp:Stability-enHanced Approaches using Resampling Procedures
In stability selection (N Meinshausen, P Bühlmann (2010) <doi:10.1111/j.1467-9868.2010.00740.x>) and consensus clustering (S Monti et al (2003) <doi:10.1023/A:1023949509487>), resampling techniques are used to enhance the reliability of the results. In this package, hyper-parameters are calibrated by maximising model stability, which is measured under the null hypothesis that all selection (or co-membership) probabilities are identical (B Bodinier et al (2023a) <doi:10.1093/jrsssc/qlad058> and B Bodinier et al (2023b) <doi:10.1093/bioinformatics/btad635>). Functions are readily implemented for the use of LASSO regression, sparse PCA, sparse (group) PLS or graphical LASSO in stability selection, and hierarchical clustering, partitioning around medoids, K means or Gaussian mixture models in consensus clustering.
Maintained by Barbara Bodinier. Last updated 1 years ago.
7.1 match 13 stars 5.91 score 124 scriptsbioc
rtracklayer:R interface to genome annotation files and the UCSC genome browser
Extensible framework for interacting with multiple genome browsers (currently UCSC built-in) and manipulating annotation tracks in various formats (currently GFF, BED, bedGraph, BED15, WIG, BigWig and 2bit built-in). The user may export/import tracks to/from the supported browsers, as well as query and modify the browser state, such as the current viewport.
Maintained by Michael Lawrence. Last updated 9 days ago.
annotationvisualizationdataimportzlibopensslcurl
3.3 match 12.66 score 6.7k scripts 481 dependentsropensci
ruODK:An R Client for the ODK Central API
Access and tidy up data from the 'ODK Central' API. 'ODK Central' is a clearinghouse for digitally captured data using ODK <https://docs.getodk.org/central-intro/>. It manages user accounts and permissions, stores form definitions, and allows data collection clients like 'ODK Collect' to connect to it for form download and submission upload. The 'ODK Central' API is documented at <https://docs.getodk.org/central-api/>.
Maintained by Florian W. Mayer. Last updated 4 months ago.
databaseopen-dataodkapidatadatasetodataodata-clientodk-centralopendatakit
5.4 match 42 stars 7.73 score 57 scripts 1 dependentsosimon81
SqueakR:An Experiment Interface for 'DeepSqueak' Bioacoustics Research
Data processing and visualizations for rodent vocalizations exported from 'DeepSqueak'. These functions are compatible with the 'SqueakR' Shiny Dashboard, which can be used to visualize experimental results and analyses.
Maintained by Simon Ogundare. Last updated 3 years ago.
9.0 match 8 stars 4.60 score 5 scriptsjoshuaulrich
xts:eXtensible Time Series
Provide for uniform handling of R's different time-based data classes by extending zoo, maximizing native format information preservation and allowing for user level customization and extension, while simplifying cross-class interoperability.
Maintained by Joshua M. Ulrich. Last updated 4 months ago.
2.3 match 221 stars 18.38 score 12k scripts 654 dependentskwb-r
kwb.utils:General Utility Functions Developed at KWB
This package contains some small helper functions that aim at improving the quality of code developed at Kompetenzzentrum Wasser gGmbH (KWB).
Maintained by Hauke Sonnenberg. Last updated 12 months ago.
5.6 match 8 stars 7.33 score 12 scripts 78 dependentspolmine
polmineR:Verbs and Nouns for Corpus Analysis
Package for corpus analysis using the Corpus Workbench ('CWB', <https://cwb.sourceforge.io>) as an efficient back end for indexing and querying large corpora. The package offers functionality to flexibly create subcorpora and to carry out basic statistical operations (count, co-occurrences etc.). The original full text of documents can be reconstructed and inspected at any time. Beyond that, the package is intended to serve as an interface to packages implementing advanced statistical procedures. Respective data structures (document-term matrices, term-co-occurrence matrices etc.) can be created based on the indexed corpora.
Maintained by Andreas Blaette. Last updated 1 years ago.
5.2 match 49 stars 7.96 score 311 scriptspierreroudier
spectacles:Storing, Manipulating and Analysis Spectroscopy and Associated Data
Stores and eases the manipulation of spectra and associated data, with dedicated classes for spatial and soil-related data.
Maintained by Pierre Roudier. Last updated 2 years ago.
6.6 match 11 stars 6.17 score 45 scripts 1 dependentsoobianom
quickcode:Quick and Essential 'R' Tricks for Better Scripts
The NOT functions, 'R' tricks and a compilation of some simple quick plus often used 'R' codes to improve your scripts. Improve the quality and reproducibility of 'R' scripts.
Maintained by Obinna Obianom. Last updated 14 days ago.
5.2 match 5 stars 7.76 score 7 scripts 6 dependentsbioc
methodical:Discovering genomic regions where methylation is strongly associated with transcriptional activity
DNA methylation is generally considered to be associated with transcriptional silencing. However, comprehensive, genome-wide investigation of this relationship requires the evaluation of potentially millions of correlation values between the methylation of individual genomic loci and expression of associated transcripts in a relatively large numbers of samples. Methodical makes this process quick and easy while keeping a low memory footprint. It also provides a novel method for identifying regions where a number of methylation sites are consistently strongly associated with transcriptional expression. In addition, Methodical enables housing DNA methylation data from diverse sources (e.g. WGBS, RRBS and methylation arrays) with a common framework, lifting over DNA methylation data between different genome builds and creating base-resolution plots of the association between DNA methylation and transcriptional activity at transcriptional start sites.
Maintained by Richard Heery. Last updated 2 months ago.
dnamethylationmethylationarraytranscriptiongenomewideassociationsoftwareopenjdk
8.6 match 4.65 score 14 scriptsbioc
ncdfFlow:ncdfFlow: A package that provides HDF5 based storage for flow cytometry data.
Provides HDF5 storage based methods and functions for manipulation of flow cytometry data.
Maintained by Mike Jiang. Last updated 2 months ago.
immunooncologyflowcytometryzlibcpp
5.3 match 7.56 score 96 scripts 11 dependentsludkinm
SBMSplitMerge:Inference for a Generalised SBM with a Split Merge Sampler
Inference in a Bayesian framework for a generalised stochastic block model. The generalised stochastic block model (SBM) can capture group structure in network data without requiring conjugate priors on the edge-states. Two sampling methods are provided to perform inference on edge parameters and block structure: a split-merge Markov chain Monte Carlo algorithm and a Dirichlet process sampler. Green, Richardson (2001) <doi:10.1111/1467-9469.00242>; Neal (2000) <doi:10.1080/10618600.2000.10474879>; Ludkin (2019) <arXiv:1909.09421>.
Maintained by Matthew Ludkin. Last updated 5 years ago.
14.8 match 2.70 score 3 scriptswanghaoxue0
SplitKnockoff:Split Knockoffs for Structural Sparsity
Split Knockoff is a data adaptive variable selection framework for controlling the (directional) false discovery rate (FDR) in structural sparsity, where variable selection on linear transformation of parameters is of concern. This proposed scheme relaxes the linear subspace constraint to its neighborhood, often known as variable splitting in optimization. Simulation experiments can be reproduced following the Vignette. We include data (both .mat and .csv format) and application with our method of Alzheimer's Disease study in this package. 'Split Knockoffs' is first defined in Cao et al. (2021) <arXiv:2103.16159>.
Maintained by Haoxue Wang. Last updated 3 years ago.
9.5 match 3 stars 4.18 score 4 scriptschoonghyunryu
alookr:Model Classifier for Binary Classification
A collection of tools that support data splitting, predictive modeling, and model evaluation. A typical function is to split a dataset into a training dataset and a test dataset. Then compare the data distribution of the two datasets. Another feature is to support the development of predictive models and to compare the performance of several predictive models, helping to select the best model.
Maintained by Choonghyun Ryu. Last updated 1 years ago.
7.3 match 12 stars 5.38 score 9 scriptsasa12138
pcutils:Some Useful Functions for Statistics and Visualization
Offers a range of utilities and functions for everyday programming tasks. 1.Data Manipulation. Such as grouping and merging, column splitting, and character expansion. 2.File Handling. Read and convert files in popular formats. 3.Plotting Assistance. Helpful utilities for generating color palettes, validating color formats, and adding transparency. 4.Statistical Analysis. Includes functions for pairwise comparisons and multiple testing corrections, enabling perform statistical analyses with ease. 5.Graph Plotting, Provides efficient tools for creating doughnut plot and multi-layered doughnut plot; Venn diagrams, including traditional Venn diagrams, upset plots, and flower plots; Simplified functions for creating stacked bar plots, or a box plot with alphabets group for multiple comparison group.
Maintained by Chen Peng. Last updated 5 months ago.
5.9 match 22 stars 6.57 score 28 scripts 4 dependentsh56cho
forestRK:Implements the Forest-R.K. Algorithm for Classification Problems
Provides functions that calculates common types of splitting criteria used in random forests for classification problems, as well as functions that make predictions based on a single tree or a Forest-R.K. model; the package also provides functions to generate importance plot for a Forest-R.K. model, as well as the 2D multidimensional-scaling plot of data points that are colour coded by their predicted class types by the Forest-R.K. model. This package is based on: Bernard, S., Heutte, L., Adam, S., (2008, ISBN:978-3-540-85983-3) "Forest-R.K.: A New Random Forest Induction Method", Fourth International Conference on Intelligent Computing, September 2008, Shanghai, China, pp.430-437.
Maintained by Hyunjin Cho. Last updated 6 years ago.
9.2 match 4.24 score 35 scriptstopepo
caret:Classification and Regression Training
Misc functions for training and plotting classification and regression models.
Maintained by Max Kuhn. Last updated 3 months ago.
2.0 match 1.6k stars 19.24 score 61k scripts 303 dependentsflorianschwendinger
scs:Splitting Conic Solver
Solves convex cone programs via operator splitting. Can solve: linear programs ('LPs'), second-order cone programs ('SOCPs'), semidefinite programs ('SDPs'), exponential cone programs ('ECPs'), and power cone programs ('PCPs'), or problems with any combination of those cones. 'SCS' uses 'AMD' (a set of routines for permuting sparse matrices prior to factorization) and 'LDL' (a sparse 'LDL' factorization and solve package) from 'SuiteSparse' (<https://people.engr.tamu.edu/davis/suitesparse.html>).
Maintained by Florian Schwendinger. Last updated 2 years ago.
5.7 match 8 stars 6.72 score 16 scripts 53 dependentsjbengler
tidyplots:Tidy Plots for Scientific Papers
The goal of 'tidyplots' is to streamline the creation of publication-ready plots for scientific papers. It allows to gradually add, remove and adjust plot components using a consistent and intuitive syntax.
Maintained by Jan Broder Engler. Last updated 4 days ago.
4.1 match 482 stars 9.40 score 85 scriptsbioc
EnrichedHeatmap:Making Enriched Heatmaps
Enriched heatmap is a special type of heatmap which visualizes the enrichment of genomic signals on specific target regions. Here we implement enriched heatmap by ComplexHeatmap package. Since this type of heatmap is just a normal heatmap but with some special settings, with the functionality of ComplexHeatmap, it would be much easier to customize the heatmap as well as concatenating to a list of heatmaps to show correspondance between different data sources.
Maintained by Zuguang Gu. Last updated 5 months ago.
softwarevisualizationsequencinggenomeannotationcoveragecpp
3.5 match 190 stars 10.87 score 330 scripts 1 dependentsashwinikv
visTree:Visualization of Subgroups for Decision Trees
Provides a visualization for characterizing subgroups defined by a decision tree structure. The visualization simplifies the ability to interpret individual pathways to subgroups; each sub-plot describes the distribution of observations within individual terminal nodes and percentile ranges for the associated inner nodes.
Maintained by Ashwini Venkatasubramaniam. Last updated 6 years ago.
9.5 match 2 stars 4.00 score 9 scriptsr-lib
vctrs:Vector Helpers
Defines new notions of prototype and size that are used to provide tools for consistent and well-founded type-coercion and size-recycling, and are in turn connected to ideas of type- and size-stability useful for analysing function interfaces.
Maintained by Davis Vaughan. Last updated 5 months ago.
2.0 match 290 stars 18.97 score 1.1k scripts 13k dependentsvubiostat
redcapAPI:Interface to 'REDCap'
Access data stored in 'REDCap' databases using the Application Programming Interface (API). 'REDCap' (Research Electronic Data CAPture; <https://projectredcap.org>, Harris, et al. (2009) <doi:10.1016/j.jbi.2008.08.010>, Harris, et al. (2019) <doi:10.1016/j.jbi.2019.103208>) is a web application for building and managing online surveys and databases developed at Vanderbilt University. The API allows users to access data and project meta data (such as the data dictionary) from the web programmatically. The 'redcapAPI' package facilitates the process of accessing data with options to prepare an analysis-ready data set consistent with the definitions in a database's data dictionary.
Maintained by Shawn Garbett. Last updated 10 days ago.
3.6 match 22 stars 10.47 score 134 scripts 2 dependentsmladenjovanovic
shorts:Short Sprints
Create short sprint acceleration-velocity (AVP) and force-velocity (FVP) profiles and predict kinematic and kinetic variables using the timing-gate split times, laser or radar gun data, tether devices data, as well as the data provided by the GPS and LPS monitoring systems. The modeling method utilized in this package is based on the works of Furusawa K, Hill AV, Parkinson JL (1927) <doi: 10.1098/rspb.1927.0035>, Greene PR. (1986) <doi: 10.1016/0025-5564(86)90063-5>, Chelly SM, Denis C. (2001) <doi: 10.1097/00005768-200102000-00024>, Clark KP, Rieger RH, Bruno RF, Stearne DJ. (2017) <doi: 10.1519/JSC.0000000000002081>, Samozino P. (2018) <doi: 10.1007/978-3-319-05633-3_11>, Samozino P. and Peyrot N., et al (2022) <doi: 10.1111/sms.14097>, Clavel, P., et al (2023) <doi: 10.1016/j.jbiomech.2023.111602>, Jovanovic M. (2023) <doi: 10.1080/10255842.2023.2170713>, Jovanovic M., et al (2024) <doi: 10.3390/s24092894>, and Jovanovic M., et al (2024) <doi: 10.3390/s24196192>.
Maintained by Mladen Jovanović. Last updated 5 months ago.
8.4 match 14 stars 4.45 score 4 scriptsbioc
flowClust:Clustering for Flow Cytometry
Robust model-based clustering using a t-mixture model with Box-Cox transformation. Note: users should have GSL installed. Windows users: 'consult the README file available in the inst directory of the source distribution for necessary configuration instructions'.
Maintained by Greg Finak. Last updated 5 months ago.
immunooncologyclusteringvisualizationflowcytometry
5.1 match 7.30 score 83 scripts 6 dependentsbusiness-science
modeltime:The Tidymodels Extension for Time Series Modeling
The time series forecasting framework for use with the 'tidymodels' ecosystem. Models include ARIMA, Exponential Smoothing, and additional time series models from the 'forecast' and 'prophet' packages. Refer to "Forecasting Principles & Practice, Second edition" (<https://otexts.com/fpp2/>). Refer to "Prophet: forecasting at scale" (<https://research.facebook.com/blog/2017/02/prophet-forecasting-at-scale/>.).
Maintained by Matt Dancho. Last updated 5 months ago.
arimadata-sciencedeep-learningetsforecastingmachine-learningmachine-learning-algorithmsmodeltimeprophettbatstidymodelingtidymodelstimetime-seriestime-series-analysistimeseriestimeseries-forecasting
3.5 match 549 stars 10.57 score 1.1k scripts 7 dependentsigraph
igraph:Network Analysis and Visualization
Routines for simple graphs and network analysis. It can handle large graphs very well and provides functions for generating random and regular graphs, graph visualization, centrality methods and much more.
Maintained by Kirill Müller. Last updated 3 hours ago.
complex-networksgraph-algorithmsgraph-theorymathematicsnetwork-analysisnetwork-graphfortranlibxml2glpkopenblascpp
1.8 match 582 stars 21.11 score 31k scripts 1.9k dependentsohdsi
CohortConstructor:Build and Manipulate Study Cohorts Using a Common Data Model
Create and manipulate study cohorts in data mapped to the Observational Medical Outcomes Partnership Common Data Model.
Maintained by Edward Burn. Last updated 4 days ago.
3.8 match 2 stars 9.71 score 207 scripts 2 dependentsstan-dev
posterior:Tools for Working with Posterior Distributions
Provides useful tools for both users and developers of packages for fitting Bayesian models or working with output from Bayesian models. The primary goals of the package are to: (a) Efficiently convert between many different useful formats of draws (samples) from posterior or prior distributions. (b) Provide consistent methods for operations commonly performed on draws, for example, subsetting, binding, or mutating draws. (c) Provide various summaries of draws in convenient formats. (d) Provide lightweight implementations of state of the art posterior inference diagnostics. References: Vehtari et al. (2021) <doi:10.1214/20-BA1221>.
Maintained by Paul-Christian Bürkner. Last updated 11 days ago.
2.3 match 168 stars 16.13 score 3.3k scripts 342 dependentsr-lib
cli:Helpers for Developing Command Line Interfaces
A suite of tools to build attractive command line interfaces ('CLIs'), from semantic elements: headings, lists, alerts, paragraphs, etc. Supports custom themes via a 'CSS'-like language. It also contains a number of lower level 'CLI' elements: rules, boxes, trees, and 'Unicode' symbols with 'ASCII' alternatives. It support ANSI colors and text styles as well.
Maintained by Gábor Csárdi. Last updated 2 days ago.
1.9 match 664 stars 19.33 score 1.4k scripts 14k dependentslaresbernardo
lares:Analytics & Machine Learning Sidekick
Auxiliary package for better/faster analytics, visualization, data mining, and machine learning tasks. With a wide variety of family functions, like Machine Learning, Data Wrangling, Marketing Mix Modeling (Robyn), Exploratory, API, and Scrapper, it helps the analyst or data scientist to get quick and robust results, without the need of repetitive coding or advanced R programming skills.
Maintained by Bernardo Lares. Last updated 24 days ago.
analyticsapiautomationautomldata-sciencedescriptive-statisticsh2omachine-learningmarketingmmmpredictive-modelingpuzzlerlanguagerobynvisualization
3.7 match 233 stars 9.84 score 185 scripts 1 dependentstagteam
riskRegression:Risk Regression Models and Prediction Scores for Survival Analysis with Competing Risks
Implementation of the following methods for event history analysis. Risk regression models for survival endpoints also in the presence of competing risks are fitted using binomial regression based on a time sequence of binary event status variables. A formula interface for the Fine-Gray regression model and an interface for the combination of cause-specific Cox regression models. A toolbox for assessing and comparing performance of risk predictions (risk markers and risk prediction models). Prediction performance is measured by the Brier score and the area under the ROC curve for binary possibly time-dependent outcome. Inverse probability of censoring weighting and pseudo values are used to deal with right censored data. Lists of risk markers and lists of risk models are assessed simultaneously. Cross-validation repeatedly splits the data, trains the risk prediction models on one part of each split and then summarizes and compares the performance across splits.
Maintained by Thomas Alexander Gerds. Last updated 18 days ago.
2.8 match 46 stars 13.00 score 736 scripts 35 dependentshneth
ds4psy:Data Science for Psychologists
All datasets and functions required for the examples and exercises of the book "Data Science for Psychologists" (by Hansjoerg Neth, Konstanz University, 2023), freely available at <https://bookdown.org/hneth/ds4psy/>. The book and course introduce principles and methods of data science to students of psychology and other biological or social sciences. The 'ds4psy' package primarily provides datasets, but also functions for data generation and manipulation (e.g., of text and time data) and graphics that are used in the book and its exercises. All functions included in 'ds4psy' are designed to be explicit and instructive, rather than efficient or elegant.
Maintained by Hansjoerg Neth. Last updated 1 months ago.
data-literacydata-scienceeducationexploratory-data-analysispsychologysocial-sciencesvisualisation
5.3 match 22 stars 6.79 score 70 scriptsskranz
stringtools:Tools for working with strings in R
Tools for working with strings in R
Maintained by Sebastian Kranz. Last updated 3 years ago.
9.8 match 2 stars 3.66 score 29 scripts 26 dependentstherneau
survival:Survival Analysis
Contains the core survival analysis routines, including definition of Surv objects, Kaplan-Meier and Aalen-Johansen (multi-state) curves, Cox models, and parametric accelerated failure time models.
Maintained by Terry M Therneau. Last updated 3 months ago.
1.8 match 400 stars 20.43 score 29k scripts 3.9k dependentsbrandmaier
semtree:Recursive Partitioning for Structural Equation Models
SEM Trees and SEM Forests -- an extension of model-based decision trees and forests to Structural Equation Models (SEM). SEM trees hierarchically split empirical data into homogeneous groups each sharing similar data patterns with respect to a SEM by recursively selecting optimal predictors of these differences. SEM forests are an extension of SEM trees. They are ensembles of SEM trees each built on a random sample of the original data. By aggregating over a forest, we obtain measures of variable importance that are more robust than measures from single trees. A description of the method was published by Brandmaier, von Oertzen, McArdle, & Lindenberger (2013) <doi:10.1037/a0030001> and Arnold, Voelkle, & Brandmaier (2020) <doi:10.3389/fpsyg.2020.564403>.
Maintained by Andreas M. Brandmaier. Last updated 3 months ago.
bigdatadecision-treeforestmultivariaterandomforestrecursive-partitioningsemstatistical-modelingstructural-equation-modelingstructural-equation-models
4.1 match 15 stars 8.63 score 68 scripts