R-universe search: subsetting

rdatatable

data.table:Extension of `data.frame`

Fast aggregation of large data (e.g. 100GB in RAM), fast ordered joins, fast add/modify/delete of columns by group using no copies at all, list columns, friendly and fast character-separated-value read/write. Offers a natural and flexible syntax, for faster development.

Maintained by Tyson Barrett. Last updated 1 day ago.

38.3 match 3.7k stars 23.53 score 230k scripts 4.6k dependents

dankelley

oce:Analysis of Oceanographic Data

Supports the analysis of Oceanographic data, including 'ADCP' measurements, measurements made with 'argo' floats, 'CTD' measurements, sectional data, sea-level time series, coastline and topographic data, etc. Provides specialized functions for calculating seawater properties such as potential temperature in either the 'UNESCO' or 'TEOS-10' equation of state. Produces graphical displays that conform to the conventions of the Oceanographic literature. This package is discussed extensively by Kelley (2018) "Oceanographic Analysis with R" <doi:10.1007/978-1-4939-8844-0>.

Maintained by Dan Kelley. Last updated 3 days ago.

oceanography fortran cpp

39.3 match 146 stars 15.42 score 4.2k scripts 18 dependents

spatstat

spatstat.geom:Geometrical Functionality of the 'spatstat' Family

Defines spatial data types and supports geometrical operations on them. Data types include point patterns, windows (domains), pixel images, line segment patterns, tessellations and hyperframes. Capabilities include creation and manipulation of data (using command line or graphical interaction), plotting, geometrical operations (rotation, shift, rescale, affine transformation), convex hull, discretisation and pixellation, Dirichlet tessellation, Delaunay triangulation, pairwise distances, nearest-neighbour distances, distance transform, morphological operations (erosion, dilation, closing, opening), quadrat counting, geometrical measurement, geometrical covariance, colour maps, calculus on spatial domains, Gaussian blur, level sets of images, transects of images, intersections between objects, minimum distance matching. (Excludes spatial data on a network, which are supported by the package 'spatstat.linnet'.)

Maintained by Adrian Baddeley. Last updated 20 hours ago.

classes-and-objects distance-calculation geometry geometry-processing images mensuration plotting point-patterns spatial-data spatial-data-analysis

46.8 match 7 stars 12.10 score 241 scripts 227 dependents

bioc

ComplexHeatmap:Make Complex Heatmaps

Complex heatmaps are an efficient way to visualize associations between data sets from different sources and to reveal potential patterns. The ComplexHeatmap package provides a highly flexible way to arrange multiple heatmaps and supports various annotation graphics.

Maintained by Zuguang Gu. Last updated 5 months ago.

software visualization sequencing clustering complex-heatmaps heatmap

23.4 match 1.3k stars 16.93 score 16k scripts 151 dependents

ohdsi

CohortGenerator:Cohort Generation for the OMOP Common Data Model

Generate cohorts and subsets using an Observational Medical Outcomes Partnership (OMOP) Common Data Model (CDM) Database. Cohorts are defined using 'CIRCE' (<https://github.com/ohdsi/circe-be>) or SQL compatible with 'SqlRender' (<https://github.com/OHDSI/SqlRender>).

Maintained by Anthony Sena. Last updated 6 months ago.

hades openjdk

42.1 match 13 stars 7.91 score 165 scripts

big-life-lab

cchsflow:Transforming and Harmonizing CCHS Variables

Supports the use of the Canadian Community Health Survey (CCHS) by transforming variables from each cycle into harmonized, consistent versions that span survey cycles (currently, 2001 to 2018). CCHS data used in this library is accessed and adapted in accordance with the Statistics Canada Open Licence Agreement. This package uses rec_with_table(), which was developed from 'sjmisc' rec(). Lüdecke D (2018). "sjmisc: Data and Variable Transformation Functions". Journal of Open Source Software, 3(26), 754. <doi:10.21105/joss.00754>.

Maintained by Kitty Chen. Last updated 1 year ago.

cchs opensci openscience

51.1 match 12 stars 6.02 score 192 scripts

ncss-tech

aqp:Algorithms for Quantitative Pedology

The Algorithms for Quantitative Pedology (AQP) project was started in 2009 to organize a loosely-related set of concepts and source code on the topic of soil profile visualization, aggregation, and classification into this package (aqp). Over the past 8 years, the project has grown into a suite of related R packages that enhance and simplify the quantitative analysis of soil profile data. Central to the AQP project is a new vocabulary of specialized functions and data structures that can accommodate the inherent complexity of soil profile information, freeing the scientist to focus on ideas rather than boilerplate data processing tasks <doi:10.1016/j.cageo.2012.10.020>. These functions and data structures have been extensively tested and documented, applied to projects involving hundreds of thousands of soil profiles, and deeply integrated into widely used tools such as SoilWeb <https://casoilresource.lawr.ucdavis.edu/soilweb-apps>. Components of the AQP project (aqp, soilDB, sharpshootR, soilReports packages) serve an important role in routine data analysis within the USDA-NRCS Soil Science Division. The AQP suite of R packages offers a convenient platform for bridging the gap between pedometric theory and practice.

Maintained by Dylan Beaudette. Last updated 1 month ago.

digital-soil-mapping ncss-tech nrcs pedology pedometrics soil soil-survey usda

23.0 match 55 stars 11.90 score 1.2k scripts 2 dependents

bioc

MultiAssayExperiment:Software for the integration of multi-omics experiments in Bioconductor

Harmonize data management of multiple experimental assays performed on an overlapping set of specimens. It provides a familiar Bioconductor user experience by extending concepts from SummarizedExperiment, supporting an open-ended mix of standard data classes for individual assays, and allowing subsetting by genomic ranges or rownames. Facilities are provided for reshaping data into wide and long formats for adaptability to graphing and downstream analysis.

Maintained by Marcel Ramos. Last updated 2 months ago.

infrastructure datarepresentation bioconductor bioconductor-package genomics nci-itcr tcga u24ca289073

17.8 match 71 stars 14.95 score 670 scripts 127 dependents

tguillerme

dispRity:Measuring Disparity

A modular package for measuring disparity (multidimensional space occupancy). Disparity can be calculated from any matrix defining a multidimensional space. The package provides a set of implemented metrics to measure properties of the space and allows users to provide and test their own metrics. The package also provides functions for looking at disparity in a serial way (e.g. disparity through time) or per groups as well as visualising the results. Finally, this package provides several statistical tests for disparity analysis.

Maintained by Thomas Guillerme. Last updated 4 days ago.

disparity ecology multidimensionality palaeobiology

29.9 match 26 stars 8.69 score 220 scripts 1 dependent

bioc

xcms:LC-MS and GC-MS Data Analysis

Framework for processing and visualization of chromatographically separated and single-spectra mass spectral data. Imports from AIA/ANDI NetCDF, mzXML, mzData and mzML files. Preprocesses data for high-throughput, untargeted analyte profiling.

Maintained by Steffen Neumann. Last updated 4 days ago.

immunooncology massspectrometry metabolomics bioconductor feature-detection mass-spectrometry peak-detection cpp

16.1 match 196 stars 14.31 score 984 scripts 11 dependents

rspatial

terra:Spatial Data Analysis

Methods for spatial data analysis with vector (points, lines, polygons) and raster (grid) data. Methods for vector data include geometric operations such as intersect and buffer. Raster methods include local, focal, global, zonal and geometric operations. The predict and interpolate methods facilitate the use of regression type (interpolation, machine learning) models for spatial prediction, including with satellite remote sensing data. Processing of very large files is supported. See the manual and tutorials on <https://rspatial.org/> to get started. 'terra' replaces the 'raster' package ('terra' can do more, and it is faster and easier to use).

Maintained by Robert J. Hijmans. Last updated 14 hours ago.

geospatial raster spatial vector onetbb proj gdal geos cpp

13.1 match 559 stars 17.63 score 17k scripts 851 dependents

tidyverse

tibble:Simple Data Frames

Provides a 'tbl_df' class (the 'tibble') with stricter checking and better formatting than the traditional data frame.

Maintained by Kirill Müller. Last updated 3 months ago.

tidy-data

8.6 match 693 stars 22.82 score 47k scripts 11k dependents

rspatial

raster:Geographic Data Analysis and Modeling

Reading, writing, manipulating, analyzing and modeling of spatial data. This package has been superseded by the "terra" package <https://CRAN.R-project.org/package=terra>.

Maintained by Robert J. Hijmans. Last updated 2 months ago.

cpp

11.1 match 164 stars 17.05 score 58k scripts 555 dependents

bioc

SummarizedExperiment:A container (S4 class) for matrix-like assays

The SummarizedExperiment container contains one or more assays, each represented by a matrix-like object of numeric or other mode. The rows typically represent genomic ranges of interest and the columns represent samples.

Maintained by Hervé Pagès. Last updated 5 months ago.

genetics infrastructure sequencing annotation coverage genomeannotation bioconductor-package core-package

11.1 match 34 stars 16.85 score 8.6k scripts 1.2k dependents

bioc

ExperimentSubset:Manages subsets of data with Bioconductor Experiment objects

Experiment objects such as the SummarizedExperiment or SingleCellExperiment are data containers for one or more matrix-like assays along with the associated row and column data. Often only a subset of the original data is needed for down-stream analysis. For example, filtering out poor quality samples will require excluding some columns before analysis. The ExperimentSubset object is a container to efficiently manage different subsets of the same data without having to make separate objects for each new subset.

Maintained by Irzam Sarfraz. Last updated 5 months ago.

infrastructure software dataimport datarepresentation

46.4 match 4.00 score 8 scripts

tidyverse

dbplyr:A 'dplyr' Back End for Databases

A 'dplyr' back end for databases that allows you to work with remote database tables as if they are in-memory data frames. Basic features work with any database that has a 'DBI' back end; more advanced features require 'SQL' translation to be provided by the package author.

Maintained by Hadley Wickham. Last updated 3 months ago.

database

9.3 match 481 stars 19.72 score 5.2k scripts 736 dependents

bioc

CoreGx:Classes and Functions to Serve as the Basis for Other 'Gx' Packages

A collection of functions and classes which serve as the foundation for our lab's suite of R packages, such as 'PharmacoGx' and 'RadioGx'. This package was created to abstract shared functionality from other lab package releases to increase ease of maintainability and reduce code repetition in current and future 'Gx' suite programs. Major features include a 'CoreSet' class, from which 'RadioSet' and 'PharmacoSet' are derived, along with get and set methods for each respective slot. Additional functions related to fitting and plotting dose response curves, quantifying statistical correlation and calculating area under the curve (AUC) or survival fraction (SF) are included. For more details please see the included documentation, as well as: Smirnov, P., Safikhani, Z., El-Hachem, N., Wang, D., She, A., Olsen, C., Freeman, M., Selby, H., Gendoo, D., Grossman, P., Beck, A., Aerts, H., Lupien, M., Goldenberg, A. (2015) <doi:10.1093/bioinformatics/btv723>. Manem, V., Labie, M., Smirnov, P., Kofia, V., Freeman, M., Koritzinksy, M., Abazeed, M., Haibe-Kains, B., Bratman, S. (2018) <doi:10.1101/449793>.

Maintained by Benjamin Haibe-Kains. Last updated 5 months ago.

software pharmacogenomics classification survival

27.7 match 6.53 score 63 scripts 6 dependents

doi-usgs

nhdplusTools:NHDPlus Tools

Tools for traversing and working with National Hydrography Dataset Plus (NHDPlus) data. All methods implemented in 'nhdplusTools' are available in the NHDPlus documentation available from the US Environmental Protection Agency <https://www.epa.gov/waterdata/basic-information>.

Maintained by David Blodgett. Last updated 27 days ago.

15.6 match 87 stars 11.38 score 348 scripts 5 dependents

bioc

S4Vectors:Foundation of vector-like and list-like containers in Bioconductor

The S4Vectors package defines the Vector and List virtual classes and a set of generic functions that extend the semantics of ordinary vectors and lists in R. Package developers can easily implement vector-like or list-like objects as concrete subclasses of Vector or List. In addition, a few low-level concrete subclasses of general interest (e.g. DataFrame, Rle, Factor, and Hits) are implemented in the S4Vectors package itself (many more are implemented in the IRanges package and in other Bioconductor infrastructure packages).

Maintained by Hervé Pagès. Last updated 1 month ago.

infrastructure datarepresentation bioconductor-package core-package

10.8 match 18 stars 16.05 score 1.0k scripts 1.9k dependents

mhahsler

arules:Mining Association Rules and Frequent Itemsets

Provides the infrastructure for representing, manipulating and analyzing transaction data and patterns (frequent itemsets and association rules). Also provides C implementations of the association mining algorithms Apriori and Eclat. Hahsler, Gruen and Hornik (2005) <doi:10.18637/jss.v014.i15>.

Maintained by Michael Hahsler. Last updated 1 month ago.

arules association-rules frequent-itemsets

12.0 match 194 stars 13.99 score 3.3k scripts 28 dependents

kbroman

qtl:Tools for Analyzing QTL Experiments

Analysis of experimental crosses to identify genes (called quantitative trait loci, QTLs) contributing to variation in quantitative traits. Broman et al. (2003) <doi:10.1093/bioinformatics/btg112>.

Maintained by Karl W Broman. Last updated 7 months ago.

openblas

12.8 match 80 stars 12.79 score 2.4k scripts 29 dependents

tlverse

sl3:Pipelines for Machine Learning and Super Learning

A modern implementation of the Super Learner prediction algorithm, coupled with a general purpose framework for composing arbitrary pipelines for machine learning tasks.

Maintained by Jeremy Coyle. Last updated 4 months ago.

data-science ensemble-learning ensemble-model machine-learning model-selection regression stacking statistics

15.3 match 100 stars 9.94 score 748 scripts 7 dependents

tidyverse

dtplyr:Data Table Back-End for 'dplyr'

Provides a data.table backend for 'dplyr'. The goal of 'dtplyr' is to allow you to write 'dplyr' code that is automatically translated to the equivalent, but usually much faster, data.table code.

Maintained by Hadley Wickham. Last updated 2 months ago.

datatable dplyr

9.3 match 671 stars 16.27 score 2.5k scripts 147 dependents

bioc

survcomp:Performance Assessment and Comparison for Survival Analysis

Assessment and Comparison for Performance of Risk Prediction (Survival) Models.

Maintained by Benjamin Haibe-Kains. Last updated 5 months ago.

geneexpression differentialexpression visualization cpp

20.0 match 7.46 score 448 scripts 12 dependents

kasperwelbers

corpustools:Managing, Querying and Analyzing Tokenized Text

Provides text analysis in R, focusing on the use of a tokenized text format. In this format, the positions of tokens are maintained, and each token can be annotated (e.g., part-of-speech tags, dependency relations). Prominent features include advanced Lucene-like querying for specific tokens or contexts (e.g., documents, sentences), similarity statistics for words and documents, exporting to DTM for compatibility with many text analysis packages, and the possibility to reconstruct original text from tokens to facilitate interpretation.

Maintained by Kasper Welbers. Last updated 6 months ago.

cpp

19.7 match 31 stars 7.50 score 174 scripts 1 dependent

tidyverse

dplyr:A Grammar of Data Manipulation

A fast, consistent tool for working with data frame like objects, both in memory and out of memory.

Maintained by Hadley Wickham. Last updated 15 days ago.

data-manipulation grammar cpp

5.9 match 4.8k stars 24.68 score 659k scripts 7.8k dependents

rqtl

qtl2:Quantitative Trait Locus Mapping in Experimental Crosses

Provides a set of tools to perform quantitative trait locus (QTL) analysis in experimental crosses. It is a reimplementation of the 'R/qtl' package to better handle high-dimensional data and complex cross designs. Broman et al. (2019) <doi:10.1534/genetics.118.301595>.

Maintained by Karl W Broman. Last updated 10 days ago.

cpp

15.3 match 34 stars 9.48 score 1.1k scripts 5 dependents

sebkrantz

collapse:Advanced and Fast Data Transformation

A C/C++ based package for advanced data transformation and statistical computing in R that is extremely fast, class-agnostic, robust and programmer friendly. Core functionality includes a rich set of S3 generic grouped and weighted statistical functions for vectors, matrices and data frames, which provide efficient low-level vectorizations, OpenMP multithreading, and skip missing values by default. These are integrated with fast grouping and ordering algorithms (also callable from C), and efficient data manipulation functions. The package also provides a flexible and rigorous approach to time series and panel data in R. It further includes fast functions for common statistical procedures, detailed (grouped, weighted) summary statistics, powerful tools to work with nested data, fast data object conversions, functions for memory efficient R programming, and helpers to effectively deal with variable labels, attributes, and missing data. It is well integrated with base R classes, 'dplyr'/'tibble', 'data.table', 'sf', 'units', 'plm' (panel-series and data frames), and 'xts'/'zoo'.

Maintained by Sebastian Krantz. Last updated 8 days ago.

data-aggregation data-analysis data-manipulation data-processing data-science data-transformation econometrics high-performance panel-data scientific-computing statistics time-series weighted weights cpp openmp

8.7 match 672 stars 16.63 score 708 scripts 97 dependents

r-spatial

spdep:Spatial Dependence: Weighting Schemes, Statistics

A collection of functions to create spatial weights matrix objects from polygon 'contiguities', from point patterns by distance and tessellations, for summarizing these objects, and for permitting their use in spatial data analysis, including regional aggregation by minimum spanning tree; a collection of tests for spatial 'autocorrelation', including global 'Morans I' and 'Gearys C' proposed by 'Cliff' and 'Ord' (1973, ISBN: 0850860369) and (1981, ISBN: 0850860814), 'Hubert/Mantel' general cross product statistic, Empirical Bayes estimates and 'Assunção/Reis' (1999) <doi:10.1002/(SICI)1097-0258(19990830)18:16%3C2147::AID-SIM179%3E3.0.CO;2-I> Index, 'Getis/Ord' G ('Getis' and 'Ord' 1992) <doi:10.1111/j.1538-4632.1992.tb00261.x> and multicoloured join count statistics, 'APLE' ('Li et al.') <doi:10.1111/j.1538-4632.2007.00708.x>, local 'Moran's I', 'Gearys C' ('Anselin' 1995) <doi:10.1111/j.1538-4632.1995.tb00338.x> and 'Getis/Ord' G ('Ord' and 'Getis' 1995) <doi:10.1111/j.1538-4632.1995.tb00912.x>, 'saddlepoint' approximations ('Tiefelsdorf' 2002) <doi:10.1111/j.1538-4632.2002.tb01084.x> and exact tests for global and local 'Moran's I' ('Bivand et al.' 2009) <doi:10.1016/j.csda.2008.07.021> and 'LOSH' local indicators of spatial heteroscedasticity ('Ord' and 'Getis') <doi:10.1007/s00168-011-0492-y>. The implementation of most of these measures is described in 'Bivand' and 'Wong' (2018) <doi:10.1007/s11749-018-0599-x>, with further extensions in 'Bivand' (2022) <doi:10.1111/gean.12319>. 'Lagrange' multiplier tests for spatial dependence in linear models are provided ('Anselin et al.' 1996) <doi:10.1016/0166-0462(95)02111-6>, as are 'Rao' score tests for hypothesised spatial 'Durbin' models based on linear models ('Koley' and 'Bera' 2023) <doi:10.1080/17421772.2023.2256810>. A local indicators for categorical data (LICD) implementation based on 'Carrer et al.' (2021) <doi:10.1016/j.jas.2020.105306> and 'Bivand et al.' (2017) <doi:10.1016/j.spasta.2017.03.003> was added in 1.3-7. From 'spdep' and 'spatialreg' versions >= 1.2-1, the model fitting functions previously present in this package are defunct in 'spdep' and may be found in 'spatialreg'.

Maintained by Roger Bivand. Last updated 20 days ago.

spatial-autocorrelation spatial-dependence spatial-weights

8.4 match 131 stars 16.62 score 6.0k scripts 107 dependents

bioc

singleCellTK:Comprehensive and Interactive Analysis of Single Cell RNA-Seq Data

The Single Cell Toolkit (SCTK) in the singleCellTK package provides an interface to popular tools for importing, quality control, analysis, and visualization of single cell RNA-seq data. SCTK allows users to seamlessly integrate tools from various packages at different stages of the analysis workflow. A general "a la carte" workflow gives users access to multiple methods for data importing, calculation of general QC metrics, doublet detection, ambient RNA estimation and removal, filtering, normalization, batch correction or integration, dimensionality reduction, 2-D embedding, clustering, marker detection, differential expression, cell type labeling, pathway analysis, and data exporting. Curated workflows can be used to run Seurat and Celda. Streamlined quality control can be performed on the command line using the SCTK-QC pipeline. Users can analyze their data using commands in the R console or by using an interactive Shiny Graphical User Interface (GUI). Specific analyses or entire workflows can be summarized and shared with comprehensive HTML reports generated by Rmarkdown. Additional documentation and vignettes can be found at camplab.net/sctk.

Maintained by Joshua David Campbell. Last updated 26 days ago.

singlecell geneexpression differentialexpression alignment clustering immunooncology batcheffect normalization qualitycontrol dataimport gui

13.5 match 181 stars 10.16 score 252 scripts

fcharte

mldr.datasets:R Ultimate Multilabel Dataset Repository

Large collection of multilabel datasets along with the functions needed to export them to several formats, to make partitions, and to obtain bibliographic information.

Maintained by David Charte. Last updated 6 years ago.

27.7 match 8 stars 4.68 score 120 scripts

welch-lab

rliger:Linked Inference of Genomic Experimental Relationships

Uses an extension of nonnegative matrix factorization to identify shared and dataset-specific factors. See Welch J, Kozareva V, et al (2019) <doi:10.1016/j.cell.2019.05.006>, and Liu J, Gao C, Sodicoff J, et al (2020) <doi:10.1038/s41596-020-0391-8> for more details.

Maintained by Yichen Wang. Last updated 2 months ago.

nonnegative-matrix-factorization single-cell openblas cpp

11.8 match 408 stars 10.77 score 334 scripts 1 dependent

r-forge

zipfR:Statistical Models for Word Frequency Distributions

Statistical models and utilities for the analysis of word frequency distributions. The utilities include functions for loading, manipulating and visualizing word frequency data and vocabulary growth curves. The package also implements several statistical models for the distribution of word frequencies in a population. (The name of this package derives from the most famous word frequency distribution, Zipf's law.)

Maintained by Stefan Evert. Last updated 4 years ago.

21.0 match 5.97 score 188 scripts 12 dependents

bioc

phyloseq:Handling and analysis of high-throughput microbiome census data

phyloseq provides a set of classes and tools to facilitate the import, storage, analysis, and graphical display of microbiome census data.

Maintained by Paul J. McMurdie. Last updated 5 months ago.

immunooncology sequencing microbiome metagenomics clustering classification multiplecomparison geneticvariability

9.0 match 597 stars 13.90 score 8.4k scripts 37 dependents

quanteda

quanteda:Quantitative Analysis of Textual Data

A fast, flexible, and comprehensive framework for quantitative text analysis in R. Provides functionality for corpus management, creating and manipulating tokens and n-grams, exploring keywords in context, forming and manipulating sparse matrices of documents by features and feature co-occurrences, analyzing keywords, computing feature similarities and distances, applying content dictionaries, applying supervised and unsupervised machine learning, visually representing text and text analyses, and more.

Maintained by Kenneth Benoit. Last updated 2 months ago.

corpus natural-language-processing quanteda text-analytics onetbb cpp

7.3 match 851 stars 16.68 score 5.4k scripts 51 dependents

molgenis

MolgenisArmadillo:Armadillo Client for the Armadillo Service

A set of functions to manage data shared on a 'MOLGENIS Armadillo' server.

Maintained by Mariska Slofstra. Last updated 18 days ago.

hacktoberfest

16.3 match 3 stars 7.51 score 28 scripts

pharmaverse

admiral:ADaM in R Asset Library

A toolbox for programming Clinical Data Interchange Standards Consortium (CDISC) compliant Analysis Data Model (ADaM) datasets in R. ADaM datasets are a mandatory part of any New Drug or Biologics License Application submitted to the United States Food and Drug Administration (FDA). Analysis derivations are implemented in accordance with the "Analysis Data Model Implementation Guide" (CDISC Analysis Data Model Team, 2021, <https://www.cdisc.org/standards/foundational/adam>).

Maintained by Ben Straub. Last updated 1 day ago.

cdisc clinical-trials open-source

8.7 match 238 stars 13.92 score 486 scripts 4 dependents

natverse

nat:NeuroAnatomy Toolbox for Analysis of 3D Image Data

NeuroAnatomy Toolbox (nat) enables analysis and visualisation of 3D biological image data, especially traced neurons. Reads and writes 3D images in NRRD and 'Amira' AmiraMesh formats and reads surfaces in 'Amira' hxsurf format. Traced neurons can be imported from and written to SWC and 'Amira' LineSet and SkeletonGraph formats. These data can then be visualised in 3D via 'rgl', manipulated including applying calculated registrations, e.g. using the 'CMTK' registration suite, and analysed. There is also a simple representation for neurons that have been subjected to 3D skeletonisation but not formally traced; this allows morphological comparison between neurons including searches and clustering (via the 'nat.nblast' extension package).

Maintained by Gregory Jefferis. Last updated 5 months ago.

3d connectomics image-analysis neuroanatomy neuroanatomy-toolbox neuron neuron-morphology neuroscience visualisation

12.1 match 67 stars 9.94 score 436 scripts 2 dependents

bioc

MOFA2:Multi-Omics Factor Analysis v2

The MOFA2 package contains a collection of tools for training and analysing multi-omic factor analysis (MOFA). MOFA is a probabilistic factor model that aims to identify principal axes of variation from data sets that can comprise multiple omic layers and/or groups of samples. Additional time or space information on the samples can be incorporated using the MEFISTO framework, which is part of MOFA2. Downstream analysis functions to inspect molecular features underlying each factor, visualisation, imputation, etc. are available.

Maintained by Ricard Argelaguet. Last updated 5 months ago.

dimensionreduction bayesian visualization factor-analysis mofa multi-omics

11.3 match 319 stars 10.02 score 502 scripts

crunch-io

crunch:Crunch.io Data Tools

The Crunch.io service <https://crunch.io/> provides a cloud-based data store and analytic engine, as well as an intuitive web interface. Using this package, analysts can interact with and manipulate Crunch datasets from within R. Importantly, this allows technical researchers to collaborate naturally with team members, managers, and clients who prefer a point-and-click interface.

Maintained by Greg Freedman Ellis. Last updated 13 days ago.

10.6 match 9 stars 10.53 score 200 scripts 2 dependents

pharmaverse

tidytlg:Create TLGs using the 'tidyverse'

Generate tables, listings, and graphs (TLG) using 'tidyverse.' Tables can be created functionally, using a standard TLG process, or by specifying table and column metadata to create generic analysis summaries. The 'envsetup' package can also be leveraged to create environments for table creation.

Maintained by Konrad Pagacz. Last updated 9 months ago.

13.6 match 33 stars 8.07 score 22 scripts

darwin-eu

CDMConnector:Connect to an OMOP Common Data Model

Provides tools for working with observational health data in the Observational Medical Outcomes Partnership (OMOP) Common Data Model format with a pipe friendly syntax. Common data model database table references are stored in a single compound object along with metadata.

Maintained by Adam Black. Last updated 20 days ago.

9.6 match 12 stars 11.39 score 502 scripts 12 dependents

lcef97

SchoolDataIT:Retrieve, Harmonise and Map Open Data Regarding the Italian School System

Compiles and displays the available data sets regarding the Italian school system, with a focus on the infrastructural aspects. Input datasets are downloaded from the web, with the aim of updating everything to real time. The functions are divided into four main modules: 'Get', to scrape raw data from the web; 'Util', various utilities needed to process raw data; 'Group', to aggregate data at the municipality or province level; and 'Map', to visualize the output datasets.

Maintained by Leonardo Cefalo. Last updated 2 months ago.

27.9 match 3.88 score

spatstat

spatstat.linnet:Linear Networks Functionality of the 'spatstat' Family

Defines types of spatial data on a linear network and provides functionality for geometrical operations, data analysis and modelling of data on a linear network, in the 'spatstat' family of packages. Contains definitions and support for linear networks, including creation of networks, geometrical measurements, topological connectivity, geometrical operations such as inserting and deleting vertices, intersecting a network with another object, and interactive editing of networks. Data types defined on a network include point patterns, pixel images, functions, and tessellations. Exploratory methods include kernel estimation of intensity on a network, K-functions and pair correlation functions on a network, simulation envelopes, nearest neighbour distance and empty space distance, relative risk estimation with cross-validated bandwidth selection. Formal hypothesis tests of random pattern (chi-squared, Kolmogorov-Smirnov, Monte Carlo, Diggle-Cressie-Loosmore-Ford, Dao-Genton, two-stage Monte Carlo) and tests for covariate effects (Cox-Berman-Waller-Lawson, Kolmogorov-Smirnov, ANOVA) are also supported. Parametric models can be fitted to point pattern data using the function lppm() similar to glm(). Only Poisson models are implemented so far. Models may involve dependence on covariates and dependence on marks. Models are fitted by maximum likelihood. Fitted point process models can be simulated, automatically. Formal hypothesis tests of a fitted model are supported (likelihood ratio test, analysis of deviance, Monte Carlo tests) along with basic tools for model selection (stepwise(), AIC()) and variable selection (sdr). Tools for validating the fitted model include simulation envelopes, residuals, residual plots and Q-Q plots, leverage and influence diagnostics, partial residuals, and added variable plots. Random point patterns on a network can be generated using a variety of models.

Maintained by Adrian Baddeley. Last updated 2 months ago.

density-estimation heat-equation kernel-density-estimation network-analysis point-processes spatial-data-analysis statistical-analysis statistical-inference statistical-models

11.2 match 6 stars 9.64 score 35 scripts 43 dependents

statistikat

VIM:Visualization and Imputation of Missing Values

New tools for the visualization of missing and/or imputed values are introduced, which can be used for exploring the data and the structure of the missing and/or imputed values. Depending on this structure of the missing values, the corresponding methods may help to identify the mechanism generating the missing values and allow exploration of the data including missing values. In addition, the quality of imputation can be visually explored using various univariate, bivariate, multiple and multivariate plot methods. A graphical user interface available in the separate package VIMGUI allows an easy handling of the implemented plot methods.

Maintained by Matthias Templ. Last updated 7 months ago.

hotdeck imputation-methods model-predictions visualization cpp

7.4 match 85 stars 14.44 score 2.6k scripts 19 dependents

tslumley

leaps:Regression Subset Selection

Regression subset selection, including exhaustive search.

Maintained by Thomas Lumley. Last updated 9 months ago.

fortran

9.9 match 8 stars 10.29 score 4.5k scripts 171 dependents

bioc

clusterExperiment:Compare Clusterings for Single-Cell Sequencing

Provides functionality for running and comparing many different clusterings of single-cell sequencing data or other large mRNA Expression data sets.

Maintained by Elizabeth Purdom. Last updated 5 months ago.

clustering rnaseq sequencing software singlecell cpp

10.5 match 39 stars 9.63 score 192 scripts 1 dependents

r-forge

tramvs:Optimal Subset Selection for Transformation Models

Greedy optimal subset selection for transformation models (Hothorn et al., 2018, <doi:10.1111/sjos.12291>) based on the abess algorithm (Zhu et al., 2020, <doi:10.1073/pnas.2014241117>). Applicable to models from packages 'tram' and 'cotram'. Application to shift-scale transformation models is described in Siegfried et al. (2024, <doi:10.1080/00031305.2023.2203177>).

Maintained by Lucas Kook. Last updated 6 days ago.

24.3 match 4.12 score 5 scripts

grunwaldlab

metacoder:Tools for Parsing, Manipulating, and Graphing Taxonomic Abundance Data

Reads, plots, and manipulates large taxonomic data sets, like those generated from modern high-throughput sequencing, such as metabarcoding (i.e. amplification metagenomics, 16S metagenomics, etc). It provides a tree-based visualization called "heat trees" used to depict statistics for every taxon in a taxonomy using color and size. It also provides various functions to do common tasks in microbiome bioinformatics on data in the 'taxmap' format defined by the 'taxa' package. The 'metacoder' package is described in the publication by Foster et al. (2017) <doi:10.1371/journal.pcbi.1005404>.

Maintained by Zachary Foster. Last updated 1 months ago.

community-diversity hierarchical metabarcoding pcr taxonomy trees cpp

10.1 match 140 stars 9.64 score 328 scripts

geomorphr

geomorph:Geometric Morphometric Analyses of 2D and 3D Landmark Data

Read, manipulate, and digitize landmark data, generate shape variables via Procrustes analysis for points, curves and surfaces, perform shape analyses, and provide graphical depictions of shapes and patterns of shape variation.

Maintained by Dean Adams. Last updated 1 months ago.

8.0 match 76 stars 12.05 score 700 scripts 6 dependents

bioc

tidyFlowCore:tidyFlowCore: Bringing flowCore to the tidyverse

tidyFlowCore bridges the gap between flow cytometry analysis using the flowCore Bioconductor package and the tidy data principles advocated by the tidyverse. It provides a suite of dplyr-, ggplot2-, and tidyr-like verbs specifically designed for working with flowFrame and flowSet objects as if they were tibbles; however, your data remain flowCore data structures under this layer of abstraction. tidyFlowCore enables intuitive and streamlined analysis workflows that can leverage both the Bioconductor and tidyverse ecosystems for cytometry data.

Maintained by Timothy Keyes. Last updated 5 months ago.

singlecell flowcytometry infrastructure

22.2 match 1 stars 4.30 score 7 scripts

jeffreyhanson

raptr:Representative and Adequate Prioritization Toolkit in R

Biodiversity is in crisis. The overarching aim of conservation is to preserve biodiversity patterns and processes. To this end, protected areas are established to buffer species and preserve biodiversity processes. But resources are limited and so protected areas must be cost-effective. This package contains tools to generate plans for protected areas (prioritizations), using spatially explicit targets for biodiversity patterns and processes. To obtain solutions in a feasible amount of time, this package uses the commercial 'Gurobi' software (obtained from <https://www.gurobi.com/>). For more information on using this package, see Hanson et al. (2018) <doi:10.1111/2041-210X.12862>.

Maintained by Jeffrey O Hanson. Last updated 1 years ago.

cpp

17.3 match 8 stars 5.52 score 83 scripts

reconhub

epicontacts:Handling, Visualisation and Analysis of Epidemiological Contacts

A collection of tools for representing epidemiological contact data, composed of case line lists and contacts between cases. Also contains procedures for data handling, interactive graphics, and statistics.

Maintained by Finlay Campbell. Last updated 2 months ago.

outbreak

10.7 match 15 stars 8.86 score 112 scripts 2 dependents

adamlilith

fasterRaster:Faster Raster and Spatial Vector Processing Using 'GRASS GIS'

Processing of large-in-memory/large-on-disk rasters and spatial vectors using 'GRASS GIS' <https://grass.osgeo.org/>. Most functions in the 'terra' package are recreated. Processing of medium-sized and smaller spatial objects will nearly always be faster using 'terra' or 'sf', but for large-in-memory/large-on-disk objects, 'fasterRaster' may be faster. To use most of the functions, you must have the stand-alone version (not the 'OSGeo4W' installer version) of 'GRASS GIS' 8.0 or higher.

Maintained by Adam B. Smith. Last updated 21 days ago.

aspect distance fragmentation fragmentation-indices gis grass grass-gis raster raster-projection rasterize slope topography vectorization

12.3 match 58 stars 7.69 score 8 scripts

mschubert

narray:Subset- And Name-Aware Array Utility Functions

Stacking arrays according to dimension names, subset-aware splitting and mapping of functions, intersecting along arbitrary dimensions, converting to and from data.frames, and many other helper functions.

Maintained by Michael Schubert. Last updated 2 months ago.

array utility cpp

13.6 match 27 stars 6.91 score 10 scripts 10 dependents

bioc

systemPipeR:systemPipeR: Workflow Environment for Data Analysis and Report Generation

systemPipeR is a multipurpose data analysis workflow environment that unifies R with command-line tools. It enables scientists to analyze many types of large- or small-scale data on local or distributed computer systems with a high level of reproducibility, scalability and portability. At its core is a command-line interface (CLI) that adopts the Common Workflow Language (CWL). This design allows users to choose for each analysis step the optimal R or command-line software. It supports both end-to-end and partial execution of workflows with built-in restart functionalities. Efficient management of complex analysis tasks is accomplished by a flexible workflow control container class. Handling of large numbers of input samples and experimental designs is facilitated by consistent sample annotation mechanisms. As a multi-purpose workflow toolkit, systemPipeR enables users to run existing workflows, customize them or design entirely new ones while taking advantage of widely adopted data structures within the Bioconductor ecosystem. Another important core functionality is the generation of reproducible scientific analysis and technical reports. For result interpretation, systemPipeR offers a wide range of plotting functionality, while an associated Shiny App offers many useful functionalities for interactive result exploration. The vignettes linked from this page include (1) a general introduction, (2) a description of technical details, and (3) a collection of workflow templates.

Maintained by Thomas Girke. Last updated 5 months ago.

genetics infrastructure dataimport sequencing rnaseq riboseq chipseq methylseq snp geneexpression coverage genesetenrichment alignment qualitycontrol immunooncology reportwriting workflowstep workflowmanagement

8.0 match 53 stars 11.56 score 344 scripts 3 dependents

satijalab

Seurat:Tools for Single Cell Genomics

A toolkit for quality control, analysis, and exploration of single cell RNA sequencing data. 'Seurat' aims to enable users to identify and interpret sources of heterogeneity from single cell transcriptomic measurements, and to integrate diverse types of single cell data. See Satija R, Farrell J, Gennert D, et al (2015) <doi:10.1038/nbt.3192>, Macosko E, Basu A, Satija R, et al (2015) <doi:10.1016/j.cell.2015.05.002>, Stuart T, Butler A, et al (2019) <doi:10.1016/j.cell.2019.05.031>, and Hao, Hao, et al (2020) <doi:10.1101/2020.10.12.335331> for more details.

Maintained by Paul Hoffman. Last updated 1 years ago.

human-cell-atlas single-cell-genomics single-cell-rna-seq cpp

5.5 match 2.4k stars 16.86 score 50k scripts 73 dependents

fishr-core-team

FSA:Simple Fisheries Stock Assessment Methods

A variety of simple fish stock assessment methods.

Maintained by Derek H. Ogle. Last updated 2 months ago.

fish fisheries fisheries-management fisheries-stock-assessment population-dynamics stock-assessment

8.3 match 68 stars 11.08 score 1.7k scripts 6 dependents

stan-dev

posterior:Tools for Working with Posterior Distributions

Provides useful tools for both users and developers of packages for fitting Bayesian models or working with output from Bayesian models. The primary goals of the package are to: (a) Efficiently convert between many different useful formats of draws (samples) from posterior or prior distributions. (b) Provide consistent methods for operations commonly performed on draws, for example, subsetting, binding, or mutating draws. (c) Provide various summaries of draws in convenient formats. (d) Provide lightweight implementations of state of the art posterior inference diagnostics. References: Vehtari et al. (2021) <doi:10.1214/20-BA1221>.

Maintained by Paul-Christian Bürkner. Last updated 12 days ago.

bayes bayesian mcmc

5.7 match 168 stars 16.13 score 3.3k scripts 342 dependents

usepa

tcpl:ToxCast Data Analysis Pipeline

The ToxCast Data Analysis Pipeline ('tcpl') is an R package that manages, curve-fits, plots, and stores ToxCast data to populate its linked MySQL database, 'invitrodb'. The package was developed for the chemical screening data curated by the US EPA's Toxicity Forecaster (ToxCast) program, but 'tcpl' can be used to support diverse chemical screening efforts.

Maintained by Jason Brown. Last updated 5 days ago.

ccte comptox ord

9.7 match 36 stars 9.41 score 90 scripts

bioc

ChemmineR:Cheminformatics Toolkit for R

ChemmineR is a cheminformatics package for analyzing drug-like small molecule data in R. Its latest version contains functions for efficient processing of large numbers of molecules, physicochemical/structural property predictions, structural similarity searching, classification and clustering of compound libraries with a wide spectrum of algorithms. In addition, it offers visualization functions for compound clustering results and chemical structures.

Maintained by Thomas Girke. Last updated 5 months ago.

cheminformatics biomedicalinformatics pharmacogenetics pharmacogenomics microtitreplateassay cellbasedassays visualization infrastructure dataimport clustering proteomics metabolomics cpp

9.6 match 15 stars 9.45 score 253 scripts 12 dependents

amices

mice:Multivariate Imputation by Chained Equations

Multiple imputation using Fully Conditional Specification (FCS) implemented by the MICE algorithm as described in Van Buuren and Groothuis-Oudshoorn (2011) <doi:10.18637/jss.v045.i03>. Each variable has its own imputation model. Built-in imputation models are provided for continuous data (predictive mean matching, normal), binary data (logistic regression), unordered categorical data (polytomous logistic regression) and ordered categorical data (proportional odds). MICE can also impute continuous two-level data (normal model, pan, second-level variables). Passive imputation can be used to maintain consistency between variables. Various diagnostic plots are available to inspect the quality of the imputations.

Maintained by Stef van Buuren. Last updated 8 days ago.

chained-equations fcs imputation mice missing-data missing-values multiple-imputation multivariate-data cpp

5.5 match 462 stars 16.50 score 10k scripts 154 dependents

bioc

BASiCS:Bayesian Analysis of Single-Cell Sequencing data

Single-cell mRNA sequencing can uncover novel cell-to-cell heterogeneity in gene expression levels in seemingly homogeneous populations of cells. However, these experiments are prone to high levels of technical noise, creating new challenges for identifying genes that show genuine heterogeneous expression within the population of cells under study. BASiCS (Bayesian Analysis of Single-Cell Sequencing data) is an integrated Bayesian hierarchical model to perform statistical analyses of single-cell RNA sequencing datasets in the context of supervised experiments (where the groups of cells of interest are known a priori, e.g. experimental conditions or cell types). BASiCS performs built-in data normalisation (global scaling) and technical noise quantification (based on spike-in genes). BASiCS provides an intuitive detection criterion for highly (or lowly) variable genes within a single group of cells. Additionally, BASiCS can compare gene expression patterns between two or more pre-specified groups of cells. Unlike traditional differential expression tools, BASiCS quantifies changes in expression that lie beyond comparisons of means, also allowing the study of changes in cell-to-cell heterogeneity. The latter can be quantified via a biological over-dispersion parameter that measures the excess of variability that is observed with respect to Poisson sampling noise, after normalisation and technical noise removal. Due to the strong mean/over-dispersion confounding that is typically observed for scRNA-seq datasets, BASiCS also tests for changes in residual over-dispersion, defined by residual values with respect to a global mean/over-dispersion trend.

Maintained by Catalina Vallejos. Last updated 5 months ago.

immunooncology normalization sequencing rnaseq software geneexpression transcriptomics singlecell differentialexpression bayesian cellbiology bioconductor-package gene-expression rcpp rcpparmadillo scrna-seq single-cell openblas cpp openmp

8.7 match 83 stars 10.26 score 368 scripts 1 dependents

asardaes

dtwclust:Time Series Clustering Along with Optimizations for the Dynamic Time Warping Distance

Time series clustering along with optimized techniques related to the Dynamic Time Warping distance and its corresponding lower bounds. Implementations of partitional, hierarchical, fuzzy, k-Shape and TADPole clustering are available. Functionality can be easily extended with custom distance measures and centroid definitions. Implementations of DTW barycenter averaging, a distance based on global alignment kernels, and the soft-DTW distance and centroid routines are also provided. All included distance functions have custom loops optimized for the calculation of cross-distance matrices, including parallelization support. Several cluster validity indices are included.

Maintained by Alexis Sarda. Last updated 8 months ago.

clustering dtw time-series openblas cpp

7.2 match 261 stars 12.39 score 406 scripts 14 dependents

briencj

growthPheno:Functional Analysis of Phenotypic Growth Data to Smooth and Extract Traits

Assists in the plotting and functional smoothing of traits measured over time and the extraction of features from these traits, implementing the SET (Smoothing and Extraction of Traits) method described in Brien et al. (2020) Plant Methods, 16. Smoothing of growth trends for individual plants using natural cubic smoothing splines or P-splines is available for removing transient effects and segmented smoothing is available to deal with discontinuities in growth trends. There are graphical tools for assessing the adequacy of trait smoothing, both when using this and other packages, such as those that fit nonlinear growth models. A range of per-unit (plant, pot, plot) growth traits or features can be extracted from the data, including single time points, interval growth rates and other growth statistics, such as maximum growth or days to maximum growth. The package also has tools adapted to inputting data from high-throughput phenotyping facilities, such as from a Lemna-Tec Scananalyzer 3D (see <https://www.youtube.com/watch?v=MRAF_mAEa7E/> for more information). The package 'growthPheno' can also be installed from <http://chris.brien.name/rpackages/>.

Maintained by Chris Brien. Last updated 2 days ago.

13.2 match 6 stars 6.66 score 42 scripts

bioc

Cardinal:A mass spectrometry imaging toolbox for statistical analysis

Implements statistical & computational tools for analyzing mass spectrometry imaging datasets, including methods for efficient pre-processing, spatial segmentation, and classification.

Maintained by Kylie Ariel Bemis. Last updated 3 months ago.

software infrastructure proteomics lipidomics massspectrometry imagingmassspectrometry immunooncology normalization clustering classification regression

8.5 match 47 stars 10.34 score 200 scripts

bioc

iNETgrate:Integrates DNA methylation data with gene expression in a single gene network

The iNETgrate package provides functions to build a correlation network in which nodes are genes. DNA methylation and gene expression data are integrated to define the connections between genes. This network is used to identify modules (clusters) of genes. The biological information in each of the resulting modules is represented by an eigengene. These biological signatures can be used as features, e.g., for classification of patients into risk categories. The resulting biological signatures are very robust and give a holistic view of the underlying molecular changes.

Maintained by Habil Zare. Last updated 5 months ago.

geneexpression rnaseq dnamethylation networkinference network graphandnetwork biomedicalinformatics systemsbiology transcriptomics classification clustering dimensionreduction principalcomponent mrnamicroarray normalization geneprediction kegg survival core-services

14.2 match 74 stars 6.21 score 1 scripts

alexanderrobitzsch

CDM:Cognitive Diagnosis Modeling

Functions for cognitive diagnosis modeling and multidimensional item response modeling for dichotomous and polytomous item responses. This package enables the estimation of the DINA and DINO model (Junker & Sijtsma, 2001, <doi:10.1177/01466210122032064>), the multiple group (polytomous) GDINA model (de la Torre, 2011, <doi:10.1007/s11336-011-9207-7>), the multiple choice DINA model (de la Torre, 2009, <doi:10.1177/0146621608320523>), the general diagnostic model (GDM; von Davier, 2008, <doi:10.1348/000711007X193957>), the structured latent class model (SLCA; Formann, 1992, <doi:10.1080/01621459.1992.10475229>) and regularized latent class analysis (Chen, Li, Liu, & Ying, 2017, <doi:10.1007/s11336-016-9545-6>). See George, Robitzsch, Kiefer, Gross, and Uenlue (2017) <doi:10.18637/jss.v074.i02> or Robitzsch and George (2019, <doi:10.1007/978-3-030-05584-4_26>) for further details on estimation and the package structure. For tutorials on how to use the CDM package see George and Robitzsch (2015, <doi:10.20982/tqmp.11.3.p189>) as well as Ravand and Robitzsch (2015).

Maintained by Alexander Robitzsch. Last updated 9 months ago.

cognitive-diagnostic-models item-response-theory cpp

10.0 match 22 stars 8.76 score 138 scripts 28 dependents

renkun-ken

rlist:A Toolbox for Non-Tabular Data Manipulation

Provides a set of functions for data manipulation with list objects, including mapping, filtering, grouping, sorting, updating, searching, and other useful functions. Most functions are designed to be pipeline friendly so that data processing with lists can be chained.

Maintained by Kun Ren. Last updated 2 years ago.

6.4 match 206 stars 13.73 score 2.2k scripts 123 dependents

insightsengineering

rtables:Reporting Tables

Reporting tables often have structure that goes beyond simple rectangular data. The 'rtables' package provides a framework for declaring complex multi-level tabulations and then applying them to data. This framework models both tabulation and the resulting tables as hierarchical, tree-like objects which support sibling sub-tables, arbitrary splitting or grouping of data in row and column dimensions, cells containing multiple values, and the concept of contextual summary computations. A convenient pipe-able interface is provided for declaring table layouts and the corresponding computations, and then applying them to data.

Maintained by Joe Zhu. Last updated 2 months ago.

pharmaceuticals tables

6.3 match 232 stars 13.65 score 238 scripts 17 dependents

bioc

flowCore:flowCore: Basic structures for flow cytometry data

Provides S4 data structures and basic functions to deal with flow cytometry data.

Maintained by Mike Jiang. Last updated 5 months ago.

immunooncology infrastructure flowcytometry cellbasedassays cpp

8.3 match 10.34 score 1.7k scripts 59 dependents

tomasfryda

h2o:R Interface for the 'H2O' Scalable Machine Learning Platform

R interface for 'H2O', the scalable open source machine learning platform that offers parallelized implementations of many supervised and unsupervised machine learning algorithms such as Generalized Linear Models (GLM), Gradient Boosting Machines (including XGBoost), Random Forests, Deep Neural Networks (Deep Learning), Stacked Ensembles, Naive Bayes, Generalized Additive Models (GAM), ANOVA GLM, Cox Proportional Hazards, K-Means, PCA, ModelSelection, Word2Vec, as well as a fully automatic machine learning algorithm (H2O AutoML).

Maintained by Tomas Fryda. Last updated 1 years ago.

10.3 match 3 stars 8.20 score 7.8k scripts 11 dependents

bioc

GenomicSuperSignature:Interpretation of RNA-seq experiments through robust, efficient comparison to public databases

This package provides a novel method for interpreting new transcriptomic datasets through near-instantaneous comparison to public archives without high-performance computing requirements. Through the pre-computed index, users can identify public resources associated with their dataset such as gene sets, MeSH terms, and publications. Functions to identify interpretable annotations and intuitive visualization options are implemented in this package.

Maintained by Sehyun Oh. Last updated 5 months ago.

transcriptomics systemsbiology principalcomponent rnaseq sequencing pathways clustering bioconductor-package exploratory-data-analysis gsea mesh principal-component-analysis rna-sequencing-profiles transferlearning

12.0 match 16 stars 6.97 score 59 scripts

ouhscbbmc

REDCapR:Interaction Between R and REDCap

Encapsulates functions to streamline calls from R to the REDCap API. REDCap (Research Electronic Data CAPture) is a web application for building and managing online surveys and databases developed at Vanderbilt University. The Application Programming Interface (API) offers an avenue to access and modify data programmatically, improving the capacity for literate and reproducible programming.

Maintained by Will Beasley. Last updated 2 months ago.

redcap redcap-api

6.7 match 118 stars 12.36 score 438 scripts 6 dependents

bioc

pathwayPCA:Integrative Pathway Analysis with Modern PCA Methodology and Gene Selection

pathwayPCA is an integrative analysis tool that implements the principal component analysis (PCA) based pathway analysis approaches described in Chen et al. (2008), Chen et al. (2010), and Chen (2011). pathwayPCA allows users to: (1) Test pathway association with binary, continuous, or survival phenotypes. (2) Extract relevant genes in the pathways using the SuperPCA and AES-PCA approaches. (3) Compute principal components (PCs) based on the selected genes. These estimated latent variables represent pathway activities for individual subjects, which can then be used to perform integrative pathway analysis, such as multi-omics analysis. (4) Extract relevant genes that drive pathway significance as well as data corresponding to these relevant genes for additional in-depth analysis. (5) Perform analyses with enhanced computational efficiency with parallel computing and enhanced data safety with S4-class data objects. (6) Analyze studies with complex experimental designs, with multiple covariates, and with interaction effects, e.g., testing whether pathway association with clinical phenotype is different between male and female subjects. Citations: Chen et al. (2008) <https://doi.org/10.1093/bioinformatics/btn458>; Chen et al. (2010) <https://doi.org/10.1002/gepi.20532>; and Chen (2011) <https://doi.org/10.2202/1544-6115.1697>.

Maintained by Gabriel Odom. Last updated 5 months ago.

copynumbervariation dnamethylation geneexpression snp transcription geneprediction genesetenrichment genesignaling genetarget genomewideassociation genomicvariation cellbiology epigenetics functionalgenomics genetics lipidomics metabolomics proteomics systemsbiology transcriptomics classification dimensionreduction featureextraction principalcomponent regression survival multiplecomparison pathways

10.6 match 11 stars 7.74 score 42 scripts

rvlenth

emmeans:Estimated Marginal Means, aka Least-Squares Means

Obtain estimated marginal means (EMMs) for many linear, generalized linear, and mixed models. Compute contrasts or linear functions of EMMs, trends, and comparisons of slopes. Plots and other displays. Least-squares means are discussed, and the term "estimated marginal means" is suggested, in Searle, Speed, and Milliken (1980) Population marginal means in the linear model: An alternative to least squares means, The American Statistician 34(4), 216-221 <doi:10.1080/00031305.1980.10483031>.

Maintained by Russell V. Lenth. Last updated 5 days ago.

4.3 match 377 stars 19.19 score 13k scripts 187 dependents

fmichonneau

phylobase:Base Package for Phylogenetic Structures and Comparative Data

Provides a base S4 class for comparative methods, incorporating one or more trees and trait data.

Maintained by Francois Michonneau. Last updated 1 years ago.

phylogenetics cpp

7.2 match 18 stars 11.14 score 394 scripts 18 dependents

bioc

dominoSignal:Cell Communication Analysis for Single Cell RNA Sequencing

dominoSignal is a package developed to analyze cell signaling through ligand - receptor - transcription factor networks in scRNAseq data. It takes as input transcriptomic data, requiring counts, z-scored counts, and cluster labels, as well as information on transcription factor activation (such as from SCENIC) and a database of ligand and receptor pairings (such as from CellPhoneDB). This package creates an object storing ligand - receptor - transcription factor linkages by cluster and provides several methods for exploring, summarizing, and visualizing the analysis.

Maintained by Jacob T Mitchell. Last updated 5 months ago.

systemsbiology singlecell transcriptomics network

12.3 match 5 stars 6.50 score 5 scripts

bioc

CatsCradle:This package provides methods for analysing spatial transcriptomics data and for discovering gene clusters

This package addresses two broad areas. It allows for in-depth analysis of spatial transcriptomic data by identifying tissue neighbourhoods. These are contiguous regions of tissue surrounding individual cells. 'CatsCradle' allows for the categorisation of neighbourhoods by the cell types contained in them and the genes expressed in them. In particular, it produces Seurat objects whose individual elements are neighbourhoods rather than cells. In addition, it enables the categorisation and annotation of genes by producing Seurat objects whose elements are genes.

Maintained by Michael Shapiro. Last updated 1 days ago.

biologicalquestion statisticalmethod geneexpression singlecell transcriptomics spatial

12.1 match 3 stars 6.52 score

bioc

SingleCellExperiment:S4 Classes for Single Cell Data

Defines an S4 class for storing data from single-cell experiments. This includes specialized methods to store and retrieve spike-in information, dimensionality reduction coordinates and size factors for each cell, along with the usual metadata for genes and libraries.

Maintained by Davide Risso. Last updated 11 days ago.

immunooncology datarepresentation dataimport infrastructure singlecell

5.8 match 13.53 score 15k scripts 285 dependents

murrayefford

secr:Spatially Explicit Capture-Recapture

Functions to estimate the density and size of a spatially distributed animal population sampled with an array of passive detectors, such as traps, or by searching polygons or transects. Models incorporating distance-dependent detection are fitted by maximizing the likelihood. Tools are included for data manipulation and model selection.

Maintained by Murray Efford. Last updated 21 hours ago.

cpp

7.7 match 3 stars 10.13 score 410 scripts 5 dependents

alarm-redist

redist:Simulation Methods for Legislative Redistricting

Enables researchers to sample redistricting plans from a pre-specified target distribution using Sequential Monte Carlo and Markov Chain Monte Carlo algorithms. The package allows for the implementation of various constraints in the redistricting process such as geographic compactness and population parity requirements. Tools for analysis such as computation of various summary statistics and plotting functionality are also included. The package implements the SMC algorithm of McCartan and Imai (2023) <doi:10.1214/23-AOAS1763>, the enumeration algorithm of Fifield, Imai, Kawahara, and Kenny (2020) <doi:10.1080/2330443X.2020.1791773>, the Flip MCMC algorithm of Fifield, Higgins, Imai and Tarr (2020) <doi:10.1080/10618600.2020.1739532>, the Merge-split/Recombination algorithms of Carter et al. (2019) <arXiv:1911.01503> and DeFord et al. (2021) <doi:10.1162/99608f92.eb30390f>, and the Short-burst optimization algorithm of Cannon et al. (2020) <arXiv:2011.02288>.

Maintained by Christopher T. Kenny. Last updated 2 months ago.

geospatial gerrymandering redistricting sampling openblas cpp openmp

8.4 match 68 stars 9.17 score 259 scripts

joshuaulrich

xts:eXtensible Time Series

Provide for uniform handling of R's different time-based data classes by extending zoo, maximizing native format information preservation and allowing for user level customization and extension, while simplifying cross-class interoperability.

Maintained by Joshua M. Ulrich. Last updated 4 months ago.

c time-series

4.1 match 221 stars 18.38 score 12k scripts 654 dependents

rje42

rje:Miscellaneous Useful Functions for Statistics

A series of functions in some way considered useful to the author. These include methods for subsetting tables and generating indices for arrays, conditioning and intervening in probability distributions, generating combinations, fast transformations, and more...

Maintained by Robin Evans. Last updated 12 months ago.

11.6 match 6.50 score 173 scripts 10 dependents

tanaylab

naryn:Native Access Medical Record Retriever for High Yield Analytics

A toolkit for medical records data analysis. The 'naryn' package implements an efficient data structure for storing medical records, and provides a set of functions for data extraction, manipulation and analysis.

Maintained by Aviezer Lifshitz. Last updated 1 day ago.

data-analysis medical-records cpp

13.9 match 3 stars 5.38 score 4 scripts

talgalili

dendextend:Extending 'dendrogram' Functionality in R

Offers a set of functions for extending 'dendrogram' objects in R, letting you visualize and compare trees of 'hierarchical clusterings'. You can (1) Adjust a tree's graphical parameters - the color, size, type, etc of its branches, nodes and labels. (2) Visually and statistically compare different 'dendrograms' to one another.

Maintained by Tal Galili. Last updated 2 months ago.

4.3 match 154 stars 17.02 score 6.0k scripts 164 dependents

briencj

asremlPlus:Augments 'ASReml-R' in Fitting Mixed Models and Packages Generally in Exploring Prediction Differences

Assists in automating the selection of terms to include in mixed models when 'asreml' is used to fit the models. Procedures are available for choosing models that conform to the hierarchy or marginality principle, for fitting and choosing between two-dimensional spatial models using correlation, natural cubic smoothing spline and P-spline models. A history of the fitting of a sequence of models is kept in a data frame. Also used to compute functions and contrasts of, to investigate differences between and to plot predictions obtained using any model fitting function. The content falls into the following natural groupings: (i) Data, (ii) Model modification functions, (iii) Model selection and description functions, (iv) Model diagnostics and simulation functions, (v) Prediction production and presentation functions, (vi) Response transformation functions, (vii) Object manipulation functions, and (viii) Miscellaneous functions (for further details see 'asremlPlus-package' in help). The 'asreml' package provides a computationally efficient algorithm for fitting a wide range of linear mixed models using Residual Maximum Likelihood. It is a commercial package and a license for it can be purchased from 'VSNi' <https://vsni.co.uk/> as 'asreml-R', who will supply a zip file for local installation/updating (see <https://asreml.kb.vsni.co.uk/>). It is not needed for functions that are methods for 'alldiffs' and 'data.frame' objects. The package 'asremlPlus' can also be installed from <http://chris.brien.name/rpackages/>.

Maintained by Chris Brien. Last updated 30 days ago.

asreml mixed-models

7.9 match 19 stars 9.34 score 200 scripts

bioc

cola:A Framework for Consensus Partitioning

Subgroup classification is a basic task in genomic data analysis, especially for gene expression and DNA methylation data analysis. It can also be used to test the agreement to known clinical annotations, or to test whether there exist significant batch effects. The cola package provides a general framework for subgroup classification by consensus partitioning. It has the following features: 1. It modularizes the consensus partitioning processes so that various methods can be easily integrated. 2. It provides rich visualizations for interpreting the results. 3. It allows running multiple methods at the same time and provides functionalities to straightforwardly compare results. 4. It provides a new method to extract features which are more efficient to separate subgroups. 5. It automatically generates detailed reports for the complete analysis. 6. It allows applying consensus partitioning in a hierarchical manner.

Maintained by Zuguang Gu. Last updated 1 month ago.

clustering geneexpression classification software consensus-clustering cpp

9.9 match 61 stars 7.49 score 112 scripts

r-spatial

stars:Spatiotemporal Arrays, Raster and Vector Data Cubes

Reading, manipulating, writing and plotting spatiotemporal arrays (raster and vector data cubes) in 'R', using 'GDAL' bindings provided by 'sf', and 'NetCDF' bindings by 'ncmeta' and 'RNetCDF'.

Maintained by Edzer Pebesma. Last updated 1 month ago.

raster satellite-images spatial

4.0 match 571 stars 18.27 score 7.2k scripts 137 dependents

tagteam

prodlim:Product-Limit Estimation for Censored Event History Analysis

Fast and user friendly implementation of nonparametric estimators for censored event history (survival) analysis. Kaplan-Meier and Aalen-Johansen method.

Maintained by Thomas A. Gerds. Last updated 14 days ago.

6.0 match 7 stars 12.18 score 1000 scripts 462 dependents

plangfelder

WGCNA:Weighted Correlation Network Analysis

Functions necessary to perform Weighted Correlation Network Analysis on high-dimensional data as originally described in Horvath and Zhang (2005) <doi:10.2202/1544-6115.1128> and Langfelder and Horvath (2008) <doi:10.1186/1471-2105-9-559>. Includes functions for rudimentary data cleaning, construction of correlation networks, module identification, summarization, and relating of variables and modules to sample traits. Also includes a number of utility functions for data manipulation and visualization.

Maintained by Peter Langfelder. Last updated 6 months ago.

cpp

7.5 match 54 stars 9.65 score 5.3k scripts 32 dependents

dieghernan

tidyterra:'tidyverse' Methods and 'ggplot2' Helpers for 'terra' Objects

Extension of the 'tidyverse' for 'SpatRaster' and 'SpatVector' objects of the 'terra' package. It also includes new 'geom_' functions that provide a convenient way of visualizing 'terra' objects with 'ggplot2'.

Maintained by Diego Hernangómez. Last updated 3 days ago.

terra ggplot-extension r-spatial rspatial

5.3 match 191 stars 13.62 score 1.9k scripts 25 dependents

jknowles

merTools:Tools for Analyzing Mixed Effect Regression Models

Provides methods for extracting results from mixed-effect model objects fit with the 'lme4' package. Allows construction of prediction intervals efficiently from large scale linear and generalized linear mixed-effects models. This method draws from the simulation framework used in the Gelman and Hill (2007) textbook: Data Analysis Using Regression and Multilevel/Hierarchical Models.

Maintained by Jared E. Knowles. Last updated 1 year ago.

6.9 match 105 stars 10.49 score 768 scripts

bioc

CoGAPS:Coordinated Gene Activity in Pattern Sets

Coordinated Gene Activity in Pattern Sets (CoGAPS) implements a Bayesian MCMC matrix factorization algorithm, GAPS, and links it to gene set statistic methods to infer biological process activity. It can be used to perform sparse matrix factorization on any data, and when this data represents biomolecules, to do gene set analysis.

Maintained by Elana J. Fertig. Last updated 5 months ago.

geneexpression transcription genesetenrichment differentialexpression bayesian clustering timecourse rnaseq microarray multiplecomparison dimensionreduction immunooncology cpp

10.7 match 6.72 score 104 scripts

stuart-lab

Signac:Analysis of Single-Cell Chromatin Data

A framework for the analysis and exploration of single-cell chromatin data. The 'Signac' package contains functions for quantifying single-cell chromatin data, computing per-cell quality control metrics, dimension reduction and normalization, visualization, and DNA sequence motif analysis. Reference: Stuart et al. (2021) <doi:10.1038/s41592-021-01282-5>.

Maintained by Tim Stuart. Last updated 7 months ago.

atac bioinformatics single-cell zlib cpp

5.9 match 349 stars 12.19 score 3.7k scripts 1 dependent

bioc

ballgown:Flexible, isoform-level differential expression analysis

Tools for statistical analysis of assembled transcriptomes, including flexible differential expression analysis, visualization of transcript structures, and matching of assembled transcripts to annotation.

Maintained by Jack Fu. Last updated 5 months ago.

immunooncology rnaseq statisticalmethod preprocessing differentialexpression

6.8 match 146 stars 10.51 score 338 scripts 1 dependent

bioc

maftools:Summarize, Analyze and Visualize MAF Files

Analyze and visualize Mutation Annotation Format (MAF) files from large scale sequencing studies. This package provides various functions to perform the most commonly used analyses in cancer genomics and to create feature-rich, customizable visualizations with minimal effort.

Maintained by Anand Mayakonda. Last updated 5 months ago.

datarepresentation dnaseq visualization drivermutation variantannotation featureextraction classification somaticmutation sequencing functionalgenomics survival bioinformatics cancer-genome-atlas cancer-genomics genomics maf-files tcga curl bzip2 xz-utils zlib

4.8 match 459 stars 14.63 score 948 scripts 18 dependents

bioc

BiocGenerics:S4 generic functions used in Bioconductor

The package defines many S4 generic functions used in Bioconductor.

Maintained by Hervé Pagès. Last updated 1 month ago.

infrastructure bioconductor-package core-package

5.0 match 12 stars 14.22 score 612 scripts 2.2k dependents

pacificclimate

ncdf4.helpers:Helper Functions for Use with the 'ncdf4' Package

Contains a collection of helper functions for dealing with 'NetCDF' files <https://www.unidata.ucar.edu/software/netcdf/> opened using 'ncdf4', particularly 'NetCDF' files that conform to the Climate and Forecast (CF) Metadata Conventions <http://cfconventions.org/Data/cf-conventions/cf-conventions-1.7/cf-conventions.html>.

Maintained by Lee Zeman. Last updated 1 day ago.

10.7 match 5 stars 6.55 score 236 scripts 1 dependent

bioc

SparseArray:High-performance sparse data representation and manipulation in R

The SparseArray package provides array-like containers for efficient in-memory representation of multidimensional sparse data in R (arrays and matrices). The package defines the SparseArray virtual class and two concrete subclasses: COO_SparseArray and SVT_SparseArray. Each subclass uses its own internal representation of the nonzero multidimensional data: the "COO layout" and the "SVT layout", respectively. SVT_SparseArray objects mimic as much as possible the behavior of ordinary matrix and array objects in base R. In particular, they support most of the "standard matrix and array API" defined in base R and in the matrixStats package from CRAN.

Maintained by Hervé Pagès. Last updated 26 days ago.

infrastructure datarepresentation bioconductor-package core-package openmp

5.5 match 8 stars 12.68 score 79 scripts 1.2k dependents

dmmelamed

rioplot:Turn a Regression Model Inside Out

Turns regression models inside out. Functions decompose variances and coefficients for various regression model types. Functions also visualize regression model objects using techniques developed in Schoon, Melamed, and Breiger (2024) <doi:10.1017/9781108887205>.

Maintained by David Melamed. Last updated 4 months ago.

17.0 match 4.08 score 9 scripts

solivella

lda:Collapsed Gibbs Sampling Methods for Topic Models

Implements latent Dirichlet allocation (LDA) and related models. This includes (but is not limited to) sLDA, corrLDA, and the mixed-membership stochastic blockmodel. Inference for all of these models is implemented via a fast collapsed Gibbs sampler written in C. Utility functions for reading/writing data typically used in topic models, as well as tools for examining posterior distributions are also included.

Maintained by Santiago Olivella. Last updated 11 months ago.

9.0 match 7.62 score 548 scripts 11 dependents

bioc

cytomapper:Visualization of highly multiplexed imaging data in R

Highly multiplexed imaging acquires the single-cell expression of selected proteins in a spatially-resolved fashion. These measurements can be visualised across multiple length-scales. First, pixel-level intensities represent the spatial distributions of feature expression with highest resolution. Second, after segmentation, expression values or cell-level metadata (e.g. cell-type information) can be visualised on segmented cell areas. This package contains functions for the visualisation of multiplexed read-outs and cell-level information obtained by multiplexed imaging technologies. The main functions of this package allow 1. the visualisation of pixel-level information across multiple channels, 2. the display of cell-level information (expression and/or metadata) on segmentation masks and 3. gating and visualisation of single cells.

Maintained by Lasse Meyer. Last updated 5 months ago.

immunooncology software singlecell onechannel twochannel multiplecomparison normalization dataimport bioimaging imaging-mass-cytometry single-cell spatial-analysis

7.1 match 32 stars 9.61 score 354 scripts 5 dependents

muschellij2

neurobase:'Neuroconductor' Base Package with Helper Functions for 'nifti' Objects

Base package for 'Neuroconductor', which includes many helper functions that interact with objects of class 'nifti', implemented by package 'oro.nifti', for reading/writing and also other manipulation functions.

Maintained by John Muschelli. Last updated 1 month ago.

8.0 match 5 stars 8.49 score 486 scripts 7 dependents

bioc

HPAanalyze:Retrieve and analyze data from the Human Protein Atlas

Provides functions for retrieving, exploring and visualizing the Human Protein Atlas data.

Maintained by Anh Nhat Tran. Last updated 5 months ago.

proteomics cellbiology visualization software

9.1 match 35 stars 7.43 score 37 scripts

bioc

ncdfFlow:ncdfFlow: A package that provides HDF5 based storage for flow cytometry data.

Provides HDF5 storage based methods and functions for manipulation of flow cytometry data.

Maintained by Mike Jiang. Last updated 2 months ago.

immunooncology flowcytometry zlib cpp

8.9 match 7.56 score 96 scripts 11 dependents

jacobseedorff21

BranchGLM:Efficient Best Subset Selection for GLMs via Branch and Bound Algorithms

Performs efficient and scalable glm best subset selection using a novel implementation of a branch and bound algorithm. To speed up the model fitting process, a range of optimization methods are implemented in 'RcppArmadillo'. Parallel computation is available using 'OpenMP'.

Maintained by Jacob Seedorff. Last updated 6 months ago.

generalized-linear-models regression statistics subset-selection variable-selection openblas cpp openmp

10.8 match 7 stars 6.20 score 30 scripts

momx

Momocs:Morphometrics using R

The goal of 'Momocs' is to provide a complete, convenient, reproducible and open-source toolkit for 2D morphometrics. It includes most common 2D morphometrics approaches on outlines, open outlines, configurations of landmarks, traditional morphometrics, and facilities for data preparation, manipulation and visualization with a consistent grammar throughout. It allows reproducible, complex morphometrics analyses, and other morphometrics approaches should be easy to plug in, or develop from, on top of this canvas.

Maintained by Vincent Bonhomme. Last updated 1 year ago.

morphometrics

8.9 match 51 stars 7.42 score 346 scripts

walkerke

tigris:Load Census TIGER/Line Shapefiles

Download TIGER/Line shapefiles from the United States Census Bureau (<https://www.census.gov/geographies/mapping-files/time-series/geo/tiger-line-file.html>) and load into R as 'sf' objects.

Maintained by Kyle Walker. Last updated 4 months ago.

5.0 match 331 stars 12.87 score 5.3k scripts 16 dependents

ramiromagno

gwasrapidd:'REST' 'API' Client for the 'NHGRI'-'EBI' 'GWAS' Catalog

'GWAS' R 'API' Data Download. This package provides easy access to the 'NHGRI'-'EBI' 'GWAS' Catalog data by accessing the 'REST' 'API' <https://www.ebi.ac.uk/gwas/rest/docs/api/>.

Maintained by Ramiro Magno. Last updated 1 year ago.

thirdpartyclient biomedicalinformatics genomewideassociation snp association-studies gwas-catalog human rest-client trait trait-ontology

8.0 match 95 stars 8.10 score 49 scripts 1 dependent

bluefoxr

COINr:Composite Indicator Construction and Analysis

A comprehensive high-level package for composite indicator construction and analysis. It is a "development environment" for composite indicators and scoreboards, which includes utilities for construction (indicator selection, denomination, imputation, data treatment, normalisation, weighting and aggregation) and analysis (multivariate analysis, correlation plotting, short cuts for principal component analysis, global sensitivity analysis, and more). A composite indicator is completely encapsulated inside a single hierarchical list called a "coin". This allows a fast and efficient work flow, as well as making quick copies, testing methodological variations and making comparisons. It also includes many plotting options, both statistical (scatter plots, distribution plots) and for presenting results.

Maintained by William Becker. Last updated 2 months ago.

7.1 match 26 stars 9.07 score 73 scripts 1 dependent

bioc

RankProd:Rank Product method for identifying differentially expressed genes with application in meta-analysis

Non-parametric method for identifying differentially expressed (up- or down- regulated) genes based on the estimated percentage of false predictions (pfp). The method can combine data sets from different origins (meta-analysis) to increase the power of the identification.

Maintained by Francesco Del Carratore. Last updated 5 months ago.

differentialexpression statisticalmethod software researchfield metabolomics lipidomics proteomics systemsbiology geneexpression microarray genesignaling

9.9 match 6.46 score 81 scripts 6 dependents

martin3141

spant:MR Spectroscopy Analysis Tools

Tools for reading, visualising and processing Magnetic Resonance Spectroscopy data. The package includes methods for spectral fitting: Wilson (2021) <DOI:10.1002/mrm.28385> and spectral alignment: Wilson (2018) <DOI:10.1002/mrm.27605>.

Maintained by Martin Wilson. Last updated 1 month ago.

brain mri mrs mrshub spectroscopy fortran

7.5 match 25 stars 8.52 score 81 scripts

r-forge

Matrix:Sparse and Dense Matrix Classes and Methods

A rich hierarchy of sparse and dense matrix classes, including general, symmetric, triangular, and diagonal matrices with numeric, logical, or pattern entries. Efficient methods for operating on such matrices, often wrapping the 'BLAS', 'LAPACK', and 'SuiteSparse' libraries.

Maintained by Martin Maechler. Last updated 9 days ago.

openblas

3.7 match 1 star 17.23 score 33k scripts 12k dependents

papatheodorou-group

scGOclust:Measuring Cell Type Similarity with Gene Ontology in Single-Cell RNA-Seq

Traditional methods for analyzing single cell RNA-seq datasets focus solely on gene expression, but this package introduces a novel approach that goes beyond this limitation. Using Gene Ontology terms as features, the package allows for the functional profile of cell populations, and comparison within and between datasets from the same or different species. Our approach enables the discovery of previously unrecognized functional similarities and differences between cell types and has demonstrated success in identifying cell types' functional correspondence even between evolutionarily distant species.

Maintained by Yuyao Song. Last updated 1 year ago.

13.1 match 9 stars 4.80 score 14 scripts

tidyverse

stringr:Simple, Consistent Wrappers for Common String Operations

A consistent, simple and easy to use set of wrappers around the fantastic 'stringi' package. All function and argument names (and positions) are consistent, all functions deal with "NA"'s and zero length vectors in the same way, and the output from one function is easy to feed into the input of another.

Maintained by Hadley Wickham. Last updated 7 months ago.

regular-expression strings

2.8 match 628 stars 21.99 score 164k scripts 8.3k dependents

mayoverse

arsenal:An Arsenal of 'R' Functions for Large-Scale Statistical Summaries

An Arsenal of 'R' functions for large-scale statistical summaries, which are streamlined to work within the latest reporting tools in 'R' and 'RStudio' and which use formulas and versatile summary statistics for summary tables and models. The primary functions include tableby(), a Table-1-like summary of multiple variable types 'by' the levels of one or more categorical variables; paired(), a Table-1-like summary of multiple variable types paired across two time points; modelsum(), which performs simple model fits on one or more endpoints for many variables (univariate or adjusted for covariates); freqlist(), a powerful frequency table across many categorical variables; comparedf(), a function for comparing data.frames; and write2(), a function to output tables to a document.

Maintained by Ethan Heinzen. Last updated 7 months ago.

baseline-characteristics descriptive-statistics modeling paired-comparisons reporting statistics tableone

4.6 match 225 stars 13.45 score 1.2k scripts 16 dependents

bioc

scRepertoire:A toolkit for single-cell immune receptor profiling

scRepertoire is a toolkit for processing and analyzing single-cell T-cell receptor (TCR) and immunoglobulin (Ig) data. The scRepertoire framework supports use of 10x, AIRR, BD, MiXCR, Omniscope, TRUST4, and WAT3R single-cell formats. The functionality includes basic clonal analyses, repertoire summaries, distance-based clustering and interaction with the popular Seurat and SingleCellExperiment/Bioconductor R workflows.

Maintained by Nick Borcherding. Last updated 2 months ago.

software immunooncology singlecell classification annotation sequencing cpp

5.9 match 326 stars 10.49 score 240 scripts

retowuest

autoMrP:Improving MrP with Ensemble Learning

A tool that improves the prediction performance of multilevel regression with post-stratification (MrP) by combining a number of machine learning methods. For information on the method, please refer to Broniecki, Wüest, Leemann (2020) "Improving Multilevel Regression with Post-Stratification Through Machine Learning (autoMrP)" in the 'Journal of Politics'. Final pre-print version: <https://lucasleemann.files.wordpress.com/2020/07/automrp-r2pa.pdf>.

Maintained by Philipp Broniecki. Last updated 5 months ago.

10.9 match 27 stars 5.61 score

nathaneastwood

poorman:A Poor Man's Dependency Free Recreation of 'dplyr'

A replication of key functionality from 'dplyr' and the wider 'tidyverse' using only 'base'.

Maintained by Nathan Eastwood. Last updated 1 year ago.

base-r data-manipulation grammar

5.7 match 341 stars 10.79 score 156 scripts 27 dependents

lmaowisc

WR:Win Ratio Analysis of Composite Time-to-Event Outcomes

Implements various win ratio methodologies for composite endpoints of death and non-fatal events, including the (stratified) proportional win-fractions (PW) regression models (Mao and Wang, 2020 <doi:10.1111/biom.13382>), (stratified) two-sample tests with possibly recurrent nonfatal events, and sample size calculation for standard win ratio test (Mao et al., 2021 <doi:10.1111/biom.13501>).

Maintained by Lu Mao. Last updated 2 months ago.

10.0 match 6.11 score 43 scripts

henningte

ir:Functions to Handle and Preprocess Infrared Spectra

Functions to import and handle infrared spectra (import from '.csv' and Thermo Galactic's '.spc', baseline correction, binning, clipping, interpolating, smoothing, averaging, adding, subtracting, dividing, multiplying, plotting).

Maintained by Henning Teickner. Last updated 3 years ago.

chemometrics infrared infrared-spectra ir-package mid-infrared-spectra spectroscopy

11.4 match 6 stars 5.32 score 35 scripts

joshuawlambert

rFSA:Feasible Solution Algorithm for Finding Best Subsets and Interactions

Assists in statistical model building to find optimal and semi-optimal higher order interactions and best subsets. Uses the lm(), glm(), and other R functions to fit models generated from a feasible solution algorithm. Discussed in Subset Selection in Regression, A Miller (2002). Applied and explained for least median of squares in Hawkins (1993) <doi:10.1016/0167-9473(93)90246-P>. The feasible solution algorithm comes up with model forms of a specific type that can have fixed variables, higher order interactions and their lower order terms.

Maintained by Joshua Lambert. Last updated 4 years ago.

algorithm fsa interaction models parallel statistical statistics subset

14.6 match 7 stars 4.15 score 20 scripts

bioc

mastR:Markers Automated Screening Tool in R

mastR is an R package designed for automated screening of signatures of interest for specific research questions. The package is developed for generating refined lists of signature genes from multiple group comparisons based on the results from edgeR and limma differential expression (DE) analysis workflow. It also takes into account the background noise of tissue-specificity, which is often ignored by other marker generation tools. This package is particularly useful for the identification of group markers in various biological and medical applications, including cancer research and developmental biology.

Maintained by Jinjin Chen. Last updated 5 months ago.

software geneexpression transcriptomics differentialexpression visualization

11.7 match 4 stars 5.08 score 3 scripts

r-forge

survey:Analysis of Complex Survey Samples

Summary statistics, two-sample tests, rank tests, generalised linear models, cumulative link models, Cox models, loglinear models, and general maximum pseudolikelihood estimation for multistage stratified, cluster-sampled, unequally weighted survey samples. Variances by Taylor series linearisation or replicate weights. Post-stratification, calibration, and raking. Two-phase and multiphase subsampling designs. Graphics. PPS sampling without replacement. Small-area estimation. Dual-frame designs.

Maintained by Thomas Lumley. Last updated 6 months ago.

cpp

4.3 match 1 star 13.93 score 13k scripts 235 dependents

cran

probs:Elementary Probability on Finite Sample Spaces

Performs elementary probability calculations on finite sample spaces, which may be represented by data frames or lists. This package is meant to rescue some widely used functions from the archived 'prob' package (see <https://cran.r-project.org/src/contrib/Archive/prob/>). Functionality includes setting up sample spaces, counting tools, defining probability spaces, performing set algebra, calculating probability and conditional probability, tools for simulation and checking the law of large numbers, adding random variables, and finding marginal distributions. Characteristic functions for all base R distributions are included.

Maintained by Joe gr. Schlarmann. Last updated 9 months ago.

34.8 match 1.70 score

cdcgov

surveytable:Formatted Survey Estimates

Short and understandable commands that generate tabulated, formatted, and rounded survey estimates. Mostly a wrapper for the 'survey' package (Lumley (2004) <doi:10.18637/jss.v009.i08> <https://CRAN.R-project.org/package=survey>) that identifies low-precision estimates using the National Center for Health Statistics (NCHS) presentation standards (Parker et al. (2017) <https://www.cdc.gov/nchs/data/series/sr_02/sr02_175.pdf>, Parker et al. (2023) <doi:10.15620/cdc:124368>).

Maintained by Alex Strashny. Last updated 6 days ago.

estimates formatted-output pretty-print survey tables

8.8 match 6 stars 6.71 score 19 scripts

nelson-n

lmForc:Linear Model Forecasting

Introduces in-sample, out-of-sample, pseudo out-of-sample, and benchmark model forecast tests and a new class for working with forecast data, Forecast.

Maintained by Nelson Rayl. Last updated 7 months ago.

forecasting linear-models

11.2 match 6 stars 5.26 score 20 scripts

owp-spatial

hfsubsetR:Hydrofabric Subsetter

Subset Hydrofabric Data in R.

Maintained by Mike Johnson. Last updated 26 days ago.

geospatial hydrofabric nextgen noaa-owp subsetting

14.6 match 7 stars 4.02 score 8 scripts

kassambara

rstatix:Pipe-Friendly Framework for Basic Statistical Tests

Provides a simple and intuitive pipe-friendly framework, coherent with the 'tidyverse' design philosophy, for performing basic statistical tests, including t-test, Wilcoxon test, ANOVA, Kruskal-Wallis and correlation analyses. The output of each test is automatically transformed into a tidy data frame to facilitate visualization. Additional functions are available for reshaping, reordering, manipulating and visualizing correlation matrix. Functions are also included to facilitate the analysis of factorial experiments, including purely 'within-Ss' designs (repeated measures), purely 'between-Ss' designs, and mixed 'within-and-between-Ss' designs. It's also possible to compute several effect size metrics, including "eta squared" for ANOVA, "Cohen's d" for t-test and 'Cramer V' for the association between categorical variables. The package contains helper functions for identifying univariate and multivariate outliers, assessing normality and homogeneity of variances.

Maintained by Alboukadel Kassambara. Last updated 2 years ago.

3.9 match 456 stars 15.16 score 11k scripts 420 dependents

nerler

JointAI:Joint Analysis and Imputation of Incomplete Data

Joint analysis and imputation of incomplete data in the Bayesian framework, using (generalized) linear (mixed) models and extensions thereof, survival models, or joint models for longitudinal and survival data, as described in Erler, Rizopoulos and Lesaffre (2021) <doi:10.18637/jss.v100.i20>. Incomplete covariates, if present, are automatically imputed. The package performs some preprocessing of the data and creates a 'JAGS' model, which will then automatically be passed to 'JAGS' <https://mcmc-jags.sourceforge.io/> with the help of the package 'rjags'.

Maintained by Nicole S. Erler. Last updated 12 months ago.

bayesian generalized-linear-models glm glmm imputation imputations jags joint-analysis linear-mixed-models linear-regression-models mcmc-sample mcmc-sampling missing-data missing-values survival cpp

8.0 match 28 stars 7.30 score 59 scripts 1 dependent

r-lib

lintr:A 'Linter' for R Code

Checks adherence to a given style, syntax errors and possible semantic issues. Supports on the fly checking of R code edited with 'RStudio IDE', 'Emacs', 'Vim', 'Sublime Text', 'Atom' and 'Visual Studio Code'.

Maintained by Michael Chirico. Last updated 1 day ago.

linter

3.4 match 1.2k stars 16.99 score 916 scripts 33 dependents

bxc147

Epi:Statistical Analysis in Epidemiology

Functions for demographic and epidemiological analysis in the Lexis diagram, i.e. register and cohort follow-up data. In particular representation, manipulation, rate estimation and simulation for multistate data - the Lexis suite of functions, which includes interfaces to 'mstate', 'etm' and 'cmprsk' packages. Contains functions for Age-Period-Cohort and Lee-Carter modeling and a function for interval censored data and some useful functions for tabulation and plotting, as well as a number of epidemiological data sets.

Maintained by Bendix Carstensen. Last updated 2 months ago.

6.0 match 4 stars 9.65 score 708 scripts 11 dependents

bioc

MAST:Model-based Analysis of Single Cell Transcriptomics

Methods and models for handling zero-inflated single cell assay data.

Maintained by Andrew McDavid. Last updated 5 months ago.

geneexpression differentialexpression genesetenrichment rnaseq transcriptomics singlecell

5.2 match 232 stars 11.28 score 1.8k scripts 5 dependents

spatstat

spatstat.data:Datasets for 'spatstat' Family

Contains all the datasets for the 'spatstat' family of packages.

Maintained by Adrian Baddeley. Last updated 2 days ago.

kernel-density point-process spatial-analysis spatial-data spatial-data-analysis spatstat statistical-analysis statistical-methods statistical-tests statistics

5.3 match 6 stars 11.07 score 186 scripts 228 dependents

polmine

polmineR:Verbs and Nouns for Corpus Analysis

Package for corpus analysis using the Corpus Workbench ('CWB', <https://cwb.sourceforge.io>) as an efficient back end for indexing and querying large corpora. The package offers functionality to flexibly create subcorpora and to carry out basic statistical operations (count, co-occurrences etc.). The original full text of documents can be reconstructed and inspected at any time. Beyond that, the package is intended to serve as an interface to packages implementing advanced statistical procedures. Respective data structures (document-term matrices, term-co-occurrence matrices etc.) can be created based on the indexed corpora.

Maintained by Andreas Blaette. Last updated 1 year ago.

7.3 match 49 stars 7.96 score 311 scripts

mrcieu

TwoSampleMR:Two Sample MR Functions and Interface to MRC Integrative Epidemiology Unit OpenGWAS Database

A package for performing Mendelian randomization using GWAS summary data. It uses the IEU OpenGWAS database <https://gwas.mrcieu.ac.uk/> to automatically obtain data, and a wide range of methods to run the analysis.

Maintained by Gibran Hemani. Last updated 12 days ago.

5.2 match 467 stars 11.23 score 1.7k scripts 1 dependent

davidorme

caper:Comparative Analyses of Phylogenetics and Evolution in R

Functions for performing phylogenetic comparative analyses.

Maintained by David Orme. Last updated 1 year ago.

7.8 match 1 star 7.41 score 928 scripts 5 dependents

argocanada

argoFloats:Analysis of Oceanographic Argo Floats

Supports the analysis of oceanographic data recorded by Argo autonomous drifting profiling floats. Functions are provided to (a) download and cache data files, (b) subset data in various ways, (c) handle quality-control flags and (d) plot the results according to oceanographic conventions. A shiny app is provided for easy exploration of datasets. The package is designed to work well with the 'oce' package, providing a wide range of processing capabilities that are particular to oceanographic analysis. See Kelley, Harbin, and Richards (2021) <doi:10.3389/fmars.2021.635922> for more on the scientific context and applications.

Maintained by Dan Kelley. Last updated 1 month ago.

7.8 match 17 stars 7.32 score 203 scripts

bioc

methrix:Fast and efficient summarization of generic bedGraph files from Bisulfite sequencing

Bedgraph files generated by Bisulfite pipelines often come in various flavors. A critical downstream step requires summarization of these files into methylation/coverage matrices. Methrix performs this data aggregation step and also provides many other useful downstream functions.

Maintained by Anand Mayakonda. Last updated 5 months ago.

dnamethylation sequencing coverage bedgraph bioinformatics dna-methylation

7.6 match 31 stars 7.51 score 39 scripts 1 dependent

poissonconsulting

nlist:Lists of Numeric Atomic Objects

Create and manipulate numeric list ('nlist') objects. An 'nlist' is an S3 list of uniquely named numeric objects. A numeric object is an integer or double vector, matrix or array. An 'nlists' object is an S3 class list of 'nlist' objects with the same names, dimensionalities and typeofs. Numeric list objects are of interest because they are the raw data inputs for analytic engines such as 'JAGS', 'STAN' and 'TMB'. Numeric list objects, which are useful for storing multiple realizations of simulated data sets, can be converted to coda::mcmc and coda::mcmc.list objects.

Maintained by Joe Thorley. Last updated 2 months ago.

data-frame natomic nlist nlists

7.9 match 6 stars 7.23 score 13 scripts 12 dependents

spatstat

spatstat.explore:Exploratory Data Analysis for the 'spatstat' Family

Functionality for exploratory data analysis and nonparametric analysis of spatial data, mainly spatial point patterns, in the 'spatstat' family of packages. (Excludes analysis of spatial data on a linear network, which is covered by the separate package 'spatstat.linnet'.) Methods include quadrat counts, K-functions and their simulation envelopes, nearest neighbour distance and empty space statistics, Fry plots, pair correlation function, kernel smoothed intensity, relative risk estimation with cross-validated bandwidth selection, mark correlation functions, segregation indices, mark dependence diagnostics, and kernel estimates of covariate effects. Formal hypothesis tests of random pattern (chi-squared, Kolmogorov-Smirnov, Monte Carlo, Diggle-Cressie-Loosmore-Ford, Dao-Genton, two-stage Monte Carlo) and tests for covariate effects (Cox-Berman-Waller-Lawson, Kolmogorov-Smirnov, ANOVA) are also supported.

Maintained by Adrian Baddeley. Last updated 2 days ago.

cluster-detection confidence-intervals hypothesis-testing k-function roc-curves scan-statistics significance-testing simulation-envelopes spatial-analysis spatial-data-analysis spatial-sharpening spatial-smoothing spatial-statistics

5.5 match 1 star 10.18 score 67 scripts 149 dependents

bnprks

BPCells:Single Cell Counts Matrices to PCA

Efficient operations for single cell ATAC-seq fragments and RNA counts matrices. Interoperable with standard file formats, and introduces efficient bit-packed formats that allow large storage savings and increased read speeds.

Maintained by Benjamin Parks. Last updated 1 month ago.

zlib hdf5 cpp

7.5 match 184 stars 7.48 score 172 scripts

guido-s

meta:General Package for Meta-Analysis

User-friendly general package providing standard methods for meta-analysis and supporting Schwarzer, Carpenter, and Rücker <DOI:10.1007/978-3-319-21416-0>, "Meta-Analysis with R" (2015): - common effect and random effects meta-analysis; - several plots (forest, funnel, Galbraith / radial, L'Abbe, Baujat, bubble); - three-level meta-analysis model; - generalised linear mixed model; - logistic regression with penalised likelihood for rare events; - Hartung-Knapp method for random effects model; - Kenward-Roger method for random effects model; - prediction interval; - statistical tests for funnel plot asymmetry; - trim-and-fill method to evaluate bias in meta-analysis; - meta-regression; - cumulative meta-analysis and leave-one-out meta-analysis; - import data from 'RevMan 5'; - produce forest plot summarising several (subgroup) meta-analyses.

Maintained by Guido Schwarzer. Last updated 27 days ago.

meta-analysis rstudio

3.8 match 84 stars 14.84 score 2.3k scripts 29 dependents

bioc

VennDetail:A package for visualization and extract details

A set of functions to generate high-resolution Venn and Vennpie plots, and to extract and combine details of these subsets with user datasets in data frame format.

Maintained by Kai Guo. Last updated 5 months ago.

datarepresentation graphandnetwork extract venndiagram

8.2 match 29 stars 6.75 score 65 scripts

environmentalinformatics-marburg

satellite:Handling and Manipulating Remote Sensing Data

Herein, we provide a broad variety of functions which are useful for handling, manipulating, and visualizing satellite-based remote sensing data. These operations range from mere data import and layer handling (e.g. subsetting), through Raster* typical data wrangling (e.g. crop, extend), to more sophisticated (pre-)processing tasks typically applied to satellite imagery (e.g. atmospheric and topographic correction). This functionality is complemented by full access to the satellite layers' metadata at any stage and the documentation of performed actions in a separate log file. Currently available sensors include Landsat 4-5 (TM), 7 (ETM+), and 8 (OLI/TIRS Combined), and additional compatibility is ensured for the Landsat Global Land Survey data set.

Maintained by Florian Detsch. Last updated 1 year ago.

cpp

5.6 match 22 stars 9.88 score 61 scripts 27 dependents

darwin-eu

omopgenerics:Methods and Classes for the OMOP Common Data Model

Provides definitions of core classes and methods used by analytic pipelines that query the OMOP (Observational Medical Outcomes Partnership) common data model.

Maintained by Martí Català. Last updated 11 days ago.

5.5 match 9.97 score 193 scripts 16 dependents

tidymodels

infer:Tidy Statistical Inference

The objective of this package is to perform inference using an expressive statistical grammar that coheres with the tidy design framework.

Maintained by Simon Couch. Last updated 6 months ago.

3.5 match 736 stars 15.75 score 3.5k scripts 18 dependents

sparklyr

sparklyr:R Interface to Apache Spark

R interface to Apache Spark, a fast and general engine for big data processing, see <https://spark.apache.org/>. This package supports connecting to local and remote Apache Spark clusters, provides a 'dplyr' compatible back-end, and provides an interface to Spark's built-in machine learning algorithms.

Maintained by Edgar Ruiz. Last updated 18 hours ago.

apache-spark distributed dplyr ide livy machine-learning remote-clusters spark sparklyr

3.6 match 959 stars 15.20 score 4.0k scripts 21 dependents

mikejohnson51

AOI:Areas of Interest

A consistent tool kit for forward and reverse geocoding and defining boundaries for spatial analysis.

Maintained by Mike Johnson. Last updated 1 year ago.

aoi area-of-interest bounding-boxes gis spatial subset

11.0 match 37 stars 4.98 score 174 scripts 1 dependent

rebeccasalles

TSPred:Functions for Benchmarking Time Series Prediction

Functions for defining and conducting a time series prediction process including pre(post)processing, decomposition, modelling, prediction and accuracy assessment. The generated models and their prediction errors can be used for benchmarking other time series prediction methods and for creating a demand for the refinement of such methods. For this purpose, benchmark data from prediction competitions may be used.

Maintained by Rebecca Pontes Salles. Last updated 4 years ago.

benchmarking linear-models machine-learning nonstationarity time-series-forecast time-series-prediction

9.9 match 24 stars 5.53 score 94 scripts 1 dependent

cliffordlai

bestglm:Best Subset GLM and Regression Utilities

Best subset glm using information criteria or cross-validation, carried out using the 'leaps' algorithm (Furnival and Wilson, 1974) <doi:10.2307/1267601> or complete enumeration (Morgan and Tatar, 1972) <doi:10.1080/00401706.1972.10488918>. Implements PCR and PLS using AIC/BIC. Implements the one-standard-deviation rule for use with the 'caret' package.

Maintained by Yuanhao Lai. Last updated 5 years ago.

10.3 match 5.29 score 418 scripts 5 dependents

steffenmoritz

imputeR:A General Multivariate Imputation Framework

Multivariate Expectation-Maximization (EM) based imputation framework that offers several different algorithms. These include regularisation methods like Lasso and Ridge regression, tree-based models and dimensionality reduction methods like PCA and PLS.

Maintained by Steffen Moritz. Last updated 4 years ago.

missing-data

11.0 match 16 stars 4.94 score 54 scripts

bioc

scp:Mass Spectrometry-Based Single-Cell Proteomics Data Analysis

Utility functions for manipulating, processing, and analyzing mass spectrometry-based single-cell proteomics data. The package is an extension to the 'QFeatures' package and relies on 'SingleCellExperiment' to enable single-cell proteomics analyses. The package offers the user the functionality to process quantitative tables (as generated by MaxQuant, Proteome Discoverer, and more) into data tables ready for downstream analysis and data visualization.

Maintained by Christophe Vanderaa. Last updated 19 days ago.

geneexpression proteomics singlecell massspectrometry preprocessing cellbasedassays bioconductor mass-spectrometry single-cell software

6.0 match 25 stars 8.94 score 115 scripts

poissonconsulting

chk:Check User-Supplied Function Arguments

For developers to check user-supplied function arguments. It is designed to be simple, fast and customizable. Error messages follow the tidyverse style guide.

Maintained by Joe Thorley. Last updated 2 months ago.

chk

4.5 match 48 stars 11.89 score 22 scripts 95 dependents

jacobbien

simulator:An Engine for Running Simulations

A framework for performing simulations such as those common in methodological statistics papers. The design principles of this package are described in greater depth in Bien, J. (2016) "The simulator: An Engine to Streamline Simulations," which is available at <arXiv:1607.00021>.

Maintained by Jacob Bien. Last updated 2 years ago.

simulation

7.5 match 52 stars 7.13 score 103 scripts

bioc

ChemmineOB:R interface to a subset of OpenBabel functionalities

ChemmineOB provides an R interface to a subset of cheminformatics functionalities implemented by the OpenBabel C++ project. OpenBabel is an open source cheminformatics toolbox that includes utilities for structure format interconversions, descriptor calculations, compound similarity searching and more. ChemmineOB aims to make a subset of these utilities available from within R. For non-developers, ChemmineOB is primarily intended to be used from ChemmineR as an add-on package rather than used directly.

Maintained by Thomas Girke. Last updated 5 months ago.

cheminformatics biomedicalinformatics pharmacogenetics pharmacogenomics microtitreplateassay cellbasedassays visualization infrastructure dataimport clustering proteomics metabolomics openbabel cpp

6.8 match 10 stars 7.87 score 77 scripts 1 dependent

darwin-eu

CodelistGenerator:Identify Relevant Clinical Codes and Evaluate Their Use

Generate a candidate code list for the Observational Medical Outcomes Partnership (OMOP) common data model based on string matching. For a given search strategy, a candidate code list will be returned.

Maintained by Edward Burn. Last updated 27 days ago.

5.4 match 13 stars 9.87 score 165 scripts 4 dependents

cmmr

rbiom:Read/Write, Analyze, and Visualize 'BIOM' Data

A toolkit for working with Biological Observation Matrix ('BIOM') files. Read/write all 'BIOM' formats. Compute rarefaction, alpha diversity, and beta diversity (including 'UniFrac'). Summarize counts by taxonomic level. Subset based on metadata. Generate visualizations and statistical analyses. CPU intensive operations are coded in C for speed.

Maintained by Daniel P. Smith. Last updated 1 day ago.

5.9 match 15 stars 9.06 score 117 scripts 6 dependents

bioc

VariantAnnotation:Annotation of Genetic Variants

Annotate variants, compute amino acid coding changes, predict coding outcomes.

Maintained by Bioconductor Package Maintainer. Last updated 2 months ago.

dataimport sequencing snp annotation genetics variantannotation curl bzip2 xz-utils zlib

4.6 match 11.39 score 1.9k scripts 152 dependents

patzaw

TKCat:Tailored Knowledge Catalog

Facilitate the management of data from knowledge resources that are frequently used alone or together in research environments. In 'TKCat', knowledge resources are manipulated as modeled database (MDB) objects. These objects provide access to the data tables along with a general description of the resource and a detailed data model documenting the tables, their fields and their relationships. These MDBs are then gathered in catalogs that can be easily explored and shared. Finally, 'TKCat' provides tools to easily subset, filter and combine MDBs and create new catalogs suited for specific needs.

Maintained by Patrice Godard. Last updated 1 day ago.

8.5 match 5 stars 6.13 score 27 scripts

olink-proteomics

OlinkAnalyze:Facilitate Analysis of Proteomic Data from Olink

A collection of functions to facilitate analysis of proteomic data from Olink, primarily NPX data that has been exported from Olink Software. The functions also work on QUANT data from Olink by log-transforming the QUANT data. The functions are focused on reading data, facilitating data wrangling and quality control analysis, performing statistical analysis and generating figures to visualize the results of the statistical analysis. The goal of this package is to help users extract biological insights from proteomic data run on the Olink platform.

Maintained by Kathleen Nevola. Last updated 22 days ago.

olink proteomics proteomics-data-analysis

5.3 match 104 stars 9.72 score 61 scripts

mjskay

ggdist:Visualizations of Distributions and Uncertainty

Provides primitives for visualizing distributions using 'ggplot2' that are particularly tuned for visualizing uncertainty in either a frequentist or Bayesian mode. Both analytical distributions (such as frequentist confidence distributions or Bayesian priors) and distributions represented as samples (such as bootstrap distributions or Bayesian posterior samples) are easily visualized. Visualization primitives include but are not limited to: points with multiple uncertainty intervals, eye plots (Spiegelhalter D., 1999) <https://ideas.repec.org/a/bla/jorssa/v162y1999i1p45-58.html>, density plots, gradient plots, dot plots (Wilkinson L., 1999) <doi:10.1080/00031305.1999.10474474>, quantile dot plots (Kay M., Kola T., Hullman J., Munson S., 2016) <doi:10.1145/2858036.2858558>, complementary cumulative distribution function barplots (Fernandes M., Walls L., Munson S., Hullman J., Kay M., 2018) <doi:10.1145/3173574.3173718>, and fit curves with multiple uncertainty ribbons.

Maintained by Matthew Kay. Last updated 4 months ago.

ggplot2 uncertainty uncertainty-visualization visualization cpp

3.4 match 856 stars 15.24 score 3.1k scripts 61 dependents

gesistsa

sweater:Speedy Word Embedding Association Test and Extras Using R

Conduct various tests for evaluating implicit biases in word embeddings: Word Embedding Association Test (Caliskan et al., 2017) <doi:10.1126/science.aal4230>, Relative Norm Distance (Garg et al., 2018) <doi:10.1073/pnas.1720347115>, Mean Average Cosine Similarity (Manzini et al., 2019) <arXiv:1904.04047>, SemAxis (An et al., 2018) <arXiv:1806.05521>, Relative Negative Sentiment Bias (Sweeney & Najafian, 2019) <doi:10.18653/v1/P19-1162>, and Embedding Coherence Test (Dev & Phillips, 2019) <arXiv:1901.07656>.

Maintained by Chung-hong Chan. Last updated 2 months ago.

bias-detection textanalysis wordembedding cpp

10.7 match 30 stars 4.80 score 14 scripts

edzer

spacetime:Classes and Methods for Spatio-Temporal Data

Classes and methods for spatio-temporal data, including space-time regular lattices, sparse lattices, irregular data, and trajectories; utility functions for plotting data as map sequences (lattice or animation) or multiple time series; methods for spatial and temporal selection and subsetting, as well as for spatial/temporal/spatio-temporal matching or aggregation, retrieving coordinates, print, summary, etc.

Maintained by Edzer Pebesma. Last updated 2 months ago.

3.8 match 74 stars 13.23 score 628 scripts 69 dependents

spatialnous

alcyon:Spatial Network Analysis

Interface package for 'sala', the spatial network analysis library from the 'depthmapX' software application. The R parts of the code are based on the 'rdepthmap' package. Allows for the analysis of urban and building-scale networks and provides metrics and methods usually found within the Space Syntax domain. Methods in this package are described by K. Al-Sayed, A. Turner, B. Hillier, S. Iida and A. Penn (2014) "Space Syntax methodology", and also by A. Turner (2004) <https://discovery.ucl.ac.uk/id/eprint/2651> "Depthmap 4: a researcher's handbook".

Maintained by Petros Koutsolampros. Last updated 2 months ago.

cpp openmp

8.0 match 2 stars 6.34 score 13 scripts

bioc

SpatialExperiment:S4 Class for Spatially Resolved -omics Data

Defines an S4 class for storing data from spatial -omics experiments. The class extends SingleCellExperiment to support storage and retrieval of additional information from spot-based and molecule-based platforms, including spatial coordinates, images, and image metadata. A specialized constructor function is included for data from the 10x Genomics Visium platform.

Maintained by Dario Righelli. Last updated 5 months ago.

datarepresentation dataimport infrastructure immunooncology geneexpression transcriptomics singlecell spatial

4.0 match 59 stars 12.63 score 1.8k scripts 71 dependents

bioc

RAIDS:Accurate Inference of Genetic Ancestry from Cancer Sequences

This package implements specialized algorithms that enable genetic ancestry inference from various cancer sequence sources (RNA, Exome and Whole-Genome sequences). This package also implements a simulation algorithm that generates synthetic cancer-derived data. This code and analysis pipeline was designed and developed for the following publication: Belleau, P et al. Genetic Ancestry Inference from Cancer-Derived Molecular Data across Genomic and Transcriptomic Platforms. Cancer Res 1 January 2023; 83 (1): 49–58.

Maintained by Pascal Belleau. Last updated 5 months ago.

genetics software sequencing wholegenome principalcomponent geneticvariability dimensionreduction biocviews ancestry cancer-genomics exome-sequencing genomics inference r-language rna-seq rna-sequencing whole-genome-sequencing

8.1 match 5 stars 6.23 score 19 scripts

biotimehub

BioTIMEr:Tools to Use and Explore the 'BioTIME' Database

The 'BioTIME' database was first published in 2018 and inspired ideas, questions, projects and research articles. To make it even more accessible, an R package was created. The 'BioTIMEr' package provides tools designed to interact with the 'BioTIME' database. The functions provided include the 'BioTIME' recommended methods for preparing (gridding and rarefaction) time series data, a selection of standard biodiversity metrics (including species richness, numerical abundance and exponential Shannon) alongside examples on how to display change over time. It also includes a sample subset of both the query and meta data, the full versions of which are freely available on the 'BioTIME' website <https://biotime.st-andrews.ac.uk/home.php>.

Maintained by Alban Sagouis. Last updated 8 months ago.

9.0 match 4 stars 5.60 score 10 scripts

spatstat

spatstat.model:Parametric Statistical Modelling and Inference for the 'spatstat' Family

Functionality for parametric statistical modelling and inference for spatial data, mainly spatial point patterns, in the 'spatstat' family of packages. (Excludes analysis of spatial data on a linear network, which is covered by the separate package 'spatstat.linnet'.) Supports parametric modelling, formal statistical inference, and model validation. Parametric models include Poisson point processes, Cox point processes, Neyman-Scott cluster processes, Gibbs point processes and determinantal point processes. Models can be fitted to data using maximum likelihood, maximum pseudolikelihood, maximum composite likelihood and the method of minimum contrast. Fitted models can be simulated and predicted. Formal inference includes hypothesis tests (quadrat counting tests, Cressie-Read tests, Clark-Evans test, Berman test, Diggle-Cressie-Loosmore-Ford test, scan test, studentised permutation test, segregation test, ANOVA tests of fitted models, adjusted composite likelihood ratio test, envelope tests, Dao-Genton test, balanced independent two-stage test), confidence intervals for parameters, and prediction intervals for point counts. Model validation techniques include leverage, influence, partial residuals, added variable plots, diagnostic plots, pseudoscore residual plots, model compensators and Q-Q plots.

Maintained by Adrian Baddeley. Last updated 9 days ago.

analysis-of-variance cluster-process confidence-intervals cox-process determinantal-point-processes gibbs-process influence leverage model-diagnostics neyman-scott parameter-estimation poisson-process spatial-analysis spatial-modelling spatial-point-processes statistical-inference

5.5 match 5 stars 9.09 score 6 scripts 46 dependents

capitalone

dataCompareR:Compare Two Data Frames and Summarise the Difference

Easy comparison of two tabular data objects in R. Specifically designed to show differences between two sets of data in a useful way that should make it easier to understand the differences, and if necessary, help you work out how to remedy them. Aims to offer a more useful output than all.equal() when your two data sets do not match, but isn't intended to replace all.equal() as a way to test for equality.

Maintained by Sarah Johnston. Last updated 2 years ago.

compare-data data data-analysis data-science

7.0 match 76 stars 7.24 score 76 scripts

bioc

KEGGgraph:KEGGgraph: A graph approach to KEGG PATHWAY in R and Bioconductor

KEGGgraph is an interface between KEGG pathway and graph object as well as a collection of tools to analyze, dissect and visualize these graphs. It parses the regularly updated KGML (KEGG XML) files into graph models maintaining all essential pathway attributes. The package offers functionalities including parsing, graph operation, visualization, etc.

Maintained by Jitao David Zhang. Last updated 5 months ago.

pathways graphandnetwork visualization kegg

6.5 match 7.76 score 114 scripts 23 dependents

asgr

imager:Image Processing Library Based on 'CImg'

Fast image processing for images in up to 4 dimensions (two spatial dimensions, one time/depth dimension, one colour dimension). Provides most traditional image processing tools (filtering, morphology, transformations, etc.) as well as various functions for easily analysing image data using R. The package wraps 'CImg', <http://cimg.eu>, a simple, modern C++ library for image processing.

Maintained by Aaron Robotham. Last updated 29 days ago.

libx11 fftw3 tiff cpp openmp

3.7 match 17 stars 13.62 score 2.4k scripts 45 dependents

moodymudskipper

inops:Infix Operators for Detection, Subsetting and Replacement

Infix operators to detect, subset, and replace the elements matched by a given condition. The functions have several variants of operator types, including subsets, ranges, regular expressions and others. Implemented operators work on vectors, matrices, and lists.

Maintained by Antoine Fabri. Last updated 5 years ago.

r-language r-programming

9.3 match 40 stars 5.34 score 11 scripts

bioc

AnnotationDbi:Manipulation of SQLite-based annotations in Bioconductor

Implements a user-friendly interface for querying SQLite-based annotation data packages.

Maintained by Bioconductor Package Maintainer. Last updated 5 months ago.

annotation microarray sequencing genomeannotation bioconductor-package core-package

3.3 match 9 stars 15.05 score 3.6k scripts 769 dependents

mamba413

abess:Fast Best Subset Selection

Extremely efficient toolkit for solving the best subset selection problem <https://www.jmlr.org/papers/v23/21-1060.html>. This package is its R interface. The package implements and generalizes algorithms designed in <doi:10.1073/pnas.2014241117> that exploit a novel sequencing-and-splicing technique to guarantee exact support recovery and a globally optimal solution in polynomial time for linear models. It also supports best subset selection for logistic regression, Poisson regression, Cox proportional hazard model, Gamma regression, multiple-response regression, multinomial logistic regression, ordinal regression, (sequential) principal component analysis, and robust principal component analysis. Other valuable features, such as best subset of group selection <doi:10.1287/ijoc.2022.1241> and sure independence screening <doi:10.1111/j.1467-9868.2008.00674.x>, are also provided.

Maintained by Jin Zhu. Last updated 6 months ago.

cpp openmp

11.6 match 6 stars 4.27 score 62 scripts

bioc

universalmotif:Import, Modify, and Export Motifs with R

Allows for importing most common motif types into R for use by functions provided by other Bioconductor motif-related packages. Motifs can be exported into most major motif formats from various classes as defined by other Bioconductor packages. A suite of motif and sequence manipulation and analysis functions are included, including enrichment, comparison, P-value calculation, shuffling, trimming, higher-order motifs, and others.

Maintained by Benjamin Jean-Marie Tremblay. Last updated 4 months ago.

motifannotation motifdiscovery dataimport generegulation motif-analysis motif-enrichment-analysis sequence-logo cpp

4.4 match 28 stars 11.04 score 342 scripts 12 dependents

bioc

GenomicInteractions:Utilities for handling genomic interaction data

Utilities for handling genomic interaction data such as ChIA-PET or Hi-C, annotating genomic features with interaction information, and producing plots and summary statistics.

Maintained by Liz Ing-Simmons. Last updated 5 months ago.

software infrastructure dataimport datarepresentation hic

5.2 match 7 stars 9.39 score 162 scripts 6 dependents

nutriverse

zscorer:Child Anthropometry z-Score Calculator

A tool for calculating z-scores and centiles for weight-for-age, length/height-for-age, weight-for-length/height, BMI-for-age, head circumference-for-age, arm circumference-for-age, subscapular skinfold-for-age, and triceps skinfold-for-age based on the WHO Child Growth Standards.

Maintained by Ernest Guevarra. Last updated 4 years ago.

anthropometric-indices anthropometry growth-charts growth-standards height-for-age nutrition weight-for-age weight-for-height z-score

6.7 match 14 stars 7.30 score 47 scripts 1 dependent

bioc

edgeR:Empirical Analysis of Digital Gene Expression Data in R

Differential expression analysis of sequence count data. Implements a range of statistical methodology based on the negative binomial distributions, including empirical Bayes estimation, exact tests, generalized linear models, quasi-likelihood, and gene set enrichment. Can perform differential analyses of any type of omics data that produces read counts, including RNA-seq, ChIP-seq, ATAC-seq, Bisulfite-seq, SAGE, CAGE, metabolomics, or proteomics spectral counts. RNA-seq analyses can be conducted at the gene or isoform level, and tests can be conducted for differential exon or transcript usage.

Maintained by Yunshun Chen. Last updated 7 days ago.

alternativesplicing batcheffect bayesian biomedicalinformatics cellbiology chipseq clustering coverage differentialexpression differentialmethylation differentialsplicing dnamethylation epigenetics functionalgenomics geneexpression genesetenrichment genetics immunooncology multiplecomparison normalization pathways proteomics qualitycontrol regression rnaseq sage sequencing singlecell systemsbiology timecourse transcription transcriptomics openblas

3.6 match 13.40 score 17k scripts 255 dependents

bioc

CiteFuse:CiteFuse: multi-modal analysis of CITE-seq data

The CiteFuse package implements a suite of methods and tools for CITE-seq data from pre-processing to integrative analytics, including doublet detection, network-based modality integration, cell type clustering, differential RNA and protein expression analysis, ADT evaluation, ligand-receptor interaction analysis, and interactive web-based visualisation of the analyses.

Maintained by Yingxin Lin. Last updated 5 months ago.

singlecell geneexpression bioinformatics single-cell cpp

7.3 match 27 stars 6.59 score 18 scripts

bioc

scClassify:scClassify: single-cell Hierarchical Classification

scClassify is a multiscale classification framework for single-cell RNA-seq data based on ensemble learning and cell type hierarchies, enabling sample size estimation required for accurate cell type classification and joint classification of cells using multiple references.

Maintained by Yingxin Lin. Last updated 5 months ago.

singlecell geneexpression classification

7.0 match 23 stars 6.92 score 30 scripts

bioc

gwascat:representing and modeling data in the EMBL-EBI GWAS catalog

Represent and model data in the EMBL-EBI GWAS catalog.

Maintained by VJ Carey. Last updated 5 months ago.

genetics

8.0 match 6.05 score 110 scripts 2 dependents

bluegreen-labs

MODISTools:Interface to the 'MODIS Land Products Subsets' Web Services

Programmatic interface to the Oak Ridge National Laboratories 'MODIS Land Products Subsets' web services (<https://modis.ornl.gov/data/modis_webservice.html>). Allows for easy downloads of 'MODIS' time series directly to your R workspace or your computer.

Maintained by Koen Hufkens. Last updated 1 year ago.

api remote-sensing satellite-data

6.9 match 1 star 6.90 score 155 scripts 4 dependents

bioc

MoleculeExperiment:Prioritising a molecule-level storage of Spatial Transcriptomics Data

MoleculeExperiment contains functions to create and work with objects from the new MoleculeExperiment class. We introduce this class for analysing molecule-based spatial transcriptomics data (e.g., Xenium by 10X, Cosmx SMI by Nanostring, and Merscope by Vizgen). This allows researchers to analyse spatial transcriptomics data at the molecule level, and to have standardised data formats across vendors.

Maintained by Shila Ghazanfar. Last updated 5 months ago.

dataimport datarepresentation infrastructure software spatial transcriptomics

7.4 match 12 stars 6.45 score 39 scripts

bioc

TADCompare:TADCompare: Identification and characterization of differential TADs

TADCompare is an R package designed to identify and characterize differential Topologically Associated Domains (TADs) between multiple Hi-C contact matrices. It contains functions for finding differential TADs between two datasets, finding differential TADs over time and identifying consensus TADs across multiple matrices. It takes all of the main types of Hi-C input and returns simple, comprehensive, easy-to-analyze results.

Maintained by Mikhail Dozmorov. Last updated 5 months ago.

software hic sequencing featureextraction clustering

6.8 match 23 stars 7.04 score 10 scripts

bioc

RaggedExperiment:Representation of Sparse Experiments and Assays Across Samples

This package provides a flexible representation of copy number, mutation, and other data that fit into the ragged array schema for genomic location data. The basic representation of such data provides a rectangular flat table interface to the user with range information in the rows and samples/specimen in the columns. The RaggedExperiment class derives from a GRangesList representation and provides a semblance of a rectangular dataset.

Maintained by Marcel Ramos. Last updated 4 months ago.

infrastructure datarepresentation copynumber core-package data-structure mutations u24ca289073

5.3 match 4 stars 8.96 score 76 scripts 15 dependents

bioc

QFeatures:Quantitative features for mass spectrometry data

The QFeatures infrastructure enables the management and processing of quantitative features for high-throughput mass spectrometry assays. It provides a familiar Bioconductor user experience to manage quantitative data across different assay levels (such as peptide spectrum matches, peptides and proteins) in a coherent and tractable format.

Maintained by Laurent Gatto. Last updated 14 days ago.

infrastructure massspectrometry proteomics metabolomics bioconductor mass-spectrometry

4.0 match 27 stars 11.87 score 278 scripts 49 dependents

bioc

GenomicRanges:Representation and manipulation of genomic intervals

The ability to efficiently represent and manipulate genomic annotations and alignments is playing a central role when it comes to analyzing high-throughput sequencing data (a.k.a. NGS data). The GenomicRanges package defines general purpose containers for storing and manipulating genomic intervals and variables defined along a genome. More specialized containers for representing and manipulating short alignments against a reference genome, or a matrix-like summarization of an experiment, are defined in the GenomicAlignments and SummarizedExperiment packages, respectively. Both packages build on top of the GenomicRanges infrastructure.

Maintained by Hervé Pagès. Last updated 4 months ago.

genetics infrastructure datarepresentation sequencing annotation genomeannotation coverage bioconductor-package core-package

2.7 match 44 stars 17.75 score 13k scripts 1.3k dependents

fumi-github

omicwas:Cell-Type-Specific Association Testing in Bulk Omics Experiments

In bulk epigenome/transcriptome experiments, molecular expression is measured in a tissue, which is a mixture of multiple types of cells. This package tests association of a disease/phenotype with a molecular marker for each cell type. The proportion of cell types in each sample needs to be given as input. The package is applicable to epigenome-wide association study (EWAS) and differential gene expression analysis. Takeuchi and Kato (submitted) "omicwas: cell-type-specific epigenome-wide and transcriptome association study".

Maintained by Fumihiko Takeuchi. Last updated 4 years ago.

10.9 match 4 stars 4.30 score 7 scripts

simongrund1

mitml:Tools for Multiple Imputation in Multilevel Modeling

Provides tools for multiple imputation of missing data in multilevel modeling. Includes a user-friendly interface to the packages 'pan' and 'jomo', and several functions for visualization, data management and the analysis of multiply imputed data sets.

Maintained by Simon Grund. Last updated 1 year ago.

imputation missing-data mixed-effects multilevel-data multilevel-models

3.8 match 29 stars 12.36 score 246 scripts 153 dependents

bioc

HIBAG:HLA Genotype Imputation with Attribute Bagging

Imputes HLA classical alleles using GWAS SNP data, and it relies on a training set of HLA and SNP genotypes. HIBAG can be used by researchers with published parameter estimates instead of requiring access to large training sample datasets. It combines the concepts of attribute bagging, an ensemble classifier method, with haplotype inference for SNPs and HLA types. Attribute bagging is a technique which improves the accuracy and stability of classifier ensembles using bootstrap aggregating and random variable selection.

Maintained by Xiuwen Zheng. Last updated 4 months ago.

genetics statisticalmethod bioinformatics gpu hla imputation mhc snp cpp

5.7 match 30 stars 8.24 score 48 scripts

bioc

CHETAH:Fast and accurate scRNA-seq cell type identification

CHETAH (CHaracterization of cEll Types Aided by Hierarchical classification) is an accurate, selective and fast scRNA-seq classifier. Classification is guided by a reference dataset, preferably also a scRNA-seq dataset. By hierarchical clustering of the reference data, CHETAH creates a classification tree that enables a step-wise, top-to-bottom classification. Using a novel stopping rule, CHETAH classifies the input cells to the cell types of the references and to "intermediate types": more general classifications that ended in an intermediate node of the tree.

Maintained by Jurrian de Kanter. Last updated 5 months ago.

classification rnaseq singlecell clustering geneexpression immunooncology

6.4 match 44 stars 7.27 score 70 scripts

bioc

InTAD:Search for correlation between epigenetic signals and gene expression in TADs

The package is focused on the detection of correlation between expressed genes and selected epigenomic signals (i.e. enhancers obtained from ChIP-seq data) either within topologically associated domains (TADs) or between chromatin contact loop anchors. Various parameters can be controlled to investigate the influence of external factors and visualization plots are available for each analysis step.

Maintained by Konstantin Okonechnikov. Last updated 5 months ago.

epigenetics sequencing chipseq rnaseq hic geneexpression immunooncology

10.8 match 4.30 score 6 scripts

tilltnet

egor:Import and Analyse Ego-Centered Network Data

Tools for importing, analyzing and visualizing ego-centered network data. Supports several data formats, including the export formats of 'EgoNet', 'EgoWeb 2.0' and 'openeddi'. An interactive (shiny) app for the intuitive visualization of ego-centered networks is provided. Also included are procedures for creating and visualizing Clustered Graphs (Lerner 2008 <DOI:10.1109/PACIFICVIS.2008.4475458>).

Maintained by Till Krenz. Last updated 15 days ago.

ego-centered egonet egor network-analysis sna

5.4 match 24 stars 8.64 score 76 scripts 2 dependents

sevvandi

stxplore:Exploration of Spatio-Temporal Data

A set of statistical tools for spatio-temporal data exploration. Includes simple plotting functions, covariance calculations and computations similar to principal component analysis for spatio-temporal data. Can use both dataframes and stars objects for all plots and computations. For more details, refer to 'Spatio-Temporal Statistics with R' (Christopher K. Wikle, Andrew Zammit-Mangion, Noel Cressie, 2019, ISBN:9781138711136).

Maintained by Sevvandi Kandanaarachchi. Last updated 2 years ago.

9.9 match 5 stars 4.70 score 7 scripts

ryan-graebner

GeneticSubsetter:Identify Favorable Subsets of Germplasm Collections

Finds subsets of sets of genotypes with high Heterozygosity and Mean of Transformed Kinships (MTK), measures that can indicate a subset would be beneficial for rare-trait discovery and genome-wide association scanning, respectively.

Maintained by Ryan C. Graebner. Last updated 8 years ago.

21.0 match 2.20 score 16 scripts

cynkra

dm:Relational Data Models

Provides tools for working with multiple related tables, stored as data frames or in a relational database. Multiple tables (data and metadata) are stored in a compound object, which can then be manipulated with a pipe-friendly syntax.

Maintained by Kirill Müller. Last updated 3 months ago.

data-model data-warehousing datawarehousing dbi dbplyr relational-databases

3.1 match 511 stars 14.81 score 410 scripts 8 dependents

ropensci

taxlist:Handling Taxonomic Lists

Handling taxonomic lists through objects of class 'taxlist'. This package provides functions to import species lists from 'Turboveg' (<https://www.synbiosys.alterra.nl/turboveg/>) and the possibility to create backups from resulting R-objects. Also quick displays are implemented as summary-methods.

Maintained by Miguel Alvarez. Last updated 6 months ago.

6.5 match 12 stars 7.07 score 81 scripts 2 dependents

r-forge

car:Companion to Applied Regression

Functions to Accompany J. Fox and S. Weisberg, An R Companion to Applied Regression, Third Edition, Sage, 2019.

Maintained by John Fox. Last updated 5 months ago.

3.0 match 15.29 score 43k scripts 901 dependents