Showing 200 of 3251 results
tidyverse
tidyverse: Easily Install and Load the 'Tidyverse'
The 'tidyverse' is a set of packages that work in harmony because they share common data representations and 'API' design. This package is designed to make it easy to install and load multiple 'tidyverse' packages in a single step. Learn more about the 'tidyverse' at <https://www.tidyverse.org>.
Maintained by Hadley Wickham. Last updated 5 months ago.
1.7k stars 20.23 score 664k scripts 125 dependents
tidyverse
readr: Read Rectangular Text Data
The goal of 'readr' is to provide a fast and friendly way to read rectangular data (like 'csv', 'tsv', and 'fwf'). It is designed to flexibly parse many types of data found in the wild, while still cleanly failing when data unexpectedly changes.
Maintained by Jennifer Bryan. Last updated 8 months ago.
1.0k stars 20.06 score 132k scripts 2.1k dependents
apache
arrow: Integration to 'Apache' 'Arrow'
'Apache' 'Arrow' <https://arrow.apache.org/> is a cross-language development platform for in-memory data. It specifies a standardized language-independent columnar memory format for flat and hierarchical data, organized for efficient analytic operations on modern hardware. This package provides an interface to the 'Arrow C++' library.
Maintained by Jonathan Keane. Last updated 2 months ago.
15k stars 19.25 score 10k scripts 82 dependents
r-dbi
RSQLite: SQLite Interface for R
Embeds the SQLite database engine in R and provides an interface compliant with the DBI package. The source for the SQLite engine and for various extensions in a recent version is included. System libraries will never be consulted because this package relies on static linking for the plugins it includes; this also ensures a consistent experience across all installations.
Maintained by Kirill Müller. Last updated 1 day ago.
331 stars 18.78 score 8.1k scripts 1.1k dependents
tidyverse
haven: Import and Export 'SPSS', 'Stata' and 'SAS' Files
Import foreign statistical formats into R via the embedded 'ReadStat' C library, <https://github.com/WizardMac/ReadStat>.
Maintained by Hadley Wickham. Last updated 6 months ago.
427 stars 18.63 score 18k scripts 682 dependents
tidyverse
vroom: Read and Write Rectangular Text Data Quickly
The goal of 'vroom' is to read and write data (like 'csv', 'tsv' and 'fwf') quickly. When reading it uses a quick initial indexing step, then reads the values lazily, so only the data you actually use needs to be read. The writer formats the data in parallel and writes to disk asynchronously from formatting.
Maintained by Jennifer Bryan. Last updated 7 months ago.
csv, csv-parser, fixed-width-text, tsv, tsv-parser, cpp
625 stars 17.82 score 4.5k scripts 2.1k dependents
gesistsa
rio: A Swiss-Army Knife for Data I/O
Streamlined data import and export by making assumptions that the user is probably willing to make: 'import()' and 'export()' determine the data format from the file extension, reasonable defaults are used for data import and export, web-based import is natively supported (including from SSL/HTTPS), compressed files can be read directly, and fast import packages are used where appropriate. An additional convenience function, 'convert()', provides a simple method for converting between file types.
Maintained by Chung-hong Chan. Last updated 3 months ago.
csv, csvy, data, data-science, excel, io, rio, sas, spss, stata
610 stars 17.10 score 7.8k scripts 74 dependents
bioc
clusterProfiler: A universal enrichment tool for interpreting omics data
This package supports functional characteristics of both coding and non-coding genomics data for thousands of species with up-to-date gene annotation. It provides a universal interface for gene functional annotation from a variety of sources and thus can be applied in diverse scenarios. It provides a tidy interface to access, manipulate, and visualize enrichment results to help users achieve efficient data interpretation. Datasets obtained from multiple treatments and time points can be analyzed and compared in a single run, easily revealing functional consensus and differences among distinct conditions.
Maintained by Guangchuang Yu. Last updated 4 months ago.
annotation, clustering, genesetenrichment, go, kegg, multiplecomparison, pathways, reactome, visualization, enrichment-analysis, gsea
1.1k stars 17.03 score 11k scripts 48 dependents
amices
mice: Multivariate Imputation by Chained Equations
Multiple imputation using Fully Conditional Specification (FCS) implemented by the MICE algorithm as described in Van Buuren and Groothuis-Oudshoorn (2011) <doi:10.18637/jss.v045.i03>. Each variable has its own imputation model. Built-in imputation models are provided for continuous data (predictive mean matching, normal), binary data (logistic regression), unordered categorical data (polytomous logistic regression) and ordered categorical data (proportional odds). MICE can also impute continuous two-level data (normal model, pan, second-level variables). Passive imputation can be used to maintain consistency between variables. Various diagnostic plots are available to inspect the quality of the imputations.
Maintained by Stef van Buuren. Last updated 3 days ago.
chained-equations, fcs, imputation, mice, missing-data, missing-values, multiple-imputation, multivariate-data, cpp
462 stars 16.64 score 10k scripts 154 dependents
mlverse
torch: Tensors and Neural Networks with 'GPU' Acceleration
Provides functionality to define and train neural networks similar to 'PyTorch' by Paszke et al (2019) <doi:10.48550/arXiv.1912.01703> but written entirely in R using the 'libtorch' library. Also supports low-level tensor operations and 'GPU' acceleration.
Maintained by Daniel Falbel. Last updated 5 days ago.
521 stars 16.50 score 1.4k scripts 39 dependents
r-dbi
odbc: Connect to ODBC Compatible Databases (using the DBI Interface)
A DBI-compatible interface to ODBC databases.
Maintained by Hadley Wickham. Last updated 4 days ago.
396 stars 16.31 score 2.9k scripts 23 dependents
bioc
biomaRt: Interface to BioMart databases (i.e. Ensembl)
In recent years a wealth of biological data has become available in public data repositories. Easy access to these valuable data resources and firm integration with data analysis is needed for comprehensive bioinformatics data analysis. biomaRt provides an interface to a growing collection of databases implementing the BioMart software suite (<http://www.biomart.org>). The package enables retrieval of large amounts of data in a uniform way without the need to know the underlying database schemas or write complex SQL queries. The most prominent examples of BioMart databases are maintained by Ensembl, which provides biomaRt users direct access to a diverse set of data and enables a wide range of powerful online queries from gene annotation to database mining.
Maintained by Mike Smith. Last updated 17 days ago.
annotation, bioconductor, biomart, ensembl
38 stars 15.99 score 13k scripts 230 dependents
bioc
enrichplot: Visualization of Functional Enrichment Result
The 'enrichplot' package implements several visualization methods for interpreting functional enrichment results obtained from ORA or GSEA analysis. It is mainly designed to work with the 'clusterProfiler' package suite. All the visualization methods are developed based on 'ggplot2' graphics.
Maintained by Guangchuang Yu. Last updated 3 months ago.
annotation, genesetenrichment, go, kegg, pathways, software, visualization, enrichment-analysis, pathway-analysis
239 stars 15.71 score 3.1k scripts 58 dependents
njtierney
naniar: Data Structures, Summaries, and Visualisations for Missing Data
Missing values are ubiquitous in data and need to be explored and handled in the initial stages of analysis. 'naniar' provides data structures and functions that facilitate the plotting of missing values and examination of imputations. This allows missing data dependencies to be explored with minimal deviation from the common work patterns of 'ggplot2' and tidy data. The work is fully discussed at Tierney & Cook (2023) <doi:10.18637/jss.v105.i07>.
Maintained by Nicholas Tierney. Last updated 18 days ago.
data-visualisation, ggplot2, missing-data, missingness, tidy-data
657 stars 15.63 score 5.1k scripts 9 dependents
bioc
GenomicFeatures: Query the gene models of a given organism/assembly
Extract the genomic locations of genes, transcripts, exons, introns, and CDS, for the gene models stored in a TxDb object. A TxDb object is a small database that contains the gene models of a given organism/assembly. Bioconductor provides a small collection of TxDb objects in the form of ready-to-install TxDb packages for the most commonly studied organisms. Additionally, the user can easily make a TxDb object (or package) for the organism/assembly of their choice by using the tools from the txdbmaker package.
Maintained by H. Pagès. Last updated 5 months ago.
genetics, infrastructure, annotation, sequencing, genomeannotation, bioconductor-package, core-package
26 stars 15.34 score 5.3k scripts 339 dependents
rich-iannone
DiagrammeR: Graph/Network Visualization
Build graph/network structures using functions for stepwise addition and deletion of nodes and edges. Work with data available in tables for bulk addition of nodes, edges, and associated metadata. Use graph selections and traversals to apply changes to specific nodes or edges. A wide selection of graph algorithms allows for the analysis of graphs. Visualize the graphs and take advantage of any aesthetic properties assigned to nodes and edges.
Maintained by Richard Iannone. Last updated 2 months ago.
graph, graph-functions, network-graph, property-graph, visualization
1.7k stars 15.29 score 3.8k scripts 86 dependents
bioc
AnnotationDbi: Manipulation of SQLite-based annotations in Bioconductor
Implements a user-friendly interface for querying SQLite-based annotation data packages.
Maintained by Bioconductor Package Maintainer. Last updated 5 months ago.
annotation, microarray, sequencing, genomeannotation, bioconductor-package, core-package
9 stars 15.05 score 3.6k scripts 769 dependents
larmarange
labelled: Manipulating Labelled Data
Work with labelled data imported from 'SPSS' or 'Stata' with 'haven' or 'foreign'. This package provides useful functions to deal with "haven_labelled" and "haven_labelled_spss" classes introduced by the 'haven' package.
Maintained by Joseph Larmarange. Last updated 1 month ago.
haven, labels, metadata, sas, spss, stata
76 stars 15.04 score 2.4k scripts 98 dependents
bioc
DOSE: Disease Ontology Semantic and Enrichment analysis
This package implements five methods proposed by Resnik, Schlicker, Jiang, Lin and Wang respectively for measuring semantic similarities among DO terms and gene products. Enrichment analyses including hypergeometric model and gene set enrichment analysis are also implemented for discovering disease associations of high-throughput biological data.
Maintained by Guangchuang Yu. Last updated 5 months ago.
annotation, visualization, multiplecomparison, genesetenrichment, pathways, software, disease-ontology, enrichment-analysis, semantic-similarity
119 stars 14.97 score 2.0k scripts 61 dependents
guido-s
meta: General Package for Meta-Analysis
User-friendly general package providing standard methods for meta-analysis and supporting Schwarzer, Carpenter, and Rücker <DOI:10.1007/978-3-319-21416-0>, "Meta-Analysis with R" (2015): - common effect and random effects meta-analysis; - several plots (forest, funnel, Galbraith / radial, L'Abbe, Baujat, bubble); - three-level meta-analysis model; - generalised linear mixed model; - logistic regression with penalised likelihood for rare events; - Hartung-Knapp method for random effects model; - Kenward-Roger method for random effects model; - prediction interval; - statistical tests for funnel plot asymmetry; - trim-and-fill method to evaluate bias in meta-analysis; - meta-regression; - cumulative meta-analysis and leave-one-out meta-analysis; - import data from 'RevMan 5'; - produce forest plot summarising several (subgroup) meta-analyses.
Maintained by Guido Schwarzer. Last updated 3 days ago.
89 stars 14.95 score 2.3k scripts 30 dependents
r-lib
bit64: A S3 Class for Vectors of 64bit Integers
Package 'bit64' provides serializable S3 atomic 64bit (signed) integers. These are useful for handling database keys and exact counting in +-2^63. WARNING: do not use them as replacement for 32bit integers, integer64 are not supported for subscripting by R-core and they have different semantics when combined with double, e.g. integer64 + double => integer64. Class integer64 can be used in vectors, matrices, arrays and data.frames. Methods are available for coercion from and to logicals, integers, doubles, characters and factors as well as many elementwise and summary functions. Many fast algorithmic operations such as 'match' and 'order' support interactive data exploration and manipulation and optionally leverage caching.
Maintained by Michael Chirico. Last updated 19 days ago.
35 stars 14.91 score 1.5k scripts 3.2k dependents
r-dbi
RPostgres: C++ Interface to PostgreSQL
Fully DBI-compliant C++-backed interface to PostgreSQL <https://www.postgresql.org/>, an open-source relational database.
Maintained by Kirill Müller. Last updated 1 month ago.
338 stars 14.78 score 1.6k scripts 31 dependents
bioc
GSVA: Gene Set Variation Analysis for Microarray and RNA-Seq Data
Gene Set Variation Analysis (GSVA) is a non-parametric, unsupervised method for estimating variation of gene set enrichment through the samples of an expression data set. GSVA performs a change in coordinate systems, transforming the data from a gene by sample matrix to a gene-set by sample matrix, thereby allowing the evaluation of pathway enrichment for each sample. This new matrix of GSVA enrichment scores facilitates applying standard analytical methods like functional enrichment, survival analysis, clustering, CNV-pathway analysis or cross-tissue pathway analysis, in a pathway-centric manner.
Maintained by Robert Castelo. Last updated 10 days ago.
functionalgenomics, microarray, rnaseq, pathways, genesetenrichment, gene-set-enrichment, genomics, pathway-enrichment-analysis
212 stars 14.74 score 1.6k scripts 19 dependents
bioc
TCGAbiolinks: TCGAbiolinks: An R/Bioconductor package for integrative analysis with GDC data
The aim of TCGAbiolinks is to: i) facilitate GDC open-access data retrieval, ii) prepare the data using the appropriate pre-processing strategies, iii) provide the means to carry out different standard analyses and iv) easily reproduce earlier research results. In more detail, the package provides multiple methods for analysis (e.g., differential expression analysis, identifying differentially methylated regions) and methods for visualization (e.g., survival plots, volcano plots, starburst plots) in order to easily develop complete analysis pipelines.
Maintained by Tiago Chedraoui Silva. Last updated 1 month ago.
dnamethylation, differentialmethylation, generegulation, geneexpression, methylationarray, differentialexpression, pathways, network, sequencing, survival, software, bioc, bioconductor, gdc, integrative-analysis, tcga, tcga-data, tcgabiolinks
310 stars 14.47 score 1.6k scripts 6 dependents
business-science
timetk: A Tool Kit for Working with Time Series
Easy visualization, wrangling, and feature engineering of time series data for forecasting and machine learning prediction. Consolidates and extends time series functionality from packages including 'dplyr', 'stats', 'xts', 'forecast', 'slider', 'padr', 'recipes', and 'rsample'.
Maintained by Matt Dancho. Last updated 1 year ago.
coercion, coercion-functions, data-mining, dplyr, forecast, forecasting, forecasting-models, machine-learning, series-decomposition, series-signature, tibble, tidy, tidyquant, tidyverse, time, time-series, timeseries
626 stars 14.20 score 4.0k scripts 16 dependents
doi-usgs
dataRetrieval: Retrieval Functions for USGS and EPA Hydrology and Water Quality Data
Collection of functions to help retrieve U.S. Geological Survey and U.S. Environmental Protection Agency water quality and hydrology data from web services. Data are discovered from National Water Information System <https://waterservices.usgs.gov/> and <https://waterdata.usgs.gov/nwis>. Water quality data are obtained from the Water Quality Portal <https://www.waterqualitydata.us/>.
Maintained by Laura DeCicco. Last updated 5 days ago.
286 stars 14.16 score 1.7k scripts 15 dependents
bioc
GOSemSim: GO-terms Semantic Similarity Measures
The semantic comparisons of Gene Ontology (GO) annotations provide quantitative ways to compute similarities between genes and gene groups, and have become an important basis for many bioinformatics analysis approaches. GOSemSim is an R package for semantic similarity computation among GO terms, sets of GO terms, gene products and gene clusters. GOSemSim implemented five methods proposed by Resnik, Schlicker, Jiang, Lin and Wang respectively.
Maintained by Guangchuang Yu. Last updated 5 months ago.
annotation, go, clustering, pathways, network, software, bioinformatics, gene-ontology, semantic-similarity, cpp
63 stars 14.12 score 708 scripts 68 dependents
bioc
ensembldb: Utilities to create and use Ensembl-based annotation databases
The package provides functions to create and use transcript-centric annotation databases/packages. The annotations for the databases are fetched directly from Ensembl using their Perl API. The functionality and data are similar to those of the TxDb packages from the GenomicFeatures package, but, in addition to retrieving all gene/transcript models and annotations from the database, ensembldb provides a filter framework that allows retrieving annotations for specific entries like genes encoded on a chromosome region or transcript models of lincRNA genes. EnsDb databases built with ensembldb also contain protein annotations and mappings between proteins and their encoding transcripts. Finally, ensembldb provides functions to map between genomic, transcript and protein coordinates.
Maintained by Johannes Rainer. Last updated 5 months ago.
genetics, annotationdata, sequencing, coverage, annotation, bioconductor, bioconductor-packages, ensembl
35 stars 14.08 score 892 scripts 108 dependents
walkerke
tidycensus: Load US Census Boundary and Attribute Data as 'tidyverse' and 'sf'-Ready Data Frames
An integrated R interface to several United States Census Bureau APIs (<https://www.census.gov/data/developers/data-sets.html>) and the US Census Bureau's geographic boundary files. Allows R users to return Census and ACS data as tidyverse-ready data frames, and optionally returns a list-column with feature geometry for mapping and spatial analysis.
Maintained by Kyle Walker. Last updated 2 months ago.
648 stars 14.02 score 7.5k scripts 10 dependents
bioc
AnnotationHub: Client to access AnnotationHub resources
This package provides a client for the Bioconductor AnnotationHub web resource. The AnnotationHub web resource provides a central location where genomic files (e.g., VCF, bed, wig) and other resources from standard locations (e.g., UCSC, Ensembl) can be discovered. The resource includes metadata about each resource, e.g., a textual description, tags, and date of modification. The client creates and manages a local cache of files retrieved by the user, helping with quick and reproducible access.
Maintained by Bioconductor Package Maintainer. Last updated 5 months ago.
infrastructure, dataimport, gui, thirdpartyclient, core-package, u24ca289073
17 stars 13.88 score 2.7k scripts 104 dependents
bioc
BiocFileCache: Manage Files Across Sessions
This package creates a persistent on-disk cache of files that the user can add, update, and retrieve. It is useful for managing resources (such as custom Txdb objects) that are costly or difficult to create, web resources, and data files used across sessions.
Maintained by Lori Shepherd. Last updated 2 months ago.
dataimport, core-package, u24ca289073
13 stars 13.76 score 486 scripts 436 dependents
ropensci
taxize: Taxonomic Information from Around the Web
Interacts with a suite of web application programming interfaces (API) for taxonomic tasks, such as getting database specific taxonomic identifiers, verifying species names, getting taxonomic hierarchies, fetching downstream and upstream taxonomic names, getting taxonomic synonyms, converting scientific to common names and vice versa, and more. Some of the services supported include 'NCBI E-utilities' (<https://www.ncbi.nlm.nih.gov/books/NBK25501/>), 'Encyclopedia of Life' (<https://eol.org/docs/what-is-eol/data-services>), 'Global Biodiversity Information Facility' (<https://techdocs.gbif.org/en/openapi/>), and many more. Links to the API documentation for other supported services are available in the documentation for their respective functions in this package.
Maintained by Zachary Foster. Last updated 27 days ago.
taxonomy, biology, nomenclature, json, api, web, api-client, identifiers, species, names, api-wrapper, biodiversity, darwincore, data, taxize
274 stars 13.63 score 1.6k scripts 23 dependents
kaz-yos
tableone: Create 'Table 1' to Describe Baseline Characteristics with or without Propensity Score Weights
Creates 'Table 1', i.e., a description of baseline patient characteristics, which is essential in medical research. Supports both continuous and categorical variables, as well as p-values and standardized mean differences. Weighted data are supported via the 'survey' package.
Maintained by Kazuki Yoshida. Last updated 3 years ago.
baseline-characteristics, descriptive-statistics, statistics
221 stars 13.55 score 2.3k scripts 12 dependents
bioc
GEOquery: Get data from NCBI Gene Expression Omnibus (GEO)
The NCBI Gene Expression Omnibus (GEO) is a public repository of microarray data. Given the rich and varied nature of this resource, it is only natural to want to apply BioConductor tools to these data. GEOquery is the bridge between GEO and BioConductor.
Maintained by Sean Davis. Last updated 5 months ago.
microarray, dataimport, onechannel, twochannel, sage, bioconductor, bioinformatics, data-science, genomics, ncbi-geo
93 stars 13.48 score 4.1k scripts 45 dependents
business-science
tidyquant: Tidy Quantitative Financial Analysis
Bringing business and financial analysis to the 'tidyverse'. The 'tidyquant' package provides a convenient wrapper to various 'xts', 'zoo', 'quantmod', 'TTR' and 'PerformanceAnalytics' package functions and returns the objects in the tidy 'tibble' format. The main advantage is being able to use quantitative functions with the 'tidyverse' functions including 'purrr', 'dplyr', 'tidyr', 'ggplot2', 'lubridate', etc. See the 'tidyquant' website for more information, documentation and examples.
Maintained by Matt Dancho. Last updated 2 months ago.
dplyr, financial-analysis, financial-data, financial-statements, multiple-stocks, performance-analysis, performanceanalytics, quantmod, stock, stock-exchanges, stock-indexes, stock-lists, stock-performance, stock-prices, stock-symbol, tidyverse, time-series, timeseries, xts
872 stars 13.34 score 5.2k scripts
projectmosaic
mosaic: Project MOSAIC Statistics and Mathematics Teaching Utilities
Data sets and utilities from Project MOSAIC (<http://www.mosaic-web.org>) used to teach mathematics, statistics, computation and modeling. Funded by the NSF, Project MOSAIC is a community of educators working to tie together aspects of quantitative work that students in science, technology, engineering and mathematics will need in their professional lives, but which are usually taught in isolation, if at all.
Maintained by Randall Pruim. Last updated 1 year ago.
93 stars 13.32 score 7.2k scripts 7 dependents
ropensci
visdat: Preliminary Visualisation of Data
Create preliminary exploratory data visualisations of an entire dataset to identify problems or unexpected features using 'ggplot2'.
Maintained by Nicholas Tierney. Last updated 9 months ago.
exploratory-data-analysis, missingness, peer-reviewed, ropensci, visualisation
452 stars 13.31 score 2.1k scripts 11 dependents
dreamrs
esquisse: Explore and Visualize Your Data Interactively
A 'shiny' gadget to create 'ggplot2' figures interactively with drag-and-drop to map your variables to different aesthetics. You can quickly visualize your data according to their type, export in various formats, and retrieve the code to reproduce the plot.
Maintained by Victor Perrier. Last updated 1 month ago.
addin, data-visualization, ggplot2, rstudio-addin, visualization
1.8k stars 13.31 score 1.1k scripts 1 dependent
oscarkjell
text: Analyses of Text using Transformers Models from HuggingFace, Natural Language Processing and Machine Learning
Link R with Transformers from Hugging Face to transform text variables to word embeddings, where the word embeddings are used to statistically test the mean difference between sets of texts, compute semantic similarity scores between texts, predict numerical variables, and visualize statistically significant words according to various dimensions, etc. For more information see <https://www.r-text.org>.
Maintained by Oscar Kjell. Last updated 9 days ago.
deep-learning, machine-learning, nlp, transformers, openjdk
145 stars 13.21 score 436 scripts 1 dependent
wadpac
GGIR: Raw Accelerometer Data Analysis
A tool to process and analyse data collected with wearable raw acceleration sensors as described in Migueles and colleagues (JMPB 2019), and van Hees and colleagues (JApplPhysiol 2014; PLoSONE 2015). The package has been developed and tested for binary data from 'GENEActiv' <https://activinsights.com/>, binary (.gt3x) and .csv-export data from 'Actigraph' <https://theactigraph.com> devices, and binary (.cwa) and .csv-export data from 'Axivity' <https://axivity.com>. These devices are currently widely used in research on human daily physical activity. Further, the package can handle accelerometer data files from any other sensor brand, provided that the data are stored in csv format. The package also allows for external function embedding.
Maintained by Vincent T van Hees. Last updated 17 days ago.
accelerometer, activity-recognition, circadian-rhythm, movement-sensor, sleep
109 stars 13.20 score 342 scripts 3 dependents
bioc
Gviz: Plotting data and annotation information along genomic coordinates
Genomic data analysis requires integrated visualization of known genomic information and new experimental data. Gviz uses the biomaRt and the rtracklayer packages to perform live annotation queries to Ensembl and UCSC and translates this to e.g. gene/transcript structures in viewports of the grid graphics package. This results in genomic information plotted together with your data.
Maintained by Robert Ivanek. Last updated 5 months ago.
visualization, microarray, sequencing
79 stars 13.05 score 1.4k scripts 46 dependents
bioc
ChIPseeker: ChIPseeker for ChIP peak Annotation, Comparison, and Visualization
This package implements functions to retrieve the nearest genes around the peak, annotate the genomic region of the peak, statistical methods to estimate the significance of overlap among ChIP peak data sets, and incorporates the GEO database so that users can compare their own dataset with those deposited in the database. The comparison can be used to infer cooperative regulation and thus can be used to generate hypotheses. Several visualization functions are implemented to summarize the coverage of the peak experiment, average profile and heatmap of peaks binding to TSS regions, genomic annotation, distance to TSS, and overlap of peaks or genes.
Maintained by Guangchuang Yu. Last updated 5 months ago.
annotation, chipseq, software, visualization, multiplecomparison, atac-seq, chip-seq, comparison, epigenetics, epigenomics
233 stars 13.05 score 1.6k scripts 5 dependents
ggrothendieck
sqldf: Manipulate R Data Frames Using SQL
The sqldf() function is typically passed a single argument which is an SQL select statement where the table names are ordinary R data frame names. sqldf() transparently sets up a database, imports the data frames into that database, performs the SQL select or other statement and returns the result using a heuristic to determine which class to assign to each column of the returned data frame. The sqldf() or read.csv.sql() functions can also be used to read filtered files into R even if the original files are larger than R itself can handle. 'RSQLite', 'RH2', 'RMySQL' and 'RPostgreSQL' backends are supported.
Maintained by G. Grothendieck. Last updated 3 years ago.
250 stars 13.04 score 8.1k scripts 52 dependents
openair-project
openair: Tools for the Analysis of Air Pollution Data
Tools to analyse, interpret and understand air pollution data. Data are typically regular time series and air quality measurement, meteorological data and dispersion model output can be analysed. The package is described in Carslaw and Ropkins (2012, <doi:10.1016/j.envsoft.2011.09.008>) and subsequent papers.
Maintained by David Carslaw. Last updated 3 days ago.
air-quality, air-quality-data, meteorology, openair, cpp
316 stars 12.94 score 1.2k scripts 12 dependents
juba
questionr: Functions to Make Surveys Processing Easier
Set of functions to make the processing and analysis of surveys easier: interactive shiny apps and addins for data recoding, contingency tables, dataset metadata handling, and several convenience functions.
Maintained by Julien Barnier. Last updated 10 days ago.
83 stars 12.93 score 1.1k scripts 19 dependents
cvxgrp
CVXR: Disciplined Convex Optimization
An object-oriented modeling language for disciplined convex programming (DCP) as described in Fu, Narasimhan, and Boyd (2020, <doi:10.18637/jss.v094.i14>). It allows the user to formulate convex optimization problems in a natural way following mathematical convention and DCP rules. The system analyzes the problem, verifies its convexity, converts it into a canonical form, and hands it off to an appropriate solver to obtain the solution. Interfaces to solvers on CRAN and elsewhere are provided, both commercial and open source.
Maintained by Anqi Fu. Last updated 5 months ago.
207 stars 12.89 score 768 scripts 51 dependents
bioc
minfi: Analyze Illumina Infinium DNA methylation arrays
Tools to analyze & visualize Illumina Infinium methylation arrays.
Maintained by Kasper Daniel Hansen. Last updated 4 months ago.
immunooncology, dnamethylation, differentialmethylation, epigenetics, microarray, methylationarray, multichannel, twochannel, dataimport, normalization, preprocessing, qualitycontrol
60 stars 12.82 score 996 scripts 27 dependents
bioc
SpatialExperiment: S4 Class for Spatially Resolved -omics Data
Defines an S4 class for storing data from spatial -omics experiments. The class extends SingleCellExperiment to support storage and retrieval of additional information from spot-based and molecule-based platforms, including spatial coordinates, images, and image metadata. A specialized constructor function is included for data from the 10x Genomics Visium platform.
Maintained by Dario Righelli. Last updated 5 months ago.
datarepresentation, dataimport, infrastructure, immunooncology, geneexpression, transcriptomics, singlecell, spatial
59 stars 12.63 score 1.8k scripts 71 dependents
ohdsi
DatabaseConnector: Connecting to Various Database Platforms
An R 'DataBase Interface' ('DBI') compatible interface to various database platforms ('PostgreSQL', 'Oracle', 'Microsoft SQL Server', 'Amazon Redshift', 'Microsoft Parallel Database Warehouse', 'IBM Netezza', 'Apache Impala', 'Google BigQuery', 'Snowflake', 'Spark', 'SQLite', and 'InterSystems IRIS'). Also includes support for fetching data as 'Andromeda' objects. Uses either 'Java Database Connectivity' ('JDBC') or other 'DBI' drivers to connect to databases.
Maintained by Martijn Schuemie. Last updated 2 months ago.
56 stars 12.63 score 772 scripts 11 dependents
wesm
feather: R Bindings to the Feather 'API'
Read and write feather files, a lightweight binary columnar data store designed for maximum speed.
Maintained by Hadley Wickham. Last updated 4 years ago.
2.7k stars 12.61 score 3.9k scripts 5 dependents
massimoaria
bibliometrix: Comprehensive Science Mapping Analysis
Tool for quantitative research in scientometrics and bibliometrics. It implements the comprehensive workflow for science mapping analysis proposed in Aria M. and Cuccurullo C. (2017) <doi:10.1016/j.joi.2017.08.007>. 'bibliometrix' provides various routines for importing bibliographic data from 'SCOPUS', 'Clarivate Analytics Web of Science' (<https://www.webofknowledge.com/>), 'Digital Science Dimensions' (<https://www.dimensions.ai/>), 'OpenAlex' (<https://openalex.org/>), 'Cochrane Library' (<https://www.cochranelibrary.com/>), 'Lens' (<https://lens.org>), and 'PubMed' (<https://pubmed.ncbi.nlm.nih.gov/>) databases, performing bibliometric analysis and building networks for co-citation, coupling, scientific collaboration and co-word analysis.
Maintained by Massimo Aria. Last updated 12 days ago.
bibliometric-analysis bibliometrics citation citation-network citations co-authors co-occurence co-word-analysis correspondence-analysis coupling isi-web journal manuscript quantitative-analysis scholars science science-mapping scientific scientometrics scopus
545 stars 12.54 score 518 scripts 2 dependents
r-dbi
bigrquery:An Interface to Google's 'BigQuery' 'API'
Easily talk to Google's 'BigQuery' database from R.
Maintained by Hadley Wickham. Last updated 1 month ago.
520 stars 12.47 score 1.8k scripts 4 dependents
simongrund1
mitml:Tools for Multiple Imputation in Multilevel Modeling
Provides tools for multiple imputation of missing data in multilevel modeling. Includes a user-friendly interface to the packages 'pan' and 'jomo', and several functions for visualization, data management and the analysis of multiply imputed data sets.
Maintained by Simon Grund. Last updated 1 year ago.
imputation missing-data mixed-effects multilevel-data multilevel-models
29 stars 12.36 score 246 scripts 153 dependents
bioc
TFBSTools:Software Package for Transcription Factor Binding Site (TFBS) Analysis
TFBSTools is a package for the analysis and manipulation of transcription factor binding sites. It includes matrix conversion between Position Frequency Matrix (PFM), Position Weight Matrix (PWM) and Information Content Matrix (ICM). It can also scan for putative TFBS in sequences or alignments, query the JASPAR database, and it provides a wrapper for de novo motif discovery software.
Maintained by Ge Tan. Last updated 19 days ago.
motifannotation generegulation motifdiscovery transcription alignment
28 stars 12.36 score 1.1k scripts 18 dependents
ouhscbbmc
REDCapR:Interaction Between R and REDCap
Encapsulates functions to streamline calls from R to the REDCap API. REDCap (Research Electronic Data CAPture) is a web application for building and managing online surveys and databases developed at Vanderbilt University. The Application Programming Interface (API) offers an avenue to access and modify data programmatically, improving the capacity for literate and reproducible programming.
Maintained by Will Beasley. Last updated 3 months ago.
118 stars 12.36 score 438 scripts 6 dependents
bioc
ReactomePA:Reactome Pathway Analysis
This package provides functions for pathway analysis based on the REACTOME pathway database. It implements enrichment analysis, gene set enrichment analysis and several functions for visualization. This package is not affiliated with the Reactome team.
Maintained by Guangchuang Yu. Last updated 5 months ago.
pathways visualization annotation multiplecomparison genesetenrichment reactome enrichment-analysis reactome-pathway-analysis reactomepa
40 stars 12.25 score 1.5k scripts 7 dependents
bioc
ggbio:Visualization tools for genomic data
The ggbio package extends and specializes the grammar of graphics for biological data. The graphics are designed to answer common scientific questions, in particular those often asked of high throughput genomics data. All core Bioconductor data structures are supported, where appropriate. The package supports detailed views of particular genomic regions, as well as genome-wide overviews. Supported overviews include ideograms and grand linear views. High-level plots include sequence fragment length, edge-linked interval to data view, mismatch pileup, and several splicing summaries.
Maintained by Michael Lawrence. Last updated 5 months ago.
111 stars 12.23 score 734 scripts 16 dependents
r-dbi
RMariaDB:Database Interface and MariaDB Driver
Implements a DBI-compliant interface to MariaDB (<https://mariadb.org/>) and MySQL (<https://www.mysql.com/>) databases.
Maintained by Kirill Müller. Last updated 1 month ago.
133 stars 12.20 score 792 scripts 10 dependents
hhoeflin
hdf5r:Interface to the 'HDF5' Binary Data Format
'HDF5' is a data model, library and file format for storing and managing large amounts of data. This package provides a nearly feature-complete, object-oriented wrapper for the 'HDF5' API <https://support.hdfgroup.org/documentation/hdf5/latest/_r_m.html> using R6 classes. Additionally, functionality is added so that 'HDF5' objects behave very similarly to their corresponding R counterparts.
Maintained by Holger Hoefling. Last updated 2 months ago.
82 stars 12.09 score 988 scripts 34 dependents
dreamrs
datamods:Modules to Import and Manipulate Data in 'Shiny'
'Shiny' modules to import data into an application or 'addin' from various sources, and to manipulate them after that.
Maintained by Victor Perrier. Last updated 26 days ago.
144 stars 12.03 score 174 scripts 7 dependents
bioc
ExperimentHub:Client to access ExperimentHub resources
This package provides a client for the Bioconductor ExperimentHub web resource. ExperimentHub provides a central location where curated data from experiments, publications or training courses can be accessed. Each resource has associated metadata, tags and date of modification. The client creates and manages a local cache of retrieved files, enabling quick and reproducible access.
Maintained by Bioconductor Package Maintainer. Last updated 5 months ago.
infrastructure dataimport gui thirdpartyclient core-package u24ca289073
10 stars 11.94 score 764 scripts 57 dependents
bioc
GenomicDataCommons:NIH / NCI Genomic Data Commons Access
Programmatically access the NIH / NCI Genomic Data Commons RESTful service.
Maintained by Sean Davis. Last updated 2 months ago.
dataimport sequencing api-client bioconductor bioinformatics cancer core-services data-science genomics nci tcga vignette
87 stars 11.94 score 238 scripts 12 dependents
guido-s
netmeta:Network Meta-Analysis using Frequentist Methods
A comprehensive set of functions providing frequentist methods for network meta-analysis (Balduzzi et al., 2023) <doi:10.18637/jss.v106.i02> and supporting Schwarzer et al. (2015) <doi:10.1007/978-3-319-21416-0>, Chapter 8 "Network Meta-Analysis": - frequentist network meta-analysis following Rücker (2012) <doi:10.1002/jrsm.1058>; - additive network meta-analysis for combinations of treatments (Rücker et al., 2020) <doi:10.1002/bimj.201800167>; - network meta-analysis of binary data using the Mantel-Haenszel or non-central hypergeometric distribution method (Efthimiou et al., 2019) <doi:10.1002/sim.8158>, or penalised logistic regression (Evrenoglou et al., 2022) <doi:10.1002/sim.9562>; - rankograms and ranking of treatments by the Surface under the cumulative ranking curve (SUCRA) (Salanti et al., 2013) <doi:10.1016/j.jclinepi.2010.03.016>; - ranking of treatments using P-scores (frequentist analogue of SUCRAs without resampling) according to Rücker & Schwarzer (2015) <doi:10.1186/s12874-015-0060-8>; - split direct and indirect evidence to check consistency (Dias et al., 2010) <doi:10.1002/sim.3767>, (Efthimiou et al., 2019) <doi:10.1002/sim.8158>; - league table with network meta-analysis results; - 'comparison-adjusted' funnel plot (Chaimani & Salanti, 2012) <doi:10.1002/jrsm.57>; - net heat plot and design-based decomposition of Cochran's Q according to Krahn et al. (2013) <doi:10.1186/1471-2288-13-35>; - measures characterizing the flow of evidence between two treatments by König et al. (2013) <doi:10.1002/sim.6001>; - automated drawing of network graphs described in Rücker & Schwarzer (2016) <doi:10.1002/jrsm.1143>; - partial order of treatment rankings ('poset') and Hasse diagram for 'poset' (Carlsen & Bruggemann, 2014) <doi:10.1002/cem.2569>; (Rücker & Schwarzer, 2017) <doi:10.1002/jrsm.1270>; - contribution matrix as described in Papakonstantinou et al. (2018) <doi:10.12688/f1000research.14770.3> and Davies et al. (2022) <doi:10.1002/sim.9346>; - subgroup network meta-analysis.
Maintained by Guido Schwarzer. Last updated 10 days ago.
meta-analysis network-meta-analysis rstudio
33 stars 11.84 score 199 scripts 10 dependents
tiledb-inc
tiledb:Modern Database Engine for Complex Data Based on Multi-Dimensional Arrays
The modern database 'TileDB' introduces a powerful on-disk format for storing and accessing any complex data based on multi-dimensional arrays. It supports dense and sparse arrays, dataframes and key-values stores, cloud storage ('S3', 'GCS', 'Azure'), chunked arrays, multiple compression, encryption and checksum filters, uses a fully multi-threaded implementation, supports parallel I/O, data versioning ('time travel'), metadata and groups. It is implemented as an embeddable cross-platform C++ library with APIs from several languages, and integrations. This package provides the R support.
Maintained by Isaiah Norton. Last updated 3 days ago.
array hdfs s3 storage-manager tiledb cpp
108 stars 11.79 score 306 scripts 4 dependents
ateucher
rmapshaper:Client for 'mapshaper' for 'Geospatial' Operations
Edit and simplify 'geojson', 'Spatial', and 'sf' objects. This is a wrapper around the 'mapshaper' 'JavaScript' library by Matthew Bloch <https://github.com/mbloch/mapshaper/> to perform topologically-aware polygon simplification, as well as other operations such as clipping, erasing, dissolving, and converting 'multi-part' to 'single-part' geometries.
Maintained by Andy Teucher. Last updated 9 months ago.
204 stars 11.64 score 2.1k scripts 18 dependents
pecanproject
PEcAn.data.atmosphere:PEcAn Functions Used for Managing Climate Driver Data
The Predictive Ecosystem Carbon Analyzer (PEcAn) is a scientific workflow management tool that is designed to simplify the management of model parameterization, execution, and analysis. The PEcAn.data.atmosphere package converts climate driver data into a standard format for models integrated into PEcAn. As a standalone package, it provides an interface to access diverse climate data sets.
Maintained by David LeBauer. Last updated 7 hours ago.
bayesian cyberinfrastructure data-assimilation data-science ecosystem-model ecosystem-science forecasting meta-analysis national-science-foundation pecan plants
216 stars 11.63 score 64 scripts 14 dependents
bioc
bumphunter:Bump Hunter
Tools for finding bumps in genomic data
Maintained by Tamilselvi Guharaj. Last updated 5 months ago.
dnamethylation epigenetics infrastructure multiplecomparison immunooncology
16 stars 11.61 score 210 scripts 43 dependents
projectmosaic
ggformula:Formula Interface to the Grammar of Graphics
Provides a formula interface to 'ggplot2' graphics.
Maintained by Randall Pruim. Last updated 1 year ago.
38 stars 11.55 score 1.7k scripts 25 dependents
bioc
mia:Microbiome analysis
mia implements tools for microbiome analysis based on the SummarizedExperiment, SingleCellExperiment and TreeSummarizedExperiment infrastructure. Data wrangling and analysis in the context of taxonomic data is the main scope. Additional functions for common tasks, such as community indices calculation and summarization, are also implemented.
Maintained by Tuomas Borman. Last updated 4 days ago.
microbiome software dataimport analysis bioconductor cpp
51 stars 11.51 score 316 scripts 5 dependents
larmarange
broom.helpers:Helpers for Model Coefficients Tibbles
Provides a suite of functions to work with regression model 'broom::tidy()' tibbles. The suite includes functions to group regression model terms by variable, insert reference and header rows for categorical variables, add variable labels, and more.
Maintained by Joseph Larmarange. Last updated 25 days ago.
22 stars 11.45 score 165 scripts 2 dependents
privefl
bigsnpr:Analysis of Massive SNP Arrays
Easy-to-use, efficient, flexible and scalable tools for analyzing massive SNP arrays. Privé et al. (2018) <doi:10.1093/bioinformatics/bty185>.
Maintained by Florian Privé. Last updated 25 days ago.
big-data bioinformatics memory-mapped-file parallel-computing polygenic-scores population-structure-inference snp-data statistical-methods openblas zlib cpp openmp
200 stars 11.44 score 1.5k scripts 3 dependents
ewenharrison
finalfit:Quickly Create Elegant Regression Results Tables and Plots when Modelling
Generate regression results tables and plots in final format for publication. Explore models and export directly to PDF and 'Word' using 'RMarkdown'.
Maintained by Ewen Harrison. Last updated 10 days ago.
270 stars 11.43 score 1.0k scripts
darwin-eu
CDMConnector:Connect to an OMOP Common Data Model
Provides tools for working with observational health data in the Observational Medical Outcomes Partnership (OMOP) Common Data Model format with a pipe friendly syntax. Common data model database table references are stored in a single compound object along with metadata.
Maintained by Adam Black. Last updated 1 month ago.
12 stars 11.43 score 502 scripts 12 dependents
bioc
annotate:Annotation for microarrays
Using R environments for annotation.
Maintained by Bioconductor Package Maintainer. Last updated 5 months ago.
11.41 score 812 scripts 239 dependents
bioc
VariantAnnotation:Annotation of Genetic Variants
Annotate variants, compute amino acid coding changes, predict coding outcomes.
Maintained by Bioconductor Package Maintainer. Last updated 3 months ago.
dataimport sequencing snp annotation genetics variantannotation curl bzip2 xz-utils zlib
11.39 score 1.9k scripts 152 dependents
openintrostat
openintro:Datasets and Supplemental Functions from 'OpenIntro' Textbooks and Labs
Supplemental functions and data for 'OpenIntro' resources, which include open-source textbooks and resources for introductory statistics (<https://www.openintro.org/>). The package contains datasets used in our open-source textbooks along with custom plotting functions for reproducing book figures. Note that many functions and examples include color transparency; some plotting elements may not show up properly (or at all) when run in some versions of the Windows operating system.
Maintained by Mine Çetinkaya-Rundel. Last updated 3 months ago.
240 stars 11.39 score 6.0k scripts
doi-usgs
nhdplusTools:NHDPlus Tools
Tools for traversing and working with National Hydrography Dataset Plus (NHDPlus) data. All methods implemented in 'nhdplusTools' are available in the NHDPlus documentation available from the US Environmental Protection Agency <https://www.epa.gov/waterdata/basic-information>.
Maintained by David Blodgett. Last updated 1 month ago.
87 stars 11.38 score 348 scripts 5 dependents
bioc
pathview:a tool set for pathway based data integration and visualization
Pathview is a tool set for pathway-based data integration and visualization. It maps and renders a wide variety of biological data on relevant pathway graphs. All users need to do is supply their data and specify the target pathway. Pathview automatically downloads the pathway graph data, parses the data file, maps user data to the pathway, and renders the pathway graph with the mapped data. In addition, Pathview seamlessly integrates with pathway and gene set (enrichment) analysis tools for large-scale and fully automated analysis.
Maintained by Weijun Luo. Last updated 2 days ago.
pathways graphandnetwork visualization genesetenrichment differentialexpression geneexpression microarray rnaseq genetics metabolomics proteomics systemsbiology sequencing
40 stars 11.37 score 1.6k scripts 10 dependents
ropensci
biomartr:Genomic Data Retrieval
Perform large scale genomic data retrieval and functional annotation retrieval. This package aims to provide users with a standardized way to automate genome, proteome, 'RNA', coding sequence ('CDS'), 'GFF', and metagenome retrieval from 'NCBI RefSeq', 'NCBI Genbank', 'ENSEMBL', and 'UniProt' databases. Furthermore, an interface to the 'BioMart' database (Smedley et al. (2009) <doi:10.1186/1471-2164-10-22>) allows users to retrieve functional annotation for genomic loci. In addition, users can download entire databases such as 'NCBI RefSeq' (Pruitt et al. (2007) <doi:10.1093/nar/gkl842>), 'NCBI nr', 'NCBI nt', 'NCBI Genbank' (Benson et al. (2013) <doi:10.1093/nar/gks1195>), etc. with only one command.
Maintained by Hajk-Georg Drost. Last updated 2 months ago.
biomart genomic-data-retrieval annotation-retrieval database-retrieval ncbi ensembl biological-data-retrieval ensembl-servers genome genome-annotation genome-retrieval genomics meta-analysis metagenomics ncbi-genbank peer-reviewed proteome sequenced-genomes
218 stars 11.35 score 129 scripts 3 dependents
mrcieu
TwoSampleMR:Two Sample MR Functions and Interface to MRC Integrative Epidemiology Unit OpenGWAS Database
A package for performing Mendelian randomization using GWAS summary data. It uses the IEU OpenGWAS database <https://gwas.mrcieu.ac.uk/> to automatically obtain data, and a wide range of methods to run the analysis.
Maintained by Gibran Hemani. Last updated 3 days ago.
476 stars 11.27 score 1.7k scripts 1 dependent
bioc
karyoploteR:Plot customizable linear genomes displaying arbitrary data
karyoploteR creates karyotype plots of arbitrary genomes and offers a complete set of functions to plot arbitrary data on them. It mimics many R base graphics functions, coupling them with a coordinate change function that automatically maps the chromosome and data coordinates into the plot coordinates. In addition to the provided data plotting functions, it is easy to add new ones.
Maintained by Bernat Gel. Last updated 5 months ago.
visualization copynumbervariation sequencing coverage dnaseq chipseq methylseq dataimport onechannel bioconductor bioinformatics data-visualization genome genomics-visualization plotting-in-r
307 stars 11.25 score 656 scripts 4 dependents
bioc
genomation:Summary, annotation and visualization of genomic data
A package for summary and annotation of genomic intervals. Users can visualize and quantify genomic intervals over pre-defined functional regions, such as promoters, exons, introns, etc. The genomic intervals represent regions with a defined chromosome position, which may be associated with a score, such as aligned reads from HT-seq experiments, TF binding sites, methylation scores, etc. The package can use any tabular genomic feature data as long as it has minimal information on the locations of genomic intervals. In addition, it can use BAM or BigWig files as input.
Maintained by Altuna Akalin. Last updated 5 months ago.
annotation sequencing visualization cpgisland cpp
76 stars 11.13 score 738 scripts 5 dependents
bioc
genefilter:genefilter: methods for filtering genes from high-throughput experiments
Some basic functions for filtering genes.
Maintained by Bioconductor Package Maintainer. Last updated 5 months ago.
11.11 score 2.4k scripts 143 dependents
covid19datahub
COVID19:COVID-19 Data Hub
Unified datasets for a better understanding of COVID-19.
Maintained by Emanuele Guidotti. Last updated 1 month ago.
2019-ncov coronavirus covid-19 covid-data covid19-data
252 stars 11.08 score 265 scripts
ropengov
eurostat:Tools for Eurostat Open Data
Tools to download data from the Eurostat database <https://ec.europa.eu/eurostat> together with search and manipulation utilities.
Maintained by Leo Lahti. Last updated 1 month ago.
242 stars 11.07 score 892 scripts 4 dependents
choonghyunryu
dlookr:Tools for Data Diagnosis, Exploration, Transformation
A collection of tools that support data diagnosis, exploration, and transformation. Data diagnostics provides information and visualization of missing values, outliers, and unique and negative values to help you understand the distribution and quality of your data. Data exploration provides information and visualization of the descriptive statistics of univariate variables, normality tests and outliers, correlation of two variables, and the relationship between the target variable and predictor. Data transformation supports binning for categorizing continuous variables, imputes missing values and outliers, and resolves skewness. And it creates automated reports that support these three tasks.
Maintained by Choonghyun Ryu. Last updated 10 months ago.
212 stars 11.05 score 748 scripts 2 dependents
ipums
ipumsr:An R Interface for Downloading, Reading, and Handling IPUMS Data
An easy way to work with census, survey, and geographic data provided by IPUMS in R. Generate and download data through the IPUMS API and load IPUMS files into R with their associated metadata to make analysis easier. IPUMS data describing 1.4 billion individuals drawn from over 750 censuses and surveys is available free of charge from the IPUMS website <https://www.ipums.org>.
Maintained by Derek Burk. Last updated 1 month ago.
30 stars 11.05 score 720 scripts 2 dependents
uupharmacometrics
xpose:Diagnostics for Pharmacometric Models
Diagnostics for non-linear mixed-effects (population) models from 'NONMEM' <https://www.iconplc.com/solutions/technologies/nonmem/>. 'xpose' facilitates data import and the creation of numerical run summaries, and provides 'ggplot2'-based graphics for data exploration and model diagnostics.
Maintained by Benjamin Guiastrennec. Last updated 3 months ago.
diagnostics ggplot2 nonmem pharmacometrics xpose
62 stars 11.02 score 183 scripts 6 dependents
eddelbuettel
nanotime:Nanosecond-Resolution Time Support for R
Full 64-bit resolution date and time functionality with nanosecond granularity is provided, with easy transition to and from the standard 'POSIXct' type. Three additional classes offer interval, period and duration functionality for nanosecond-resolution timestamps.
Maintained by Dirk Eddelbuettel. Last updated 2 months ago.
datetime datetimes nanosecond-resolution nanoseconds cpp
53 stars 10.91 score 134 scripts 17 dependents
ohdsi
PatientLevelPrediction:Develop Clinical Prediction Models Using the Common Data Model
A user friendly way to create patient level prediction models using the Observational Medical Outcomes Partnership Common Data Model. Given a cohort of interest and an outcome of interest, the package can use data in the Common Data Model to build a large set of features. These features can then be used to fit a predictive model with a number of machine learning algorithms. This is further described in Reps (2017) <doi:10.1093/jamia/ocy032>.
Maintained by Egill Fridgeirsson. Last updated 24 days ago.
190 stars 10.85 score 297 scripts
ropensci
geojsonio:Convert Data from and to 'GeoJSON' or 'TopoJSON'
Convert data to 'GeoJSON' or 'TopoJSON' from various R classes, including vectors, lists, data frames, shape files, and spatial classes. 'geojsonio' does not aim to replace packages like 'sp', 'rgdal', 'rgeos', but rather aims to be a high level client to simplify conversions of data from and to 'GeoJSON' and 'TopoJSON'.
Maintained by Michael Mahoney. Last updated 1 year ago.
geojson topojson geospatial conversion data input-output io
151 stars 10.83 score 2.9k scripts 13 dependents
bioc
ANCOMBC:Microbiome differential abundance and correlation analyses with bias correction
ANCOMBC is a package containing differential abundance (DA) and correlation analyses for microbiome data. Specifically, the package includes Analysis of Compositions of Microbiomes with Bias Correction 2 (ANCOM-BC2), Analysis of Compositions of Microbiomes with Bias Correction (ANCOM-BC), and Analysis of Composition of Microbiomes (ANCOM) for DA analysis, and Sparse Estimation of Correlations among Microbiomes (SECOM) for correlation analysis. Microbiome data are typically subject to two sources of biases: unequal sampling fractions (sample-specific biases) and differential sequencing efficiencies (taxon-specific biases). Methodologies included in the ANCOMBC package are designed to correct these biases and construct statistically consistent estimators.
Maintained by Huang Lin. Last updated 15 days ago.
differentialexpression microbiome normalization sequencing software ancom ancombc ancombc2 correlation differential-abundance-analysis secom
120 stars 10.79 score 406 scripts 1 dependent
welch-lab
rliger:Linked Inference of Genomic Experimental Relationships
Uses an extension of nonnegative matrix factorization to identify shared and dataset-specific factors. See Welch J, Kozareva V, et al (2019) <doi:10.1016/j.cell.2019.05.006>, and Liu J, Gao C, Sodicoff J, et al (2020) <doi:10.1038/s41596-020-0391-8> for more details.
Maintained by Yichen Wang. Last updated 3 months ago.
nonnegative-matrix-factorization single-cell openblas cpp
408 stars 10.77 score 334 scripts 1 dependent
jimmyday12
fitzRoy:Easily Scrape and Process AFL Data
An easy package for scraping and processing Australian Rules Football (AFL) data. 'fitzRoy' provides a range of functions for accessing publicly available data from 'AFL Tables' <https://afltables.com/afl/afl_index.html>, 'Footy Wire' <https://www.footywire.com> and 'The Squiggle' <https://squiggle.com.au>. Further functions allow for easy processing, cleaning and transformation of this data into formats that can be used for analysis.
Maintained by James Day. Last updated 10 days ago.
136 stars 10.72 score 324 scripts
bioc
GWASTools:Tools for Genome Wide Association Studies
Classes for storing very large GWAS data sets and annotation, and functions for GWAS data cleaning and analysis.
Maintained by Stephanie M. Gogarten. Last updated 13 days ago.
snp geneticvariability qualitycontrol microarray
17 stars 10.67 score 396 scripts 5 dependents
doi-usgs
EGRET:Exploration and Graphics for RivEr Trends
Statistics and graphics for streamflow history, water quality trends, and the statistical modeling algorithm: Weighted Regressions on Time, Discharge, and Season (WRTDS).
Maintained by Laura DeCicco. Last updated 4 months ago.
usgs water-quality water-quality-data
90 stars 10.67 score 362 scripts 1 dependent
ohdsi
FeatureExtraction:Generating Features for a Cohort
An R interface for generating features for a cohort using data in the Common Data Model. Features can be constructed using default or custom-made feature definitions. Furthermore, it is possible to aggregate features and get the summary statistics.
Maintained by Ger Inberg. Last updated 11 days ago.
62 stars 10.64 score 209 scripts 2 dependents
business-science
modeltime:The Tidymodels Extension for Time Series Modeling
The time series forecasting framework for use with the 'tidymodels' ecosystem. Models include ARIMA, Exponential Smoothing, and additional time series models from the 'forecast' and 'prophet' packages. Refer to "Forecasting Principles & Practice, Second edition" (<https://otexts.com/fpp2/>). Refer to "Prophet: forecasting at scale" (<https://research.facebook.com/blog/2017/02/prophet-forecasting-at-scale/>).
Maintained by Matt Dancho. Last updated 5 months ago.
arima data-science deep-learning ets forecasting machine-learning machine-learning-algorithms modeltime prophet tbats tidymodeling tidymodels time time-series time-series-analysis timeseries timeseries-forecasting
551 stars 10.61 score 1.1k scripts 7 dependents
privefl
bigstatsr:Statistical Tools for Filebacked Big Matrices
Easy-to-use, efficient, flexible and scalable statistical tools. Package bigstatsr provides and uses Filebacked Big Matrices via memory-mapping. It provides, for instance, matrix operations, Principal Component Analysis, sparse linear supervised models, utility functions and more <doi:10.1093/bioinformatics/bty185>.
Maintained by Florian Privé. Last updated 7 months ago.
big-data large-matrices memory-mapped-file parallel-computing statistical-methods openblas cpp openmp
180 stars 10.59 score 394 scripts 16 dependents
bioc
tximeta:Transcript Quantification Import with Automatic Metadata
Transcript quantification import from Salmon and other quantifiers with automatic attachment of transcript ranges and release information, and other associated metadata. De novo transcriptomes can be linked to the appropriate sources with linkedTxomes and shared for computational reproducibility.
Maintained by Michael Love. Last updated 2 months ago.
annotation genomeannotation dataimport preprocessing rnaseq singlecell transcriptomics transcription geneexpression functionalgenomics reproducibleresearch reportwriting immunooncology
67 stars 10.58 score 466 scripts 1 dependent
bioc
ORFik:Open Reading Frames in Genomics
R package for analysis of transcript and translation features through manipulation of sequence data and NGS data like Ribo-Seq, RNA-Seq, TCP-Seq and CAGE. It is generalized in the sense that any transcript region can be analysed; as the name hints, it was made with investigation of ribosomal patterns over Open Reading Frames (ORFs) as its primary use case. ORFik is extremely fast through use of C++, data.table and GenomicRanges. The package allows reassigning the starts of transcripts using CAGE-Seq data, automatic shifting of Ribo-Seq reads, finding Open Reading Frames for whole genomes, and much more.
Maintained by Haakon Tjeldnes. Last updated 1 month ago.
immunooncology software sequencing riboseq rnaseq functionalgenomics coverage alignment dataimport cpp
33 stars 10.56 score 115 scripts 2 dependents
datastorm-open
shinymanager:Authentication Management for 'Shiny' Applications
Simple and secure authentication mechanism for single 'Shiny' applications. Credentials can be stored in an encrypted 'SQLite' database or on your own SQL database (Postgres, MySQL, ...). Source code of the main application is protected until authentication is successful.
Maintained by Benoit Thieurmel. Last updated 11 months ago.
391 stars 10.51 score 316 scripts 2 dependents
bioc
ballgown:Flexible, isoform-level differential expression analysis
Tools for statistical analysis of assembled transcriptomes, including flexible differential expression analysis, visualization of transcript structures, and matching of assembled transcripts to annotation.
Maintained by Jack Fu. Last updated 5 months ago.
immunooncology rnaseq statisticalmethod preprocessing differentialexpression
145 stars 10.51 score 338 scripts 1 dependent
ropensci
gutenbergr:Download and Process Public Domain Works from Project Gutenberg
Download and process public domain works in the Project Gutenberg collection <https://www.gutenberg.org/>. Includes metadata for all Project Gutenberg works, so that they can be searched and retrieved.
Maintained by Jon Harmon. Last updated 3 months ago.
105 stars 10.50 score 1.1k scripts 1 dependent
rstudio
vetiver:Version, Share, Deploy, and Monitor Models
The goal of 'vetiver' is to provide fluent tooling to version, share, deploy, and monitor a trained model. Functions handle both recording and checking the model's input data prototype, and predicting from a remote API endpoint. The 'vetiver' package is extensible, with generics that can support many kinds of models.
Maintained by Julia Silge. Last updated 6 months ago.
185 stars 10.48 score 466 scripts 1 dependent
bioc
GENESIS:GENetic EStimation and Inference in Structured samples (GENESIS): Statistical methods for analyzing genetic data from samples with population structure and/or relatedness
The GENESIS package provides methodology for estimating, inferring, and accounting for population and pedigree structure in genetic analyses. The current implementation provides functions to perform PC-AiR (Conomos et al., 2015, Gen Epi) and PC-Relate (Conomos et al., 2016, AJHG). PC-AiR performs a Principal Components Analysis on genome-wide SNP data for the detection of population structure in a sample that may contain known or cryptic relatedness. Unlike standard PCA, PC-AiR accounts for relatedness in the sample to provide accurate ancestry inference that is not confounded by family structure. PC-Relate uses ancestry representative principal components to adjust for population structure/ancestry and accurately estimate measures of recent genetic relatedness such as kinship coefficients, IBD sharing probabilities, and inbreeding coefficients. Additionally, functions are provided to perform efficient variance component estimation and mixed model association testing for both quantitative and binary phenotypes.
Maintained by Stephanie M. Gogarten. Last updated 2 months ago.
snp geneticvariability genetics statisticalmethod dimensionreduction principalcomponent genomewideassociation qualitycontrol biocviews
36 stars 10.44 score 342 scripts 1 dependent
bioc
oligo:Preprocessing tools for oligonucleotide arrays
A package to analyze oligonucleotide arrays (expression/SNP/tiling/exon) at probe-level. It currently supports Affymetrix (CEL files) and NimbleGen arrays (XYS files).
Maintained by Benilton Carvalho. Last updated 23 days ago.
microarrayonechanneltwochannelpreprocessingsnpdifferentialexpressionexonarraygeneexpressiondataimportzlib
3 stars 10.42 score 528 scripts 10 dependentsposit-dev
connectapi:Utilities for Interacting with the 'Posit Connect' Server API
Provides a helpful 'R6' class and methods for interacting with the 'Posit Connect' Server API along with some meaningful utility functions for regular tasks. API documentation varies by 'Posit Connect' installation and version, but the latest documentation is also hosted publicly at <https://docs.posit.co/connect/api/>.
Maintained by Toph Allen. Last updated 6 days ago.
47 stars 10.42 score 252 scripts 1 dependentsegeulgen
pathfindR:Enrichment Analysis Utilizing Active Subnetworks
Enrichment analysis enables researchers to uncover mechanisms underlying a phenotype. However, conventional methods for enrichment analysis do not take into account protein-protein interaction information, resulting in incomplete conclusions. 'pathfindR' is a tool for enrichment analysis utilizing active subnetworks. The main function identifies active subnetworks in a protein-protein interaction network using a user-provided list of genes and associated p values. It then performs enrichment analyses on the identified subnetworks, identifying enriched terms (i.e. pathways or, more broadly, gene sets) that possibly underlie the phenotype of interest. 'pathfindR' also offers functionalities to cluster the enriched terms and identify representative terms in each cluster, to score the enriched terms per sample and to visualize analysis results. The enrichment, clustering and other methods implemented in 'pathfindR' are described in detail in Ulgen E, Ozisik O, Sezerman OU. 2019. 'pathfindR': An R Package for Comprehensive Identification of Enriched Pathways in Omics Data Through Active Subnetworks. Front. Genet. <doi:10.3389/fgene.2019.00858>.
Maintained by Ege Ulgen. Last updated 1 months ago.
active-subnetworksenrichmentpathwaypathway-enrichment-analysissubnetwork
187 stars 10.38 score 138 scriptsbcgov
bcdata:Search and Retrieve Data from the BC Data Catalogue
Search, query, and download tabular and 'geospatial' data from the British Columbia Data Catalogue (<https://catalogue.data.gov.bc.ca/>). Search catalogue data records based on keywords, data licence, sector, data format, and B.C. government organization. View metadata directly in R, download many data formats, and query 'geospatial' data available via the B.C. government Web Feature Service ('WFS') using 'dplyr' syntax.
Maintained by Andy Teucher. Last updated 5 days ago.
83 stars 10.36 score 186 scripts 4 dependentsssnn-airr
alakazam:Immunoglobulin Clonal Lineage and Diversity Analysis
Provides methods for high-throughput adaptive immune receptor repertoire sequencing (AIRR-Seq; Rep-Seq) analysis. In particular, immunoglobulin (Ig) sequence lineage reconstruction, lineage topology analysis, diversity profiling, amino acid property analysis and gene usage. Citations: Gupta and Vander Heiden, et al (2017) <doi:10.1093/bioinformatics/btv359>, Stern, Yaari and Vander Heiden, et al (2014) <doi:10.1126/scitranslmed.3008879>.
Maintained by Susanna Marquez. Last updated 3 months ago.
10.33 score 424 scripts 7 dependentsms609
TreeDist:Calculate and Map Distances Between Phylogenetic Trees
Implements measures of tree similarity, including information-based generalized Robinson-Foulds distances (Phylogenetic Information Distance, Clustering Information Distance, Matching Split Information Distance; Smith 2020) <doi:10.1093/bioinformatics/btaa614>; Jaccard-Robinson-Foulds distances (Bocker et al. 2013) <doi:10.1007/978-3-642-40453-5_13>, including the Nye et al. (2006) metric <doi:10.1093/bioinformatics/bti720>; the Matching Split Distance (Bogdanowicz & Giaro 2012) <doi:10.1109/TCBB.2011.48>; Maximum Agreement Subtree distances; the Kendall-Colijn (2016) distance <doi:10.1093/molbev/msw124>, and the Nearest Neighbour Interchange (NNI) distance, approximated per Li et al. (1996) <doi:10.1007/3-540-61332-3_168>. Includes tools for visualizing mappings of tree space (Smith 2022) <doi:10.1093/sysbio/syab100>, for identifying islands of trees (Silva and Wilkinson 2021) <doi:10.1093/sysbio/syab015>, for calculating the median of sets of trees, and for computing the information content of trees and splits.
Maintained by Martin R. Smith. Last updated 2 months ago.
phylogeneticstree-distancephylogenetic-treestree-distancestreescpp
32 stars 10.32 score 97 scripts 5 dependentsbioc
pRoloc:A unifying bioinformatics framework for spatial proteomics
The pRoloc package implements machine learning and visualisation methods for the analysis and interrogation of quantitative mass spectrometry data to reliably infer protein sub-cellular localisation.
Maintained by Lisa Breckels. Last updated 4 days ago.
immunooncologyproteomicsmassspectrometryclassificationclusteringqualitycontrolbioconductorproteomics-dataspatial-proteomicsvisualisationopenblascpp
15 stars 10.31 score 101 scripts 2 dependentsrichardli
SUMMER:Small-Area-Estimation Unit/Area Models and Methods for Estimation in R
Provides methods for spatial and spatio-temporal smoothing of demographic and health indicators using survey data, with particular focus on estimating and projecting under-five mortality rates, described in Mercer et al. (2015) <doi:10.1214/15-AOAS872>, Li et al. (2019) <doi:10.1371/journal.pone.0210645>, Wu et al. (DHS Spatial Analysis Reports No. 21, 2021), and Li et al. (2023) <doi:10.48550/arXiv.2007.05117>.
Maintained by Zehang R Li. Last updated 3 months ago.
bayesian-inferencesmall-area-estimationspace-time
23 stars 10.28 score 134 scripts 2 dependentsbioc
GSEABase:Gene set enrichment data structures and methods
This package provides classes and methods to support Gene Set Enrichment Analysis (GSEA).
Maintained by Bioconductor Package Maintainer. Last updated 2 months ago.
geneexpressiongenesetenrichmentgraphandnetworkgokegg
10.27 score 1.5k scripts 77 dependentsbioc
graphite:GRAPH Interaction from pathway Topological Environment
Graph objects from pathway topology derived from KEGG, Panther, PathBank, PharmGKB, Reactome, SMPDB and WikiPathways databases.
Maintained by Gabriele Sales. Last updated 5 months ago.
pathwaysthirdpartyclientgraphandnetworknetworkreactomekeggmetabolomicsbioinformaticsmirrorpathway-analysis
8 stars 10.24 score 122 scripts 21 dependentsbioc
EDASeq:Exploratory Data Analysis and Normalization for RNA-Seq
Numerical and graphical summaries of RNA-Seq read data. Within-lane normalization procedures to adjust for GC-content effect (or other gene-level effects) on read counts: loess robust local regression, global-scaling, and full-quantile normalization (Risso et al., 2011). Between-lane normalization procedures to adjust for distributional differences between lanes (e.g., sequencing depth): global-scaling and full-quantile normalization (Bullard et al., 2010).
Maintained by Davide Risso. Last updated 5 months ago.
immunooncologysequencingrnaseqpreprocessingqualitycontroldifferentialexpression
5 stars 10.24 score 594 scripts 9 dependentsropensci
qualtRics:Download 'Qualtrics' Survey Data
Provides functions to access survey results directly into R using the 'Qualtrics' API. 'Qualtrics' <https://www.qualtrics.com/about/> is an online survey and data collection software platform. See <https://api.qualtrics.com/> for more information about the 'Qualtrics' API. This package is community-maintained and is not officially supported by 'Qualtrics'.
Maintained by Julia Silge. Last updated 7 months ago.
apiqualtricsqualtrics-apisurveysurvey-data
221 stars 10.23 score 272 scriptsidigbio
ridigbio:Interface to the iDigBio Data API
An interface to iDigBio's search API that allows downloading specimen records. Searches are returned as a data.frame. Other functions such as the metadata end points return lists of information. iDigBio is a US project focused on digitizing and serving museum specimen collections on the web. See <https://www.idigbio.org> for information on iDigBio.
Maintained by Jesse Bennett. Last updated 20 days ago.
16 stars 10.23 score 63 scripts 7 dependentsbioc
zinbwave:Zero-Inflated Negative Binomial Model for RNA-Seq Data
Implements a general and flexible zero-inflated negative binomial model that can be used to provide a low-dimensional representation of single-cell RNA-seq data. The model accounts for zero inflation (dropouts), over-dispersion, and the count nature of the data. The model also accounts for the difference in library sizes and optionally for batch effects and/or other covariates, avoiding the need to pre-normalize the data.
Maintained by Davide Risso. Last updated 5 months ago.
immunooncologydimensionreductiongeneexpressionrnaseqsoftwaretranscriptomicssequencingsinglecell
43 stars 10.21 score 190 scripts 6 dependentsinsightsengineering
teal.modules.clinical:'teal' Modules for Standard Clinical Outputs
Provides user-friendly tools for creating and customizing clinical trial reports. By leveraging the 'teal' framework, this package provides 'teal' modules to easily create an interactive panel that allows for seamless adjustments to data presentation, thereby streamlining the creation of detailed and accurate reports.
Maintained by Dawid Kaledkowski. Last updated 1 months ago.
clinical-trialsmodulesnestoutputsshiny
35 stars 10.21 score 149 scriptsbioc
cBioPortalData:Exposes and Makes Available Data from the cBioPortal Web Resources
The cBioPortalData R package accesses study datasets from the cBio Cancer Genomics Portal. It accesses the data either from the pre-packaged zip / tar files or from the API interface that was recently implemented by the cBioPortal Data Team. The package can provide data either in tabular format or as a MultiAssayExperiment object that uses familiar Bioconductor data representations.
Maintained by Marcel Ramos. Last updated 10 days ago.
softwareinfrastructurethirdpartyclientbioconductor-packagenci-itcru24ca289073
33 stars 10.17 score 147 scripts 4 dependentsbioc
singleCellTK:Comprehensive and Interactive Analysis of Single Cell RNA-Seq Data
The Single Cell Toolkit (SCTK) in the singleCellTK package provides an interface to popular tools for importing, quality control, analysis, and visualization of single cell RNA-seq data. SCTK allows users to seamlessly integrate tools from various packages at different stages of the analysis workflow. A general "a la carte" workflow gives users access to multiple methods for data importing, calculation of general QC metrics, doublet detection, ambient RNA estimation and removal, filtering, normalization, batch correction or integration, dimensionality reduction, 2-D embedding, clustering, marker detection, differential expression, cell type labeling, pathway analysis, and data exporting. Curated workflows can be used to run Seurat and Celda. Streamlined quality control can be performed on the command line using the SCTK-QC pipeline. Users can analyze their data using commands in the R console or by using an interactive Shiny Graphical User Interface (GUI). Specific analyses or entire workflows can be summarized and shared with comprehensive HTML reports generated by Rmarkdown. Additional documentation and vignettes can be found at camplab.net/sctk.
Maintained by Joshua David Campbell. Last updated 1 months ago.
singlecellgeneexpressiondifferentialexpressionalignmentclusteringimmunooncologybatcheffectnormalizationqualitycontroldataimportgui
182 stars 10.17 score 252 scriptsropensci
rdhs:API Client and Dataset Management for the Demographic and Health Survey (DHS) Data
Provides a client for (1) querying the DHS API for survey indicators and metadata (<https://api.dhsprogram.com/#/index.html>), (2) identifying surveys and datasets for analysis, (3) downloading survey datasets from the DHS website, (4) loading datasets and associated metadata into R, and (5) extracting variables and combining datasets for pooled analysis.
Maintained by OJ Watson. Last updated 1 months ago.
datasetdhsdhs-apiextractpeer-reviewedsurvey-data
37 stars 10.16 score 286 scripts 4 dependentsdslc-io
tidytuesdayR:Access the Weekly 'TidyTuesday' Project Dataset
'TidyTuesday' is a project by the 'Data Science Learning Community' in which they post a weekly dataset in a public data repository (<https://github.com/rfordatascience/tidytuesday>) for people to analyze and visualize. This package provides the tools to easily download this data and the description of the source.
Maintained by Jon Harmon. Last updated 5 days ago.
77 stars 10.13 score 3.0k scriptsgeoffjentry
twitteR:R Based Twitter Client
Provides an interface to the Twitter web API.
Maintained by Jeff Gentry. Last updated 9 years ago.
254 stars 10.12 score 2.0k scripts 1 dependentskogalur
randomForestSRC:Fast Unified Random Forests for Survival, Regression, and Classification (RF-SRC)
Fast OpenMP parallel computing of Breiman's random forests for univariate, multivariate, unsupervised, survival, competing risks, class imbalanced classification and quantile regression. New Mahalanobis splitting for correlated outcomes. Extreme random forests and randomized splitting. Suite of imputation methods for missing data. Fast random forests using subsampling. Confidence regions and standard errors for variable importance. New improved holdout importance. Case-specific importance. Minimal depth variable importance. Visualize trees on your Safari or Google Chrome browser. Anonymous random forests for data privacy.
Maintained by Udaya B. Kogalur. Last updated 3 days ago.
124 stars 10.10 score 1.2k scripts 11 dependentsropensci
spocc:Interface to Species Occurrence Data Sources
A programmatic interface to many species occurrence data sources, including Global Biodiversity Information Facility ('GBIF'), 'iNaturalist', 'eBird', Integrated Digitized 'Biocollections' ('iDigBio'), 'VertNet', Ocean 'Biogeographic' Information System ('OBIS'), and Atlas of Living Australia ('ALA'). Includes functionality for retrieving species occurrence data, and combining those data.
Maintained by Hannah Owens. Last updated 2 months ago.
specimensapiweb-servicesoccurrencesspeciestaxonomygbifinatvertnetebirdidigbioobisalaantwebbisondataecoengineinaturalistoccurrencespecies-occurrencespocc
118 stars 10.09 score 552 scripts 5 dependentsjinseob2kim
jstable:Create Tables from Different Types of Regression
Create regression tables from generalized linear model (GLM), generalized estimating equation (GEE), generalized linear mixed-effects model (GLMM), Cox proportional hazards model, survey-weighted generalized linear model (svyglm) and survey-weighted Cox model results for publication.
Maintained by Jinseob Kim. Last updated 3 days ago.
28 stars 10.08 score 199 scripts 1 dependentsgshs-ornl
wbstats:Programmatic Access to Data and Statistics from the World Bank API
Search and download data from the World Bank Data API.
Maintained by Jesse Piburn. Last updated 4 years ago.
open-dataworld-bankworld-bank-apiworldbank
126 stars 10.07 score 1.1k scripts 3 dependentsropensci
tabulapdf:Extract Tables from PDF Documents
Bindings for the 'Tabula' <https://tabula.technology/> 'Java' library, which can extract tables from PDF files. This tool can reduce time and effort in data extraction processes in fields like investigative journalism. It allows for automatic and manual table extraction, the latter facilitated through a 'Shiny' interface, enabling manual area selection with a computer mouse for data retrieval.
Maintained by Mauricio Vargas Sepulveda. Last updated 3 months ago.
javapdfpdf-documentpeer-reviewedropenscitabulatabular-dataopenjdk
552 stars 10.07 score 159 scripts 1 dependentsbioc
sva:Surrogate Variable Analysis
The sva package contains functions for removing batch effects and other unwanted variation in high-throughput experiments. Specifically, the sva package contains functions for identifying and building surrogate variables for high-dimensional data sets. Surrogate variables are covariates constructed directly from high-dimensional data (like gene expression/RNA sequencing/methylation/brain imaging data) that can be used in subsequent analyses to adjust for unknown, unmodeled, or latent sources of noise. The sva package can be used to remove artifacts in three ways: (1) identifying and estimating surrogate variables for unknown sources of variation in high-throughput experiments (Leek and Storey 2007 PLoS Genetics, 2008 PNAS), (2) directly removing known batch effects using ComBat (Johnson et al. 2007 Biostatistics) and (3) removing batch effects with known control probes (Leek 2014 bioRxiv). Removing batch effects and using surrogate variables in differential expression analysis have been shown to reduce dependence, stabilize error rate estimates, and improve reproducibility (see Leek and Storey 2007 PLoS Genetics, 2008 PNAS or Leek et al. 2011 Nat. Reviews Genetics).
Maintained by Jeffrey T. Leek. Last updated 5 months ago.
immunooncologymicroarraystatisticalmethodpreprocessingmultiplecomparisonsequencingrnaseqbatcheffectnormalization
10.04 score 3.2k scripts 50 dependentsbioc
BiocCheck:Bioconductor-specific package checks
BiocCheck guides maintainers through Bioconductor best practices. It runs Bioconductor-specific package checks by searching through package code, examples, and vignettes. Maintainers are required to address all errors, warnings, and most notes produced.
Maintained by Marcel Ramos. Last updated 1 months ago.
infrastructurebioconductor-packagecore-services
8 stars 10.03 score 114 scripts 6 dependentsbioc
singscore:Rank-based single-sample gene set scoring method
A simple single-sample gene signature scoring method that uses rank-based statistics to analyze the sample's gene expression profile. It scores the expression activities of gene sets at a single-sample level.
Maintained by Malvika Kharbanda. Last updated 5 months ago.
softwaregeneexpressiongenesetenrichmentbioinformatics
41 stars 10.03 score 124 scripts 4 dependentsbioc
derfinder:Annotation-agnostic differential expression analysis of RNA-seq data at base-pair resolution via the DER Finder approach
This package provides functions for annotation-agnostic differential expression analysis of RNA-seq data. Two implementations of the DER Finder approach are included in this package: (1) single base-level F-statistics and (2) DER identification at the expressed regions-level. The DER Finder approach can also be used to identify differentially bound ChIP-seq peaks.
Maintained by Leonardo Collado-Torres. Last updated 4 months ago.
differentialexpressionsequencingrnaseqchipseqdifferentialpeakcallingsoftwareimmunooncologycoverageannotation-agnosticbioconductorderfinder
42 stars 10.03 score 78 scripts 6 dependentsropensci
nasapower:NASA POWER API Client
An API client for the NASA POWER global meteorology, surface solar energy and climatology data API. POWER (Prediction Of Worldwide Energy Resources) data are freely available for download with varying spatial resolutions dependent on the original data and with several temporal resolutions depending on the POWER parameter and community. This work is funded through the NASA Earth Science Directorate Applied Science Program. For more on the data themselves, the methodologies used in creating them, a web-based data viewer and web access, please see <https://power.larc.nasa.gov/>.
Maintained by Adam H. Sparks. Last updated 25 days ago.
nasameteorological-dataweatherglobalweather-datameteorologynasa-poweragroclimatologyearth-sciencedata-accessclimate-dataagroclimatology-dataweather-variables
101 stars 9.98 score 137 scripts 3 dependentsiqss
dataverse:Client for Dataverse 4+ Repositories
Provides access to Dataverse APIs <https://dataverse.org/> (versions 4-5), enabling data search, retrieval, and deposit. For Dataverse versions <= 3.0, use the archived 'dvn' package <https://cran.r-project.org/package=dvn>.
Maintained by Shiro Kuriwaki. Last updated 6 months ago.
datadata-depositdataversedataverse-apisword
61 stars 9.98 score 217 scripts 4 dependentspecanproject
PEcAn.assim.batch:PEcAn Functions Used for Ecological Forecasts and Reanalysis
The Predictive Ecosystem Carbon Analyzer (PEcAn) is a scientific workflow management tool that is designed to simplify the management of model parameterization, execution, and analysis. The goal of PEcAn is to streamline the interaction between data and models, and to improve the efficacy of scientific investigation.
Maintained by Istem Fer. Last updated 7 hours ago.
bayesiancyberinfrastructuredata-assimilationdata-scienceecosystem-modelecosystem-scienceforecastingmeta-analysisnational-science-foundationpecanplantsjagscpp
216 stars 9.97 score 20 scripts 2 dependentsbioc
goseq:Gene Ontology analyser for RNA-seq and other length biased data
Detects Gene Ontology and/or other user defined categories which are over/under represented in RNA-seq data.
Maintained by Federico Marini. Last updated 5 months ago.
immunooncologysequencinggogeneexpressiontranscriptionrnaseqdifferentialexpressionannotationgenesetenrichmentkeggpathwayssoftware
2 stars 9.97 score 636 scripts 9 dependentsdarwin-eu
PatientProfiles:Identify Characteristics of Patients in the OMOP Common Data Model
Identify the characteristics of patients in data mapped to the Observational Medical Outcomes Partnership (OMOP) common data model.
Maintained by Marti Catala. Last updated 24 days ago.
1 stars 9.97 score 225 scripts 9 dependentsbioc
rGREAT:GREAT Analysis - Functional Enrichment on Genomic Regions
GREAT (Genomic Regions Enrichment of Annotations Tool) is a type of functional enrichment analysis directly performed on genomic regions. This package implements the GREAT algorithm (the local GREAT analysis) and also supports directly interacting with the GREAT web service (the online GREAT analysis). Both analyses can be viewed in a Shiny application. rGREAT by default supports more than 600 organisms and a large number of gene set collections, as well as self-provided gene sets and organisms from users. Additionally, it implements a general method for dealing with background regions.
Maintained by Zuguang Gu. Last updated 18 days ago.
genesetenrichmentgopathwayssoftwaresequencingwholegenomegenomeannotationcoveragecpp
86 stars 9.96 score 320 scripts 1 dependentsdarwin-eu
CodelistGenerator:Identify Relevant Clinical Codes and Evaluate Their Use
Generate a candidate code list for the Observational Medical Outcomes Partnership (OMOP) common data model based on string matching. For a given search strategy, a candidate code list will be returned.
Maintained by Edward Burn. Last updated 4 days ago.
14 stars 9.94 score 165 scripts 4 dependentsbioc
RUVSeq:Remove Unwanted Variation from RNA-Seq Data
This package implements the remove unwanted variation (RUV) methods of Risso et al. (2014) for the normalization of RNA-Seq read counts between samples.
Maintained by Davide Risso. Last updated 5 months ago.
immunooncologydifferentialexpressionpreprocessingrnaseqsoftware
13 stars 9.91 score 482 scripts 5 dependentsbioc
OmnipathR:OmniPath web service client and more
A client for the OmniPath web service (https://www.omnipathdb.org) and many other resources. It also includes functions to transform and pretty print some of the downloaded data, functions to access a number of other resources such as BioPlex, ConsensusPathDB, EVEX, Gene Ontology, Guide to Pharmacology (IUPHAR/BPS), Harmonizome, HTRIdb, Human Phenotype Ontology, InWeb InBioMap, KEGG Pathway, Pathway Commons, Ramilowski et al. 2015, RegNetwork, ReMap, TF census, TRRUST and Vinayagam et al. 2011. Furthermore, OmnipathR features a close integration with the NicheNet method for ligand activity prediction from transcriptomics data, and its R implementation `nichenetr` (available only on GitHub).
Maintained by Denes Turei. Last updated 1 months ago.
graphandnetworknetworkpathwayssoftwarethirdpartyclientdataimportdatarepresentationgenesignalinggeneregulationsystemsbiologytranscriptomicssinglecellannotationkeggcomplexesenzyme-ptmnetworksnetworks-biologyomnipathproteinsquarto
130 stars 9.90 score 226 scripts 2 dependentsbioc
methylumi:Handle Illumina methylation data
This package provides classes for holding and manipulating Illumina methylation data. Based on eSet, it can contain MIAME information, sample information, feature information, and multiple matrices of data. An "intelligent" import function, methylumiR can read the Illumina text files and create a MethyLumiSet. methylumIDAT can directly read raw IDAT files from HumanMethylation27 and HumanMethylation450 microarrays. Normalization, background correction, and quality control features for GoldenGate, Infinium, and Infinium HD arrays are also included.
Maintained by Sean Davis. Last updated 5 months ago.
dnamethylationtwochannelpreprocessingqualitycontrolcpgisland
9 stars 9.90 score 89 scripts 9 dependentsjaseziv
worldfootballR:Extract and Clean World Football (Soccer) Data
Allows users to obtain clean and tidy football (soccer) game, team and player data. Data is collected from a number of popular sites, including 'FBref', transfer and valuations data from 'Transfermarkt' <https://www.transfermarkt.com/> and shooting location and other match stats data from 'Understat' <https://understat.com/>. It gives users the ability to access data more efficiently, rather than having to export data tables to files before being able to complete their analysis.
Maintained by Jason Zivkovic. Last updated 13 hours ago.
fbreffootballfootball-datasoccer-datasports-datatransfermarktunderstat
509 stars 9.88 score 516 scripts 2 dependentsbioc
PureCN:Copy number calling and SNV classification using targeted short read sequencing
This package estimates tumor purity, copy number, and loss of heterozygosity (LOH), and classifies single nucleotide variants (SNVs) by somatic status and clonality. PureCN is designed for targeted short read sequencing data, integrates well with standard somatic variant detection and copy number pipelines, and has support for tumor samples without matching normal samples.
Maintained by Markus Riester. Last updated 17 hours ago.
copynumbervariationsoftwaresequencingvariantannotationvariantdetectioncoverageimmunooncologybioconductor-packagecell-free-dnacopy-numberlohtumor-heterogeneitytumor-mutational-burdentumor-purity
132 stars 9.88 score 40 scriptsbioc
GenVisR:Genomic Visualizations in R
Produce highly customizable publication quality graphics for genomic data primarily at the cohort level.
Maintained by Zachary Skidmore. Last updated 5 months ago.
infrastructuredatarepresentationclassificationdnaseq
217 stars 9.87 score 76 scriptsmlverse
luz:Higher Level 'API' for 'torch'
A high level interface for 'torch' providing utilities to reduce the amount of code needed for common tasks, abstract away torch details and make the same code work on both the 'CPU' and 'GPU'. It's flexible enough to support expressing a large range of models. It's heavily inspired by 'fastai' by Howard et al. (2020) <arXiv:2002.04688>, 'Keras' by Chollet et al. (2015) and 'PyTorch Lightning' by Falcon et al. (2019) <doi:10.5281/zenodo.3828935>.
Maintained by Daniel Falbel. Last updated 7 months ago.
89 stars 9.86 score 318 scripts 4 dependentsjslefche
piecewiseSEM:Piecewise Structural Equation Modeling
Implements piecewise structural equation modeling from a single list of structural equations, with new methods for non-linear, latent, and composite variables, standardized coefficients, query-based prediction and indirect effects. See <http://jslefche.github.io/piecewiseSEM/> for more.
Maintained by Jon Lefcheck. Last updated 10 months ago.
163 stars 9.85 score 452 scriptsemilhvitfeldt
textdata:Download and Load Various Text Datasets
Provides a framework to download, parse, and store text datasets on the disk and load them when needed. Includes various sentiment lexicons and labeled text data sets for classification and analysis.
Maintained by Emil Hvitfeldt. Last updated 10 months ago.
75 stars 9.84 score 1.4k scripts 1 dependentsms609
TreeTools:Create, Modify and Analyse Phylogenetic Trees
Efficient implementations of functions for the creation, modification and analysis of phylogenetic trees. Applications include: generation of trees with specified shapes; tree rearrangement; analysis of tree shape; rooting of trees and extraction of subtrees; calculation and depiction of split support; plotting the position of rogue taxa (Klopfstein & Spasojevic 2019) <doi:10.1371/journal.pone.0212942>; calculation of ancestor-descendant relationships, of 'stemwardness' (Asher & Smith, 2022) <doi:10.1093/sysbio/syab072>, and of tree balance (Mir et al. 2013, Lemant et al. 2022) <doi:10.1016/j.mbs.2012.10.005>, <doi:10.1093/sysbio/syac027>; artificial extinction (Asher & Smith, 2022) <doi:10.1093/sysbio/syab072>; import and export of trees from Newick, Nexus (Maddison et al. 1997) <doi:10.1093/sysbio/46.4.590>, and TNT <https://www.lillo.org.ar/phylogeny/tnt/> formats; and analysis of splits and cladistic information.
Maintained by Martin R. Smith. Last updated 6 days ago.
evolutionary-biologyphylogenetic-treesphylogeneticscpp
23 stars 9.83 score 124 scripts 10 dependentsropensci
frictionless:Read and Write Frictionless Data Packages
Read and write Frictionless Data Packages. A 'Data Package' (<https://specs.frictionlessdata.io/data-package/>) is a simple container format and standard to describe and package a collection of (tabular) data. It is typically used to publish FAIR (<https://www.go-fair.org/fair-principles/>) and open datasets.
Maintained by Peter Desmet. Last updated 6 months ago.
30 stars 9.79 score 55 scripts 6 dependentsbioc
annotatr:Annotation of Genomic Regions to Genomic Annotations
Given a set of genomic sites/regions (e.g. ChIP-seq peaks, CpGs, differentially methylated CpGs or regions, SNPs, etc.) it is often of interest to investigate the intersecting genomic annotations. Such annotations include those relating to gene models (promoters, 5'UTRs, exons, introns, and 3'UTRs), CpGs (CpG islands, CpG shores, CpG shelves), or regulatory sequences such as enhancers. The annotatr package provides an easy way to summarize and visualize the intersection of genomic sites/regions with genomic annotations.
Maintained by Raymond G. Cavalcante. Last updated 5 months ago.
softwareannotationgenomeannotationfunctionalgenomicsvisualizationgenome-annotation
26 stars 9.76 score 246 scripts 5 dependentsbioc
RTCGAToolbox:A new tool for exporting TCGA Firehose data
Managing data from large scale projects such as The Cancer Genome Atlas (TCGA) for further analysis is an important and time consuming step for research projects. Several efforts, such as the Firehose project, make TCGA pre-processed data publicly available via web services and data portals, but this requires managing, downloading and preparing the data for subsequent steps. We developed an open source and extensible R based data client for Firehose pre-processed data and demonstrated its use with sample case studies. Results showed that RTCGAToolbox could improve data management for researchers who are interested in TCGA data. In addition, it can be integrated with other analysis pipelines for subsequent data analysis.
Maintained by Marcel Ramos. Last updated 3 months ago.
differentialexpressiongeneexpressionsequencing
18 stars 9.75 score 76 scripts 5 dependentsropensci
prism:Access Data from the Oregon State Prism Climate Project
Allows users to access the Oregon State Prism climate data (<https://prism.nacse.org/>). Using the web service API, data can easily be downloaded in bulk and loaded into R for spatial analysis. Some user-friendly visualizations are also provided.
Maintained by Alan Butler. Last updated 5 days ago.
57 stars 9.74 score 354 scriptsmlverse
torchvision:Models, Datasets and Transformations for Images
Provides access to datasets, models and preprocessing facilities for deep learning with images. Integrates seamlessly with the 'torch' package and its 'API' borrows heavily from the 'PyTorch' vision package.
Maintained by Daniel Falbel. Last updated 7 months ago.
65 stars 9.74 score 313 scripts 6 dependents
ohdsi
CohortConstructor:Build and Manipulate Study Cohorts Using a Common Data Model
Create and manipulate study cohorts in data mapped to the Observational Medical Outcomes Partnership Common Data Model.
Maintained by Edward Burn. Last updated 3 days ago.
2 stars 9.73 score 207 scripts 2 dependents
prestodb
RPresto:DBI Connector to Presto
Implements a 'DBI' compliant interface to Presto. Presto is an open source distributed SQL query engine for running interactive analytic queries against data sources of all sizes ranging from gigabytes to petabytes: <https://prestodb.io/>.
Maintained by Jarod G.R. Meng. Last updated 2 months ago.
132 stars 9.73 score 25 scripts 4 dependents
femiguez
apsimx:Inspect, Read, Edit and Run 'APSIM' "Next Generation" and 'APSIM' Classic
The functions in this package inspect, read, edit and run files for 'APSIM' "Next Generation" ('JSON') and 'APSIM' "Classic" ('XML'). The files with an 'apsim' extension correspond to 'APSIM' Classic (7.x) - Windows only - and the ones with an 'apsimx' extension correspond to 'APSIM' "Next Generation". For more information about 'APSIM' see (<https://www.apsim.info/>) and for 'APSIM' next generation (<https://apsimnextgeneration.netlify.app/>).
Maintained by Fernando Miguez. Last updated 12 days ago.
59 stars 9.72 score 68 scripts 2 dependents
pecanproject
PEcAnRTM:PEcAn Functions Used for Radiative Transfer Modeling
Functions for performing forward runs and inversions of radiative transfer models (RTMs). Inversions can be performed using maximum likelihood, or more complex hierarchical Bayesian methods. Underlying numerical analyses are optimized for speed using Fortran code.
Maintained by Alexey Shiklomanov. Last updated 7 hours ago.
bayesian, cyberinfrastructure, data-assimilation, data-science, ecosystem-model, ecosystem-science, forecasting, meta-analysis, national-science-foundation, pecan, plants, fortran, jags, cpp
216 stars 9.72 score 132 scripts
centerforassessment
SGP:Student Growth Percentiles & Percentile Growth Trajectories
An analytic framework for the calculation of norm- and criterion-referenced academic growth estimates using large scale, longitudinal education assessment data as developed in Betebenner (2009) <doi:10.1111/j.1745-3992.2009.00161.x>.
Maintained by Damian W. Betebenner. Last updated 13 days ago.
percentile-growth-projections, quantile-regression, sgp, sgp-analyses, student-growth-percentiles, student-growth-projections
20 stars 9.69 score 88 scripts
appsilon
shiny.telemetry:'Shiny' App Usage Telemetry
Enables instrumentation of 'Shiny' apps for tracking user session events such as input changes, browser type, and session duration. These events can be sent to any of the available storage backends and analyzed using the included 'Shiny' app to gain insights about app usage and adoption.
Maintained by André Veríssimo. Last updated 4 months ago.
67 stars 9.69 score 29 scripts
rnabioco
valr:Genome Interval Arithmetic
Read and manipulate genome intervals and signals. Provides functionality similar to command-line tool suites within R, enabling interactive analysis and visualization of genome-scale data. Riemondy et al. (2017) <doi:10.12688/f1000research.11997.1>.
Maintained by Kent Riemondy. Last updated 22 days ago.
bedtools, genome, interval-arithmetic, cpp
90 stars 9.69 score 227 scripts
bioc
txdbmaker:Tools for making TxDb objects from genomic annotations
A set of tools for making TxDb objects from genomic annotations from various sources (e.g. UCSC, Ensembl, and GFF files). These tools allow the user to download the genomic locations of transcripts, exons, and CDS, for a given assembly, and to import them in a TxDb object. TxDb objects are implemented in the GenomicFeatures package, together with flexible methods for extracting the desired features in convenient formats.
Maintained by H. Pagès. Last updated 4 months ago.
infrastructure, dataimport, annotation, genomeannotation, genomeassembly, genetics, sequencing, bioconductor-package, core-package
3 stars 9.68 score 92 scripts 87 dependents
bioc
TCGAutils:TCGA utility functions for data management
A suite of helper functions for checking and manipulating TCGA data, including data obtained from the curatedTCGAData experiment package. These functions aim to make working with TCGA data simpler and more manageable. Exported functions include those that import data from flat files into Bioconductor objects, convert row annotations, and translate identifiers via the GDC API.
Maintained by Marcel Ramos. Last updated 4 months ago.
software, workflowstep, preprocessing, dataimport, bioconductor-package, tcga, u24ca289073, utilities
27 stars 9.66 score 210 scripts 10 dependents
plangfelder
WGCNA:Weighted Correlation Network Analysis
Functions necessary to perform Weighted Correlation Network Analysis on high-dimensional data as originally described in Horvath and Zhang (2005) <doi:10.2202/1544-6115.1128> and Langfelder and Horvath (2008) <doi:10.1186/1471-2105-9-559>. Includes functions for rudimentary data cleaning, construction of correlation networks, module identification, summarization, and relating of variables and modules to sample traits. Also includes a number of utility functions for data manipulation and visualization.
Maintained by Peter Langfelder. Last updated 6 months ago.
54 stars 9.65 score 5.3k scripts 32 dependents
grunwaldlab
metacoder:Tools for Parsing, Manipulating, and Graphing Taxonomic Abundance Data
Reads, plots, and manipulates large taxonomic data sets, like those generated from modern high-throughput sequencing, such as metabarcoding (i.e. amplification metagenomics, 16S metagenomics, etc). It provides a tree-based visualization called "heat trees" used to depict statistics for every taxon in a taxonomy using color and size. It also provides various functions to do common tasks in microbiome bioinformatics on data in the 'taxmap' format defined by the 'taxa' package. The 'metacoder' package is described in the publication by Foster et al. (2017) <doi:10.1371/journal.pcbi.1005404>.
Maintained by Zachary Foster. Last updated 2 months ago.
community-diversity, hierarchical, metabarcoding, pcr, taxonomy, trees, cpp
140 stars 9.64 score 328 scripts
sdctools
sdcMicro:Statistical Disclosure Control Methods for Anonymization of Data and Risk Estimation
Data from statistical agencies and other institutions are mostly confidential. This package, introduced in Templ, Kowarik and Meindl (2017) <doi:10.18637/jss.v067.i04>, can be used for the generation of anonymized (micro)data, i.e. for the creation of public- and scientific-use files. The theoretical basis for the methods implemented can be found in Templ (2017) <doi:10.1007/978-3-319-50272-4>. Various risk estimation and anonymization methods are included. Note that the package includes a graphical user interface, published in Meindl and Templ (2019) <doi:10.3390/a12090191>, that provides access to various methods of this package.
Maintained by Matthias Templ. Last updated 1 month ago.
84 stars 9.63 score 258 scripts
fmmattioni
downloadthis:Implement Download Buttons in 'rmarkdown'
Implement download buttons in HTML output from 'rmarkdown' without the need for 'runtime:shiny'.
Maintained by Felipe Mattioni Maturana. Last updated 6 months ago.
146 stars 9.63 score 856 scripts 1 dependents
vimc
orderly:Lightweight Reproducible Reporting
Order, create and store reports from R. By defining a lightweight interface around the inputs and outputs of an analysis, a lot of the repetitive work for reproducible research can be automated. We define a simple format for organising and describing work that facilitates collaborative reproducible research and acknowledges that all analyses are run multiple times over their lifespans.
Maintained by Rich FitzJohn. Last updated 2 years ago.
117 stars 9.63 score 94 scripts 4 dependents
bioc
pcaExplorer:Interactive Visualization of RNA-seq Data Using a Principal Components Approach
This package provides functionality for interactive visualization of RNA-seq datasets based on Principal Components Analysis. The methods provided allow for quick information extraction and effective data exploration. A Shiny application encapsulates the whole analysis.
Maintained by Federico Marini. Last updated 3 months ago.
immunooncology, visualization, rnaseq, dimensionreduction, principalcomponent, qualitycontrol, gui, reportwriting, shinyapps, bioconductor, principal-components, reproducible-research, rna-seq-analysis, rna-seq-data, shiny, transcriptome, user-friendly
56 stars 9.63 score 180 scripts
bioc
clusterExperiment:Compare Clusterings for Single-Cell Sequencing
Provides functionality for running and comparing many different clusterings of single-cell sequencing data or other large mRNA Expression data sets.
Maintained by Elizabeth Purdom. Last updated 5 months ago.
clustering, rnaseq, sequencing, software, singlecell, cpp
38 stars 9.62 score 192 scripts 1 dependents
bioc
AnnotationForge:Tools for building SQLite-based annotation data packages
Provides code for generating Annotation packages and their databases. Packages produced are intended to be used with AnnotationDbi.
Maintained by Bioconductor Package Maintainer. Last updated 18 days ago.
annotation, infrastructure, bioconductor-package, core-package
5 stars 9.62 score 143 scripts 19 dependents
hafen
trelliscopejs:Create Interactive Trelliscope Displays
Trelliscope is a scalable, flexible, interactive approach to visualizing data (Hafen, 2013 <doi:10.1109/LDAV.2013.6675164>). This package provides methods that make it easy to create a Trelliscope display specification for TrelliscopeJS. High-level functions are provided for creating displays from within 'tidyverse' or 'ggplot2' workflows. Low-level functions are also provided for creating new interfaces.
Maintained by Ryan Hafen. Last updated 1 year ago.
262 stars 9.61 score 1000 scripts 1 dependents
bioc
cytomapper:Visualization of highly multiplexed imaging data in R
Highly multiplexed imaging acquires the single-cell expression of selected proteins in a spatially-resolved fashion. These measurements can be visualised across multiple length-scales. First, pixel-level intensities represent the spatial distributions of feature expression with highest resolution. Second, after segmentation, expression values or cell-level metadata (e.g. cell-type information) can be visualised on segmented cell areas. This package contains functions for the visualisation of multiplexed read-outs and cell-level information obtained by multiplexed imaging technologies. The main functions of this package allow 1. the visualisation of pixel-level information across multiple channels, 2. the display of cell-level information (expression and/or metadata) on segmentation masks and 3. gating and visualisation of single cells.
Maintained by Lasse Meyer. Last updated 5 months ago.
immunooncology, software, singlecell, onechannel, twochannel, multiplecomparison, normalization, dataimport, bioimaging, imaging-mass-cytometry, single-cell, spatial-analysis
32 stars 9.61 score 354 scripts 5 dependents
ropensci
tidyhydat:Extract and Tidy Canadian 'Hydrometric' Data
Provides functions to access historical and real-time national 'hydrometric' data from Water Survey of Canada data sources (<https://dd.weather.gc.ca/hydrometric/csv/> and <https://collaboration.cmc.ec.gc.ca/cmc/hydrometrics/www/>) and then applies tidy data principles.
Maintained by Sam Albers. Last updated 20 days ago.
citz, government-data, hydrology, hydrometrics, tidy-data, water-resources
71 stars 9.59 score 202 scripts 3 dependents
ropensci
rdflib:Tools to Manipulate and Query Semantic Data
The Resource Description Framework, or 'RDF' is a widely used data representation model that forms the cornerstone of the Semantic Web. 'RDF' represents data as a graph rather than the familiar data table or rectangle of relational databases. The 'rdflib' package provides a friendly and concise user interface for performing common tasks on 'RDF' data, such as reading, writing and converting between the various serializations of 'RDF' data, including 'rdfxml', 'turtle', 'nquads', 'ntriples', and 'json-ld'; creating new 'RDF' graphs, and performing graph queries using 'SPARQL'. This package wraps the low level 'redland' R package which provides direct bindings to the 'redland' C library. Additionally, the package supports the newer and more developer friendly 'JSON-LD' format through the 'jsonld' package. The package interface takes inspiration from the Python 'rdflib' library.
Maintained by Carl Boettiger. Last updated 8 months ago.
57 stars 9.59 score 123 scripts 7 dependents
bioc
tidybulk:Brings transcriptomics to the tidyverse
This is a collection of utility functions for exploring and performing calculations on RNA sequencing data in a modular, pipe-friendly and tidy fashion.
Maintained by Stefano Mangiola. Last updated 14 days ago.
assaydomain, infrastructure, rnaseq, differentialexpression, geneexpression, normalization, clustering, qualitycontrol, sequencing, transcription, transcriptomics, bioconductor, bulk-transcriptional-analyses, deseq2, differential-expression, edger, ensembl-ids, entrez, gene-symbols, gsea, mds-dimensions, pca, pipe, redundancy, tibble, tidy, tidy-data, tidyverse, transcripts, tsne
171 stars 9.57 score 172 scripts 1 dependents
bioc
recount:Explore and download data from the recount project
Explore and download data from the recount project available at https://jhubiostatistics.shinyapps.io/recount/. Using the recount package you can download RangedSummarizedExperiment objects at the gene, exon or exon-exon junctions level, the raw counts, the phenotype metadata used, the urls to the sample coverage bigWig files or the mean coverage bigWig file for a particular study. The RangedSummarizedExperiment objects can be used by different packages for performing differential expression analysis. Using http://bioconductor.org/packages/derfinder you can perform annotation-agnostic differential expression analyses with the data from the recount project as described at http://www.nature.com/nbt/journal/v35/n4/full/nbt.3838.html.
Maintained by Leonardo Collado-Torres. Last updated 4 months ago.
coverage, differentialexpression, geneexpression, rnaseq, sequencing, software, dataimport, immunooncology, annotation-agnostic, bioconductor, count, derfinder, deseq2, exon, gene, human, illumina, junction, recount
41 stars 9.57 score 498 scripts 3 dependents
business-science
anomalize:Tidy Anomaly Detection
The 'anomalize' package enables a "tidy" workflow for detecting anomalies in data. The main functions are time_decompose(), anomalize(), and time_recompose(). When combined, it's quite simple to decompose time series, detect anomalies, and create bands separating the "normal" data from the anomalous data at scale (i.e. for multiple time series). Time series decomposition is used to remove trend and seasonal components via the time_decompose() function, and methods include seasonal decomposition of time series by Loess ("stl") and seasonal decomposition by piecewise medians ("twitter"). The anomalize() function implements two methods for anomaly detection of residuals: the interquartile range ("iqr") and generalized extreme studentized deviation ("gesd"). These methods are based on those used in the 'forecast' package and the Twitter 'AnomalyDetection' package. Refer to the associated functions for specific references for these methods.
Maintained by Matt Dancho. Last updated 1 year ago.
anomaly, anomaly-detection, decomposition, detect-anomalies, iqr, time-series
339 stars 9.56 score 332 scripts
pln-team
PLNmodels:Poisson Lognormal Models
The Poisson-lognormal model and variants (Chiquet, Mariadassou and Robin, 2021 <doi:10.3389/fevo.2021.588292>) can be used for a variety of multivariate problems when count data are at play, including principal component analysis for count data, discriminant analysis, model-based clustering and network inference. Implements variational algorithms to fit such models, accompanied by a set of functions for visualization and diagnostics.
Maintained by Julien Chiquet. Last updated 6 days ago.
count-data, multivariate-analysis, network-inference, pca, poisson-lognormal-model, openblas, cpp
55 stars 9.54 score 226 scripts
daattali
ddpcr:Analysis and Visualization of Droplet Digital PCR in R and on the Web
An interface to explore, analyze, and visualize droplet digital PCR (ddPCR) data in R. This is the first non-proprietary software for analyzing two-channel ddPCR data. An interactive tool was also created and is available online to facilitate this analysis for anyone who is not comfortable with using R.
Maintained by Dean Attali. Last updated 1 year ago.
61 stars 9.54 score 131 scripts 2 dependents
e-sensing
sits:Satellite Image Time Series Analysis for Earth Observation Data Cubes
An end-to-end toolkit for land use and land cover classification using big Earth observation data, based on machine learning methods applied to satellite image data cubes, as described in Simoes et al (2021) <doi:10.3390/rs13132428>. Builds regular data cubes from collections in AWS, Microsoft Planetary Computer, Brazil Data Cube, Copernicus Data Space Environment (CDSE), Digital Earth Africa, Digital Earth Australia, NASA HLS using the Spatio-temporal Asset Catalog (STAC) protocol (<https://stacspec.org/>) and the 'gdalcubes' R package developed by Appel and Pebesma (2019) <doi:10.3390/data4030092>. Supports visualization methods for images and time series and smoothing filters for dealing with noisy time series. Includes functions for quality assessment of training samples using self-organized maps as presented by Santos et al (2021) <doi:10.1016/j.isprsjprs.2021.04.014>. Includes methods to reduce training samples imbalance proposed by Chawla et al (2002) <doi:10.1613/jair.953>. Provides machine learning methods including support vector machines, random forests, extreme gradient boosting, multi-layer perceptrons, temporal convolutional neural networks proposed by Pelletier et al (2019) <doi:10.3390/rs11050523>, and temporal attention encoders by Garnot and Landrieu (2020) <doi:10.48550/arXiv.2007.00586>. Supports GPU processing of deep learning models using torch <https://torch.mlverse.org/>. Performs efficient classification of big Earth observation data cubes and includes functions for post-classification smoothing based on Bayesian inference as described by Camara et al (2024) <doi:10.3390/rs16234572>, and methods for active learning and uncertainty assessment. Supports region-based time series analysis using package supercells <https://jakubnowosad.com/supercells/>. Enables best practices for estimating area and assessing accuracy of land change as recommended by Olofsson et al (2014) <doi:10.1016/j.rse.2014.02.015>. 
Minimum recommended requirements: 16 GB RAM and 4 CPU dual-core.
Maintained by Gilberto Camara. Last updated 2 months ago.
big-earth-data, cbers, earth-observation, eo-datacubes, geospatial, image-time-series, land-cover-classification, landsat, planetary-computer, r-spatial, remote-sensing, rspatial, satellite-image-time-series, satellite-imagery, sentinel-2, stac-api, stac-catalog, cpp
494 stars 9.50 score 384 scripts
immunomind
immunarch:Bioinformatics Analysis of T-Cell and B-Cell Immune Repertoires
A comprehensive framework for bioinformatics exploratory analysis of bulk and single-cell T-cell receptor and antibody repertoires. It provides seamless data loading, analysis and visualisation for AIRR (Adaptive Immune Receptor Repertoire) data, both bulk immunosequencing (RepSeq) and single-cell sequencing (scRNAseq). Immunarch implements most of the widely used AIRR analysis methods, such as: clonality analysis, estimation of repertoire similarities in distribution of clonotypes and gene segments, repertoire diversity analysis, annotation of clonotypes using external immune receptor databases and clonotype tracking in vaccination and cancer studies. A successor to our previously published 'tcR' immunoinformatics package (Nazarov 2015) <doi:10.1186/s12859-015-0613-1>.
Maintained by Vadim I. Nazarov. Last updated 1 year ago.
airr-analysis, b-cell-receptor, bcr, bcr-repertoire, bioinformatics, ig, ig-repertoire, immune-repertoire, immune-repertoire-analysis, immune-repertoire-data, immunoglobulin, immunoinformatics, immunology, rep-seq, repertoire-analysis, single-cell, single-cell-analysis, t-cell-receptor, tcr, tcr-repertoire, cpp
316 stars 9.49 score 203 scripts
john-d-fox
Rcmdr:R Commander
A platform-independent basic-statistics GUI (graphical user interface) for R, based on the tcltk package.
Maintained by John Fox. Last updated 5 months ago.
4 stars 9.48 score 636 scripts 38 dependents
rqtl
qtl2:Quantitative Trait Locus Mapping in Experimental Crosses
Provides a set of tools to perform quantitative trait locus (QTL) analysis in experimental crosses. It is a reimplementation of the 'R/qtl' package to better handle high-dimensional data and complex cross designs. Broman et al. (2019) <doi:10.1534/genetics.118.301595>.
Maintained by Karl W Broman. Last updated 23 days ago.
34 stars 9.48 score 1.1k scripts 5 dependents
thackl
gggenomes:A Grammar of Graphics for Comparative Genomics
An extension of 'ggplot2' for creating complex genomic maps. It builds on the power of 'ggplot2' and 'tidyverse' adding new 'ggplot2'-style geoms & positions and 'dplyr'-style verbs to manipulate the underlying data. It implements a layout concept inspired by 'ggraph' and introduces tracks to bring tidiness to the mess that is genomics data.
Maintained by Thomas Hackl. Last updated 2 months ago.
biological-data, comparative-genomics, genomics-visualization, ggplot-extension, ggplot2
661 stars 9.47 score 123 scripts
tbates
umx:Structural Equation Modeling and Twin Modeling in R
Quickly create, run, and report structural equation models, and twin models. See '?umx' for help, and umx_open_CRAN_page("umx") for NEWS. Timothy C. Bates, Michael C. Neale, Hermine H. Maes, (2019). umx: A library for Structural Equation and Twin Modelling in R. Twin Research and Human Genetics, 22, 27-41. <doi:10.1017/thg.2019.2>.
Maintained by Timothy C. Bates. Last updated 16 days ago.
behavior-genetics, genetics, openmx, psychology, sem, statistics, structural-equation-modeling, tutorials, twin-models, umx
44 stars 9.45 score 472 scripts
bioc
SpatialFeatureExperiment:Integrating SpatialExperiment with Simple Features in sf
A new S4 class integrating Simple Features with the R package sf to bring geospatial data analysis methods based on vector data to spatial transcriptomics. Also implements management of spatial neighborhood graphs and geometric operations. This package builds upon SpatialExperiment and SingleCellExperiment, hence methods for these parent classes can still be used.
Maintained by Lambda Moses. Last updated 2 months ago.
datarepresentation, transcriptomics, spatial
49 stars 9.40 score 322 scripts 1 dependents
usepa
tcpl:ToxCast Data Analysis Pipeline
The ToxCast Data Analysis Pipeline ('tcpl') is an R package that manages, curve-fits, plots, and stores ToxCast data to populate its linked MySQL database, 'invitrodb'. The package was developed for the chemical screening data curated by the US EPA's Toxicity Forecaster (ToxCast) program, but 'tcpl' can be used to support diverse chemical screening efforts.
Maintained by Jason Brown. Last updated 12 days ago.
36 stars 9.39 score 90 scripts
pecanproject
PEcAn.data.land:PEcAn Functions Used for Ecological Forecasts and Reanalysis
The Predictive Ecosystem Carbon Analyzer (PEcAn) is a scientific workflow management tool that is designed to simplify the management of model parameterization, execution, and analysis. The goal of PEcAn is to streamline the interaction between data and models, and to improve the efficacy of scientific investigation.
Maintained by Mike Dietze. Last updated 7 hours ago.
bayesian, cyberinfrastructure, data-assimilation, data-science, ecosystem-model, ecosystem-science, forecasting, meta-analysis, national-science-foundation, pecan, plants, jags, cpp
216 stars 9.35 score 19 scripts 10 dependents
rte-antares-rpackage
antaresRead:Import, Manipulate and Explore the Results of an 'Antares' Simulation
Import, manipulate and explore results generated by 'Antares', a powerful open source software developed by RTE (Réseau de Transport d’Électricité) to simulate and study electric power systems (more information about 'Antares' here: <https://antares-simulator.org/>).
Maintained by Tatiana Vargas. Last updated 5 days ago.
infrastructure, dataimport, adequacy, bilan, electricity, energy, hdf5, linear-algebra, monte-carlo-simulation, optimisation, previsionnel, rhdf5, rte, simulation, tyndp
13 stars 9.32 score 148 scripts 3 dependents
bioc
GenomicInteractions:Utilities for handling genomic interaction data
Utilities for handling genomic interaction data such as ChIA-PET or Hi-C, annotating genomic features with interaction information, and producing plots and summary statistics.
Maintained by Liz Ing-Simmons. Last updated 5 months ago.
software, infrastructure, dataimport, datarepresentation, hic
7 stars 9.31 score 162 scripts 5 dependents