R-universe search: needs:DBI

r-spatial

sf:Simple Features for R

Support for simple feature access, a standardized way to encode and analyze spatial vector data. Binds to 'GDAL' <doi:10.5281/zenodo.5884351> for reading and writing data, to 'GEOS' <doi:10.5281/zenodo.11396894> for geometrical operations, and to 'PROJ' <doi:10.5281/zenodo.5884394> for projection conversions and datum transformations. Uses by default the 's2' package for geometry operations on geodetic (long/lat degree) coordinates.

Maintained by Edzer Pebesma. Last updated 5 days ago.

gdal geos proj spatial cpp

1.4k stars 22.44 score 117k scripts 1.2k dependents

tidyverse

tidyverse:Easily Install and Load the 'Tidyverse'

The 'tidyverse' is a set of packages that work in harmony because they share common data representations and 'API' design. This package is designed to make it easy to install and load multiple 'tidyverse' packages in a single step. Learn more about the 'tidyverse' at <https://www.tidyverse.org>.

Maintained by Hadley Wickham. Last updated 5 months ago.

data-science tidyverse

1.7k stars 20.23 score 664k scripts 125 dependents

tidyverse

dbplyr:A 'dplyr' Back End for Databases

A 'dplyr' back end for databases that allows you to work with remote database tables as if they are in-memory data frames. Basic features works with any database that has a 'DBI' back end; more advanced features require 'SQL' translation to be provided by the package author.

Maintained by Hadley Wickham. Last updated 4 months ago.

database

481 stars 19.72 score 5.2k scripts 736 dependents

r-dbi

RSQLite:SQLite Interface for R

Embeds the SQLite database engine in R and provides an interface compliant with the DBI package. The source for the SQLite engine and for various extensions in a recent version is included. System libraries will never be consulted because this package relies on static linking for the plugins it includes; this also ensures a consistent experience across all installations.

Maintained by Kirill Müller. Last updated 2 days ago.

database sqlite3 cpp

331 stars 18.78 score 8.1k scripts 1.1k dependents

r-spatial

stars:Spatiotemporal Arrays, Raster and Vector Data Cubes

Reading, manipulating, writing and plotting spatiotemporal arrays (raster and vector data cubes) in 'R', using 'GDAL' bindings provided by 'sf', and 'NetCDF' bindings by 'ncmeta' and 'RNetCDF'.

Maintained by Edzer Pebesma. Last updated 2 months ago.

raster satellite-images spatial

571 stars 18.27 score 7.2k scripts 137 dependents

rstudio

leaflet:Create Interactive Web Maps with the JavaScript 'Leaflet' Library

Create and customize interactive maps using the 'Leaflet' JavaScript library and the 'htmlwidgets' package. These maps can be used directly from the R console, from 'RStudio', in Shiny applications and R Markdown documents.

Maintained by Joe Cheng. Last updated 28 days ago.

gis leaflet-map spatial

821 stars 17.20 score 39k scripts 178 dependents

bioc

clusterProfiler:A universal enrichment tool for interpreting omics data

This package supports functional characteristics of both coding and non-coding genomics data for thousands of species with up-to-date gene annotation. It provides a univeral interface for gene functional annotation from a variety of sources and thus can be applied in diverse scenarios. It provides a tidy interface to access, manipulate, and visualize enrichment results to help users achieve efficient data interpretation. Datasets obtained from multiple treatments and time points can be analyzed and compared in a single run, easily revealing functional consensus and differences among distinct conditions.

Maintained by Guangchuang Yu. Last updated 4 months ago.

annotation clustering genesetenrichment go kegg multiplecomparison pathways reactome visualization enrichment-analysis gsea

1.1k stars 17.03 score 11k scripts 48 dependents

r-spatial

spdep:Spatial Dependence: Weighting Schemes, Statistics

A collection of functions to create spatial weights matrix objects from polygon 'contiguities', from point patterns by distance and tessellations, for summarizing these objects, and for permitting their use in spatial data analysis, including regional aggregation by minimum spanning tree; a collection of tests for spatial 'autocorrelation', including global 'Morans I' and 'Gearys C' proposed by 'Cliff' and 'Ord' (1973, ISBN: 0850860369) and (1981, ISBN: 0850860814), 'Hubert/Mantel' general cross product statistic, Empirical Bayes estimates and 'Assunção/Reis' (1999) <doi:10.1002/(SICI)1097-0258(19990830)18:16%3C2147::AID-SIM179%3E3.0.CO;2-I> Index, 'Getis/Ord' G ('Getis' and 'Ord' 1992) <doi:10.1111/j.1538-4632.1992.tb00261.x> and multicoloured join count statistics, 'APLE' ('Li 'et al.' ) <doi:10.1111/j.1538-4632.2007.00708.x>, local 'Moran's I', 'Gearys C' ('Anselin' 1995) <doi:10.1111/j.1538-4632.1995.tb00338.x> and 'Getis/Ord' G ('Ord' and 'Getis' 1995) <doi:10.1111/j.1538-4632.1995.tb00912.x>, 'saddlepoint' approximations ('Tiefelsdorf' 2002) <doi:10.1111/j.1538-4632.2002.tb01084.x> and exact tests for global and local 'Moran's I' ('Bivand et al.' 2009) <doi:10.1016/j.csda.2008.07.021> and 'LOSH' local indicators of spatial heteroscedasticity ('Ord' and 'Getis') <doi:10.1007/s00168-011-0492-y>. The implementation of most of these measures is described in 'Bivand' and 'Wong' (2018) <doi:10.1007/s11749-018-0599-x>, with further extensions in 'Bivand' (2022) <doi:10.1111/gean.12319>. 'Lagrange' multiplier tests for spatial dependence in linear models are provided ('Anselin et al'. 1996) <doi:10.1016/0166-0462(95)02111-6>, as are 'Rao' score tests for hypothesised spatial 'Durbin' models based on linear models ('Koley' and 'Bera' 2023) <doi:10.1080/17421772.2023.2256810>. A local indicators for categorical data (LICD) implementation based on 'Carrer et al.' (2021) <doi:10.1016/j.jas.2020.105306> and 'Bivand et al.' (2017) <doi:10.1016/j.spasta.2017.03.003> was added in 1.3-7. From 'spdep' and 'spatialreg' versions >= 1.2-1, the model fitting functions previously present in this package are defunct in 'spdep' and may be found in 'spatialreg'.

Maintained by Roger Bivand. Last updated 1 months ago.

spatial-autocorrelation spatial-dependence spatial-weights

131 stars 16.59 score 6.0k scripts 106 dependents

r-dbi

odbc:Connect to ODBC Compatible Databases (using the DBI Interface)

A DBI-compatible interface to ODBC databases.

Maintained by Hadley Wickham. Last updated 4 days ago.

database odbc unixodbc cpp

396 stars 16.31 score 2.9k scripts 23 dependents

r-tmap

tmap:Thematic Maps

Thematic maps are geographical maps in which spatial data distributions are visualized. This package offers a flexible, layer-based, and easy to use approach to create thematic maps, such as choropleths and bubble maps.

Maintained by Martijn Tennekes. Last updated 3 days ago.

choropleth-maps maps spatial thematic-maps visualisation

879 stars 16.25 score 13k scripts 24 dependents

bioc

biomaRt:Interface to BioMart databases (i.e. Ensembl)

In recent years a wealth of biological data has become available in public data repositories. Easy access to these valuable data resources and firm integration with data analysis is needed for comprehensive bioinformatics data analysis. biomaRt provides an interface to a growing collection of databases implementing the BioMart software suite (<http://www.biomart.org>). The package enables retrieval of large amounts of data in a uniform way without the need to know the underlying database schemas or write complex SQL queries. The most prominent examples of BioMart databases are maintain by Ensembl, which provides biomaRt users direct access to a diverse set of data and enables a wide range of powerful online queries from gene annotation to database mining.

Maintained by Mike Smith. Last updated 17 days ago.

annotation bioconductor biomart ensembl

38 stars 15.99 score 13k scripts 230 dependents

r-spatial

gstat:Spatial and Spatio-Temporal Geostatistical Modelling, Prediction and Simulation

Variogram modelling; simple, ordinary and universal point or block (co)kriging; spatio-temporal kriging; sequential Gaussian or indicator (co)simulation; variogram and variogram map plotting utility functions; supports sf and stars.

Maintained by Edzer Pebesma. Last updated 7 days ago.

openblas

197 stars 15.71 score 4.8k scripts 58 dependents

bioc

enrichplot:Visualization of Functional Enrichment Result

The 'enrichplot' package implements several visualization methods for interpreting functional enrichment results obtained from ORA or GSEA analysis. It is mainly designed to work with the 'clusterProfiler' package suite. All the visualization methods are developed based on 'ggplot2' graphics.

Maintained by Guangchuang Yu. Last updated 3 months ago.

annotation genesetenrichment go kegg pathways software visualization enrichment-analysis pathway-analysis

239 stars 15.71 score 3.1k scripts 58 dependents

thomasp85

gganimate:A Grammar of Animated Graphics

The grammar of graphics as implemented in the 'ggplot2' package has been successful in providing a powerful API for creating static visualisation. In order to extend the API for animated graphics this package provides a completely new set of grammar, fully compatible with 'ggplot2' for specifying transitions and animations in a flexible and extensible way.

Maintained by Thomas Lin Pedersen. Last updated 6 days ago.

animation data-visualization ggplot-extension ggplot2 transition

2.0k stars 15.53 score 13k scripts 24 dependents

ropensci

rnaturalearth:World Map Data from Natural Earth

Facilitates mapping by making natural earth map data from <https://www.naturalearthdata.com/> more easily available to R users.

Maintained by Philippe Massicotte. Last updated 15 days ago.

peer-reviewed

234 stars 15.51 score 7.2k scripts 47 dependents

bioc

GenomicFeatures:Query the gene models of a given organism/assembly

Extract the genomic locations of genes, transcripts, exons, introns, and CDS, for the gene models stored in a TxDb object. A TxDb object is a small database that contains the gene models of a given organism/assembly. Bioconductor provides a small collection of TxDb objects in the form of ready-to-install TxDb packages for the most commonly studied organisms. Additionally, the user can easily make a TxDb object (or package) for the organism/assembly of their choice by using the tools from the txdbmaker package.

Maintained by H. Pagès. Last updated 5 months ago.

genetics infrastructure annotation sequencing genomeannotation bioconductor-package core-package

26 stars 15.34 score 5.3k scripts 339 dependents

sparklyr

sparklyr:R Interface to Apache Spark

R interface to Apache Spark, a fast and general engine for big data processing, see <https://spark.apache.org/>. This package supports connecting to local and remote Apache Spark clusters, provides a 'dplyr' compatible back-end, and provides an interface to Spark's built-in machine learning algorithms.

Maintained by Edgar Ruiz. Last updated 13 days ago.

apache-spark distributed dplyr ide livy machine-learning remote-clusters spark sparklyr

959 stars 15.20 score 4.0k scripts 21 dependents

bioc

AnnotationDbi:Manipulation of SQLite-based annotations in Bioconductor

Implements a user-friendly interface for querying SQLite-based annotation data packages.

Maintained by Bioconductor Package Maintainer. Last updated 5 months ago.

annotation microarray sequencing genomeannotation bioconductor-package core-package

9 stars 15.05 score 3.6k scripts 769 dependents

bioc

DOSE:Disease Ontology Semantic and Enrichment analysis

This package implements five methods proposed by Resnik, Schlicker, Jiang, Lin and Wang respectively for measuring semantic similarities among DO terms and gene products. Enrichment analyses including hypergeometric model and gene set enrichment analysis are also implemented for discovering disease associations of high-throughput biological data.

Maintained by Guangchuang Yu. Last updated 5 months ago.

annotation visualization multiplecomparison genesetenrichment pathways software disease-ontology enrichment-analysis semantic-similarity

119 stars 14.97 score 2.0k scripts 61 dependents

r-dbi

RPostgres:C++ Interface to PostgreSQL

Fully DBI-compliant C++-backed interface to PostgreSQL <https://www.postgresql.org/>, an open-source relational database.

Maintained by Kirill Müller. Last updated 1 months ago.

database postgres postgresql cpp

338 stars 14.78 score 1.6k scripts 31 dependents

bioc

GSVA:Gene Set Variation Analysis for Microarray and RNA-Seq Data

Gene Set Variation Analysis (GSVA) is a non-parametric, unsupervised method for estimating variation of gene set enrichment through the samples of a expression data set. GSVA performs a change in coordinate systems, transforming the data from a gene by sample matrix to a gene-set by sample matrix, thereby allowing the evaluation of pathway enrichment for each sample. This new matrix of GSVA enrichment scores facilitates applying standard analytical methods like functional enrichment, survival analysis, clustering, CNV-pathway analysis or cross-tissue pathway analysis, in a pathway-centric manner.

Maintained by Robert Castelo. Last updated 10 days ago.

functionalgenomics microarray rnaseq pathways genesetenrichment gene-set-enrichment genomics pathway-enrichment-analysis

212 stars 14.74 score 1.6k scripts 19 dependents

bioc

TCGAbiolinks:TCGAbiolinks: An R/Bioconductor package for integrative analysis with GDC data

The aim of TCGAbiolinks is : i) facilitate the GDC open-access data retrieval, ii) prepare the data using the appropriate pre-processing strategies, iii) provide the means to carry out different standard analyses and iv) to easily reproduce earlier research results. In more detail, the package provides multiple methods for analysis (e.g., differential expression analysis, identifying differentially methylated regions) and methods for visualization (e.g., survival plots, volcano plots, starburst plots) in order to easily develop complete analysis pipelines.

Maintained by Tiago Chedraoui Silva. Last updated 1 months ago.

dnamethylation differentialmethylation generegulation geneexpression methylationarray differentialexpression pathways network sequencing survival software bioc bioconductor gdc integrative-analysis tcga tcga-data tcgabiolinks

310 stars 14.47 score 1.6k scripts 6 dependents

r-lidar

lidR:Airborne LiDAR Data Manipulation and Visualization for Forestry Applications

Airborne LiDAR (Light Detection and Ranging) interface for data manipulation and visualization. Read/write 'las' and 'laz' files, computation of metrics in area based approach, point filtering, artificial point reduction, classification from geographic data, normalization, individual tree segmentation and other manipulations.

Maintained by Jean-Romain Roussel. Last updated 2 months ago.

als forestry las laz lidar point-cloud remote-sensing openblas cpp openmp

623 stars 14.47 score 844 scripts 8 dependents

r-spatial

mapview:Interactive Viewing of Spatial Data in R

Quickly and conveniently create interactive visualisations of spatial data with or without background maps. Attributes of displayed features are fully queryable via pop-up windows. Additional functionality includes methods to visualise true- and false-color raster images and bounding boxes.

Maintained by Tim Appelhans. Last updated 3 months ago.

gis leaflet maps spatial visualization web-mapping

526 stars 14.39 score 7.3k scripts 27 dependents

bioc

xcms:LC-MS and GC-MS Data Analysis

Framework for processing and visualization of chromatographically separated and single-spectra mass spectral data. Imports from AIA/ANDI NetCDF, mzXML, mzData and mzML files. Preprocesses data for high-throughput, untargeted analyte profiling.

Maintained by Steffen Neumann. Last updated 17 days ago.

immunooncology massspectrometry metabolomics bioconductor feature-detection mass-spectrometry peak-detection cpp

196 stars 14.31 score 984 scripts 11 dependents

bioc

GOSemSim:GO-terms Semantic Similarity Measures

The semantic comparisons of Gene Ontology (GO) annotations provide quantitative ways to compute similarities between genes and gene groups, and have became important basis for many bioinformatics analysis approaches. GOSemSim is an R package for semantic similarity computation among GO terms, sets of GO terms, gene products and gene clusters. GOSemSim implemented five methods proposed by Resnik, Schlicker, Jiang, Lin and Wang respectively.

Maintained by Guangchuang Yu. Last updated 5 months ago.

annotation go clustering pathways network software bioinformatics gene-ontology semantic-similarity cpp

63 stars 14.12 score 708 scripts 68 dependents

bioc

ensembldb:Utilities to create and use Ensembl-based annotation databases

The package provides functions to create and use transcript centric annotation databases/packages. The annotation for the databases are directly fetched from Ensembl using their Perl API. The functionality and data is similar to that of the TxDb packages from the GenomicFeatures package, but, in addition to retrieve all gene/transcript models and annotations from the database, ensembldb provides a filter framework allowing to retrieve annotations for specific entries like genes encoded on a chromosome region or transcript models of lincRNA genes. EnsDb databases built with ensembldb contain also protein annotations and mappings between proteins and their encoding transcripts. Finally, ensembldb provides functions to map between genomic, transcript and protein coordinates.

Maintained by Johannes Rainer. Last updated 5 months ago.

genetics annotationdata sequencing coverage annotation bioconductor bioconductor-packages ensembl

35 stars 14.08 score 892 scripts 108 dependents

walkerke

tidycensus:Load US Census Boundary and Attribute Data as 'tidyverse' and 'sf'-Ready Data Frames

An integrated R interface to several United States Census Bureau APIs (<https://www.census.gov/data/developers/data-sets.html>) and the US Census Bureau's geographic boundary files. Allows R users to return Census and ACS data as tidyverse-ready data frames, and optionally returns a list-column with feature geometry for mapping and spatial analysis.

Maintained by Kyle Walker. Last updated 2 months ago.

648 stars 14.02 score 7.5k scripts 10 dependents

r-forge

survey:Analysis of Complex Survey Samples

Summary statistics, two-sample tests, rank tests, generalised linear models, cumulative link models, Cox models, loglinear models, and general maximum pseudolikelihood estimation for multistage stratified, cluster-sampled, unequally weighted survey samples. Variances by Taylor series linearisation or replicate weights. Post-stratification, calibration, and raking. Two-phase and multiphase subsampling designs. Graphics. PPS sampling without replacement. Small-area estimation. Dual-frame designs.

Maintained by "Thomas Lumley". Last updated 4 days ago.

cpp

1 stars 13.94 score 13k scripts 234 dependents

bioc

AnnotationHub:Client to access AnnotationHub resources

This package provides a client for the Bioconductor AnnotationHub web resource. The AnnotationHub web resource provides a central location where genomic files (e.g., VCF, bed, wig) and other resources from standard locations (e.g., UCSC, Ensembl) can be discovered. The resource includes metadata about each resource, e.g., a textual description, tags, and date of modification. The client creates and manages a local cache of files retrieved by the user, helping with quick and reproducible access.

Maintained by Bioconductor Package Maintainer. Last updated 5 months ago.

infrastructure dataimport gui thirdpartyclient core-package u24ca289073

17 stars 13.88 score 2.7k scripts 104 dependents

gergness

srvyr:'dplyr'-Like Syntax for Summary Statistics of Survey Data

Use piping, verbs like 'group_by' and 'summarize', and other 'dplyr' inspired syntactic style when calculating summary statistics on survey data using functions from the 'survey' package.

Maintained by Greg Freedman Ellis. Last updated 2 months ago.

survey

215 stars 13.88 score 1.8k scripts 15 dependents

duckdb

duckdb:DBI Package for the DuckDB Database Management System

The DuckDB project is an embedded analytical data management system with support for the Structured Query Language (SQL). This package includes all of DuckDB and an R Database Interface (DBI) connector.

Maintained by Kirill Müller. Last updated 13 days ago.

database duckdb olap cpp

159 stars 13.80 score 1.7k scripts 46 dependents

r-spatial

rgee:R Bindings for Calling the 'Earth Engine' API

Earth Engine <https://earthengine.google.com/> client library for R. All of the 'Earth Engine' API classes, modules, and functions are made available. Additional functions implemented include importing (exporting) of Earth Engine spatial objects, extraction of time series, interactive map display, assets management interface, and metadata display. See <https://r-spatial.github.io/rgee/> for further details.

Maintained by Cesar Aybar. Last updated 5 days ago.

earth-engine earthengine google-earth-engine googleearthengine spatial-analysis spatial-data

717 stars 13.77 score 1.9k scripts 3 dependents

bioc

BiocFileCache:Manage Files Across Sessions

This package creates a persistent on-disk cache of files that the user can add, update, and retrieve. It is useful for managing resources (such as custom Txdb objects) that are costly or difficult to create, web resources, and data files used across sessions.

Maintained by Lori Shepherd. Last updated 2 months ago.

dataimport core-package u24ca289073

13 stars 13.76 score 486 scripts 436 dependents

r-dbi

RMySQL:Database Interface and 'MySQL' Driver for R

Legacy 'DBI' interface to 'MySQL' / 'MariaDB' based on old code ported from S-PLUS. A modern 'MySQL' client written in 'C++' is available from the 'RMariaDB' package.

Maintained by Jeroen Ooms. Last updated 2 months ago.

database mysql

209 stars 13.68 score 3.7k scripts 15 dependents

dieghernan

tidyterra:'tidyverse' Methods and 'ggplot2' Helpers for 'terra' Objects

Extension of the 'tidyverse' for 'SpatRaster' and 'SpatVector' objects of the 'terra' package. It includes also new 'geom_' functions that provide a convenient way of visualizing 'terra' objects with 'ggplot2'.

Maintained by Diego Hernangómez. Last updated 22 hours ago.

terra ggplot-extension r-spatial rspatial

190 stars 13.59 score 1.9k scripts 26 dependents

kaz-yos

tableone:Create 'Table 1' to Describe Baseline Characteristics with or without Propensity Score Weights

Creates 'Table 1', i.e., description of baseline patient characteristics, which is essential in every medical research. Supports both continuous and categorical variables, as well as p-values and standardized mean differences. Weighted data are supported via the 'survey' package.

Maintained by Kazuki Yoshida. Last updated 3 years ago.

baseline-characteristics descriptive-statistics statistics

221 stars 13.55 score 2.3k scripts 12 dependents

bioc

Gviz:Plotting data and annotation information along genomic coordinates

Genomic data analyses requires integrated visualization of known genomic information and new experimental data. Gviz uses the biomaRt and the rtracklayer packages to perform live annotation queries to Ensembl and UCSC and translates this to e.g. gene/transcript structures in viewports of the grid graphics package. This results in genomic information plotted together with your data.

Maintained by Robert Ivanek. Last updated 5 months ago.

visualization microarray sequencing

79 stars 13.05 score 1.4k scripts 46 dependents

bioc

ChIPseeker:ChIPseeker for ChIP peak Annotation, Comparison, and Visualization

This package implements functions to retrieve the nearest genes around the peak, annotate genomic region of the peak, statstical methods for estimate the significance of overlap among ChIP peak data sets, and incorporate GEO database for user to compare the own dataset with those deposited in database. The comparison can be used to infer cooperative regulation and thus can be used to generate hypotheses. Several visualization functions are implemented to summarize the coverage of the peak experiment, average profile and heatmap of peaks binding to TSS regions, genomic annotation, distance to TSS, and overlap of peaks or genes.

Maintained by Guangchuang Yu. Last updated 5 months ago.

annotation chipseq software visualization multiplecomparison atac-seq chip-seq comparison epigenetics epigenomics

233 stars 13.05 score 1.6k scripts 5 dependents

ggrothendieck

sqldf:Manipulate R Data Frames Using SQL

The sqldf() function is typically passed a single argument which is an SQL select statement where the table names are ordinary R data frame names. sqldf() transparently sets up a database, imports the data frames into that database, performs the SQL select or other statement and returns the result using a heuristic to determine which class to assign to each column of the returned data frame. The sqldf() or read.csv.sql() functions can also be used to read filtered files into R even if the original files are larger than R itself can handle. 'RSQLite', 'RH2', 'RMySQL' and 'RPostgreSQL' backends are supported.

Maintained by G. Grothendieck. Last updated 3 years ago.

250 stars 13.04 score 8.1k scripts 52 dependents

r-spatial

spatialreg:Spatial Regression Analysis

A collection of all the estimation functions for spatial cross-sectional models (on lattice/areal data using spatial weights matrices) contained up to now in 'spdep'. These model fitting functions include maximum likelihood methods for cross-sectional models proposed by 'Cliff' and 'Ord' (1973, ISBN:0850860369) and (1981, ISBN:0850860814), fitting methods initially described by 'Ord' (1975) <doi:10.1080/01621459.1975.10480272>. The models are further described by 'Anselin' (1988) <doi:10.1007/978-94-015-7799-1>. Spatial two stage least squares and spatial general method of moment models initially proposed by 'Kelejian' and 'Prucha' (1998) <doi:10.1023/A:1007707430416> and (1999) <doi:10.1111/1468-2354.00027> are provided. Impact methods and MCMC fitting methods proposed by 'LeSage' and 'Pace' (2009) <doi:10.1201/9781420064254> are implemented for the family of cross-sectional spatial regression models. Methods for fitting the log determinant term in maximum likelihood and MCMC fitting are compared by 'Bivand et al.' (2013) <doi:10.1111/gean.12008>, and model fitting methods by 'Bivand' and 'Piras' (2015) <doi:10.18637/jss.v063.i18>; both of these articles include extensive lists of references. A recent review is provided by 'Bivand', 'Millo' and 'Piras' (2021) <doi:10.3390/math9111276>. 'spatialreg' >= 1.1-* corresponded to 'spdep' >= 1.1-1, in which the model fitting functions were deprecated and passed through to 'spatialreg', but masked those in 'spatialreg'. From versions 1.2-*, the functions have been made defunct in 'spdep'. From version 1.3-6, add Anselin-Kelejian (1997) test to `stsls` for residual spatial autocorrelation <doi:10.1177/016001769702000109>.

Maintained by Roger Bivand. Last updated 10 days ago.

bayesian impacts maximum-likelihood spatial-dependence spatial-econometrics spatial-regression openblas

46 stars 12.97 score 916 scripts 24 dependents

r-spatial

lwgeom:Bindings to Selected 'liblwgeom' Functions for Simple Features

Access to selected functions found in 'liblwgeom' <https://github.com/postgis/postgis/tree/master/liblwgeom>, the light-weight geometry library used by 'PostGIS' <http://postgis.net/>.

Maintained by Edzer Pebesma. Last updated 2 months ago.

proj geos cpp

61 stars 12.95 score 1.7k scripts 66 dependents

walkerke

tigris:Load Census TIGER/Line Shapefiles

Download TIGER/Line shapefiles from the United States Census Bureau (<https://www.census.gov/geographies/mapping-files/time-series/geo/tiger-line-file.html>) and load into R as 'sf' objects.

Maintained by Kyle Walker. Last updated 5 months ago.

331 stars 12.87 score 5.3k scripts 16 dependents

rstudio

pool:Object Pooling

Enables the creation of object pools, which make it less computationally expensive to fetch a new object. Currently the only supported pooled objects are 'DBI' connections.

Maintained by Hadley Wickham. Last updated 6 months ago.

255 stars 12.85 score 684 scripts 27 dependents

paleolimbot

ggspatial:Spatial Data Framework for ggplot2

Spatial data plus the power of the ggplot2 framework means easier mapping when input data are already in the form of spatial objects.

Maintained by Dewey Dunnington. Last updated 2 years ago.

379 stars 12.85 score 4.1k scripts 24 dependents

bioc

minfi:Analyze Illumina Infinium DNA methylation arrays

Tools to analyze & visualize Illumina Infinium methylation arrays.

Maintained by Kasper Daniel Hansen. Last updated 4 months ago.

immunooncology dnamethylation differentialmethylation epigenetics microarray methylationarray multichannel twochannel dataimport normalization preprocessing qualitycontrol

60 stars 12.82 score 996 scripts 27 dependents

bioc

SpatialExperiment:S4 Class for Spatially Resolved -omics Data

Defines an S4 class for storing data from spatial -omics experiments. The class extends SingleCellExperiment to support storage and retrieval of additional information from spot-based and molecule-based platforms, including spatial coordinates, images, and image metadata. A specialized constructor function is included for data from the 10x Genomics Visium platform.

Maintained by Dario Righelli. Last updated 5 months ago.

datarepresentation dataimport infrastructure immunooncology geneexpression transcriptomics singlecell spatial

59 stars 12.63 score 1.8k scripts 71 dependents

ohdsi

DatabaseConnector:Connecting to Various Database Platforms

An R 'DataBase Interface' ('DBI') compatible interface to various database platforms ('PostgreSQL', 'Oracle', 'Microsoft SQL Server', 'Amazon Redshift', 'Microsoft Parallel Database Warehouse', 'IBM Netezza', 'Apache Impala', 'Google BigQuery', 'Snowflake', 'Spark', 'SQLite', and 'InterSystems IRIS'). Also includes support for fetching data as 'Andromeda' objects. Uses either 'Java Database Connectivity' ('JDBC') or other 'DBI' drivers to connect to databases.

Maintained by Martijn Schuemie. Last updated 2 months ago.

hades openjdk

56 stars 12.63 score 772 scripts 11 dependents

inlabru-org

inlabru:Bayesian Latent Gaussian Modelling using INLA and Extensions

Facilitates spatial and general latent Gaussian modeling using integrated nested Laplace approximation via the INLA package (<https://www.r-inla.org>). Additionally, extends the GAM-like model class to more general nonlinear predictor expressions, and implements a log Gaussian Cox process likelihood for modeling univariate and spatial point processes based on ecological survey data. Model components are specified with general inputs and mapping methods to the latent variables, and the predictors are specified via general R expressions, with separate expressions for each observation likelihood model in multi-likelihood models. A prediction method based on fast Monte Carlo sampling allows posterior prediction of general expressions of the latent variables. Ecology-focused introduction in Bachl, Lindgren, Borchers, and Illian (2019) <doi:10.1111/2041-210X.13168>.

Maintained by Finn Lindgren. Last updated 3 hours ago.

96 stars 12.60 score 832 scripts 6 dependents

r-dbi

bigrquery:An Interface to Google's 'BigQuery' 'API'

Easily talk to Google's 'BigQuery' database from R.

Maintained by Hadley Wickham. Last updated 1 months ago.

bigquery database cpp

520 stars 12.47 score 1.8k scripts 4 dependents

r-spatial

leafem:'leaflet' Extensions for 'mapview'

Provides extensions for packages 'leaflet' & 'mapdeck', many of which are used by package 'mapview'. Focus is on functionality readily available in Geographic Information Systems such as 'Quantum GIS'. Includes functions to display coordinates of mouse pointer position, query image values via mouse pointer and zoom-to-layer buttons. Additionally, provides a feature type agnostic function to add points, lines, polygons to a map.

Maintained by Tim Appelhans. Last updated 1 months ago.

108 stars 12.41 score 704 scripts 55 dependents

trevorld

ggpattern:'ggplot2' Pattern Geoms

Provides 'ggplot2' geoms filled with various patterns. Includes a patterned version of every 'ggplot2' geom that has a region that can be filled with a pattern. Provides a suite of 'ggplot2' aesthetics and scales for controlling pattern appearances. Supports over a dozen builtin patterns (every pattern implemented by 'gridpattern') as well as allowing custom user-defined patterns.

Maintained by Trevor L. Davis. Last updated 2 months ago.

370 stars 12.36 score 1.7k scripts 3 dependents

bioc

TFBSTools:Software Package for Transcription Factor Binding Site (TFBS) Analysis

TFBSTools is a package for the analysis and manipulation of transcription factor binding sites. It includes matrices conversion between Position Frequency Matirx (PFM), Position Weight Matirx (PWM) and Information Content Matrix (ICM). It can also scan putative TFBS from sequence/alignment, query JASPAR database and provides a wrapper of de novo motif discovery software.

Maintained by Ge Tan. Last updated 19 days ago.

motifannotation generegulation motifdiscovery transcription alignment

28 stars 12.36 score 1.1k scripts 18 dependents

ropensci

stplanr:Sustainable Transport Planning

Tools for transport planning with an emphasis on spatial transport data and non-motorized modes. The package was originally developed to support the 'Propensity to Cycle Tool', a publicly available strategic cycle network planning tool (Lovelace et al. 2017) <doi:10.5198/jtlu.2016.862>, but has since been extended to support public transport routing and accessibility analysis (Moreno-Monroy et al. 2017) <doi:10.1016/j.jtrangeo.2017.08.012> and routing with locally hosted routing engines such as 'OSRM' (Lowans et al. 2023) <doi:10.1016/j.enconman.2023.117337>. The main functions are for creating and manipulating geographic "desire lines" from origin-destination (OD) data (building on the 'od' package); calculating routes on the transport network locally and via interfaces to routing services such as <https://cyclestreets.net/> (Desjardins et al. 2021) <doi:10.1007/s11116-021-10197-1>; and calculating route segment attributes such as bearing. The package implements the 'travel flow aggregration' method described in Morgan and Lovelace (2020) <doi:10.1177/2399808320942779> and the 'OD jittering' method described in Lovelace et al. (2022) <doi:10.32866/001c.33873>. Further information on the package's aim and scope can be found in the vignettes and in a paper in the R Journal (Lovelace and Ellison 2018) <doi:10.32614/RJ-2018-053>, and in a paper outlining the landscape of open source software for geographic methods in transport planning (Lovelace, 2021) <doi:10.1007/s10109-020-00342-2>.

Maintained by Robin Lovelace. Last updated 7 months ago.

cycle cycling desire-lines origin-destination peer-reviewed pubic-transport route-network routes routing spatial transport transport-planning transportation walking

427 stars 12.31 score 684 scripts 3 dependents

eliocamp

metR:Tools for Easier Analysis of Meteorological Fields

Many useful functions and extensions for dealing with meteorological data in the tidy data framework. Extends 'ggplot2' for better plotting of scalar and vector fields and provides commonly used analysis methods in the atmospheric sciences.

Maintained by Elio Campitelli. Last updated 12 days ago.

atmospheric-science ggplot2 visualization

146 stars 12.30 score 1000 scripts 22 dependents

bioc

ReactomePA:Reactome Pathway Analysis

This package provides functions for pathway analysis based on REACTOME pathway database. It implements enrichment analysis, gene set enrichment analysis and several functions for visualization. This package is not affiliated with the Reactome team.

Maintained by Guangchuang Yu. Last updated 5 months ago.

pathways visualization annotation multiplecomparison genesetenrichment reactome enrichment-analysis reactome-pathway-analysis reactomepa

40 stars 12.25 score 1.5k scripts 7 dependents

bioc

ggbio:Visualization tools for genomic data

The ggbio package extends and specializes the grammar of graphics for biological data. The graphics are designed to answer common scientific questions, in particular those often asked of high throughput genomics data. All core Bioconductor data structures are supported, where appropriate. The package supports detailed views of particular genomic regions, as well as genome-wide overviews. Supported overviews include ideograms and grand linear views. High-level plots include sequence fragment length, edge-linked interval to data view, mismatch pileup, and several splicing summaries.

Maintained by Michael Lawrence. Last updated 5 months ago.

infrastructure visualization

111 stars 12.23 score 734 scripts 16 dependents

r-dbi

RMariaDB:Database Interface and MariaDB Driver

Implements a DBI-compliant interface to MariaDB (<https://mariadb.org/>) and MySQL (<https://www.mysql.com/>) databases.

Maintained by Kirill Müller. Last updated 1 months ago.

database mariadb mysql cpp

133 stars 12.20 score 792 scripts 10 dependents

isciences

exactextractr:Fast Extraction from Raster Datasets using Polygons

Quickly and accurately summarizes raster values over polygonal areas ("zonal statistics").

Maintained by Daniel Baston. Last updated 8 months ago.

gis raster rcpp geos cpp

286 stars 12.13 score 1.4k scripts 14 dependents

tomoakin

RPostgreSQL:R Interface to the 'PostgreSQL' Database System

Database interface and 'PostgreSQL' driver for 'R'. This package provides a Database Interface 'DBI' compliant driver for 'R' to access 'PostgreSQL' database systems. In order to build and install this package from source, 'PostgreSQL' itself must be present your system to provide 'PostgreSQL' functionality via its libraries and header files. These files are provided as 'postgresql-devel' package under some Linux distributions. On 'macOS' and 'Microsoft Windows' system the attached 'libpq' library source will be used.

Maintained by Tomoaki Nishiyama. Last updated 3 days ago.

postgresql

66 stars 12.11 score 4.5k scripts 19 dependents

bioc

ExperimentHub:Client to access ExperimentHub resources

This package provides a client for the Bioconductor ExperimentHub web resource. ExperimentHub provides a central location where curated data from experiments, publications or training courses can be accessed. Each resource has associated metadata, tags and date of modification. The client creates and manages a local cache of files retrieved enabling quick and reproducible access.

Maintained by Bioconductor Package Maintainer. Last updated 5 months ago.

infrastructure dataimport gui thirdpartyclient core-package u24ca289073

10 stars 11.94 score 764 scripts 57 dependents

pecanproject

PEcAn.DB:PEcAn Functions Used for Ecological Forecasts and Reanalysis

The Predictive Ecosystem Carbon Analyzer (PEcAn) is a scientific workflow management tool that is designed to simplify the management of model parameterization, execution, and analysis. The goal of PECAn is to streamline the interaction between data and models, and to improve the efficacy of scientific investigation.

Maintained by David LeBauer. Last updated 10 hours ago.

bayesian cyberinfrastructure data-assimilation data-science ecosystem-model ecosystem-science forecasting meta-analysis national-science-foundation pecan plants

216 stars 11.91 score 127 scripts 27 dependents

hannameyer

CAST:'caret' Applications for Spatial-Temporal Models

Supporting functionality to run 'caret' with spatial or spatial-temporal data. 'caret' is a frequently used package for model training and prediction using machine learning. CAST includes functions to improve spatial or spatial-temporal modelling tasks using 'caret'. It includes the newly suggested 'Nearest neighbor distance matching' cross-validation to estimate the performance of spatial prediction models and allows for spatial variable selection to selects suitable predictor variables in view to their contribution to the spatial model performance. CAST further includes functionality to estimate the (spatial) area of applicability of prediction models. Methods are described in Meyer et al. (2018) <doi:10.1016/j.envsoft.2017.12.001>; Meyer et al. (2019) <doi:10.1016/j.ecolmodel.2019.108815>; Meyer and Pebesma (2021) <doi:10.1111/2041-210X.13650>; Milà et al. (2022) <doi:10.1111/2041-210X.13851>; Meyer and Pebesma (2022) <doi:10.1038/s41467-022-29838-9>; Linnenbrink et al. (2023) <doi:10.5194/egusphere-2023-1308>; Schumacher et al. (2024) <doi:10.5194/egusphere-2024-2730>. The package is described in detail in Meyer et al. (2024) <doi:10.48550/arXiv.2404.06978>.

Maintained by Hanna Meyer. Last updated 2 months ago.

autocorrelation caret feature-selection machine-learning overfitting predictive-modeling spatial spatio-temporal variable-selection

114 stars 11.85 score 298 scripts 1 dependents

prioritizr

prioritizr:Systematic Conservation Prioritization in R

Systematic conservation prioritization using mixed integer linear programming (MILP). It provides a flexible interface for building and solving conservation planning problems. Once built, conservation planning problems can be solved using a variety of commercial and open-source exact algorithm solvers. By using exact algorithm solvers, solutions can be generated that are guaranteed to be optimal (or within a pre-specified optimality gap). Furthermore, conservation problems can be constructed to optimize the spatial allocation of different management actions or zones, meaning that conservation practitioners can identify solutions that benefit multiple stakeholders. To solve large-scale or complex conservation planning problems, users should install the Gurobi optimization software (available from <https://www.gurobi.com/>) and the 'gurobi' R package (see Gurobi Installation Guide vignette for details). Users can also install the IBM CPLEX software (<https://www.ibm.com/products/ilog-cplex-optimization-studio/cplex-optimizer>) and the 'cplexAPI' R package (available at <https://github.com/cran/cplexAPI>). Additionally, the 'rcbc' R package (available at <https://github.com/dirkschumacher/rcbc>) can be used to generate solutions using the CBC optimization software (<https://github.com/coin-or/Cbc>). For further details, see Hanson et al. (2025) <doi:10.1111/cobi.14376>.

Maintained by Richard Schuster. Last updated 4 days ago.

biodiversity conservation conservation-planner optimization prioritization solver spatial cpp

124 stars 11.71 score 584 scripts 2 dependents

r-tmap

tmaptools:Thematic Map Tools

Set of tools for reading and processing spatial data. The aim is to supply the workflow to create thematic maps. This package also facilitates 'tmap', the package for visualizing thematic maps.

Maintained by Martijn Tennekes. Last updated 3 months ago.

42 stars 11.67 score 1.8k scripts 26 dependents

ateucher

rmapshaper:Client for 'mapshaper' for 'Geospatial' Operations

Edit and simplify 'geojson', 'Spatial', and 'sf' objects. This is wrapper around the 'mapshaper' 'JavaScript' library by Matthew Bloch <https://github.com/mbloch/mapshaper/> to perform topologically-aware polygon simplification, as well as other operations such as clipping, erasing, dissolving, and converting 'multi-part' to 'single-part' geometries.

Maintained by Andy Teucher. Last updated 9 months ago.

rlang

204 stars 11.64 score 2.1k scripts 18 dependents

pecanproject

PEcAn.data.atmosphere:PEcAn Functions Used for Managing Climate Driver Data

The Predictive Ecosystem Carbon Analyzer (PEcAn) is a scientific workflow management tool that is designed to simplify the management of model parameterization, execution, and analysis. The PECAn.data.atmosphere package converts climate driver data into a standard format for models integrated into PEcAn. As a standalone package, it provides an interface to access diverse climate data sets.

Maintained by David LeBauer. Last updated 10 hours ago.

bayesian cyberinfrastructure data-assimilation data-science ecosystem-model ecosystem-science forecasting meta-analysis national-science-foundation pecan plants

216 stars 11.63 score 64 scripts 14 dependents

bioc

bumphunter:Bump Hunter

Tools for finding bumps in genomic data

Maintained by Tamilselvi Guharaj. Last updated 5 months ago.

dnamethylation epigenetics infrastructure multiplecomparison immunooncology

16 stars 11.61 score 210 scripts 43 dependents

ggseg

ggseg:Plotting Tool for Brain Atlases

Contains 'ggplot2' geom for plotting brain atlases using simple features. The largest component of the package is the data for the two built-in atlases. Mowinckel & Vidal-Piñeiro (2020) <doi:10.1177/2515245920928009>.

Maintained by Athanasia Mo Mowinckel. Last updated 2 years ago.

221 stars 11.57 score 590 scripts 14 dependents

bioc

mia:Microbiome analysis

mia implements tools for microbiome analysis based on the SummarizedExperiment, SingleCellExperiment and TreeSummarizedExperiment infrastructure. Data wrangling and analysis in the context of taxonomic data is the main scope. Additional functions for common task are implemented such as community indices calculation and summarization.

Maintained by Tuomas Borman. Last updated 4 days ago.

microbiome software dataimport analysis bioconductor cpp

51 stars 11.51 score 316 scripts 5 dependents

darwin-eu

CDMConnector:Connect to an OMOP Common Data Model

Provides tools for working with observational health data in the Observational Medical Outcomes Partnership (OMOP) Common Data Model format with a pipe friendly syntax. Common data model database table references are stored in a single compound object along with metadata.

Maintained by Adam Black. Last updated 1 months ago.

12 stars 11.43 score 502 scripts 12 dependents

luukvdmeer

sfnetworks:Tidy Geospatial Networks

Provides a tidy approach to spatial network analysis, in the form of classes and functions that enable a seamless interaction between the network analysis package 'tidygraph' and the spatial analysis package 'sf'.

Maintained by Lucas van der Meer. Last updated 3 months ago.

geospatial-networks network-analysis rspatial simple-features spatial-analysis spatial-data-science spatial-networks tidygraph tidyverse

373 stars 11.43 score 332 scripts 7 dependents

bioc

annotate:Annotation for microarrays

Using R enviroments for annotation.

Maintained by Bioconductor Package Maintainer. Last updated 5 months ago.

annotation pathways go

11.41 score 812 scripts 239 dependents

bioc

VariantAnnotation:Annotation of Genetic Variants

Annotate variants, compute amino acid coding changes, predict coding outcomes.

Maintained by Bioconductor Package Maintainer. Last updated 3 months ago.

dataimport sequencing snp annotation genetics variantannotation curl bzip2 xz-utils zlib

11.39 score 1.9k scripts 152 dependents

doi-usgs

nhdplusTools:NHDPlus Tools

Tools for traversing and working with National Hydrography Dataset Plus (NHDPlus) data. All methods implemented in 'nhdplusTools' are available in the NHDPlus documentation available from the US Environmental Protection Agency <https://www.epa.gov/waterdata/basic-information>.

Maintained by David Blodgett. Last updated 1 months ago.

87 stars 11.38 score 348 scripts 5 dependents

bioc

pathview:a tool set for pathway based data integration and visualization

Pathview is a tool set for pathway based data integration and visualization. It maps and renders a wide variety of biological data on relevant pathway graphs. All users need is to supply their data and specify the target pathway. Pathview automatically downloads the pathway graph data, parses the data file, maps user data to the pathway, and render pathway graph with the mapped data. In addition, Pathview also seamlessly integrates with pathway and gene set (enrichment) analysis tools for large-scale and fully automated analysis.

Maintained by Weijun Luo. Last updated 3 days ago.

pathways graphandnetwork visualization genesetenrichment differentialexpression geneexpression microarray rnaseq genetics metabolomics proteomics systemsbiology sequencing

40 stars 11.37 score 1.6k scripts 10 dependents

ropensci

biomartr:Genomic Data Retrieval

Perform large scale genomic data retrieval and functional annotation retrieval. This package aims to provide users with a standardized way to automate genome, proteome, 'RNA', coding sequence ('CDS'), 'GFF', and metagenome retrieval from 'NCBI RefSeq', 'NCBI Genbank', 'ENSEMBL', and 'UniProt' databases. Furthermore, an interface to the 'BioMart' database (Smedley et al. (2009) <doi:10.1186/1471-2164-10-22>) allows users to retrieve functional annotation for genomic loci. In addition, users can download entire databases such as 'NCBI RefSeq' (Pruitt et al. (2007) <doi:10.1093/nar/gkl842>), 'NCBI nr', 'NCBI nt', 'NCBI Genbank' (Benson et al. (2013) <doi:10.1093/nar/gks1195>), etc. with only one command.

Maintained by Hajk-Georg Drost. Last updated 2 months ago.

biomart genomic-data-retrieval annotation-retrieval database-retrieval ncbi ensembl biological-data-retrieval ensembl-servers genome genome-annotation genome-retrieval genomics meta-analysis metagenomics ncbi-genbank peer-reviewed proteome sequenced-genomes

218 stars 11.35 score 129 scripts 3 dependents

riatelab

mapsf:Thematic Cartography

Create and integrate thematic maps in your workflow. This package helps to design various cartographic representations such as proportional symbols, choropleth or typology maps. It also offers several functions to display layout elements that improve the graphic presentation of maps (e.g. scale bar, north arrow, title, labels). 'mapsf' maps 'sf' objects on 'base' graphics.

Maintained by Timothée Giraud. Last updated 13 days ago.

cartography map spatial spatial-analysis

229 stars 11.32 score 414 scripts 12 dependents

inlabru-org

fmesher:Triangle Meshes and Related Geometry Tools

Generate planar and spherical triangle meshes, compute finite element calculations for 1- and 2-dimensional flat and curved manifolds with associated basis function spaces, methods for lines and polygons, and transparent handling of coordinate reference systems and coordinate transformation, including 'sf' and 'sp' geometries. The core 'fmesher' library code was originally part of the 'INLA' package, and implements parts of "Triangulations and Applications" by Hjelle and Daehlen (2006) <doi:10.1007/3-540-33261-8>.

Maintained by Finn Lindgren. Last updated 2 hours ago.

cpp

16 stars 11.28 score 261 scripts 26 dependents

bioc

karyoploteR:Plot customizable linear genomes displaying arbitrary data

karyoploteR creates karyotype plots of arbitrary genomes and offers a complete set of functions to plot arbitrary data on them. It mimicks many R base graphics functions coupling them with a coordinate change function automatically mapping the chromosome and data coordinates into the plot coordinates. In addition to the provided data plotting functions, it is easy to add new ones.

Maintained by Bernat Gel. Last updated 5 months ago.

visualization copynumbervariation sequencing coverage dnaseq chipseq methylseq dataimport onechannel bioconductor bioinformatics data-visualization genome genomics-visualization plotting-in-r

307 stars 11.25 score 656 scripts 4 dependents

tidyverse

duckplyr:A 'DuckDB'-Backed Version of 'dplyr'

A drop-in replacement for 'dplyr', powered by 'DuckDB' for performance. Offers convenient utilities for working with in-memory and larger-than-memory data while retaining full 'dplyr' compatibility.

Maintained by Kirill Müller. Last updated 5 days ago.

analytics dataframe dplyr duckdb performance

313 stars 11.22 score 220 scripts

ncss-tech

soilDB:Soil Database Interface

A collection of functions for reading soil data from U.S. Department of Agriculture Natural Resources Conservation Service (USDA-NRCS) and National Cooperative Soil Survey (NCSS) databases.

Maintained by Andrew Brown. Last updated 4 days ago.

kssl nasis nrcs soil soil-data-access soil-survey soilweb sql usda

86 stars 11.18 score 1.0k scripts 1 dependents

adeverse

adespatial:Multivariate Multiscale Spatial Analysis

Tools for the multiscale spatial analysis of multivariate data. Several methods are based on the use of a spatial weighting matrix and its eigenvector decomposition (Moran's Eigenvectors Maps, MEM). Several approaches are described in the review Dray et al (2012) <doi:10.1890/11-1183.1>.

Maintained by Aurélie Siberchicot. Last updated 11 days ago.

fortran openblas

36 stars 11.16 score 398 scripts 2 dependents

jamiemkass

ENMeval:Automated Tuning and Evaluations of Ecological Niche Models

Runs ecological niche models over all combinations of user-defined settings (i.e., tuning), performs cross validation to evaluate models, and returns data tables to aid in selection of optimal model settings that balance goodness-of-fit and model complexity. Also has functions to partition data spatially (or not) for cross validation, to plot multiple visualizations of results, to run null models to estimate significance and effect sizes of performance metrics, and to calculate range overlap between model predictions, among others. The package was originally built for Maxent models (Phillips et al. 2006, Phillips et al. 2017), but the current version allows possible extensions for any modeling algorithm. The extensive vignette, which guides users through most package functionality but unfortunately has a file size too big for CRAN, can be found here on the package's Github Pages website: <https://jamiemkass.github.io/ENMeval/articles/ENMeval-2.0-vignette.html>.

Maintained by Jamie M. Kass. Last updated 3 days ago.

49 stars 11.16 score 332 scripts 2 dependents

bioc

genefilter:genefilter: methods for filtering genes from high-throughput experiments

Some basic functions for filtering genes.

Maintained by Bioconductor Package Maintainer. Last updated 5 months ago.

microarray fortran cpp

11.11 score 2.4k scripts 143 dependents

usepa

elevatr:Access Elevation Data from Various APIs

Several web services are available that provide access to elevation data. This package provides access to many of those services and returns elevation data either as an 'sf' simple features object from point elevation services or as a 'raster' object from raster elevation services. In future versions, 'elevatr' will drop support for 'raster' and will instead return 'terra' objects. Currently, the package supports access to the Amazon Web Services Terrain Tiles <https://registry.opendata.aws/terrain-tiles/>, the Open Topography Global Datasets API <https://opentopography.org/developers/>, and the USGS Elevation Point Query Service <https://apps.nationalmap.gov/epqs/>.

Maintained by Jeffrey Hollister. Last updated 7 months ago.

digital-elevation-model elevation-data elevatr epa mapzen-elevation-service r-language

206 stars 11.11 score 1.3k scripts 3 dependents

pbs-assess

sdmTMB:Spatial and Spatiotemporal SPDE-Based GLMMs with 'TMB'

Implements spatial and spatiotemporal GLMMs (Generalized Linear Mixed Effect Models) using 'TMB', 'fmesher', and the SPDE (Stochastic Partial Differential Equation) Gaussian Markov random field approximation to Gaussian random fields. One common application is for spatially explicit species distribution models (SDMs). See Anderson et al. (2024) <doi:10.1101/2022.03.24.485545>.

Maintained by Sean C. Anderson. Last updated 3 days ago.

ecology glmm spatial-analysis species-distribution-modelling tmb cpp

205 stars 11.04 score 848 scripts 1 dependents

bioc

Maaslin2:"Multivariable Association Discovery in Population-scale Meta-omics Studies"

MaAsLin2 is comprehensive R package for efficiently determining multivariable association between clinical metadata and microbial meta'omic features. MaAsLin2 relies on general linear models to accommodate most modern epidemiological study designs, including cross-sectional and longitudinal, and offers a variety of data exploration, normalization, and transformation methods. MaAsLin2 is the next generation of MaAsLin.

Maintained by Lauren McIver. Last updated 5 months ago.

metagenomics software microbiome normalization biobakery bioconductor differential-abundance-analysis false-discovery-rate multiple-covariates public repeated-measures tools

133 stars 11.03 score 532 scripts 3 dependents

ropensci

CoordinateCleaner:Automated Cleaning of Occurrence Records from Biological Collections

Automated flagging of common spatial and temporal errors in biological and paleontological collection data, for the use in conservation, ecology and paleontology. Includes automated tests to easily flag (and exclude) records assigned to country or province centroid, the open ocean, the headquarters of the Global Biodiversity Information Facility, urban areas or the location of biodiversity institutions (museums, zoos, botanical gardens, universities). Furthermore identifies per species outlier coordinates, zero coordinates, identical latitude/longitude and invalid coordinates. Also implements an algorithm to identify data sets with a significant proportion of rounded coordinates. Especially suited for large data sets. The reference for the methodology is: Zizka et al. (2019) <doi:10.1111/2041-210X.13152>.

Maintained by Alexander Zizka. Last updated 1 years ago.

82 stars 10.93 score 306 scripts 3 dependents

pdil

usmap:US Maps Including Alaska and Hawaii

Obtain United States map data frames of varying region types (e.g. county, state). The map data frames include Alaska and Hawaii conveniently placed to the bottom left, as they appear in most maps of the US. Convenience functions for plotting choropleths, visualizing spatial data, and working with FIPS codes are also provided.

Maintained by Paolo Di Lorenzo. Last updated 3 months ago.

counties data fips geodata mapping states usa

75 stars 10.89 score 1.7k scripts 2 dependents

ohdsi

PatientLevelPrediction:Develop Clinical Prediction Models Using the Common Data Model

A user friendly way to create patient level prediction models using the Observational Medical Outcomes Partnership Common Data Model. Given a cohort of interest and an outcome of interest, the package can use data in the Common Data Model to build a large set of features. These features can then be used to fit a predictive model with a number of machine learning algorithms. This is further described in Reps (2017) <doi:10.1093/jamia/ocy032>.

Maintained by Egill Fridgeirsson. Last updated 24 days ago.

hades openjdk

190 stars 10.85 score 297 scripts

ropensci

geojsonio:Convert Data from and to 'GeoJSON' or 'TopoJSON'

Convert data to 'GeoJSON' or 'TopoJSON' from various R classes, including vectors, lists, data frames, shape files, and spatial classes. 'geojsonio' does not aim to replace packages like 'sp', 'rgdal', 'rgeos', but rather aims to be a high level client to simplify conversions of data from and to 'GeoJSON' and 'TopoJSON'.

Maintained by Michael Mahoney. Last updated 1 years ago.

geojson topojson geospatial conversion data input-output io

151 stars 10.83 score 2.9k scripts 13 dependents

john-d-fox

effects:Effect Displays for Linear, Generalized Linear, and Other Models

Graphical and tabular effect displays, e.g., of interactions, for various statistical models with linear predictors.

Maintained by John Fox. Last updated 3 years ago.

6 stars 10.77 score 5.4k scripts 47 dependents

apache

apache.sedona:R Interface for Apache Sedona

R interface for 'Apache Sedona' based on 'sparklyr' (<https://sedona.apache.org>).

Maintained by Apache Sedona. Last updated 10 hours ago.

cluster-computing geospatial java python scala spatial-analysis spatial-query spatial-sql

2.0k stars 10.73 score 105 scripts

pecanproject

PEcAn.benchmark:PEcAn Functions Used for Benchmarking

The Predictive Ecosystem Carbon Analyzer (PEcAn) is a scientific workflow management tool that is designed to simplify the management of model parameterization, execution, and analysis. The goal of PEcAn is to streamline the interaction between data and models, and to improve the efficacy of scientific investigation. The PEcAn.benchmark package provides utilities for comparing models and data, including a suite of statistical metrics and plots.

Maintained by Mike Dietze. Last updated 10 hours ago.

bayesian cyberinfrastructure data-assimilation data-science ecosystem-model ecosystem-science forecasting meta-analysis national-science-foundation pecan plants

216 stars 10.73 score 416 scripts 11 dependents

rstudio

pointblank:Data Validation and Organization of Metadata for Local and Remote Tables

Validate data in data frames, 'tibble' objects, 'Spark' 'DataFrames', and database tables. Validation pipelines can be made using easily-readable, consecutive validation steps. Upon execution of the validation plan, several reporting options are available. User-defined thresholds for failure rates allow for the determination of appropriate reporting actions. Many other workflows are available including an information management workflow, where the aim is to record, collect, and generate useful information on data tables.

Maintained by Richard Iannone. Last updated 5 days ago.

data-assertions data-checker data-dictionaries data-frames data-inference data-management data-profiler data-quality data-validation data-verification database-tables easy-to-understand reporting-tool schema-validation testing-tools yaml-configuration

942 stars 10.73 score 284 scripts

ropengov

giscoR:Download Map Data from GISCO API - Eurostat

Tools to download data from the GISCO (Geographic Information System of the Commission) Eurostat database <https://ec.europa.eu/eurostat/web/gisco>. Global and European map data available. This package is in no way officially related to or endorsed by Eurostat.

Maintained by Diego Hernangómez. Last updated 5 days ago.

ropengov spatial api-wrapper eurostat gisco thematic-maps eurostat-data ggplot2 gis

75 stars 10.70 score 424 scripts 5 dependents

bioc

GWASTools:Tools for Genome Wide Association Studies

Classes for storing very large GWAS data sets and annotation, and functions for GWAS data cleaning and analysis.

Maintained by Stephanie M. Gogarten. Last updated 13 days ago.

snp geneticvariability qualitycontrol microarray

17 stars 10.67 score 396 scripts 5 dependents

ohdsi

FeatureExtraction:Generating Features for a Cohort

An R interface for generating features for a cohort using data in the Common Data Model. Features can be constructed using default or custom made feature definitions. Furthermore it's possible to aggregate features and get the summary statistics.

Maintained by Ger Inberg. Last updated 11 days ago.

hades openjdk

62 stars 10.64 score 209 scripts 2 dependents

r-spatial

leafgl:High-Performance 'WebGl' Rendering for Package 'leaflet'

Provides bindings to the 'Leaflet.glify' JavaScript library which extends the 'leaflet' JavaScript library to render large data in the browser using 'WebGl'.

Maintained by Tim Appelhans. Last updated 5 months ago.

271 stars 10.63 score 157 scripts 27 dependents

bioc

tximeta:Transcript Quantification Import with Automatic Metadata

Transcript quantification import from Salmon and other quantifiers with automatic attachment of transcript ranges and release information, and other associated metadata. De novo transcriptomes can be linked to the appropriate sources with linkedTxomes and shared for computational reproducibility.

Maintained by Michael Love. Last updated 2 months ago.

annotation genomeannotation dataimport preprocessing rnaseq singlecell transcriptomics transcription geneexpression functionalgenomics reproducibleresearch reportwriting immunooncology

67 stars 10.58 score 466 scripts 1 dependents

bioc

ORFik:Open Reading Frames in Genomics

R package for analysis of transcript and translation features through manipulation of sequence data and NGS data like Ribo-Seq, RNA-Seq, TCP-Seq and CAGE. It is generalized in the sense that any transcript region can be analysed, as the name hints to it was made with investigation of ribosomal patterns over Open Reading Frames (ORFs) as it's primary use case. ORFik is extremely fast through use of C++, data.table and GenomicRanges. Package allows to reassign starts of the transcripts with the use of CAGE-Seq data, automatic shifting of RiboSeq reads, finding of Open Reading Frames for whole genomes and much more.

Maintained by Haakon Tjeldnes. Last updated 1 months ago.

immunooncology software sequencing riboseq rnaseq functionalgenomics coverage alignment dataimport cpp

33 stars 10.56 score 115 scripts 2 dependents

bioc

DECIPHER:Tools for curating, analyzing, and manipulating biological sequences

A toolset for deciphering and managing biological sequences.

Maintained by Erik Wright. Last updated 20 days ago.

clustering genetics sequencing dataimport visualization microarray qualitycontrol qpcr alignment wholegenome microbiome immunooncology geneprediction openmp

10.55 score 1.1k scripts 14 dependents

jmsigner

amt:Animal Movement Tools

Manage and analyze animal movement data. The functionality of 'amt' includes methods to calculate home ranges, track statistics (e.g. step lengths, speed, or turning angles), prepare data for fitting habitat selection analyses, and simulation of space-use from fitted step-selection functions.

Maintained by Johannes Signer. Last updated 5 months ago.

41 stars 10.54 score 418 scripts

ctmm-initiative

ctmm:Continuous-Time Movement Modeling

Functions for identifying, fitting, and applying continuous-space, continuous-time stochastic-process movement models to animal tracking data. The package is described in Calabrese et al (2016) <doi:10.1111/2041-210X.12559>, with models and methods based on those introduced and detailed in Fleming & Calabrese et al (2014) <doi:10.1086/675504>, Fleming et al (2014) <doi:10.1111/2041-210X.12176>, Fleming et al (2015) <doi:10.1103/PhysRevE.91.032107>, Fleming et al (2015) <doi:10.1890/14-2010.1>, Fleming et al (2016) <doi:10.1890/15-1607>, Péron & Fleming et al (2016) <doi:10.1186/s40462-016-0084-7>, Fleming & Calabrese (2017) <doi:10.1111/2041-210X.12673>, Péron et al (2017) <doi:10.1002/ecm.1260>, Fleming et al (2017) <doi:10.1016/j.ecoinf.2017.04.008>, Fleming et al (2018) <doi:10.1002/eap.1704>, Winner & Noonan et al (2018) <doi:10.1111/2041-210X.13027>, Fleming et al (2019) <doi:10.1111/2041-210X.13270>, Noonan & Fleming et al (2019) <doi:10.1186/s40462-019-0177-1>, Fleming et al (2020) <doi:10.1101/2020.06.12.130195>, Noonan et al (2021) <doi:10.1111/2041-210X.13597>, Fleming et al (2022) <doi:10.1111/2041-210X.13815>, Silva et al (2022) <doi:10.1111/2041-210X.13786>, Alston & Fleming et al (2023) <doi:10.1111/2041-210X.14025>.

Maintained by Christen H. Fleming. Last updated 10 days ago.

51 stars 10.54 score 534 scripts 4 dependents

datastorm-open

shinymanager:Authentication Management for 'Shiny' Applications

Simple and secure authentification mechanism for single 'Shiny' applications. Credentials can be stored in an encrypted 'SQLite' database or on your own SQL Database (Postgres, MySQL, ...). Source code of main application is protected until authentication is successful.

Maintained by Benoit Thieurmel. Last updated 11 months ago.

shiny shiny-server shinyapps

391 stars 10.51 score 316 scripts 2 dependents

bioc

ballgown:Flexible, isoform-level differential expression analysis

Tools for statistical analysis of assembled transcriptomes, including flexible differential expression analysis, visualization of transcript structures, and matching of assembled transcripts to annotation.

Maintained by Jack Fu. Last updated 5 months ago.

immunooncology rnaseq statisticalmethod preprocessing differentialexpression

145 stars 10.51 score 338 scripts 1 dependents

rvalavi

blockCV:Spatial and Environmental Blocking for K-Fold and LOO Cross-Validation

Creating spatially or environmentally separated folds for cross-validation to provide a robust error estimation in spatially structured environments; Investigating and visualising the effective range of spatial autocorrelation in continuous raster covariates and point samples to find an initial realistic distance band to separate training and testing datasets spatially described in Valavi, R. et al. (2019) <doi:10.1111/2041-210X.13107>.

Maintained by Roozbeh Valavi. Last updated 5 months ago.

cross-validation spatial spatial-cross-validation spatial-modelling species-distribution-modelling cpp

113 stars 10.49 score 302 scripts 3 dependents

riatelab

cartography:Thematic Cartography

Create and integrate maps in your R workflow. This package helps to design cartographic representations such as proportional symbols, choropleth, typology, flows or discontinuities maps. It also offers several features that improve the graphic presentation of maps, for instance, map palettes, layout elements (scale, north arrow, title...), labels or legends. See Giraud and Lambert (2017) <doi:10.1007/978-3-319-57336-6_13>.

Maintained by Timothée Giraud. Last updated 2 years ago.

cartography map thematic-maps cpp

399 stars 10.47 score 460 scripts 2 dependents

r-transit

tidytransit:Read, Validate, Analyze, and Map GTFS Feeds

Read General Transit Feed Specification (GTFS) zipfiles into a list of R dataframes. Perform validation of the data structure against the specification. Analyze the headways and frequencies at routes and stops. Create maps and perform spatial analysis on the routes and stops. Please see the GTFS documentation here for more detail: <https://gtfs.org/>.

Maintained by Flavio Poletti. Last updated 2 months ago.

gtfs public public-transport tidyverse transit transit-data transport transportation

151 stars 10.47 score 272 scripts 1 dependents

bioc

ChemmineR:Cheminformatics Toolkit for R

ChemmineR is a cheminformatics package for analyzing drug-like small molecule data in R. Its latest version contains functions for efficient processing of large numbers of molecules, physicochemical/structural property predictions, structural similarity searching, classification and clustering of compound libraries with a wide spectrum of algorithms. In addition, it offers visualization functions for compound clustering results and chemical structures.

Maintained by Thomas Girke. Last updated 5 months ago.

cheminformatics biomedicalinformatics pharmacogenetics pharmacogenomics microtitreplateassay cellbasedassays visualization infrastructure dataimport clustering proteomics metabolomics cpp

15 stars 10.45 score 253 scripts 12 dependents

bioc

GENESIS:GENetic EStimation and Inference in Structured samples (GENESIS): Statistical methods for analyzing genetic data from samples with population structure and/or relatedness

The GENESIS package provides methodology for estimating, inferring, and accounting for population and pedigree structure in genetic analyses. The current implementation provides functions to perform PC-AiR (Conomos et al., 2015, Gen Epi) and PC-Relate (Conomos et al., 2016, AJHG). PC-AiR performs a Principal Components Analysis on genome-wide SNP data for the detection of population structure in a sample that may contain known or cryptic relatedness. Unlike standard PCA, PC-AiR accounts for relatedness in the sample to provide accurate ancestry inference that is not confounded by family structure. PC-Relate uses ancestry representative principal components to adjust for population structure/ancestry and accurately estimate measures of recent genetic relatedness such as kinship coefficients, IBD sharing probabilities, and inbreeding coefficients. Additionally, functions are provided to perform efficient variance component estimation and mixed model association testing for both quantitative and binary phenotypes.

Maintained by Stephanie M. Gogarten. Last updated 2 months ago.

snp geneticvariability genetics statisticalmethod dimensionreduction principalcomponent genomewideassociation qualitycontrol biocviews

36 stars 10.44 score 342 scripts 1 dependents

bioc

oligo:Preprocessing tools for oligonucleotide arrays

A package to analyze oligonucleotide arrays (expression/SNP/tiling/exon) at probe-level. It currently supports Affymetrix (CEL files) and NimbleGen arrays (XYS files).

Maintained by Benilton Carvalho. Last updated 23 days ago.

microarray onechannel twochannel preprocessing snp differentialexpression exonarray geneexpression dataimport zlib

3 stars 10.42 score 528 scripts 10 dependents

egeulgen

pathfindR:Enrichment Analysis Utilizing Active Subnetworks

Enrichment analysis enables researchers to uncover mechanisms underlying a phenotype. However, conventional methods for enrichment analysis do not take into account protein-protein interaction information, resulting in incomplete conclusions. 'pathfindR' is a tool for enrichment analysis utilizing active subnetworks. The main function identifies active subnetworks in a protein-protein interaction network using a user-provided list of genes and associated p values. It then performs enrichment analyses on the identified subnetworks, identifying enriched terms (i.e. pathways or, more broadly, gene sets) that possibly underlie the phenotype of interest. 'pathfindR' also offers functionalities to cluster the enriched terms and identify representative terms in each cluster, to score the enriched terms per sample and to visualize analysis results. The enrichment, clustering and other methods implemented in 'pathfindR' are described in detail in Ulgen E, Ozisik O, Sezerman OU. 2019. 'pathfindR': An R Package for Comprehensive Identification of Enriched Pathways in Omics Data Through Active Subnetworks. Front. Genet. <doi:10.3389/fgene.2019.00858>.

Maintained by Ege Ulgen. Last updated 1 months ago.

active-subnetworks enrichment pathway pathway-enrichment-analysis subnetwork

187 stars 10.38 score 138 scripts

bcgov

bcdata:Search and Retrieve Data from the BC Data Catalogue

Search, query, and download tabular and 'geospatial' data from the British Columbia Data Catalogue (<https://catalogue.data.gov.bc.ca/>). Search catalogue data records based on keywords, data licence, sector, data format, and B.C. government organization. View metadata directly in R, download many data formats, and query 'geospatial' data available via the B.C. government Web Feature Service ('WFS') using 'dplyr' syntax.

Maintained by Andy Teucher. Last updated 5 days ago.

bcdc citz data-science env

83 stars 10.36 score 186 scripts 4 dependents

bioc

pRoloc:A unifying bioinformatics framework for spatial proteomics

The pRoloc package implements machine learning and visualisation methods for the analysis and interogation of quantitiative mass spectrometry data to reliably infer protein sub-cellular localisation.

Maintained by Lisa Breckels. Last updated 4 days ago.

immunooncology proteomics massspectrometry classification clustering qualitycontrol bioconductor proteomics-data spatial-proteomics visualisation openblas cpp

15 stars 10.31 score 101 scripts 2 dependents

richardli

SUMMER:Small-Area-Estimation Unit/Area Models and Methods for Estimation in R

Provides methods for spatial and spatio-temporal smoothing of demographic and health indicators using survey data, with particular focus on estimating and projecting under-five mortality rates, described in Mercer et al. (2015) <doi:10.1214/15-AOAS872>, Li et al. (2019) <doi:10.1371/journal.pone.0210645>, Wu et al. (DHS Spatial Analysis Reports No. 21, 2021), and Li et al. (2023) <doi:10.48550/arXiv.2007.05117>.

Maintained by Zehang R Li. Last updated 3 months ago.

bayesian-inference small-area-estimation space-time

23 stars 10.28 score 134 scripts 2 dependents

bioc

GSEABase:Gene set enrichment data structures and methods

This package provides classes and methods to support Gene Set Enrichment Analysis (GSEA).

Maintained by Bioconductor Package Maintainer. Last updated 2 months ago.

geneexpression genesetenrichment graphandnetwork go kegg

10.27 score 1.5k scripts 77 dependents

bioc

CAMERA:Collection of annotation related methods for mass spectrometry data

Annotation of peaklists generated by xcms, rule based annotation of isotopes and adducts, isotope validation, EIC correlation based tagging of unknown adducts and fragments

Maintained by Steffen Neumann. Last updated 5 months ago.

immunooncology massspectrometry metabolomics

11 stars 10.27 score 175 scripts 6 dependents

bioc

graphite:GRAPH Interaction from pathway Topological Environment

Graph objects from pathway topology derived from KEGG, Panther, PathBank, PharmGKB, Reactome SMPDB and WikiPathways databases.

Maintained by Gabriele Sales. Last updated 5 months ago.

pathways thirdpartyclient graphandnetwork network reactome kegg metabolomics bioinformatics mirror pathway-analysis

8 stars 10.24 score 122 scripts 21 dependents

bioc

EDASeq:Exploratory Data Analysis and Normalization for RNA-Seq

Numerical and graphical summaries of RNA-Seq read data. Within-lane normalization procedures to adjust for GC-content effect (or other gene-level effects) on read counts: loess robust local regression, global-scaling, and full-quantile normalization (Risso et al., 2011). Between-lane normalization procedures to adjust for distributional differences between lanes (e.g., sequencing depth): global-scaling and full-quantile normalization (Bullard et al., 2010).

Maintained by Davide Risso. Last updated 5 months ago.

immunooncology sequencing rnaseq preprocessing qualitycontrol differentialexpression

5 stars 10.24 score 594 scripts 9 dependents

idigbio

ridigbio:Interface to the iDigBio Data API

An interface to iDigBio's search API that allows downloading specimen records. Searches are returned as a data.frame. Other functions such as the metadata end points return lists of information. iDigBio is a US project focused on digitizing and serving museum specimen collections on the web. See <https://www.idigbio.org> for information on iDigBio.

Maintained by Jesse Bennett. Last updated 20 days ago.

16 stars 10.23 score 63 scripts 7 dependents

usepa

httk:High-Throughput Toxicokinetics

Pre-made models that can be rapidly tailored to various chemicals and species using chemical-specific in vitro data and physiological information. These tools allow incorporation of chemical toxicokinetics ("TK") and in vitro-in vivo extrapolation ("IVIVE") into bioinformatics, as described by Pearce et al. (2017) (<doi:10.18637/jss.v079.i04>). Chemical-specific in vitro data characterizing toxicokinetics have been obtained from relatively high-throughput experiments. The chemical-independent ("generic") physiologically-based ("PBTK") and empirical (for example, one compartment) "TK" models included here can be parameterized with in vitro data or in silico predictions which are provided for thousands of chemicals, multiple exposure routes, and various species. High throughput toxicokinetics ("HTTK") is the combination of in vitro data and generic models. We establish the expected accuracy of HTTK for chemicals without in vivo data through statistical evaluation of HTTK predictions for chemicals where in vivo data do exist. The models are systems of ordinary differential equations that are developed in MCSim and solved using compiled (C-based) code for speed. A Monte Carlo sampler is included for simulating human biological variability (Ring et al., 2017 <doi:10.1016/j.envint.2017.06.004>) and propagating parameter uncertainty (Wambaugh et al., 2019 <doi:10.1093/toxsci/kfz205>). Empirically calibrated methods are included for predicting tissue:plasma partition coefficients and volume of distribution (Pearce et al., 2017 <doi:10.1007/s10928-017-9548-7>). These functions and data provide a set of tools for using IVIVE to convert concentrations from high-throughput screening experiments (for example, Tox21, ToxCast) to real-world exposures via reverse dosimetry (also known as "RTK") (Wetmore et al., 2015 <doi:10.1093/toxsci/kfv171>).

Maintained by John Wambaugh. Last updated 2 months ago.

comptox ord

27 stars 10.22 score 307 scripts 1 dependents

bioc

zinbwave:Zero-Inflated Negative Binomial Model for RNA-Seq Data

Implements a general and flexible zero-inflated negative binomial model that can be used to provide a low-dimensional representations of single-cell RNA-seq data. The model accounts for zero inflation (dropouts), over-dispersion, and the count nature of the data. The model also accounts for the difference in library sizes and optionally for batch effects and/or other covariates, avoiding the need for pre-normalize the data.

Maintained by Davide Risso. Last updated 5 months ago.

immunooncology dimensionreduction geneexpression rnaseq software transcriptomics sequencing singlecell

43 stars 10.21 score 190 scripts 6 dependents

bioc

cBioPortalData:Exposes and Makes Available Data from the cBioPortal Web Resources

The cBioPortalData R package accesses study datasets from the cBio Cancer Genomics Portal. It accesses the data either from the pre-packaged zip / tar files or from the API interface that was recently implemented by the cBioPortal Data Team. The package can provide data in either tabular format or with MultiAssayExperiment object that uses familiar Bioconductor data representations.

Maintained by Marcel Ramos. Last updated 10 days ago.

software infrastructure thirdpartyclient bioconductor-package nci-itcr u24ca289073

33 stars 10.17 score 147 scripts 4 dependents

bioc

singleCellTK:Comprehensive and Interactive Analysis of Single Cell RNA-Seq Data

The Single Cell Toolkit (SCTK) in the singleCellTK package provides an interface to popular tools for importing, quality control, analysis, and visualization of single cell RNA-seq data. SCTK allows users to seamlessly integrate tools from various packages at different stages of the analysis workflow. A general "a la carte" workflow gives users the ability access to multiple methods for data importing, calculation of general QC metrics, doublet detection, ambient RNA estimation and removal, filtering, normalization, batch correction or integration, dimensionality reduction, 2-D embedding, clustering, marker detection, differential expression, cell type labeling, pathway analysis, and data exporting. Curated workflows can be used to run Seurat and Celda. Streamlined quality control can be performed on the command line using the SCTK-QC pipeline. Users can analyze their data using commands in the R console or by using an interactive Shiny Graphical User Interface (GUI). Specific analyses or entire workflows can be summarized and shared with comprehensive HTML reports generated by Rmarkdown. Additional documentation and vignettes can be found at camplab.net/sctk.

Maintained by Joshua David Campbell. Last updated 1 months ago.

singlecell geneexpression differentialexpression alignment clustering immunooncology batcheffect normalization qualitycontrol dataimport gui

182 stars 10.17 score 252 scripts

ropensci

rdhs:API Client and Dataset Management for the Demographic and Health Survey (DHS) Data

Provides a client for (1) querying the DHS API for survey indicators and metadata (<https://api.dhsprogram.com/#/index.html>), (2) identifying surveys and datasets for analysis, (3) downloading survey datasets from the DHS website, (4) loading datasets and associate metadata into R, and (5) extracting variables and combining datasets for pooled analysis.

Maintained by OJ Watson. Last updated 1 months ago.

dataset dhs dhs-api extract peer-reviewed survey-data

37 stars 10.16 score 286 scripts 4 dependents

geoffjentry

twitteR:R Based Twitter Client

Provides an interface to the Twitter web API.

Maintained by Jeff Gentry. Last updated 9 years ago.

254 stars 10.12 score 2.0k scripts 1 dependents

ropensci

rfishbase:R Interface to 'FishBase'

A programmatic interface to 'FishBase', re-written based on an accompanying 'RESTful' API. Access tables describing over 30,000 species of fish, their biology, ecology, morphology, and more. This package also supports experimental access to 'SeaLifeBase' data, which contains nearly 200,000 species records for all types of aquatic life not covered by 'FishBase.'

Maintained by Carl Boettiger. Last updated 3 months ago.

fish fishbase taxonomy

116 stars 10.11 score 764 scripts 2 dependents

bleutner

RStoolbox:Remote Sensing Data Analysis

Toolbox for remote sensing image processing and analysis such as calculating spectral indexes, principal component transformation, unsupervised and supervised classification or fractional cover analyses.

Maintained by Konstantin Mueller. Last updated 2 months ago.

ggplot2 land-cover-mapping remote-sensing spectral-unmixing supervised-classification unsupervised-classification openblas cpp

275 stars 10.10 score 1.1k scripts

ropensci

spocc:Interface to Species Occurrence Data Sources

A programmatic interface to many species occurrence data sources, including Global Biodiversity Information Facility ('GBIF'), 'iNaturalist', 'eBird', Integrated Digitized 'Biocollections' ('iDigBio'), 'VertNet', Ocean 'Biogeographic' Information System ('OBIS'), and Atlas of Living Australia ('ALA'). Includes functionality for retrieving species occurrence data, and combining those data.

Maintained by Hannah Owens. Last updated 2 months ago.

specimens api web-services occurrences species taxonomy gbif inat vertnet ebird idigbio obis ala antweb bison data ecoengine inaturalist occurrence species-occurrence spocc

118 stars 10.09 score 552 scripts 5 dependents

jinseob2kim

jstable:Create Tables from Different Types of Regression

Create regression tables from generalized linear model(GLM), generalized estimating equation(GEE), generalized linear mixed-effects model(GLMM), Cox proportional hazards model, survey-weighted generalized linear model(svyglm) and survey-weighted Cox model results for publication.

Maintained by Jinseob Kim. Last updated 3 days ago.

label regression table

28 stars 10.08 score 199 scripts 1 dependents

murrayefford

secr:Spatially Explicit Capture-Recapture

Functions to estimate the density and size of a spatially distributed animal population sampled with an array of passive detectors, such as traps, or by searching polygons or transects. Models incorporating distance-dependent detection are fitted by maximizing the likelihood. Tools are included for data manipulation and model selection.

Maintained by Murray Efford. Last updated 5 days ago.

cpp

3 stars 10.06 score 410 scripts 5 dependents

bioc

sva:Surrogate Variable Analysis

The sva package contains functions for removing batch effects and other unwanted variation in high-throughput experiment. Specifically, the sva package contains functions for the identifying and building surrogate variables for high-dimensional data sets. Surrogate variables are covariates constructed directly from high-dimensional data (like gene expression/RNA sequencing/methylation/brain imaging data) that can be used in subsequent analyses to adjust for unknown, unmodeled, or latent sources of noise. The sva package can be used to remove artifacts in three ways: (1) identifying and estimating surrogate variables for unknown sources of variation in high-throughput experiments (Leek and Storey 2007 PLoS Genetics,2008 PNAS), (2) directly removing known batch effects using ComBat (Johnson et al. 2007 Biostatistics) and (3) removing batch effects with known control probes (Leek 2014 biorXiv). Removing batch effects and using surrogate variables in differential expression analysis have been shown to reduce dependence, stabilize error rate estimates, and improve reproducibility, see (Leek and Storey 2007 PLoS Genetics, 2008 PNAS or Leek et al. 2011 Nat. Reviews Genetics).

Maintained by Jeffrey T. Leek. Last updated 5 months ago.

immunooncology microarray statisticalmethod preprocessing multiplecomparison sequencing rnaseq batcheffect normalization

10.04 score 3.2k scripts 50 dependents

mages

ChainLadder:Statistical Methods and Models for Claims Reserving in General Insurance

Various statistical methods and models which are typically used for the estimation of outstanding claims reserves in general insurance, including those to estimate the claims development result as required under Solvency II.

Maintained by Markus Gesmann. Last updated 2 months ago.

82 stars 10.04 score 196 scripts 2 dependents

pecanproject

PEcAn.settings:PEcAn Settings package

Contains functions to read PEcAn settings files.

Maintained by David LeBauer. Last updated 10 hours ago.

bayesian cyberinfrastructure data-assimilation data-science ecosystem-model ecosystem-science forecasting meta-analysis national-science-foundation pecan plants

216 stars 10.03 score 54 scripts 17 dependents

bioc

BiocCheck:Bioconductor-specific package checks

BiocCheck guides maintainers through Bioconductor best practicies. It runs Bioconductor-specific package checks by searching through package code, examples, and vignettes. Maintainers are required to address all errors, warnings, and most notes produced.

Maintained by Marcel Ramos. Last updated 1 months ago.

infrastructure bioconductor-package core-services

8 stars 10.03 score 114 scripts 6 dependents

bioc

singscore:Rank-based single-sample gene set scoring method

A simple single-sample gene signature scoring method that uses rank-based statistics to analyze the sample's gene expression profile. It scores the expression activities of gene sets at a single-sample level.

Maintained by Malvika Kharbanda. Last updated 5 months ago.

software geneexpression genesetenrichment bioinformatics

41 stars 10.03 score 124 scripts 4 dependents

bioc

derfinder:Annotation-agnostic differential expression analysis of RNA-seq data at base-pair resolution via the DER Finder approach

This package provides functions for annotation-agnostic differential expression analysis of RNA-seq data. Two implementations of the DER Finder approach are included in this package: (1) single base-level F-statistics and (2) DER identification at the expressed regions-level. The DER Finder approach can also be used to identify differentially bounded ChIP-seq peaks.

Maintained by Leonardo Collado-Torres. Last updated 4 months ago.

differentialexpression sequencing rnaseq chipseq differentialpeakcalling software immunooncology coverage annotation-agnostic bioconductor derfinder

42 stars 10.03 score 78 scripts 6 dependents

r-spatial

sftime:Classes and Methods for Simple Feature Objects that Have a Time Column

Classes and methods for spatial objects that have a registered time column, in particular for irregular spatiotemporal data. The time column can be of any type, but needs to be ordinal. Regularly laid out spatiotemporal data (vector or raster data cubes) are handled by package 'stars'.

Maintained by Henning Teickner. Last updated 1 months ago.

49 stars 9.99 score 27 scripts 60 dependents

pecanproject

PEcAn.assim.batch:PEcAn Functions Used for Ecological Forecasts and Reanalysis

The Predictive Ecosystem Carbon Analyzer (PEcAn) is a scientific workflow management tool that is designed to simplify the management of model parameterization, execution, and analysis. The goal of PECAn is to streamline the interaction between data and models, and to improve the efficacy of scientific investigation.

Maintained by Istem Fer. Last updated 10 hours ago.

bayesian cyberinfrastructure data-assimilation data-science ecosystem-model ecosystem-science forecasting meta-analysis national-science-foundation pecan plants jags cpp

216 stars 9.97 score 20 scripts 2 dependents

darwin-eu

omopgenerics:Methods and Classes for the OMOP Common Data Model

Provides definitions of core classes and methods used by analytic pipelines that query the OMOP (Observational Medical Outcomes Partnership) common data model.

Maintained by Martí Català. Last updated 24 days ago.

9.97 score 193 scripts 16 dependents

ropensci

spatsoc:Group Animal Relocation Data by Spatial and Temporal Relationship

Detects spatial and temporal groups in GPS relocations (Robitaille et al. (2019) <doi:10.1111/2041-210X.13215>). It can be used to convert GPS relocations to gambit-of-the-group format to build proximity-based social networks In addition, the randomizations function provides data-stream randomization methods suitable for GPS data.

Maintained by Alec L. Robitaille. Last updated 2 months ago.

animal gps network social spatial

24 stars 9.97 score 145 scripts 3 dependents

bioc

goseq:Gene Ontology analyser for RNA-seq and other length biased data

Detects Gene Ontology and/or other user defined categories which are over/under represented in RNA-seq data.

Maintained by Federico Marini. Last updated 5 months ago.

immunooncology sequencing go geneexpression transcription rnaseq differentialexpression annotation genesetenrichment kegg pathways software

2 stars 9.97 score 636 scripts 9 dependents

darwin-eu

PatientProfiles:Identify Characteristics of Patients in the OMOP Common Data Model

Identify the characteristics of patients in data mapped to the Observational Medical Outcomes Partnership (OMOP) common data model.

Maintained by Marti Catala. Last updated 24 days ago.

1 stars 9.97 score 225 scripts 9 dependents

pecanproject

PEcAn.priors:PEcAn Functions Used to Estimate Priors from Data

Functions to estimate priors from data.

Maintained by David LeBauer. Last updated 10 hours ago.

bayesian cyberinfrastructure data-assimilation data-science ecosystem-model ecosystem-science forecasting meta-analysis national-science-foundation pecan plants jags cpp

216 stars 9.96 score 13 scripts 6 dependents

bioc

rGREAT:GREAT Analysis - Functional Enrichment on Genomic Regions

GREAT (Genomic Regions Enrichment of Annotations Tool) is a type of functional enrichment analysis directly performed on genomic regions. This package implements the GREAT algorithm (the local GREAT analysis), also it supports directly interacting with the GREAT web service (the online GREAT analysis). Both analysis can be viewed by a Shiny application. rGREAT by default supports more than 600 organisms and a large number of gene set collections, as well as self-provided gene sets and organisms from users. Additionally, it implements a general method for dealing with background regions.

Maintained by Zuguang Gu. Last updated 18 days ago.

genesetenrichment go pathways software sequencing wholegenome genomeannotation coverage cpp

86 stars 9.96 score 320 scripts 1 dependents

darwin-eu

CodelistGenerator:Identify Relevant Clinical Codes and Evaluate Their Use

Generate a candidate code list for the Observational Medical Outcomes Partnership (OMOP) common data model based on string matching. For a given search strategy, a candidate code list will be returned.

Maintained by Edward Burn. Last updated 4 days ago.

14 stars 9.94 score 165 scripts 4 dependents

pecanproject

PEcAn.MA:PEcAn Functions Used for Meta-Analysis

The Predictive Ecosystem Carbon Analyzer (PEcAn) is a scientific workflow management tool that is designed to simplify the management of model parameterization, execution, and analysis. The goal of PECAn is to streamline the interaction between data and models, and to improve the efficacy of scientific investigation. The PEcAn.MA package contains the functions used in the Bayesian meta-analysis of trait data.

Maintained by David LeBauer. Last updated 10 hours ago.

bayesian cyberinfrastructure data-assimilation data-science ecosystem-model ecosystem-science forecasting meta-analysis national-science-foundation pecan plants jags cpp

216 stars 9.92 score 7 scripts 7 dependents

bioc

RUVSeq:Remove Unwanted Variation from RNA-Seq Data

This package implements the remove unwanted variation (RUV) methods of Risso et al. (2014) for the normalization of RNA-Seq read counts between samples.

Maintained by Davide Risso. Last updated 5 months ago.

immunooncology differentialexpression preprocessing rnaseq software

13 stars 9.91 score 482 scripts 5 dependents

umr-amap

BIOMASS:Estimating Aboveground Biomass and Its Uncertainty in Tropical Forests

Contains functions for estimating above-ground biomass/carbon and its uncertainty in tropical forests. These functions allow to (1) retrieve and correct taxonomy, (2) estimate wood density and its uncertainty, (3) build height-diameter models, (4) manage tree and plot coordinates, (5) estimate above-ground biomass/carbon at stand level with associated uncertainty. To cite ‘BIOMASS’, please use citation(‘BIOMASS’). For more information, see Réjou-Méchain et al. (2017) <doi:10.1111/2041-210X.12753>.

Maintained by Dominique Lamonica. Last updated 10 days ago.

27 stars 9.91 score 68 scripts 1 dependents

bioc

OmnipathR:OmniPath web service client and more

A client for the OmniPath web service (https://www.omnipathdb.org) and many other resources. It also includes functions to transform and pretty print some of the downloaded data, functions to access a number of other resources such as BioPlex, ConsensusPathDB, EVEX, Gene Ontology, Guide to Pharmacology (IUPHAR/BPS), Harmonizome, HTRIdb, Human Phenotype Ontology, InWeb InBioMap, KEGG Pathway, Pathway Commons, Ramilowski et al. 2015, RegNetwork, ReMap, TF census, TRRUST and Vinayagam et al. 2011. Furthermore, OmnipathR features a close integration with the NicheNet method for ligand activity prediction from transcriptomics data, and its R implementation `nichenetr` (available only on github).

Maintained by Denes Turei. Last updated 1 months ago.

graphandnetwork network pathways software thirdpartyclient dataimport datarepresentation genesignaling generegulation systemsbiology transcriptomics singlecell annotation kegg complexes enzyme-ptm networks networks-biology omnipath proteins quarto

130 stars 9.90 score 226 scripts 2 dependents

bioc

methylumi:Handle Illumina methylation data

This package provides classes for holding and manipulating Illumina methylation data. Based on eSet, it can contain MIAME information, sample information, feature information, and multiple matrices of data. An "intelligent" import function, methylumiR can read the Illumina text files and create a MethyLumiSet. methylumIDAT can directly read raw IDAT files from HumanMethylation27 and HumanMethylation450 microarrays. Normalization, background correction, and quality control features for GoldenGate, Infinium, and Infinium HD arrays are also included.

Maintained by Sean Davis. Last updated 5 months ago.

dnamethylation twochannel preprocessing qualitycontrol cpgisland

9 stars 9.90 score 89 scripts 9 dependents

bioc

PureCN:Copy number calling and SNV classification using targeted short read sequencing

This package estimates tumor purity, copy number, and loss of heterozygosity (LOH), and classifies single nucleotide variants (SNVs) by somatic status and clonality. PureCN is designed for targeted short read sequencing data, integrates well with standard somatic variant detection and copy number pipelines, and has support for tumor samples without matching normal samples.

Maintained by Markus Riester. Last updated 20 hours ago.

copynumbervariation software sequencing variantannotation variantdetection coverage immunooncology bioconductor-package cell-free-dna copy-number loh tumor-heterogeneity tumor-mutational-burden tumor-purity

132 stars 9.88 score 40 scripts

bioc

GenVisR:Genomic Visualizations in R

Produce highly customizable publication quality graphics for genomic data primarily at the cohort level.

Maintained by Zachary Skidmore. Last updated 5 months ago.

infrastructure datarepresentation classification dnaseq

217 stars 9.87 score 76 scripts

r-spatial

leafpop:Include Tables, Images and Graphs in Leaflet Pop-Ups

Creates 'HTML' strings to embed tables, images or graphs in pop-ups of interactive maps created with packages like 'leaflet' or 'mapview'. Handles local images located on the file system or via remote URL. Handles graphs created with 'lattice' or 'ggplot2' as well as interactive plots created with 'htmlwidgets'.

Maintained by Tim Appelhans. Last updated 6 months ago.

114 stars 9.87 score 458 scripts 27 dependents

tslumley

mitools:Tools for Multiple Imputation of Missing Data

Tools to perform analyses and combine results from multiple-imputation datasets.

Maintained by Thomas Lumley. Last updated 6 years ago.

2 stars 9.83 score 716 scripts 249 dependents

thomasp85

transformr:Polygon and Path Transformations

In order to smoothly animate the transformation of polygons and paths, many aspects needs to be taken into account, such as differing number of control points, changing center of rotation, etc. The 'transformr' package provides an extensive framework for manipulating the shapes of polygons and paths and can be seen as the spatial brother to the 'tweenr' package.

Maintained by Thomas Lin Pedersen. Last updated 1 years ago.

animation data-visualization interpolation matching-shapes tweening cpp

116 stars 9.81 score 772 scripts 26 dependents

hafen

geofacet:'ggplot2' Faceting Utilities for Geographical Data

Provides geographical faceting functionality for 'ggplot2'. Geographical faceting arranges a sequence of plots of data for different geographical entities into a grid that preserves some of the geographical orientation.

Maintained by Ryan Hafen. Last updated 7 months ago.

geography ggplot2 visualization

339 stars 9.79 score 1.5k scripts 4 dependents

bioc

annotatr:Annotation of Genomic Regions to Genomic Annotations

Given a set of genomic sites/regions (e.g. ChIP-seq peaks, CpGs, differentially methylated CpGs or regions, SNPs, etc.) it is often of interest to investigate the intersecting genomic annotations. Such annotations include those relating to gene models (promoters, 5'UTRs, exons, introns, and 3'UTRs), CpGs (CpG islands, CpG shores, CpG shelves), or regulatory sequences such as enhancers. The annotatr package provides an easy way to summarize and visualize the intersection of genomic sites/regions with genomic annotations.

Maintained by Raymond G. Cavalcante. Last updated 5 months ago.

software annotation genomeannotation functionalgenomics visualization genome-annotation

26 stars 9.76 score 246 scripts 5 dependents

bioc

RTCGAToolbox:A new tool for exporting TCGA Firehose data

Managing data from large scale projects such as The Cancer Genome Atlas (TCGA) for further analysis is an important and time consuming step for research projects. Several efforts, such as Firehose project, make TCGA pre-processed data publicly available via web services and data portals but it requires managing, downloading and preparing the data for following steps. We developed an open source and extensible R based data client for Firehose pre-processed data and demonstrated its use with sample case studies. Results showed that RTCGAToolbox could improve data management for researchers who are interested with TCGA data. In addition, it can be integrated with other analysis pipelines for following data analysis.

Maintained by Marcel Ramos. Last updated 3 months ago.

differentialexpression geneexpression sequencing

18 stars 9.75 score 76 scripts 5 dependents

ohdsi

CohortConstructor:Build and Manipulate Study Cohorts Using a Common Data Model

Create and manipulate study cohorts in data mapped to the Observational Medical Outcomes Partnership Common Data Model.

Maintained by Edward Burn. Last updated 3 days ago.

2 stars 9.73 score 207 scripts 2 dependents

ropensci

osmextract:Download and Import Open Street Map Data Extracts

Match, download, convert and import Open Street Map data extracts obtained from several providers.

Maintained by Andrea Gilardi. Last updated 2 months ago.

geo geofabrik-zone open-data osm osm-pbf

173 stars 9.73 score 342 scripts

prestodb

RPresto:DBI Connector to Presto

Implements a 'DBI' compliant interface to Presto. Presto is an open source distributed SQL query engine for running interactive analytic queries against data sources of all sizes ranging from gigabytes to petabytes: <https://prestodb.io/>.

Maintained by Jarod G.R. Meng. Last updated 2 months ago.

132 stars 9.73 score 25 scripts 4 dependents

femiguez

apsimx:Inspect, Read, Edit and Run 'APSIM' "Next Generation" and 'APSIM' Classic

The functions in this package inspect, read, edit and run files for 'APSIM' "Next Generation" ('JSON') and 'APSIM' "Classic" ('XML'). The files with an 'apsim' extension correspond to 'APSIM' Classic (7.x) - Windows only - and the ones with an 'apsimx' extension correspond to 'APSIM' "Next Generation". For more information about 'APSIM' see (<https://www.apsim.info/>) and for 'APSIM' next generation (<https://apsimnextgeneration.netlify.app/>).

Maintained by Fernando Miguez. Last updated 12 days ago.

apsim apsimx

59 stars 9.72 score 68 scripts 2 dependents

rmaia

pavo:Perceptual Analysis, Visualization and Organization of Spectral Colour Data

A cohesive framework for the spectral and spatial analysis of colour described in Maia, Eliason, Bitton, Doucet & Shawkey (2013) <doi:10.1111/2041-210X.12069> and Maia, Gruson, Endler & White (2019) <doi:10.1111/2041-210X.13174>.

Maintained by Thomas White. Last updated 2 months ago.

72 stars 9.72 score 151 scripts 1 dependents

pecanproject

PEcAnRTM:PEcAn Functions Used for Radiative Transfer Modeling

Functions for performing forward runs and inversions of radiative transfer models (RTMs). Inversions can be performed using maximum likelihood, or more complex hierarchical Bayesian methods. Underlying numerical analyses are optimized for speed using Fortran code.

Maintained by Alexey Shiklomanov. Last updated 10 hours ago.

bayesian cyberinfrastructure data-assimilation data-science ecosystem-model ecosystem-science forecasting meta-analysis national-science-foundation pecan plants fortran jags cpp

216 stars 9.72 score 132 scripts

michaeldorman

nngeo:k-Nearest Neighbor Join for Spatial Data

K-nearest neighbor search for projected and non-projected 'sf' spatial layers. Nearest neighbor search uses (1) C code from 'GeographicLib' for lon-lat point layers, (2) function knn() from package 'nabor' for projected point layers, or (3) function st_distance() from package 'sf' for line or polygon layers. The package also includes several other utility functions for spatial analysis.

Maintained by Michael Dorman. Last updated 12 months ago.

81 stars 9.70 score 600 scripts 6 dependents

centerforassessment

SGP:Student Growth Percentiles & Percentile Growth Trajectories

An analytic framework for the calculation of norm- and criterion-referenced academic growth estimates using large scale, longitudinal education assessment data as developed in Betebenner (2009) <doi:10.1111/j.1745-3992.2009.00161.x>.

Maintained by Damian W. Betebenner. Last updated 13 days ago.

percentile-growth-projections quantile-regression sgp sgp-analyses student-growth-percentiles student-growth-projections

20 stars 9.69 score 88 scripts

appsilon

shiny.telemetry:'Shiny' App Usage Telemetry

Enables instrumentation of 'Shiny' apps for tracking user session events such as input changes, browser type, and session duration. These events can be sent to any of the available storage backends and analyzed using the included 'Shiny' app to gain insights about app usage and adoption.

Maintained by André Veríssimo. Last updated 4 months ago.

analytics rhinoverse shiny

67 stars 9.69 score 29 scripts

bioc

txdbmaker:Tools for making TxDb objects from genomic annotations

A set of tools for making TxDb objects from genomic annotations from various sources (e.g. UCSC, Ensembl, and GFF files). These tools allow the user to download the genomic locations of transcripts, exons, and CDS, for a given assembly, and to import them in a TxDb object. TxDb objects are implemented in the GenomicFeatures package, together with flexible methods for extracting the desired features in convenient formats.

Maintained by H. Pagès. Last updated 4 months ago.

infrastructure dataimport annotation genomeannotation genomeassembly genetics sequencing bioconductor-package core-package

3 stars 9.68 score 92 scripts 87 dependents

bioc

TCGAutils:TCGA utility functions for data management

A suite of helper functions for checking and manipulating TCGA data including data obtained from the curatedTCGAData experiment package. These functions aim to simplify and make working with TCGA data more manageable. Exported functions include those that import data from flat files into Bioconductor objects, convert row annotations, and identifier translation via the GDC API.

Maintained by Marcel Ramos. Last updated 4 months ago.

software workflowstep preprocessing dataimport bioconductor-package tcga u24ca289073 utilities

27 stars 9.66 score 210 scripts 10 dependents

plangfelder

WGCNA:Weighted Correlation Network Analysis

Functions necessary to perform Weighted Correlation Network Analysis on high-dimensional data as originally described in Horvath and Zhang (2005) <doi:10.2202/1544-6115.1128> and Langfelder and Horvath (2008) <doi:10.1186/1471-2105-9-559>. Includes functions for rudimentary data cleaning, construction of correlation networks, module identification, summarization, and relating of variables and modules to sample traits. Also includes a number of utility functions for data manipulation and visualization.

Maintained by Peter Langfelder. Last updated 6 months ago.

cpp

54 stars 9.65 score 5.3k scripts 32 dependents

vimc

orderly:Lightweight Reproducible Reporting

Order, create and store reports from R. By defining a lightweight interface around the inputs and outputs of an analysis, a lot of the repetitive work for reproducible research can be automated. We define a simple format for organising and describing work that facilitates collaborative reproducible research and acknowledges that all analyses are run multiple times over their lifespans.

Maintained by Rich FitzJohn. Last updated 2 years ago.

117 stars 9.63 score 94 scripts 4 dependents

bioc

pcaExplorer:Interactive Visualization of RNA-seq Data Using a Principal Components Approach

This package provides functionality for interactive visualization of RNA-seq datasets based on Principal Components Analysis. The methods provided allow for quick information extraction and effective data exploration. A Shiny application encapsulates the whole analysis.

Maintained by Federico Marini. Last updated 3 months ago.

immunooncology visualization rnaseq dimensionreduction principalcomponent qualitycontrol gui reportwriting shinyapps bioconductor principal-components reproducible-research rna-seq-analysis rna-seq-data shiny transcriptome user-friendly

56 stars 9.63 score 180 scripts

bioc

clusterExperiment:Compare Clusterings for Single-Cell Sequencing

Provides functionality for running and comparing many different clusterings of single-cell sequencing data or other large mRNA Expression data sets.

Maintained by Elizabeth Purdom. Last updated 5 months ago.

clustering rnaseq sequencing software singlecell cpp

38 stars 9.62 score 192 scripts 1 dependents

bioc

AnnotationForge:Tools for building SQLite-based annotation data packages

Provides code for generating Annotation packages and their databases. Packages produced are intended to be used with AnnotationDbi.

Maintained by Bioconductor Package Maintainer. Last updated 18 days ago.

annotation infrastructure bioconductor-package core-package

5 stars 9.62 score 143 scripts 19 dependents

bioc

cytomapper:Visualization of highly multiplexed imaging data in R

Highly multiplexed imaging acquires the single-cell expression of selected proteins in a spatially-resolved fashion. These measurements can be visualised across multiple length-scales. First, pixel-level intensities represent the spatial distributions of feature expression with highest resolution. Second, after segmentation, expression values or cell-level metadata (e.g. cell-type information) can be visualised on segmented cell areas. This package contains functions for the visualisation of multiplexed read-outs and cell-level information obtained by multiplexed imaging technologies. The main functions of this package allow 1. the visualisation of pixel-level information across multiple channels, 2. the display of cell-level information (expression and/or metadata) on segmentation masks and 3. gating and visualisation of single cells.

Maintained by Lasse Meyer. Last updated 5 months ago.

immunooncology software singlecell onechannel twochannel multiplecomparison normalization dataimport bioimaging imaging-mass-cytometry single-cell spatial-analysis

32 stars 9.61 score 354 scripts 5 dependents

ropensci

tidyhydat:Extract and Tidy Canadian 'Hydrometric' Data

Provides functions to access historical and real-time national 'hydrometric' data from Water Survey of Canada data sources (<https://dd.weather.gc.ca/hydrometric/csv/> and <https://collaboration.cmc.ec.gc.ca/cmc/hydrometrics/www/>) and then applies tidy data principles.

Maintained by Sam Albers. Last updated 20 days ago.

citz government-data hydrology hydrometrics tidy-data water-resources

71 stars 9.59 score 202 scripts 3 dependents

bioc

recount:Explore and download data from the recount project

Explore and download data from the recount project available at https://jhubiostatistics.shinyapps.io/recount/. Using the recount package you can download RangedSummarizedExperiment objects at the gene, exon or exon-exon junctions level, the raw counts, the phenotype metadata used, the urls to the sample coverage bigWig files or the mean coverage bigWig file for a particular study. The RangedSummarizedExperiment objects can be used by different packages for performing differential expression analysis. Using http://bioconductor.org/packages/derfinder you can perform annotation-agnostic differential expression analyses with the data from the recount project as described at http://www.nature.com/nbt/journal/v35/n4/full/nbt.3838.html.

Maintained by Leonardo Collado-Torres. Last updated 4 months ago.

coverage differentialexpression geneexpression rnaseq sequencing software dataimport immunooncology annotation-agnostic bioconductor count derfinder deseq2 exon gene human illumina junction recount

41 stars 9.57 score 498 scripts 3 dependents

jeffreyevans

spatialEco:Spatial Analysis and Modelling Utilities

Utilities to support spatial data manipulation, query, sampling and modelling in ecological applications. Functions include models for species population density, spatial smoothing, multivariate separability, point process model for creating pseudo- absences and sub-sampling, Quadrant-based sampling and analysis, auto-logistic modeling, sampling models, cluster optimization, statistical exploratory tools and raster-based metrics.

Maintained by Jeffrey S. Evans. Last updated 28 days ago.

biodiversity conservation ecology r-spatial raster spatial vector

110 stars 9.55 score 736 scripts 2 dependents

mstrimas

smoothr:Smooth and Tidy Spatial Features

Tools for smoothing and tidying spatial features (i.e. lines and polygons) to make them more aesthetically pleasing. Smooth curves, fill holes, and remove small fragments from lines and polygons.

Maintained by Matthew Strimas-Mackey. Last updated 2 years ago.

gis spatial

100 stars 9.53 score 440 scripts 9 dependents

s-u

RJDBC:Provides Access to Databases Through the JDBC Interface

The RJDBC package is an implementation of R's DBI interface using JDBC as a back-end. This allows R to connect to any DBMS that has a JDBC driver.

Maintained by Simon Urbanek. Last updated 2 years ago.

openjdk

52 stars 9.52 score 1.1k scripts 6 dependents

e-sensing

sits:Satellite Image Time Series Analysis for Earth Observation Data Cubes

An end-to-end toolkit for land use and land cover classification using big Earth observation data, based on machine learning methods applied to satellite image data cubes, as described in Simoes et al (2021) <doi:10.3390/rs13132428>. Builds regular data cubes from collections in AWS, Microsoft Planetary Computer, Brazil Data Cube, Copernicus Data Space Environment (CDSE), Digital Earth Africa, Digital Earth Australia, NASA HLS using the Spatio-temporal Asset Catalog (STAC) protocol (<https://stacspec.org/>) and the 'gdalcubes' R package developed by Appel and Pebesma (2019) <doi:10.3390/data4030092>. Supports visualization methods for images and time series and smoothing filters for dealing with noisy time series. Includes functions for quality assessment of training samples using self-organized maps as presented by Santos et al (2021) <doi:10.1016/j.isprsjprs.2021.04.014>. Includes methods to reduce training samples imbalance proposed by Chawla et al (2002) <doi:10.1613/jair.953>. Provides machine learning methods including support vector machines, random forests, extreme gradient boosting, multi-layer perceptrons, temporal convolutional neural networks proposed by Pelletier et al (2019) <doi:10.3390/rs11050523>, and temporal attention encoders by Garnot and Landrieu (2020) <doi:10.48550/arXiv.2007.00586>. Supports GPU processing of deep learning models using torch <https://torch.mlverse.org/>. Performs efficient classification of big Earth observation data cubes and includes functions for post-classification smoothing based on Bayesian inference as described by Camara et al (2024) <doi:10.3390/rs16234572>, and methods for active learning and uncertainty assessment. Supports region-based time series analysis using package supercells <https://jakubnowosad.com/supercells/>. Enables best practices for estimating area and assessing accuracy of land change as recommended by Olofsson et al (2014) <doi:10.1016/j.rse.2014.02.015>. Minimum recommended requirements: 16 GB RAM and 4 CPU dual-core.

Maintained by Gilberto Camara. Last updated 2 months ago.

big-earth-data cbers earth-observation eo-datacubes geospatial image-time-series land-cover-classification landsat planetary-computer r-spatial remote-sensing rspatial satellite-image-time-series satellite-imagery sentinel-2 stac-api stac-catalog cpp

494 stars 9.50 score 384 scripts

john-d-fox

Rcmdr:R Commander

A platform-independent basic-statistics GUI (graphical user interface) for R, based on the tcltk package.

Maintained by John Fox. Last updated 5 months ago.

4 stars 9.48 score 636 scripts 38 dependents

rqtl

qtl2:Quantitative Trait Locus Mapping in Experimental Crosses

Provides a set of tools to perform quantitative trait locus (QTL) analysis in experimental crosses. It is a reimplementation of the 'R/qtl' package to better handle high-dimensional data and complex cross designs. Broman et al. (2019) <doi:10.1534/genetics.118.301595>.

Maintained by Karl W Broman. Last updated 23 days ago.

cpp

34 stars 9.48 score 1.1k scripts 5 dependents

riatelab

maptiles:Download and Display Map Tiles

To create maps from tiles, 'maptiles' downloads, composes and displays tiles from a large number of providers (e.g. 'OpenStreetMap', 'Stadia', 'Esri', 'CARTO', or 'Thunderforest').

Maintained by Timothée Giraud. Last updated 2 months ago.

map rspatial tiles

109 stars 9.45 score 199 scripts 17 dependents

bioc

SpatialFeatureExperiment:Integrating SpatialExperiment with Simple Features in sf

A new S4 class integrating Simple Features with the R package sf to bring geospatial data analysis methods based on vector data to spatial transcriptomics. Also implements management of spatial neighborhood graphs and geometric operations. This pakage builds upon SpatialExperiment and SingleCellExperiment, hence methods for these parent classes can still be used.

Maintained by Lambda Moses. Last updated 2 months ago.

datarepresentation transcriptomics spatial

49 stars 9.40 score 322 scripts 1 dependents

usepa

tcpl:ToxCast Data Analysis Pipeline

The ToxCast Data Analysis Pipeline ('tcpl') is an R package that manages, curve-fits, plots, and stores ToxCast data to populate its linked MySQL database, 'invitrodb'. The package was developed for the chemical screening data curated by the US EPA's Toxicity Forecaster (ToxCast) program, but 'tcpl' can be used to support diverse chemical screening efforts.

Maintained by Jason Brown. Last updated 12 days ago.

ccte comptox ord

36 stars 9.39 score 90 scripts

r-barnes

dggridR:Discrete Global Grids

Spatial analyses involving binning require that every bin have the same area, but this is impossible using a rectangular grid laid over the Earth or over any projection of the Earth. Discrete global grids use hexagons, triangles, and diamonds to overcome this issue, overlaying the Earth with equally-sized bins. This package provides utilities for working with discrete global grids, along with utilities to aid in plotting such data.

Maintained by Sebastian Krantz. Last updated 6 months ago.

discrete-global-grids geospatial spatial-analysis cpp

168 stars 9.37 score 388 scripts 1 dependents

pecanproject

PEcAn.data.land:PEcAn Functions Used for Ecological Forecasts and Reanalysis

The Predictive Ecosystem Carbon Analyzer (PEcAn) is a scientific workflow management tool that is designed to simplify the management of model parameterization, execution, and analysis. The goal of PECAn is to streamline the interaction between data and models, and to improve the efficacy of scientific investigation.

Maintained by Mike Dietze. Last updated 10 hours ago.

bayesian cyberinfrastructure data-assimilation data-science ecosystem-model ecosystem-science forecasting meta-analysis national-science-foundation pecan plants jags cpp

216 stars 9.35 score 19 scripts 10 dependents

lindbrook

cholera:Amend, Augment and Aid Analysis of John Snow's Cholera Map

Amends errors, augments data and aids analysis of John Snow's map of the 1854 London cholera outbreak.

Maintained by lindbrook. Last updated 4 days ago.

cholera data-visualization datasets epidemiology john-snow public-health triangulation-delaunay voronoi voronoi-polygons

134 stars 9.34 score 95 scripts

ipeagit

gtfstools:General Transit Feed Specification (GTFS) Editing and Analysing Tools

Utility functions to read, manipulate, analyse and write transit feeds in the General Transit Feed Specification (GTFS) data format.

Maintained by Daniel Herszenhut. Last updated 2 months ago.

gtfs public-transport publictransport cpp

40 stars 9.31 score 126 scripts 3 dependents

bioc

GenomicInteractions:Utilities for handling genomic interaction data

Utilities for handling genomic interaction data such as ChIA-PET or Hi-C, annotating genomic features with interaction information, and producing plots and summary statistics.

Maintained by Liz Ing-Simmons. Last updated 5 months ago.

software infrastructure dataimport datarepresentation hic

7 stars 9.31 score 162 scripts 5 dependents

ohdsi

Andromeda:Asynchronous Disk-Based Representation of Massive Data

Storing very large data objects on a local drive, while still making it possible to manipulate the data in an efficient manner.

Maintained by Martijn Schuemie. Last updated 7 months ago.

hades

11 stars 9.29 score 57 scripts 8 dependents

bioc

EWCE:Expression Weighted Celltype Enrichment

Used to determine which cell types are enriched within gene lists. The package provides tools for testing enrichments within simple gene lists (such as human disease associated genes) and those resulting from differential expression studies. The package does not depend upon any particular Single Cell Transcriptome dataset and user defined datasets can be loaded in and used in the analyses.

Maintained by Alan Murphy. Last updated 2 months ago.

geneexpression transcription differentialexpression genesetenrichment genetics microarray mrnamicroarray onechannel rnaseq biomedicalinformatics proteomics visualization functionalgenomics singlecell deconvolution single-cell single-cell-rna-seq transcriptomics

56 stars 9.29 score 99 scripts

bioc

CNEr:CNE Detection and Visualization

Large-scale identification and advanced visualization of sets of conserved noncoding elements.

Maintained by Ge Tan. Last updated 5 months ago.

generegulation visualization dataimport

3 stars 9.28 score 35 scripts 19 dependents

bioc

IsoformSwitchAnalyzeR:Identify, Annotate and Visualize Isoform Switches with Functional Consequences from both short- and long-read RNA-seq data

Analysis of alternative splicing and isoform switches with predicted functional consequences (e.g. gain/loss of protein domains etc.) from quantification of all types of RNASeq by tools such as Kallisto, Salmon, StringTie, Cufflinks/Cuffdiff etc.

Maintained by Kristoffer Vitting-Seerup. Last updated 5 months ago.

geneexpression transcription alternativesplicing differentialexpression differentialsplicing visualization statisticalmethod transcriptomevariant biomedicalinformatics functionalgenomics systemsbiology transcriptomics rnaseq annotation functionalprediction geneprediction dataimport multiplecomparison batcheffect immunooncology

108 stars 9.26 score 125 scripts

mapme-initiative

mapme.biodiversity:Efficient Monitoring of Global Biodiversity Portfolios

Biodiversity areas, especially primary forest, serve a multitude of functions for local economy, regional functionality of the ecosystems as well as the global health of our planet. Recently, adverse changes in human land use practices and climatic responses to increased greenhouse gas emissions, put these biodiversity areas under a variety of different threats. The present package helps to analyse a number of biodiversity indicators based on freely available geographical datasets. It supports computational efficient routines that allow the analysis of potentially global biodiversity portfolios. The primary use case of the package is to support evidence based reporting of an organization's effort to protect biodiversity areas under threat and to identify regions were intervention is most duly needed.

Maintained by Darius A. Görgen. Last updated 3 days ago.

environment eo gis mapme spatial sustainability

35 stars 9.24 score 287 scripts

ropensci

stats19:Work with Open Road Traffic Casualty Data from Great Britain

Tools to help download, process and analyse the UK road collision data collected using the 'STATS19' form. The datasets are provided as 'CSV' files with detailed road safety information about the circumstances of car crashes and other incidents on the roads resulting in casualties in Great Britain from 1979 to present. Tables are available on 'colissions' with the circumstances (e.g. speed limit of road), information about 'vehicles' involved (e.g. type of vehicle), and 'casualties' (e.g. age). The statistics relate only to events on public roads that were reported to the police, and subsequently recorded, using the 'STATS19' collision reporting form. See the Department for Transport website <https://www.data.gov.uk/dataset/cb7ae6f0-4be6-4935-9277-47e5ce24a11f/road-accidents-safety-data> for more information on these datasets. The package is described in a paper in the Journal of Open Source Software (Lovelace et al. 2019) <doi:10.21105/joss.01181>. See Gilardi et al. (2022) <doi:10.1111/rssa.12823>, Vidal-Tortosa et al. (2021) <doi:10.1016/j.jth.2021.101291>, and Tait et al. (2023) <doi:10.1016/j.aap.2022.106895> for examples of how the data can be used for methodological and empirical road safety research.

Maintained by Robin Lovelace. Last updated 3 months ago.

stats19 road-safety transport car-crashes ropensci data

64 stars 9.20 score 193 scripts