Showing 200 of 2168 results
ropensci
CoordinateCleaner:Automated Cleaning of Occurrence Records from Biological Collections
Automated flagging of common spatial and temporal errors in biological and paleontological collection data, for use in conservation, ecology and paleontology. Includes automated tests to easily flag (and exclude) records assigned to country or province centroids, the open ocean, the headquarters of the Global Biodiversity Information Facility, urban areas or the locations of biodiversity institutions (museums, zoos, botanical gardens, universities). It also identifies per-species outlier coordinates, zero coordinates, identical latitude/longitude and invalid coordinates, and implements an algorithm to identify data sets with a significant proportion of rounded coordinates. Especially suited for large data sets. The reference for the methodology is: Zizka et al. (2019) <doi:10.1111/2041-210X.13152>.
Maintained by Alexander Zizka. Last updated 1 year ago.
40.9 match 82 stars 10.93 score 306 scripts 3 dependents
bioc
OmnipathR:OmniPath web service client and more
A client for the OmniPath web service (https://www.omnipathdb.org) and many other resources. It also includes functions to transform and pretty print some of the downloaded data, functions to access a number of other resources such as BioPlex, ConsensusPathDB, EVEX, Gene Ontology, Guide to Pharmacology (IUPHAR/BPS), Harmonizome, HTRIdb, Human Phenotype Ontology, InWeb InBioMap, KEGG Pathway, Pathway Commons, Ramilowski et al. 2015, RegNetwork, ReMap, TF census, TRRUST and Vinayagam et al. 2011. Furthermore, OmnipathR features a close integration with the NicheNet method for ligand activity prediction from transcriptomics data, and its R implementation `nichenetr` (available only on github).
Maintained by Denes Turei. Last updated 20 days ago.
graphandnetwork network pathways software thirdpartyclient dataimport datarepresentation genesignaling generegulation systemsbiology transcriptomics singlecell annotation kegg complexes enzyme-ptm networks networks-biology omnipath proteins quarto
31.1 match 126 stars 9.90 score 226 scripts 2 dependents
bioc
gDRutils:A package with helper functions for processing drug response data
This package contains utility functions used throughout the gDR platform to fit data, manipulate data, and convert and validate data structures. This package also has the necessary default constants for gDR platform. Many of the functions are utilized by the gDRcore package.
Maintained by Arkadiusz Gladki. Last updated 6 days ago.
40.0 match 2 stars 7.40 score 3 scripts 3 dependents
ropensci
lingtypology:Linguistic Typology and Mapping
Provides R with the Glottolog database <https://glottolog.org/> and additional tools for linguistic mapping. The Glottolog database contains a catalogue of the world's languages. This package helps researchers make linguistic maps following the philosophy of the Cross-Linguistic Linked Data project <https://clld.org/>, which facilitates uniform access to the data across publications. A tutorial for this package is available on GitHub pages <https://docs.ropensci.org/lingtypology/> and in the package vignette. Maps created by this package can be used for both research and linguistic teaching. In addition, the package can download data from typological databases such as WALS, AUTOTYP and some others, and can create your own database website.
Maintained by George Moroz. Last updated 5 months ago.
abvd afbo atlas autotype bivaltyp clld glottolog-database linguistic-maps linguistics phoible sails typology wals
30.7 match 51 stars 9.58 score 694 scripts
bioc
BridgeDbR:Code for using BridgeDb identifier mapping framework from within R
Use BridgeDb functions and load identifier mapping databases in R. When you use this package to download identifier mapping files, it retrieves them from GitHub, Zenodo, and Figshare.
Maintained by Egon Willighagen. Last updated 4 months ago.
software annotation metabolomics cheminformatics bioconductor-package bridgedb genes identifiers life-sciences metabolites proteins openjdk
36.5 match 4 stars 6.97 score 43 scripts
ropensci
webchem:Chemical Information from the Web
Chemical information from around the web. This package interacts with a suite of web services for chemical information. Sources include: Alan Wood's Compendium of Pesticide Common Names, Chemical Identifier Resolver, ChEBI, Chemical Translation Service, ChemSpider, ETOX, Flavornet, NIST Chemistry WebBook, OPSIN, PubChem, SRS, Wikidata.
Maintained by Tamás Stirling. Last updated 3 months ago.
cas-number chemical-information chemspider identifier ropensci webscraping
23.2 match 165 stars 10.31 score 173 scripts 10 dependents
ropensci
taxize:Taxonomic Information from Around the Web
Interacts with a suite of web application programming interfaces (API) for taxonomic tasks, such as getting database specific taxonomic identifiers, verifying species names, getting taxonomic hierarchies, fetching downstream and upstream taxonomic names, getting taxonomic synonyms, converting scientific to common names and vice versa, and more. Some of the services supported include 'NCBI E-utilities' (<https://www.ncbi.nlm.nih.gov/books/NBK25501/>), 'Encyclopedia of Life' (<https://eol.org/docs/what-is-eol/data-services>), 'Global Biodiversity Information Facility' (<https://techdocs.gbif.org/en/openapi/>), and many more. Links to the API documentation for other supported services are available in the documentation for their respective functions in this package.
Maintained by Zachary Foster. Last updated 13 days ago.
taxonomy biology nomenclature json api web api-client identifiers species names api-wrapper biodiversity darwincore data taxize
15.6 match 274 stars 13.63 score 1.6k scripts 23 dependents
edjnet
tidywikidatar:Explore 'Wikidata' Through Tidy Data Frames
Query 'Wikidata' API <https://www.wikidata.org/wiki/Wikidata:Main_Page> with ease, get tidy data frames in response, and cache data in a local database.
Maintained by Giorgio Comai. Last updated 8 months ago.
25.1 match 26 stars 7.76 score 46 scripts 2 dependents
jl5000
tidyged:Handle GEDCOM Files Using Tidyverse Principles
Create and summarise family tree GEDCOM files using tidy dataframes.
Maintained by Jamie Lendrum. Last updated 3 years ago.
31.4 match 8 stars 5.96 score 23 scripts 3 dependents
dataoneorg
dataone:R Interface to the DataONE REST API
Provides read and write access to data and metadata from the DataONE network <https://www.dataone.org> of data repositories. Each DataONE repository implements a consistent repository application programming interface. Users call methods in R to access these remote repository functions, such as methods to query the metadata catalog, get access to metadata for particular data packages, and read the data objects from the data repository. Users can also insert and update data objects on repositories that support these methods.
Maintained by Matthew B. Jones. Last updated 3 years ago.
18.7 match 36 stars 9.93 score 472 scripts 3 dependents
bioc
biomaRt:Interface to BioMart databases (i.e. Ensembl)
In recent years a wealth of biological data has become available in public data repositories. Easy access to these valuable data resources and firm integration with data analysis is needed for comprehensive bioinformatics data analysis. biomaRt provides an interface to a growing collection of databases implementing the BioMart software suite (<http://www.biomart.org>). The package enables retrieval of large amounts of data in a uniform way without the need to know the underlying database schemas or write complex SQL queries. The most prominent examples of BioMart databases are maintained by Ensembl, which provides biomaRt users direct access to a diverse set of data and enables a wide range of powerful online queries from gene annotation to database mining.
Maintained by Mike Smith. Last updated 3 days ago.
annotation bioconductor biomart ensembl
11.4 match 38 stars 15.99 score 13k scripts 230 dependents
bioc
goProfiles:goProfiles: an R package for the statistical analysis of functional profiles
The package implements methods to compare lists of genes based on comparing the corresponding 'functional profiles'.
Maintained by Alex Sanchez. Last updated 5 months ago.
annotation go geneexpression genesetenrichment graphandnetwork microarray multiplecomparison pathways software
31.0 match 5.48 score 6 scripts 1 dependents
bioc
ELMER:Inferring Regulatory Element Landscapes and Transcription Factor Networks Using Cancer Methylomes
ELMER is designed to use DNA methylation and gene expression data from a large number of samples to infer regulatory element landscapes and transcription factor networks in primary tissue.
Maintained by Tiago Chedraoui Silva. Last updated 5 months ago.
dnamethylation geneexpression motifannotation software generegulation transcription network
21.0 match 7.42 score 176 scripts
olink-proteomics
OlinkAnalyze:Facilitate Analysis of Proteomic Data from Olink
A collection of functions to facilitate analysis of proteomic data from Olink, primarily NPX data that has been exported from Olink Software. The functions also work on QUANT data from Olink by log-transforming the QUANT data. The functions are focused on reading data, facilitating data wrangling and quality control analysis, performing statistical analysis and generating figures to visualize the results of the statistical analysis. The goal of this package is to help users extract biological insights from proteomic data run on the Olink platform.
Maintained by Kathleen Nevola. Last updated 21 days ago.
olink proteomics proteomics-data-analysis
16.0 match 104 stars 9.72 score 61 scripts
bioc
DTA:Dynamic Transcriptome Analysis
Dynamic Transcriptome Analysis (DTA) can monitor the cellular response to perturbations with higher sensitivity and temporal resolution than standard transcriptomics. The package implements the underlying kinetic modeling approach capable of the precise determination of synthesis- and decay rates from individual microarray or RNAseq measurements.
Maintained by Bjoern Schwalb. Last updated 5 months ago.
microarray differentialexpression geneexpression transcription
31.6 match 4.78 score 5 scripts 1 dependents
reside-ic
ids:Generate Random Identifiers
Generate random or human readable and pronounceable identifiers.
Maintained by Rich FitzJohn. Last updated 3 years ago.
11.3 match 94 stars 13.27 score 175 scripts 165 dependents
ramiromagno
quincunx:REST API Client for the 'PGS' Catalog
Programmatic access to the 'PGS' Catalog. This package provides easy access to 'PGS' Catalog data by accessing the REST API <https://www.pgscatalog.org/rest/>.
Maintained by Ramiro Magno. Last updated 3 years ago.
ebi gwas polygenic-risk-scores polygenic-scores
47.5 match 14 stars 3.10 score 18 scripts
kbroman
qtl:Tools for Analyzing QTL Experiments
Analysis of experimental crosses to identify genes (called quantitative trait loci, QTLs) contributing to variation in quantitative traits. Broman et al. (2003) <doi:10.1093/bioinformatics/btg112>.
Maintained by Karl W Broman. Last updated 7 months ago.
11.4 match 80 stars 12.79 score 2.4k scripts 29 dependents
microsoft
wpa:Tools for Analysing and Visualising Viva Insights Data
Opinionated functions that enable easier and faster analysis of Viva Insights data. There are three main types of functions in 'wpa': (1) Standard functions create a 'ggplot' visual or a summary table based on a specific Viva Insights metric; (2) Report Generation functions generate HTML reports on a specific analysis area, e.g. Collaboration; (3) Other miscellaneous functions cover more specific applications (e.g. Subject Line text mining) of Viva Insights data. This package adheres to 'tidyverse' principles and works well with the pipe syntax. 'wpa' is built with beginner-to-intermediate R users in mind, and is optimised for simplicity.
Maintained by Martin Chan. Last updated 4 months ago.
21.5 match 30 stars 6.69 score 39 scripts 1 dependents
lucaweihs
SEMID:Identifiability of Linear Structural Equation Models
Provides routines to check identifiability or non-identifiability of linear structural equation models as described in Drton, Foygel, and Sullivant (2011) <doi:10.1214/10-AOS859>, Foygel, Draisma, and Drton (2012) <doi:10.1214/12-AOS1012>, and other works. The routines are based on the graphical representation of structural equation models.
Maintained by Nils Sturma. Last updated 2 years ago.
35.1 match 4 stars 4.06 score 29 scripts
jiang-junyao
CACIMAR:cross-species analysis of cell identities, markers and regulations
A toolkit to perform cross-species analysis based on scRNA-seq data. CACIMAR contains 5 main features: (1) identify markers in each cluster; (2) annotate cell types; (3) identify conserved markers; (4) identify conserved cell types; (5) identify conserved modules of regulatory networks.
Maintained by Junyao Jiang. Last updated 3 months ago.
cross-species-analysis scrna-seq
25.5 match 12 stars 5.26 score 6 scripts
bioc
xcms:LC-MS and GC-MS Data Analysis
Framework for processing and visualization of chromatographically separated and single-spectra mass spectral data. Imports from AIA/ANDI NetCDF, mzXML, mzData and mzML files. Preprocesses data for high-throughput, untargeted analyte profiling.
Maintained by Steffen Neumann. Last updated 4 days ago.
immunooncology massspectrometry metabolomics bioconductor feature-detection mass-spectrometry peak-detection cpp
9.1 match 196 stars 14.31 score 984 scripts 11 dependents
darwin-eu
omopgenerics:Methods and Classes for the OMOP Common Data Model
Provides definitions of core classes and methods used by analytic pipelines that query the OMOP (Observational Medical Outcomes Partnership) common data model.
Maintained by Martí Català. Last updated 11 days ago.
12.7 match 9.97 score 193 scripts 16 dependents
bioc
MsQuality:MsQuality - Quality metric calculation from Spectra and MsExperiment objects
MsQuality provides functionality to calculate quality metrics for mass spectrometry-derived spectral data at the per-sample level. MsQuality relies on the mzQC framework of quality metrics defined by the Human Proteome Organization-Proteomics Standards Initiative (HUPO-PSI). These metrics quantify the quality of spectral raw files using a controlled vocabulary. The package is aimed especially at users who acquire mass spectrometry data on a large scale (e.g. data sets from clinical settings consisting of several thousands of samples). The MsQuality package allows the calculation of low-level quality metrics that require minimum information on mass spectrometry data: retention time, m/z values, and associated intensities. MsQuality relies on the Spectra package, or alternatively the MsExperiment package, and its infrastructure to store spectral data.
Maintained by Thomas Naake. Last updated 2 months ago.
metabolomics proteomics massspectrometry qualitycontrol mass-spectrometry qc
22.5 match 7 stars 5.45 score 2 scripts
bioc
singleCellTK:Comprehensive and Interactive Analysis of Single Cell RNA-Seq Data
The Single Cell Toolkit (SCTK) in the singleCellTK package provides an interface to popular tools for importing, quality control, analysis, and visualization of single cell RNA-seq data. SCTK allows users to seamlessly integrate tools from various packages at different stages of the analysis workflow. A general "a la carte" workflow gives users access to multiple methods for data importing, calculation of general QC metrics, doublet detection, ambient RNA estimation and removal, filtering, normalization, batch correction or integration, dimensionality reduction, 2-D embedding, clustering, marker detection, differential expression, cell type labeling, pathway analysis, and data exporting. Curated workflows can be used to run Seurat and Celda. Streamlined quality control can be performed on the command line using the SCTK-QC pipeline. Users can analyze their data using commands in the R console or by using an interactive Shiny Graphical User Interface (GUI). Specific analyses or entire workflows can be summarized and shared with comprehensive HTML reports generated by Rmarkdown. Additional documentation and vignettes can be found at camplab.net/sctk.
Maintained by Joshua David Campbell. Last updated 25 days ago.
singlecell geneexpression differentialexpression alignment clustering immunooncology batcheffect normalization qualitycontrol dataimport gui
11.9 match 181 stars 10.16 score 252 scripts
bioc
decontam:Identify Contaminants in Marker-gene and Metagenomics Sequencing Data
Simple statistical identification of contaminating sequence features in marker-gene or metagenomics data. Works on any kind of feature derived from environmental sequencing data (e.g. ASVs, OTUs, taxonomic groups, MAGs,...). Requires DNA quantitation data or sequenced negative control samples.
Maintained by Benjamin Callahan. Last updated 5 months ago.
immunooncology microbiome sequencing classification metagenomics amplicon bioinformatics contamination metabarcoding
10.6 match 153 stars 11.42 score 524 scripts 6 dependents
microsoft
vivainsights:Analyze and Visualize Data from 'Microsoft Viva Insights'
Provides a versatile range of functions, including exploratory data analysis, time-series analysis, organizational network analysis, and data validation, while implementing a set of best practices for analyzing and visualizing data specific to 'Microsoft Viva Insights'.
Maintained by Martin Chan. Last updated 25 days ago.
19.7 match 11 stars 6.12 score 68 scripts
openbiox
UCSCXenaShiny:Interactive Analysis of UCSC Xena Data
Provides functions and a Shiny application for downloading, analyzing and visualizing datasets from UCSC Xena (<http://xena.ucsc.edu/>), which is a collection of UCSC-hosted public databases such as TCGA, ICGC, TARGET, GTEx, CCLE, and others.
Maintained by Shixiang Wang. Last updated 4 months ago.
cancer-dataset shiny-apps ucsc-xena
14.0 match 96 stars 8.54 score 35 scripts
bioc
rhdf5:R Interface to HDF5
This package provides an interface between HDF5 and R. HDF5's main features are the ability to store and access very large and/or complex datasets and a wide variety of metadata on mass storage (disk) through a completely portable file format. The rhdf5 package is thus suited for the exchange of large and/or complex datasets between R and other software packages, and for letting R applications work on datasets that are larger than the available RAM.
Maintained by Mike Smith. Last updated 2 months ago.
infrastructure dataimport hdf5 rhdf5 openssl curl zlib cpp
7.2 match 62 stars 15.93 score 4.2k scripts 232 dependents
r-forge
FME:A Flexible Modelling Environment for Inverse Modelling, Sensitivity, Identifiability and Monte Carlo Analysis
Provides functions to help in fitting models to data, and to perform Monte Carlo, sensitivity and identifiability analysis. It is intended to work with models written as a set of differential equations that are solved either by an integration routine from package 'deSolve', or a steady-state solver from package 'rootSolve'. However, the methods can also be used with other types of functions.
Maintained by Karline Soetaert. Last updated 2 years ago.
13.3 match 8.62 score 382 scripts 9 dependents
sciviews
pastecs:Package for Analysis of Space-Time Ecological Series
Regularisation, decomposition and analysis of space-time series. The pastecs R package is a PNEC-Art4 and IFREMER (Benoit Beliaeff <Benoit.Beliaeff@ifremer.fr>) initiative to bring PASSTEC 2000 functionalities to R.
Maintained by Philippe Grosjean. Last updated 1 year ago.
11.0 match 4 stars 10.34 score 2.1k scripts 13 dependents
charlie86
spotifyr:R Wrapper for the 'Spotify' Web API
An R wrapper for pulling data from the 'Spotify' Web API <https://developer.spotify.com/documentation/web-api/> in bulk, or post items on a 'Spotify' user's playlist.
Maintained by Daniel Antal. Last updated 5 months ago.
music-information-retrieval spotify
13.3 match 374 stars 8.54 score 936 scripts
deepayan
lattice:Trellis Graphics for R
A powerful and elegant high-level data visualization system inspired by Trellis graphics, with an emphasis on multivariate data. Lattice is sufficient for typical graphics needs, and is also flexible enough to handle most nonstandard requirements. See ?Lattice for an introduction.
Maintained by Deepayan Sarkar. Last updated 11 months ago.
6.3 match 68 stars 17.33 score 27k scripts 13k dependents
pmartr
pmartR:Panomics Marketplace - Quality Control and Statistical Analysis for Panomics Data
Provides functionality for quality control processing and statistical analysis of mass spectrometry (MS) omics data, in particular proteomic (either at the peptide or the protein level), lipidomic, and metabolomic data, as well as RNA-seq based count data and nuclear magnetic resonance (NMR) data. This includes data transformation, specification of groups that are to be compared against each other, filtering of features and/or samples, data normalization, data summarization (correlation, PCA), and statistical comparisons between defined groups. Implements methods described in: Webb-Robertson et al. (2014) <doi:10.1074/mcp.M113.030932>. Webb-Robertson et al. (2011) <doi:10.1002/pmic.201100078>. Matzke et al. (2011) <doi:10.1093/bioinformatics/btr479>. Matzke et al. (2013) <doi:10.1002/pmic.201200269>. Polpitiya et al. (2008) <doi:10.1093/bioinformatics/btn217>. Webb-Robertson et al. (2010) <doi:10.1021/pr1005247>.
Maintained by Lisa Bramer. Last updated 4 days ago.
data-summarization lipids mass-spectrometry metabolites metabolomics-data peptides proteins rna-seq-analysis openblas cpp
13.9 match 40 stars 7.69 score 144 scripts
nutriverse
nipnTK:National Information Platforms for Nutrition Anthropometric Data Toolkit
An implementation of the National Information Platforms for Nutrition or NiPN's analytic methods for assessing quality of anthropometric datasets that include measurements of weight, height or length, middle upper arm circumference, sex and age. The focus is on anthropometric status but many of the presented methods could be applied to other variables.
Maintained by Ernest Guevarra. Last updated 5 months ago.
anthropometry data-quality nipn nutrition
17.5 match 5 stars 5.92 score 28 scripts 1 dependents
sjmack
HLAtools:Toolkit for HLA Immunogenomics
A toolkit for the analysis and management of data for genes in the so-called "Human Leukocyte Antigen" (HLA) region. Functions extract reference data from the Anthony Nolan HLA Informatics Group/ImmunoGeneTics HLA 'GitHub' repository (ANHIG/IMGTHLA) <https://github.com/ANHIG/IMGTHLA>, validate Genotype List (GL) Strings, convert between UNIFORMAT and GL String Code (GLSC) formats, translate HLA alleles and GLSCs across ImmunoPolymorphism Database (IPD) IMGT/HLA Database release versions, identify differences between pairs of alleles at a locus, generate customized, multi-position sequence alignments, trim and convert allele-names across nomenclature epochs, and extend existing data-analysis methods.
Maintained by Steven Mack. Last updated 14 days ago.
16.6 match 4 stars 6.21 score 7 scripts 1 dependents
spatstat
spatstat.geom:Geometrical Functionality of the 'spatstat' Family
Defines spatial data types and supports geometrical operations on them. Data types include point patterns, windows (domains), pixel images, line segment patterns, tessellations and hyperframes. Capabilities include creation and manipulation of data (using command line or graphical interaction), plotting, geometrical operations (rotation, shift, rescale, affine transformation), convex hull, discretisation and pixellation, Dirichlet tessellation, Delaunay triangulation, pairwise distances, nearest-neighbour distances, distance transform, morphological operations (erosion, dilation, closing, opening), quadrat counting, geometrical measurement, geometrical covariance, colour maps, calculus on spatial domains, Gaussian blur, level sets of images, transects of images, intersections between objects, minimum distance matching. (Excludes spatial data on a network, which are supported by the package 'spatstat.linnet'.)
Maintained by Adrian Baddeley. Last updated 20 hours ago.
classes-and-objects distance-calculation geometry geometry-processing images mensuration plotting point-patterns spatial-data spatial-data-analysis
8.2 match 7 stars 12.11 score 241 scripts 227 dependents
bioc
clustifyr:Classifier for Single-cell RNA-seq Using Cell Clusters
Package designed to aid in classifying cells from single-cell RNA sequencing data using external reference data (e.g., bulk RNA-seq, scRNA-seq, microarray, gene lists). A variety of correlation based methods and gene list enrichment methods are provided to assist cell type assignment.
Maintained by Rui Fu. Last updated 5 months ago.
singlecell annotation sequencing microarray geneexpression assign-identities clusters marker-genes rna-seq single-cell-rna-seq
10.0 match 119 stars 9.63 score 296 scripts
troyhernandez
tinyspotifyr:Tinyverse R Wrapper for the 'Spotify' Web API
An R wrapper for the 'Spotify' Web API <https://developer.spotify.com/web-api/>.
Maintained by Troy Hernandez. Last updated 1 year ago.
19.8 match 13 stars 4.81 score 5 scripts
r-dbi
DBI:R Database Interface
A database interface definition for communication between R and relational database management systems. All classes in this package are virtual and need to be extended by the various R/DBMS implementations.
Maintained by Kirill Müller. Last updated 3 months ago.
4.5 match 302 stars 20.88 score 19k scripts 2.9k dependents
tidyverse
dplyr:A Grammar of Data Manipulation
A fast, consistent tool for working with data frame like objects, both in memory and out of memory.
Maintained by Hadley Wickham. Last updated 14 days ago.
3.6 match 4.8k stars 24.68 score 659k scripts 7.8k dependents
bioc
KEGGREST:Client-side REST access to the Kyoto Encyclopedia of Genes and Genomes (KEGG)
A package that provides a client interface to the Kyoto Encyclopedia of Genes and Genomes (KEGG) REST API. Only for academic use by academic users belonging to academic institutions (see <https://www.kegg.jp/kegg/rest/>). Note that KEGGREST is based on KEGGSOAP by J. Zhang, R. Gentleman, and Marc Carlson, and KEGG (python package) by Aurelien Mazurie.
Maintained by Bioconductor Package Maintainer. Last updated 5 months ago.
annotation pathways thirdpartyclient kegg bioconductor-package core-package
6.1 match 9 stars 14.46 score 688 scripts 775 dependents
ekstroem
dataMaid:A Suite of Checks for Identification of Potential Errors in a Data Frame as Part of the Data Screening Process
Data screening is an important first step of any statistical analysis. dataMaid auto-generates a customizable data report with a thorough summary of the checks and the results that a human can use to identify possible errors. It provides an extendable suite of tests for common potential errors in a dataset.
Maintained by Claus Thorn Ekstrøm. Last updated 3 years ago.
data-cleaning data-screening reproducible-research
11.7 match 143 stars 7.53 score 236 scripts
atlasoflivingaustralia
galah:Biodiversity Data from the GBIF Node Network
The Global Biodiversity Information Facility ('GBIF', <https://www.gbif.org>) sources data from an international network of data providers, known as 'nodes'. Several of these nodes - the "living atlases" (<https://living-atlases.gbif.org>) - maintain their own web services using software originally developed by the Atlas of Living Australia ('ALA', <https://www.ala.org.au>). 'galah' enables the R community to directly access data and resources hosted by 'GBIF' and its partner nodes.
Maintained by Martin Westgate. Last updated 1 months ago.
9.6 match 43 stars 9.17 score 275 scripts 1 dependents
rkillick
changepoint:Methods for Changepoint Detection
Implements various mainstream and specialised changepoint methods for finding single and multiple changepoints within data. Many popular non-parametric and frequentist methods are included. The cpt.mean(), cpt.var(), cpt.meanvar() functions should be your first point of call.
Maintained by Rebecca Killick. Last updated 4 months ago.
7.9 match 133 stars 11.05 score 736 scripts 40 dependents
adeckmyn
maps:Draw Geographical Maps
Display of maps. Projection code and larger maps are in separate packages ('mapproj' and 'mapdata').
Maintained by Alex Deckmyn. Last updated 2 months ago.
5.9 match 24 stars 14.70 score 19k scripts 490 dependents
therneau
survival:Survival Analysis
Contains the core survival analysis routines, including definition of Surv objects, Kaplan-Meier and Aalen-Johansen (multi-state) curves, Cox models, and parametric accelerated failure time models.
Maintained by Terry M Therneau. Last updated 3 months ago.
4.3 match 400 stars 20.43 score 29k scripts 3.9k dependents
cboettig
contentid:An Interface for Content-Based Identifiers
An interface for creating, registering, and resolving content-based identifiers for data management. Content-based identifiers rely on 'cryptographic' hashes to refer to the files they identify; thus, anyone possessing the file can compute the identifier using a well-known standard algorithm, such as 'SHA256'. By registering a URL at which the content is accessible to a public archive (such as Hash Archive) or depositing data in a scientific repository such as 'Zenodo', 'DataONE' or 'SoftwareHeritage', the content identifier can serve many functions typically associated with a Digital Object Identifier ('DOI'). Unlike location-based identifiers like 'DOIs', content-based identifiers permit the same content to be registered in many locations.
Maintained by Carl Boettiger. Last updated 2 months ago.
11.3 match 46 stars 7.59 score 108 scripts 3 dependents
bioc
RCy3:Functions to Access and Control Cytoscape
Visualize, analyze and explore networks using Cytoscape via R. Anything you can do using the graphical user interface of Cytoscape, you can now do with a single RCy3 function.
Maintained by Alex Pico. Last updated 5 months ago.
visualization graphandnetwork thirdpartyclient network
6.4 match 52 stars 13.39 score 628 scripts 15 dependents
bioc
GOexpress:Visualise microarray and RNAseq data using gene ontology annotations
The package contains methods to visualise the expression profile of genes from a microarray or RNA-seq experiment, and offers a supervised clustering approach to identify GO terms containing genes with expression levels that best classify two or more predefined groups of samples. Annotations for the genes present in the expression dataset may be obtained from Ensembl through the biomaRt package, if not provided by the user. The default random forest framework is used to evaluate the capacity of each gene to cluster samples according to the factor of interest. Finally, GO terms are scored by averaging the rank (alternatively, score) of their respective gene sets to cluster the samples. P-values may be computed to assess the significance of GO term ranking. Visualisation functions include gene expression profiles, gene ontology-based heatmaps, and hierarchical clustering of experimental samples using gene expression data.
Maintained by Kevin Rue-Albrecht. Last updated 5 months ago.
software geneexpression transcription differentialexpression genesetenrichment datarepresentation clustering timecourse microarray sequencing rnaseq annotation multiplecomparison pathways go visualization immunooncology bioconductor bioconductor-package bioconductor-stats geneontology geneset-enrichment
12.3 match 9 stars 6.75 score 31 scripts
marksendak
constellation:Identify Event Sequences Using Time Series Joins
Examine any number of time series data frames to identify instances in which various criteria are met within specified time frames. In clinical medicine, these types of events are often called "constellations of signs and symptoms", because a single condition depends on a series of events occurring within a certain amount of time of each other. This package was written to work with any number of time series data frames and is optimized for speed to work well with data frames with millions of rows.
Maintained by Mark Sendak. Last updated 6 years ago.
electronic-health-recordelectronic-health-recordshealthcarepatientstimeseries
17.2 match 6 stars 4.76 score 19 scripts
bioc
MassSpecWavelet:Peak Detection for Mass Spectrometry data using wavelet-based algorithms
Peak Detection in Mass Spectrometry data is one of the important preprocessing steps. The performance of peak detection affects subsequent processes, including protein identification, profile alignment and biomarker identification. Using Continuous Wavelet Transform (CWT), this package provides a reliable algorithm for peak detection that does not require any type of smoothing or previous baseline correction method, providing more consistent results for different spectra. See <doi:10.1093/bioinformatics/btl355> for further details.
Maintained by Sergio Oller Moreno. Last updated 3 months ago.
immunooncologymassspectrometryproteomicspeakdetection
8.7 match 9 stars 9.38 score 37 scripts 17 dependents
rosieluain
mort:Identifying Potential Mortalities and Expelled Tags in Aquatic Acoustic Telemetry Arrays
A toolkit for identifying potential mortalities and expelled tags in aquatic acoustic telemetry arrays. Designed for arrays with non-overlapping receivers.
Maintained by Rosie Smith. Last updated 6 months ago.
acoustic-telemetryaquaticmortality-estimation
15.2 match 4 stars 5.35 score 16 scripts
ropensci
oai:General Purpose 'Oai-PMH' Services Client
A general purpose client to work with any 'OAI-PMH' (Open Archives Initiative Protocol for 'Metadata' Harvesting) service. The 'OAI-PMH' protocol is described at <http://www.openarchives.org/OAI/openarchivesprotocol.html>. Functions are provided to work with the 'OAI-PMH' verbs: 'GetRecord', 'Identify', 'ListIdentifiers', 'ListMetadataFormats', 'ListRecords', and 'ListSets'.
Maintained by Michal Bojanowski. Last updated 2 years ago.
data-accessoai-pmhpeer-reviewedscholarly-api
9.4 match 15 stars 8.55 score 23 scripts 24 dependents
svilsen
STRMPS:Analysis of Short Tandem Repeat (STR) Massively Parallel Sequencing (MPS) Data
Loading, identifying, aggregating, manipulating, and analysing short tandem repeat regions of massively parallel sequencing data in forensic genetics. The analyses and framework implemented in this package rely on the papers of Vilsen et al. (2017) <doi:10.1016/j.fsigen.2017.01.017> and Vilsen et al. (2018) <doi:10.1016/j.fsigen.2018.04.003>. Note that the parallelisation in the package relies on mclapply() and, thus, speed-ups will only be seen on UNIX-based systems.
Maintained by Søren B. Vilsen. Last updated 4 days ago.
biostringspwalignshortreadiranges
18.4 match 4.30 score
ropensci
datapack:A Flexible Container to Transport and Manipulate Data and Associated Resources
Provides a flexible container to transport and manipulate complex sets of data. These data may consist of multiple data files and associated meta data and ancillary files. Individual data objects have associated system level meta data, and data files are linked together using the OAI-ORE standard resource map which describes the relationships between the files. The OAI-ORE standard is described at <https://www.openarchives.org/ore/>. Data packages can be serialized and transported as structured files that have been created following the BagIt specification. The BagIt specification is described at <https://tools.ietf.org/html/draft-kunze-bagit-08>.
Maintained by Matthew B. Jones. Last updated 3 years ago.
9.2 match 44 stars 8.56 score 195 scripts 4 dependents
bioc
Gviz:Plotting data and annotation information along genomic coordinates
Genomic data analysis requires integrated visualization of known genomic information and new experimental data. Gviz uses the biomaRt and the rtracklayer packages to perform live annotation queries to Ensembl and UCSC and translates this to e.g. gene/transcript structures in viewports of the grid graphics package. This results in genomic information plotted together with your data.
Maintained by Robert Ivanek. Last updated 5 months ago.
visualizationmicroarraysequencing
6.0 match 79 stars 13.08 score 1.4k scripts 48 dependents
moosa-r
rbioapi:User-Friendly R Interface to Biologic Web Services' API
Currently fully supports Enrichr, JASPAR, miEAA, PANTHER, Reactome, STRING, and UniProt! The goal of rbioapi is to provide a user-friendly and consistent interface to biological databases and services, in a way that insulates the user from the technicalities of using web service APIs and creates a unified and easy-to-use interface to biological and medical web services. This is an ongoing project; new databases and services will be added periodically. Feel free to suggest any databases or services you often use.
Maintained by Moosa Rezwani. Last updated 1 months ago.
api-clientbioinformaticsbiologyenrichmentenrichment-analysisenrichrjasparmieaaover-representation-analysispantherreactomestringuniprot
10.2 match 20 stars 7.60 score 55 scripts
ropensci
ritis:Integrated Taxonomic Information System Client
An interface to the Integrated Taxonomic Information System ('ITIS') (<https://www.itis.gov>). Includes functions to work with the 'ITIS' REST API methods (<https://www.itis.gov/ws_description.html>), as well as the 'Solr' web service (<https://www.itis.gov/solr_documentation.html>).
Maintained by Julia Blum. Last updated 1 months ago.
taxonomybiologynomenclaturejsonapiwebapi-clientidentifiersspeciesnamesapi-wrapperitistaxize
10.0 match 16 stars 7.72 score 64 scripts 24 dependents
dataobservatory-eu
dataset:Create Data Frames that are Easier to Exchange and Reuse
The aim of the 'dataset' package is to make tidy datasets easier to release, exchange and reuse. It organizes and formats data frame 'R' objects into well-referenced, well-described, interoperable datasets in release- and reuse-ready form.
Maintained by Daniel Antal. Last updated 21 days ago.
9.9 match 15 stars 7.81 score 76 scripts 1 dependents
ncbi-hackathons
geneHummus:A Pipeline to Define Gene Families in Legumes and Beyond
A pipeline with high specificity and sensitivity in extracting proteins from the RefSeq database (National Center for Biotechnology Information). Manual identification of gene families is highly time-consuming and laborious, requiring an iterative process of manual and computational analysis to identify members of a given family. The pipeline implements an automatic approach for the identification of gene families based on the conserved domains that specifically define that family. See Die et al. (2018) <doi:10.1101/436659> for more information and examples.
Maintained by Jose V. Die. Last updated 5 years ago.
18.3 match 8 stars 4.20 score 3 scripts
bioc
topGO:Enrichment Analysis for Gene Ontology
topGO package provides tools for testing GO terms while accounting for the topology of the GO graph. Different test statistics and different methods for eliminating local similarities and dependencies between GO terms can be implemented and applied.
Maintained by Adrian Alexa. Last updated 5 months ago.
8.5 match 8.96 score 2.0k scripts 20 dependents
eltoulemonde
dataPreparation:Automated Data Preparation
Do most of the painful data preparation for a data science project with a minimum amount of code; take advantage of 'data.table' efficiency and use some algorithmic tricks in order to perform data preparation in a time- and RAM-efficient way.
Maintained by Emmanuel-Lin Toulemonde. Last updated 2 years ago.
data-preparationdata-preprocessingdata-sciencedate-conversionspeedvariable-eliminationvariable-selection
13.9 match 31 stars 5.46 score 86 scripts
bioc
segmentSeq:Methods for identifying small RNA loci from high-throughput sequencing data
High-throughput sequencing technologies allow the production of large volumes of short sequences, which can be aligned to the genome to create a set of matches to the genome. By looking for regions of the genome to which there are high densities of matches, we can infer a segmentation of the genome into regions of biological significance. The methods in this package allow the simultaneous segmentation of data from multiple samples, taking into account replicate data, in order to create a consensus segmentation. This has obvious applications in a number of classes of sequencing experiments, particularly in the discovery of small RNA loci and novel mRNA transcriptome discovery.
Maintained by Samuel Granjeaud. Last updated 4 months ago.
multiplecomparisonsequencingalignmentdifferentialexpressionqualitycontroldataimport
12.3 match 6.17 score 42 scripts
henrikbengtsson
R.utils:Various Programming Utilities
Utility functions useful when programming and developing R packages.
Maintained by Henrik Bengtsson. Last updated 1 years ago.
5.5 match 63 stars 13.74 score 5.7k scripts 814 dependents
ts404
WikidataR:Read-Write API Client Library for Wikidata
Read from, interrogate, and write to Wikidata <https://www.wikidata.org> - the multilingual, interdisciplinary, semantic knowledgebase. Includes functions to: read from Wikidata (single items, properties, or properties); query Wikidata (retrieving all items that match a set of criteria via Wikidata SPARQL query service); write to Wikidata (adding new items or statements via QuickStatements); and handle and manipulate Wikidata objects (as lists and tibbles). Uses the Wikidata and QuickStatements APIs.
Maintained by Thomas Shafee. Last updated 2 months ago.
8.6 match 22 stars 8.64 score 109 scripts 25 dependents
rnabioco
valr:Genome Interval Arithmetic
Read and manipulate genome intervals and signals. Provides functionality similar to command-line tool suites within R, enabling interactive analysis and visualization of genome-scale data. Riemondy et al. (2017) <doi:10.12688/f1000research.11997.1>.
Maintained by Kent Riemondy. Last updated 8 days ago.
bedtoolsgenomeinterval-arithmeticcpp
7.6 match 90 stars 9.69 score 227 scripts
santikka
causaleffect:Deriving Expressions of Joint Interventional Distributions and Transport Formulas in Causal Models
Functions for identification and transportation of causal effects. Provides a conditional causal effect identification algorithm (IDC) by Shpitser, I. and Pearl, J. (2006) <http://ftp.cs.ucla.edu/pub/stat_ser/r329-uai.pdf>, an algorithm for transportability from multiple domains with limited experiments by Bareinboim, E. and Pearl, J. (2014) <http://ftp.cs.ucla.edu/pub/stat_ser/r443.pdf>, and a selection bias recovery algorithm by Bareinboim, E. and Tian, J. (2015) <http://ftp.cs.ucla.edu/pub/stat_ser/r445.pdf>. All of the previously mentioned algorithms are based on a causal effect identification algorithm by Tian, J. (2002) <http://ftp.cs.ucla.edu/pub/stat_ser/r309.pdf>.
Maintained by Santtu Tikka. Last updated 2 years ago.
causal-inferencecausal-modelscausality-algorithmsdirected-acyclic-graphgraphsidentifiabilityidentificationigraph
13.8 match 29 stars 5.28 score 44 scripts 1 dependents
ekstroem
dataReporter:Reproducible Data Screening Checks and Report of Possible Errors
Data screening is an important first step of any statistical analysis. 'dataReporter' auto-generates a customizable data report with a thorough summary of the checks and the results that a human can use to identify possible errors. It provides an extendable suite of tests for common potential errors in a dataset. See Petersen AH, Ekstrøm CT (2019). "dataMaid: Your Assistant for Documenting Supervised Data Quality Screening in R." _Journal of Statistical Software_, *90*(6), 1-38 <doi:10.18637/jss.v090.i06> for more information.
Maintained by Claus Thorn Ekstrøm. Last updated 2 years ago.
11.7 match 86 stars 6.16 score 34 scripts
doi-usgs
sbtools:USGS ScienceBase Tools
Tools for interacting with U.S. Geological Survey ScienceBase <https://www.sciencebase.gov> interfaces. ScienceBase is a data cataloging and collaborative data management platform. Functions are included for querying ScienceBase and for creating and fetching datasets.
Maintained by David Blodgett. Last updated 10 months ago.
9.0 match 21 stars 7.94 score 127 scripts 2 dependents
walkerke
tidycensus:Load US Census Boundary and Attribute Data as 'tidyverse' and 'sf'-Ready Data Frames
An integrated R interface to several United States Census Bureau APIs (<https://www.census.gov/data/developers/data-sets.html>) and the US Census Bureau's geographic boundary files. Allows R users to return Census and ACS data as tidyverse-ready data frames, and optionally returns a list-column with feature geometry for mapping and spatial analysis.
Maintained by Kyle Walker. Last updated 2 months ago.
5.0 match 647 stars 14.27 score 7.5k scripts 10 dependents
bioc
dada2:Accurate, high-resolution sample inference from amplicon sequencing data
The dada2 package infers exact amplicon sequence variants (ASVs) from high-throughput amplicon sequencing data, replacing the coarser and less accurate OTU clustering approach. The dada2 pipeline takes as input demultiplexed fastq files, and outputs the sequence variants and their sample-wise abundances after removing substitution and chimera errors. Taxonomic classification is available via a native implementation of the RDP naive Bayesian classifier, and species-level assignment to 16S rRNA gene fragments by exact matching.
Maintained by Benjamin Callahan. Last updated 5 months ago.
immunooncologymicrobiomesequencingclassificationmetagenomicsampliconbioconductorbioinformaticsmetabarcodingtaxonomycpp
5.4 match 485 stars 13.17 score 3.0k scripts 4 dependents
bioc
IsoformSwitchAnalyzeR:Identify, Annotate and Visualize Isoform Switches with Functional Consequences from both short- and long-read RNA-seq data
Analysis of alternative splicing and isoform switches with predicted functional consequences (e.g. gain/loss of protein domains etc.) from quantification of all types of RNASeq by tools such as Kallisto, Salmon, StringTie, Cufflinks/Cuffdiff etc.
Maintained by Kristoffer Vitting-Seerup. Last updated 5 months ago.
geneexpressiontranscriptionalternativesplicingdifferentialexpressiondifferentialsplicingvisualizationstatisticalmethodtranscriptomevariantbiomedicalinformaticsfunctionalgenomicssystemsbiologytranscriptomicsrnaseqannotationfunctionalpredictiongenepredictiondataimportmultiplecomparisonbatcheffectimmunooncology
7.5 match 108 stars 9.26 score 125 scripts
bioc
miRspongeR:Identification and analysis of miRNA sponge regulation
This package provides several functions to explore miRNA sponge (also called ceRNA or miRNA decoy) regulation from putative miRNA-target interactions and/or transcriptomics data (including bulk, single-cell and spatial gene expression data). It provides eight popular methods for identifying miRNA sponge interactions, and an integrative method to integrate miRNA sponge interactions from different methods, as well as the functions to validate miRNA sponge interactions, and infer miRNA sponge modules, conduct enrichment analysis of miRNA sponge modules, and conduct survival analysis of miRNA sponge modules. By using a sample control variable strategy, it provides a function to infer sample-specific miRNA sponge interactions. In terms of sample-specific miRNA sponge interactions, it implements three similarity methods to construct a sample-sample correlation network.
Maintained by Junpeng Zhang. Last updated 5 months ago.
geneexpressionbiomedicalinformaticsnetworkenrichmentsurvivalmicroarraysoftwaresinglecellspatialrnaseqcernamirnasponge
11.7 match 5 stars 5.88 score 8 scripts
santikka
cfid:Identification of Counterfactual Queries in Causal Models
Facilitates the identification of counterfactual queries in structural causal models via the ID* and IDC* algorithms by Shpitser, I. and Pearl, J. (2007, 2008) <arXiv:1206.5294>, <https://jmlr.org/papers/v9/shpitser08a.html>. Provides a simple interface for defining causal diagrams and counterfactual conjunctions. Construction of parallel worlds graphs and counterfactual graphs is carried out automatically based on the counterfactual query and the causal diagram. See Tikka, S. (2023) <doi:10.32614/RJ-2023-053> for a tutorial of the package.
Maintained by Santtu Tikka. Last updated 8 months ago.
causal-inferencecausal-modelscausality-algorithmscounterfactualcounterfactualsdirected-acyclic-graphidentifiability
16.9 match 7 stars 4.02 score 2 scripts 1 dependents
ropensci
redland:RDF Library Bindings in R
Provides methods to parse, query and serialize information stored in the Resource Description Framework (RDF). RDF is described at <https://www.w3.org/TR/rdf-primer/>. This package supports RDF by implementing an R interface to the Redland RDF C library, described at <https://librdf.org/docs/api/index.html>. In brief, RDF provides a structured graph consisting of Statements composed of Subject, Predicate, and Object Nodes.
Maintained by Matthew B. Jones. Last updated 1 years ago.
8.5 match 17 stars 7.85 score 98 scripts 13 dependents
opengeos
whitebox:'WhiteboxTools' R Frontend
An R frontend for the 'WhiteboxTools' library, which is an advanced geospatial data analysis platform developed by Prof. John Lindsay at the University of Guelph's Geomorphometry and Hydrogeomatics Research Group. 'WhiteboxTools' can be used to perform common geographical information systems (GIS) analysis operations, such as cost-distance analysis, distance buffering, and raster reclassification. Remote sensing and image processing tasks include image enhancement (e.g. panchromatic sharpening, contrast adjustments), image mosaicing, numerous filtering operations, simple classification (k-means), and common image transformations. 'WhiteboxTools' also contains advanced tooling for spatial hydrological analysis (e.g. flow-accumulation, watershed delineation, stream network analysis, sink removal), terrain analysis (e.g. common terrain indices such as slope, curvatures, wetness index, hillshading; hypsometric analysis; multi-scale topographic position analysis), and LiDAR data processing. Suggested citation: Lindsay (2016) <doi:10.1016/j.cageo.2016.07.003>.
Maintained by Andrew Brown. Last updated 5 months ago.
geomorphometrygeoprocessinggeospatialgishydrologyremote-sensingrstudio
6.9 match 173 stars 9.65 score 203 scripts 2 dependents
kharchenkolab
pagoda2:Single Cell Analysis and Differential Expression
Analyzing and interactively exploring large-scale single-cell RNA-seq datasets. 'pagoda2' primarily performs normalization and differential gene expression analysis, with an interactive application for exploring single-cell RNA-seq datasets. It performs basic tasks such as cell size normalization, gene variance normalization, and can be used to identify subpopulations and run differential expression within individual samples. 'pagoda2' was written to rapidly process modern large-scale scRNAseq datasets of approximately 1e6 cells. The companion web application allows users to explore which gene expression patterns form the different subpopulations within your data. The package also serves as the primary method for preprocessing data for conos, <https://github.com/kharchenkolab/conos>. This package interacts with data available through the 'p2data' package, which is available in a 'drat' repository. To access this data package, see the instructions at <https://github.com/kharchenkolab/pagoda2>. The size of the 'p2data' package is approximately 6 MB.
Maintained by Evan Biederstedt. Last updated 1 years ago.
scrna-seqsingle-cellsingle-cell-rna-seqtranscriptomicsopenblascppopenmp
8.3 match 222 stars 8.00 score 282 scripts
robjhyndman
forecast:Forecasting Functions for Time Series and Linear Models
Methods and tools for displaying and analysing univariate time series forecasts including exponential smoothing via state space models and automatic ARIMA modelling.
Maintained by Rob Hyndman. Last updated 7 months ago.
forecastforecastingopenblascpp
3.5 match 1.1k stars 18.63 score 16k scripts 239 dependents
bioc
sesame:SEnsible Step-wise Analysis of DNA MEthylation BeadChips
Tools for analyzing Illumina Infinium DNA methylation arrays. SeSAMe provides utilities to support analyses of multiple generations of Infinium DNA methylation BeadChips, including preprocessing, quality control, visualization and inference. SeSAMe features accurate detection calling, intelligent inference of ethnicity, sex and advanced quality control routines.
Maintained by Wanding Zhou. Last updated 2 months ago.
dnamethylationmethylationarraypreprocessingqualitycontrolbioinformaticsdna-methylationmicroarray
7.1 match 69 stars 9.08 score 258 scripts 1 dependents
milanwiedemann
suddengains:Identify Sudden Gains in Longitudinal Data
Identify sudden gains based on the three criteria outlined by Tang and DeRubeis (1999) <doi:10.1037/0022-006X.67.6.894> to a selection of repeated measures. Sudden losses, defined as the opposite of sudden gains, can also be identified. Two different datasets can be created, one including all sudden gains/losses and one including one selected sudden gain/loss for each case. It can extract scores around sudden gains/losses. It can plot the average change around sudden gains/losses and trajectories of individual cases.
Maintained by Milan Wiedemann. Last updated 2 years ago.
change-detectionsudden-gainssudden-losses
12.4 match 7 stars 5.15 score 10 scripts
tidymodels
rsample:General Resampling Infrastructure
Classes and functions to create and summarize different types of resampling objects (e.g. bootstrap, cross-validation).
Maintained by Hannah Frick. Last updated 6 days ago.
3.8 match 341 stars 16.72 score 5.2k scripts 79 dependents
yulab-smu
ggfun:Miscellaneous Functions for 'ggplot2'
Useful functions and utilities for 'ggplot' objects (e.g., geometric layers, themes, and utilities to edit the object).
Maintained by Guangchuang Yu. Last updated 1 days ago.
6.0 match 18 stars 10.53 score 58 scripts 152 dependents
blasbenito
collinear:Automated Multicollinearity Management
Effortless multicollinearity management in data frames with both numeric and categorical variables for statistical and machine learning applications. The package simplifies multicollinearity analysis by combining four robust methods: 1) target encoding for categorical variables (Micci-Barreca, D. 2001 <doi:10.1145/507533.507538>); 2) automated feature prioritization to prevent key variable loss during filtering; 3) pairwise correlation for all variable combinations (numeric-numeric, numeric-categorical, categorical-categorical); and 4) fast computation of variance inflation factors.
Maintained by Blas M. Benito. Last updated 2 months ago.
machine-learningmulticollinearitystatistics
11.4 match 11 stars 5.51 score 15 scripts 1 dependents
r-lib
lintr:A 'Linter' for R Code
Checks adherence to a given style, syntax errors and possible semantic issues. Supports on the fly checking of R code edited with 'RStudio IDE', 'Emacs', 'Vim', 'Sublime Text', 'Atom' and 'Visual Studio Code'.
Maintained by Michael Chirico. Last updated 7 hours ago.
3.7 match 1.2k stars 16.99 score 916 scripts 33 dependents
bioc
flowCore:flowCore: Basic structures for flow cytometry data
Provides S4 data structures and basic functions to deal with flow cytometry data.
Maintained by Mike Jiang. Last updated 5 months ago.
immunooncologyinfrastructureflowcytometrycellbasedassayscpp
6.0 match 10.34 score 1.7k scripts 59 dependents
interstellar-consultation-services
covid19dbcand:Selected 'Drugbank' Drugs for COVID-19 Treatment Related Data in R Format
Provides different datasets parsed from the 'Drugbank' <https://www.drugbank.ca/covid-19> database using the 'dbparser' package. It is a smaller version of the 'dbdataset' package and contains only information about possible COVID-19 treatments.
Maintained by Mohammed Ali. Last updated 11 months ago.
datasetdbparserdrugbankdrugbank-database
13.7 match 3 stars 4.48 score 6 scripts
bioc
rTRM:Identification of Transcriptional Regulatory Modules from Protein-Protein Interaction Networks
rTRM identifies transcriptional regulatory modules (TRMs) from protein-protein interaction networks.
Maintained by Diego Diez. Last updated 5 months ago.
transcriptionnetworkgeneregulationgraphandnetworkbioconductorbioinformatics
12.6 match 3 stars 4.86 score 3 scripts 1 dependents
bioc
annotate:Annotation for microarrays
Using R environments for annotation.
Maintained by Bioconductor Package Maintainer. Last updated 5 months ago.
5.3 match 11.41 score 812 scripts 243 dependents
bnosac
udpipe:Tokenization, Parts of Speech Tagging, Lemmatization and Dependency Parsing with the 'UDPipe' 'NLP' Toolkit
This natural language processing toolkit provides language-agnostic 'tokenization', 'parts of speech tagging', 'lemmatization' and 'dependency parsing' of raw text. Next to text parsing, the package also allows you to train annotation models based on data of 'treebanks' in 'CoNLL-U' format as provided at <https://universaldependencies.org/format.html>. The techniques are explained in detail in the paper: 'Tokenizing, POS Tagging, Lemmatizing and Parsing UD 2.0 with UDPipe', available at <doi:10.18653/v1/K17-3009>. The toolkit also contains functionalities for commonly used data manipulations on texts which are enriched with the output of the parser, namely functionalities and algorithms for collocations, token co-occurrence, document term matrix handling, term frequency inverse document frequency calculations, information retrieval metrics (Okapi BM25), handling of multi-word expressions, keyword detection (Rapid Automatic Keyword Extraction, noun phrase extraction, syntactical patterns), sentiment scoring and semantic similarity analysis.
Maintained by Jan Wijffels. Last updated 2 years ago.
conlldependency-parserlemmatizationnatural-language-processingnlppos-taggingr-pkgrcpptext-miningtokenizerudpipecpp
5.1 match 215 stars 11.83 score 1.2k scripts 9 dependents
darwin-eu
PatientProfiles:Identify Characteristics of Patients in the OMOP Common Data Model
Identify the characteristics of patients in data mapped to the Observational Medical Outcomes Partnership (OMOP) common data model.
Maintained by Marti Catala. Last updated 11 days ago.
6.0 match 1 stars 9.97 score 225 scripts 9 dependents
bioc
CoreGx:Classes and Functions to Serve as the Basis for Other 'Gx' Packages
A collection of functions and classes which serve as the foundation for our lab's suite of R packages, such as 'PharmacoGx' and 'RadioGx'. This package was created to abstract shared functionality from other lab package releases to increase ease of maintainability and reduce code repetition in current and future 'Gx' suite programs. Major features include a 'CoreSet' class, from which 'RadioSet' and 'PharmacoSet' are derived, along with get and set methods for each respective slot. Additional functions related to fitting and plotting dose response curves, quantifying statistical correlation and calculating area under the curve (AUC) or survival fraction (SF) are included. For more details please see the included documentation, as well as: Smirnov, P., Safikhani, Z., El-Hachem, N., Wang, D., She, A., Olsen, C., Freeman, M., Selby, H., Gendoo, D., Grossman, P., Beck, A., Aerts, H., Lupien, M., Goldenberg, A. (2015) <doi:10.1093/bioinformatics/btv723>. Manem, V., Labie, M., Smirnov, P., Kofia, V., Freeman, M., Koritzinksy, M., Abazeed, M., Haibe-Kains, B., Bratman, S. (2018) <doi:10.1101/449793>.
Maintained by Benjamin Haibe-Kains. Last updated 5 months ago.
softwarepharmacogenomicsclassificationsurvival
8.9 match 6.53 score 63 scripts 6 dependents
bioc
SPIAT:Spatial Image Analysis of Tissues
SPIAT (**Sp**atial **I**mage **A**nalysis of **T**issues) is an R package with a suite of data processing, quality control, visualization and data analysis tools. SPIAT is compatible with data generated from single-cell spatial proteomics platforms (e.g. OPAL, CODEX, MIBI, cellprofiler). SPIAT reads spatial data in the form of X and Y coordinates of cells, marker intensities and cell phenotypes. SPIAT includes six analysis modules that allow visualization, calculation of cell colocalization, categorization of the immune microenvironment relative to tumor areas, analysis of cellular neighborhoods, and the quantification of spatial heterogeneity, providing a comprehensive toolkit for spatial data analysis.
Maintained by Yuzhou Feng. Last updated 2 days ago.
biomedicalinformaticscellbiologyspatialclusteringdataimportimmunooncologyqualitycontrolsinglecellsoftwarevisualization
6.8 match 22 stars 8.59 score 69 scripts
shixiangwang
IDConverter:Convert Identifiers in Biological Databases
Identifiers in biological databases connect different levels of metadata, phenotype data or genotype data. This tool is designed to easily convert identifiers within or between different biological databases (Wang, Shixiang, et al. (2021) <DOI:10.1371/journal.pgen.1009557>).
Maintained by Shixiang Wang. Last updated 2 years ago.
19.4 match 9 stars 3.00 score 22 scripts
bioc
doppelgangR:Identify likely duplicate samples from genomic or meta-data
The main function is doppelgangR(), which takes as minimal input a list of ExpressionSet objects, and searches all list pairs for duplicated samples. The search is based on the genomic data (exprs(eset)), phenotype/clinical data (pData(eset)), and "smoking guns" - supposedly unique identifiers found in pData(eset).
Maintained by Levi Waldron. Last updated 5 months ago.
immunooncologyrnaseqmicroarraygeneexpressionqualitycontrolbioconductor-package
8.7 match 5 stars 6.67 score 31 scripts
bioc
derfinder:Annotation-agnostic differential expression analysis of RNA-seq data at base-pair resolution via the DER Finder approach
This package provides functions for annotation-agnostic differential expression analysis of RNA-seq data. Two implementations of the DER Finder approach are included in this package: (1) single base-level F-statistics and (2) DER identification at the expressed regions-level. The DER Finder approach can also be used to identify differentially bounded ChIP-seq peaks.
Maintained by Leonardo Collado-Torres. Last updated 3 months ago.
differentialexpressionsequencingrnaseqchipseqdifferentialpeakcallingsoftwareimmunooncologycoverageannotation-agnosticbioconductorderfinder
5.6 match 42 stars 10.03 score 78 scripts 6 dependents
reconhub
epicontacts:Handling, Visualisation and Analysis of Epidemiological Contacts
A collection of tools for representing epidemiological contact data, composed of case line lists and contacts between cases. Also contains procedures for data handling, interactive graphics, and statistics.
Maintained by Finlay Campbell. Last updated 2 months ago.
6.4 match 15 stars 8.86 score 112 scripts 2 dependents
ms609
TreeTools:Create, Modify and Analyse Phylogenetic Trees
Efficient implementations of functions for the creation, modification and analysis of phylogenetic trees. Applications include: generation of trees with specified shapes; tree rearrangement; analysis of tree shape; rooting of trees and extraction of subtrees; calculation and depiction of split support; plotting the position of rogue taxa (Klopfstein & Spasojevic 2019) <doi:10.1371/journal.pone.0212942>; calculation of ancestor-descendant relationships, of 'stemwardness' (Asher & Smith, 2022) <doi:10.1093/sysbio/syab072>, and of tree balance (Mir et al. 2013, Lemant et al. 2022) <doi:10.1016/j.mbs.2012.10.005>, <doi:10.1093/sysbio/syac027>; artificial extinction (Asher & Smith, 2022) <doi:10.1093/sysbio/syab072>; import and export of trees from Newick, Nexus (Maddison et al. 1997) <doi:10.1093/sysbio/46.4.590>, and TNT <https://www.lillo.org.ar/phylogeny/tnt/> formats; and analysis of splits and cladistic information.
Maintained by Martin R. Smith. Last updated 1 months ago.
evolutionary-biologyphylogenetic-treesphylogeneticscpp
5.7 match 21 stars 9.92 score 124 scripts 10 dependents
daattali
ddpcr:Analysis and Visualization of Droplet Digital PCR in R and on the Web
An interface to explore, analyze, and visualize droplet digital PCR (ddPCR) data in R. This is the first non-proprietary software for analyzing two-channel ddPCR data. An interactive tool was also created and is available online to facilitate this analysis for anyone who is not comfortable with using R.
Maintained by Dean Attali. Last updated 12 months ago.
5.9 match 61 stars 9.54 score 131 scripts 2 dependentsbioc
ISAnalytics:Analyze gene therapy vector insertion sites data identified from genomics next generation sequencing reads for clonal tracking studies
In gene therapy, stem cells are modified using viral vectors to deliver the therapeutic transgene and replace functional properties since the genetic modification is stable and inherited in all cell progeny. The retrieval and mapping of the sequences flanking the virus-host DNA junctions allows the identification of insertion sites (IS), essential for monitoring the evolution of genetically modified cells in vivo. A comprehensive toolkit for the analysis of IS is required to foster clonal tracking studies and support the assessment of safety and long-term efficacy in vivo. This package is aimed at (1) supporting automation of IS workflow, (2) performing basic and advanced analysis for IS tracking (clonal abundance, clonal expansions and statistics for insertional mutagenesis, etc.), and (3) providing basic biology insights of transduced stem cells in vivo.
Maintained by Francesco Gazzo. Last updated 3 months ago.
biomedicalinformaticssequencingsinglecell
9.6 match 3 stars 5.83 score 15 scriptsspatstat
spatstat.linnet:Linear Networks Functionality of the 'spatstat' Family
Defines types of spatial data on a linear network and provides functionality for geometrical operations, data analysis and modelling of data on a linear network, in the 'spatstat' family of packages. Contains definitions and support for linear networks, including creation of networks, geometrical measurements, topological connectivity, geometrical operations such as inserting and deleting vertices, intersecting a network with another object, and interactive editing of networks. Data types defined on a network include point patterns, pixel images, functions, and tessellations. Exploratory methods include kernel estimation of intensity on a network, K-functions and pair correlation functions on a network, simulation envelopes, nearest neighbour distance and empty space distance, relative risk estimation with cross-validated bandwidth selection. Formal hypothesis tests of random pattern (chi-squared, Kolmogorov-Smirnov, Monte Carlo, Diggle-Cressie-Loosmore-Ford, Dao-Genton, two-stage Monte Carlo) and tests for covariate effects (Cox-Berman-Waller-Lawson, Kolmogorov-Smirnov, ANOVA) are also supported. Parametric models can be fitted to point pattern data using the function lppm() similar to glm(). Only Poisson models are implemented so far. Models may involve dependence on covariates and dependence on marks. Models are fitted by maximum likelihood. Fitted point process models can be simulated, automatically. Formal hypothesis tests of a fitted model are supported (likelihood ratio test, analysis of deviance, Monte Carlo tests) along with basic tools for model selection (stepwise(), AIC()) and variable selection (sdr). Tools for validating the fitted model include simulation envelopes, residuals, residual plots and Q-Q plots, leverage and influence diagnostics, partial residuals, and added variable plots. Random point patterns on a network can be generated using a variety of models.
Maintained by Adrian Baddeley. Last updated 2 months ago.
density-estimationheat-equationkernel-density-estimationnetwork-analysispoint-processesspatial-data-analysisstatistical-analysisstatistical-inferencestatistical-models
5.8 match 6 stars 9.64 score 35 scripts 43 dependentsscheike
timereg:Flexible Regression Models for Survival Data
Programs for Martinussen and Scheike (2006), `Dynamic Regression Models for Survival Data', Springer Verlag. Plus more recent developments. Additive survival model, semiparametric proportional odds model, fast cumulative residuals, excess risk models and more. Flexible competing risks regression including GOF-tests. Two-stage frailty modelling. PLS for the additive risk model. Lasso in the 'ahaz' package.
Maintained by Thomas Scheike. Last updated 6 months ago.
5.3 match 31 stars 10.42 score 289 scripts 44 dependentsropensci
targets:Dynamic Function-Oriented 'Make'-Like Declarative Pipelines
Pipeline tools coordinate the pieces of computationally demanding analysis projects. The 'targets' package is a 'Make'-like pipeline tool for statistics and data science in R. The package skips costly runtime for tasks that are already up to date, orchestrates the necessary computation with implicit parallel computing, and abstracts files as R objects. If all the current output matches the current upstream code and data, then the whole pipeline is up to date, and the results are more trustworthy than otherwise. The methodology in this package borrows from GNU 'Make' (2015, ISBN:978-9881443519) and 'drake' (2018, <doi:10.21105/joss.00550>).
Maintained by William Michael Landau. Last updated 9 hours ago.
data-sciencehigh-performance-computingmakepeer-reviewedpipeliner-targetopiareproducibilityreproducible-researchtargetsworkflow
3.6 match 975 stars 15.18 score 4.6k scripts 22 dependentsropensci
citecorp:Client for the Open Citations Corpus
Client for the Open Citations Corpus (<http://opencitations.net/>). Includes a set of functions for getting one identifier type from another, as well as getting references and citations for a given identifier.
Maintained by David Selby. Last updated 1 months ago.
doimetadatacitationopencitationsbibtexcitationspmcidpmidsparql
11.4 match 11 stars 4.81 score 13 scriptsropensci
rdhs:API Client and Dataset Management for the Demographic and Health Survey (DHS) Data
Provides a client for (1) querying the DHS API for survey indicators and metadata (<https://api.dhsprogram.com/#/index.html>), (2) identifying surveys and datasets for analysis, (3) downloading survey datasets from the DHS website, (4) loading datasets and associated metadata into R, and (5) extracting variables and combining datasets for pooled analysis.
Maintained by OJ Watson. Last updated 19 days ago.
datasetdhsdhs-apiextractpeer-reviewedsurvey-data
5.4 match 35 stars 10.07 score 286 scripts 3 dependentsmrc-ide
epireview:Tools to update and summarise the latest pathogen data from the Pathogen Epidemiology Review Group (PERG)
Contains the latest open access pathogen data from the Pathogen Epidemiology Review Group (PERG). Tools are available to update pathogen databases with new peer-reviewed data as it becomes available, and to summarise the latest data using tables and figures.
Maintained by Sangeeta Bhatia. Last updated 4 days ago.
8.1 match 30 stars 6.76 score 6 scriptsswarchal
platetools:Tools and Plots for Multi-Well Plates
Collection of functions for working with multi-well microtitre plates, mainly 96, 384 and 1536 well plates.
Maintained by Scott Warchal. Last updated 1 years ago.
8.8 match 54 stars 6.20 score 146 scriptsbioc
ORFhunteR:Predict open reading frames in nucleotide sequences
The ORFhunteR package is an R and C++ library for an automatic determination and annotation of open reading frames (ORF) in a large set of RNA molecules. It efficiently implements the machine learning model based on vectorization of nucleotide sequences and the random forest classification algorithm. The ORFhunteR package consists of a set of functions written in the R language in conjunction with C++. The efficiency of the package was confirmed by the examples of the analysis of RNA molecules from the NCBI RefSeq and Ensembl databases. The package can be used in basic and applied biomedical research related to the study of the transcriptome of normal as well as altered (for example, cancer) human cells.
Maintained by Vasily V. Grinev. Last updated 5 months ago.
technologystatisticalmethodsequencingrnaseqclassificationfeatureextractioncpp
12.0 match 1 stars 4.48 scorebioc
wavClusteR:Sensitive and highly resolved identification of RNA-protein interaction sites in PAR-CLIP data
The package provides an integrated pipeline for the analysis of PAR-CLIP data. PAR-CLIP-induced transitions are first discriminated from sequencing errors, SNPs and additional non-experimental sources by a non-parametric mixture model. The protein binding sites (clusters) are then resolved at high resolution and cluster statistics are estimated using a rigorous Bayesian framework. Post-processing of the results, data export for UCSC genome browser visualization and motif search analysis are provided. In addition, the package allows integrating RNA-Seq data to estimate the False Discovery Rate of cluster detection. Key functions support parallel multicore computing. Note: while wavClusteR was designed for PAR-CLIP data analysis, it can be applied to the analysis of other NGS data obtained from experimental procedures that induce nucleotide substitutions (e.g. BisSeq).
Maintained by Federico Comoglio. Last updated 5 months ago.
immunooncologysequencingtechnologyripseqrnaseqbayesian
11.7 match 4.60 score 3 scriptsbioc
gemma.R:A wrapper for Gemma's Restful API to access curated gene expression data and differential expression analyses
Low- and high-level wrappers for Gemma's RESTful API. They enable access to curated expression and differential expression data from over 10,000 published studies. Gemma is a web site, database and a set of tools for the meta-analysis, re-use and sharing of genomics data, currently primarily targeted at the analysis of gene expression profiles.
Maintained by Ogan Mancarci. Last updated 4 months ago.
softwaredataimportmicroarraysinglecellthirdpartyclientdifferentialexpressiongeneexpressionbayesianannotationexperimentaldesignnormalizationbatcheffectpreprocessingbioinformaticsgemmagenomicstranscriptomics
7.0 match 10 stars 7.57 score 26 scriptsbergsmat
nonmemica:Create and Evaluate NONMEM Models in a Project Context
Systematically creates and modifies NONMEM(R) control streams. Harvests NONMEM output, builds run logs, creates derivative data, generates diagnostics. NONMEM (ICON Development Solutions <https://www.iconplc.com/>) is software for nonlinear mixed effects modeling. See 'package?nonmemica'.
Maintained by Tim Bergsma. Last updated 2 months ago.
11.6 match 4 stars 4.58 score 45 scriptsalarm-redist
redist:Simulation Methods for Legislative Redistricting
Enables researchers to sample redistricting plans from a pre-specified target distribution using Sequential Monte Carlo and Markov Chain Monte Carlo algorithms. The package allows for the implementation of various constraints in the redistricting process such as geographic compactness and population parity requirements. Tools for analysis such as computation of various summary statistics and plotting functionality are also included. The package implements the SMC algorithm of McCartan and Imai (2023) <doi:10.1214/23-AOAS1763>, the enumeration algorithm of Fifield, Imai, Kawahara, and Kenny (2020) <doi:10.1080/2330443X.2020.1791773>, the Flip MCMC algorithm of Fifield, Higgins, Imai and Tarr (2020) <doi:10.1080/10618600.2020.1739532>, the Merge-split/Recombination algorithms of Carter et al. (2019) <arXiv:1911.01503> and DeFord et al. (2021) <doi:10.1162/99608f92.eb30390f>, and the Short-burst optimization algorithm of Cannon et al. (2020) <arXiv:2011.02288>.
Maintained by Christopher T. Kenny. Last updated 2 months ago.
geospatialgerrymanderingredistrictingsamplingopenblascppopenmp
5.8 match 68 stars 9.17 score 259 scriptsbupaverse
bupaR:Business Process Analysis in R
Comprehensive Business Process Analysis toolkit. Creates an S3 class for event log objects, and related handler functions. Imports related packages for filtering event data, computation of descriptive statistics, handling of 'Petri Net' objects and visualization of process maps. See also packages 'edeaR', 'processmapR', 'eventdataR' and 'processmonitR'.
Maintained by Gert Janssenswillen. Last updated 2 years ago.
5.8 match 55 stars 9.07 score 389 scripts 11 dependentscran
mgcv:Mixed GAM Computation Vehicle with Automatic Smoothness Estimation
Generalized additive (mixed) models, some of their extensions and other generalized ridge regression with multiple smoothing parameter estimation by (Restricted) Marginal Likelihood, Generalized Cross Validation and similar, or using iterated nested Laplace approximation for fully Bayesian inference. See Wood (2017) <doi:10.1201/9781315370279> for an overview. Includes a gam() function, a wide variety of smoothers, 'JAGS' support and distributions beyond the exponential family.
Maintained by Simon Wood. Last updated 1 years ago.
4.1 match 32 stars 12.71 score 17k scripts 7.8k dependentsbioc
scDblFinder:scDblFinder
The scDblFinder package gathers various methods for the detection and handling of doublets/multiplets in single-cell sequencing data (i.e. multiple cells captured within the same droplet or reaction volume). It includes methods formerly found in the scran package, the new fast and comprehensive scDblFinder method, and a reimplementation of the Amulet detection method for single-cell ATAC-seq.
Maintained by Pierre-Luc Germain. Last updated 2 months ago.
preprocessingsinglecellrnaseqatacseqdoubletssingle-cell
4.3 match 184 stars 12.34 score 888 scripts 1 dependentsbayesiandemography
poputils:Demographic Analysis and Data Manipulation
Perform tasks commonly encountered when preparing and analysing demographic data. Some functions are intended for end users, and others for developers. Includes functions for working with life tables.
Maintained by John Bryant. Last updated 6 months ago.
9.4 match 5.57 score 49 scripts 1 dependentsbioc
AUCell:AUCell: Analysis of 'gene set' activity in single-cell RNA-seq data (e.g. identify cells with specific gene signatures)
AUCell identifies cells with active gene sets (e.g. signatures, gene modules...) in single-cell RNA-seq data. AUCell uses the "Area Under the Curve" (AUC) to calculate whether a critical subset of the input gene set is enriched within the expressed genes for each cell. The distribution of AUC scores across all the cells allows exploring the relative expression of the signature. Since the scoring method is ranking-based, AUCell is independent of the gene expression units and the normalization procedure. In addition, since the cells are evaluated individually, it can easily be applied to bigger datasets, subsetting the expression matrix if needed.
Maintained by Gert Hulselmans. Last updated 5 months ago.
singlecellgenesetenrichmenttranscriptomicstranscriptiongeneexpressionworkflowstepnormalization
6.1 match 8.59 score 860 scripts 4 dependentsbioc
phyloseq:Handling and analysis of high-throughput microbiome census data
phyloseq provides a set of classes and tools to facilitate the import, storage, analysis, and graphical display of microbiome census data.
Maintained by Paul J. McMurdie. Last updated 5 months ago.
immunooncologysequencingmicrobiomemetagenomicsclusteringclassificationmultiplecomparisongeneticvariability
3.8 match 597 stars 13.90 score 8.4k scripts 37 dependentsbioc
maftools:Summarize, Analyze and Visualize MAF Files
Analyze and visualize Mutation Annotation Format (MAF) files from large scale sequencing studies. This package provides various functions to perform the most commonly used analyses in cancer genomics and to create feature-rich customizable visualizations with minimal effort.
Maintained by Anand Mayakonda. Last updated 5 months ago.
datarepresentationdnaseqvisualizationdrivermutationvariantannotationfeatureextractionclassificationsomaticmutationsequencingfunctionalgenomicssurvivalbioinformaticscancer-genome-atlascancer-genomicsgenomicsmaf-filestcgacurlbzip2xz-utilszlib
3.6 match 459 stars 14.63 score 948 scripts 18 dependentsbioc
proActiv:Estimate Promoter Activity from RNA-Seq data
Most human genes have multiple promoters that control the expression of different isoforms. The use of these alternative promoters enables the regulation of isoform expression pre-transcriptionally. Alternative promoters have been found to be important in a wide number of cell types and diseases. proActiv is an R package that enables the analysis of promoters from RNA-seq data. proActiv uses aligned reads as input, and generates counts and normalized promoter activity estimates for each annotated promoter. In particular, proActiv accepts junction files from TopHat2 or STAR or BAM files as inputs. These estimates can then be used to identify which promoter is active, which promoter is inactive, and which promoters change their activity across conditions. proActiv also allows visualization of promoter activity across conditions.
Maintained by Joseph Lee. Last updated 5 months ago.
rnaseqgeneexpressiontranscriptionalternativesplicinggeneregulationdifferentialsplicingfunctionalgenomicsepigeneticstranscriptomicspreprocessingalternative-promotersgenomicspromoter-activitypromoter-annotationrna-seq-data
7.8 match 51 stars 6.66 score 15 scriptswincowgerdev
OpenSpecy:Analyze, Process, Identify, and Share Raman and (FT)IR Spectra
Raman and (FT)IR spectral analysis tool for plastic particles and other environmental samples (Cowger et al. 2021, <doi:10.1021/acs.analchem.1c00123>). With read_any(), Open Specy provides a single function for reading individual, batch, or map spectral data files like .asp, .csv, .jdx, .spc, .spa, .0, and .zip. process_spec() simplifies processing spectra, including smoothing, baseline correction, range restriction and flattening, intensity conversions, wavenumber alignment, and min-max normalization. Spectra can be identified in batch using an onboard reference library (Cowger et al. 2020, <doi:10.1177/0003702820929064>) using match_spec(). A Shiny app is available via run_app() or online at <https://openanalysis.org/openspecy/>.
Maintained by Win Cowger. Last updated 18 days ago.
6.8 match 29 stars 7.58 score 22 scriptsbioc
psichomics:Graphical Interface for Alternative Splicing Quantification, Analysis and Visualisation
Interactive R package with an intuitive Shiny-based graphical interface for alternative splicing quantification and integrative analyses of alternative splicing and gene expression based on The Cancer Genome Atlas (TCGA), the Genotype-Tissue Expression project (GTEx), Sequence Read Archive (SRA) and user-provided data. The tool interactively performs survival, dimensionality reduction and median- and variance-based differential splicing and gene expression analyses that benefit from the incorporation of clinical and molecular sample-associated features (such as tumour stage or survival). Interactive visual access to genomic mapping and functional annotation of selected alternative splicing events is also included.
Maintained by Nuno Saraiva-Agostinho. Last updated 5 months ago.
sequencingrnaseqalternativesplicingdifferentialsplicingtranscriptionguiprincipalcomponentsurvivalbiomedicalinformaticstranscriptomicsimmunooncologyvisualizationmultiplecomparisongeneexpressiondifferentialexpressionalternative-splicingbioconductordata-analysesdifferential-gene-expressiondifferential-splicing-analysisgene-expressiongtexrecount2rna-seq-datasplicing-quantificationsratcgavast-toolscpp
7.4 match 36 stars 6.95 score 31 scriptsr-spatial
gstat:Spatial and Spatio-Temporal Geostatistical Modelling, Prediction and Simulation
Variogram modelling; simple, ordinary and universal point or block (co)kriging; spatio-temporal kriging; sequential Gaussian or indicator (co)simulation; variogram and variogram map plotting utility functions; supports sf and stars.
Maintained by Edzer Pebesma. Last updated 11 days ago.
3.5 match 197 stars 14.78 score 4.8k scripts 57 dependentsedzer
intervals:Tools for Working with Points and Intervals
Tools for working with and comparing sets of points and intervals.
Maintained by Edzer Pebesma. Last updated 7 months ago.
5.4 match 11 stars 9.40 score 122 scripts 90 dependentsbioc
bioassayR:Cross-target analysis of small molecule bioactivity
bioassayR is a computational tool that enables simultaneous analysis of thousands of bioassay experiments performed over a diverse set of compounds and biological targets. Unique features include support for large-scale cross-target analyses of both public and custom bioassays, generation of high throughput screening fingerprints (HTSFPs), and an optional preloaded database that provides access to a substantial portion of publicly available bioactivity data.
Maintained by Thomas Girke. Last updated 5 months ago.
immunooncologymicrotitreplateassaycellbasedassaysvisualizationinfrastructuredataimportbioinformaticsproteomicsmetabolomics
7.6 match 5 stars 6.70 score 46 scriptscran
gprofiler2:Interface to the 'g:Profiler' Toolset
A toolset for functional enrichment analysis and visualization, gene/protein/SNP identifier conversion and mapping orthologous genes across species via 'g:Profiler' (<https://biit.cs.ut.ee/gprofiler/>). The main tools are: (1) 'g:GOSt' - functional enrichment analysis and visualization of gene lists; (2) 'g:Convert' - gene/protein/transcript identifier conversion across various namespaces; (3) 'g:Orth' - orthology search across species; (4) 'g:SNPense' - mapping SNP rs identifiers to chromosome positions, genes and variant effects. This package is an R interface corresponding to the 2019 update of 'g:Profiler' and provides access to 'g:Profiler' for versions 'e94_eg41_p11' and higher. See the package 'gProfileR' for accessing older versions from the 'g:Profiler' toolset.
Maintained by Liis Kolberg. Last updated 1 years ago.
6.4 match 4 stars 7.97 score 1.5k scripts 16 dependentsssnn-airr
scoper:Spectral Clustering-Based Method for Identifying B Cell Clones
Provides a computational framework for identification of B cell clones from Adaptive Immune Receptor Repertoire sequencing (AIRR-Seq) data. Three main functions are included (identicalClones, hierarchicalClones, and spectralClones) that perform clustering among sequences of BCRs/IGs (B cell receptors/immunoglobulins) which share the same V gene, J gene and junction length. Nouri N and Kleinstein SH (2018) <doi: 10.1093/bioinformatics/bty235>. Nouri N and Kleinstein SH (2019) <doi: 10.1101/788620>. Gupta NT, et al. (2017) <doi: 10.4049/jimmunol.1601850>.
Maintained by Susanna Marquez. Last updated 2 months ago.
9.3 match 5.43 score 89 scriptsbioc
ggtree:an R package for visualization of tree and annotation data
'ggtree' extends the 'ggplot2' plotting system, which implements the grammar of graphics. 'ggtree' is designed for visualization and annotation of phylogenetic trees and other tree-like structures with their annotation data.
Maintained by Guangchuang Yu. Last updated 5 months ago.
alignmentannotationclusteringdataimportmultiplesequencealignmentphylogeneticsreproducibleresearchsoftwarevisualizationannotationsggplot2phylogenetic-trees
3.0 match 864 stars 16.86 score 5.1k scripts 109 dependentsbioc
GeneNetworkBuilder:GeneNetworkBuilder: a bioconductor package for building regulatory network using ChIP-chip/ChIP-seq data and Gene Expression Data
Application for discovering direct or indirect targets of transcription factors using ChIP-chip or ChIP-seq, and microarray or RNA-seq gene expression data. Given a list of potential target genes of one TF from ChIP-chip or ChIP-seq, and the gene expression results, GeneNetworkBuilder generates a regulatory network of the TF.
Maintained by Jianhong Ou. Last updated 10 days ago.
sequencingmicroarraygraphandnetworkcpp
13.4 match 3.77 score 17 scriptsbioc
r3Cseq:Analysis of Chromosome Conformation Capture and Next-generation Sequencing (3C-seq)
This package is used for the analysis of long-range chromatin interactions from 3C-seq assay.
Maintained by Supat Thongjuea. Last updated 5 months ago.
10.3 match 3 stars 4.85 score 17 scriptspaterijk
MCDA:Support for the Multicriteria Decision Aiding Process
Support for the analyst in a Multicriteria Decision Aiding (MCDA) process with algorithms, preference elicitation and data visualisation functions. Sébastien Bigaret, Richard Hodgett, Patrick Meyer, Tatyana Mironova, Alexandru Olteanu (2017) Supporting the multi-criteria decision aiding process : R and the MCDA package, Euro Journal On Decision Processes, Volume 5, Issue 1 - 4, pages 169 - 194 <doi:10.1007/s40070-017-0064-1>.
Maintained by Patrick Meyer. Last updated 2 years ago.
8.3 match 30 stars 6.04 score 182 scriptsbioc
limma:Linear Models for Microarray and Omics Data
Data analysis, linear models and differential expression for omics data.
Maintained by Gordon Smyth. Last updated 7 days ago.
exonarraygeneexpressiontranscriptionalternativesplicingdifferentialexpressiondifferentialsplicinggenesetenrichmentdataimportbayesianclusteringregressiontimecoursemicroarraymicrornaarraymrnamicroarrayonechannelproprietaryplatformstwochannelsequencingrnaseqbatcheffectmultiplecomparisonnormalizationpreprocessingqualitycontrolbiomedicalinformaticscellbiologycheminformaticsepigeneticsfunctionalgenomicsgeneticsimmunooncologymetabolomicsproteomicssystemsbiologytranscriptomics
3.6 match 13.81 score 16k scripts 585 dependentsropensci
rdatacite:Client for the 'DataCite' API
Client for the web service methods provided by 'DataCite' (<https://www.datacite.org/>), including functions to interface with their 'RESTful' search API. The API is backed by 'Elasticsearch', allowing expressive queries, including faceting.
Maintained by Bianca Kramer. Last updated 2 years ago.
datascholarlydatasethttpsapiweb-servicesapi-wrapperdataciteidentifiermetadataoai-pmhsolr
10.0 match 25 stars 4.99 score 26 scriptskkholst
lava:Latent Variable Models
A general implementation of Structural Equation Models with latent variables (MLE, 2SLS, and composite likelihood estimators) with continuous, censored, and ordinal outcomes (Holst and Budtz-Joergensen (2013) <doi:10.1007/s00180-012-0344-y>). Mixture latent variable models and non-linear latent variable models (Holst and Budtz-Joergensen (2020) <doi:10.1093/biostatistics/kxy082>). The package also provides methods for graph exploration (d-separation, back-door criterion), simulation of general non-linear latent variable models, and estimation of influence functions for a broad range of statistical models.
Maintained by Klaus K. Holst. Last updated 2 months ago.
latent-variable-modelssimulationstatisticsstructural-equation-models
3.9 match 33 stars 12.85 score 610 scripts 476 dependentsbioc
MSnbase:Base Functions and Classes for Mass Spectrometry and Proteomics
MSnbase provides infrastructure for manipulation, processing and visualisation of mass spectrometry and proteomics data, ranging from raw to quantitative and annotated data.
Maintained by Laurent Gatto. Last updated 4 days ago.
immunooncologyinfrastructureproteomicsmassspectrometryqualitycontroldataimportbioconductorbioinformaticsmass-spectrometryproteomics-datavisualisationcpp
3.9 match 130 stars 12.81 score 772 scripts 36 dependentsbioc
genefu:Computation of Gene Expression-Based Signatures in Breast Cancer
This package contains functions implementing various tasks usually required by gene expression analysis, especially in breast cancer studies: gene mapping between different microarray platforms, identification of molecular subtypes, implementation of published gene signatures, gene selection, and survival analysis.
Maintained by Benjamin Haibe-Kains. Last updated 4 months ago.
differentialexpressiongeneexpressionvisualizationclusteringclassification
6.7 match 7.42 score 193 scripts 3 dependentsmountainmath
cmhc:Access, Retrieve, and Work with CMHC Data
Wrapper around the Canada Mortgage and Housing Corporation (CMHC) web interface. It enables programmatic and reproducible access to a wide variety of housing data from CMHC.
Maintained by Jens von Bergmann. Last updated 1 months ago.
6.8 match 20 stars 7.09 score 68 scriptsropensci
EDIutils:An API Client for the Environmental Data Initiative Repository
A client for the Environmental Data Initiative repository REST API. The 'EDI' data repository <https://portal.edirepository.org/nis/home.jsp> is for publication and reuse of ecological data with emphasis on metadata accuracy and completeness. It is built upon the 'PASTA+' software stack <https://pastaplus-core.readthedocs.io/en/latest/index.html#> and was developed in collaboration with the US 'LTER' Network <https://lternet.edu/>. 'EDIutils' includes functions to search and access existing data, evaluate and upload new data, and assist with other data management tasks common to repository users.
Maintained by Colin Smith. Last updated 1 years ago.
ecologyeml-metadataopen-accessopen-dataresearch-data-managementresearch-data-repository
7.4 match 10 stars 6.47 score 117 scriptsbioc
flowWorkspace:Infrastructure for representing and interacting with gated and ungated cytometry data sets.
This package is designed to facilitate comparison of automated gating methods against manual gating done in flowJo. This package allows you to import basic flowJo workspaces into BioConductor and replicate the gating from flowJo using the flowCore functionality. Gating hierarchies, groups of samples, compensation, and transformation are performed so that the output matches the flowJo analysis.
Maintained by Greg Finak. Last updated 11 days ago.
immunooncologyflowcytometrydataimportpreprocessingdatarepresentationzlibopenblascpp
6.0 match 7.89 score 576 scripts 10 dependentsbioc
scQTLtools:An R package for single-cell eQTL analysis and visualization
This package specializes in analyzing and visualizing eQTL at the single-cell level. It can read gene expression matrices, Seurat data, or SingleCellExperiment objects, along with genotype data. It offers a function for cis-eQTL analysis to detect eQTL within a given range, and another function to fit models with three methods. Using this package, users can also generate single-cell level visualization results.
Maintained by Xiaofeng Wu. Last updated 2 months ago.
softwaregeneexpressiongeneticvariabilitysnpdifferentialexpressiongenomicvariationvariantdetectiongeneticsfunctionalgenomicssystemsbiologyregressionsinglecellnormalizationvisualizationrna-seqsc-eqtl
9.5 match 3 stars 4.95 scorebnaras
distcomp:Computations over Distributed Data without Aggregation
Implementing algorithms and fitting models when sites (possibly remote) share computation summaries rather than actual data over HTTP with a master R process (using 'opencpu', for example). A stratified Cox model and a singular value decomposition are provided. The former makes direct use of code from the R 'survival' package. (That is, the underlying Cox model code is derived from that in the R 'survival' package.) Sites may provide data via several means: CSV files, Redcap API, etc. An extensible design allows for new methods to be added in the future and includes facilities for local prototyping and testing. Web applications are provided (via 'shiny') for the implemented methods to help in designing and deploying the computations.
Maintained by Balasubramanian Narasimhan. Last updated 9 months ago.
8.9 match 9 stars 5.33 score 47 scriptsbioc
scran:Methods for Single-Cell RNA-Seq Data Analysis
Implements miscellaneous functions for interpretation of single-cell RNA-seq data. Methods are provided for assignment of cell cycle phase, detection of highly variable and significantly correlated genes, identification of marker genes, and other common tasks in routine single-cell analysis workflows.
Maintained by Aaron Lun. Last updated 5 months ago.
immunooncologynormalizationsequencingrnaseqsoftwaregeneexpressiontranscriptomicssinglecellclusteringbioconductor-packagehuman-cell-atlassingle-cell-rna-seqopenblascpp
3.6 match 41 stars 13.14 score 7.6k scripts 36 dependentsropensci
rotl:Interface to the 'Open Tree of Life' API
An interface to the 'Open Tree of Life' API to retrieve phylogenetic trees, information about studies used to assemble the synthetic tree, and utilities to match taxonomic names to 'Open Tree identifiers'. The 'Open Tree of Life' aims at assembling a comprehensive phylogenetic tree for all named species.
Maintained by Francois Michonneau. Last updated 2 years ago.
metadataropensciphylogeneticsindependant-contrastsbiodiversitypeer-reviewedphylogenytaxonomy
3.9 match 40 stars 12.05 score 356 scripts 29 dependentsegeulgen
driveR:Prioritizing Cancer Driver Genes Using Genomics Data
Cancer genomes contain large numbers of somatic alterations but few genes drive tumor development. Identifying cancer driver genes is critical for precision oncology. Most current approaches identify driver genes either based on mutational recurrence or using estimated scores predicting the functional consequences of mutations. 'driveR' is a tool for personalized or batch analysis of genomic data for driver gene prioritization by combining genomic information and prior biological knowledge. As features, 'driveR' uses coding impact metaprediction scores, non-coding impact scores, somatic copy number alteration scores, hotspot gene/double-hit gene condition, 'phenolyzer' gene scores and memberships to cancer-related KEGG pathways. It uses these features to estimate cancer-type-specific probability for each gene of being a cancer driver using the related task of a multi-task learning classification model. The method is described in detail in Ulgen E, Sezerman OU. 2021. driveR: a novel method for prioritizing cancer driver genes using somatic genomics data. BMC Bioinformatics <doi:10.1186/s12859-021-04203-7>.
Maintained by Ege Ulgen. Last updated 2 years ago.
cancer-drivernessdriverdriver-gene-prioritizationidentify-driver-genesranking-genesscoring
7.4 match 15 stars 6.29 score 260 scriptsstatist7
sitar:Super Imposition by Translation and Rotation Growth Curve Analysis
Functions for fitting and plotting SITAR (Super Imposition by Translation And Rotation) growth curve models. SITAR is a shape-invariant model with a regression B-spline mean curve and subject-specific random effects on both the measurement and age scales. The model was first described by Lindstrom (1995) <doi:10.1002/sim.4780141807> and developed as the SITAR method by Cole et al (2010) <doi:10.1093/ije/dyq115>.
Maintained by Tim Cole. Last updated 2 months ago.
5.4 match 13 stars 8.69 score 58 scripts 3 dependentsbioc
Nebulosa:Single-Cell Data Visualisation Using Kernel Gene-Weighted Density Estimation
This package provides an enhanced visualization of single-cell data based on gene-weighted density estimation. Nebulosa recovers the signal from dropped-out features and allows the inspection of the joint expression from multiple features (e.g. genes). Seurat and SingleCellExperiment objects can be used within Nebulosa.
Maintained by Jose Alquicira-Hernandez. Last updated 5 months ago.
softwaregeneexpressionsinglecellvisualizationdimensionreductionsingle-cellsingle-cell-analysissingle-cell-multiomicssingle-cell-rna-seq
4.8 match 99 stars 9.66 score 494 scriptshugaped
MBNMAdose:Dose-Response MBNMA Models
Fits Bayesian dose-response model-based network meta-analysis (MBNMA) models that incorporate multiple doses within an agent by modelling different dose-response functions, as described by Mawdsley et al. (2016) <doi:10.1002/psp4.12091>. By modelling dose-response relationships this can connect networks of evidence that might otherwise be disconnected, and can improve precision on treatment estimates. Several common dose-response functions are provided; others may be added by the user. Various characteristics and assumptions can be flexibly added to the models, such as shared class effects. The consistency of direct and indirect evidence in the network can be assessed using unrelated mean effects models and/or by node-splitting at the treatment level.
Maintained by Hugo Pedder. Last updated 1 month ago.
7.0 match 10 stars 6.60 scorerrrlw
dragonking:Statistical Tools to Identify Dragon Kings
Statistical tests and test statistics to identify events in a dataset that are dragon kings (DKs). The statistical methods in this package were reviewed in Wheatley & Sornette (2015) <doi:10.2139/ssrn.2645709>.
Maintained by Raoul Wadhwa. Last updated 6 years ago.
15.4 match 2 stars 3.00 score 6 scriptsrwalkbout
walkboutr:Generate Walk Bouts from GPS and Accelerometry Data
Process GPS and accelerometry data to generate walk bouts. A walk bout is a period of activity with accelerometer movement matching the patterns of walking with corresponding GPS measurements that confirm travel. The inputs of the 'walkboutr' package are individual-level accelerometry and GPS data. The outputs of the model are walk bouts with corresponding times, duration, and summary statistics on the sample population, which collapse all personally identifying information. These bouts can be used to measure walking both as an outcome of a change to the built environment or as a predictor of health outcomes such as a cardioprotective behavior. Kang B, Moudon AV, Hurvitz PM, Saelens BE (2017) <doi:10.1016/j.trd.2017.09.026>.
Maintained by Lauren Blair Wilner. Last updated 1 year ago.
11.0 match 4.18 score 9 scriptsbioc
cTRAP:Identification of candidate causal perturbations from differential gene expression data
Compare differential gene expression results with those from known cellular perturbations (such as gene knock-down, overexpression or small molecules) derived from the Connectivity Map. Such analyses make it possible not only to infer the molecular causes of the observed difference in gene expression but also to identify small molecules that could drive or revert specific transcriptomic alterations.
Maintained by Nuno Saraiva-Agostinho. Last updated 5 months ago.
differentialexpressiongeneexpressionrnaseqtranscriptomicspathwaysimmunooncologygenesetenrichmentbioconductorbioinformaticscmapgene-expressionl1000
9.0 match 5 stars 5.08 score 16 scriptsjtextor
dagitty:Graphical Analysis of Structural Causal Models
A port of the web-based software 'DAGitty', available at <https://dagitty.net>, for analyzing structural causal models (also known as directed acyclic graphs or DAGs). This package computes covariate adjustment sets for estimating causal effects, enumerates instrumental variables, derives testable implications (d-separation and vanishing tetrads), generates equivalent models, and includes a simple facility for data simulation.
Maintained by Johannes Textor. Last updated 3 months ago.
3.6 match 302 stars 12.83 score 1.7k scripts 11 dependentsramiromagno
gwasrapidd:'REST' 'API' Client for the 'NHGRI'-'EBI' 'GWAS' Catalog
'GWAS' R 'API' Data Download. This package provides easy access to the 'NHGRI'-'EBI' 'GWAS' Catalog data by accessing the 'REST' 'API' <https://www.ebi.ac.uk/gwas/rest/docs/api/>.
Maintained by Ramiro Magno. Last updated 1 year ago.
thirdpartyclientbiomedicalinformaticsgenomewideassociationsnpassociation-studiesgwas-cataloghumanrest-clienttraittrait-ontology
5.6 match 95 stars 8.10 score 49 scripts 1 dependentsbioc
betaHMM:A Hidden Markov Model Approach for Identifying Differentially Methylated Sites and Regions for Beta-Valued DNA Methylation Data
A novel approach that uses a homogeneous hidden Markov model to effectively model untransformed beta values and identify DMCs while considering the spatial correlation of adjacent CpG sites.
Maintained by Koyel Majumdar. Last updated 3 months ago.
dnamethylationdifferentialmethylationimmunooncologybiomedicalinformaticsmethylationarraysoftwaremultiplecomparisonsequencingspatialcoveragegenetargethiddenmarkovmodelmicroarray
10.9 match 4.18 scoresfilges
umiAnalyzer:Tools for Analyzing Sequencing Data with Unique Molecular Identifiers
Tools for analyzing sequencing data containing unique molecular identifiers generated by 'UMIErrorCorrect' (<https://github.com/stahlberggroup/umierrorcorrect>).
Maintained by Stefan Filges. Last updated 3 years ago.
targeted-sequencingunique-molecular-identifiersvariant-analysis
10.1 match 4.46 score 58 scriptsbioc
BrowserViz:BrowserViz: interactive R/browser graphics using websockets and JSON
Interactive graphics in a web browser from R, using websockets and JSON.
Maintained by Arkadiusz Gladki. Last updated 5 months ago.
7.0 match 2 stars 6.28 score 20 scripts 2 dependentsadamlilith
fasterRaster:Faster Raster and Spatial Vector Processing Using 'GRASS GIS'
Processing of large-in-memory/large-on-disk rasters and spatial vectors using 'GRASS GIS' <https://grass.osgeo.org/>. Most functions in the 'terra' package are recreated. Processing of medium-sized and smaller spatial objects will nearly always be faster using 'terra' or 'sf', but for large-in-memory/large-on-disk objects, 'fasterRaster' may be faster. To use most of the functions, you must have the stand-alone version (not the 'OSGeo4W' installer version) of 'GRASS GIS' 8.0 or higher.
Maintained by Adam B. Smith. Last updated 20 days ago.
aspectdistancefragmentationfragmentation-indicesgisgrassgrass-gisrasterraster-projectionrasterizeslopetopographyvectorization
5.6 match 58 stars 7.69 score 8 scriptsfrbcesab
forcis:An R Client to Access the FORCIS Database
Provides an interface to the FORCIS database (<https://zenodo.org/doi/10.5281/zenodo.7390791>) on global foraminifera distribution. This package allows users to download and handle FORCIS data. It is part of the FRB-CESAB working group FORCIS. <https://www.fondationbiodiversite.fr/en/the-frb-in-action/programs-and-projects/le-cesab/forcis/>.
Maintained by Nicolas Casajus. Last updated 13 days ago.
7.5 match 4 stars 5.76 score 5 scriptsbioc
CAGEfightR:Analysis of Cap Analysis of Gene Expression (CAGE) data using Bioconductor
CAGE is a widely used high-throughput assay for measuring transcription start site (TSS) activity. CAGEfightR is an R/Bioconductor package for performing a wide range of common data analysis tasks for CAGE and 5'-end data in general. Core functionality includes: import of CAGE TSSs (CTSSs), tag (or unidirectional) clustering for TSS identification, bidirectional clustering for enhancer identification, annotation with transcript and gene models, correlation of TSS and enhancer expression, calculation of TSS shapes, quantification of CAGE expression as expression matrices and genome browser visualization.
Maintained by Malte Thodberg. Last updated 5 months ago.
softwaretranscriptioncoveragegeneexpressiongeneregulationpeakdetectiondataimportdatarepresentationtranscriptomicssequencingannotationgenomebrowsersnormalizationpreprocessingvisualization
5.8 match 8 stars 7.46 score 67 scripts 1 dependentsbioc
DEGseq:Identify Differentially Expressed Genes from RNA-seq data
DEGseq is an R package to identify differentially expressed genes from RNA-Seq data.
Maintained by Likun Wang. Last updated 5 months ago.
rnaseqpreprocessinggeneexpressiondifferentialexpressionimmunooncologycpp
8.5 match 5.03 score 54 scriptsrandrescastaneda
joyn:Tool for Diagnosis of Tables Joins and Complementary Join Features
Tool for diagnosing table joins. It combines the speed of `collapse` and `data.table`, the flexibility of `dplyr`, and the diagnosis and features of the `merge` command in `Stata`.
Maintained by R.Andres Castaneda. Last updated 3 months ago.
6.1 match 9 stars 7.00 score 31 scriptsbioc
OmaDB:R wrapper for the OMA REST API
A package for downloading orthology prediction data from the OMA database.
Maintained by Klara Kaleb. Last updated 5 months ago.
softwarecomparativegenomicsfunctionalgenomicsgeneticsannotationgofunctionalprediction
6.8 match 2 stars 6.23 score 5 scriptsropensci
pangaear:Client for the 'Pangaea' Database
Tools to interact with the 'Pangaea' Database (<https://www.pangaea.de>), including functions for searching for data, fetching 'datasets' by 'dataset' 'ID', and working with the 'Pangaea' 'OAI-PMH' service.
Maintained by Scott Chamberlain. Last updated 2 years ago.
pangaeaenvironmental scienceearth sciencearchivepaleontologyecologychemistryatmosphereapi-clientdatapaleobiologyscientificwebservice-client
6.8 match 21 stars 6.27 score 29 scriptsbioc
cBioPortalData:Exposes and Makes Available Data from the cBioPortal Web Resources
The cBioPortalData R package accesses study datasets from the cBio Cancer Genomics Portal. It accesses the data either from the pre-packaged zip / tar files or from the API interface that was recently implemented by the cBioPortal Data Team. The package can provide data either in tabular format or as a MultiAssayExperiment object that uses familiar Bioconductor data representations.
Maintained by Marcel Ramos. Last updated 11 days ago.
softwareinfrastructurethirdpartyclientbioconductor-packagenci-itcru24ca289073
4.2 match 33 stars 10.15 score 147 scripts 4 dependentsropensci
allodb:Tree Biomass Estimation at Extra-Tropical Forest Plots
Standardize and simplify the tree biomass estimation process across globally distributed extratropical forests.
Maintained by Erika Gonzalez-Akre. Last updated 11 days ago.
7.1 match 38 stars 5.94 score 38 scriptsbioc
RegionalST:Investigating regions of interest and performing regional cell type-specific analysis with spatial transcriptomics data
This package analyzes spatial transcriptomics data through cross-regional cell type-specific analysis. It selects regions of interest (ROIs) and identifies cross-regional cell type-specific differential signals. The ROIs can be selected using an automatic algorithm or through manual selection. It facilitates manual selection of ROIs using a shiny application.
Maintained by Ziyi Li. Last updated 3 months ago.
spatialtranscriptomicsreactomekegg
9.8 match 4.30 score 8 scriptsjosiahparry
sfdep:Spatial Dependence for Simple Features
An interface to 'spdep' to integrate with 'sf' objects and the 'tidyverse'.
Maintained by Dexter Locke. Last updated 6 months ago.
6.0 match 130 stars 7.01 score 130 scriptsbioc
RcisTarget:RcisTarget Identify transcription factor binding motifs enriched on a list of genes or genomic regions
RcisTarget identifies transcription factor binding motifs (TFBS) over-represented on a gene list. In a first step, RcisTarget selects DNA motifs that are significantly over-represented in the surroundings of the transcription start site (TSS) of the genes in the gene-set. This is achieved by using a database that contains genome-wide cross-species rankings for each motif. The motifs are then annotated to TFs, and those that have a high Normalized Enrichment Score (NES) are retained. Finally, for each motif and gene-set, RcisTarget predicts the candidate target genes (i.e. genes in the gene-set that are ranked above the leading edge).
Maintained by Gert Hulselmans. Last updated 5 months ago.
generegulationmotifannotationtranscriptomicstranscriptiongenesetenrichmentgenetarget
4.4 match 37 stars 9.47 score 191 scriptsbioc
BioNERO:Biological Network Reconstruction Omnibus
BioNERO aims to integrate all aspects of biological network inference in a single package, including data preprocessing, exploratory analyses, network inference, and analyses for biological interpretations. BioNERO can be used to infer gene coexpression networks (GCNs) and gene regulatory networks (GRNs) from gene expression data. Additionally, it can be used to explore topological properties of protein-protein interaction (PPI) networks. GCN inference relies on the popular WGCNA algorithm. GRN inference is based on the "wisdom of the crowds" principle, which consists in inferring GRNs with multiple algorithms (here, CLR, GENIE3 and ARACNE) and calculating the average rank for each interaction pair. As all steps of network analyses are included in this package, BioNERO spares users from having to learn the syntaxes of several packages and how to communicate between them. Finally, users can also identify consensus modules across independent expression sets and calculate intra- and interspecies module preservation statistics between different networks.
Maintained by Fabricio Almeida-Silva. Last updated 5 months ago.
softwaregeneexpressiongeneregulationsystemsbiologygraphandnetworkpreprocessingnetworknetworkinference
5.3 match 27 stars 7.78 score 50 scripts 1 dependentsbioc
epimutacions:Robust outlier identification for DNA methylation data
The package includes some statistical outlier detection methods for epimutation detection in DNA methylation data. The methods included in the package are MANOVA, multivariate linear models, isolation forest, robust Mahalanobis distance, quantile and beta. The methods compare a case sample with a suspected disease against a reference panel (composed of healthy individuals) to identify epimutations in the given case sample. It also contains functions to annotate and visualize the identified epimutations.
Maintained by Dolors Pelegri-Siso. Last updated 5 months ago.
dnamethylationbiologicalquestionpreprocessingstatisticalmethodnormalizationcpp
9.8 match 4.23 score 28 scriptsthomasjemielita
StratifiedMedicine:Stratified Medicine
A toolkit for stratified medicine, subgroup identification, and precision medicine. Current tools include (1) filtering models (reduce covariate space), (2) patient-level estimate models (counterfactual patient-level quantities, such as the conditional average treatment effect), (3) subgroup identification models (find subsets of patients with similar treatment effects), and (4) treatment effect estimation and inference (for the overall population and discovered subgroups). These tools can be customized and are directly used in PRISM (patient response identifiers for stratified medicine; Jemielita and Mehrotra 2019 <arXiv:1912.03337>). This package is in beta and will be continually updated.
Maintained by Thomas Jemielita. Last updated 3 years ago.
8.7 match 2 stars 4.73 score 27 scriptslarmarange
broom.helpers:Helpers for Model Coefficients Tibbles
Provides a suite of functions to work with regression model 'broom::tidy()' tibbles. The suite includes functions to group regression model terms by variable, insert reference and header rows for categorical variables, add variable labels, and more.
Maintained by Joseph Larmarange. Last updated 11 days ago.
3.6 match 22 stars 11.45 score 165 scripts 2 dependentsbioc
Category:Category Analysis
A collection of tools for performing category (gene set enrichment) analysis.
Maintained by Bioconductor Package Maintainer. Last updated 5 months ago.
annotationgopathwaysgenesetenrichment
5.2 match 7.93 score 183 scripts 16 dependentsbioc
MetMashR:Metabolite Mashing with R
A package to merge, filter, sort, organise and otherwise mash together metabolite annotation tables. Metabolite annotations can be imported from multiple sources (software) and combined using workflow steps based on S4 class templates derived from the `struct` package. Other modular workflow steps such as filtering, merging, splitting, normalisation and rest-api queries are included.
Maintained by Gavin Rhys Lloyd. Last updated 5 months ago.
7.0 match 2 stars 5.81 score 5 scriptsopatp
GTFSwizard:Exploring and Manipulating 'GTFS' Files
Exploring, analyzing, and manipulating General Transit Feed Specification (GTFS) files, which represent public transportation schedules and geographic data. The package allows users to filter data by routes, trips, stops, and time, generate spatial visualizations, and perform detailed analyses of transit networks, including headway, dwell times, and route frequencies. Designed for transit planners, researchers, and data analysts, 'GTFSwizard' integrates functionalities from popular packages to enable efficient GTFS data manipulation and visualization.
Maintained by Nelson de O. Quesado Filho. Last updated 2 months ago.
9.2 match 1 stars 4.41 scorebioc
gDRcore:Processing functions and interface to process and analyze drug dose-response data
This package contains core functions to process and analyze drug response data. The package provides tools for normalizing, averaging, and calculating gDR metrics data. All core functions are wrapped into the pipeline function, allowing the data to be analyzed in a straightforward way.
Maintained by Arkadiusz Gladki. Last updated 6 days ago.
5.6 match 2 stars 7.25 score 4 scripts 1 dependentsbioc
CatsCradle:This package provides methods for analysing spatial transcriptomics data and for discovering gene clusters
This package addresses two broad areas. It allows for in-depth analysis of spatial transcriptomic data by identifying tissue neighbourhoods. These are contiguous regions of tissue surrounding individual cells. 'CatsCradle' allows for the categorisation of neighbourhoods by the cell types contained in them and the genes expressed in them. In particular, it produces Seurat objects whose individual elements are neighbourhoods rather than cells. In addition, it enables the categorisation and annotation of genes by producing Seurat objects whose elements are genes.
Maintained by Michael Shapiro. Last updated 1 month ago.
biologicalquestionstatisticalmethodgeneexpressionsinglecelltranscriptomicsspatial
6.2 match 3 stars 6.50 scorebioc
matter:Out-of-core statistical computing and signal processing
Toolbox for larger-than-memory scientific computing and visualization, providing efficient out-of-core data structures using files or shared memory, for dense and sparse vectors, matrices, and arrays, with applications to nonuniformly sampled signals and images.
Maintained by Kylie A. Bemis. Last updated 3 months ago.
infrastructuredatarepresentationdataimportdimensionreductionpreprocessingcpp
4.3 match 57 stars 9.52 score 64 scripts 2 dependentsbioc
roar:Identify differential APA usage from RNA-seq alignments
Identify preferential usage of APA sites, comparing two biological conditions, starting from known alternative sites and alignments obtained from standard RNA-seq experiments.
Maintained by Elena Grassi. Last updated 5 months ago.
sequencinghighthroughputsequencingrnaseqtranscription
8.0 match 4 stars 5.05 score 14 scriptsheardacat
Ramble:Parser Combinator for R
Parser generator for R using combinatory parsers. It is inspired by combinatory parsers developed in Haskell.
Maintained by Chapman Siu. Last updated 8 years ago.
combinatory-parsersparser-combinatorsparsing
6.8 match 22 stars 5.93 score 39 scriptsropensci
stplanr:Sustainable Transport Planning
Tools for transport planning with an emphasis on spatial transport data and non-motorized modes. The package was originally developed to support the 'Propensity to Cycle Tool', a publicly available strategic cycle network planning tool (Lovelace et al. 2017) <doi:10.5198/jtlu.2016.862>, but has since been extended to support public transport routing and accessibility analysis (Moreno-Monroy et al. 2017) <doi:10.1016/j.jtrangeo.2017.08.012> and routing with locally hosted routing engines such as 'OSRM' (Lowans et al. 2023) <doi:10.1016/j.enconman.2023.117337>. The main functions are for creating and manipulating geographic "desire lines" from origin-destination (OD) data (building on the 'od' package); calculating routes on the transport network locally and via interfaces to routing services such as <https://cyclestreets.net/> (Desjardins et al. 2021) <doi:10.1007/s11116-021-10197-1>; and calculating route segment attributes such as bearing. The package implements the 'travel flow aggregation' method described in Morgan and Lovelace (2020) <doi:10.1177/2399808320942779> and the 'OD jittering' method described in Lovelace et al. (2022) <doi:10.32866/001c.33873>. Further information on the package's aim and scope can be found in the vignettes and in a paper in the R Journal (Lovelace and Ellison 2018) <doi:10.32614/RJ-2018-053>, and in a paper outlining the landscape of open source software for geographic methods in transport planning (Lovelace, 2021) <doi:10.1007/s10109-020-00342-2>.
Maintained by Robin Lovelace. Last updated 7 months ago.
cyclecyclingdesire-linesorigin-destinationpeer-reviewedpubic-transportroute-networkroutesroutingspatialtransporttransport-planningtransportationwalking
3.3 match 427 stars 12.31 score 684 scripts 3 dependentsbioc
MoonlightR:Identify oncogenes and tumor suppressor genes from omics data
Motivation: The understanding of cancer mechanism requires the identification of genes playing a role in the development of the pathology and the characterization of their role (notably oncogenes and tumor suppressors). Results: We present an R/bioconductor package called MoonlightR which returns a list of candidate driver genes for specific cancer types on the basis of TCGA expression data. The method first infers gene regulatory networks and then carries out a functional enrichment analysis (FEA) (implementing an upstream regulator analysis, URA) to score the importance of well-known biological processes with respect to the studied cancer type. Eventually, by means of random forests, MoonlightR predicts two specific roles for the candidate driver genes: i) tumor suppressor genes (TSGs) and ii) oncogenes (OCGs). As a consequence, this methodology does not only identify genes playing a dual role (e.g. TSG in one cancer type and OCG in another) but also helps in elucidating the biological processes underlying their specific roles. In particular, MoonlightR can be used to discover OCGs and TSGs in the same cancer type. This may help in answering the question whether some genes change role between early stages (I, II) and late stages (III, IV) in breast cancer. In the future, this analysis could be useful to determine the causes of different resistances to chemotherapeutic treatments.
Maintained by Matteo Tiberti. Last updated 5 months ago.
dnamethylationdifferentialmethylationgeneregulationgeneexpressionmethylationarraydifferentialexpressionpathwaysnetworksurvivalgenesetenrichmentnetworkenrichment
6.1 match 17 stars 6.57 scorebioc
bioCancer:Interactive Multi-Omics Cancers Data Visualization and Analysis
This package is a Shiny App to interactively visualize and analyse multi-assay cancer genomic data.
Maintained by Karim Mezhoud. Last updated 5 months ago.
guidatarepresentationnetworkmultiplecomparisonpathwaysreactomevisualizationgeneexpressiongenetargetanalysisbiocancer-interfacecancercancer-studiesrmarkdown
6.7 match 20 stars 5.95 score 7 scriptsbioc
BioTIP:BioTIP: An R package for characterization of Biological Tipping-Point
Adopting tipping-point theory to transcriptome profiles to unravel disease regulatory trajectory.
Maintained by Yuxi (Jennifer) Sun. Last updated 5 months ago.
sequencingrnaseqgeneexpressiontranscriptionsoftware
5.8 match 18 stars 6.84 score 37 scriptsbioc
fobitools:Tools for Manipulating the FOBI Ontology
A set of tools for interacting with the Food-Biomarker Ontology (FOBI). A collection of basic manipulation tools for biological significance analysis, graphs, and text mining strategies for annotating nutritional data.
Maintained by Pol Castellano-Escuder. Last updated 4 months ago.
massspectrometrymetabolomicssoftwarevisualizationbiomedicalinformaticsgraphandnetworkannotationcheminformaticspathwaysgenesetenrichmentbiological-intrerpretationbiological-knowledgebiological-significance-analysisenrichment-analysisfood-biomarker-ontologyknowledge-graphnutritionobofoundryontologytext-mining
7.8 match 1 stars 5.08 score 5 scriptsjknowles
merTools:Tools for Analyzing Mixed Effect Regression Models
Provides methods for extracting results from mixed-effect model objects fit with the 'lme4' package. Allows construction of prediction intervals efficiently from large scale linear and generalized linear mixed-effects models. This method draws from the simulation framework used in the Gelman and Hill (2007) textbook: Data Analysis Using Regression and Multilevel/Hierarchical Models.
Maintained by Jared E. Knowles. Last updated 1 year ago.
3.8 match 105 stars 10.49 score 768 scriptspredictiveecology
SpaDES.tools:Additional Tools for Developing Spatially Explicit Discrete Event Simulation (SpaDES) Models
Provides GIS and map utilities, plus additional modeling tools for developing cellular automata, dynamic raster models, and agent based models in 'SpaDES'. Included are various methods for spatial spreading, spatial agents, GIS operations, random map generation, and others. See '?SpaDES.tools' for a categorized overview of these additional tools. The suggested package 'NLMR' can be installed from the following repository: (<https://PredictiveEcology.r-universe.dev>).
Maintained by Alex M Chubaty. Last updated 4 months ago.
5.2 match 4 stars 7.55 score 52 scripts 6 dependentstbates
umx:Structural Equation Modeling and Twin Modeling in R
Quickly create, run, and report structural equation models, and twin models. See '?umx' for help, and umx_open_CRAN_page("umx") for NEWS. Timothy C. Bates, Michael C. Neale, Hermine H. Maes, (2019). umx: A library for Structural Equation and Twin Modelling in R. Twin Research and Human Genetics, 22, 27-41. <doi:10.1017/thg.2019.2>.
Maintained by Timothy C. Bates. Last updated 3 days ago.
behavior-geneticsgeneticsopenmxpsychologysemstatisticsstructural-equation-modelingtutorialstwin-modelsumx
4.1 match 44 stars 9.45 score 472 scriptsbioc
InPAS:Identify Novel Alternative PolyAdenylation Sites (PAS) from RNA-seq data
Alternative polyadenylation (APA) is one of the important post-transcriptional regulation mechanisms which occurs in most human genes. InPAS facilitates the discovery of novel APA sites and the differential usage of APA sites from RNA-Seq data. It leverages cleanUpdTSeq to fine tune identified APA sites by removing false sites.
Maintained by Jianhong Ou. Last updated 2 months ago.
alternative polyadenylationdifferential polyadenylation site usagerna-seqgene regulationtranscription
9.0 match 4.30 score 1 scriptserichare
bulletr:Algorithms for Matching Bullet Lands
Analyze bullet lands using nonparametric methods. We provide a reading routine for x3p files (see <http://www.openfmc.org> for more information) and a host of analysis functions designed to assess the probability that two bullets were fired from the same gun barrel.
Maintained by Eric Hare. Last updated 7 years ago.
10.4 match 3.72 score 52 scriptsr-dbi
RMySQL:Database Interface and 'MySQL' Driver for R
Legacy 'DBI' interface to 'MySQL' / 'MariaDB' based on old code ported from S-PLUS. A modern 'MySQL' client written in 'C++' is available from the 'RMariaDB' package.
Maintained by Jeroen Ooms. Last updated 1 month ago.
2.8 match 209 stars 13.68 score 3.7k scripts 15 dependentsbioc
M3Drop:Michaelis-Menten Modelling of Dropouts in single-cell RNASeq
This package fits a model to the pattern of dropouts in single-cell RNASeq data. This model is used as a null to identify significantly variable (i.e. differentially expressed) genes for use in downstream analysis, such as clustering cells. Also includes a method for calculating exact Pearson residuals in UMI-tagged data using a library-size aware negative binomial model.
Maintained by Tallulah Andrews. Last updated 5 months ago.
rnaseqsequencingtranscriptomicsgeneexpressionsoftwaredifferentialexpressiondimensionreductionfeatureextractionhuman-cell-atlasrna-seqsingle-cellsingle-cell-rna-seq
4.4 match 29 stars 8.71 score 119 scripts 2 dependents