Showing 200 of 422 total results
dirkschumacher
listcomp:List Comprehensions
An implementation of list comprehensions as purely syntactic sugar with a minor runtime overhead. It constructs nested for-loops and executes the byte-compiled loops to collect the results.
Maintained by Dirk Schumacher. Last updated 3 years ago.
comprehensionslist-comprehensionslistcomprehensions
24.0 match 19 stars 5.33 score 3 scripts 7 dependents
cmann3
eList:List Comprehension and Tools
Create list comprehensions (and other types of comprehension) similar to those in 'Python', 'Haskell', and other languages. List comprehension in 'R' converts a regular for() loop into a vectorized lapply() function. Support for looping with multiple variables, parallelization, and across non-standard objects is included. The package also contains a variety of functions to help with list comprehension.
Maintained by Chris Mann. Last updated 4 years ago.
20.3 match 2 stars 4.48 score 9 scripts 1 dependents
patrickroocks
listcompr:List Comprehension for R
Syntactic shortcuts for creating synthetic lists, vectors, data frames, and matrices using list comprehension.
Maintained by Patrick Roocks. Last updated 3 years ago.
data-frameslist-comprehensionmatrixsyntactic-sugarvector
19.5 match 5 stars 4.40 score 5 scripts
gdemin
comprehenr:List Comprehensions
Provides 'Python'-style list comprehensions. List comprehension expressions use the usual loops (for(), while() and repeat()) and the usual if() as list producers. In many cases this gives a more concise notation than the standard "*apply + filter" strategy.
Maintained by Gregory Demin. Last updated 2 years ago.
9.9 match 20 stars 7.45 score 228 scripts 4 dependents
bioc
singleCellTK:Comprehensive and Interactive Analysis of Single Cell RNA-Seq Data
The Single Cell Toolkit (SCTK) in the singleCellTK package provides an interface to popular tools for importing, quality control, analysis, and visualization of single cell RNA-seq data. SCTK allows users to seamlessly integrate tools from various packages at different stages of the analysis workflow. A general "a la carte" workflow gives users the ability to access multiple methods for data importing, calculation of general QC metrics, doublet detection, ambient RNA estimation and removal, filtering, normalization, batch correction or integration, dimensionality reduction, 2-D embedding, clustering, marker detection, differential expression, cell type labeling, pathway analysis, and data exporting. Curated workflows can be used to run Seurat and Celda. Streamlined quality control can be performed on the command line using the SCTK-QC pipeline. Users can analyze their data using commands in the R console or by using an interactive Shiny Graphical User Interface (GUI). Specific analyses or entire workflows can be summarized and shared with comprehensive HTML reports generated by Rmarkdown. Additional documentation and vignettes can be found at camplab.net/sctk.
Maintained by Joshua David Campbell. Last updated 24 days ago.
singlecellgeneexpressiondifferentialexpressionalignmentclusteringimmunooncologybatcheffectnormalizationqualitycontroldataimportgui
7.0 match 181 stars 10.16 score 252 scripts
massimoaria
bibliometrix:Comprehensive Science Mapping Analysis
Tool for quantitative research in scientometrics and bibliometrics. It implements the comprehensive workflow for science mapping analysis proposed in Aria M. and Cuccurullo C. (2017) <doi:10.1016/j.joi.2017.08.007>. 'bibliometrix' provides various routines for importing bibliographic data from 'SCOPUS', 'Clarivate Analytics Web of Science' (<https://www.webofknowledge.com/>), 'Digital Science Dimensions' (<https://www.dimensions.ai/>), 'OpenAlex' (<https://openalex.org/>), 'Cochrane Library' (<https://www.cochranelibrary.com/>), 'Lens' (<https://lens.org>), and 'PubMed' (<https://pubmed.ncbi.nlm.nih.gov/>) databases, performing bibliometric analysis and building networks for co-citation, coupling, scientific collaboration and co-word analysis.
Maintained by Massimo Aria. Last updated 8 days ago.
bibliometric-analysisbibliometricscitationcitation-networkcitationsco-authorsco-occurenceco-word-analysiscorrespondence-analysiscouplingisi-webjournalmanuscriptquantitative-analysisscholarssciencescience-mappingscientificscientometricsscopus
5.5 match 545 stars 12.54 score 518 scripts 2 dependents
r-lum
Luminescence:Comprehensive Luminescence Dating Data Analysis
A collection of various R functions for the purpose of Luminescence dating data analysis. This includes, amongst others, data import, export, application of age models, curve deconvolution, sequence analysis and plotting of equivalent dose distributions.
Maintained by Sebastian Kreutzer. Last updated 1 day ago.
bayesian-statisticsdata-sciencegeochronologyluminescenceluminescence-datingopen-scienceoslplottingradiofluorescencetlxsygcpp
4.8 match 15 stars 10.77 score 178 scripts 8 dependents
r-forge
carData:Companion to Applied Regression Data Sets
Datasets to Accompany J. Fox and S. Weisberg, An R Companion to Applied Regression, Third Edition, Sage (2019).
Maintained by John Fox. Last updated 5 months ago.
3.8 match 12.41 score 944 scripts 919 dependents
lightbluetitan
usdatasets:A Comprehensive Collection of U.S. Datasets
Provides a diverse collection of U.S. datasets encompassing various fields such as crime, economics, education, finance, energy, healthcare, and more. It serves as a valuable resource for researchers and analysts seeking to perform in-depth analyses and derive insights from U.S.-specific data.
Maintained by Renzo Caceres Rossi. Last updated 5 months ago.
7.7 match 7 stars 5.99 score 141 scripts
bioc
musicatk:Mutational Signature Comprehensive Analysis Toolkit
Mutational signatures are carcinogenic exposures or aberrant cellular processes that can cause alterations to the genome. We created musicatk (MUtational SIgnature Comprehensive Analysis ToolKit) to address shortcomings in versatility and ease of use in other pre-existing computational tools. Although many different types of mutational data have been generated, current software packages do not have a flexible framework to allow users to mix and match different types of mutations in the mutational signature inference process. Musicatk enables users to count and combine multiple mutation types, including SBS, DBS, and indels. Musicatk calculates replication strand, transcription strand and combinations of these features along with discovery from unique and proprietary genomic feature associated with any mutation type. Musicatk also implements several methods for discovery of new signatures as well as methods to infer exposure given an existing set of signatures. Musicatk provides functions for visualization and downstream exploratory analysis including the ability to compare signatures between cohorts and find matching signatures in COSMIC V2 or COSMIC V3.
Maintained by Joshua D. Campbell. Last updated 5 months ago.
softwarebiologicalquestionsomaticmutationvariantannotation
6.5 match 13 stars 7.02 score 20 scripts
emilhvitfeldt
paletteer:Comprehensive Collection of Color Palettes
The choice of color palettes in R can be quite overwhelming, with palettes spread over many packages with many different APIs. This package aims to collect all color palettes across the R ecosystem under the same package with a streamlined API.
Maintained by Emil Hvitfeldt. Last updated 9 months ago.
3.1 match 957 stars 13.50 score 6.9k scripts 23 dependents
rstudio
reticulate:Interface to 'Python'
Interface to 'Python' modules, classes, and functions. When calling into 'Python', R data types are automatically converted to their equivalent 'Python' types. When values are returned from 'Python' to R they are converted back to R types. Compatible with all versions of 'Python' >= 2.7.
Maintained by Tomasz Kalinowski. Last updated 2 days ago.
2.0 match 1.7k stars 21.07 score 18k scripts 427 dependents
bioc
Informeasure:R implementation of information measures
This package consolidates a comprehensive set of information measurements, encompassing mutual information, conditional mutual information, interaction information, partial information decomposition, and part mutual information.
Maintained by Chu Pan. Last updated 5 months ago.
geneexpressionnetworkinferencenetworksoftware
9.4 match 3 stars 4.48 score 4 scripts
lightbluetitan
crimedatasets:A Comprehensive Collection of Crime-Related Datasets
A comprehensive collection of datasets exclusively focused on crimes, criminal activities, and related topics. This package serves as a valuable resource for researchers, analysts, and students interested in crime analysis, criminology, social and economic studies related to criminal behavior. Datasets span global and local contexts, with a mix of tabular and spatial data.
Maintained by Renzo Caceres Rossi. Last updated 3 months ago.
8.3 match 8 stars 4.90 score 3 scripts
bioc
miRBaseConverter:A comprehensive and high-efficiency tool for converting and retrieving the information of miRNAs in different miRBase versions
A comprehensive tool for converting and retrieving the miRNA Name, Accession, Sequence, Version, History and Family information in different miRBase versions. It can process a huge number of miRNAs in a short time without additional dependencies.
Maintained by Taosheng Xu. Last updated 5 months ago.
6.0 match 1 stars 6.50 score 70 scripts
acharaakshit
rminizinc:R Interface to 'MiniZinc'
Constraint optimization, or constraint programming, is the name given to identifying feasible solutions out of a very large set of candidates, where the problem can be modeled in terms of arbitrary constraints. 'MiniZinc' is a free and open-source constraint modeling language. Constraint satisfaction and discrete optimization problems can be formulated in a high-level modeling language. Models are compiled into an intermediate representation that is understood by a wide range of solvers. 'MiniZinc' itself provides several solvers, for instance 'GeCode'. R users can use the package to solve constraint programming problems without using 'MiniZinc' directly, modify existing 'MiniZinc' models and also create their own models.
Maintained by Akshit Achara. Last updated 3 years ago.
8.0 match 13 stars 4.81 score 5 scripts
bioc
Qtlizer:Comprehensive QTL annotation of GWAS results
This R package provides access to the Qtlizer web server. Qtlizer annotates lists of common small variants (mainly SNPs) and genes in humans with associated changes in gene expression using the most comprehensive database of published quantitative trait loci (QTLs).
Maintained by Matthias Munz. Last updated 16 days ago.
genomewideassociationsnpgeneticslinkagedisequilibriumeqtlgwasvariant-annotation
6.4 match 3 stars 5.73 score 2 scripts
lightbluetitan
educationR:A Comprehensive Collection of Educational Datasets
Provides a comprehensive collection of datasets related to education, covering topics such as student performance, learning methods, test scores, absenteeism, and other educational metrics. This package is designed as a resource for educational researchers, data analysts, and statisticians to explore and analyze data in the field of education.
Maintained by Renzo Caceres Rossi. Last updated 3 months ago.
8.4 match 4 stars 4.30 score 3 scripts
openintrostat
openintro:Datasets and Supplemental Functions from 'OpenIntro' Textbooks and Labs
Supplemental functions and data for 'OpenIntro' resources, which includes open-source textbooks and resources for introductory statistics (<https://www.openintro.org/>). The package contains datasets used in our open-source textbooks along with custom plotting functions for reproducing book figures. Note that many functions and examples include color transparency; some plotting elements may not show up properly (or at all) when run in some versions of Windows operating system.
Maintained by Mine Çetinkaya-Rundel. Last updated 3 months ago.
3.1 match 240 stars 11.39 score 6.0k scripts
jacob-long
interactions:Comprehensive, User-Friendly Toolkit for Probing Interactions
A suite of functions for conducting and interpreting analysis of statistical interaction in regression models that was formerly part of the 'jtools' package. Functionality includes visualization of two- and three-way interactions among continuous and/or categorical variables as well as calculation of "simple slopes" and Johnson-Neyman intervals (see e.g., Bauer & Curran, 2005 <doi:10.1207/s15327906mbr4003_5>). These capabilities are implemented for generalized linear models in addition to the standard linear regression context.
Maintained by Jacob A. Long. Last updated 8 months ago.
interactionsmoderationsocial-sciencesstatistics
2.9 match 131 stars 11.39 score 1.2k scripts 5 dependents
bioc
EnrichedHeatmap:Making Enriched Heatmaps
Enriched heatmap is a special type of heatmap which visualizes the enrichment of genomic signals on specific target regions. Here we implement the enriched heatmap with the ComplexHeatmap package. Since this type of heatmap is just a normal heatmap with some special settings, the functionality of ComplexHeatmap makes it much easier to customize the heatmap as well as to concatenate it to a list of heatmaps to show correspondence between different data sources.
Maintained by Zuguang Gu. Last updated 5 months ago.
softwarevisualizationsequencinggenomeannotationcoveragecpp
3.0 match 190 stars 10.87 score 330 scripts 1 dependents
lightbluetitan
OncoDataSets:A Comprehensive Collection of Cancer Types and Cancer-related DataSets
Offers a rich collection of data focused on cancer research, covering survival rates, genetic studies, biomarkers, and epidemiological insights. Designed for researchers, analysts, and bioinformatics practitioners, the package includes datasets on various cancer types such as melanoma, leukemia, breast, ovarian, and lung cancer, among others. It aims to facilitate advanced research, analysis, and understanding of cancer epidemiology, genetics, and treatment outcomes.
Maintained by Renzo Caceres Rossi. Last updated 3 months ago.
7.5 match 3 stars 4.18 score 6 scripts
bioc
motifbreakR:A Package For Predicting The Disruptiveness Of Single Nucleotide Polymorphisms On Transcription Factor Binding Sites
We introduce motifbreakR, which allows the biologist to judge in the first place whether the sequence surrounding the polymorphism is a good match, and in the second place how much information is gained or lost in one allele of the polymorphism relative to another. MotifbreakR is both flexible and extensible over previous offerings; giving a choice of algorithms for interrogation of genomes with motifs from public sources that users can choose from; these are 1) a weighted-sum probability matrix, 2) log-probabilities, and 3) weighted by relative entropy. MotifbreakR can predict effects for novel or previously described variants in public databases, making it suitable for tasks beyond the scope of its original design. Lastly, it can be used to interrogate any genome curated within Bioconductor (currently there are 32 species, a total of 109 versions).
Maintained by Simon Gert Coetzee. Last updated 5 months ago.
chipseqvisualizationmotifannotationtranscription
3.2 match 28 stars 8.96 score 103 scripts
bioc
crisprDesign:Comprehensive design of CRISPR gRNAs for nucleases and base editors
Provides a comprehensive suite of functions to design and annotate CRISPR guide RNA (gRNA) sequences. This includes on- and off-target search, on-target efficiency scoring, off-target scoring, full gene and TSS contextual annotations, and SNP annotation (human only). It currently supports five types of CRISPR modalities (modes of perturbations): CRISPR knockout, CRISPR activation, CRISPR inhibition, CRISPR base editing, and CRISPR knockdown. All types of CRISPR nucleases are supported, including DNA- and RNA-target nucleases such as Cas9, Cas12a, and Cas13d. All types of base editors are also supported. gRNA design can be performed on reference genomes, transcriptomes, and custom DNA and RNA sequences. Both unpaired and paired gRNA designs are enabled.
Maintained by Jean-Philippe Fortin. Last updated 11 days ago.
crisprfunctionalgenomicsgenetargetbioconductorbioconductor-packagecrispr-cas9crispr-designcrispr-targetgenomics-analysisgrnagrna-sequencegrna-sequencessgrnasgrna-design
3.4 match 22 stars 8.28 score 80 scripts 3 dependents
bioc
deconvR:Simulation and Deconvolution of Omic Profiles
This package provides a collection of functions designed for analyzing deconvolution of the bulk sample(s) using an atlas of reference omic signature profiles and a user-selected model. Users are given the option to create or extend a reference atlas and also to simulate the desired size of the bulk signature profile of the reference cell types. The package includes the cell-type-specific methylation atlas and Illumina Epic B5 probe IDs that can be used in deconvolution. Additionally, we included BSmeth2Probe to make mapping WGBS data to their probe IDs easier.
Maintained by Irem B. Gündüz. Last updated 5 months ago.
dnamethylationregressiongeneexpressionrnaseqsinglecellstatisticalmethodtranscriptomicsbioconductor-packagedeconvolutiondna-methylationomics
4.8 match 10 stars 5.78 score 15 scripts
cjendres1
nhanesA:NHANES Data Retrieval
Utility to retrieve data from the National Health and Nutrition Examination Survey (NHANES) website <https://www.cdc.gov/nchs/nhanes/>.
Maintained by Christopher Endres. Last updated 2 months ago.
2.9 match 59 stars 9.37 score 239 scripts
bioc
MicrobiotaProcess:A comprehensive R package for managing and analyzing microbiome and other ecological data within the tidy framework
MicrobiotaProcess is an R package for analysis, visualization and biomarker discovery of microbial datasets. It introduces the MPSE class, which makes it more interoperable with the existing computing ecosystem. Moreover, it introduces a tidy microbiome data structure paradigm and analysis grammar. It provides a wide variety of microbiome data analysis procedures under a unified and common framework (tidy-like framework).
Maintained by Shuangbin Xu. Last updated 5 months ago.
visualizationmicrobiomesoftwaremultiplecomparisonfeatureextractionmicrobiome-analysismicrobiome-data
2.7 match 183 stars 9.70 score 126 scripts 1 dependents
lightbluetitan
MedDataSets:Comprehensive Medical, Disease, Treatment, and Drug Datasets
Provides an extensive collection of datasets related to medicine, diseases, treatments, drugs, and public health. This package covers topics such as drug effectiveness, vaccine trials, survival rates, infectious disease outbreaks, and medical treatments. The included datasets span various health conditions, including AIDS, cancer, bacterial infections, and COVID-19, along with information on pharmaceuticals and vaccines. These datasets are sourced from the R ecosystem and other R packages, remaining unaltered to ensure data integrity. This package serves as a valuable resource for researchers, analysts, and healthcare professionals interested in conducting medical and public health data analysis in R.
Maintained by Renzo Caceres Rossi. Last updated 5 months ago.
4.6 match 8 stars 5.68 score 60 scripts
ccsosa
GOCompare:Comprehensive GO Terms Comparison Between Species
Supports the assessment of functional enrichment analyses obtained for several lists of genes and provides a workflow to analyze them between two species via weighted graphs. Methods are described in Sosa et al. (2023) <doi:10.1016/j.ygeno.2022.110528>.
Maintained by Chrystian Camilo Sosa. Last updated 4 months ago.
6.3 match 9 stars 4.13 score 1 scripts
revolutionanalytics
foreach:Provides Foreach Looping Construct
Support for the foreach looping construct. Foreach is an idiom that allows for iterating over elements in a collection, without the use of an explicit loop counter. This package in particular is intended to be used for its return value, rather than for its side effects. In that sense, it is similar to the standard lapply function, but doesn't require the evaluation of a function. Using foreach without side effects also facilitates executing the loop in parallel.
Maintained by Folashade Daniel. Last updated 3 years ago.
1.5 match 54 stars 17.16 score 43k scripts 2.8k dependents
kwstat
pals:Color Palettes, Colormaps, and Tools to Evaluate Them
A comprehensive collection of color palettes, colormaps, and tools to evaluate them. See Kovesi (2015) <doi:10.48550/arXiv.1509.03700>.
Maintained by Kevin Wright. Last updated 9 days ago.
2.2 match 83 stars 11.39 score 2.1k scripts 8 dependents
bioc
rCGH:Comprehensive Pipeline for Analyzing and Visualizing Array-Based CGH Data
A comprehensive pipeline for analyzing and interactively visualizing genomic profiles generated through commercial or custom aCGH arrays. As inputs, rCGH supports Agilent dual-color Feature Extraction files (.txt), from 44 to 400K, Affymetrix SNP6.0 and cytoScanHD probeset.txt, cychp.txt, and cnchp.txt files exported from ChAS or Affymetrix Power Tools. rCGH also supports custom arrays, provided data complies with the expected format. This package takes over all the steps required for individual genomic profiles analysis, from reading files to profiles segmentation and gene annotations. This package also provides several visualization functions (static or interactive) which facilitate individual profiles interpretation. Input files can be in compressed format, e.g. .bz2 or .gz.
Maintained by Frederic Commo. Last updated 5 months ago.
acghcopynumbervariationpreprocessingfeatureextraction
5.0 match 4 stars 5.10 score 26 scripts 1 dependents
ropensci
ijtiff:Comprehensive TIFF I/O with Full Support for 'ImageJ' TIFF Files
General purpose TIFF file I/O for R users. Currently the only such package with read and write support for TIFF files with floating point (real-numbered) pixels, and the only package that can correctly import TIFF files that were saved from 'ImageJ' and write TIFF files that can be correctly read by 'ImageJ' <https://imagej.net/ij/>. Also supports text image I/O.
Maintained by Rory Nolan. Last updated 6 days ago.
image-manipulationimagejpeer-reviewedtiff-filestiff-imagestiff
2.8 match 18 stars 8.97 score 36 scripts 7 dependents
ropensci
ckanr:Client for the Comprehensive Knowledge Archive Network ('CKAN') API
Client for 'CKAN' API (<https://ckan.org/>). Includes interface to 'CKAN' 'APIs' for search, list, show for packages, organizations, and resources. In addition, provides an interface to the 'datastore' API.
Maintained by Francisco Alves. Last updated 2 years ago.
databaseopen-datackanapidatadatasetapi-wrapperckan-api
2.9 match 100 stars 8.67 score 448 scripts 4 dependents
bioc
RnBeads:RnBeads
RnBeads facilitates comprehensive analysis of various types of DNA methylation data at the genome scale.
Maintained by Fabian Mueller. Last updated 1 month ago.
dnamethylationmethylationarraymethylseqepigeneticsqualitycontrolpreprocessingbatcheffectdifferentialmethylationsequencingcpgislandimmunooncologytwochanneldataimport
3.5 match 6.85 score 169 scripts 1 dependents
alsguimaraes
MUS:Monetary Unit Sampling and Estimation Methods, Widely Used in Auditing
Sampling and evaluation methods to apply Monetary Unit Sampling (or in older literature Dollar Unit Sampling) during an audit of financial statements.
Maintained by Henning Prömpers. Last updated 6 years ago.
5.3 match 5 stars 4.60 score 16 scripts
erhard-lab
grandR:Comprehensive Analysis of Nucleotide Conversion Sequencing Data
Nucleotide conversion sequencing experiments have been developed to add a temporal dimension to RNA-seq and single-cell RNA-seq. Such experiments require specialized tools for primary processing such as GRAND-SLAM, (see 'Jürges et al' <doi:10.1093/bioinformatics/bty256>) and specialized tools for downstream analyses. 'grandR' provides a comprehensive toolbox for quality control, kinetic modeling, differential gene expression analysis and visualization of such data.
Maintained by Florian Erhard. Last updated 1 month ago.
3.4 match 11 stars 7.03 score 18 scripts 1 dependents
adamlilith
fasterRaster:Faster Raster and Spatial Vector Processing Using 'GRASS GIS'
Processing of large-in-memory/large-on-disk rasters and spatial vectors using 'GRASS GIS' <https://grass.osgeo.org/>. Most functions in the 'terra' package are recreated. Processing of medium-sized and smaller spatial objects will nearly always be faster using 'terra' or 'sf', but for large-in-memory/large-on-disk objects, 'fasterRaster' may be faster. To use most of the functions, you must have the stand-alone version (not the 'OSGeoW4' installer version) of 'GRASS GIS' 8.0 or higher.
Maintained by Adam B. Smith. Last updated 19 days ago.
aspectdistancefragmentationfragmentation-indicesgisgrassgrass-gisrasterraster-projectionrasterizeslopetopographyvectorization
3.1 match 58 stars 7.69 score 8 scripts
homerhanumat
tigerstats:R Functions for Elementary Statistics
A collection of data sets and functions that are useful in the teaching of statistics at an elementary level to students who may have little or no previous experience with the command line. The functions for elementary inferential procedures follow a uniform interface for user input. Some of the functions are instructional applets that can only be run on the R Studio integrated development environment with package 'manipulate' installed. Other instructional applets are Shiny apps that may be run locally. In teaching, the package is used alongside the packages 'mosaic', 'mosaicData' and 'abd', which are therefore listed as dependencies.
Maintained by Homer White. Last updated 4 years ago.
4.0 match 16 stars 5.77 score 327 scripts
beckerbenj
eatGADS:Data Management of Large Hierarchical Data
Import 'SPSS' data, handle and change 'SPSS' meta data, store and access large hierarchical data in 'SQLite' data bases.
Maintained by Benjamin Becker. Last updated 23 days ago.
3.1 match 1 stars 7.36 score 34 scripts 1 dependents
sonsoleslp
tna:Transition Network Analysis (TNA)
Provides tools for performing Transition Network Analysis (TNA) to study relational dynamics, including functions for building and plotting TNA models, calculating centrality measures, and identifying dominant events and patterns. TNA statistical techniques (e.g., bootstrapping and permutation tests) ensure the reliability of observed insights and confirm that identified dynamics are meaningful. See (Saqr et al., 2025) <doi:10.1145/3706468.3706513> for more details on TNA.
Maintained by Sonsoles López-Pernas. Last updated 3 days ago.
educational-data-mininglearning-analyticsmarkov-modeltemporal-analysis
3.5 match 4 stars 6.48 score 5 scripts
arnaudgallou
plume:A Simple Author Handler for Scientific Writing
Handles and formats author information in scientific writing in 'R Markdown' and 'Quarto'. 'plume' provides easy-to-use and flexible tools for injecting author metadata in 'YAML' headers as well as generating author and contribution lists (among others) as strings from tabular data.
Maintained by Arnaud Gallou. Last updated 30 days ago.
authorscontributioncontributionslistlistsmarkdownpaperpreprintquartoroleroles
3.3 match 21 stars 6.80 score 15 scripts
ecor
RMAWGEN:Multi-Site Auto-Regressive Weather GENerator
S3 and S4 functions are implemented for spatial multi-site stochastic generation of daily time series of temperature and precipitation. These tools make use of Vector AutoRegressive models (VARs). The weather generator model is then saved as an object and is calibrated by daily instrumental "Gaussianized" time series through the 'vars' package tools. Once obtained, this model can be used for weather generation and adapted to work with several climatic monthly time series.
Maintained by Emanuele Cordano. Last updated 26 days ago.
4.0 match 3 stars 5.62 score 115 scripts 4 dependents
bioc
oposSOM:Comprehensive analysis of transcriptome data
This package translates microarray expression data into metadata of reduced dimension. It provides various sample-centered and group-centered visualizations, sample similarity analyses and functional enrichment analyses. The underlying SOM algorithm combines feature clustering, multidimensional scaling and dimension reduction, along with strong visualization capabilities. It enables extraction and description of functional expression modules inherent in the data.
Maintained by Henry Loeffler-Wirth. Last updated 5 months ago.
geneexpressiondifferentialexpressiongenesetenrichmentdatarepresentationvisualizationcpp
5.0 match 4.48 score 7 scripts
person-c
easybio:Comprehensive Single-Cell Annotation and Transcriptomic Analysis Toolkit
Provides a comprehensive toolkit for single-cell annotation with the 'CellMarker2.0' database (see Xia Li, Peng Wang, Yunpeng Zhang (2023) <doi:10.1093/nar/gkac947>). Streamlines biological label assignment in single-cell RNA-seq data and facilitates transcriptomic analysis, including preparation of TCGA <https://portal.gdc.cancer.gov/> and GEO <https://www.ncbi.nlm.nih.gov/geo/> datasets, differential expression analysis and visualization of enrichment analysis results. Additional utility functions support various bioinformatics workflows. See Wei Cui (2024) <doi:10.1101/2024.09.14.609619> for more details.
Maintained by Wei Cui. Last updated 13 days ago.
limmageoqueryedgerfgseabioinformaticscellmarker2gsearna-seqsingle-cell
3.4 match 10 stars 6.62 score 35 scripts
ropensci
rredlist:'IUCN' Red List Client
'IUCN' Red List (<https://api.iucnredlist.org/>) client. The 'IUCN' Red List is a global list of threatened and endangered species. Functions cover all of the Red List 'API' routes. An 'API' key is required.
Maintained by William Gearty. Last updated 1 month ago.
iucnbiodiversityapiweb-servicestraitshabitatspeciesconservationapi-wrapperiucn-red-listtaxize
1.9 match 53 stars 11.49 score 195 scripts 24 dependents
asa12138
pctax:Professional Comprehensive Omics Data Analysis
Provides a comprehensive suite of tools for analyzing omics data. It includes functionalities for alpha diversity analysis, beta diversity analysis, differential abundance analysis, community assembly analysis, visualization of phylogenetic tree, and functional enrichment analysis. With a progressive approach, the package offers a range of analysis methods to explore and understand the complex communities. It is designed to support researchers and practitioners in conducting in-depth and professional omics data analysis.
Maintained by Chen Peng. Last updated 4 months ago.
microbiomesoftwarevisualizationomics
3.5 match 14 stars 5.89 score 14 scripts
palderman
DSSAT:A Comprehensive R Interface for the DSSAT Cropping Systems Model
The purpose of this package is to provide a comprehensive R interface to the Decision Support System for Agrotechnology Transfer Cropping Systems Model (DSSAT-CSM; see <https://dssat.net> for more information). The package provides cross-platform functions to read and write input files, run DSSAT-CSM, and read output files.
Maintained by Phillip D. Alderman. Last updated 1 year ago.
3.4 match 22 stars 5.57 score 34 scripts
zheng206
ComBatFamQC:Comprehensive Batch Effect Diagnostics and Harmonization
Provides a comprehensive framework for batch effect diagnostics, harmonization, and post-harmonization downstream analysis. Features include interactive visualization tools, robust statistical tests, and a range of harmonization techniques. Additionally, 'ComBatFamQC' enables the creation of life-span age trend plots with estimated age-adjusted centiles and facilitates the generation of covariate-corrected residuals for analytical purposes. Methods for harmonization are based on approaches described in Johnson et al., (2007) <doi:10.1093/biostatistics/kxj037>, Beer et al., (2020) <doi:10.1016/j.neuroimage.2020.117129>, Pomponio et al., (2020) <doi:10.1016/j.neuroimage.2019.116450>, and Chen et al., (2021) <doi:10.1002/hbm.25688>.
Maintained by Zheng Ren. Last updated 2 months ago.
diagnostic-toolharmonizationrshinyapp
3.5 match 2 stars 5.35 score 16 scripts
tonigi
dtw:Dynamic Time Warping Algorithms
A comprehensive implementation of dynamic time warping (DTW) algorithms in R. DTW computes the optimal (least cumulative distance) alignment between points of two time series. Common DTW variants covered include local (slope) and global (window) constraints, subsequence matches, arbitrary distance definitions, normalizations, minimum variance matching, and so on. Provides cumulative distances, alignments, specialized plot styles, etc., as described in Giorgino (2009) <doi:10.18637/jss.v031.i07>.
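The "least cumulative distance" alignment mentioned above can be illustrated with a minimal, language-agnostic sketch in Python. It is not the package's R API and is not claimed to match its default step pattern: the function name `dtw_distance`, the unit-weight recursion, and the absolute-difference local distance are all assumptions chosen for brevity, with no window or slope constraints.

```python
def dtw_distance(x, y, dist=lambda a, b: abs(a - b)):
    """Least cumulative distance aligning sequences x and y (hypothetical helper).

    Uses the simplest unconstrained recursion with unit step weights.
    """
    n, m = len(x), len(y)
    inf = float("inf")
    # cost[i][j] holds the best cumulative distance aligning x[:i] with y[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = dist(x[i - 1], y[j - 1])
            # extend the cheapest of the three admissible predecessor cells
            cost[i][j] = d + min(cost[i - 1][j],      # repeat y[j-1]
                                 cost[i][j - 1],      # repeat x[i-1]
                                 cost[i - 1][j - 1])  # advance both
    return cost[n][m]
```

For instance, `dtw_distance([1, 2, 3], [1, 2, 2, 3])` is zero under these assumptions, because the repeated `2` can be absorbed by the warping.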
Maintained by Toni Giorgino. Last updated 2 years ago.
2.2 match 5 stars 8.48 score 582 scripts 49 dependents
statleila
priorityelasticnet:Comprehensive Analysis of Multi-Omics Data Using an Offset-Based Method
Priority-ElasticNet extends the Priority-LASSO method (Klau et al. (2018) <doi:10.1186/s12859-018-2344-6>) by incorporating the ElasticNet penalty, allowing for both L1 and L2 regularization. This approach fits successive ElasticNet models for several blocks of (omics) data with different priorities, using the predicted values from each block as an offset for the subsequent block. It also offers robust options to handle block-wise missingness in multi-omics data, improving the flexibility and applicability of the model in the presence of incomplete datasets.
Maintained by Laila Qadir Musib. Last updated 2 months ago.
5.5 match 3.34 score
welch-lab
rliger:Linked Inference of Genomic Experimental Relationships
Uses an extension of nonnegative matrix factorization to identify shared and dataset-specific factors. See Welch J, Kozareva V, et al (2019) <doi:10.1016/j.cell.2019.05.006>, and Liu J, Gao C, Sodicoff J, et al (2020) <doi:10.1038/s41596-020-0391-8> for more details.
Maintained by Yichen Wang. Last updated 2 months ago.
nonnegative-matrix-factorizationsingle-cellopenblascpp
1.7 match 408 stars 10.77 score 334 scripts 1 dependents
biogenies
countfitteR:Comprehensive Automatized Evaluation of Distribution Models for Count Data
Many kinds of measurement generate count data, a statistical data type that takes only non-negative integer values and is produced by counting. Count data are typically found in biomedical applications, such as the analysis of DNA double-strand breaks, whose number can be counted in individual cells using various bioanalytical methods. For diagnostic applications, it is relevant to record the distribution of the count data in order to determine their biomedical significance (Roediger, S. et al., 2018. Journal of Laboratory and Precision Medicine. <doi:10.21037/jlpm.2018.04.10>). The software offers functions for a comprehensive automated evaluation of distribution models of count data. In addition to programmatic interaction, a graphical user interface (web server) is included, which enables fast and interactive data-scientific analyses. The user is supported in selecting the most suitable count distribution for their own data set.
Maintained by Jaroslaw Chilimoniuk. Last updated 2 years ago.
cancercancer-imaging-researchcount-datacount-distributionfoci
3.4 match 4 stars 5.33 score 27 scripts
dzhakparov
GeneSelectR:Comprehensive Feature Selection Workflow for Bulk RNAseq Datasets
GeneSelectR is a versatile R package designed for efficient RNA sequencing data analysis. Its key innovation lies in the seamless integration of the Python sklearn machine learning framework with R-based bioinformatics tools. This integration enables GeneSelectR to perform robust ML-driven feature selection while simultaneously leveraging the power of Gene Ontology (GO) enrichment and semantic similarity analyses. By combining these diverse methodologies, GeneSelectR offers a comprehensive workflow that optimizes both the computational aspects of ML and the biological insights afforded by advanced bioinformatics analyses. Ideal for researchers in bioinformatics, GeneSelectR stands out as a unique tool for analyzing complex RNAseq datasets with enhanced precision and relevance.
Maintained by Damir Zhakparov. Last updated 10 months ago.
3.4 match 19 stars 5.28 score 7 scripts
nandp1
gpbStat:Comprehensive Statistical Analysis of Plant Breeding Experiments
Performs statistical data analysis of various plant breeding experiments. Contains functions for Line by Tester analysis as per Arunachalam, V. (1974) <http://repository.ias.ac.in/89299/> and Diallel analysis as per Griffing, B. (1956) <https://www.publish.csiro.au/bi/pdf/BI9560463>.
Maintained by Nandan Patil. Last updated 4 months ago.
biometricsgeneticsplantbreeding
2.9 match 3 stars 6.08 score 27 scripts
bioc
decoupleR:decoupleR: Ensemble of computational methods to infer biological activities from omics data
Many methods allow us to extract biological activities from omics data using information from prior knowledge resources, reducing the dimensionality for increased statistical power and better interpretability. Here, we present decoupleR, a Bioconductor package containing different statistical methods to extract these signatures within a unified framework. decoupleR allows the user to flexibly test any method with any resource. It incorporates methods that take into account the sign and weight of network interactions. decoupleR can be used with any omic, as long as its features can be linked to a biological process based on prior knowledge: for example, gene sets regulated by a transcription factor in transcriptomics, or phosphosites targeted by a kinase in phosphoproteomics.
Maintained by Pau Badia-i-Mompel. Last updated 5 months ago.
differentialexpressionfunctionalgenomicsgeneexpressiongeneregulationnetworksoftwarestatisticalmethodtranscription
1.6 match 230 stars 11.27 score 316 scripts 3 dependents
insightsengineering
teal.data:Data Model for 'teal' Applications
Provides a 'teal_data' class as a unified data model for 'teal' applications focusing on reproducibility and relational data.
Maintained by Dawid Kaledkowski. Last updated 2 months ago.
1.8 match 11 stars 9.93 score 44 scripts 8 dependents
duncantl
CodeDepends:Analysis of R Code for Reproducible Research and Code Comprehension
Tools for analyzing R expressions or blocks of code and determining the dependencies between them. It focuses on R scripts, but can be used on the bodies of functions. Facilities include summarizing or getting a high-level view of code, determining dependencies between variables, and suggesting code improvements.
Maintained by Gabriel Becker. Last updated 1 year ago.
2.9 match 89 stars 5.87 score 70 scripts 1 dependents
cjvanlissa
worcs:Workflow for Open Reproducible Code in Science
Create reproducible and transparent research projects in 'R'. This package is based on the Workflow for Open Reproducible Code in Science (WORCS), a step-by-step procedure based on best practices for Open Science. It includes an 'RStudio' project template, several convenience functions, and all dependencies required to make your project reproducible and transparent. WORCS is explained in the tutorial paper by Van Lissa, Brandmaier, Brinkman, Lamprecht, Struiksma, & Vreede (2021). <doi:10.3233/DS-210031>.
Maintained by Caspar J. Van Lissa. Last updated 11 days ago.
1.8 match 83 stars 9.26 score 59 scripts
bioc
CaMutQC:An R Package for Comprehensive Filtration and Selection of Cancer Somatic Mutations
CaMutQC filters false-positive mutations caused by technical issues and selects candidate cancer mutations through a series of well-structured functions that label mutations with various flags. A detailed filter report is produced after a whole filtration or selection section is completed. CaMutQC also integrates several methods and gene panels for Tumor Mutational Burden (TMB) estimation.
Maintained by Xin Wang. Last updated 5 months ago.
softwarequalitycontrolgenetargetcancer-genomicssomatic-mutations
2.8 match 7 stars 5.92 score 1 scripts
rodivinity
mbreaks:Estimation and Inference for Structural Breaks in Linear Regression Models
Functions provide comprehensive treatment of estimation, inference, testing, and model selection in linear regression models with structural breaks. The tests, estimation methods, inference procedures, and information criteria implemented are discussed in Bai and Perron (1998) "Estimating and Testing Linear Models with Multiple Structural Changes" <doi:10.2307/2998540>.
Maintained by Linh Nguyen. Last updated 4 months ago.
4.1 match 4.04 score 11 scripts
psychmeta
psychmeta:Psychometric Meta-Analysis Toolkit
Tools for computing bare-bones and psychometric meta-analyses and for generating psychometric data for use in meta-analysis simulations. Supports bare-bones, individual-correction, and artifact-distribution methods for meta-analyzing correlations and d values. Includes tools for converting effect sizes, computing sporadic artifact corrections, reshaping meta-analytic databases, computing multivariate corrections for range variation, and more. Bugs can be reported to <https://github.com/psychmeta/psychmeta/issues> or <issues@psychmeta.com>.
Maintained by Jeffrey A. Dahlke. Last updated 9 months ago.
hacktoberfestmeta-analysispsychologypsychometricpsychometrics
2.0 match 57 stars 8.25 score 151 scripts
talegari
pkggraph:A Consistent and Intuitive Platform to Explore the Dependencies of Packages on the Comprehensive R Archive Network Like Repositories
Interactively explore the dependencies of one or more packages (on Comprehensive R Archive Network-like repositories) and analyze them following the tidy philosophy. Most of the functions return a 'tibble' object (an enhancement of 'dataframe') that can be used for further analysis. The package offers functions to produce 'network' and 'igraph' dependency graphs. The 'plot' method produces a static plot based on 'ggnetwork', and the 'plotd3' function produces an interactive D3 plot based on 'networkD3'.
Maintained by KS Srikanth. Last updated 6 years ago.
3.2 match 9 stars 5.12 score 29 scripts
bioc
HDF5Array:HDF5 datasets as array-like objects in R
The HDF5Array package is an HDF5 backend for DelayedArray objects. It implements the HDF5Array, H5SparseMatrix, H5ADMatrix, and TENxMatrix classes, 4 convenient and memory-efficient array-like containers for representing and manipulating either: (1) a conventional (a.k.a. dense) HDF5 dataset, (2) an HDF5 sparse matrix (stored in CSR/CSC/Yale format), (3) the central matrix of an h5ad file (or any matrix in the /layers group), or (4) a 10x Genomics sparse matrix. All these containers are DelayedArray extensions and thus support all operations (delayed or block-processed) supported by DelayedArray objects.
Maintained by Hervé Pagès. Last updated 26 days ago.
infrastructuredatarepresentationdataimportsequencingrnaseqcoverageannotationgenomeannotationsinglecellimmunooncologybioconductor-packagecore-packageu24ca289073
1.2 match 12 stars 13.19 score 844 scripts 123 dependents
gaospecial
ggVennDiagram:A 'ggplot2' Implement of Venn Diagram
Easy-to-use functions to generate publication-quality Venn or upset plots for 2-7 sets. 'ggVennDiagram' plots Venn or upset diagrams using a well-defined geometry dataset and 'ggplot2'. Venn diagrams of 2-4 sets use circles and ellipses, while those of 4-7 sets use irregular polygons (4 sets have both forms), which were developed in and imported from another package, 'venn', authored by Adrian Dusa. Internal functions integrate the shape data with user-provided set data and calculate the geometry of every region/intersection, then plot the Venn diagram in four separate components: set edges, set labels, region edges, and region labels. From version 1.0, these components can be customized with ordinary 'ggplot2' grammar. From version 1.4.4, an unlimited number of sets is supported, as a plain upset plot is drawn automatically when the number of sets exceeds 7.
Maintained by Chun-Hui Gao. Last updated 5 months ago.
set-operationsupsetupsetplotvenn-diagramvenn-plot
1.2 match 289 stars 12.67 score 1.3k scripts 4 dependents
jacgoldsm
peruse:A Tidy API for Sequence Iteration and Set Comprehension
A friendly API for sequence iteration and set comprehension.
Maintained by Jacob Goldsmith. Last updated 4 years ago.
5.5 match 1 stars 2.70 score 2 scripts
nathanael-g-durst
KrakenR:Comprehensive R Interface for Accessing Kraken Cryptocurrency Exchange REST API
A comprehensive R interface to access data from the Kraken cryptocurrency exchange REST API <https://docs.kraken.com/api/>. It allows users to retrieve various market data, such as asset information, trading pairs, and price data. The package is designed to facilitate efficient data access for analysis, strategy development, and monitoring of cryptocurrency market trends.
Maintained by Nathanaël Dürst. Last updated 3 days ago.
3.3 match 4.48 score 10 scripts
denironyx
tidycountries:Access and Manipulate Comprehensive Country Level Data in Tidy Format
A comprehensive and user-friendly interface for accessing, manipulating, and analyzing country-level data from around the world. It allows users to retrieve detailed information on countries, including names, regions, continents, populations, currencies, calling codes, and more, all in a tidy data format. The package is designed to work seamlessly within the 'tidyverse' ecosystem, making it easy to filter, arrange, and visualize country-level data in R.
Maintained by Dennis Irorere. Last updated 5 months ago.
3.3 match 9 stars 4.35 score 7 scripts
zyang2k
bluebike:Blue Bike Comprehensive Data
Facilitates the importation of the Boston Blue Bike trip data since 2015. Functions include the computation of trip distances of given trip data. It can also map the location of stations within a given radius and calculate the distance to nearby stations. Data is from <https://www.bluebikes.com/system-data>.
Maintained by Ziyue Yang. Last updated 3 years ago.
3.1 match 4 stars 4.60 score 7 scripts
silentspringinstitute
RNHANES:Facilitates Analysis of CDC NHANES Data
Tools for downloading and analyzing CDC NHANES data, with a focus on analytical laboratory data.
Maintained by Herb Susmann. Last updated 2 days ago.
1.8 match 77 stars 7.58 score 83 scripts
decisionpatterns
na.tools:Comprehensive Library for Working with Missing (NA) Values in Vectors
This comprehensive toolkit provides a consistent and extensible framework for working with missing values in vectors. (The companion package 'tidyimpute' provides similar functionality for list-like and table-like structures.) Functions exist for detection, removal, replacement, imputation, recollection, etc. of 'NAs'.
Maintained by Christopher Brown. Last updated 6 years ago.
3.4 match 2 stars 4.04 score 109 scripts
bioc
PhosR:A set of methods and tools for comprehensive analysis of phosphoproteomics data
PhosR is a package for the comprehensive analysis of phosphoproteomic data. There are two major components to PhosR: processing and downstream analysis. PhosR consists of various processing tools for phosphoproteomics data including filtering, imputation, normalisation, and functional analysis for inferring active kinases and signalling pathways.
Maintained by Taiyun Kim. Last updated 5 months ago.
softwareresearchfieldproteomics
2.9 match 4.71 score 51 scripts
bioc
seq.hotSPOT:Targeted sequencing panel design based on mutation hotspots
seq.hotSPOT provides a resource for designing effective sequencing panels to help improve mutation capture efficacy for ultradeep sequencing projects. Using SNV datasets, this package designs custom panels for any tissue of interest and identifies the genomic regions likely to contain the most mutations. Establishing efficient targeted sequencing panels can allow researchers to study mutation burden in tissues at high depth without the economic burden of whole-exome or whole-genome sequencing. This tool was developed to make high-depth sequencing panels to study low-frequency clonal mutations in clinically normal and cancerous tissues.
Maintained by Sydney Grant. Last updated 5 months ago.
softwaretechnologysequencingdnaseqwholegenome
3.2 match 4.00 score 3 scripts
ozancanozdemir
turkeyelections:The Most Comprehensive R Package for Turkish Election Results
Includes the results of general, local, and presidential elections held in Turkey between 1995 and 2023, broken down by provinces and overall national results. It facilitates easy processing of this data and the creation of visual representations based on these election results.
Maintained by Ozancan Ozdemir. Last updated 9 months ago.
2.9 match 15 stars 4.18 score 1 scripts
zzz1990771
geeVerse:A Comprehensive Analysis of High Dimensional Longitudinal Data
To provide a comprehensive analysis of high dimensional longitudinal data, this package provides analysis for any combination of 1) simultaneous variable selection and estimation, 2) mean regression or quantile regression for heterogeneous data, 3) cross-sectional or longitudinal data, 4) balanced or imbalanced data, 5) moderate, high or even ultra-high dimensional data, via computationally efficient implementations of penalized generalized estimating equations.
Maintained by Tianhai Zu. Last updated 4 months ago.
3.4 match 3.30 score 5 scripts
isaakiel
mortAAR:Analysis of Archaeological Mortality Data
A collection of functions for the analysis of archaeological mortality data (on the topic see e.g. Chamberlain 2006 <https://books.google.de/books?id=nG5FoO_becAC&lpg=PA27&ots=LG0b_xrx6O&dq=life%20table%20archaeology&pg=PA27#v=onepage&q&f=false>). It takes demographic data in different formats and displays the result in a standard life table as well as plots the relevant indices (percentage of deaths, survivorship, probability of death, life expectancy, percentage of population). It also checks for possible biases in the age structure and applies corrections to life tables.
Maintained by Nils Mueller-Scheessel. Last updated 2 months ago.
anthropologyarchaeologydemographystatistics
1.5 match 15 stars 7.49 score 23 scripts
bioc
YAPSA:Yet Another Package for Signature Analysis
This package provides functions and routines for supervised analyses of mutational signatures (i.e., the signatures have to be known, cf. L. Alexandrov et al., Nature 2013 and L. Alexandrov et al., bioRxiv 2018). In particular, the family of functions LCD (LCD = linear combination decomposition) can use optimal signature-specific cutoffs, which account for the different detectability of the different signatures. Moreover, the package provides different sets of mutational signatures, including the COSMIC and PCAWG SNV signatures and the PCAWG Indel signatures; the latter mean that, with YAPSA, the concept of supervised analysis of mutational signatures is extended to Indel signatures. YAPSA also provides confidence intervals as computed by profile likelihoods and can perform signature analysis on a stratified mutational catalogue (SMC = stratify mutational catalogue) in order to analyze enrichment and depletion patterns for the signatures in different strata.
Maintained by Zuguang Gu. Last updated 5 months ago.
sequencingdnaseqsomaticmutationvisualizationclusteringgenomicvariationstatisticalmethodbiologicalquestion
1.8 match 6.41 score 57 scripts
masurp
specr:Conducting and Visualizing Specification Curve Analyses
Provides utilities for conducting specification curve analyses (Simonsohn, Simmons & Nelson, 2020, <doi:10.1038/s41562-020-0912-z>) or multiverse analyses (Steegen, Tuerlinckx, Gelman & Vanpaemel, 2016, <doi:10.1177/1745691616658637>), including functions to set up, run, evaluate, and plot all specifications.
Maintained by Philipp K. Masur. Last updated 10 months ago.
1.3 match 68 stars 8.02 score 85 scripts
cran
GWASinspector:Comprehensive and Easy to Use Quality Control of GWAS Results
When evaluating the results of a genome-wide association study (GWAS), it is important to perform a quality control to ensure that the results are valid, complete, correctly formatted, and, in case of meta-analysis, consistent with other studies that have applied the same analysis. This package was developed to facilitate and streamline this process and provide the user with a comprehensive report.
Maintained by Alireza Ani. Last updated 10 months ago.
5.1 match 2.00 score
eddelbuettel
digest:Create Compact Hash Digests of R Objects
Implementation of a function 'digest()' for the creation of hash digests of arbitrary R objects (using the 'md5', 'sha-1', 'sha-256', 'crc32', 'xxhash', 'murmurhash', 'spookyhash', 'blake3', 'crc32c', 'xxh3_64', and 'xxh3_128' algorithms) permitting easy comparison of R language objects, as well as functions such as 'hmac()' to create hash-based message authentication code. Please note that this package is not meant to be deployed for cryptographic purposes for which more comprehensive (and widely tested) libraries such as 'OpenSSL' should be used.
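The idea of hashing a serialized object so that two objects can be compared by digest can be sketched as follows. This is an illustrative Python analogue, not the package's R interface: the helper names `object_digest` and `object_hmac` are hypothetical, and canonical JSON stands in for R's own object serialization. As the description itself cautions, this kind of digest is for comparison, not a substitute for a vetted cryptographic library.

```python
import hashlib
import hmac
import json


def _canonical_bytes(obj):
    # Assumption: canonical JSON (sorted keys, fixed separators) is used as
    # the serialization, so equal objects always yield equal byte strings.
    return json.dumps(obj, sort_keys=True, separators=(",", ":")).encode("utf-8")


def object_digest(obj, algo="sha256"):
    """Hex digest of a JSON-serializable object (hypothetical helper)."""
    return hashlib.new(algo, _canonical_bytes(obj)).hexdigest()


def object_hmac(key, obj, algo="sha256"):
    """Hash-based message authentication code over the same serialization."""
    return hmac.new(key, _canonical_bytes(obj), algo).hexdigest()
```

Two dictionaries with the same contents but different key order hash identically here because the keys are sorted before hashing; changing any value changes the digest.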
Maintained by Dirk Eddelbuettel. Last updated 2 months ago.
0.5 match 114 stars 19.82 score 11k scripts 6.9k dependents
ineelhere
clintrialx:Connect and Work with Clinical Trials Data Sources
Are you spending too much time fetching and managing clinical trial data? Struggling with complex queries and bulk data extraction? What if you could simplify this process with just a few lines of code? Introducing 'clintrialx' - Fetch clinical trial data from sources like 'ClinicalTrials.gov' <https://clinicaltrials.gov/> and the 'Clinical Trials Transformation Initiative - Access to Aggregate Content of ClinicalTrials.gov' database <https://aact.ctti-clinicaltrials.org/>, supporting pagination and bulk downloads. Also, you can generate HTML reports based on the data obtained from the sources!
Maintained by Indraneel Chakraborty. Last updated 5 days ago.
aactbioinformaticsclinical-dataclinical-trialsclinicaltrialsgovcttidatadata-managementmedical-informaticsr-languagetrials
1.8 match 15 stars 5.76 score 11 scripts
bioc
abseqR:Reporting and data analysis functionalities for Rep-Seq datasets of antibody libraries
AbSeq is a comprehensive bioinformatic pipeline for the analysis of sequencing datasets generated from antibody libraries and abseqR is one of its packages. abseqR empowers the users of abseqPy (https://github.com/malhamdoosh/abseqPy) with plotting and reporting capabilities and allows them to generate interactive HTML reports for the convenience of viewing and sharing with other researchers. Additionally, abseqR extends abseqPy to compare multiple repertoire analyses and perform further downstream analysis on its output.
Maintained by JiaHong Fong. Last updated 5 months ago.
sequencingvisualizationreportwritingqualitycontrolmultiplecomparison
2.5 match 4.00 score 3 scripts
bioc
scmeth:Functions to conduct quality control analysis in methylation data
Provides functions to analyze methylation data. Some functions are relevant for single cell methylation data, but most can be used for any methylation data. A highlight of this workflow is the comprehensive quality control report.
Maintained by Divy Kangeyan. Last updated 5 months ago.
dnamethylationqualitycontrolpreprocessingsinglecellimmunooncologybioconductor-packagemethylationsingle-cell-methylation
2.1 match 4.70 score 5 scripts
bioc
GeneExpressionSignature:Gene Expression Signature based Similarity Metric
This package gives the implementations of the gene expression signature and its distance to each. A gene expression signature is represented as a list of genes whose expression is correlated with a biological state of interest. Its distance is defined using a nonparametric, rank-based pattern-matching strategy based on the Kolmogorov-Smirnov statistic. The gene expression signature and its distance can be used to detect similarities among the signatures of drugs, diseases, and biological states of interest.
Maintained by Yang Cao. Last updated 5 months ago.
1.9 match 1 stars 5.00 score 5 scripts
yufeng031
bestridge:A Comprehensive R Package for Best Subset Selection
The bestridge package is designed to provide a one-stop service for users to successfully carry out best ridge regression in various complex situations via the primal dual active set algorithm proposed by Wen, C., Zhang, A., Quan, S. and Wang, X. (2020) <doi:10.18637/jss.v094.i04>. This package allows users to perform regression, classification, count regression, and censored regression for (ultra) high dimensional data, and it also supports advanced uses such as group variable selection and nuisance variable selection.
Maintained by Liyuan Hu. Last updated 3 years ago.
4.6 match 2.00 score 6 scripts
bioc
SGCP:SGCP: A semi-supervised pipeline for gene clustering using self-training approach in gene co-expression networks
SGC is a semi-supervised pipeline for gene clustering in gene co-expression networks. SGC consists of multiple novel steps that enable the computation of highly enriched modules in an unsupervised manner. But unlike all existing frameworks, it further incorporates a novel step that leverages Gene Ontology information in a semi-supervised clustering method that further improves the quality of the computed modules.
Maintained by Niloofar AghaieAbiane. Last updated 5 months ago.
geneexpressiongenesetenrichmentnetworkenrichmentsystemsbiologyclassificationclusteringdimensionreductiongraphandnetworkneuralnetworknetworkmrnamicroarrayrnaseqvisualizationbioinformaticsgenecoexpressionnetworkgraphsnetworkclusteringnetworksself-trainingsemi-supervised-learningunsupervised-learning
1.8 match 2 stars 5.12 score 44 scripts
oobianom
r2symbols:Symbols for 'Markdown' and 'Shiny' Application
Direct insertion of over 1000 symbols (e.g. currencies, letters, emojis, arrows, mathematical symbols and so on) into 'Rmarkdown' documents and 'Shiny' applications by incorporating 'HTML' hex codes.
Maintained by Obinna Obianom. Last updated 2 years ago.
1.3 match 11 stars 6.67 score 94 scripts 1 dependents
ropensci
magick:Advanced Graphics and Image-Processing in R
Bindings to 'ImageMagick': the most comprehensive open-source image processing library available. Supports many common formats (png, jpeg, tiff, pdf, etc) and manipulations (rotate, scale, crop, trim, flip, blur, etc). All operations are vectorized via the Magick++ STL meaning they operate either on a single frame or a series of frames for working with layers, collages, or animation. In RStudio images are automatically previewed when printed to the console, resulting in an interactive editing environment. The latest version of the package includes a native graphics device for creating in-memory graphics or drawing onto images using pixel coordinates.
Maintained by Jeroen Ooms. Last updated 20 days ago.
image-manipulationimage-processingimagemagickcpp
0.5 match 468 stars 17.31 score 9.0k scripts 256 dependents
quanteda
quanteda:Quantitative Analysis of Textual Data
A fast, flexible, and comprehensive framework for quantitative text analysis in R. Provides functionality for corpus management, creating and manipulating tokens and n-grams, exploring keywords in context, forming and manipulating sparse matrices of documents by features and feature co-occurrences, analyzing keywords, computing feature similarities and distances, applying content dictionaries, applying supervised and unsupervised machine learning, visually representing text and text analyses, and more.
Maintained by Kenneth Benoit. Last updated 2 months ago.
corpusnatural-language-processingquantedatext-analyticsonetbbcpp
0.5 match 851 stars 16.68 score 5.4k scripts 51 dependents
wviechtb
metafor:Meta-Analysis Package for R
A comprehensive collection of functions for conducting meta-analyses in R. The package includes functions to calculate various effect sizes or outcome measures, fit equal-, fixed-, random-, and mixed-effects models to such data, carry out moderator and meta-regression analyses, and create various types of meta-analytical plots (e.g., forest, funnel, radial, L'Abbe, Baujat, bubble, and GOSH plots). For meta-analyses of binomial and person-time data, the package also provides functions that implement specialized methods, including the Mantel-Haenszel method, Peto's method, and a variety of suitable generalized linear (mixed-effects) models (i.e., mixed-effects logistic and Poisson regression models). Finally, the package provides functionality for fitting meta-analytic multivariate/multilevel models that account for non-independent sampling errors and/or true effects (e.g., due to the inclusion of multiple treatment studies, multiple endpoints, or other forms of clustering). Network meta-analyses and meta-analyses accounting for known correlation structures (e.g., due to phylogenetic relatedness) can also be conducted. An introduction to the package can be found in Viechtbauer (2010) <doi:10.18637/jss.v036.i03>.
Maintained by Wolfgang Viechtbauer. Last updated 2 days ago.
meta-analysismixed-effectsmultilevel-modelsmultivariate
0.5 match 246 stars 16.30 score 4.9k scripts 92 dependents
spatstat
spatstat:Spatial Point Pattern Analysis, Model-Fitting, Simulation, Tests
Comprehensive open-source toolbox for analysing Spatial Point Patterns. Focused mainly on two-dimensional point patterns, including multitype/marked points, in any spatial region. Also supports three-dimensional point patterns, space-time point patterns in any number of dimensions, point patterns on a linear network, and patterns of other geometrical objects. Supports spatial covariate data such as pixel images. Contains over 3000 functions for plotting spatial data, exploratory data analysis, model-fitting, simulation, spatial sampling, model diagnostics, and formal inference. Data types include point patterns, line segment patterns, spatial windows, pixel images, tessellations, and linear networks. Exploratory methods include quadrat counts, K-functions and their simulation envelopes, nearest neighbour distance and empty space statistics, Fry plots, pair correlation function, kernel smoothed intensity, relative risk estimation with cross-validated bandwidth selection, mark correlation functions, segregation indices, mark dependence diagnostics, and kernel estimates of covariate effects. Formal hypothesis tests of random pattern (chi-squared, Kolmogorov-Smirnov, Monte Carlo, Diggle-Cressie-Loosmore-Ford, Dao-Genton, two-stage Monte Carlo) and tests for covariate effects (Cox-Berman-Waller-Lawson, Kolmogorov-Smirnov, ANOVA) are also supported. Parametric models can be fitted to point pattern data using the functions ppm(), kppm(), slrm(), dppm() similar to glm(). Types of models include Poisson, Gibbs and Cox point processes, Neyman-Scott cluster processes, and determinantal point processes. Models may involve dependence on covariates, inter-point interaction, cluster formation and dependence on marks. Models are fitted by maximum likelihood, logistic regression, minimum contrast, and composite likelihood methods. A model can be fitted to a list of point patterns (replicated point pattern data) using the function mppm(). 
The model can include random effects and fixed effects depending on the experimental design, in addition to all the features listed above. Fitted point process models can be simulated, automatically. Formal hypothesis tests of a fitted model are supported (likelihood ratio test, analysis of deviance, Monte Carlo tests) along with basic tools for model selection (stepwise(), AIC()) and variable selection (sdr). Tools for validating the fitted model include simulation envelopes, residuals, residual plots and Q-Q plots, leverage and influence diagnostics, partial residuals, and added variable plots.
Maintained by Adrian Baddeley. Last updated 2 months ago.
cluster-processcox-point-processgibbs-processkernel-densitynetwork-analysispoint-processpoisson-processspatial-analysisspatial-dataspatial-data-analysisspatial-statisticsspatstatstatistical-methodsstatistical-modelsstatistical-testsstatistics
0.5 match 200 stars 16.32 score 5.5k scripts 41 dependents
bioc
iSEEu:iSEE Universe
iSEEu (the iSEE universe) contains diverse functionality to extend the usage of the iSEE package, including additional classes for the panels, or modes allowing easy configuration of iSEE applications.
Maintained by Kevin Rue-Albrecht. Last updated 5 months ago.
immunooncologyvisualizationguidimensionreductionfeatureextractionclusteringtranscriptiongeneexpressiontranscriptomicssinglecellcellbasedassayshacktoberfest
1.1 match 9 stars 7.15 score 35 scripts 1 dependent
utsavlamichhane
mbX:A Comprehensive Microbiome Data Processing Pipeline
Provides tools for cleaning, processing, and preparing microbiome sequencing data (e.g., 16S rRNA) for downstream analysis. Supports CSV, TXT, and 'Excel' file formats. The main function, ezclean(), automates microbiome data transformation, including format validation, transposition, numeric conversion, and metadata integration. Also ensures efficient handling of taxonomic levels, resolves duplicated taxa entries, and outputs a well-structured, analysis-ready dataset.
Maintained by Utsav Lamichhane. Last updated 13 days ago.
3.0 match 2.70 score
bioc
biomaRt:Interface to BioMart databases (i.e. Ensembl)
In recent years a wealth of biological data has become available in public data repositories. Easy access to these valuable data resources and firm integration with data analysis is needed for comprehensive bioinformatics data analysis. biomaRt provides an interface to a growing collection of databases implementing the BioMart software suite (<http://www.biomart.org>). The package enables retrieval of large amounts of data in a uniform way without the need to know the underlying database schemas or write complex SQL queries. The most prominent examples of BioMart databases are maintained by Ensembl, which provides biomaRt users direct access to a diverse set of data and enables a wide range of powerful online queries from gene annotation to database mining.
Maintained by Mike Smith. Last updated 2 days ago.
annotationbioconductorbiomartensembl
0.5 match 38 stars 15.99 score 13k scripts 230 dependents
andriyprotsak5
UAHDataScienceUC:Learn Clustering Techniques Through Examples and Code
A comprehensive educational package combining clustering algorithms with detailed step-by-step explanations. Provides implementations of both traditional (hierarchical, k-means) and modern (Density-Based Spatial Clustering of Applications with Noise (DBSCAN), Gaussian Mixture Models (GMM), genetic k-means) clustering methods as described in Ezugwu et al. (2022) <doi:10.1016/j.engappai.2022.104743>. Includes educational datasets highlighting different clustering challenges, based on 'scikit-learn' examples (Pedregosa et al., 2011) <https://jmlr.csail.mit.edu/papers/v12/pedregosa11a.html>. Features detailed algorithm explanations, visualizations, and weighted distance calculations for enhanced learning.
Maintained by Andriy Protsak Protsak. Last updated 27 days ago.
3.5 match 2.30 score
gokmenzararsiz
dtComb:Statistical Combination of Diagnostic Tests
A system for combining two diagnostic tests using various approaches that include statistical and machine-learning-based methodologies. These approaches are divided into four groups: linear combination methods, non-linear combination methods, mathematical operators, and machine learning algorithms. See the <https://biotools.erciyes.edu.tr/dtComb/> website for more information, documentation, and examples.
Maintained by Gokmen Zararsiz. Last updated 5 months ago.
1.7 match 4.70 score 7 scripts
ecor
geotopbricks:An R Plug-in for the Distributed Hydrological Model GEOtop
It analyzes raster maps and other information as input/output files from the Hydrological Distributed Model GEOtop. It contains functions and methods to import maps and other keywords from geotop.inpts file. Some examples with simulation cases of GEOtop 2.x/3.x are presented in the package. Any information about the GEOtop Distributed Hydrological Model source code is available on www.geotop.org. Technical details about the model are available in Endrizzi et al. (2014) <https://gmd.copernicus.org/articles/7/2831/2014/gmd-7-2831-2014.html>.
Maintained by Emanuele Cordano. Last updated 2 months ago.
1.6 match 4 stars 4.83 score 112 scripts
michael-cw
susographql:Comprehensive Interface to the Survey Solutions 'GraphQL' API
Provides a complete suite of tools for interacting with the Survey Solutions 'GraphQL' API <https://demo.mysurvey.solutions/graphql/>. This package encompasses all currently available queries and mutations, including the latest features for map uploads. It is built on the modern 'httr2' package, offering a streamlined and efficient interface without relying on external 'GraphQL' client packages. In addition to core API functionalities, the package includes a range of helper functions designed to facilitate the use of available query filters.
Maintained by Michael Wild. Last updated 1 year ago.
2.9 match 2.70 score 4 scripts
jacobkap
crimeutils:A Comprehensive Set of Functions to Clean, Analyze, and Present Crime Data
A collection of functions that make it easier to understand crime (or other) data, and assist others in understanding it. The package helps you read data from various sources, clean it, fix column names, and graph the data.
Maintained by Jacob Kaplan. Last updated 2 years ago.
2.8 match 1 star 2.78 score 12 scripts
chaoliu-cl
conversim:Conversation Similarity Analysis Package
A comprehensive toolkit for analyzing and comparing conversations. This package provides functions to calculate various similarity measures between conversations, including topic, lexical, semantic, structural, stylistic, sentiment, participant, and timing similarities. It supports both pairwise conversation comparisons and analysis of multiple dyads.
Maintained by Chao Liu. Last updated 6 months ago.
1.8 match 4.30 score 10 scripts
coffeemuggler
eseis:Environmental Seismology Toolbox
Environmental seismology is a scientific field that studies the seismic signals emitted by Earth surface processes. This package provides all relevant functions to read/write seismic data files, prepare, analyse and visualise seismic data, and generate reports of the processing history.
Maintained by Michael Dietze. Last updated 4 months ago.
1.7 match 9 stars 4.42 score 58 scripts
bioc
bugsigdbr:R-side access to published microbial signatures from BugSigDB
The bugsigdbr package implements convenient access to bugsigdb.org from within R/Bioconductor. The goal of the package is to facilitate import of BugSigDB data into R/Bioconductor, provide utilities for extracting microbe signatures, and enable export of the extracted signatures to plain text files in standard file formats such as GMT.
Maintained by Ludwig Geistlinger. Last updated 9 days ago.
dataimportgenesetenrichmentmetagenomicsmicrobiomebioconductor-package
1.2 match 3 stars 6.46 score 48 scripts
bioc
ENmix:Quality control and analysis tools for Illumina DNA methylation BeadChip
Tools for quality control, analysis and visualization of Illumina DNA methylation array data.
Maintained by Zongli Xu. Last updated 2 days ago.
dnamethylationpreprocessingqualitycontroltwochannelmicroarrayonechannelmethylationarraybatcheffectnormalizationdataimportregressionprincipalcomponentepigeneticsmultichanneldifferentialmethylationimmunooncology
1.3 match 6.01 score 115 scripts
r-lib
clock:Date-Time Types and Tools
Provides a comprehensive library for date-time manipulations using a new family of orthogonal date-time classes (durations, time points, zoned-times, and calendars) that partition responsibilities so that the complexities of time zones are only considered when they are really needed. Capabilities include: date-time parsing, formatting, arithmetic, extraction and updating of components, and rounding.
Maintained by Davis Vaughan. Last updated 2 days ago.
0.5 match 106 stars 14.48 score 296 scripts 407 dependents
my-jiang
vamc:A Monte Carlo Valuation Framework for Variable Annuities
Implementation of a Monte Carlo simulation engine for valuing synthetic portfolios of variable annuities, which reflect realistic features of common annuity contracts in practice. It aims to facilitate the development and dissemination of research related to the efficient valuation of a portfolio of large variable annuities. The main valuation methodology was proposed by Gan (2017) <doi:10.1515/demo-2017-0021>.
Maintained by Mingyi Jiang. Last updated 5 years ago.
2.9 match 1 star 2.45 score 28 scripts
veronica0206
nlpsem:Linear and Nonlinear Longitudinal Process in Structural Equation Modeling Framework
Provides computational tools for nonlinear longitudinal models, in particular the intrinsically nonlinear models, in four scenarios: (1) univariate longitudinal processes with growth factors, with or without covariates including time-invariant covariates (TICs) and time-varying covariates (TVCs); (2) multivariate longitudinal processes that facilitate the assessment of correlation or causation between multiple longitudinal variables; (3) multiple-group models for scenarios (1) and (2) to evaluate differences among manifested groups, and (4) longitudinal mixture models for scenarios (1) and (2), with an assumption that trajectories are from multiple latent classes. The methods implemented are introduced in Jin Liu (2023) <arXiv:2302.03237v2>.
Maintained by Jin Liu. Last updated 4 months ago.
1.0 match 145 stars 6.91 score 16 scripts
falafel19
AutoPipe:Automated Transcriptome Classifier Pipeline: Comprehensive Transcriptome Analysis
An unsupervised, fully automated pipeline for transcriptome analysis, with a supervised option to identify characteristic genes from predefined subclasses. We rely on the 'pamr' <http://www.bioconductor.org/packages//2.7/bioc/html/pamr.html> clustering algorithm to cluster the data and then draw a heatmap of the clusters with the most significant genes and the least significant genes according to the 'pamr' algorithm. This yields easy-to-grasp heatmaps that show, for each cluster, its most defining genes.
Maintained by Karam Daka. Last updated 6 years ago.
2.9 match 2.48 score
cran
UAHDataScienceUC:Learn Clustering Techniques Through Examples and Code
A comprehensive educational package combining clustering algorithms with detailed step-by-step explanations. Provides implementations of both traditional (hierarchical, k-means) and modern (Density-Based Spatial Clustering of Applications with Noise (DBSCAN), Gaussian Mixture Models (GMM), genetic k-means) clustering methods as described in Ezugwu et al. (2022) <doi:10.1016/j.engappai.2022.104743>. Includes educational datasets highlighting different clustering challenges, based on 'scikit-learn' examples (Pedregosa et al., 2011) <https://jmlr.csail.mit.edu/papers/v12/pedregosa11a.html>. Features detailed algorithm explanations, visualizations, and weighted distance calculations for enhanced learning.
Maintained by Andriy Protsak Protsak. Last updated 27 days ago.
3.5 match 2.00 score
cran
rPDBapi:A Comprehensive Interface for Accessing the Protein Data Bank
Streamlines the interaction with the 'RCSB' Protein Data Bank ('PDB') <https://www.rcsb.org/>. This interface offers an intuitive and powerful tool for searching and retrieving a diverse range of data types from the 'PDB'. It includes advanced functionalities like BLAST and sequence motif queries. Built upon the existing XML-based API of the 'PDB', it simplifies the creation of custom requests, thereby enhancing usability and flexibility for researchers.
Maintained by Selcuk Korkmaz. Last updated 5 months ago.
2.9 match 2.40 score 2 scripts
bioc
CINdex:Chromosome Instability Index
The CINdex package addresses an important area of high-throughput genomic analysis. It allows the automated processing and analysis of the experimental DNA copy number data generated by Affymetrix SNP 6.0 arrays or similar high-throughput technologies. It calculates the chromosome instability (CIN) index, which quantitatively characterizes genome-wide DNA copy number alterations as a measure of chromosomal instability. This package calculates not only overall genomic instability, but also instability in terms of copy number gains and losses separately at the chromosome and cytoband level.
Maintained by Yuriy Gusev. Last updated 5 months ago.
softwarecopynumbervariationgenomicvariationacghmicroarraygeneticssequencing
1.7 match 4.08 score 2 scripts
bioc
wavClusteR:Sensitive and highly resolved identification of RNA-protein interaction sites in PAR-CLIP data
The package provides an integrated pipeline for the analysis of PAR-CLIP data. PAR-CLIP-induced transitions are first discriminated from sequencing errors, SNPs and additional non-experimental sources by a non-parametric mixture model. The protein binding sites (clusters) are then resolved at high resolution and cluster statistics are estimated using a rigorous Bayesian framework. Post-processing of the results, data export for UCSC genome browser visualization and motif search analysis are provided. In addition, the package allows RNA-Seq data to be integrated to estimate the False Discovery Rate of cluster detection. Key functions support parallel multicore computing. Note: while wavClusteR was designed for PAR-CLIP data analysis, it can be applied to the analysis of other NGS data obtained from experimental procedures that induce nucleotide substitutions (e.g. BisSeq).
Maintained by Federico Comoglio. Last updated 5 months ago.
immunooncologysequencingtechnologyripseqrnaseqbayesian
1.5 match 4.60 score 3 scripts
miraisolutions
XLConnect:Excel Connector for R
Provides comprehensive functionality to read, write and format Excel data.
Maintained by Martin Studer. Last updated 18 days ago.
cross-platformexcelr-languagexlconnectopenjdk
0.6 match 130 stars 12.28 score 1.2k scripts 1 dependent
ocheab
FormulR:Comprehensive Tools for Drug Formulation Analysis and Visualization
Provides a comprehensive set of tools for the analysis and visualization of drug formulation data. It includes functions for statistical analysis, regression modeling, hypothesis testing, and comparative analysis to assess the impact of formulation parameters on drug release and other critical attributes. Additionally, the package offers a variety of data visualization functions, such as scatterplots, histograms, and boxplots, to facilitate the interpretation of formulation data. With its focus on usability and efficiency, this package aims to streamline the drug formulation process and aid researchers in making informed decisions during formulation design and optimization.
Maintained by Oche Ambrose George. Last updated 12 months ago.
3.4 match 2.00 score
lcef97
SchoolDataIT:Retrieve, Harmonise and Map Open Data Regarding the Italian School System
Compiles and displays the available data sets regarding the Italian school system, with a focus on the infrastructural aspects. Input datasets are downloaded from the web, with the aim of keeping everything updated in real time. The functions are divided into four main modules: 'Get', to scrape raw data from the web; 'Util', various utilities needed to process raw data; 'Group', to aggregate data at the municipality or province level; and 'Map', to visualize the output datasets.
Maintained by Leonardo Cefalo. Last updated 2 months ago.
1.8 match 3.88 score
kosukehamazaki
RAINBOWR:Genome-Wide Association Study with SNP-Set Methods
By using 'RAINBOWR' (Reliable Association INference By Optimizing Weights with R), users can test multiple SNPs (Single Nucleotide Polymorphisms) simultaneously by kernel-based (SNP-set) methods. This package can also be applied to haplotype-based GWAS (Genome-Wide Association Study). Users can test not only additive effects but also dominance and epistatic effects. For details, see our paper in PLOS Computational Biology: Kosuke Hamazaki and Hiroyoshi Iwata (2020) <doi:10.1371/journal.pcbi.1007663>.
Maintained by Kosuke Hamazaki. Last updated 3 months ago.
1.1 match 22 stars 5.99 score 22 scripts
bioc
scDblFinder:scDblFinder
The scDblFinder package gathers various methods for the detection and handling of doublets/multiplets in single-cell sequencing data (i.e. multiple cells captured within the same droplet or reaction volume). It includes methods formerly found in the scran package, the new fast and comprehensive scDblFinder method, and a reimplementation of the Amulet detection method for single-cell ATAC-seq.
Maintained by Pierre-Luc Germain. Last updated 2 months ago.
preprocessingsinglecellrnaseqatacseqdoubletssingle-cell
0.5 match 184 stars 12.34 score 888 scripts 1 dependent
rsquaredacademy
olsrr:Tools for Building OLS Regression Models
Tools designed to make it easier for users, particularly beginner/intermediate R users, to build ordinary least squares regression models. Includes comprehensive regression output, heteroskedasticity tests, collinearity diagnostics, residual diagnostics, measures of influence, model fit assessment and variable selection procedures.
Maintained by Aravind Hebbali. Last updated 4 months ago.
collinearity-diagnosticslinear-modelsregressionstepwise-regression
0.5 match 103 stars 12.19 score 1.4k scripts 4 dependents
ropensci
rotl:Interface to the 'Open Tree of Life' API
An interface to the 'Open Tree of Life' API to retrieve phylogenetic trees, information about studies used to assemble the synthetic tree, and utilities to match taxonomic names to 'Open Tree identifiers'. The 'Open Tree of Life' aims at assembling a comprehensive phylogenetic tree for all named species.
Maintained by Francois Michonneau. Last updated 2 years ago.
metadataropensciphylogeneticsindependant-contrastsbiodiversitypeer-reviewedphylogenytaxonomy
0.5 match 40 stars 12.05 score 356 scripts 29 dependents
bioc
iSEEhex:iSEE extension for summarising data points in hexagonal bins
This package provides panels summarising data points in hexagonal bins for `iSEE`. It is part of `iSEEu`, the iSEE universe of panels that extend the `iSEE` package.
Maintained by Kevin Rue-Albrecht. Last updated 5 months ago.
softwareinfrastructurebioconductoriseeushiny-r
1.1 match 5.38 score 7 scripts 2 dependents
guido-s
netmeta:Network Meta-Analysis using Frequentist Methods
A comprehensive set of functions providing frequentist methods for network meta-analysis (Balduzzi et al., 2023) <doi:10.18637/jss.v106.i02> and supporting Schwarzer et al. (2015) <doi:10.1007/978-3-319-21416-0>, Chapter 8 "Network Meta-Analysis": - frequentist network meta-analysis following Rücker (2012) <doi:10.1002/jrsm.1058>; - additive network meta-analysis for combinations of treatments (Rücker et al., 2020) <doi:10.1002/bimj.201800167>; - network meta-analysis of binary data using the Mantel-Haenszel or non-central hypergeometric distribution method (Efthimiou et al., 2019) <doi:10.1002/sim.8158>, or penalised logistic regression (Evrenoglou et al., 2022) <doi:10.1002/sim.9562>; - rankograms and ranking of treatments by the Surface under the cumulative ranking curve (SUCRA) (Salanti et al., 2013) <doi:10.1016/j.jclinepi.2010.03.016>; - ranking of treatments using P-scores (frequentist analogue of SUCRAs without resampling) according to Rücker & Schwarzer (2015) <doi:10.1186/s12874-015-0060-8>; - split direct and indirect evidence to check consistency (Dias et al., 2010) <doi:10.1002/sim.3767>, (Efthimiou et al., 2019) <doi:10.1002/sim.8158>; - league table with network meta-analysis results; - 'comparison-adjusted' funnel plot (Chaimani & Salanti, 2012) <doi:10.1002/jrsm.57>; - net heat plot and design-based decomposition of Cochran's Q according to Krahn et al. (2013) <doi:10.1186/1471-2288-13-35>; - measures characterizing the flow of evidence between two treatments by König et al. (2013) <doi:10.1002/sim.6001>; - automated drawing of network graphs described in Rücker & Schwarzer (2016) <doi:10.1002/jrsm.1143>; - partial order of treatment rankings ('poset') and Hasse diagram for 'poset' (Carlsen & Bruggemann, 2014) <doi:10.1002/cem.2569>; (Rücker & Schwarzer, 2017) <doi:10.1002/jrsm.1270>; - contribution matrix as described in Papakonstantinou et al. (2018) <doi:10.12688/f1000research.14770.3> and Davies et al. (2022) <doi:10.1002/sim.9346>; - subgroup network meta-analysis.
Maintained by Guido Schwarzer. Last updated 2 days ago.
meta-analysisnetwork-meta-analysisrstudio
0.5 match 33 stars 11.82 score 199 scripts 10 dependents
srivastavbudugutta
tvtools:Comprehensive Tools for Panel Data Analysis - 'tvtools'
Longitudinal data offers insights into population changes over time but often requires a flexible structure, especially with varying follow-up intervals. Panel data is one way to store such records, though it adds complexity to analysis. The 'tvtools' package for R simplifies exploring and analyzing panel data.
Maintained by Srivastav Budugutta. Last updated 5 months ago.
2.9 match 2.00 score 4 scripts
green-striped-gecko
PopGenReport:A Simple Framework to Analyse Population and Landscape Genetic Data
Provides a beginner-friendly framework to analyse population genetic data. Based on 'adegenet' objects, it uses 'knitr' to create comprehensive reports on spatial genetic data. For detailed information on how to use the package, refer to the comprehensive tutorials or visit <http://www.popgenreport.org/>.
Maintained by Bernd Gruber. Last updated 1 year ago.
0.8 match 5 stars 7.27 score 82 scripts 1 dependent
lbaole17
superdiag:A Comprehensive Test Suite for Testing Markov Chain Nonconvergence
The 'superdiag' package provides a comprehensive test suite for testing Markov Chain nonconvergence. It integrates five standard empirical MCMC convergence diagnostics (Gelman-Rubin, Geweke, Heidelberger-Welch, Raftery-Lewis, and Hellinger distance) and plotting functions for trace plots and density histograms. The functions of the package can be used to present all diagnostic statistics and graphs at once for conveniently checking MCMC nonconvergence.
Maintained by Le Bao. Last updated 4 years ago.
3.4 match 1.68 score 16 scripts 1 dependent
bioc
Maaslin2:"Multivariable Association Discovery in Population-scale Meta-omics Studies"
MaAsLin2 is a comprehensive R package for efficiently determining multivariable association between clinical metadata and microbial meta'omic features. MaAsLin2 relies on general linear models to accommodate most modern epidemiological study designs, including cross-sectional and longitudinal, and offers a variety of data exploration, normalization, and transformation methods. MaAsLin2 is the next generation of MaAsLin.
Maintained by Lauren McIver. Last updated 5 months ago.
metagenomicssoftwaremicrobiomenormalizationbiobakerybioconductordifferential-abundance-analysisfalse-discovery-ratemultiple-covariatespublicrepeated-measurestools
0.5 match 133 stars 11.03 score 532 scripts 3 dependents
r-lib
tzdb:Time Zone Database Information
Provides an up-to-date copy of the Internet Assigned Numbers Authority (IANA) Time Zone Database. It is updated periodically to reflect changes made by political bodies to time zone boundaries, UTC offsets, and daylight saving time rules. Additionally, this package provides a C++ interface for working with the 'date' library. 'date' provides comprehensive support for working with dates and date-times, which this package exposes to make it easier for other R packages to utilize. Headers are provided for calendar specific calculations, along with a limited interface for time zone manipulations.
Maintained by Davis Vaughan. Last updated 2 days ago.
0.5 match 7 stars 10.90 score 38 scripts 2.4k dependents
azure
AzureGraph:Simple Interface to 'Microsoft Graph'
A simple interface to the 'Microsoft Graph' API <https://learn.microsoft.com/en-us/graph/overview>. 'Graph' is a comprehensive framework for accessing data in various online Microsoft services. This package was originally intended to provide an R interface only to the 'Azure Active Directory' part, with a view to supporting interoperability of R and 'Azure': users, groups, registered apps and service principals. However, it has since been expanded into a more general tool for interacting with Graph. Part of the 'AzureR' family of packages.
Maintained by Hong Ooi. Last updated 2 years ago.
azure-active-directory-graph-apiazure-sdk-rmicrosoft-graph-api
0.5 match 32 stars 10.30 score 36 scripts 21 dependents
tarnduong
ks:Kernel Smoothing
Kernel smoothers for univariate and multivariate data, with comprehensive visualisation and bandwidth selection capabilities, including for densities, density derivatives, cumulative distributions, clustering, classification, density ridges, significant modal regions, and two-sample hypothesis tests. Chacon & Duong (2018) <doi:10.1201/9780429485572>.
Maintained by Tarn Duong. Last updated 6 months ago.
0.5 match 6 stars 10.14 score 920 scripts 262 dependents
fang-zhaoyuan
GANPAdata:The GANPA Datasets Package
This is a dataset package for GANPA, which implements a network-based gene weighting approach to pathway analysis. This package includes data useful for GANPA, such as a functional association network, pathways, an expression dataset and multi-subunit proteins.
Maintained by Zhaoyuan Fang. Last updated 14 years ago.
3.5 match 1.48 score 7 scripts 1 dependent
nanxstats
protr:Generating Various Numerical Representation Schemes for Protein Sequences
Comprehensive toolkit for generating various numerical features of protein sequences described in Xiao et al. (2015) <DOI:10.1093/bioinformatics/btv042>. For full functionality, the software 'ncbi-blast+' is needed, see <https://blast.ncbi.nlm.nih.gov/doc/blast-help/downloadblastdata.html> for more information.
Maintained by Nan Xiao. Last updated 6 months ago.
bioinformaticsfeature-engineeringfeature-extractionmachine-learningpeptidesprotein-sequencessequence-analysis
0.5 match 52 stars 10.02 score 173 scripts 3 dependents
egeulgen
pathfindR:Enrichment Analysis Utilizing Active Subnetworks
Enrichment analysis enables researchers to uncover mechanisms underlying a phenotype. However, conventional methods for enrichment analysis do not take into account protein-protein interaction information, resulting in incomplete conclusions. 'pathfindR' is a tool for enrichment analysis utilizing active subnetworks. The main function identifies active subnetworks in a protein-protein interaction network using a user-provided list of genes and associated p values. It then performs enrichment analyses on the identified subnetworks, identifying enriched terms (i.e. pathways or, more broadly, gene sets) that possibly underlie the phenotype of interest. 'pathfindR' also offers functionalities to cluster the enriched terms and identify representative terms in each cluster, to score the enriched terms per sample and to visualize analysis results. The enrichment, clustering and other methods implemented in 'pathfindR' are described in detail in Ulgen E, Ozisik O, Sezerman OU. 2019. 'pathfindR': An R Package for Comprehensive Identification of Enriched Pathways in Omics Data Through Active Subnetworks. Front. Genet. <doi:10.3389/fgene.2019.00858>.
Maintained by Ege Ulgen. Last updated 28 days ago.
active-subnetworksenrichmentpathwaypathway-enrichment-analysissubnetwork
0.5 match 186 stars 10.13 score 138 scripts
azure
AzureRMR:Interface to 'Azure Resource Manager'
A lightweight but powerful R interface to the 'Azure Resource Manager' REST API. The package exposes a comprehensive class framework and related tools for creating, updating and deleting 'Azure' resource groups, resources and templates. While 'AzureRMR' can be used to manage any 'Azure' service, it can also be extended by other packages to provide extra functionality for specific services. Part of the 'AzureR' family of packages.
Maintained by Hong Ooi. Last updated 1 year ago.
azureazure-resource-managerazure-sdk-rcloud
0.5 match 20 stars 9.94 score 51 scripts 12 dependents
immunomind
immunarch:Bioinformatics Analysis of T-Cell and B-Cell Immune Repertoires
A comprehensive framework for bioinformatics exploratory analysis of bulk and single-cell T-cell receptor and antibody repertoires. It provides seamless data loading, analysis and visualisation for AIRR (Adaptive Immune Receptor Repertoire) data, both bulk immunosequencing (RepSeq) and single-cell sequencing (scRNAseq). Immunarch implements most of the widely used AIRR analysis methods, such as: clonality analysis, estimation of repertoire similarities in distribution of clonotypes and gene segments, repertoire diversity analysis, annotation of clonotypes using external immune receptor databases and clonotype tracking in vaccination and cancer studies. A successor to our previously published 'tcR' immunoinformatics package (Nazarov 2015) <doi:10.1186/s12859-015-0613-1>.
Maintained by Vadim I. Nazarov. Last updated 12 months ago.
airr-analysisb-cell-receptorbcrbcr-repertoirebioinformaticsigig-repertoireimmune-repertoireimmune-repertoire-analysisimmune-repertoire-dataimmunoglobulinimmunoinformaticsimmunologyrep-seqrepertoire-analysissingle-cellsingle-cell-analysist-cell-receptortcrtcr-repertoirecpp
0.5 match 315 stars 9.49 score 203 scripts
bioc
RBGL:An interface to the BOOST graph library
A fairly extensive and comprehensive interface to the graph algorithms contained in the BOOST library.
Maintained by Bioconductor Package Maintainer. Last updated 5 months ago.
0.6 match 8.59 score 320 scripts 132 dependents
bupaverse
bupaR:Business Process Analysis in R
Comprehensive Business Process Analysis toolkit. Creates an S3 class for event log objects, and related handler functions. Imports related packages for filtering event data, computation of descriptive statistics, handling of 'Petri Net' objects and visualization of process maps. See also packages 'edeaR', 'processmapR', 'eventdataR' and 'processmonitR'.
Maintained by Gert Janssenswillen. Last updated 2 years ago.
0.5 match 55 stars 9.07 score 389 scripts 11 dependents
bluefoxr
COINr:Composite Indicator Construction and Analysis
A comprehensive high-level package for composite indicator construction and analysis. It is a "development environment" for composite indicators and scoreboards, which includes utilities for construction (indicator selection, denomination, imputation, data treatment, normalisation, weighting and aggregation) and analysis (multivariate analysis, correlation plotting, short cuts for principal component analysis, global sensitivity analysis, and more). A composite indicator is completely encapsulated inside a single hierarchical list called a "coin". This allows a fast and efficient workflow, as well as making quick copies, testing methodological variations and making comparisons. It also includes many plotting options, both statistical (scatter plots, distribution plots) and for presenting results.
Maintained by William Becker. Last updated 2 months ago.
0.5 match 26 stars 9.07 score 73 scripts 1 dependent
wjakethompson
taylor:Lyrics and Song Data for Taylor Swift's Discography
A comprehensive resource for data on Taylor Swift songs. Data is included for all officially released studio albums, extended plays (EPs), and individual singles. Data comes from 'Genius' (lyrics) and 'Spotify' (song characteristics). Additional functions are included for easily creating data visualizations with color palettes inspired by Taylor Swift's album covers.
Maintained by W. Jake Thompson. Last updated 1 month ago.
color-palettesdatagenius-lyricsggplot2-themeslyricsspotifyspotify-apitaylor-swift
0.5 match 45 stars 8.79 score 105 scripts
magnusdv
pedtools:Creating and Working with Pedigrees and Marker Data
A comprehensive collection of tools for creating, manipulating and visualising pedigrees and genetic marker data. Pedigrees can be read from text files or created on the fly with built-in functions. A range of utilities enable modifications like adding or removing individuals, breaking loops, and merging pedigrees. An online tool for creating pedigrees interactively, based on 'pedtools', is available at <https://magnusdv.shinyapps.io/quickped>. 'pedtools' is the hub of the 'pedsuite', a collection of packages for pedigree analysis. A detailed presentation of the 'pedsuite' is given in the book 'Pedigree Analysis in R' (Vigeland, 2021, ISBN:9780128244302).
Maintained by Magnus Dehli Vigeland. Last updated 2 months ago.
0.5 match 25 stars 8.83 score 60 scripts 18 dependents
eddelbuettel
RQuantLib:R Interface to the 'QuantLib' Library
The 'RQuantLib' package makes parts of 'QuantLib' accessible from R. The 'QuantLib' project aims to provide a comprehensive software framework for quantitative finance. The goal is to provide a standard open source library for quantitative analysis, modeling, trading, and risk management of financial assets.
Maintained by Dirk Eddelbuettel. Last updated 2 months ago.
0.5 match 123 stars 8.52 score 194 scripts
jamesliley
SPARRAfairness:Analysis of Differential Behaviour of SPARRA Score Across Demographic Groups
The SPARRA risk score (Scottish Patients At Risk of admission and Re-Admission) estimates yearly risk of emergency hospital admission using electronic health records on a monthly basis for most of the Scottish population. This package implements a suite of functions used to analyse the behaviour and performance of the score, focusing particularly on differential performance over demographically-defined groups. It includes useful utility functions to plot receiver-operator-characteristic, precision-recall and calibration curves, draw stock human figures, estimate counterfactual quantities without the need to re-compute risk scores, and simulate a semi-realistic dataset.
Maintained by James Liley. Last updated 4 months ago.
1.6 match 2.70 score 4 scripts
bioc
SPIAT:Spatial Image Analysis of Tissues
SPIAT (**Sp**atial **I**mage **A**nalysis of **T**issues) is an R package with a suite of data processing, quality control, visualization and data analysis tools. SPIAT is compatible with data generated from single-cell spatial proteomics platforms (e.g. OPAL, CODEX, MIBI, cellprofiler). SPIAT reads spatial data in the form of X and Y coordinates of cells, marker intensities and cell phenotypes. SPIAT includes six analysis modules that allow visualization, calculation of cell colocalization, categorization of the immune microenvironment relative to tumor areas, analysis of cellular neighborhoods, and the quantification of spatial heterogeneity, providing a comprehensive toolkit for spatial data analysis.
Maintained by Yuzhou Feng. Last updated 16 hours ago.
biomedicalinformaticscellbiologyspatialclusteringdataimportimmunooncologyqualitycontrolsinglecellsoftwarevisualization
0.5 match 22 stars 8.59 score 69 scripts
axlehner
SpatialRDD:Conduct Multiple Types of Geographic Regression Discontinuity Designs
Spatial versions of Regression Discontinuity Designs (RDDs) are becoming increasingly popular as tools for causal inference. However, conducting state-of-the-art analyses often involves tedious and time-consuming steps. This package offers comprehensive functionalities for executing all required spatial and econometric tasks in a streamlined manner. Moreover, it equips researchers with tools for performing essential placebo and balancing checks comprehensively. The fact that researchers do not have to rely on 'APIs' of external 'GIS' software ensures replicability and raises the standard for spatial RDDs.
Maintained by Alexander Lehner. Last updated 12 months ago.
0.8 match 37 stars 5.57 score 8 scripts
bioc
UniProt.ws:R Interface to UniProt Web Services
The Universal Protein Resource (UniProt) is a comprehensive resource for protein sequence and annotation data. This package provides a collection of functions for retrieving, processing, and re-packaging UniProt web services. The package makes use of UniProt's modernized REST API and allows mapping of identifiers across different databases.
Maintained by Marcel Ramos. Last updated 2 months ago.
annotationinfrastructuregokeggbiocartabioconductor-packagecore-package
0.5 match 4 stars 8.38 score 167 scripts 4 dependents
asa12138
MetaNet:Network Analysis for Omics Data
A comprehensive network analysis package. It calculates correlation networks quickly and accelerates many analyses through parallel computing. It supports multi-omics data and searching for sub-networks. It handles large data sets, with more than 10,000 nodes in each omics layer. It offers various layout methods for multi-omics networks and interfaces to other software ('Gephi', 'Cytoscape', 'ggplot2') for easy visualization. It also provides comprehensive calculation of topology indexes, including ecological network stability.
Maintained by Chen Peng. Last updated 11 days ago.
dataimportnetwork analysisomicssoftwarevisualization
0.8 match 13 stars 5.51 score 9 scripts
insightsengineering
chevron:Standard TLGs for Clinical Trials Reporting
Provides libraries of standard tables, listings, and graphs (TLGs) used in clinical trials. This package implements a structure to reformat the data with 'dunlin' and create reporting tables using 'rtables' and 'tern', with standardized input arguments to enable quick generation of standard outputs. It also provides comprehensive data checks and script generation functionality.
Maintained by Joe Zhu. Last updated 24 days ago.
clinical-trialsgraphslistingsnestreportingtables
0.5 match 12 stars 8.24 score 12 scripts
ax3man
phylopath:Perform Phylogenetic Path Analysis
A comprehensive and easy-to-use R implementation of confirmatory phylogenetic path analysis as described by Von Hardenberg and Gonzalez-Voyer (2012) <doi:10.1111/j.1558-5646.2012.01790.x>.
Maintained by Wouter van der Bijl. Last updated 6 months ago.
analysiscomparative-methodspathphylogenetics
0.5 match 13 stars 8.10 score 81 scripts 1 dependents
rrrlw
TDAstats:Pipeline for Topological Data Analysis
A comprehensive toolset for any useR conducting topological data analysis, specifically via the calculation of persistent homology in a Vietoris-Rips complex. The tools this package currently provides can be conveniently split into three main sections: (1) calculating persistent homology; (2) conducting statistical inference on persistent homology calculations; (3) visualizing persistent homology and statistical inference. The published form of TDAstats can be found in Wadhwa et al. (2018) <doi:10.21105/joss.00860>. For a general background on computing persistent homology for topological data analysis, see Otter et al. (2017) <doi:10.1140/epjds/s13688-017-0109-5>. To learn more about how the permutation test is used for nonparametric statistical inference in topological data analysis, read Robinson & Turner (2017) <doi:10.1007/s41468-017-0008-7>. To learn more about how TDAstats calculates persistent homology, you can visit the GitHub repository for Ripser, the software that works behind the scenes, at <https://github.com/Ripser/ripser>.
Maintained by Raoul Wadhwa. Last updated 3 years ago.
data-scienceggplot2homologyhomology-calculationshomology-computationjosspersistent-homologypipelineripsertdatopological-data-analysistopologytopology-visualizationvisualizationcpp
0.5 match 40 stars 8.30 score 46 scripts 4 dependents
bioc
POMA:Tools for Omics Data Analysis
The POMA package offers a comprehensive toolkit designed for omics data analysis, streamlining the process from initial visualization to final statistical analysis. Its primary goal is to simplify and unify the various steps involved in omics data processing, making it more accessible and manageable within a single, intuitive R package. Emphasizing reproducibility and user-friendliness, POMA leverages the standardized SummarizedExperiment class from Bioconductor, ensuring seamless integration and compatibility with a wide array of Bioconductor tools. This approach guarantees maximum flexibility and replicability, making POMA an essential asset for researchers handling omics datasets. See https://github.com/pcastellanoescuder/POMAShiny and Castellano-Escuder et al. (2021) <doi:10.1371/journal.pcbi.1009148> for more details.
Maintained by Pol Castellano-Escuder. Last updated 4 months ago.
batcheffectclassificationclusteringdecisiontreedimensionreductionmultidimensionalscalingnormalizationpreprocessingprincipalcomponentregressionrnaseqsoftwarestatisticalmethodvisualizationbioconductorbioinformaticsdata-visualizationdimension-reductionexploratory-data-analysismachine-learningomics-data-integrationpipelinepre-processingstatistical-analysisuser-friendlyworkflow
0.5 match 11 stars 8.23 score 20 scripts 1 dependents
bioc
countsimQC:Compare Characteristic Features of Count Data Sets
countsimQC provides functionality to create a comprehensive report comparing a broad range of characteristics across a collection of count matrices. One important use case is the comparison of one or more synthetic count matrices to a real count matrix, possibly the one underlying the simulations. However, any collection of count matrices can be compared.
Maintained by Charlotte Soneson. Last updated 3 months ago.
microbiomernaseqsinglecellexperimentaldesignqualitycontrolreportwritingvisualizationimmunooncology
0.5 match 27 stars 7.69 score 24 scripts
mlampros
nmslibR:Non Metric Space (Approximate) Library
A Non-Metric Space Library ('NMSLIB' <https://github.com/nmslib/nmslib>) wrapper, which according to the authors "is an efficient cross-platform similarity search library and a toolkit for evaluation of similarity search methods. The goal of the 'NMSLIB' <https://github.com/nmslib/nmslib> Library is to create an effective and comprehensive toolkit for searching in generic non-metric spaces. Being comprehensive is important, because no single method is likely to be sufficient in all cases. Also note that exact solutions are hardly efficient in high dimensions and/or non-metric spaces. Hence, the main focus is on approximate methods". The wrapper also includes Approximate Kernel k-Nearest-Neighbor functions based on the 'NMSLIB' <https://github.com/nmslib/nmslib> 'Python' Library.
Maintained by Lampros Mouselimis. Last updated 2 years ago.
approximate-nearest-neighbor-searchnmslibnon-metricpythonreticulatecppopenmp
0.8 match 12 stars 5.14 score 23 scripts
cran
rocbc:Statistical Inference for Box-Cox Based Receiver Operating Characteristic Curves
Generation of Box-Cox based ROC curves and several aspects of inference and hypothesis testing. Can be used when inferences for one biomarker (Bantis LE, Nakas CT, Reiser B. (2018) <doi:10.1002/bimj.201700107>) are of interest or when comparisons of two correlated biomarkers (Bantis LE, Nakas CT, Reiser B. (2021) <doi:10.1002/bimj.202000128>) are of interest. Provides inferences and comparisons around the AUC, the Youden index, the sensitivity at a given specificity level (and vice versa), the optimal operating point of the ROC curve (in the Youden sense), and the Youden based cutoff.
Maintained by Benjamin Brewer. Last updated 11 months ago.
1.7 match 2.30 score
haghish
shapley:Weighted Mean SHAP and CI for Robust Feature Selection in ML Grid
This R package introduces Weighted Mean SHapley Additive exPlanations (WMSHAP), an innovative method for calculating SHAP values for a grid of fine-tuned base-learner machine learning models as well as stacked ensembles, a method not previously available due to the common reliance on single best-performing models. By integrating the weighted mean SHAP values from individual base-learners comprising the ensemble or individual base-learners in a tuning grid search, the package weights SHAP contributions according to each model's performance, assessed by R squared (for both regression and classification models). Alternatively, the software offers weighting of SHAP values based on the area under the precision-recall curve (AUCPR), the area under the curve (AUC), and F2 measures for binary classifiers. It further extends this framework to implement weighted confidence intervals for weighted mean SHAP values, offering a more comprehensive and robust feature importance evaluation over a grid of machine learning models, instead of solely computing SHAP values for the best model. This methodology is particularly beneficial for addressing the severe class imbalance (class rarity) problem by providing a transparent, generalized measure of feature importance that mitigates the risk of reporting SHAP values for an overfitted or biased model and maintains robustness under severe class imbalance, where there is no universal criterion for identifying the absolute best model. Furthermore, the package implements hypothesis testing to ascertain the statistical significance of SHAP values for individual features, as well as comparative significance testing of SHAP contributions between features.
Additionally, it tackles a critical gap in feature selection literature by presenting criteria for the automatic feature selection of the most important features across a grid of models or stacked ensembles, eliminating the need for arbitrary determination of the number of top features to be extracted. This utility is invaluable for researchers analyzing feature significance, particularly within severely imbalanced outcomes where conventional methods fall short. Moreover, it is also expected to report democratic feature importance across a grid of models, resulting in a more comprehensive and generalizable feature selection. The package further implements a novel method for visualizing SHAP values both at subject level and feature level as well as a plot for feature selection based on the weighted mean SHAP ratios.
Maintained by E. F. Haghish. Last updated 3 days ago.
class-imbalanceclass-imbalance-problemfeature-extractionfeature-importancefeature-selectionmachine-learningmachine-learning-algorithmsshapshap-analysisshap-valuesshapelyshapley-additive-explanationsshapley-decompositionshapley-valueshapley-valuesshapleyvalueweighted-shapweighted-shap-confidence-intervalweighted-shapleyweighted-shapley-ci
0.8 match 14 stars 5.19 score 17 scripts
ropengov
retroharmonize:Ex Post Survey Data Harmonization
Assists in reproducible retrospective (ex-post) harmonization of data, particularly individual-level survey data, by providing tools for organizing metadata, standardizing the coding of variables, variable names and value labels (including missing values), and documenting the data transformations, with the help of comprehensive S3 classes.
Maintained by Daniel Antal. Last updated 2 months ago.
0.5 match 10 stars 7.62 score 59 scripts
mccarthy-m-g
palettes:Methods for Colour Vectors and Colour Palettes
Provides a comprehensive library for colour vectors and colour palettes using a new family of colour classes (palettes_colour and palettes_palette) that always print as hex codes with colour previews. Capabilities include: formatting, casting and coercion, extraction and updating of components, plotting, colour mixing arithmetic, and colour interpolation.
Maintained by Michael McCarthy. Last updated 6 months ago.
color-palettecolorscolour-palettecoloursggplot2gtpalettesvctrs
0.5 match 25 stars 7.58 score 42 scripts 1 dependents
robindenz1
simDAG:Simulate Data from a DAG and Associated Node Information
Simulate complex data from a given directed acyclic graph and information about each individual node. Root nodes are simply sampled from the specified distribution. Child nodes are simulated according to one of many implemented regressions, such as logistic regression, linear regression, Poisson regression and more. Also includes a comprehensive framework for discrete-time simulation, which can generate even more complex longitudinal data.
Maintained by Robin Denz. Last updated 21 days ago.
causal-inferencedirected-acyclic-graphsimulation
0.5 match 10 stars 7.55 score 77 scripts
epiverse-trace
cleanepi:Clean and Standardize Epidemiological Data
A package for cleaning and standardizing tabular data, tailored specifically for curating epidemiological data. It streamlines various data cleaning tasks that are typically expected when working with datasets in epidemiology. It returns the processed data in the same format, and generates a comprehensive report detailing the outcomes of each cleaning task.
Maintained by Karim Mané. Last updated 3 days ago.
data-cleaningepidemiologyepiverse
0.5 match 9 stars 7.44 score 19 scripts
dgerbing
lessR:Less Code, More Results
Each function replaces multiple standard R functions. For example, two function calls, Read() and CountAll(), generate summary statistics for all variables in the data frame, plus histograms and bar charts as appropriate. Other functions provide for summary statistics via pivot tables, a comprehensive regression analysis, ANOVA and t-test, visualizations including the Violin/Box/Scatter plot for a numerical variable, bar chart, histogram, box plot, density curves, calibrated power curve, reading multiple data formats with the same function call, variable labels, time series with aggregation and forecasting, color themes, and Trellis (facet) graphics. Also includes a confirmatory factor analysis of multiple indicator measurement models, pedagogical routines for data simulation such as for the Central Limit Theorem, generation and rendering of regression instructions for interpretative output, and interactive visualizations.
Maintained by David W. Gerbing. Last updated 1 month ago.
0.5 match 6 stars 7.47 score 394 scripts 3 dependents
jo-karl
ccpsyc:Methods for Cross-Cultural Psychology
With the development of new cross-cultural methods, this package is intended to combine multiple functions that automate and simplify analyses, providing a unified analysis approach for commonly employed methods.
Maintained by Johannes Karl. Last updated 2 years ago.
1.9 match 1 stars 2.00 score 1 scripts
eltebioinformatics
mulea:Enrichment Analysis Using Multiple Ontologies and False Discovery Rate
Background - Traditional gene set enrichment analyses are typically limited to a few ontologies and do not account for the interdependence of gene sets or terms, resulting in overcorrected p-values. To address these challenges, we introduce mulea, an R package offering comprehensive overrepresentation and functional enrichment analysis. Results - mulea employs a progressive empirical false discovery rate (eFDR) method, specifically designed for interconnected biological data, to accurately identify significant terms within diverse ontologies. mulea expands beyond traditional tools by incorporating a wide range of ontologies, encompassing Gene Ontology, pathways, regulatory elements, genomic locations, and protein domains. This flexibility enables researchers to tailor enrichment analysis to their specific questions, such as identifying enriched transcriptional regulators in gene expression data or overrepresented protein domains in protein sets. To facilitate seamless analysis, mulea provides gene sets (in standardised GMT format) for 27 model organisms, covering 22 ontology types from 16 databases and various identifiers resulting in almost 900 files. Additionally, the muleaData ExperimentData Bioconductor package simplifies access to these pre-defined ontologies. Finally, mulea's architecture allows for easy integration of user-defined ontologies, or GMT files from external sources (e.g., MSigDB or Enrichr), expanding its applicability across diverse research areas. Conclusions - mulea is distributed as a CRAN R package. It offers researchers a powerful and flexible toolkit for functional enrichment analysis, addressing limitations of traditional tools with its progressive eFDR and by supporting a variety of ontologies. Overall, mulea fosters the exploration of diverse biological questions across various model organisms.
Maintained by Tamas Stirling. Last updated 3 months ago.
annotationdifferentialexpressiongeneexpressiongenesetenrichmentgographandnetworkmultiplecomparisonpathwaysreactomesoftwaretranscriptionvisualizationenrichmentenrichment-analysisfunctional-enrichment-analysisgene-set-enrichmentontologiestranscriptomicscpp
0.5 match 28 stars 7.36 score 34 scripts
rsquaredacademy
blorr:Tools for Developing Binary Logistic Regression Models
Tools designed to make it easier for beginner and intermediate users to build and validate binary logistic regression models. Includes bivariate analysis, comprehensive regression output, model fit statistics, variable selection procedures, model validation techniques and a 'shiny' app for interactive model building.
Maintained by Aravind Hebbali. Last updated 4 months ago.
logistic-regression-modelsregressioncpp
0.5 match 17 stars 7.13 score 144 scripts 1 dependents
cran
hudr:Providing Data from the US Department of Housing and Urban Development
Provides functions to access data from the US Department of Housing and Urban Development <https://www.huduser.gov/portal/dataset/fmr-api.html>.
Maintained by Paul Richardson. Last updated 2 years ago.
3.2 match 1.15 score 14 scripts
pauljohn32
rockchalk:Regression Estimation and Presentation
A collection of functions for interpretation and presentation of regression analysis. These functions are used to produce the statistics lectures in <https://pj.freefaculty.org/guides/>. Includes regression diagnostics, regression tables, and plots of interactions and "moderator" variables. The emphasis is on "mean-centered" and "residual-centered" predictors. The vignette 'rockchalk' offers a fairly comprehensive overview. The vignette 'Rstyle' has advice about coding in R. The package title 'rockchalk' refers to our school motto, 'Rock Chalk Jayhawk, Go K.U.'.
Maintained by Paul E. Johnson. Last updated 3 years ago.
0.5 match 7.13 score 584 scripts 18 dependents
apariciojohan
flexFitR:Flexible Non-Linear Least Square Model Fitting
Provides tools for flexible non-linear least squares model fitting using general-purpose optimization techniques. The package supports a variety of optimization algorithms, including those provided by the 'optimx' package, making it suitable for handling complex non-linear models. Features include parallel processing support via the 'future' and 'foreach' packages, comprehensive model diagnostics, and visualization capabilities. Implements methods described in Nash and Varadhan (2011, <doi:10.18637/jss.v043.i09>).
Maintained by Johan Aparicio. Last updated 9 days ago.
0.5 match 2 stars 7.09 score 77 scripts
bioc
TADCompare:TADCompare: Identification and characterization of differential TADs
TADCompare is an R package designed to identify and characterize differential Topologically Associated Domains (TADs) between multiple Hi-C contact matrices. It contains functions for finding differential TADs between two datasets, finding differential TADs over time and identifying consensus TADs across multiple matrices. It takes all of the main types of Hi-C input and returns simple, comprehensive, easy-to-analyze results.
Maintained by Mikhail Dozmorov. Last updated 5 months ago.
softwarehicsequencingfeatureextractionclustering
0.5 match 23 stars 7.04 score 10 scripts
mpascariu
MortalityLaws:Parametric Mortality Models, Life Tables and HMD
Fit the most popular human mortality 'laws', and construct full and abridged life tables given various input indices. A mortality law is a parametric function that describes the dying-out process of individuals in a population during a significant portion of their life spans. For a comprehensive review of the most important mortality laws see Tabeau (2001) <doi:10.1007/0-306-47562-6_1>. Practical functions for downloading data from various human mortality databases are provided as well.
Maintained by Marius D. Pascariu. Last updated 1 year ago.
actuarial-sciencedemographydownload-hmdhuman-mortality-lawslife-tablemortality
0.5 match 32 stars 7.00 score 103 scripts 1 dependents
cran
gss:General Smoothing Splines
A comprehensive package for structural multivariate function estimation using smoothing splines.
Maintained by Chong Gu. Last updated 5 months ago.
0.6 match 3 stars 6.40 score 137 dependents
statistikat
surveysd:Survey Standard Error Estimation for Cumulated Estimates and their Differences in Complex Panel Designs
Calculate point estimates and their standard errors in complex household surveys using bootstrap replicates. Bootstrapping considers survey design with a rotating panel. A comprehensive description of the methodology can be found under <https://statistikat.github.io/surveysd/articles/methodology.html>.
Maintained by Johannes Gussenbauer. Last updated 3 months ago.
bootstraperror-estimationsurveycpp
0.5 match 9 stars 6.86 score 67 scripts
ikosmidis
cranly:Package Directives and Collaboration Networks in CRAN
Core visualizations and summaries for the CRAN package database. The package provides comprehensive methods for cleaning up and organizing the information in the CRAN package database, for building package directives networks (depends, imports, suggests, enhances, linking to) and collaboration networks, producing package dependence trees, and for computing useful summaries and producing interactive visualizations from the resulting networks and summaries. The resulting networks can be coerced to 'igraph' <https://CRAN.R-project.org/package=igraph> objects for further analyses and modelling.
Maintained by Ioannis Kosmidis. Last updated 3 years ago.
network-analysisnetwork-visualization
0.5 match 49 stars 6.85 score 32 scripts 1 dependents
asa12138
ReporterScore:Generalized Reporter Score-Based Enrichment Analysis for Omics Data
Inspired by the classic 'RSA', we developed the improved 'Generalized Reporter Score-based Analysis (GRSA)' method, implemented in the R package 'ReporterScore', along with comprehensive visualization methods and pathway databases. 'GRSA' is a threshold-free method that works well with all types of biomedical features, such as genes, chemical compounds, and microbial species. Importantly, the 'GRSA' supports multi-group and longitudinal experimental designs, because of the included multi-group-compatible statistical methods.
Maintained by Chen Peng. Last updated 2 months ago.
0.5 match 67 stars 6.79 score 13 scripts
jhk0530
gemini.R:Interface for 'Google Gemini' API
Provides a comprehensive interface for Google Gemini API, enabling users to access and utilize Gemini Large Language Model (LLM) functionalities directly from R. This package facilitates seamless integration with Google Gemini, allowing for advanced language processing, text generation, and other AI-driven capabilities within the R environment. For more information, please visit <https://ai.google.dev/docs/gemini_api_overview>.
Maintained by Jinhwan Kim. Last updated 5 days ago.
0.5 match 68 stars 6.66 score 37 scripts 1 dependents
nepem-ufsc
pliman:Tools for Plant Image Analysis
Tools for both single and batch image manipulation and analysis (Olivoto, 2022 <doi:10.1111/2041-210X.13803>) and phytopathometry (Olivoto et al., 2022 <doi:10.1007/S40858-021-00487-5>). The tools can be used for the quantification of leaf area, object counting, extraction of image indexes, shape measurement, object landmark identification, and Elliptical Fourier Analysis of object outlines (Claude (2008) <doi:10.1007/978-0-387-77789-4>). The package also provides a comprehensive pipeline for generating shapefiles with complex layouts and supports high-throughput phenotyping of RGB, multispectral, and hyperspectral orthomosaics. This functionality facilitates field phenotyping using UAV- or satellite-based imagery.
Maintained by Tiago Olivoto. Last updated 2 days ago.
0.5 match 10 stars 6.68 score 476 scripts
kylegrealis
froggeR:Enhance 'Quarto' Project Workflows and Standards
Streamlines 'Quarto' workflows by providing tools for consistent project setup and documentation. Enables portability through reusable metadata, automated project structure creation, and standardized templates. Features include enhanced project initialization, pre-formatted 'Quarto' documents, comprehensive data protection settings, custom styling, and structured documentation generation. Designed to improve efficiency and collaboration in R data science projects by reducing repetitive setup tasks while maintaining consistent formatting across multiple documents. There are many valuable resources providing in-depth explanations of customizing 'Quarto' templates and theme styling by the Posit team: <https://quarto.org/docs/output-formats/html-themes.html#customizing-themes> & <https://quarto.org/docs/output-formats/html-themes-more.html>, and at the Bootstrap community's GitHub at <https://github.com/twbs/bootstrap/blob/main/scss/_variables.scss>.
Maintained by Kyle Grealis. Last updated 5 hours ago.
data-scienceproject-managementquarto
0.5 match 26 stars 6.67 score 6 scripts
cefet-rj-dal
daltoolbox:Leveraging Experiment Lines to Data Analytics
The natural increase in the complexity of current research experiments and data demands better tools to enhance productivity in Data Analytics. The package is a framework designed to address the modern challenges in data analytics workflows. The package is inspired by Experiment Line concepts. It aims to provide seamless support for users in developing their data mining workflows by offering a uniform data model and method API. It enables the integration of various data mining activities, including data preprocessing, classification, regression, clustering, and time series prediction. It also offers options for hyper-parameter tuning and supports integration with existing libraries and languages. Overall, the package provides researchers with a comprehensive set of functionalities for data science, promoting ease of use, extensibility, and integration with various tools and libraries. Information on Experiment Line is based on Ogasawara et al. (2009) <doi:10.1007/978-3-642-02279-1_20>.
Maintained by Eduardo Ogasawara. Last updated 1 month ago.
0.5 match 1 stars 6.65 score 536 scripts 4 dependents
johnschwenck
bp:Blood Pressure Analysis in R
A comprehensive package to aid in the analysis of blood pressure data of all forms by providing both descriptive and visualization tools for researchers.
Maintained by John Schwenck. Last updated 3 years ago.
0.5 match 26 stars 6.23 score 13 scripts
bioc
GSEABenchmarkeR:Reproducible GSEA Benchmarking
The GSEABenchmarkeR package implements an extendable framework for reproducible evaluation of set- and network-based methods for enrichment analysis of gene expression data. This includes support for the efficient execution of these methods on comprehensive real data compendia (microarray and RNA-seq) using parallel computation on standard workstations and institutional computer grids. Methods can then be assessed with respect to runtime, statistical significance, and relevance of the results for the phenotypes investigated.
Maintained by Ludwig Geistlinger. Last updated 5 months ago.
immunooncologymicroarrayrnaseqgeneexpressiondifferentialexpressionpathwaysgraphandnetworknetworkgenesetenrichmentnetworkenrichmentvisualizationreportwritingbioconductor-packageu24ca289073
0.5 match 13 stars 6.55 score 23 scripts
tvpham
iq:Protein Quantification in Mass Spectrometry-Based Proteomics
An implementation of the MaxLFQ algorithm by Cox et al. (2014) <doi:10.1074/mcp.M113.031591> in a comprehensive pipeline for processing proteomics data in data-independent acquisition mode (Pham et al. 2020 <doi:10.1093/bioinformatics/btz961>). It offers additional options for protein quantification using the N most intense fragment ions, using all fragment ions, and a wrapper for the median polish algorithm by Tukey (1977, ISBN:0201076160). In general, the tool can be used to integrate multiple proportional observations into a single quantitative value.
Maintained by Thang Pham. Last updated 15 days ago.
0.5 match 27 stars 6.49 score 25 scripts
riatelab
maplegend:Legends for Maps
Create legends for maps and other graphics. Thematic maps need to be accompanied by legible legends to be fully comprehensible. This package offers a wide range of legends useful for cartography, some of which may also be useful for other types of graphics.
Maintained by Timothée Giraud. Last updated 5 months ago.
0.5 match 13 stars 6.33 score 5 scripts 14 dependents
terrytangyuan
scaffolder:Scaffolding Interfaces to Packages in Other Programming Languages
Comprehensive set of tools for scaffolding R interfaces to modules, classes, functions, and documentation written in other programming languages, such as 'Python'.
Maintained by Yuan Tang. Last updated 2 years ago.
code-generation python reticulate scaffolding
0.5 match 27 stars 6.13 score 9 scripts
monty-se
PINstimation:Estimation of the Probability of Informed Trading
A comprehensive bundle of utilities for the estimation of probability of informed trading models: original PIN in Easley and O'Hara (1992) and Easley et al. (1996); Multilayer PIN (MPIN) in Ersan (2016); Adjusted PIN (AdjPIN) in Duarte and Young (2009); and volume-synchronized PIN (VPIN) in Easley et al. (2011, 2012). Implementations of various estimation methods suggested in the literature are included. Additional features include posterior probabilities, an implementation of an expectation-maximization (EM) algorithm, and PIN decomposition into layers and into bad/good components. Versatile data simulation tools and trade classification algorithms are among the supplementary utilities. The package provides fast, compact, and precise utilities to tackle the sophisticated, error-prone, and time-consuming estimation procedure of informed trading, using only the raw trade-level data.
Maintained by Montasser Ghachem. Last updated 5 months ago.
clustering-analysis expectation-maximisation-algorithm hierarchical-clustering information-asymmetry market-microstructure maximum-likelihood-estimation mixture-distributions poisson-distribution
0.5 match 36 stars 6.48 score 14 scripts
adwolfer
santaR:Short Asynchronous Time-Series Analysis
A graphical and automated pipeline for the analysis of short time-series in R ('santaR'). This approach is designed to accommodate asynchronous time sampling (i.e. different time points for different individuals), inter-individual variability, noisy measurements and large numbers of variables. Based on a smoothing splines functional model, 'santaR' is able to detect variables highlighting significantly different temporal trajectories between study groups. Designed initially for metabolic phenotyping, 'santaR' is also suited for other Systems Biology disciplines. Command line and graphical analysis (via a 'shiny' application) enable fast and parallel automated analysis and reporting, intuitive visualisation and comprehensive plotting options for non-specialist users.
Maintained by Arnaud Wolfer. Last updated 1 year ago.
0.5 match 11 stars 6.44 score 63 scripts
smartdata-analysis-and-statistics
SimTOST:Sample Size Estimation for Bio-Equivalence Trials Through Simulation
Sample size estimation for bio-equivalence trials is supported through a simulation-based approach that extends the Two One-Sided Tests (TOST) procedure. The methodology provides flexibility in hypothesis testing, accommodates multiple treatment comparisons, and accounts for correlated endpoints. Users can model complex trial scenarios, including parallel and crossover designs, intra-subject variability, and different equivalence margins. Monte Carlo simulations enable accurate estimation of power and type I error rates, ensuring well-calibrated study designs. The statistical framework builds on established methods for equivalence testing and multiple hypothesis testing in bio-equivalence studies, as described in Schuirmann (1987) <doi:10.1007/BF01068419>, Mielke et al. (2018) <doi:10.1080/19466315.2017.1371071>, Shieh (2022) <doi:10.1371/journal.pone.0269128>, and Sozu et al. (2015) <doi:10.1007/978-3-319-22005-5>. Comprehensive documentation and vignettes guide users through implementation and interpretation of results.
Maintained by Thomas Debray. Last updated 26 days ago.
mcmc multi-arm multiple-comparisons sample-size-calculation sample-size-estimation trial-simulation openblas cpp
0.5 match 2 stars 6.47 score 7 scripts
stc04003
reReg:Recurrent Event Regression
A comprehensive collection of practical and easy-to-use tools for regression analysis of recurrent events, with or without the presence of a (possibly) informative terminal event, as described in Chiou et al. (2023) <doi:10.18637/jss.v105.i05>. The modeling framework is based on a joint frailty scale-change model that includes the models described in Wang et al. (2001) <doi:10.1198/016214501753209031>, Huang and Wang (2004) <doi:10.1198/016214504000001033>, Xu et al. (2017) <doi:10.1080/01621459.2016.1173557>, and Xu et al. (2019) <doi:10.5705/SS.202018.0224> as special cases. The implemented estimating procedure does not require any parametric assumption on the frailty distribution. The package also allows users to specify different model forms for both the recurrent event process and the terminal event.
Maintained by Sy Han (Steven) Chiou. Last updated 2 months ago.
0.5 match 23 stars 6.35 score 36 scripts 1 dependent
bioc
rhinotypeR:Rhinovirus genotyping
"rhinotypeR" is designed to automate the comparison of sequence data against prototype strains, streamlining the genotype assignment process. By implementing predefined pairwise distance thresholds, this package makes genotype assignment accessible to researchers and public health professionals. This tool enhances our epidemiological toolkit by enabling more efficient surveillance and analysis of rhinoviruses (RVs) and other viral pathogens with complex genomic landscapes. Additionally, "rhinotypeR" supports comprehensive visualization and analysis of single nucleotide polymorphisms (SNPs) and amino acid substitutions, facilitating in-depth genetic and evolutionary studies.
Maintained by Martha Luka. Last updated 5 months ago.
sequencing genetics phylogenetics
0.5 match 4 stars 6.28 score 2 scripts
pgomba
MDPIexploreR:Web Scraping and Bibliometric Analysis of MDPI Journals
Provides comprehensive tools to scrape and analyze data from the MDPI journals. It allows users to extract metrics such as submission-to-acceptance times, article types, and whether articles are part of special issues. The package can also visualize this information through plots. Additionally, 'MDPIexploreR' offers tools to explore patterns of self-citations within articles and provides insights into guest-edited special issues.
Maintained by Pablo Gómez Barreiro. Last updated 4 months ago.
analysis data-analysis data-visualization mdpi metrics scientific-journals visualization web-scraping
0.5 match 20 stars 6.20 score 9 scripts
skstavroglou
patterncausality:Pattern Causality Algorithm
A comprehensive package for detecting and analyzing causal relationships in complex systems using pattern-based approaches. Key features include state space reconstruction, pattern identification, and causality strength evaluation.
Maintained by Hui Wang. Last updated 29 days ago.
0.5 match 1 star 6.08 score 20 scripts