Showing 200 of 608 results
rstudio
reticulate:Interface to 'Python'
Interface to 'Python' modules, classes, and functions. When calling into 'Python', R data types are automatically converted to their equivalent 'Python' types. When values are returned from 'Python' to R they are converted back to R types. Compatible with all versions of 'Python' >= 2.7.
Maintained by Tomasz Kalinowski. Last updated 6 days ago.
1.7k stars 21.02 score 18k scripts 434 dependents
satijalab
Seurat:Tools for Single Cell Genomics
A toolkit for quality control, analysis, and exploration of single cell RNA sequencing data. 'Seurat' aims to enable users to identify and interpret sources of heterogeneity from single cell transcriptomic measurements, and to integrate diverse types of single cell data. See Satija R, Farrell J, Gennert D, et al (2015) <doi:10.1038/nbt.3192>, Macosko E, Basu A, Satija R, et al (2015) <doi:10.1016/j.cell.2015.05.002>, Stuart T, Butler A, et al (2019) <doi:10.1016/j.cell.2019.05.031>, and Hao, Hao, et al (2020) <doi:10.1101/2020.10.12.335331> for more details.
Maintained by Paul Hoffman. Last updated 1 year ago.
human-cell-atlas single-cell-genomics single-cell-rna-seq cpp
2.4k stars 16.86 score 50k scripts 73 dependents
rstudio
tensorflow:R Interface to 'TensorFlow'
Interface to 'TensorFlow' <https://www.tensorflow.org/>, an open source software library for numerical computation using data flow graphs. Nodes in the graph represent mathematical operations, while the graph edges represent the multidimensional data arrays (tensors) communicated between them. The flexible architecture allows you to deploy computation to one or more 'CPUs' or 'GPUs' in a desktop, server, or mobile device with a single 'API'. 'TensorFlow' was originally developed by researchers and engineers working on the Google Brain Team within Google's Machine Intelligence research organization for the purposes of conducting machine learning and deep neural networks research, but the system is general enough to be applicable in a wide variety of other domains as well.
Maintained by Tomasz Kalinowski. Last updated 5 days ago.
1.3k stars 15.47 score 3.2k scripts 75 dependents
thinkr-open
golem:A Framework for Robust Shiny Applications
An opinionated framework for building a production-ready 'Shiny' application. This package contains a series of tools for building a robust 'Shiny' application from start to finish.
Maintained by Colin Fay. Last updated 7 months ago.
golemverse hacktoberfest shiny shiny-apps shiny-r shinyapps
921 stars 14.21 score 167 scripts 63 dependents
r-spatial
rgee:R Bindings for Calling the 'Earth Engine' API
Earth Engine <https://earthengine.google.com/> client library for R. All of the 'Earth Engine' API classes, modules, and functions are made available. Additional functions implemented include importing (exporting) of Earth Engine spatial objects, extraction of time series, interactive map display, assets management interface, and metadata display. See <https://r-spatial.github.io/rgee/> for further details.
Maintained by Cesar Aybar. Last updated 4 days ago.
earth-engine earthengine google-earth-engine googleearthengine spatial-analysis spatial-data
717 stars 13.77 score 1.9k scripts 3 dependents
rstudio
keras3:R Interface to 'Keras'
Interface to 'Keras' <https://keras.io>, a high-level neural networks API. 'Keras' was developed with a focus on enabling fast experimentation, supports both convolution based networks and recurrent networks (as well as combinations of the two), and runs seamlessly on both CPU and GPU devices.
Maintained by Tomasz Kalinowski. Last updated 11 days ago.
845 stars 13.63 score 264 scripts 2 dependents
oscarkjell
text:Analyses of Text using Transformers Models from HuggingFace, Natural Language Processing and Machine Learning
Link R with Transformers from Hugging Face to transform text variables into word embeddings, which are then used to statistically test the mean difference between sets of texts, compute semantic similarity scores between texts, predict numerical variables, and visualize statistically significant words according to various dimensions. For more information see <https://www.r-text.org>.
Maintained by Oscar Kjell. Last updated 9 days ago.
deep-learning machine-learning nlp transformers openjdk
145 stars 13.21 score 436 scripts 1 dependent
tkonopka
umap:Uniform Manifold Approximation and Projection
Uniform manifold approximation and projection is a technique for dimension reduction. The algorithm was described by McInnes and Healy (2018) in <arXiv:1802.03426>. This package provides an interface for two implementations. One is written from scratch, including components for nearest-neighbor search and for embedding. The second implementation is a wrapper for 'python' package 'umap-learn' (requires separate installation, see vignette for more details).
Maintained by Tomasz Konopka. Last updated 11 months ago.
dimensionality-reduction umap cpp
132 stars 12.82 score 3.6k scripts 45 dependents
greta-dev
greta:Simple and Scalable Statistical Modelling in R
Write statistical models in R and fit them by MCMC and optimisation on CPUs and GPUs, using Google 'TensorFlow'. greta lets you write your own model like in BUGS, JAGS and Stan, except that you write models right in R, it scales well to massive datasets, and it’s easy to extend and build on. See the website for more information, including tutorials, examples, package documentation, and the greta forum.
Maintained by Nicholas Tierney. Last updated 20 days ago.
566 stars 12.53 score 396 scripts 6 dependents
rstudio
tfruns:Training Run Tools for 'TensorFlow'
Create and manage unique directories for each 'TensorFlow' training run. Provides a unique, time stamped directory for each run along with functions to retrieve the directory of the latest run or latest several runs.
Maintained by Tomasz Kalinowski. Last updated 12 months ago.
34 stars 11.80 score 325 scripts 77 dependents
bioc
zellkonverter:Conversion Between scRNA-seq Objects
Provides methods to convert between Python AnnData objects and SingleCellExperiment objects. These are primarily intended for use by downstream Bioconductor packages that wrap Python methods for single-cell data analysis. It also includes functions to read and write H5AD files used for saving AnnData objects to disk.
Maintained by Luke Zappia. Last updated 22 days ago.
singlecell dataimport datarepresentation bioconductor conversion scrna-seq
159 stars 11.25 score 660 scripts 4 dependents
ropengov
eurostat:Tools for Eurostat Open Data
Tools to download data from the Eurostat database <https://ec.europa.eu/eurostat> together with search and manipulation utilities.
Maintained by Leo Lahti. Last updated 1 month ago.
242 stars 11.07 score 892 scripts 4 dependents
t-kalinowski
keras:R Interface to 'Keras'
Interface to 'Keras' <https://keras.io>, a high-level neural networks 'API'. 'Keras' was developed with a focus on enabling fast experimentation, supports both convolution based networks and recurrent networks (as well as combinations of the two), and runs seamlessly on both 'CPU' and 'GPU' devices.
Maintained by Tomasz Kalinowski. Last updated 11 months ago.
10.93 score 10k scripts 55 dependents
bioc
infercnv:Infer Copy Number Variation from Single-Cell RNA-Seq Data
Using single-cell RNA-Seq expression to visualize CNV in cells.
Maintained by Christophe Georgescu. Last updated 5 months ago.
software copynumbervariation variantdetection structuralvariation genomicvariation genetics transcriptomics statisticalmethod bayesian hiddenmarkovmodel singlecell jags cpp
601 stars 10.92 score 674 scripts
friendly
vcdExtra:'vcd' Extensions and Additions
Provides additional data sets, methods and documentation to complement the 'vcd' package for Visualizing Categorical Data and the 'gnm' package for Generalized Nonlinear Models. In particular, 'vcdExtra' extends mosaic, assoc and sieve plots from 'vcd' to handle 'glm()' and 'gnm()' models and adds a 3D version in 'mosaic3d'. Additionally, methods are provided for comparing and visualizing lists of 'glm' and 'loglm' objects. This package is now a support package for the book, "Discrete Data Analysis with R" by Michael Friendly and David Meyer.
Maintained by Michael Friendly. Last updated 7 days ago.
categorical-data-visualization generalized-linear-models mosaic-plots
24 stars 10.85 score 472 scripts 3 dependents
moodymudskipper
flow:View and Browse Code Using Flow Diagrams
Visualize as flow diagrams the logic of functions, expressions or scripts in a static way or when running a call, visualize the dependencies between functions or between modules in a shiny app, and more.
Maintained by Antoine Fabri. Last updated 4 months ago.
405 stars 10.84 score 61 scripts
rstudio
pointblank:Data Validation and Organization of Metadata for Local and Remote Tables
Validate data in data frames, 'tibble' objects, 'Spark' 'DataFrames', and database tables. Validation pipelines can be made using easily-readable, consecutive validation steps. Upon execution of the validation plan, several reporting options are available. User-defined thresholds for failure rates allow for the determination of appropriate reporting actions. Many other workflows are available including an information management workflow, where the aim is to record, collect, and generate useful information on data tables.
Maintained by Richard Iannone. Last updated 5 days ago.
data-assertions data-checker data-dictionaries data-frames data-inference data-management data-profiler data-quality data-validation data-verification database-tables easy-to-understand reporting-tool schema-validation testing-tools yaml-configuration
942 stars 10.73 score 284 scripts
quanteda
spacyr:Wrapper to the 'spaCy' 'NLP' Library
An R wrapper to the 'Python' 'spaCy' 'NLP' library, from <https://spacy.io>.
Maintained by Kenneth Benoit. Last updated 2 months ago.
extract-entities nlp spacy speech-tagging
253 stars 10.68 score 408 scripts 6 dependents
rstudio
blastula:Easily Send HTML Email Messages
Compose and send out responsive HTML email messages that render perfectly across a range of email clients and device sizes. Helper functions let the user insert embedded images, web link buttons, and 'ggplot2' plot objects into the message body. Messages can be sent through an 'SMTP' server, through the 'Posit Connect' service, or through the 'Mailgun' API service <https://www.mailgun.com/>.
Maintained by Richard Iannone. Last updated 9 months ago.
easy-to-use email html markdown responsive-email smtp
552 stars 10.27 score 348 scripts 5 dependents
facebookexperimental
Robyn:Semi-Automated Marketing Mix Modeling (MMM) from Meta Marketing Science
Semi-Automated Marketing Mix Modeling (MMM) that aims to reduce human bias by means of ridge regression and evolutionary algorithms, enables actionable decision making by providing budget allocation and diminishing-returns curves, and allows ground-truth calibration to account for causation.
Maintained by Gufeng Zhou. Last updated 13 days ago.
adstocking budget-allocation cost-response-curve econometrics evolutionary-algorithm gradient-based-optimisation hyperparameter-optimization marketing-mix-modeling marketing-mix-modelling marketing-science mmm ridge-regression
1.3k stars 10.27 score 95 scripts
bioc
singleCellTK:Comprehensive and Interactive Analysis of Single Cell RNA-Seq Data
The Single Cell Toolkit (SCTK) in the singleCellTK package provides an interface to popular tools for importing, quality control, analysis, and visualization of single cell RNA-seq data. SCTK allows users to seamlessly integrate tools from various packages at different stages of the analysis workflow. A general "a la carte" workflow gives users access to multiple methods for data importing, calculation of general QC metrics, doublet detection, ambient RNA estimation and removal, filtering, normalization, batch correction or integration, dimensionality reduction, 2-D embedding, clustering, marker detection, differential expression, cell type labeling, pathway analysis, and data exporting. Curated workflows can be used to run Seurat and Celda. Streamlined quality control can be performed on the command line using the SCTK-QC pipeline. Users can analyze their data using commands in the R console or by using an interactive Shiny Graphical User Interface (GUI). Specific analyses or entire workflows can be summarized and shared with comprehensive HTML reports generated by Rmarkdown. Additional documentation and vignettes can be found at camplab.net/sctk.
Maintained by Joshua David Campbell. Last updated 1 month ago.
singlecell geneexpression differentialexpression alignment clustering immunooncology batcheffect normalization qualitycontrol dataimport gui
182 stars 10.17 score 252 scripts
constantamateur
SoupX:Single Cell mRNA Soup eXterminator
Quantify, profile and remove ambient mRNA contamination (the "soup") from droplet based single cell RNA-seq experiments. Implements the method described in Young et al. (2018) <doi:10.1101/303727>.
Maintained by Matthew Daniel Young. Last updated 2 years ago.
266 stars 10.08 score 594 scripts 1 dependent
bioc
MOFA2:Multi-Omics Factor Analysis v2
The MOFA2 package contains a collection of tools for training and analysing multi-omic factor analysis (MOFA). MOFA is a probabilistic factor model that aims to identify principal axes of variation from data sets that can comprise multiple omic layers and/or groups of samples. Additional time or space information on the samples can be incorporated using the MEFISTO framework, which is part of MOFA2. Downstream analysis functions for inspecting the molecular features underlying each factor, visualization, imputation, etc. are available.
Maintained by Ricard Argelaguet. Last updated 5 months ago.
dimensionreduction bayesian visualization factor-analysis mofa multi-omics
326 stars 10.03 score 502 scripts
pecanproject
PEcAn.assim.batch:PEcAn Functions Used for Ecological Forecasts and Reanalysis
The Predictive Ecosystem Carbon Analyzer (PEcAn) is a scientific workflow management tool that is designed to simplify the management of model parameterization, execution, and analysis. The goal of PEcAn is to streamline the interaction between data and models, and to improve the efficacy of scientific investigation.
Maintained by Istem Fer. Last updated 9 hours ago.
bayesian cyberinfrastructure data-assimilation data-science ecosystem-model ecosystem-science forecasting meta-analysis national-science-foundation pecan plants jags cpp
216 stars 9.97 score 20 scripts 2 dependents
lorenzwalthert
precommit:Pre-Commit Hooks
Useful git hooks for R building on top of the multi-language framework 'pre-commit' for hook management. This package provides git hooks for common tasks like formatting files with 'styler' or spell checking as well as wrapper functions to access the 'pre-commit' executable.
Maintained by Lorenz Walthert. Last updated 2 days ago.
255 stars 9.73 score 10 scripts
pecanproject
PEcAnRTM:PEcAn Functions Used for Radiative Transfer Modeling
Functions for performing forward runs and inversions of radiative transfer models (RTMs). Inversions can be performed using maximum likelihood, or more complex hierarchical Bayesian methods. Underlying numerical analyses are optimized for speed using Fortran code.
Maintained by Alexey Shiklomanov. Last updated 9 hours ago.
bayesian cyberinfrastructure data-assimilation data-science ecosystem-model ecosystem-science forecasting meta-analysis national-science-foundation pecan plants fortran jags cpp
216 stars 9.72 score 132 scripts
jamovi
jmv:The 'jamovi' Analyses
A suite of common statistical methods such as descriptives, t-tests, ANOVAs, regression, correlation matrices, proportion tests, contingency tables, and factor analysis. This package is also useable from the 'jamovi' statistical spreadsheet (see <https://www.jamovi.org> for more information).
Maintained by Jonathon Love. Last updated 29 days ago.
59 stars 9.58 score 440 scripts
stemangiola
tidyseurat:Brings Seurat to the Tidyverse
It creates an invisible layer that allows the 'Seurat' object to be viewed as a tibble and to interact seamlessly with the tidyverse.
Maintained by Stefano Mangiola. Last updated 8 months ago.
assaydomain infrastructure rnaseq differentialexpression geneexpression normalization clustering qualitycontrol sequencing transcription transcriptomics dplyr ggplot2 pca purrr sct seurat single-cell single-cell-rna-seq tibble tidyr tidyverse transcripts tsne umap
159 stars 9.48 score 398 scripts 1 dependent
dynverse
anndata:'anndata' for R
A 'reticulate' wrapper for the Python package 'anndata'. Provides a scalable way of keeping track of data and learned annotations. Used to read from and write to the h5ad file format.
Maintained by Robrecht Cannoodt. Last updated 26 days ago.
44 stars 9.47 score 772 scripts 3 dependents
thinkr-open
fusen:Build a Package from Rmarkdown Files
Use Rmarkdown First method to build your package. Start your package with documentation, functions, examples and tests in the same unique file. Everything can be set from the Rmarkdown template file provided in your project, then inflated as a package. Inflating the template copies the relevant chunks and sections in the appropriate files required for package development.
Maintained by Vincent Guyader. Last updated 2 months ago.
163 stars 9.45 score 35 scripts
eagerai
fastai:Interface to 'fastai'
The 'fastai' <https://docs.fast.ai/index.html> library simplifies training fast and accurate neural networks using modern best practices. It is based on research into deep learning best practices undertaken at 'fast.ai', including 'out of the box' support for vision, text, tabular, audio, time series, and collaborative filtering models.
Maintained by Turgut Abdullayev. Last updated 12 months ago.
audio collaborative-filtering darknet darknet-image-classification fastai medical object-detection tabular text vision
118 stars 9.40 score 76 scripts
rstudio
tfdatasets:Interface to 'TensorFlow' Datasets
Interface to 'TensorFlow' Datasets, a high-level library for building complex input pipelines from simple, re-usable pieces. See <https://www.tensorflow.org/guide> for additional details.
Maintained by Tomasz Kalinowski. Last updated 19 days ago.
34 stars 9.32 score 656 scripts 3 dependents
bodkan
slendr:A Simulation Framework for Spatiotemporal Population Genetics
A framework for simulating spatially explicit genomic data which leverages real cartographic information for programmatic and visual encoding of spatiotemporal population dynamics on real geographic landscapes. Population genetic models are then automatically executed by the 'SLiM' software by Haller et al. (2019) <doi:10.1093/molbev/msy228> behind the scenes, using a custom built-in simulation 'SLiM' script. Additionally, fully abstract spatial models not tied to a specific geographic location are supported, and users can also simulate data from standard, non-spatial, random-mating models. These can be simulated either with the 'SLiM' built-in back-end script, or using an efficient coalescent population genetics simulator 'msprime' by Baumdicker et al. (2022) <doi:10.1093/genetics/iyab229> with a custom-built 'Python' script bundled with the R package. Simulated genomic data is saved in a tree-sequence format and can be loaded, manipulated, and summarised using tree-sequence functionality via an R interface to the 'Python' module 'tskit' by Kelleher et al. (2019) <doi:10.1038/s41588-019-0483-y>. Complete model configuration, simulation and analysis pipelines can be therefore constructed without a need to leave the R environment, eliminating friction between disparate tools for population genetic simulations and data analysis.
Maintained by Martin Petr. Last updated 3 days ago.
popgen population-genetics simulations spatial-statistics
56 stars 9.13 score 88 scripts
bioc
basilisk:Freezing Python Dependencies Inside Bioconductor Packages
Installs a self-contained conda instance that is managed by the R/Bioconductor installation machinery. This aims to provide a consistent Python environment that can be used reliably by Bioconductor packages. Functions are also provided to enable smooth interoperability of multiple Python environments in a single R session.
Maintained by Aaron Lun. Last updated 11 days ago.
9.12 score 75 scripts 39 dependents
stscl
gdverse:Analysis of Spatial Stratified Heterogeneity
Detecting spatial associations based on the concept of spatial stratified heterogeneity while also considering spatial dependencies, spatial interpretability, complex spatial interactions, and robust spatial stratification. In addition, it supports the spatial stratified heterogeneity family described in Lv et al. (2025) <doi:10.1111/tgis.70032>.
Maintained by Wenbo Lv. Last updated 2 days ago.
geographical-detector geoinformatics geospatial-analysis spatial-statistics spatial-stratified-heterogeneity cpp
33 stars 9.10 score 41 scripts 2 dependents
didiermurillof
FielDHub:A Shiny App for Design of Experiments in Life Sciences
A shiny design of experiments (DOE) app that aids in the creation of traditional, un-replicated, augmented and partially-replicated designs applied to agriculture, plant breeding, forestry, animal and biological sciences.
Maintained by Didier Murillo. Last updated 8 months ago.
agricultural breeding design doe experimental plantbreeding shiny
47 stars 9.09 score 70 scripts 1 dependent
bioc
BatchQC:Batch Effects Quality Control Software
Sequencing and microarray samples often are collected or processed in multiple batches or at different times. This often produces technical biases that can lead to incorrect results in the downstream analysis. BatchQC is a software tool that streamlines batch preprocessing and evaluation by providing interactive diagnostics, visualizations, and statistical analyses to explore the extent to which batch variation impacts the data. BatchQC diagnostics help determine whether batch adjustment needs to be done, and how correction should be applied before proceeding with a downstream analysis. Moreover, BatchQC interactively applies multiple common batch effect approaches to the data and the user can quickly see the benefits of each method. BatchQC is developed as a Shiny App. The output is organized into multiple tabs and each tab features an important part of the batch effect analysis and visualization of the data. The BatchQC interface has the following analysis groups: Summary, Differential Expression, Median Correlations, Heatmaps, Circular Dendrogram, PCA Analysis, Shape, ComBat and SVA.
Maintained by Jessica Anderson. Last updated 13 days ago.
batcheffect graphandnetwork microarray normalization principalcomponent sequencing software visualization qualitycontrol rnaseq preprocessing differentialexpression immunooncology
7 stars 9.06 score 54 scripts
pecanproject
PEcAn.all:PEcAn Functions Used for Ecological Forecasts and Reanalysis
The Predictive Ecosystem Carbon Analyzer (PEcAn) is a scientific workflow management tool that is designed to simplify the management of model parameterization, execution, and analysis. The goal of PEcAn is to streamline the interaction between data and models, and to improve the efficacy of scientific investigation.
Maintained by David LeBauer. Last updated 9 hours ago.
bayesian cyberinfrastructure data-assimilation data-science ecosystem-model ecosystem-science forecasting meta-analysis national-science-foundation pecan plants jags cpp
216 stars 9.02 score 266 scripts
bioc
scPipe:Pipeline for single cell multi-omic data pre-processing
A preprocessing pipeline for single cell RNA-seq/ATAC-seq data that starts from the fastq files and produces a feature count matrix with associated quality control information. It can process fastq data generated by CEL-seq, MARS-seq, Drop-seq, Chromium 10x and SMART-seq protocols.
Maintained by Shian Su. Last updated 3 months ago.
immunooncology software sequencing rnaseq geneexpression singlecell visualization sequencematching preprocessing qualitycontrol genomeannotation dataimport curl bzip2 xz-utils zlib cpp
68 stars 9.02 score 84 scripts
azure
azuremlsdk:Interface to the 'Azure Machine Learning' 'SDK'
Interface to the 'Azure Machine Learning' Software Development Kit ('SDK'). Data scientists can use the 'SDK' to train, deploy, automate, and manage machine learning models on the 'Azure Machine Learning' service. To learn more about 'Azure Machine Learning' visit the website: <https://docs.microsoft.com/en-us/azure/machine-learning/service/overview-what-is-azure-ml>.
Maintained by Diondra Peck. Last updated 3 years ago.
amlcompute azure azure-machine-learning azureml dsi machine-learning rstudio sdk-r
105 stars 8.91 score 221 scripts
tomkellygenetics
leiden:R Implementation of Leiden Clustering Algorithm
Implements the 'Python leidenalg' module to be called in R. Enables clustering using the Leiden algorithm to partition a graph into communities. See the 'Python' repository for more details: <https://github.com/vtraag/leidenalg> Traag et al (2018) From Louvain to Leiden: guaranteeing well-connected communities. <arXiv:1810.08473>.
Maintained by S. Thomas Kelly. Last updated 10 months ago.
38 stars 8.89 score 180 scripts 3 dependents
pecanproject
PEcAn.workflow:PEcAn Functions Used for Ecological Forecasts and Reanalysis
The Predictive Ecosystem Carbon Analyzer (PEcAn) is a scientific workflow management tool that is designed to simplify the management of model parameterization, execution, and analysis. The goal of PEcAn is to streamline the interaction between data and models, and to improve the efficacy of scientific investigation. This package provides workhorse functions that can be used to run the major steps of a PEcAn analysis.
Maintained by David LeBauer. Last updated 9 hours ago.
bayesian cyberinfrastructure data-assimilation data-science ecosystem-model ecosystem-science forecasting meta-analysis national-science-foundation pecan plants jags cpp
216 stars 8.85 score 15 scripts 4 dependents
rjournal
rjtools:Preparing, Checking, and Submitting Articles to the 'R Journal'
Create an 'R Journal' 'Rmarkdown' template article, that will generate html and pdf versions of your paper. Check that the paper folder has all the required components needed for submission. Examples of 'R Journal' publications can be found at <https://journal.r-project.org>.
Maintained by Di Cook. Last updated 2 months ago.
33 stars 8.81 score 37 scripts 1 dependent
ropengov
regions:Processing Regional Statistics
Validating sub-national statistical typologies, re-coding across standard typologies of sub-national statistics, and making valid aggregate level imputation, re-aggregation, re-weighting and projection down to lower hierarchical levels to create meaningful data panels and time series.
Maintained by Daniel Antal. Last updated 3 years ago.
observatory regions ropengov statistics
12 stars 8.81 score 67 scripts 5 dependents
pecanproject
PEcAn.data.remote:PEcAn Functions Used for Extracting Remote Sensing Data
PEcAn module for processing remote data. Python module requirements: requests, json, re, ast, pandas, sys. If any of these modules are missing, install using pip install <module name>.
Maintained by Bailey Morrison. Last updated 9 hours ago.
bayesian cyberinfrastructure data-assimilation data-science ecosystem-model ecosystem-science forecasting meta-analysis national-science-foundation pecan plants
216 stars 8.77 score 6 scripts 5 dependents
rstudio
tfprobability:Interface to 'TensorFlow Probability'
Interface to 'TensorFlow Probability', a 'Python' library built on 'TensorFlow' that makes it easy to combine probabilistic models and deep learning on modern hardware ('TPU', 'GPU'). 'TensorFlow Probability' includes a wide selection of probability distributions and bijectors, probabilistic layers, variational inference, Markov chain Monte Carlo, and optimizers such as Nelder-Mead, BFGS, and SGLD.
Maintained by Tomasz Kalinowski. Last updated 3 years ago.
54 stars 8.63 score 221 scripts 3 dependents
t-kalinowski
tfautograph:Autograph R for 'Tensorflow'
Translate R control flow expressions into 'Tensorflow' graphs.
Maintained by Tomasz Kalinowski. Last updated 2 years ago.
18 stars 8.62 score 145 scripts 75 dependents
bioc
BgeeDB:Annotation and gene expression data retrieval from Bgee database. TopAnat, an anatomical entities Enrichment Analysis tool for UBERON ontology
A package for the annotation and gene expression data download from Bgee database, and TopAnat analysis: GO-like enrichment of anatomical terms, mapped to genes by expression patterns.
Maintained by Julien Wollbrett. Last updated 5 months ago.
software dataimport sequencing geneexpression microarray go genesetenrichment bioinformatics enrichment-analysis rna-seq scrna-seq single-cell
15 stars 8.46 score 19 scripts 1 dependent
samuel-marsh
scCustomize:Custom Visualizations & Functions for Streamlined Analyses of Single Cell Sequencing
Collection of functions created and/or curated to aid in the visualization and analysis of single-cell data using 'R'. 'scCustomize' aims to 1) provide customized visualizations that are easier to use and produce more aesthetic and functional visuals, and 2) improve the speed and reproducibility of common tasks and pieces of code in scRNA-seq analysis with a single function or group of functions. For citation please use: Marsh SE (2021) "Custom Visualizations & Functions for Streamlined Analyses of Single Cell Sequencing" <doi:10.5281/zenodo.5706430> RRID:SCR_024675.
Maintained by Samuel Marsh. Last updated 3 months ago.
customization ggplot2 scrna-seq seurat single-cell single-cell-genomics single-cell-rna-seq visualization
246 stars 8.45 score 1.1k scripts
rstudio
tfestimators:Interface to 'TensorFlow' Estimators
Interface to 'TensorFlow' Estimators <https://www.tensorflow.org/guide/estimator>, a high-level API that provides implementations of many different model types including linear models and deep neural networks.
Maintained by Tomasz Kalinowski. Last updated 3 years ago.
57 stars 8.42 score 170 scripts
bioc
projectR:Functions for the projection of weights from PCA, CoGAPS, NMF, correlation, and clustering
Functions for the projection of data into the spaces defined by PCA, CoGAPS, NMF, correlation, and clustering.
Maintained by Genevieve Stein-OBrien. Last updated 13 days ago.
functionalprediction generegulation biologicalquestion software
62 stars 8.42 score 70 scripts
carmonalab
scGate:Marker-Based Cell Type Purification for Single-Cell Sequencing Data
A common bioinformatics task in single-cell data analysis is to purify a cell type or cell population of interest from heterogeneous datasets. 'scGate' automatizes marker-based purification of specific cell populations, without requiring training data or reference gene expression profiles. Briefly, 'scGate' takes as input: i) a gene expression matrix stored in a 'Seurat' object and ii) a “gating model” (GM), consisting of a set of marker genes that define the cell population of interest. The GM can be as simple as a single marker gene, or a combination of positive and negative markers. More complex GMs can be constructed in a hierarchical fashion, akin to gating strategies employed in flow cytometry. 'scGate' evaluates the strength of signature marker expression in each cell using the rank-based method 'UCell', and then performs k-nearest neighbor (kNN) smoothing by calculating the mean 'UCell' score across neighboring cells. kNN-smoothing aims at compensating for the large degree of sparsity in scRNA-seq data. Finally, a universal threshold over kNN-smoothed signature scores is applied in binary decision trees generated from the user-provided gating model, to annotate cells as either “pure” or “impure”, with respect to the cell population of interest. See the related publication Andreatta et al. (2022) <doi:10.1093/bioinformatics/btac141>.
Maintained by Massimo Andreatta. Last updated 2 months ago.
filtering marker-genes scgate signatures single-cell
106 stars 8.38 score 163 scripts
cefet-rj-dal
harbinger:A Unified Time Series Event Detection Framework
By analyzing time series, it is possible to observe significant changes in the behavior of observations that frequently characterize events. Events present themselves as anomalies, change points, or motifs. In the literature, there are several methods for detecting events. However, searching for a suitable time series method is a complex task, especially considering that the nature of events is often unknown. This work presents Harbinger, a framework for integrating and analyzing event detection methods. Harbinger contains several state-of-the-art methods described in Salles et al. (2020) <doi:10.5753/sbbd.2020.13626>.
Maintained by Eduardo Ogasawara. Last updated 4 months ago.
18 stars 8.32 score 216 scripts
taylor-arnold
cleanNLP:A Tidy Data Model for Natural Language Processing
Provides a set of fast tools for converting a textual corpus into a set of normalized tables. Users may make use of the 'udpipe' back end with no external dependencies, or a Python back end with 'spaCy' <https://spacy.io>. Exposed annotation tasks include tokenization, part of speech tagging, named entity recognition, and dependency parsing.
Maintained by Taylor B. Arnold. Last updated 11 months ago.
corenlp, natural-language-processing, spacy
215 stars 8.29 score 229 scripts
bioc
crisprDesign:Comprehensive design of CRISPR gRNAs for nucleases and base editors
Provides a comprehensive suite of functions to design and annotate CRISPR guide RNA (gRNA) sequences. This includes on- and off-target search, on-target efficiency scoring, off-target scoring, full gene and TSS contextual annotations, and SNP annotation (human only). It currently supports five types of CRISPR modalities (modes of perturbations): CRISPR knockout, CRISPR activation, CRISPR inhibition, CRISPR base editing, and CRISPR knockdown. All types of CRISPR nucleases are supported, including DNA- and RNA-target nucleases such as Cas9, Cas12a, and Cas13d. All types of base editors are also supported. gRNA design can be performed on reference genomes, transcriptomes, and custom DNA and RNA sequences. Both unpaired and paired gRNA designs are enabled.
Maintained by Jean-Philippe Fortin. Last updated 26 days ago.
crispr, functionalgenomics, genetarget, bioconductor, bioconductor-package, crispr-cas9, crispr-design, crispr-target, genomics-analysis, grna, grna-sequence, grna-sequences, sgrna, sgrna-design
22 stars 8.28 score 80 scripts 3 dependents
pecanproject
PEcAnAssimSequential:PEcAn Functions Used for Ecological Forecasts and Reanalysis
The Predictive Ecosystem Carbon Analyzer (PEcAn) is a scientific workflow management tool that is designed to simplify the management of model parameterization, execution, and analysis. The goal of PEcAn is to streamline the interaction between data and models, and to improve the efficacy of scientific investigation.
Maintained by Mike Dietze. Last updated 9 hours ago.
bayesian, cyberinfrastructure, data-assimilation, data-science, ecosystem-model, ecosystem-science, forecasting, meta-analysis, national-science-foundation, pecan, plants, jags, cpp
216 stars 8.14 score 35 scripts
gfellerlab
SuperCell:Simplification of scRNA-seq data by merging together similar cells
Aggregates large single-cell data into a metacell dataset by merging together gene expression of very similar cells.
Maintained by The package maintainer. Last updated 9 months ago.
software, coarse-graining, scrna-seq-analysis, scrna-seq-data
72 stars 8.08 score 93 scripts
bioc
velociraptor:Toolkit for Single-Cell Velocity
This package provides Bioconductor-friendly wrappers for RNA velocity calculations in single-cell RNA-seq data. We use the basilisk package to manage Conda environments, and the zellkonverter package to convert data structures between SingleCellExperiment (R) and AnnData (Python). The information produced by the velocity methods is stored in the various components of the SingleCellExperiment class.
Maintained by Kevin Rue-Albrecht. Last updated 5 months ago.
singlecell, geneexpression, sequencing, coverage, rna-velocity
55 stars 8.06 score 52 scripts
bioc
FLAMES:FLAMES: Full Length Analysis of Mutations and Splicing in long read RNA-seq data
Semi-supervised isoform detection and annotation from both bulk and single-cell long read RNA-seq data. FLAMES provides automated pipelines for analysing isoforms, as well as intermediate functions for manual execution.
Maintained by Changqing Wang. Last updated 9 hours ago.
rnaseq, singlecell, transcriptomics, dataimport, differentialsplicing, alternativesplicing, geneexpression, longread, zlib, curl, bzip2, xz-utils, cpp
33 stars 8.04 score 12 scripts
bioc
netZooR:Unified methods for the inference and analysis of gene regulatory networks
netZooR unifies the implementations of several Network Zoo methods (netzoo, netzoo.github.io) into a single package by creating interfaces between network inference and network analysis methods. Currently, the package has 3 methods for network inference including PANDA and its optimized implementation OTTER (network reconstruction using multiple lines of biological evidence), LIONESS (single-sample network inference), and EGRET (genotype-specific networks). Network analysis methods include CONDOR (community detection), ALPACA (differential community detection), CRANE (significance estimation of differential modules), and MONSTER (estimation of network transition states). In addition, YARN processes gene expression data for tissue-specific analyses and SAMBAR infers missing mutation data based on pathway information.
Maintained by Tara Eicher. Last updated 13 days ago.
networkinference, network, generegulation, geneexpression, transcription, microarray, graphandnetwork, gene-regulatory-network, transcription-factors
105 stars 7.98 score
scasanova
f1dataR:Access Formula 1 Data
Obtain Formula 1 data via the 'Jolpica API' <https://jolpi.ca> and the unofficial API <https://www.formula1.com/en/timing/f1-live> via the 'fastf1' 'Python' library <https://docs.fastf1.dev/>.
Maintained by Santiago Casanova. Last updated 4 days ago.
60 stars 7.97 score 26 scripts
ropenspain
spanishoddata:Get Spanish Origin-Destination Data
Gain seamless access to origin-destination (OD) data from the Spanish Ministry of Transport, hosted at <https://www.transportes.gob.es/ministerio/proyectos-singulares/estudios-de-movilidad-con-big-data/opendata-movilidad>. This package simplifies the management of these large datasets by providing tools to download zone boundaries, handle associated origin-destination data, and process it efficiently with the 'duckdb' database interface. Local caching minimizes repeated downloads, streamlining workflows for researchers and analysts. Extensive documentation is available at <https://ropenspain.github.io/spanishoddata/index.html>, offering guides on creating static and dynamic mobility flow visualizations and transforming large datasets into analysis-ready formats.
Maintained by Egor Kotov. Last updated 10 days ago.
cdr, data, data-package, mobile-telephone-data, mobility, origin-destination
35 stars 7.92 score 14 scripts
bioc
COTAN:COexpression Tables ANalysis
Statistical and computational method to analyze the co-expression of gene pairs at single cell level. It provides the foundation for single-cell gene interactome analysis. The basic idea is studying the zero UMI counts' distribution instead of focusing on positive counts; this is done with a generalized contingency tables framework. COTAN can effectively assess the correlated or anti-correlated expression of gene pairs. It provides a numerical index related to the correlation and an approximate p-value for the associated independence test. COTAN can also evaluate whether single genes are differentially expressed, scoring them with a newly defined global differentiation index. Moreover, this approach provides ways to plot and cluster genes according to their co-expression pattern with other genes, effectively helping the study of gene interactions and becoming a new tool to identify cell-identity marker genes.
Maintained by Galfrè Silvia Giulia. Last updated 1 month ago.
systemsbiology, transcriptomics, geneexpression, singlecell
16 stars 7.85 score 96 scripts
bioc
PhyloProfile:PhyloProfile
PhyloProfile is a tool for exploring complex phylogenetic profiles. Phylogenetic profiles, presence/absence patterns of genes over a set of species, are commonly used to trace the functional and evolutionary history of genes across species and time. With PhyloProfile we can enrich regular phylogenetic profiles with further data like sequence/structure similarity, to make phylogenetic profiling more meaningful. Besides the interactive visualisation powered by R-Shiny, the package offers a set of further analysis features to gain insights like the gene age estimation or core gene identification.
Maintained by Vinh Tran. Last updated 10 days ago.
software, visualization, datarepresentation, multiplecomparison, functionalprediction, dimensionreduction, bioinformatics, heatmap, interactive-visualizations, orthologs, phylogenetic-profile, shiny
33 stars 7.79 score 10 scripts
ropensci
rdataretriever:R Interface to the Data Retriever
Provides an R interface to the Data Retriever <https://retriever.readthedocs.io/en/latest/> via the Data Retriever's command line interface. The Data Retriever automates the tasks of finding, downloading, and cleaning public datasets, and then stores them in a local database.
Maintained by Henry Senyondo. Last updated 8 months ago.
data, data-science, database, datasets, science
46 stars 7.70 score 36 scripts
theoreticalecology
sjSDM:Scalable Joint Species Distribution Modeling
A scalable and fast method for estimating joint Species Distribution Models (jSDMs) for big community data, including eDNA data. The package estimates a full (i.e. non-latent) jSDM with different response distributions (including the traditional multivariate probit model). The package can perform variation partitioning (VP) / ANOVA on the fitted models to separate the contribution of environmental, spatial, and biotic associations. In addition, the total R-squared can be further partitioned per species and site to reveal the internal metacommunity structure, see Leibold et al., <doi:10.1111/oik.08618>. The internal structure can then be regressed against environmental and spatial distinctiveness, richness, and traits to analyze metacommunity assembly processes. The package includes support for accounting for spatial autocorrelation and the option to fit responses using deep neural networks instead of a standard linear predictor. As described in Pichler & Hartig (2021) <doi:10.1111/2041-210X.13687>, scalability is achieved by using a Monte Carlo approximation of the joint likelihood implemented via 'PyTorch' and 'reticulate', which can be run on CPUs or GPUs.
Maintained by Maximilian Pichler. Last updated 1 month ago.
deep-learning, gpu-acceleration, machine-learning, species-distribution-modelling, species-interactions
69 stars 7.64 score 70 scripts
ropengov
retroharmonize:Ex Post Survey Data Harmonization
Assist in reproducible retrospective (ex-post) harmonization of data, particularly individual level survey data, by providing tools for organizing metadata, standardizing the coding of variables, and variable names and value labels, including missing values, and documenting the data transformations, with the help of comprehensive S3 classes.
Maintained by Daniel Antal. Last updated 2 months ago.
10 stars 7.62 score 59 scripts
nandp1
nser:Bhavcopy and Live Market Data from National Stock Exchange (NSE) & Bombay Stock Exchange (BSE) India
Download Current & Historical Bhavcopy. Get Live Market data from NSE India of Equities and Derivatives (F&O) segment. Data source <https://www.nseindia.com/>.
Maintained by Nandan Patil. Last updated 5 months ago.
bhav, bhavcopy, bhavcopy-downloader, financial-data, market-data, national-stock-exchange, nse, nse-stock-data, option-pricing, optionchain, rselenium, stock-prices
8 stars 7.61 score 76 scripts
bioc
rrvgo:Reduce + Visualize GO
Reduce and visualize lists of Gene Ontology terms by identifying redundancy based on semantic similarity.
Maintained by Sergi Sayols. Last updated 5 months ago.
annotation, clustering, go, network, pathways, software
26 stars 7.60 score 190 scripts
neurogenomics
rworkflows:Test, Document, Containerise, and Deploy R Packages
Reproducibility is essential to the progress of research, yet achieving it remains elusive even in computational fields. Continuous Integration (CI) platforms offer a powerful way to launch automated workflows to check and document code, but often require considerable time, effort, and technical expertise to set up. We therefore developed the rworkflows suite to make robust CI workflows easy and freely accessible to all R package developers. rworkflows consists of 1) a CRAN/Bioconductor-compatible R package template, 2) an R package to quickly implement a standardised workflow, and 3) a centrally maintained GitHub Action.
Maintained by Brian Schilder. Last updated 2 months ago.
software, workflowmanagement, bioconductor, containers, continuous-integration, docker, dockerhub, github-actions, reproducibility, workflows
79 stars 7.60 score 6 scripts
bioc
scDesign3:A unified framework of realistic in silico data generation and statistical model inference for single-cell and spatial omics
We present a statistical simulator, scDesign3, to generate realistic single-cell and spatial omics data, including various cell states, experimental designs, and feature modalities, by learning interpretable parameters from real data. Using a unified probabilistic model for single-cell and spatial omics data, scDesign3 infers biologically meaningful parameters; assesses the goodness-of-fit of inferred cell clusters, trajectories, and spatial locations; and generates in silico negative and positive controls for benchmarking computational tools.
Maintained by Dongyuan Song. Last updated 29 days ago.
software, singlecell, sequencing, geneexpression, spatial
89 stars 7.59 score 25 scripts
bioc
ggsc:Visualizing Single Cell and Spatial Transcriptomics
Useful functions to visualize single cell and spatial data. It supports visualizing 'Seurat', 'SingleCellExperiment' and 'SpatialExperiment' objects through grammar of graphics syntax implemented in 'ggplot2'.
Maintained by Guangchuang Yu. Last updated 5 months ago.
dimensionreduction, geneexpression, singlecell, software, spatial, transcriptomics, visualization, openblas, cpp, openmp
47 stars 7.59 score 18 scripts
ohdsi
CohortSymmetry:Sequence Symmetry Analysis Using the Observational Medical Outcomes Partnership Common Data Model
Calculating crude sequence ratio, adjusted sequence ratio and confidence intervals using data mapped to the Observational Medical Outcomes Partnership Common Data Model.
Maintained by Xihang Chen. Last updated 7 days ago.
1 star 7.52 score 73 scripts
bioc
SimBu:Simulate Bulk RNA-seq Datasets from Single-Cell Datasets
SimBu can be used to simulate bulk RNA-seq datasets with known cell type fractions. You can either use your own single-cell study for the simulation or the sfaira database. Different pre-defined simulation scenarios exist, as do options to run custom simulations. Additionally, expression values can be adapted by adding an mRNA bias, which produces more biologically relevant simulations.
Maintained by Alexander Dietrich. Last updated 3 days ago.
15 stars 7.50 score 29 scripts 1 dependent
myeomans
politeness:Detecting Politeness Features in Text
Detecting markers of politeness in English natural language. This package allows researchers to easily visualize and quantify politeness between groups of documents. This package combines prior research on the linguistic markers of politeness. We thank the Spencer Foundation, the Hewlett Foundation, and Harvard's Institute for Quantitative Social Science for support.
Maintained by Mike Yeomans. Last updated 2 months ago.
25 stars 7.49 score 41 scripts 1 dependent
hendersontrent
theft:Tools for Handling Extraction of Features from Time Series
Consolidates and calculates different sets of time-series features from multiple 'R' and 'Python' packages including 'Rcatch22' Henderson, T. (2021) <doi:10.5281/zenodo.5546815>, 'feasts' O'Hara-Wild, M., Hyndman, R., and Wang, E. (2021) <https://CRAN.R-project.org/package=feasts>, 'tsfeatures' Hyndman, R., Kang, Y., Montero-Manso, P., Talagala, T., Wang, E., Yang, Y., and O'Hara-Wild, M. (2020) <https://CRAN.R-project.org/package=tsfeatures>, 'tsfresh' Christ, M., Braun, N., Neuffer, J., and Kempa-Liehr A.W. (2018) <doi:10.1016/j.neucom.2018.03.067>, 'TSFEL' Barandas, M., et al. (2020) <doi:10.1016/j.softx.2020.100456>, and 'Kats' Facebook Infrastructure Data Science (2021) <https://facebookresearch.github.io/Kats/>.
Maintained by Trent Henderson. Last updated 2 months ago.
data-visualisation, data-visualization, dimensionality-reduction, machine-learning, time-series
40 stars 7.48 score 50 scripts 1 dependent
rstudio
tfhub:Interface to 'TensorFlow' Hub
'TensorFlow' Hub is a library for the publication, discovery, and consumption of reusable parts of machine learning models. A module is a self-contained piece of a 'TensorFlow' graph, along with its weights and assets, that can be reused across different tasks in a process known as transfer learning. Transfer learning can train a model with a smaller dataset, improve generalization, and speed up training.
Maintained by Tomasz Kalinowski. Last updated 3 years ago.
29 stars 7.46 score 73 scripts 1 dependent
bioc
crisprScore:On-Target and Off-Target Scoring Algorithms for CRISPR gRNAs
Provides R wrappers of several on-target and off-target scoring methods for CRISPR guide RNAs (gRNAs). The following nucleases are supported: SpCas9, AsCas12a, enAsCas12a, and RfxCas13d (CasRx). The available on-target cutting efficiency scoring methods are RuleSet1, Azimuth, DeepHF, DeepCpf1, enPAM+GB, and CRISPRscan. Both the CFD and MIT scoring methods are available for off-target specificity prediction. The package also provides a Lindel-derived score to predict the probability of a gRNA to produce indels inducing a frameshift for the Cas9 nuclease. Note that DeepHF, DeepCpf1 and enPAM+GB are not available on Windows machines.
Maintained by Jean-Philippe Fortin. Last updated 5 months ago.
crispr, functionalgenomics, functionalprediction, bioconductor, bioconductor-package, crispr-cas9, crispr-design, crispr-target, genomics, grna, grna-sequence, grna-sequences, scoring-algorithm, sgrna, sgrna-design
16 stars 7.44 score 19 scripts 4 dependents
bioc
MOSim:Multi-Omics Simulation (MOSim)
MOSim package simulates multi-omic experiments that mimic regulatory mechanisms within the cell, allowing flexible experimental design including time course and multiple groups.
Maintained by Sonia Tarazona. Last updated 5 months ago.
software, timecourse, experimentaldesign, rnaseq, cpp
9 stars 7.42 score 11 scripts
modeloriented
shapper:Wrapper of Python Library 'shap'
Provides SHAP explanations of machine learning models. In applied machine learning, there is a strong belief that we need to strike a balance between interpretability and accuracy. However, in the field of Interpretable Machine Learning, there are more and more new ideas for explaining black-box models. One of the best known methods for local explanations is SHapley Additive exPlanations (SHAP), introduced by Lundberg, S., et al., (2016) <arXiv:1705.07874>. The SHAP method is used to calculate the influence of variables on a particular observation. This method is based on Shapley values, a technique used in game theory. The R package 'shapper' is a port of the Python library 'shap'.
Maintained by Szymon Maksymiuk. Last updated 2 years ago.
58 stars 7.31 score 59 scripts
bioc
CRISPRseek:Design of guide RNAs in CRISPR genome-editing systems
The package encompasses functions to find potential guide RNAs for the CRISPR-based genome-editing systems including the Base Editors and the Prime Editors when supplied with target sequences as input. Users have the flexibility to filter resulting guide RNAs based on parameters such as the absence of restriction enzyme cut sites or the lack of paired guide RNAs. The package also facilitates genome-wide exploration for off-targets, offering features to score and rank off-targets, retrieve flanking sequences, and indicate whether the hits are located within exon regions. All detected guide RNAs are annotated with the cumulative scores of the top5 and topN off-targets together with the detailed information such as mismatch sites and restriction enzyme cut sites. The package also outputs INDELs and their frequencies for Cas9 targeted sites.
Maintained by Lihua Julie Zhu. Last updated 21 days ago.
immunooncology, generegulation, sequencematching, crispr
7.18 score 51 scripts 2 dependents
ropensci
gitignore:Create Useful .gitignore Files for your Project
Simple interface to query gitignore.io to fetch gitignore templates that can be included in the .gitignore file. More than 450 templates are currently available.
Maintained by Philippe Massicotte. Last updated 5 months ago.
32 stars 7.14 score 31 scripts
ropensci
pkgcheck:rOpenSci Package Checks
Check whether a package is ready for submission to rOpenSci's peer review system.
Maintained by Mark Padgham. Last updated 20 days ago.
compliance-automation, software-analysis, software-checking
19 stars 7.13 score 29 scripts 1 dependent
bioc
CuratedAtlasQueryR:Queries the Human Cell Atlas
Provides access to a copy of the Human Cell Atlas, but with harmonised metadata. This allows for uniform querying across numerous datasets within the Atlas using common fields such as cell type, tissue type, and patient ethnicity. Usage involves first querying the metadata table for cells of interest, and then downloading the corresponding cells into a SingleCellExperiment object.
Maintained by Stefano Mangiola. Last updated 5 months ago.
assaydomain, infrastructure, rnaseq, differentialexpression, geneexpression, normalization, clustering, qualitycontrol, sequencing, transcription, transcriptomics, database, duckdb, hdf5, human-cell-atlas, single-cell, singlecellexperiment, tidyverse
90 stars 7.04 score 41 scripts
bioc
pipeComp:pipeComp pipeline benchmarking framework
A simple framework to facilitate the comparison of pipelines involving various steps and parameters. The `pipelineDefinition` class represents pipelines as, minimally, a set of functions consecutively executed on the output of the previous one, and optionally accompanied by step-wise evaluation and aggregation functions. Given such an object, a set of alternative parameters/methods, and benchmark datasets, the `runPipeline` function then proceeds through all combinations of arguments, avoiding recomputing the same step twice and compiling evaluations on the fly to avoid storing potentially large intermediate data.
Maintained by Pierre-Luc Germain. Last updated 5 months ago.
geneexpression, transcriptomics, clustering, datarepresentation, benchmark, bioconductor, pipeline-benchmarking, pipelines, single-cell-rna-seq
41 stars 7.02 score 43 scripts
alesantuz
musclesyneRgies:Extract Muscle Synergies from Electromyography
Provides a framework to factorise electromyography (EMG) data. Tools are provided for raw data pre-processing, non-negative matrix factorisation, classification of factorised data and plotting of obtained outcomes. In particular, reading from ASCII files is supported, along with widely used filtering approaches to process EMG data. All steps include one or more sensible defaults that aim at simplifying the workflow. Yet, all functions are largely tunable at need. Example data sets are included.
Maintained by Alessandro Santuz. Last updated 3 months ago.
emg, muscle-synergies, nmf, physiol, rstudio
40 stars 7.01 score 16 scripts
tkcaccia
KODAMA:Knowledge Discovery by Accuracy Maximization
An unsupervised and semi-supervised learning algorithm that performs feature extraction from noisy and high-dimensional data. It facilitates identification of patterns representing underlying groups on all samples in a data set. Based on Cacciatore S, Tenori L, Luchinat C, Bennett PR, MacIntyre DA. (2017) Bioinformatics <doi:10.1093/bioinformatics/btw705> and Cacciatore S, Luchinat C, Tenori L. (2014) Proc Natl Acad Sci USA <doi:10.1073/pnas.1220873111>.
Maintained by Stefano Cacciatore. Last updated 4 hours ago.
1 star 7.00 score 63 scripts 1 dependent
dyfanjones
RAthena:Connect to 'AWS Athena' using 'Boto3' ('DBI' Interface)
Designed to be compatible with the R package 'DBI' (Database Interface) when connecting to Amazon Web Service ('AWS') Athena <https://aws.amazon.com/athena/>. To do this, the 'Python' 'Boto3' Software Development Kit ('SDK') <https://boto3.amazonaws.com/v1/documentation/api/latest/index.html> is used as a driver.
Maintained by Dyfan Jones. Last updated 1 year ago.
37 stars 6.99 score 38 scripts
flyaflya
causact:Fast, Easy, and Visual Bayesian Inference
Accelerate Bayesian analytics workflows in 'R' through interactive modelling, visualization, and inference. Define probabilistic graphical models using directed acyclic graphs (DAGs) as a unifying language for business stakeholders, statisticians, and programmers. This package relies on interfacing with the 'numpyro' Python package.
Maintained by Adam Fleischhacker. Last updated 2 months ago.
bayesian-inference, dags, posterior-probability, probabilistic-graphical-models, probabilistic-programming
45 stars 6.97 score 52 scripts
bioc
animalcules:Interactive microbiome analysis toolkit
animalcules is an R package for utilizing up-to-date data analytics, visualization methods, and machine learning models to provide users an easy-to-use interactive microbiome analysis framework. It can be used as a standalone software package or users can explore their data with the accompanying interactive R Shiny application. Traditional microbiome analysis such as alpha/beta diversity and differential abundance analysis are enhanced, while new methods like biomarker identification are introduced by animalcules. Powerful interactive and dynamic figures generated by animalcules enable users to understand their data better and discover new insights.
Maintained by Jessica McClintock. Last updated 5 months ago.
microbiome, metagenomics, coverage, visualization
55 stars 6.95 score 23 scripts
vegawidget
altair:Interface to 'Altair'
Interface to 'Altair' <https://altair-viz.github.io>, which itself is a 'Python' interface to 'Vega-Lite' <https://vega.github.io/vega-lite/>. This package uses the 'Reticulate' framework <https://rstudio.github.io/reticulate/> to manage the interface between R and 'Python'.
Maintained by Ian Lyttle. Last updated 1 year ago.
altair, interactive, reticulate, vega-lite, visualization
91 stars 6.94 score 23 scripts 1 dependent
raymondbalise
rUM:R Templates from the University of Miami
This holds some R Markdown and Quarto templates and a template to create a research project in "RStudio".
Maintained by Raymond Balise. Last updated 10 days ago.
9 stars 6.84 score 16 scripts
bioc
MicrobiomeProfiler:An R/shiny package for microbiome functional enrichment analysis
This is an R/shiny package to perform functional enrichment analysis for microbiome data. This package was based on clusterProfiler. Moreover, MicrobiomeProfiler supports KEGG enrichment analysis, COG enrichment analysis, Microbe-Disease association enrichment analysis, and Metabo-Pathway analysis.
Maintained by Guangchuang Yu. Last updated 5 months ago.
microbiome, software, visualization, kegg
38 stars 6.80 score 22 scripts
r-cas
caracas:Computer Algebra
Computer algebra via the 'SymPy' library (<https://www.sympy.org/>). This makes it possible to solve equations symbolically, find symbolic integrals, symbolic sums and other important quantities.
Maintained by Mikkel Meyer Andersen. Last updated 29 days ago.
24 stars 6.80 score 87 scripts 1 dependent
kylegrealis
froggeR:Enhance 'Quarto' Project Workflows and Standards
Streamlines 'Quarto' workflows by providing tools for consistent project setup and documentation. Enables portability through reusable metadata, automated project structure creation, and standardized templates. Features include enhanced project initialization, pre-formatted 'Quarto' documents, comprehensive data protection settings, custom styling, and structured documentation generation. Designed to improve efficiency and collaboration in R data science projects by reducing repetitive setup tasks while maintaining consistent formatting across multiple documents. There are many valuable resources providing in-depth explanations of customizing 'Quarto' templates and theme styling by the Posit team: <https://quarto.org/docs/output-formats/html-themes.html#customizing-themes> & <https://quarto.org/docs/output-formats/html-themes-more.html>, and at the Bootstrap community's GitHub at <https://github.com/twbs/bootstrap/blob/main/scss/_variables.scss>.
Maintained by Kyle Grealis. Last updated 15 days ago.
data-science, project-management, quarto
26 stars 6.67 score 6 scripts
insightsengineering
sasr:'SAS' Interface
Provides a 'SAS' interface, through 'SASPy' (<https://sassoftware.github.io/saspy/>) and 'reticulate' (<https://rstudio.github.io/reticulate/>). This package helps you create 'SAS' sessions, execute 'SAS' code in remote 'SAS' servers, retrieve execution results and log, and exchange datasets between 'SAS' and 'R'. It also helps you to install 'SASPy' and create a configuration file for the connection. Please review the 'SASPy' license file as instructed so that you comply with its separate and independent license.
Maintained by Liming Li. Last updated 1 day ago.
16 stars 6.62 score 21 scripts
eagerai
kerastuneR:Interface to 'Keras Tuner'
'Keras Tuner' <https://keras-team.github.io/keras-tuner/> is a hypertuning framework made for humans. It aims at making the life of AI practitioners, hypertuner algorithm creators and model designers as simple as possible by providing them with a clean and easy to use API for hypertuning. 'Keras Tuner' makes moving from a base model to a hypertuned one quick and easy by only requiring you to change a few lines of code.
Maintained by Turgut Abdullayev. Last updated 12 months ago.
hyperparameter-tuning, hypertuning, keras, keras-tuner, tensorflow, trial
34 stars 6.61 score 48 scripts
bioc
scAnnotatR:Pretrained learning models for cell type prediction on single cell RNA-sequencing data
The package comprises a set of pretrained machine learning models to predict basic immune cell types. This enables all users to quickly get a first annotation of the cell types present in their dataset without requiring prior knowledge. scAnnotatR also allows users to train their own models to predict new cell types based on specific research needs.
Maintained by Johannes Griss. Last updated 5 months ago.
singlecell, transcriptomics, geneexpression, supportvectormachine, classification, software
15 stars 6.61 score 20 scripts
daroczig
botor:'AWS Python SDK' ('boto3') for R
Fork-safe, raw access to the 'Amazon Web Services' ('AWS') 'SDK' via the 'boto3' 'Python' module, and convenient helper functions to query the 'Simple Storage Service' ('S3') and 'Key Management Service' ('KMS'), partial support for 'IAM', the 'Systems Manager Parameter Store' and 'Secrets Manager'.
Maintained by Gergely Daróczi. Last updated 2 months ago.
amazon-web-services, aws, boto3, python
30 stars 6.60 score 32 scripts
bioc
M3C:Monte Carlo Reference-based Consensus Clustering
M3C is a consensus clustering algorithm that uses a Monte Carlo simulation to eliminate overestimation of K and can reject the null hypothesis K=1.
Maintained by Christopher John. Last updated 5 months ago.
clustering, geneexpression, transcription, rnaseq, sequencing, immunooncology
6.59 score 174 scripts 1 dependent
carmonalab
GeneNMF:Non-Negative Matrix Factorization for Single-Cell Omics
A collection of methods to extract gene programs from single-cell gene expression data using non-negative matrix factorization (NMF). 'GeneNMF' contains functions to directly interact with the 'Seurat' toolkit and derive interpretable gene program signatures.
Maintained by Massimo Andreatta. Last updated 13 days ago.
105 stars 6.58 score 12 scripts
taxonomicallyinformedannotation
tima:Taxonomically Informed Metabolite Annotation
This package provides the infrastructure to perform Taxonomically Informed Metabolite Annotation.
Maintained by Adriano Rutz. Last updated 2 hours ago.
metabolite annotation, chemotaxonomy, scoring system, natural products, computational metabolomics, taxonomic distance, specialized metabolome
9 stars 6.56 score 32 scripts 2 dependents
laminlabs
laminr:Client for 'LaminDB'
Interact with 'LaminDB'. 'LaminDB' is an open-source data framework for biology. This package allows you to query and download data from 'LaminDB' instances.
Maintained by Robrecht Cannoodt. Last updated 3 days ago.
8 stars 6.54 score 13 scripts
midasverse
rMIDAS:Multiple Imputation with Denoising Autoencoders
A tool for multiply imputing missing data using 'MIDAS', a deep learning method based on denoising autoencoder neural networks. This algorithm offers significant accuracy and efficiency advantages over other multiple imputation strategies, particularly when applied to large datasets with complex features. Alongside interfacing with 'Python' to run the core algorithm, this package contains functions for processing data before and after model training, running imputation model diagnostics, generating multiple completed datasets, and estimating regression models on these datasets.
Maintained by Thomas Robinson. Last updated 1 years ago.
deep-learningimputation-methodsneural-networkreticulatetensorflow
34 stars 6.53 score 33 scripts
bioc
SpotClean:SpotClean adjusts for spot swapping in spatial transcriptomics data
SpotClean is a computational method to adjust for spot swapping in spatial transcriptomics data. Recent spatial transcriptomics experiments utilize slides containing thousands of spots with spot-specific barcodes that bind mRNA. Ideally, unique molecular identifiers at a spot measure spot-specific expression, but this is often not the case due to bleed from nearby spots, an artifact we refer to as spot swapping. SpotClean is able to estimate the contamination rate in observed data and decontaminate the spot swapping effect, thus increasing the sensitivity and precision of downstream analyses.
Maintained by Zijian Ni. Last updated 5 months ago.
dataimportrnaseqsequencinggeneexpressionspatialsinglecelltranscriptomicspreprocessingrna-seqspatial-transcriptomics
31 stars 6.52 score 36 scripts
schochastics
edgebundle:Algorithms for Bundling Edges in Networks and Visualizing Flow and Metro Maps
Implements several algorithms for bundling edges in networks and flow and metro map layouts. This includes force directed edge bundling <doi:10.1111/j.1467-8659.2009.01450.x>, a flow algorithm based on Steiner trees <doi:10.1080/15230406.2018.1437359> and a multicriteria optimization method for metro map layouts <doi:10.1109/TVCG.2010.24>.
Maintained by David Schoch. Last updated 6 months ago.
graph-algorithmsnetwork-analysisvisualizationcpp
129 stars 6.52 score 512 scripts
bioc
CatsCradle:This package provides methods for analysing spatial transcriptomics data and for discovering gene clusters
This package addresses two broad areas. It allows for in-depth analysis of spatial transcriptomic data by identifying tissue neighbourhoods. These are contiguous regions of tissue surrounding individual cells. 'CatsCradle' allows for the categorisation of neighbourhoods by the cell types contained in them and the genes expressed in them. In particular, it produces Seurat objects whose individual elements are neighbourhoods rather than cells. In addition, it enables the categorisation and annotation of genes by producing Seurat objects whose elements are genes.
Maintained by Michael Shapiro. Last updated 14 days ago.
biologicalquestionstatisticalmethodgeneexpressionsinglecelltranscriptomicsspatial
3 stars 6.52 score
dgrun
FateID:Quantification of Fate Bias in Multipotent Progenitors
Application of 'FateID' allows computation and visualization of cell fate bias for multi-lineage single cell transcriptome data. Herman, J.S., Sagar, Grün D. (2018) <DOI:10.1038/nmeth.4662>.
Maintained by Dominic Grün. Last updated 3 years ago.
22 stars 6.50 score 48 scripts 1 dependents
molina-valero
FORTLS:Automatic Processing of Terrestrial-Based Technologies Point Cloud Data for Forestry Purposes
Process automation of point cloud data derived from terrestrial-based technologies such as Terrestrial Laser Scanner (TLS) or Mobile Laser Scanner. 'FORTLS' enables (i) detection of trees and estimation of tree-level attributes (e.g. diameters and heights), (ii) estimation of stand-level variables (e.g. density, basal area, mean and dominant height), (iii) computation of metrics related to important forest attributes estimated in Forest Inventories at stand-level, and (iv) optimization of plot design for combining TLS data and field measured data. Documentation about 'FORTLS' is described in Molina-Valero et al. (2022, <doi:10.1016/j.envsoft.2022.105337>).
Maintained by Juan Alberto Molina-Valero. Last updated 12 days ago.
forest-inventoryforest-monitoringlidar-point-cloudcpp
23 stars 6.48 score 11 scripts
mathewchamberlain
SignacX:Cell Type Identification and Discovery from Single Cell Gene Expression Data
An implementation of neural networks trained with flow-sorted gene expression data to classify cellular phenotypes in single cell RNA-sequencing data. See Chamberlain M et al. (2021) <doi:10.1101/2021.02.01.429207> for more details.
Maintained by Mathew Chamberlain. Last updated 2 years ago.
cellular-phenotypesseuratsingle-cell-rna-seq
25 stars 6.47 score 34 scripts
qile0317
APackOfTheClones:Visualization of Clonal Expansion for Single Cell Immune Profiles
Visualize clonal expansion via circle-packing. 'APackOfTheClones' extends 'scRepertoire' to produce a publication-ready visualization of clonal expansion at a single cell resolution, by representing expanded clones as differently sized circles. The method was originally implemented by Murray Christian and Ben Murrell in the following immunology study: Ma et al. (2021) <doi:10.1126/sciimmunol.abg6356>.
Maintained by Qile Yang. Last updated 4 months ago.
clonal-analysisimmune-repertoireimmune-systemscrna-seqscrnaseqseuratsingle-cellsingle-cell-genomicscpp
15 stars 6.45 score 15 scripts
ethanbass
chromConverter:Chromatographic File Converter
Reads chromatograms from binary formats into R objects. Currently supports conversion of 'Agilent ChemStation', 'Agilent MassHunter', 'Shimadzu LabSolutions', 'ThermoRaw', and 'Varian Workstation' files as well as various text-based formats. In addition to its internal parsers, chromConverter contains bindings to parsers in external libraries, such as 'Aston' <https://github.com/bovee/aston>, 'Entab' <https://github.com/bovee/entab>, 'rainbow' <https://rainbow-api.readthedocs.io/>, and 'ThermoRawFileParser' <https://github.com/compomics/ThermoRawFileParser>.
Maintained by Ethan Bass. Last updated 17 hours ago.
cheminformaticschromatographyfair-datagc-fidhplchplc-dadhplc-uvmetabolomicsmetabolomics-dataopen-dataopen-science
34 stars 6.44 score 16 scripts 2 dependents
lhe17
nebula:Negative Binomial Mixed Models Using Large-Sample Approximation for Differential Expression Analysis of ScRNA-Seq Data
A fast negative binomial mixed model for conducting association analysis of multi-subject single-cell data. It can be used for identifying marker genes, differential expression and co-expression analyses. The model includes subject-level random effects to account for the hierarchical structure in multi-subject single-cell data. See He et al. (2021) <doi:10.1038/s42003-021-02146-6>.
Maintained by Liang He. Last updated 5 days ago.
37 stars 6.43 score 145 scripts
kjhealy
gssr:US General Social Survey (GSS) Data for R
The General Social Survey (GSS) is a long-running, mostly annual survey of US households. It is administered by the National Opinion Research Center (NORC). This package contains the GSS Cumulative Data and GSS Panel Data files packaged for R. Its companion package, gssrdoc, provides the codebook integrated into R's help system. For more information on the GSS see <http://gss.norc.org>.
Maintained by Kieran Healy. Last updated 5 months ago.
45 stars 6.42 score 147 scripts
bioc
sRACIPE:Systems biology tool to simulate gene regulatory circuits
sRACIPE implements a randomization-based method for gene circuit modeling. It allows us to study the effect of both the gene expression noise and the parametric variation on any gene regulatory circuit (GRC) using only its topology, and simulates an ensemble of models with random kinetic parameters at multiple noise levels. Statistical analysis of the generated gene expressions reveals the basin of attraction and stability of various phenotypic states and their changes associated with intrinsic and extrinsic noises. sRACIPE provides a holistic picture to evaluate the effects of both the stochastic nature of cellular processes and the parametric variation.
Maintained by Mingyang Lu. Last updated 1 months ago.
researchfieldsystemsbiologymathematicalbiologygeneexpressiongeneregulationgenetargetcpp
4 stars 6.40 score 209 scripts
atomashevic
transforEmotion:Sentiment Analysis for Text, Image and Video using Transformer Models
Implements sentiment analysis using huggingface <https://huggingface.co> transformer zero-shot classification model pipelines for text and image data. The default text pipeline is Cross-Encoder's DistilRoBERTa <https://huggingface.co/cross-encoder/nli-distilroberta-base> and the default image/video pipeline is OpenAI's CLIP <https://huggingface.co/openai/clip-vit-base-patch32>. All other zero-shot classification model pipelines can be implemented using their model name from <https://huggingface.co/models?pipeline_tag=zero-shot-classification>.
Maintained by Aleksandar Tomašević. Last updated 3 months ago.
26 stars 6.40 score 12 scripts
bioc
ontoProc:processing of ontologies of anatomy, cell lines, and so on
Support harvesting of diverse bioinformatic ontologies, making particular use of the ontologyIndex package on CRAN. We provide snapshots of key ontologies for terms about cells, cell lines, chemical compounds, and anatomy, to help analyze genome-scale experiments, particularly cell x compound screens. Another purpose is to strengthen development of compelling use cases for richer interfaces to emerging ontologies.
Maintained by Vincent Carey. Last updated 18 days ago.
infrastructuregobioinformaticsgenomicsontology
3 stars 6.37 score 75 scripts 2 dependents
ethanbass
chromatographR:Chromatographic Data Analysis Toolset
Tools for high-throughput analysis of HPLC-DAD/UV chromatograms (or similar data). Includes functions for preprocessing, alignment, peak-finding and fitting, peak-table construction, data-visualization, etc. Preprocessing and peak-table construction follow the rough formula laid out in 'alsace' (Wehrens, R., Bloemberg, T.G., and Eilers P.H.C., 2015. <doi:10.1093/bioinformatics/btv299>). Alignment of chromatograms is available using parametric time warping (as implemented in the 'ptw' package) (Wehrens, R., Bloemberg, T.G., and Eilers P.H.C. 2015. <doi:10.1093/bioinformatics/btv299>) or variable penalty dynamic time warping (as implemented in 'VPdtw') (Clifford, D., & Stone, G. 2012. <doi:10.18637/jss.v047.i08>). Peak-finding uses the algorithm by Tom O'Haver <https://terpconnect.umd.edu/~toh/spectrum/PeakFindingandMeasurement.htm>. Peaks are then fitted to a Gaussian or exponential-Gaussian hybrid peak shape using non-linear least squares (Lan, K. & Jorgenson, J. W. 2001. <doi:10.1016/S0021-9673(01)00594-5>). See the vignette for more details and suggested workflow.
Maintained by Ethan Bass. Last updated 15 days ago.
bioinformaticscheminformaticschromatographygc-fidhplchplc-dadhplc-pdahplv-uvmetabolomicsopen-dataopen-sciencereproducibilityreproducible-research
18 stars 6.36 score 8 scripts 1 dependents
arutools
ARUtools:Management and Processing of Autonomous Recording Unit (ARU) Data
Parse Autonomous Recording Unit (ARU) data and sub-sample recordings. Extract metadata from your recordings, select a subset of recordings for interpretation, and prepare files for processing on the 'WildTrax' <https://wildtrax.ca/> platform. Read and process metadata from recordings collected using the SongMeter and BAR-LT types of ARUs.
Maintained by David Hope. Last updated 12 days ago.
6.34 score 26 scripts
thewileylab
ReviewR:A Light-Weight, Portable Tool for Reviewing Individual Patient Records
A portable Shiny tool to explore patient-level electronic health record data and perform chart review in a single integrated framework. This tool supports browsing clinical data in many different formats including multiple versions of the 'OMOP' common data model as well as the 'MIMIC-III' data model. In addition, chart review information is captured and stored securely via the Shiny interface in a 'REDCap' (Research Electronic Data Capture) project using the 'REDCap' API. See the 'ReviewR' website for additional information, documentation, and examples.
Maintained by David Mayer. Last updated 2 years ago.
24 stars 6.33 score 6 scripts
greta-dev
greta.gp:Gaussian Process Modelling in 'greta'
Provides a syntax to create and combine Gaussian process kernels in 'greta'. You can then use them to define either full rank or sparse Gaussian processes. This is an extension to the 'greta' software, Golding (2019) <doi:10.21105/joss.01601>.
Maintained by Nicholas Tierney. Last updated 3 months ago.
19 stars 6.33 score 28 scripts
abhi-1u
texor:Converting 'LaTeX' 'R Journal' Articles into 'RJ-web-articles'
Articles in the 'R Journal' were first authored in 'LaTeX', which performs admirably for 'PDF' files but is less than ideal for modern online interfaces. The 'texor' package does all the transitional chores and conversions necessary to move to the online versions.
Maintained by Abhishek Ulayil. Last updated 19 hours ago.
7 stars 6.32 score 8 scripts
gesistsa
rang:Reconstructing Reproducible R Computational Environments
Resolve the dependency graph of R packages at a specific time point based on the information from various 'R-hub' web services <https://blog.r-hub.io/>. The dependency graph can then be used to reconstruct the R computational environment with 'Rocker' <https://rocker-project.org>.
Maintained by Chung-hong Chan. Last updated 2 months ago.
reproducibilityreproducible-research
80 stars 6.32 score 13 scripts
bioc
scDataviz:scDataviz: single cell dataviz and downstream analyses
In the single cell world, which includes flow cytometry, mass cytometry, single-cell RNA-seq (scRNA-seq), and others, there is a need to improve data visualisation and to bring analysis capabilities to researchers even from non-technical backgrounds. scDataviz attempts to fit into this space, while also catering for advanced users. Additionally, due to the way that scDataviz is designed, which is based on SingleCellExperiment, it has a 'plug and play' feel, and immediately lends itself as flexible and compatible with studies that go beyond scDataviz. Finally, the graphics in scDataviz are generated via the ggplot engine, which means that users can 'add on' features to these with ease.
Maintained by Kevin Blighe. Last updated 5 months ago.
singlecellimmunooncologyrnaseqgeneexpressiontranscriptionflowcytometrymassspectrometrydataimport
63 stars 6.30 score 16 scripts
phuse-org
sendigR:Enable Cross-Study Analysis of 'CDISC' 'SEND' Datasets
A system that enables cross-study analysis by extracting and filtering study data for control animals from a 'CDISC' 'SEND' study repository. These data types are supported: body weights, laboratory test results and microscopic findings. These database types are supported: 'SQLite' and 'Oracle'.
Maintained by Wenxian Wang. Last updated 25 days ago.
12 stars 6.28 score 6 scripts
bioc
recountmethylation:Access and analyze public DNA methylation array data compilations
Resources for cross-study analyses of public DNAm array data from NCBI GEO repo, produced using Illumina's Infinium HumanMethylation450K (HM450K) and MethylationEPIC (EPIC) platforms. Provided functions enable download, summary, and filtering of large compilation files. Vignettes detail background about file formats, example analyses, and more. Note the disclaimer on package load and consult the main manuscripts for further info.
Maintained by Sean K Maden. Last updated 5 months ago.
dnamethylationepigeneticsmicroarraymethylationarrayexperimenthub
9 stars 6.28 score 9 scripts
jensharbers
agricolaeplotr:Visualization of Design of Experiments from the 'agricolae' Package
Visualization of Design of Experiments from the 'agricolae' package with the 'ggplot2' framework. The user provides an experiment design from the 'agricolae' package, calls the corresponding function and will receive a visualization with 'ggplot2' based functions that are specific for each design. As there are many different designs, each design is tested on its type. The output can be modified with standard 'ggplot2' commands or with other packages with 'ggplot2' function extensions.
Maintained by Jens Harbers. Last updated 2 months ago.
8 stars 6.27 score 78 scripts
ropensci
deposits:A universal client for depositing and accessing research data anywhere
A universal client for depositing and accessing research data anywhere. Currently supported services are zenodo and figshare.
Maintained by Mark Padgham. Last updated 7 months ago.
39 stars 6.21 score 8 scripts 2 dependents
ropensci
autotest:Automatic Package Testing
Automatic testing of R packages via a simple YAML schema.
Maintained by Mark Padgham. Last updated 5 months ago.
automated-testingfuzzingtesting
54 stars 6.21 score 25 scripts
mu-sigma
HVT:Constructing Hierarchical Voronoi Tessellations and Overlay Heatmaps for Data Analysis
Facilitates building topology preserving maps for data analysis.
Maintained by "Mu Sigma, Inc.". Last updated 4 days ago.
4 stars 6.20 score 1 scripts
r-spatial
RPyGeo:ArcGIS Geoprocessing via Python
Provides access to ArcGIS geoprocessing tools by building an interface between R and the ArcPy Python side-package via the reticulate package.
Maintained by Alexander Brenning. Last updated 5 years ago.
29 stars 6.19 score 54 scripts
ashbaldry
designer:'Shiny' UI Prototype Builder
A 'shiny' application that enables the user to create a prototype UI, being able to drag and drop UI components before being able to save or download the equivalent R code.
Maintained by Ashley Baldry. Last updated 2 years ago.
149 stars 6.17 score 3 scripts
bioc
HIPPO:Heterogeneity-Induced Pre-Processing tOol
For single-cell UMI (scRNA-seq) data, it selects features and clusters the cells simultaneously. It has a novel feature selection method using the zero inflation instead of gene variance, and is computationally faster than other existing methods since it only relies on PCA+Kmeans rather than graph-clustering or consensus clustering.
Maintained by Tae Kim. Last updated 5 months ago.
sequencingsinglecellgeneexpressiondifferentialexpressionclustering
18 stars 6.16 score 4 scripts
bioc
crisprViz:Visualization Functions for CRISPR gRNAs
Provides functionalities to visualize and contextualize CRISPR guide RNAs (gRNAs) on genomic tracks across nucleases and applications. Works in conjunction with the crisprBase and crisprDesign Bioconductor packages. Plots are produced using the Gviz framework.
Maintained by Jean-Philippe Fortin. Last updated 5 months ago.
crisprfunctionalgenomicsgenetargetbioconductorbioconductor-packagecrispr-analysiscrispr-designgrnagrna-sequencegrna-sequencessgrnasgrna-designvisualization
8 stars 6.16 score 6 scripts 2 dependents
r-a-dobson
dynamicSDM:Species Distribution and Abundance Modelling at High Spatio-Temporal Resolution
A collection of novel tools for generating species distribution and abundance models (SDM) that are dynamic through both space and time. These highly flexible functions incorporate spatial and temporal aspects across key SDM stages, including when cleaning and filtering species occurrence data, generating pseudo-absence records, assessing and correcting sampling biases and autocorrelation, extracting explanatory variables and projecting distribution patterns. Throughout, functions utilise Google Earth Engine and Google Drive to minimise the computing power and storage demands associated with species distribution modelling at high spatio-temporal resolution.
Maintained by Rachel Dobson. Last updated 1 months ago.
dynamicsdmgoogle-earth-enginegoogledrivesdmspatiotemporalspatiotemporal-data-analysisspatiotemporal-forecastingspecies-distribution-modellingspecies-distributions
6 stars 6.16 score 20 scripts
brownag
rgeedim:Search, Composite, and Download 'Google Earth Engine' Imagery with the 'Python' Module 'geedim'
Search, composite, and download 'Google Earth Engine' imagery with 'reticulate' bindings for the 'Python' module 'geedim' by Dugal Harris. Read the 'geedim' documentation here: <https://geedim.readthedocs.io/>. Wrapper functions are provided to make it more convenient to use 'geedim' to download images larger than the 'Google Earth Engine' size limit <https://developers.google.com/earth-engine/apidocs/ee-image-getdownloadurl>. By default the "High Volume" API endpoint <https://developers.google.com/earth-engine/cloud/highvolume> is used to download data and this URL can be customized during initialization of the package.
Maintained by Andrew Brown. Last updated 9 days ago.
geedimgeotiffgisgoogle-earth-enginepythonrasterremote-sensingsatellite-imageryspatialterra
50 stars 6.13 score 27 scripts
terrytangyuan
scaffolder:Scaffolding Interfaces to Packages in Other Programming Languages
Comprehensive set of tools for scaffolding R interfaces to modules, classes, functions, and documentation written in other programming languages, such as 'Python'.
Maintained by Yuan Tang. Last updated 2 years ago.
code-generationpythonreticulatescaffolding
27 stars 6.13 score 9 scripts
bioc
tidyomics:Easily install and load the tidyomics ecosystem
The tidyomics ecosystem is a set of packages for ’omic data analysis that work together in harmony; they share common data representations and API design, consistent with the tidyverse ecosystem. The tidyomics package is designed to make it easy to install and load core packages from the tidyomics ecosystem with a single command.
Maintained by Stefano Mangiola. Last updated 5 months ago.
assaydomaininfrastructurernaseqdifferentialexpressiongeneexpressionnormalizationclusteringqualitycontrolsequencingtranscriptiontranscriptomicscytometrygenomicstidyverse
67 stars 6.13 score 5 scripts
andreanini
idiolect:Forensic Authorship Analysis
Carry out comparative authorship analysis of disputed and undisputed texts within the Likelihood Ratio Framework for expressing evidence in forensic science. This package contains implementations of well-known algorithms for comparative authorship analysis, such as Smith and Aldridge's (2011) Cosine Delta <doi:10.1080/09296174.2011.533591> or Koppel and Winter's (2014) Impostors Method <doi:10.1002/asi.22954>, as well as functions to measure their performance and to calibrate their outputs into Log-Likelihood Ratios.
Maintained by Andrea Nini. Last updated 24 days ago.
14 stars 6.12 score 3 scripts
feiyoung
DR.SC:Joint Dimension Reduction and Spatial Clustering
Joint dimension reduction and spatial clustering is conducted for single-cell RNA sequencing and spatial transcriptomics data. For more details, see Wei Liu, Xu Liao, Yi Yang, Huazhen Lin, Joe Yeong, Xiang Zhou, Xingjie Shi and Jin Liu (2022) <doi:10.1093/nar/gkac219>. It is computationally efficient, scales with increasing sample size, and is also capable of choosing the smoothness parameter and the number of clusters.
Maintained by Wei Liu. Last updated 1 years ago.
dimension-reductionselfsupervisedspatial-clusteringspatial-transcriptomicsopenblascpp
5 stars 6.12 score 29 scripts 2 dependents
bioc
Herper:The Herper package is a simple toolset to install and manage conda packages and environments from R
Many tools for data analysis are not available in R, but are present in public repositories like conda. The Herper package provides a comprehensive set of functions to interact with the conda package management system. With Herper, users can install, manage and run conda packages from the comfort of their R session. Herper also provides an ad-hoc approach to handling external system requirements for R packages. For people developing packages with python conda dependencies we recommend using basilisk (https://bioconductor.org/packages/release/bioc/html/basilisk.html) to internally support these system requirements pre-hoc.
Maintained by Thomas Carroll. Last updated 5 months ago.
5 stars 6.11 score 52 scripts
greta-dev
greta.gam:Generalised Additive Models in 'greta' using 'mgcv'
A 'greta' (Golding (2019) <doi:10.21105/joss.01601>) module that lets you use 'mgcv' smoother functions and formula syntax to define smooth terms for use in a 'greta' model. You can then define your own likelihood to complete the model, and fit it by Markov Chain Monte Carlo (MCMC).
Maintained by Nicholas Tierney. Last updated 3 months ago.
11 stars 6.04 score 6 scripts
mingdeyu
dgpsi:Interface to 'dgpsi' for Deep and Linked Gaussian Process Emulations
Interface to the 'python' package 'dgpsi' for Gaussian process, deep Gaussian process, and linked deep Gaussian process emulations of computer models and networks using stochastic imputation (SI). The implementations follow Ming & Guillas (2021) <doi:10.1137/20M1323771>, Ming, Williamson, & Guillas (2023) <doi:10.1080/00401706.2022.2124311>, and Ming & Williamson (2023) <doi:10.48550/arXiv.2306.01212>. To get started with the package, see <https://mingdeyu.github.io/dgpsi-R/>.
Maintained by Deyu Ming. Last updated 6 days ago.
deep-gaussian-processesemulationgaussian-processessurrogate-models
6.03 score 76 scripts
bioc
Dino:Normalization of Single-Cell mRNA Sequencing Data
Dino normalizes single-cell mRNA sequencing data to correct for technical variation, particularly sequencing depth, prior to downstream analysis. The approach produces a matrix of corrected expression for which the dependency between sequencing depth and the full distribution of normalized expression is removed; many existing methods aim to remove only the dependency between sequencing depth and the mean of the normalized expression. This is particularly useful in the context of highly sparse datasets such as those produced by 10X Genomics and other unique molecular identifier (UMI) based microfluidics protocols for which the depth-dependent proportion of zeros in the raw expression data can otherwise present a challenge.
Maintained by Jared Brown. Last updated 5 months ago.
softwarenormalizationrnaseqsinglecellsequencinggeneexpressiontranscriptomicsregressioncellbasedassays
9 stars 6.02 score 13 scripts
exaexa
EmbedSOM:Fast Embedding Guided by Self-Organizing Map
Provides a smooth mapping of multidimensional points into low-dimensional space defined by a self-organizing map. Designed to work with 'FlowSOM' and flow-cytometry use-cases. See Kratochvil et al. (2019) <doi:10.12688/f1000research.21642.1>.
Maintained by Mirek Kratochvil. Last updated 2 months ago.
26 stars 6.02 score 8 scripts
algorithmiaio
algorithmia:Allows you to Easily Interact with the Algorithmia Platform
The company, Algorithmia, houses the largest marketplace of online algorithms. This package essentially holds a bunch of REST wrappers that make it very easy to call algorithms in the Algorithmia platform and access files and directories in the Algorithmia data API. To learn more about the services they offer and the algorithms in the platform visit <http://algorithmia.com>. More information for developers can be found at <https://algorithmia.com/developers>.
Maintained by Robert Fulton. Last updated 4 years ago.
14 stars 6.00 score 36 scripts
kosukehamazaki
RAINBOWR:Genome-Wide Association Study with SNP-Set Methods
By using 'RAINBOWR' (Reliable Association INference By Optimizing Weights with R), users can test multiple SNPs (Single Nucleotide Polymorphisms) simultaneously by kernel-based (SNP-set) methods. This package can also be applied to haplotype-based GWAS (Genome-Wide Association Study). Users can test not only additive effects but also dominance and epistatic effects. For details, see our paper in PLOS Computational Biology: Kosuke Hamazaki and Hiroyoshi Iwata (2020) <doi:10.1371/journal.pcbi.1007663>.
Maintained by Kosuke Hamazaki. Last updated 4 months ago.
22 stars 5.99 score 22 scripts
mmedl94
lionfish:Interactive 'tourr' Using 'python'
Extends the functionality of the 'tourr' package by an interactive graphical user interface. The interactivity allows users to effortlessly refine their 'tourr' results by manual intervention, which allows for integration of expert knowledge and aids the interpretation of results. For more information on 'tourr' see Wickham et al. (2011) <doi:10.18637/jss.v040.i02> or <https://github.com/ggobi/tourr>.
Maintained by Matthias Medl. Last updated 7 days ago.
data-siencedata-visualizationdimensionality-reductionexploratory-data-analysisinteractiveinteractive-visualizationstourr
1 stars 5.98 score
f0nzie
rTorch:R Bindings to 'PyTorch'
'R' implementation and interface of the Machine Learning platform 'PyTorch' <https://pytorch.org/> developed in 'Python'. It requires a 'conda' environment with 'torch' and 'torchvision' Python packages to provide 'PyTorch' functions, methods and classes. The key object in 'PyTorch' is the tensor, which is in essence a multidimensional array. These tensors are fairly flexible in performing calculations on CPUs as well as 'GPUs' to accelerate tensor operations.
Maintained by Alfonso R. Reyes. Last updated 3 years ago.
6 stars 5.97 score 157 scripts
matsuurakentaro
RLoptimal:Optimal Adaptive Allocation Using Deep Reinforcement Learning
An implementation to compute an optimal adaptive allocation rule using deep reinforcement learning in a dose-response study (Matsuura et al. (2022) <doi:10.1002/sim.9247>). The adaptive allocation rule can directly optimize a performance metric, such as power, accuracy of the estimated target dose, or mean absolute error over the estimated dose-response curve.
Maintained by Kentaro Matsuura. Last updated 3 months ago.
4 stars 5.95 score 21 scripts
bioc
immApex:Tools for Adaptive Immune Receptor Sequence-Based Machine and Deep Learning
A set of tools to build tensorflow/keras3-based models in R from amino acid and nucleotide sequences focusing on adaptive immune receptors. The package includes pre-processing of sequences, unifying gene nomenclature usage, encoding sequences, and combining models. This package will serve as the basis of future immune receptor sequence functions/packages/models compatible with the scRepertoire ecosystem.
Maintained by Nick Borcherding. Last updated 6 days ago.
softwareimmunooncologysinglecellclassificationannotationsequencingmotifannotationcpp
8 stars 5.94 score 3 scripts
gesistsa
grafzahl:Supervised Machine Learning for Textual Data Using Transformers and 'Quanteda'
Duct tape the 'quanteda' ecosystem (Benoit et al., 2018) <doi:10.21105/joss.00774> to modern Transformer-based text classification models (Wolf et al., 2020) <doi:10.18653/v1/2020.emnlp-demos.6>, in order to facilitate supervised machine learning for textual data. This package mimics the behaviors of 'quanteda.textmodels' and provides a function to set up the 'Python' environment to use the pretrained models from 'Hugging Face' <https://huggingface.co/>. More information: <doi:10.5117/CCR2023.1.003.CHAN>.
Maintained by Chung-hong Chan. Last updated 1 months ago.
41 stars 5.91 score 3 scripts
onnx
onnx:R Interface to 'ONNX'
R Interface to 'ONNX' - Open Neural Network Exchange <https://onnx.ai/>. 'ONNX' provides an open source format for machine learning models. It defines an extensible computation graph model, as well as definitions of built-in operators and standard data types.
Maintained by Yuan Tang. Last updated 2 years ago.
deep-learningdeep-neural-networksonnx
44 stars 5.90 score 18 scripts
taikisan21
PAMpal:Load and Process Passive Acoustic Data
Tools for loading and processing passive acoustic data. Read in data that has been processed in 'Pamguard' (<https://www.pamguard.org/>), apply a suite of processing functions, and export data for reports or external modeling tools. Parameter calculations implement methods by Oswald et al (2007) <doi:10.1121/1.2743157>, Griffiths et al (2020) <doi:10.1121/10.0001229> and Baumann-Pickering et al (2010) <doi:10.1121/1.3479549>.
Maintained by Taiki Sakai. Last updated 27 days ago.
9 stars 5.87 score 79 scripts
mlampros
fuzzywuzzyR:Fuzzy String Matching
Fuzzy string matching implementation of the 'fuzzywuzzy' <https://github.com/seatgeek/fuzzywuzzy> 'python' package. It uses the Levenshtein Distance <https://en.wikipedia.org/wiki/Levenshtein_distance> to calculate the differences between sequences.
Maintained by Lampros Mouselimis. Last updated 2 years ago.
fuzzywuzzy matching python reticulate string
37 stars 5.87 score 40 scripts
edhofman
ReSurv:Machine Learning Models For Predicting Claim Counts
Prediction of claim counts using the feature-based development factors introduced in the manuscript <doi:10.48550/arXiv.2312.14549>. Implementation of Neural Networks, Extreme Gradient Boosting, and Cox model with splines to optimise the partial log-likelihood of proportional hazard models.
Maintained by Emil Hofman. Last updated 5 months ago.
2 stars 5.87 score 21 scripts
cogbrainhealthlab
VertexWiseR:Simplified Vertex-Wise Analyses of Whole-Brain and Hippocampal Surface
Provides functions to run statistical analyses on surface-based neuroimaging data, computing measures including cortical thickness and surface area of the whole-brain and of the hippocampi. It can make use of 'FreeSurfer', 'fMRIprep', 'HCP' and 'CAT12' preprocessed datasets and 'HippUnfold' hippocampal segmentation outputs for a given sample by restructuring the data values into a single file. The single file can then be used by the package for analyses independently from its base dataset and without need for its access.
Maintained by Charly Billaud. Last updated 5 days ago.
1 star 5.86 score 12 scripts
feiyoung
ProFAST:Probabilistic Factor Analysis for Spatially-Aware Dimension Reduction
Probabilistic factor analysis for spatially-aware dimension reduction across multi-section spatial transcriptomics data with millions of spatial locations. For more details, see Wei Liu, et al. (2023) <doi:10.1101/2023.07.11.548486>.
Maintained by Wei Liu. Last updated 2 months ago.
2 stars 5.86 score 12 scripts 1 dependent
bayer-group
adepro:A 'shiny' Application for the (Audio-)Visualization of Adverse Event Profiles
Contains a 'shiny' application called AdEPro (Animation of Adverse Event Profiles) which (audio-)visualizes adverse events occurring in clinical trials. As this data is usually considered sensitive, this tool is provided as a stand-alone application that can be launched from any local machine on which the data is stored.
Maintained by Nicole Rethemeier. Last updated 7 days ago.
adverse-eventsbayer-not-classifiedbayer-reg-nonebeat-not-applicableclinical-trialsdata-insightsshiny-appsvisualization
7 stars 5.84 score 11 scripts
bioc
ChromSCape:Analysis of single-cell epigenomics datasets with a Shiny App
ChromSCape - Chromatin landscape profiling for Single Cells - is a ready-to-launch user-friendly Shiny Application for the analysis of single-cell epigenomics datasets (scChIP-seq, scATAC-seq, scCUT&Tag, ...) from aligned data to differential analysis & gene set enrichment analysis. It is highly interactive, enables users to save their analysis and covers a wide range of analytical steps: QC, preprocessing, filtering, batch correction, dimensionality reduction, visualization, clustering, differential analysis and gene set analysis.
Maintained by Pacome Prompsy. Last updated 5 months ago.
shinyappssoftwaresinglecellchipseqatacseqmethylseqclassificationclusteringepigeneticsprincipalcomponentannotationbatcheffectmultiplecomparisonnormalizationpathwayspreprocessingqualitycontrolreportwritingvisualizationgenesetenrichmentdifferentialpeakcallingepigenomicsshinysingle-cellcpp
14 stars 5.83 score 16 scripts
jimbrig
lossrx:Actuarial Loss Development and Reserving with R
Actuarial Loss Development and Reserving Helper Functions and ShinyApp.
Maintained by Jimmy Briggs. Last updated 3 months ago.
actuarial-scienceclaims-dataclaims-reservingdata-scienceinsurancemodellingproperty-casualtyreservingrshinyworkflow
14 stars 5.82 score 7 scripts
bioc
scBubbletree:Quantitative visual exploration of scRNA-seq data
scBubbletree is a quantitative method for the visual exploration of scRNA-seq data, preserving key biological properties such as local and global cell distances and cell density distributions across samples. It effectively resolves overplotting and enables the visualization of diverse cell attributes from multiomic single-cell experiments. Additionally, scBubbletree is user-friendly and integrates seamlessly with popular scRNA-seq analysis tools, facilitating comprehensive and intuitive data interpretation.
Maintained by Simo Kitanovski. Last updated 5 months ago.
visualizationclusteringsinglecelltranscriptomicsrnaseqbig-databigdatascrna-seqscrna-seq-analysisvisualvisual-exploration
6 stars 5.82 score 8 scripts
bioc
benchdamic:Benchmark of differential abundance methods on microbiome data
Starting from a microbiome dataset (16S or WMS with absolute count values) it is possible to perform several analyses to assess the performance of many differential abundance detection methods. A basic and standardized version of the main differential abundance analysis methods is supplied, but users can also add their own method to the benchmark. The analyses focus on 4 main aspects: i) the goodness of fit of each method's distributional assumptions on the observed count data, ii) the ability to control the false discovery rate, iii) the within and between method concordances, iv) the truthfulness of the findings if any a priori knowledge is given. Several graphical functions are available for result visualization.
Maintained by Matteo Calgaro. Last updated 4 months ago.
metagenomicsmicrobiomedifferentialexpressionmultiplecomparisonnormalizationpreprocessingsoftwarebenchmarkdifferential-abundance-methods
8 stars 5.78 score 8 scripts
bioc
scRNAseqApp:A single-cell RNAseq Shiny app-package
The scRNAseqApp is a Shiny app package designed for interactive visualization of single-cell data. It is an enhanced version derived from ShinyCell, repackaged to accommodate multiple datasets. The app enables users to visualize data containing various types of information simultaneously, facilitating comprehensive analysis. Additionally, it includes a user management system to regulate database accessibility for different users.
Maintained by Jianhong Ou. Last updated 18 days ago.
visualizationsinglecellrnaseqinteractive-visualizationsmultiple-usersshiny-appssingle-cell-rna-seq
4 stars 5.76 score 3 scripts
adafede
cascade:Contextualizing untargeted Annotation with Semi-quantitative Charged Aerosol Detection for pertinent characterization of natural Extracts
This package provides the infrastructure to perform Automated Composition Assessment of Natural Extracts.
Maintained by Adriano Rutz. Last updated 3 days ago.
metabolite annotationcharged aerosol detectorsemi-quantitativenatural productscomputational metabolomicsspecialized metabolome
2 stars 5.76 score 40 scripts 1 dependent
europeanifcbgroup
iRfcb:Tools for Managing Imaging FlowCytobot (IFCB) Data
A comprehensive suite of tools for managing, processing, and analyzing data from the IFCB. I R FlowCytobot ('iRfcb') supports quality control, geospatial analysis, and preparation of IFCB data for publication in databases like <https://www.gbif.org>, <https://www.obis.org>, <https://emodnet.ec.europa.eu/en>, <https://shark.smhi.se/>, and <https://www.ecotaxa.org>. The package integrates with the MATLAB 'ifcb-analysis' tool, which is described in Sosik and Olson (2007) <doi:10.4319/lom.2007.5.204>, and provides features for working with raw, manually classified, and machine learning–classified image datasets. Key functionalities include image extraction, particle size distribution analysis, taxonomic data handling, and biomass concentration calculations, essential for plankton research.
Maintained by Anders Torstensson. Last updated 3 days ago.
data-export imaging-flow-cytometry imaging-flowcytobot oceanography plankton quality-control
1 star 5.76 score
greta-dev
greta.dynamics:Modelling Structured Dynamical Systems in 'greta'
A 'greta' extension for analysing transition matrices and ordinary differential equations representing dynamical systems. Provides functions for analysing transition matrices by iteration, and solving ordinary differential equations. This is an extension to the 'greta' software, Golding (2019) <doi:10.21105/joss.01601>.
Maintained by Nicholas Tierney. Last updated 5 months ago.
6 stars 5.72 score 11 scripts
pharmpy
pharmr:Interface to the 'Pharmpy' 'Pharmacometrics' Library
Interface to the 'Pharmpy' 'pharmacometrics' library. The 'Reticulate' package is used to interface Python from R.
Maintained by Rikard Nordgren. Last updated 2 months ago.
21 stars 5.72 score 5 scripts
core-bioinformatics
ClustAssess:Tools for Assessing Clustering
A set of tools for evaluating clustering robustness using proportion of ambiguously clustered pairs (Senbabaoglu et al. (2014) <doi:10.1038/srep06207>), as well as similarity across methods and method stability using element-centric clustering comparison (Gates et al. (2019) <doi:10.1038/s41598-019-44892-y>). Additionally, this package enables stability-based parameter assessment for graph-based clustering pipelines typical in single-cell data analysis.
Maintained by Andi Munteanu. Last updated 2 months ago.
softwaresinglecellrnaseqatacseqnormalizationpreprocessingdimensionreductionvisualizationqualitycontrolclusteringclassificationannotationgeneexpressiondifferentialexpressionbioinformaticsgenomicsmachine-learningparameter-optimizationrobustnesssingle-cellunsupervised-learningcpp
23 stars 5.70 score 18 scripts
bioc
scFeatures:scFeatures: Multi-view representations of single-cell and spatial data for disease outcome prediction
scFeatures is a tool that generates multi-view representations of single-cell and spatial data through the construction of a total of 17 feature types. These features can then be used for a variety of analyses using other software in Bioconductor.
Maintained by Yue Cao. Last updated 5 months ago.
cellbasedassayssinglecellspatialsoftwaretranscriptomics
11 stars 5.69 score 15 scripts
dipterix
ieegio:File IO for Intracranial Electroencephalography
Integrated toolbox supporting common file formats used for intracranial Electroencephalography (iEEG) and deep-brain stimulation (DBS) studies.
Maintained by Zhengjia Wang. Last updated 12 days ago.
bci2000brainbrainvisiondbsedfelectrophysiologyephysfreesurferieegneuroimagingneuroscienceniftinwb-format
5.68 score 10 scripts 1 dependent
filipezabala
voice:Voice Analysis, Speaker Recognition and Mood Inference via music theory
Voice analysis, speaker recognition and mood inference via music theory.
Maintained by Zabala Filipe J. Last updated 15 days ago.
20 stars 5.64 score 88 scripts
cjerzak
fastrerandomize:Hardware-Accelerated Rerandomization for Improved Balance
Provides hardware-accelerated tools for performing rerandomization and randomization testing in experimental research. Using a 'JAX' backend, the package enables exact rerandomization inference even for large experiments with hundreds of billions of possible randomizations. Key functionalities include generating pools of acceptable rerandomizations based on covariate balance, conducting exact randomization tests, and performing pre-analysis evaluations to determine optimal rerandomization acceptance thresholds. The package supports various hardware acceleration frameworks including 'CPU', 'CUDA', and 'METAL', making it versatile across accelerated computing environments. This allows researchers to efficiently implement stringent rerandomization designs and conduct valid inference even with large sample sizes. The package is partly based on Jerzak and Goldstein (2023) <doi:10.48550/arXiv.2310.00861>.
Maintained by Connor Jerzak. Last updated 2 months ago.
balance experimental-design hardware-acceleration
8 stars 5.64 score 1 script
mlverse
pysparklyr:Provides a 'PySpark' Back-End for the 'sparklyr' Package
It enables 'sparklyr' to integrate with 'Spark Connect' and 'Databricks Connect' by providing a wrapper over the 'PySpark' 'python' library.
Maintained by Edgar Ruiz. Last updated 7 days ago.
databricks pyspark spark spark-connect
15 stars 5.58 score 13 scripts
epe-gov-br
epe4md:EPE's 4MD model to forecast the adoption of Distributed Generation and Behind-the-meter energy storage
EPE's 4MD model to forecast the adoption of Distributed Generation and Behind-the-meter energy storage
Maintained by Gabriel Konzen. Last updated 20 days ago.
19 stars 5.58 score 5 scripts
bioc
multicrispr:Multi-locus multi-purpose Crispr/Cas design
This package is for designing Crispr/Cas9 and Prime Editing experiments. It contains functions to (1) define and transform genomic targets, (2) find spacers, (4) count offtarget (mis)matches, and (5) compute Doench2016/2014 targeting efficiency. Care has been taken for multicrispr to scale well towards large target sets, enabling the design of large Crispr/Cas9 libraries.
Maintained by Aditya Bhagwat. Last updated 4 months ago.
5.56 score 2 scripts
jbdorey
BeeBDC:Occurrence Data Cleaning
Flags and checks occurrence data that are in Darwin Core format. The package includes generic functions and data as well as some that are specific to bees. This package is meant to build upon and be complementary to other excellent occurrence cleaning packages, including 'bdc' and 'CoordinateCleaner'. This package uses datasets from several sources and particularly from the Discover Life Website, created by Ascher and Pickering (2020). For further information, please see the original publication and package website. Publication - Dorey et al. (2023) <doi:10.1101/2023.06.30.547152> and package website - Dorey et al. (2023) <https://github.com/jbdorey/BeeBDC>.
Maintained by James B. Dorey. Last updated 4 months ago.
3 stars 5.56 score 7 scripts
bioc
scifer:Scifer: Single-Cell Immunoglobulin Filtering of Sanger Sequences
Have you ever index sorted cells in a 96 or 384-well plate and then sequenced using Sanger sequencing? If so, you probably had some struggles to either check the electropherogram of each cell sequenced manually, or when you tried to identify which cell was sorted where after sequencing the plate. Scifer was developed to solve this issue by performing basic quality control of Sanger sequences and merging flow cytometry data from probed single-cell sorted B cells with sequencing data. scifer can export summary tables, 'fasta' files, electropherograms for visual inspection, and generate reports.
Maintained by Rodrigo Arcoverde Cerveira. Last updated 4 months ago.
preprocessingqualitycontrolsangerseqsequencingsoftwareflowcytometrysinglecell
5 stars 5.54 score 9 scripts
rebeccasalles
TSPred:Functions for Benchmarking Time Series Prediction
Functions for defining and conducting a time series prediction process including pre(post)processing, decomposition, modelling, prediction and accuracy assessment. The generated models and their prediction errors can be used for benchmarking other time series prediction methods and for creating a demand for the refinement of such methods. For this purpose, benchmark data from prediction competitions may be used.
Maintained by Rebecca Pontes Salles. Last updated 4 years ago.
benchmarking linear-models machine-learning nonstationarity time-series-forecast time-series-prediction
24 stars 5.53 score 94 scripts 1 dependent
r-tidy-remote-sensing
tidyrgee:'tidyverse' Methods for 'Earth Engine'
Provides 'tidyverse' methods for wrangling and analyzing 'Earth Engine' <https://earthengine.google.com/> data. These methods help the user with filtering, joining and summarising 'Earth Engine' image collections.
Maintained by Zack Arno. Last updated 2 years ago.
48 stars 5.53 score 140 scripts
bioc
Rcwl:An R interface to the Common Workflow Language
The Common Workflow Language (CWL) is an open standard for development of data analysis workflows that is portable and scalable across different tools and working environments. Rcwl provides a simple way to wrap command line tools and build CWL data analysis pipelines programmatically within R. It increases the ease of usage, development, and maintenance of CWL pipelines.
Maintained by Qiang Hu. Last updated 5 months ago.
softwareworkflowstepimmunooncology
5.52 score 37 scripts 2 dependents
bioc
cbpManager:Generate, manage, and edit data and metadata files suitable for the import in cBioPortal for Cancer Genomics
This R package provides an R Shiny application that enables the user to generate, manage, and edit data and metadata files suitable for the import in cBioPortal for Cancer Genomics. Create cancer studies and edit their metadata. Upload mutation data of a patient that will be concatenated to the data_mutation_extended.txt file of the study. Create and edit clinical patient data, sample data, and timeline data. Create custom timeline tracks for patients.
Maintained by Arsenij Ustjanzew. Last updated 5 months ago.
immunooncologydataimportdatarepresentationguithirdpartyclientpreprocessingvisualizationcancer-genomicscbioportalclinical-datafilegeneratormutation-datapatient-data
8 stars 5.51 score 1 script
bioc
tripr:T-cell Receptor/Immunoglobulin Profiler (TRIP)
TRIP is a software framework that provides analytics services on antigen receptor (B cell receptor immunoglobulin, BcR IG | T cell receptor, TR) gene sequence data. It is a web application written in R Shiny. It takes as input the output files of the IMGT/HighV-Quest tool. Users can select to analyze the data from each of the input samples separately, or the combined data files from all samples and visualize the results accordingly.
Maintained by Nikolaos Pechlivanis. Last updated 5 months ago.
batcheffectmultiplecomparisongeneexpressionimmunooncologytargetedresequencingbioconductorclonotype
2 stars 5.48 score 4 scripts
agusnieto77
ACEP:Análisis Computacional de Eventos de Protesta
The 'ACEP' library contains specific functions to perform computational analysis of protest events. It also contains databases with collections of notes on protests and dictionaries of conflicting words. The dictionary collection brings together dictionaries from different sources.
Maintained by Agustín Nieto. Last updated 1 year ago.
computer-aided-detectionconflict-analysisconflict-detectiondictionariesnlp-keywords-extractionprotest-eventstext-miningvisualization
10 stars 5.48 score 9 scripts
ropensci
EndoMineR:Functions to mine endoscopic and associated pathology datasets
This package comprises the functions that are used to clean up endoscopic reports and pathology reports, as well as many of the scripts used for analysis. The scripts assume the endoscopy and histopathology data sets are already merged, but they can also be used with the unmerged datasets.
Maintained by Sebastian Zeki. Last updated 7 months ago.
endoscopy gastroenterology peer-reviewed semi-structured-data text-mining
13 stars 5.47 score 30 scripts
bioc
scDotPlot:Cluster a Single-cell RNA-seq Dot Plot
Dot plots of single-cell RNA-seq data allow for an examination of the relationships between cell groupings (e.g. clusters) and marker gene expression. The scDotPlot package offers a unified approach to perform a hierarchical clustering analysis and add annotations to the columns and/or rows of a scRNA-seq dot plot. It works with SingleCellExperiment and Seurat objects as well as data frames.
Maintained by Benjamin I Laufer. Last updated 11 days ago.
softwarevisualizationdifferentialexpressiongeneexpressiontranscriptionrnaseqsinglecellsequencingclustering
7 stars 5.45 score 2 scripts
molinlab
Holomics:A User-Friendly R 'shiny' Application for Multi-Omics Data Integration and Analysis
A 'shiny' application that allows you to perform single- and multi-omics analyses using your own omics datasets. After the upload of the omics datasets and a metadata file, single-omics analysis is performed for feature selection and dataset reduction. These datasets are used for pairwise- and multi-omics analyses, where automatic tuning is done to identify correlations between the datasets - the end goal of the recommended 'Holomics' workflow. Methods used in the package were implemented in the package 'mixomics' by Florian Rohart, Benoît Gautier, Amrit Singh, Kim-Anh Lê Cao (2017) <doi:10.1371/journal.pcbi.1005752> and are described there in further detail.
Maintained by Katharina Munk. Last updated 10 months ago.
7 stars 5.45 score 7 scripts
bioc
cfTools:Informatics Tools for Cell-Free DNA Study
The cfTools R package provides methods for cell-free DNA (cfDNA) methylation data analysis to facilitate cfDNA-based studies. Given the methylation sequencing data of a cfDNA sample, for each cancer marker or tissue marker, we deconvolve the tumor-derived or tissue-specific reads from all reads falling in the marker region. Our read-based deconvolution algorithm exploits the pervasiveness of DNA methylation for signal enhancement, and can therefore sensitively identify a trace amount of tumor-specific or tissue-specific cfDNA in plasma. cfTools provides functions for (1) cancer detection: sensitively detect tumor-derived cfDNA and estimate the tumor-derived cfDNA fraction (tumor burden); (2) tissue deconvolution: infer the tissue type composition and the cfDNA fraction of multiple tissue types for a plasma cfDNA sample. These functions can serve as foundations for more advanced cfDNA-based studies, including cancer diagnosis and disease monitoring.
Maintained by Ran Hu. Last updated 1 day ago.
softwarebiomedicalinformaticsepigeneticssequencingmethylseqdnamethylationdifferentialmethylationcpp
7 stars 5.45 score 2 scripts
inesortega
neuralGAM:Interpretable Neural Network Based on Generalized Additive Models
Neural network framework based on Generalized Additive Models from Hastie & Tibshirani (1990, ISBN:9780412343902), which trains a different neural network to estimate the contribution of each feature to the response variable. The networks are trained independently, leveraging the local scoring and backfitting algorithms to ensure that the Generalized Additive Model converges and is additive. The resultant Neural Network is a highly accurate and interpretable deep learning model, which can be used for high-risk AI practices where decision-making should be based on accountable and interpretable algorithms.
Maintained by Ines Ortega-Fernandez. Last updated 7 months ago.
deep-neural-networksexplainable-aigamganngeneralized-additive-modelsgeneralized-additive-neural-networkself-explanatory-mlxai
2 stars 5.44 score 40 scripts
marioangst
motifr:Motif Analysis in Multi-Level Networks
Tools for motif analysis in multi-level networks. Multi-level networks combine multiple networks in one, e.g. social-ecological networks. Motifs are small configurations of nodes and edges (subgraphs) occurring in networks. 'motifr' can visualize multi-level networks, count multi-level network motifs and compare motif occurrences to baseline models. It also identifies contributions of existing or potential edges to motifs to find critical or missing edges. The package is in many parts an R wrapper for the excellent 'SESMotifAnalyser' 'Python' package written by Tim Seppelt.
Maintained by Mario Angst. Last updated 4 years ago.
15 stars 5.43 score 18 scripts
bioc
sketchR:An R interface for python subsampling/sketching algorithms
Provides an R interface for various subsampling algorithms implemented in python packages. Currently, interfaces to the geosketch and scSampler python packages are implemented. It also provides diagnostic plots to evaluate the subsampling.
Maintained by Charlotte Soneson. Last updated 3 months ago.
3 stars 5.43 score 3 scripts
thinkr-open
lozen:Management tools for missions
Management tools for missions (internal and external). Includes weekly, GL projects, etc.
Maintained by Sébastien Rochette. Last updated 12 months ago.
7 stars 5.42 score 14 scripts
bioc
speckle:Statistical methods for analysing single cell RNA-seq data
The speckle package contains functions for the analysis of single cell RNA-seq data. The speckle package currently contains functions to analyse differences in cell type proportions. There are also functions to estimate the parameters of the Beta distribution based on a given counts matrix, and a function to normalise a counts matrix to the median library size. There are plotting functions to visualise cell type proportions and the mean-variance relationship in cell type proportions and counts. As our research into specialised analyses of single cell data continues we anticipate that the package will be updated with new functions.
Maintained by Belinda Phipson. Last updated 5 months ago.
singlecellrnaseqregressiongeneexpression
5.41 score 258 scripts
vabar
vibass:Valencia International Bayesian Summer School
Materials for the introductory course on Bayesian inference. Practicals, data and interactive apps.
Maintained by Facundo Muñoz. Last updated 9 months ago.
7 stars 5.40 score 2 scripts
bioc
RITAN:Rapid Integration of Term Annotation and Network resources
Tools for comprehensive gene set enrichment and extraction of multi-resource high confidence subnetworks. RITAN facilitates bioinformatic tasks for enabling network biology research.
Maintained by Michael Zimmermann. Last updated 5 months ago.
qualitycontrolnetworknetworkenrichmentnetworkinferencegenesetenrichmentfunctionalgenomicsgraphandnetwork
5.40 score 9 scripts