Showing 200 of 223 total results
bioc
KBoost:Inference of gene regulatory networks from gene expression data
Reconstructing gene regulatory networks and transcription factor activity is crucial to understanding biological processes and holds potential for developing personalized treatments. Yet it remains an open problem, as state-of-the-art algorithms are often unable to handle large amounts of data. Furthermore, many current methods predict numerous false positives and cannot integrate other sources of information, such as previously known interactions. Here we introduce KBoost, an algorithm that uses kernel PCA regression, boosting, and Bayesian model averaging for fast and accurate reconstruction of gene regulatory networks. KBoost can also use a prior network built on previously known transcription factor targets. We have benchmarked KBoost using three different datasets against other high-performing algorithms. The results show that our method compares favourably to other methods across datasets.
Maintained by Luis F. Iglesias-Martinez. Last updated 5 months ago.
network graphandnetwork bayesian networkinference generegulation transcriptomics systemsbiology transcription geneexpression regression principalcomponent
40.1 match 4 stars 4.60 score 9 scripts
doehm
survivoR:Data from all Seasons of Survivor (US) TV Series in Tidy Format
Datasets detailing the results, castaways, and events of each season of Survivor for the US, Australia, South Africa, New Zealand, and the UK. This includes details on the cast, voting history, immunity and reward challenges, jury votes, boot order, advantage details, and episode ratings. Use this for analysis of trends and statistics of the game.
Maintained by Daniel Oehm. Last updated 4 days ago.
14.0 match 73 stars 7.08 score 94 scripts
adrtod
rchallenge:A Simple Data Science Challenge System
A simple data science challenge system using R Markdown and 'Dropbox' <https://www.dropbox.com/>. It requires no network configuration, does not depend on external platforms such as 'Kaggle' <https://www.kaggle.com/>, and can be easily installed on a personal computer.
Maintained by Adrien Todeschini. Last updated 4 years ago.
20.7 match 7 stars 3.85 score 20 scripts
dicook
mulgar:Functions for Pre-Processing Data for Multivariate Data Visualisation using Tours
This is a companion to the book Cook, D. and Laa, U. (2023) <https://dicook.github.io/mulgar_book/> "Interactively exploring high-dimensional data and models in R". It contains useful functions for processing data in preparation for visualising with a tour. There are also several sample data sets.
Maintained by Dianne Cook. Last updated 2 months ago.
15.0 match 4 stars 4.50 score 79 scripts
coolbutuseless
emphatic:Exploratory Analysis of Tabular Data using Colour Highlighting
Tools for exploratory analysis of tabular data using colour highlighting. Highlighting is displayed in any console supporting 'ANSI' colours, and can be converted to 'HTML', 'typst', 'latex' and 'SVG'. 'quarto' and 'rmarkdown' rendering are directly supported. It is also possible to add colour to regular expression matches and highlight differences between two arbitrary R objects.
Maintained by Mike Cheng. Last updated 3 months ago.
8.0 match 141 stars 7.55 score 12 scripts
apreshill
bakeoff:Data from "The Great British Bake Off"
Data about the bakers, challenges, and ratings for "The Great British Bake Off", from Wikipedia <https://en.wikipedia.org/wiki/The_Great_British_Bake_Off>.
Maintained by Alison Hill. Last updated 2 years ago.
10.5 match 67 stars 5.71 score 77 scripts
miyamot0
fxl:'fxl' Single Case Design Charting Package
The 'fxl' Charting package is used to prepare and design single case design figures that are typically prepared in spreadsheet software. With 'fxl', there is no need to leave the R environment to prepare these works and many of the more unique conventions in single case experimental designs can be performed without the need for physically constructing features of plots (e.g., drawing annotations across plots). Support is provided for various different plotting arrangements (e.g., multiple baseline), annotations (e.g., brackets, arrows), and output formats (e.g., svg, rasters).
Maintained by Shawn Gilroy. Last updated 3 months ago.
behavior-analysis single-case-design visual-analysis
10.8 match 8 stars 5.46 score 24 scripts
bioc
CellNOptR:Training of boolean logic models of signalling networks using prior knowledge networks and perturbation data
This package does optimisation of boolean logic networks of signalling pathways based on a previous knowledge network and a set of data upon perturbation of the nodes in the network.
Maintained by Attila Gabor. Last updated 5 months ago.
cellbasedassays cellbiology proteomics pathways network timecourse immunooncology
7.5 match 6.72 score 98 scripts 6 dependents
r-lib
testthat:Unit Testing for R
Software testing is important, but, in part because it is frustrating and boring, many of us avoid it. 'testthat' is a testing framework for R that is easy to learn and use, and integrates with your existing 'workflow'.
Maintained by Hadley Wickham. Last updated 18 days ago.
2.0 match 900 stars 20.97 score 74k scripts 465 dependents
friendly
HistData:Data Sets from the History of Statistics and Data Visualization
The 'HistData' package provides a collection of small data sets that are interesting and important in the history of statistics and data visualization. The goal of the package is to make these available, both for instructional use and for historical research. Some of these present interesting challenges for graphics or analysis in R.
Maintained by Michael Friendly. Last updated 10 months ago.
4.5 match 63 stars 9.19 score 732 scripts 2 dependents
openintrostat
openintro:Datasets and Supplemental Functions from 'OpenIntro' Textbooks and Labs
Supplemental functions and data for 'OpenIntro' resources, which include open-source textbooks and resources for introductory statistics (<https://www.openintro.org/>). The package contains datasets used in our open-source textbooks along with custom plotting functions for reproducing book figures. Note that many functions and examples include color transparency; some plotting elements may not show up properly (or at all) when run in some versions of the Windows operating system.
Maintained by Mine Çetinkaya-Rundel. Last updated 3 months ago.
3.6 match 240 stars 11.39 score 6.0k scripts
rconsortium
S7:An Object Oriented System Meant to Become a Successor to S3 and S4
A new object oriented programming system designed to be a successor to S3 and S4. It includes formal class, generic, and method specification, and a limited form of multiple dispatch. It has been designed and implemented collaboratively by the R Consortium Object-Oriented Programming Working Group, which includes representatives from R-Core, 'Bioconductor', 'Posit'/'tidyverse', and the wider R community.
Maintained by Hadley Wickham. Last updated 4 months ago.
3.0 match 432 stars 13.15 score 86 scripts 22 dependents
bioc
IgGeneUsage:Differential gene usage in immune repertoires
Detection of biases in the usage of immunoglobulin (Ig) genes is an important task in immune repertoire profiling. IgGeneUsage detects aberrant Ig gene usage between biological conditions using a probabilistic model which is analyzed computationally by Bayes inference. With this IgGeneUsage also avoids some common problems related to the current practice of null-hypothesis significance testing.
Maintained by Simo Kitanovski. Last updated 5 months ago.
differentialexpression regression genetics bayesian biomedicalinformatics immunooncology mathematicalbiology b-cell-receptor bcr-repertoire differential-analysis differential-gene-expression high-throughput-sequencing immune-repertoire immune-repertoire-analysis immune-repertoires immunogenomics immunoglobulin immunoinformatics immunological-bioinformatics immunology tcr-repertoire vdj-recombination cpp
6.6 match 6 stars 5.92 score 1 scripts
julianfaraway
faraway:Datasets and Functions for Books by Julian Faraway
Books are "Linear Models with R" published 1st Ed. August 2004, 2nd Ed. July 2014, 3rd Ed. February 2025 by CRC press, ISBN 9781439887332, and "Extending the Linear Model with R" published by CRC press in 1st Ed. December 2005 and 2nd Ed. March 2016, ISBN 9781584884248 and "Practical Regression and ANOVA in R" contributed documentation on CRAN (now very dated).
Maintained by Julian Faraway. Last updated 1 month ago.
3.6 match 29 stars 9.43 score 1.7k scripts 1 dependents
krzjoa
m5:'M5 Forecasting' Challenges Data
Contains functions that facilitate downloading, loading and preparing data from 'M5 Forecasting' challenges (by 'University of Nicosia', hosted on 'Kaggle'). The data itself is a set of time series of different product sales in 'Walmart'. The package also includes a ready-to-use built-in M5 subset named 'tiny_m5'. For detailed information about the challenges, see: Makridakis, Spyros & Spiliotis, Evangelos & Assimakopoulos, Vassilis. (2020). The M5 Accuracy competition: Results, findings and conclusions. <doi:10.1016/j.ijforecast.2021.10.009>
Maintained by Krzysztof Joachimiak. Last updated 3 years ago.
data-science kaggle-competition kaggle-dataset m5-competition m5-forecasting time-series-forecasting walmart walmart-sales-forecasting
7.3 match 2 stars 4.45 score 28 scripts
celevitz
topChef:Top Chef Data
Several datasets which describe the chef contestants in Top Chef, the challenges that they compete in, and the results of those challenges. This data is useful for practicing data wrangling, graphing, and analyzing how each season of Top Chef played out.
Maintained by Levitz Carly E. Last updated 2 days ago.
5.3 match 3 stars 5.99 score 26 scripts
wjbraun
DAAG:Data Analysis and Graphics Data and Functions
Functions and data sets used in examples and exercises in the text Maindonald, J.H. and Braun, W.J. (2003, 2007, 2010) "Data Analysis and Graphics Using R", and in an upcoming Maindonald, Braun, and Andrews text that builds on this earlier text.
Maintained by W. John Braun. Last updated 11 months ago.
3.8 match 8.25 score 1.2k scripts 1 dependents
reed-evic
cpsvote:A Toolbox for Using the CPS's Voting and Registration Supplement
Provides automated methods for downloading, recoding, and merging selected years of the Current Population Survey's Voting and Registration Supplement, a large N national survey about registration, voting, and non-voting in United States federal elections. Provides documentation for appropriate use of sample weights to generate statistical estimates, drawing from Hur & Achen (2013) <doi:10.1093/poq/nft042> and McDonald (2018) <http://www.electproject.org/home/voter-turnout/voter-turnout-data>.
Maintained by Jay Lee. Last updated 2 years ago.
5.5 match 3 stars 5.58 score 21 scripts
alanarnholt
BSDA:Basic Statistics and Data Analysis
Data sets for the book "Basic Statistics and Data Analysis" by Larry J. Kitchens.
Maintained by Alan T. Arnholt. Last updated 2 years ago.
3.4 match 7 stars 9.11 score 1.3k scripts 6 dependents
bioc
GARS:GARS: Genetic Algorithm for the identification of Robust Subsets of variables in high-dimensional and challenging datasets
Feature selection aims to identify and remove redundant, irrelevant and noisy variables from high-dimensional datasets. Selecting informative features affects the subsequent classification and regression analyses by improving their overall performance. Several methods have been proposed to perform feature selection: most of them rely on univariate statistics, correlation, entropy measurements or the usage of backward/forward regressions. Herein, we propose an efficient, robust and fast method that adopts stochastic optimization approaches for high-dimensional data. GARS is an innovative implementation of a genetic algorithm that selects robust features in high-dimensional and challenging datasets.
Maintained by Mattia Chiesa. Last updated 5 months ago.
classification featureextraction clustering openjdk
6.0 match 5.00 score 2 scripts
lightbluetitan
usdatasets:A Comprehensive Collection of U.S. Datasets
Provides a diverse collection of U.S. datasets encompassing various fields such as crime, economics, education, finance, energy, healthcare, and more. It serves as a valuable resource for researchers and analysts seeking to perform in-depth analyses and derive insights from U.S.-specific data.
Maintained by Renzo Caceres Rossi. Last updated 5 months ago.
3.6 match 7 stars 5.99 score 141 scripts
learnitr
learnitdown:R Markdown, Bookdown and Learnr Additions for Learning Material
Extension to R Markdown, Bookdown and Learnr for building better learning and e-learning material: H5P integration, course-contextual divs, deferred loading of Shiny and learnr applications, and much more ...
Maintained by Philippe Grosjean. Last updated 6 months ago.
bookdown learning-resources r-markdown teaching-materials
4.8 match 13 stars 4.49 score 16 scripts
njlyon0
dndR:Dungeons & Dragons Functions for Players and Dungeon Masters
The goal of 'dndR' is to provide a suite of Dungeons & Dragons related functions. This package is meant to be useful both to players and Dungeon Masters (DMs). All functions currently focus on Fifth Edition (a.k.a. "5e") but once the next edition is published functions will likely be expanded to include any rule changes.
Maintained by Nicholas Lyon. Last updated 11 months ago.
data-science dungeons-and-dragons ttrpg
3.0 match 17 stars 6.98 score 16 scripts
sistm
cytometree:Automated Cytometry Gating and Annotation
Given the hypothesis of a bi-modal distribution of cells for each marker, the algorithm constructs a binary tree, the nodes of which are subpopulations of cells. At each node, observed cells and markers are modeled by both a family of normal distributions and a family of bi-modal normal mixture distributions. Splitting is done according to a normalized difference of AIC between the two families. Method is detailed in: Commenges, Alkhassim, Gottardo, Hejblum & Thiebaut (2018) <doi: 10.1002/cyto.a.23601>.
Maintained by Boris P Hejblum. Last updated 2 years ago.
3.3 match 9 stars 5.91 score 15 scripts 1 dependents
rudeboybert
resampledata:Data Sets for Mathematical Statistics with Resampling in R
Package of data sets from "Mathematical Statistics with Resampling in R" (1st Ed. 2011, 2nd Ed. 2018) by Laura Chihara and Tim Hesterberg.
Maintained by Albert Y. Kim. Last updated 4 months ago.
3.8 match 15 stars 5.15 score 187 scripts
angabrio
missingHE:Missing Outcome Data in Health Economic Evaluation
Contains a suite of functions for health economic evaluations with missing outcome data. The package can fit different types of statistical models under a fully Bayesian approach using the software 'JAGS' (which should be installed locally and which is loaded in 'missingHE' via the 'R' package 'R2jags'). Three classes of models can be fitted under a variety of missing data assumptions: selection models, pattern mixture models and hurdle models. In addition to model fitting, 'missingHE' provides a set of specialised functions to assess model convergence and fit, and to summarise the statistical and economic results using different types of measures and graphs. The methods implemented are described in Mason (2018) <doi:10.1002/hec.3793>, Molenberghs (2000) <doi:10.1007/978-1-4419-0300-6_18> and Gabrio (2019) <doi:10.1002/sim.8045>.
Maintained by Andrea Gabrio. Last updated 2 years ago.
cost-effectiveness-analysis health-economic-evaluation individual-level-data jags missing-data parametric-modelling sensitivity-analysis cpp
3.4 match 5 stars 5.38 score 24 scripts
modeloriented
DALEXtra:Extension for 'DALEX' Package
Provides wrappers for various machine learning models. In applied machine learning, there is a strong belief that we need to strike a balance between interpretability and accuracy. However, in the field of interpretable machine learning, there are more and more new ideas for explaining black-box models that are implemented in 'R'. 'DALEXtra' creates a 'DALEX' Biecek (2018) <arXiv:1806.08915> explainer for many types of models, including those created using the 'python' 'scikit-learn' and 'keras' libraries and the 'java' 'h2o' library. An important part of the package is Champion-Challenger analysis and an innovative approach to model performance across subsets of test data, presented in a Funnel Plot.
Maintained by Szymon Maksymiuk. Last updated 2 years ago.
2.4 match 67 stars 7.71 score 400 scripts 1 dependents
antonio-pgarcia
evoper:Evolutionary Parameter Estimation for 'Repast Simphony' Models
EvoPER, Evolutionary Parameter Estimation for Individual-based Models, is an extensible package providing optimization-driven parameter estimation methods using metaheuristics and evolutionary computation techniques (Particle Swarm Optimization, Simulated Annealing, Ant Colony Optimization for continuous domains, Tabu Search, Evolutionary Strategies, ...) which could be more efficient and require, in some cases, fewer model evaluations than alternatives relying on experimental design. Currently there is built-in support for models developed with the 'Repast Simphony' Agent-Based framework (<https://repast.github.io/>) and with NetLogo (<https://ccl.northwestern.edu/netlogo/>), which are the most used frameworks for Agent-based modeling.
Maintained by Antonio Prestes Garcia. Last updated 5 years ago.
4.5 match 6 stars 3.92 score 28 scripts
loelschlaeger
fHMM:Fitting Hidden Markov Models to Financial Data
Fitting (hierarchical) hidden Markov models to financial data via maximum likelihood estimation. See Oelschläger, L. and Adam, T. "Detecting Bearish and Bullish Markets in Financial Time Series Using Hierarchical Hidden Markov Models" (2021, Statistical Modelling) <doi:10.1177/1471082X211034048> for a reference on the method. A user guide is provided by the accompanying software paper "fHMM: Hidden Markov Models for Financial Time Series in R", Oelschläger, L., Adam, T., and Michels, R. (2024, Journal of Statistical Software) <doi:10.18637/jss.v109.i09>.
Maintained by Lennart Oelschläger. Last updated 6 months ago.
finance hidden-markov-models cpp openmp
2.5 match 16 stars 6.95 score 5 scripts
biodiverse
unmarked:Models for Data from Unmarked Animals
Fits hierarchical models of animal abundance and occurrence to data collected using survey methods such as point counts, site occupancy sampling, distance sampling, removal sampling, and double observer sampling. Parameters governing the state and observation processes can be modeled as functions of covariates. References: Kellner et al. (2023) <doi:10.1111/2041-210X.14123>, Fiske and Chandler (2011) <doi:10.18637/jss.v043.i10>.
Maintained by Ken Kellner. Last updated 3 days ago.
1.3 match 4 stars 13.03 score 652 scripts 12 dependents
sanfordweisberg
alr4:Data to Accompany Applied Linear Regression 4th Edition
Datasets to Accompany S. Weisberg (2014, ISBN: 978-1-118-38608-8), "Applied Linear Regression," 4th edition. Many data files in this package are included in the `alr3` package as well, so only one of them should be used.
Maintained by Sanford Weisberg. Last updated 7 years ago.
4.5 match 1 stars 3.45 score 306 scripts
shikokuchuo
mirai:Minimalist Async Evaluation Framework for R
Designed for simplicity, a 'mirai' evaluates an R expression asynchronously in a parallel process, locally or distributed over the network. The result is automatically available upon completion. Modern networking and concurrency, built on 'nanonext' and 'NNG' (Nanomsg Next Gen), ensures reliable and efficient scheduling over fast inter-process communications or TCP/IP secured by TLS. Distributed computing can launch remote resources via SSH or cluster managers. An inherently queued architecture handles many more tasks than available processes, and requires no storage on the file system. Innovative features include support for otherwise non-exportable reference objects, event-driven promises, and asynchronous parallel map.
Maintained by Charlie Gao. Last updated 3 hours ago.
async asynchronous-tasks concurrency distributed-computing high-performance-computing parallel-computing
1.3 match 217 stars 11.89 score 130 scripts 7 dependents
higgi13425
medicaldata:Data Package for Medical Datasets
Provides access to well-documented medical datasets for teaching. Featuring several from the Teaching of Statistics in the Health Sciences website <https://www.causeweb.org/tshs/category/dataset/>, a few reconstructed datasets of historical significance in medical research, some reformatted and extended from existing R packages, and some data donations.
Maintained by Peter Higgins. Last updated 2 years ago.
2.0 match 48 stars 7.43 score 317 scripts
kwstat
agridat:Agricultural Datasets
Datasets from books, papers, and websites related to agriculture. Example graphics and analyses are included. Data come from small-plot trials, multi-environment trials, uniformity trials, yield monitors, and more.
Maintained by Kevin Wright. Last updated 30 days ago.
1.3 match 126 stars 10.78 score 1.7k scripts 1 dependents
carpentries
sandpaper:Create and Curate Carpentries Lessons
We provide tools to build a Carpentries-themed lesson repository into an accessible standalone static website. These include local tools and those designed to be used in a continuous integration context so that all the lesson author needs to focus on is writing the content of the actual lesson.
Maintained by Robert Davey. Last updated 2 months ago.
carpentries carpentries-infrastructure carpentries-workbench lesson-template lessons markdown static-site-generator
1.8 match 44 stars 7.68 score 8 scripts
jacob-long
panelr:Regression Models and Utilities for Repeated Measures and Panel Data
Provides an object type and associated tools for storing and wrangling panel data. Implements several methods for creating regression models that take advantage of the unique aspects of panel data. Among other capabilities, automates the "within-between" (also known as "between-within" and "hybrid") panel regression specification that combines the desirable aspects of both fixed effects and random effects econometric models and fits them as multilevel models (Allison, 2009 <doi:10.4135/9781412993869.d33>; Bell & Jones, 2015 <doi:10.1017/psrm.2014.7>). These models can also be estimated via generalized estimating equations (GEE; McNeish, 2019 <doi:10.1080/00273171.2019.1602504>) and Bayesian estimation is (optionally) supported via 'Stan'. Supports estimation of asymmetric effects models via first differences (Allison, 2019 <doi:10.1177/2378023119826441>) as well as a generalized linear model extension thereof using GEE.
Maintained by Jacob A. Long. Last updated 1 year ago.
1.5 match 101 stars 8.76 score 181 scripts 1 dependents
itsleeds
pct:Propensity to Cycle Tool
Functions and example data to teach and increase the reproducibility of the methods and code underlying the Propensity to Cycle Tool (PCT), a research project and web application hosted at <https://www.pct.bike/>. For an academic paper on the methods, see Lovelace et al (2017) <doi:10.5198/jtlu.2016.862>.
Maintained by Robin Lovelace. Last updated 14 days ago.
2.0 match 20 stars 6.54 score
mrc-ide
malariasimulation:An individual based model for malaria
Specifies the latest and greatest malaria model.
Maintained by Giovanni Charles. Last updated 29 days ago.
1.5 match 16 stars 8.17 score 146 scripts
lukejharmon
geiger:Analysis of Evolutionary Diversification
Methods for fitting macroevolutionary models to phylogenetic trees Pennell (2014) <doi:10.1093/bioinformatics/btu181>.
Maintained by Luke Harmon. Last updated 2 years ago.
1.6 match 1 stars 7.84 score 2.3k scripts 28 dependents
mikejohnson51
climateR:climateR
Find, subset, and retrieve geospatial data by AOI.
Maintained by Mike Johnson. Last updated 3 months ago.
aoi climate dataset geospatial gridded-climate-data weather
1.3 match 187 stars 8.74 score 156 scripts 1 dependents
chrisbrownlie
bushtucker:'I'm a Celebrity Get Me Out of Here' Data
Data on the first 24 seasons of the UK TV show 'I'm a Celebrity, Get Me Out of Here', broadcast from 2002-2024.
Maintained by Chris Brownlie. Last updated 1 month ago.
3.8 match 3.00 score 3 scripts
bioc
maPredictDSC:Phenotype prediction using microarray data: approach of the best overall team in the IMPROVER Diagnostic Signature Challenge
This package implements the classification pipeline of the best overall team (Team221) in the IMPROVER Diagnostic Signature Challenge. Additional functionality is added to compare 27 combinations of data preprocessing, feature selection and classifier types.
Maintained by Adi Laurentiu Tarca. Last updated 5 months ago.
4.8 match 2.30 score 2 scripts
f0nzie
rODE:Ordinary Differential Equation (ODE) Solvers Written in R Using S4 Classes
Show physics, math and engineering students how an ODE solver is made and how effective R classes can be for the construction of the equations that describe natural phenomena. Inspiration for this work comes from the book on "Computer Simulations in Physics" by Harvey Gould, Jan Tobochnik, and Wolfgang Christian. Book link: <http://www.compadre.org/osp/items/detail.cfm?ID=7375>.
Maintained by Alfonso R. Reyes. Last updated 7 years ago.
2.0 match 5.50 score 71 scripts
canmod
macpan2:Fast and Flexible Compartmental Modelling
Fast and flexible compartmental modelling with Template Model Builder.
Maintained by Steve Walker. Last updated 1 day ago.
compartmental-models epidemiology forecasting mixed-effects model-fitting optimization simulation simulation-modeling cpp
1.2 match 4 stars 8.90 score 246 scripts 1 dependents
bioc
cellxgenedp:Discover and Access Single Cell Data Sets in the CELLxGENE Data Portal
The cellxgene data portal (https://cellxgene.cziscience.com/) provides a graphical user interface to collections of single-cell sequence data processed in standard ways to 'count matrix' summaries. The cellxgenedp package provides an alternative, R-based interface, allowing data discovery, viewing, and downloading.
Maintained by Martin Morgan. Last updated 5 months ago.
singlecell dataimport thirdpartyclient
1.5 match 8 stars 6.64 score 27 scripts
usepa
ctxR:Utilities for Interacting with the 'CTX' APIs
Access chemical, hazard, bioactivity, and exposure data from the Computational Toxicology and Exposure ('CTX') APIs <https://www.epa.gov/comptox-tools/computational-toxicology-and-exposure-apis>. 'ctxR' was developed to streamline the process of accessing the information available through the 'CTX' APIs without requiring prior knowledge of how to use APIs. Most data is also available on the CompTox Chemical Dashboard ('CCD') <https://comptox.epa.gov/dashboard/> and other resources found at the EPA Computational Toxicology and Exposure Online Resources <https://www.epa.gov/comptox-tools>.
Maintained by Paul Kruse. Last updated 2 months ago.
1.2 match 10 stars 8.02 score 13 scripts 1 dependents
githubwilly
SymbolicDeterminants:Symbolic Representation of Matrix Determinant
Creates a numeric guide for writing the formula for the determinant of a square matrix (a detguide) as a function of the elements of the matrix and writes out that formula, the symbolic representation.
Maintained by William Fairweather. Last updated 4 years ago.
3.5 match 2.70 score
bergsmat
nonmemica:Create and Evaluate NONMEM Models in a Project Context
Systematically creates and modifies NONMEM(R) control streams. Harvests NONMEM output, builds run logs, creates derivative data, generates diagnostics. NONMEM (ICON Development Solutions <https://www.iconplc.com/>) is software for nonlinear mixed effects modeling. See 'package?nonmemica'.
Maintained by Tim Bergsma. Last updated 2 months ago.
2.0 match 4 stars 4.58 score 45 scripts
timhesterberg
resampledata3:Data Sets for "Mathematical Statistics with Resampling and R" (3rd Ed)
Data sets for Chihara and Hesterberg (2022, ISBN: 978-1-119-87404-1) "Mathematical Statistics with Resampling in R" (3rd Ed).
Maintained by Tim Hesterberg. Last updated 3 years ago.
6.0 match 1.52 score 33 scripts
tuncerkerem
experimentr:Datasets Used in Social Science Experiments: A Hands-on Introduction
Contains all the datasets that were used in Social Science Experiments: A Hands-On Introduction and in its R Companion. Relevant materials can be found at <https://osf.io/b78je>.
Maintained by Kerem Tuncer. Last updated 3 years ago.
3.3 match 2.70 score 8 scripts
great-northern-diver
loon.ggplot:A Grammar of Interactive Graphics
Provides a bridge between the 'loon' and 'ggplot2' packages. Extends the grammar of ggplot to add clauses to create interactive 'loon' plots. Existing ggplot(s) can be turned into interactive 'loon' plots and 'loon' plots into static ggplot(s); the function 'loon.ggplot()' is the bridge from one plot structure to the other.
Maintained by Zehao Xu. Last updated 10 months ago.
data-analysis ggplot ggplot-features graphics interactive-plots loon visualizations
1.3 match 24 stars 7.11 score 9 scripts 3 dependents
bryanhanson
readJDX:Import Data in the JCAMP-DX Format
Import data written in the JCAMP-DX format. This is an instrument-independent format used in the field of spectroscopy. Examples include IR, NMR, and Raman spectroscopy. See the vignette for background and supported formats. The official JCAMP-DX site is <http://www.jcamp-dx.org/>.
Maintained by Bryan A. Hanson. Last updated 1 year ago.
1.3 match 8 stars 6.48 score 7 scripts 5 dependents
rnuske
komaletter:Simply Beautiful PDF Letters from Markdown
Write beautiful yet customizable letters in R Markdown and directly obtain the finished PDF. Smooth generation of PDFs is realized by 'rmarkdown', the 'pandoc-letter' template and the 'KOMA-Script' letter class. 'KOMA-Script' provides enhanced replacements for the standard 'LaTeX' classes with emphasis on typography and versatility. 'KOMA-Script' is particularly useful for international writers as it handles various paper formats well, provides layouts for many common window envelope types (e.g. German, US, French, Japanese) and lets you define your own layouts. The package comes with a default letter layout based on 'DIN 5008B'.
Maintained by Robert Nuske. Last updated 9 months ago.
koma-script latex letter markdown pandoc pandoc-letter pdf
1.3 match 87 stars 6.78 score 3 scripts
dpmcsuss
iGraphMatch:Tools for Graph Matching
Versatile tools and data for graph matching analysis with various forms of prior information that supports working with 'igraph' objects, matrix objects, or lists of either.
Maintained by Daniel Sussman. Last updated 10 months ago.
graph-algorithms graph-matching cpp
1.5 match 9 stars 5.65 score 9 scripts
carpentries
pegboard:Explore and Manipulate Markdown Curricula from the Carpentries
The Carpentries (<https://carpentries.org>) curricula are made up of lessons that are hosted as websites. Each lesson represents between half a day and two days of instruction and contains several episodes, which are written as 'kramdown'-flavored 'markdown' documents and converted to HTML using the 'Jekyll' static website generator. This package builds on top of the 'tinkr' package; it reads these markdown documents into 'XML' and stores them in R6 classes for convenient exploration and manipulation of sections within episodes.
Maintained by Robert Davey. Last updated 24 days ago.
1.8 match 6 stars 4.58 score 4 scripts 1 dependents
oobianom
quickcode:Quick and Essential 'R' Tricks for Better Scripts
The NOT functions, 'R' tricks, and a compilation of simple, quick, and often-used 'R' code snippets to improve your scripts. Improve the quality and reproducibility of 'R' scripts.
Maintained by Obinna Obianom. Last updated 16 days ago.
1.0 match 5 stars 7.76 score 7 scripts 6 dependents
usepa
ccdR:Utilities for Interacting with the 'CTX' APIs
Access chemical, hazard, bioactivity, and exposure data from the Computational Toxicology and Exposure ('CTX') APIs <https://api-ccte.epa.gov/docs/>. 'ccdR' was developed to streamline the process of accessing the information available through the 'CTX' APIs without requiring prior knowledge of how to use APIs. Most data is also available on the CompTox Chemical Dashboard ('CCD') <https://comptox.epa.gov/dashboard/> and other resources found at the EPA Computational Toxicology and Exposure Online Resources <https://www.epa.gov/comptox-tools>.
Maintained by Paul Kruse. Last updated 8 months ago.
1.2 match 2 stars 6.38 score 7 scriptstidymodels
workflows:Modeling Workflows
Managing both a 'parsnip' model and a preprocessor, such as a model formula or recipe from 'recipes', can often be challenging. The goal of 'workflows' is to streamline this process by bundling the model alongside the preprocessor, all within the same object.
Maintained by Simon Couch. Last updated 27 days ago.
0.5 match 207 stars 13.80 score 876 scripts 43 dependentspmair78
MPsychoR:Modern Psychometrics with R
Supplementary materials and datasets for the book "Modern Psychometrics With R" (Mair, 2018, Springer useR! series).
Maintained by Patrick Mair. Last updated 5 years ago.
4.0 match 1.73 score 54 scriptsle-huynh
lehuynh:Le-Huynh Truc-Ly's R Code and Templates
Miscellaneous R functions (for graphics, data import, data transformation, and general utilities) and templates (for exploratory analysis, Bayesian modeling, and crafting scientific manuscripts).
Maintained by Truc-Ly Le-Huynh. Last updated 9 months ago.
1.8 match 3 stars 3.88 score 4 scriptsconnormayer
maxent.ot:Perform Phonological Analyses using Maximum Entropy Optimality Theory
Fit Maximum Entropy Optimality Theory models to data sets, generate the predictions made by such models for novel data, and compare the fit of different models using a variety of metrics. The package is described in Mayer, C., Tan, A., Zuraw, K. (in press) <https://sites.socsci.uci.edu/~cjmayer/papers/cmayer_et_al_maxent_ot_accepted.pdf>.
Maintained by Connor Mayer. Last updated 4 months ago.
1.2 match 8 stars 5.51 score 6 scriptsbioc
SNPRelate:Parallel Computing Toolset for Relatedness and Principal Component Analysis of SNP Data
Genome-wide association studies (GWAS) are widely used to investigate the genetic basis of diseases and traits, but they pose many computational challenges. We developed an R package SNPRelate to provide a binary format for single-nucleotide polymorphism (SNP) data in GWAS utilizing CoreArray Genomic Data Structure (GDS) data files. The GDS format offers the efficient operations specifically designed for integers with two bits, since a SNP could occupy only two bits. SNPRelate is also designed to accelerate two key computations on SNP data using parallel computing for multi-core symmetric multiprocessing computer architectures: Principal Component Analysis (PCA) and relatedness analysis using Identity-By-Descent measures. The SNP GDS format is also used by the GWASTools package with the support of S4 classes and generic functions. The extended GDS format is implemented in the SeqArray package to support the storage of single nucleotide variations (SNVs), insertion/deletion polymorphism (indel) and structural variation calls in whole-genome and whole-exome variant data.
Maintained by Xiuwen Zheng. Last updated 5 months ago.
infrastructuregeneticsstatisticalmethodprincipalcomponentbioinformaticsgds-formatpcasimdsnpopenblascpp
0.5 match 104 stars 12.69 score 1.6k scripts 18 dependentsa-dudek-ue
mdsOpt:Searching for Optimal MDS Procedure for Metric and Interval-Valued Data
Selecting the optimal multidimensional scaling (MDS) procedure for metric data via metric MDS (ratio, interval, mspline) and nonmetric MDS (ordinal). Selecting the optimal multidimensional scaling (MDS) procedure for interval-valued data via metric MDS (ratio, interval, mspline). Selecting the optimal multidimensional scaling procedure for interval-valued data by varying all combinations of normalization and optimization methods. Selecting the optimal MDS procedure for statistical data referring to the evaluation of tourist attractiveness of Lower Silesian counties. (Borg, I., Groenen, P.J.F., Mair, P. (2013) <doi:10.1007/978-3-642-31848-1>, Walesiak, M. (2016) <doi:10.15611/ekt.2016.2.01>, Walesiak, M. (2017) <doi:10.15611/ekt.2017.3.01>).
Maintained by Andrzej Dudek. Last updated 1 year ago.
2.6 match 2.28 score 19 scriptserichson
rsvd:Randomized Singular Value Decomposition
Low-rank matrix decompositions are fundamental tools and widely used for data analysis, dimension reduction, and data compression. Classically, highly accurate deterministic matrix algorithms are used for this task. However, the emergence of large-scale data has severely challenged our computational ability to analyze big data. The concept of randomness has been demonstrated as an effective strategy to quickly produce approximate answers to familiar problems such as the singular value decomposition (SVD). The rsvd package provides several randomized matrix algorithms such as the randomized singular value decomposition (rsvd), randomized principal component analysis (rpca), randomized robust principal component analysis (rrpca), randomized interpolative decomposition (rid), and the randomized CUR decomposition (rcur). In addition several plot functions are provided.
Maintained by N. Benjamin Erichson. Last updated 4 years ago.
dimension-reductionmatrix-approximationpcaprincipal-component-analysisprobabilistic-algorithmsrandomized-algorithmsingular-value-decompositionsvd
0.5 match 98 stars 10.80 score 408 scripts 119 dependentsbioc
BASiCS:Bayesian Analysis of Single-Cell Sequencing data
Single-cell mRNA sequencing can uncover novel cell-to-cell heterogeneity in gene expression levels in seemingly homogeneous populations of cells. However, these experiments are prone to high levels of technical noise, creating new challenges for identifying genes that show genuine heterogeneous expression within the population of cells under study. BASiCS (Bayesian Analysis of Single-Cell Sequencing data) is an integrated Bayesian hierarchical model to perform statistical analyses of single-cell RNA sequencing datasets in the context of supervised experiments (where the groups of cells of interest are known a priori, e.g. experimental conditions or cell types). BASiCS performs built-in data normalisation (global scaling) and technical noise quantification (based on spike-in genes). BASiCS provides an intuitive detection criterion for highly (or lowly) variable genes within a single group of cells. Additionally, BASiCS can compare gene expression patterns between two or more pre-specified groups of cells. Unlike traditional differential expression tools, BASiCS quantifies changes in expression that lie beyond comparisons of means, also allowing the study of changes in cell-to-cell heterogeneity. The latter can be quantified via a biological over-dispersion parameter that measures the excess of variability that is observed with respect to Poisson sampling noise, after normalisation and technical noise removal. Due to the strong mean/over-dispersion confounding that is typically observed for scRNA-seq datasets, BASiCS also tests for changes in residual over-dispersion, defined by residual values with respect to a global mean/over-dispersion trend.
Maintained by Catalina Vallejos. Last updated 5 months ago.
immunooncologynormalizationsequencingrnaseqsoftwaregeneexpressiontranscriptomicssinglecelldifferentialexpressionbayesiancellbiologybioconductor-packagegene-expressionrcpprcpparmadilloscrna-seqsingle-cellopenblascppopenmp
0.5 match 83 stars 10.26 score 368 scripts 1 dependentsjustinmshea
wooldridge:115 Data Sets from "Introductory Econometrics: A Modern Approach, 7e" by Jeffrey M. Wooldridge
Students learning both econometrics and R may find the introduction to both challenging. The wooldridge data package aims to lighten the task by efficiently loading any data set found in the text with a single command. Data sets have been compressed to a fraction of their original size. Documentation files contain page numbers, the original source, time of publication, and notes from the author suggesting avenues for further analysis and research. If one needs an introduction to R model syntax, a vignette contains solutions to examples from chapters of the text. Data sets are from the 7th edition (Wooldridge 2020, ISBN-13 978-1-337-55886-0), and are backwards compatible with all previous versions of the text.
Maintained by Justin M. Shea. Last updated 4 months ago.
0.5 match 203 stars 9.38 score 1.4k scriptsbioc
tenXplore:ontological exploration of scRNA-seq of 1.3 million mouse neurons from 10x genomics
Perform ontological exploration of scRNA-seq of 1.3 million mouse neurons from 10x genomics.
Maintained by VJ Carey. Last updated 5 months ago.
immunooncologydimensionreductionprincipalcomponenttranscriptomicssinglecell
1.1 match 4.18 score 7 scriptsbioc
scmap:A tool for unsupervised projection of single cell RNA-seq data
Single-cell RNA-seq (scRNA-seq) is widely used to investigate the composition of complex tissues since the technology allows researchers to define cell-types using unsupervised clustering of the transcriptome. However, due to differences in experimental methods and computational analyses, it is often challenging to directly compare the cells identified in two different experiments. scmap is a method for projecting cells from a scRNA-seq experiment on to the cell-types or individual cells identified in a different experiment.
Maintained by Vladimir Kiselev. Last updated 5 months ago.
immunooncologysinglecellsoftwareclassificationsupportvectormachinernaseqvisualizationtranscriptomicsdatarepresentationtranscriptionsequencingpreprocessinggeneexpressiondataimportbioconductor-packagehuman-cell-atlasprojection-mappingsingle-cell-rna-seqopenblascpp
0.5 match 95 stars 8.82 score 172 scriptsgogonzo
sport:Sequential Pairwise Online Rating Techniques
Calculates ratings for two-player or multi-player challenges. The methods included in the package are able to estimate ratings (players' strengths) and their evolution in time, and also to predict the outcome of a challenge. Algorithms are based on the Bayesian Approximation Method, and they don't involve any matrix inversions or likelihood estimation. Parameters are updated sequentially, and computation doesn't require any additional RAM to make estimation feasible. Additionally, the base of the package is written in C++, which makes sport computation even faster. The methods used in the package refer to Mark E. Glickman (1999) <http://www.glicko.net/research/glicko.pdf>; Mark E. Glickman (2001) <doi:10.1080/02664760120059219>; Ruby C. Weng, Chih-Jen Lin (2011) <http://jmlr.csail.mit.edu/papers/volume12/weng11a/weng11a.pdf>; W. Penny, Stephen J. Roberts (1999) <doi:10.1109/IJCNN.1999.832603>.
Maintained by Dawid Kałędkowski. Last updated 5 years ago.
0.8 match 25 stars 5.78 score 24 scriptscran
gains:Lift (Gains) Tables and Charts
Constructs gains tables and lift charts for prediction algorithms. Gains tables and lift charts are commonly used in direct marketing applications. The method is described in Drozdenko and Drake (2002), "Optimal Database Marketing", Chapter 11.
Maintained by Craig A. Rolling. Last updated 8 years ago.
3.4 match 1.26 scorebioc
SIMLR:Single-cell Interpretation via Multi-kernel LeaRning (SIMLR)
Single-cell RNA-seq technologies enable high throughput gene expression measurement of individual cells, and allow the discovery of heterogeneity within cell populations. Measurement of cell-to-cell gene expression similarity is critical for the identification, visualization and analysis of cell populations. However, single-cell data introduce challenges to conventional measures of gene expression similarity because of the high level of noise, outliers and dropouts. We develop a novel similarity-learning framework, SIMLR (Single-cell Interpretation via Multi-kernel LeaRning), which learns an appropriate distance metric from the data for dimension reduction, clustering and visualization.
Maintained by Luca De Sano. Last updated 5 months ago.
immunooncologyclusteringgeneexpressionsequencingsinglecellopenblascpp
0.5 match 111 stars 8.49 score 69 scriptsrazrahman
IntegratedMRF:Integrated Prediction using Uni-Variate and Multivariate Random Forests
An implementation of a framework for drug sensitivity prediction from various genetic characterizations using ensemble approaches. Random Forests or Multivariate Random Forest predictive models can be generated from each genetic characterization that are then combined using a Least Square Regression approach. It also provides options for the use of different error estimation approaches of Leave-one-out, Bootstrap, N-fold cross validation and 0.632+Bootstrap along with generation of prediction confidence interval using Jackknife-after-Bootstrap approach.
Maintained by Raziur Rahman. Last updated 7 years ago.
3.4 match 1.26 score 18 scriptsnumbersman77
eventTrack:Event Prediction for Time-to-Event Endpoints
Implements the hybrid framework for event prediction described in Fang & Zheng (2011, <doi:10.1016/j.cct.2011.05.013>). To estimate the survival function the event prediction is based on, a piecewise exponential hazard function is fit to the time-to-event data to infer the potential change points. Prior to the last identified change point, the survival function is estimated using Kaplan-Meier, and the tail after the change point is fit using piecewise exponential.
Maintained by Kaspar Rufibach. Last updated 24 days ago.
2.0 match 1 stars 2.00 score 2 scriptsbioc
SRAdb:A compilation of metadata from NCBI SRA and tools
The Sequence Read Archive (SRA) is the largest public repository of sequencing data from the next generation of sequencing platforms including Roche 454 GS System, Illumina Genome Analyzer, Applied Biosystems SOLiD System, Helicos Heliscope, and others. However, finding data of interest can be challenging using current tools. SRAdb is an attempt to make access to the metadata associated with submission, study, sample, experiment and run much more feasible. This is accomplished by parsing all the NCBI SRA metadata into a SQLite database that can be stored and queried locally. Full-text search in the package makes querying metadata very flexible and powerful. fastq and sra files can be downloaded for doing alignment locally. Besides the ftp protocol, SRAdb has functions supporting the fasp protocol (ascp from Aspera Connect) for faster downloading of large data files over long distances. The SQLite database is updated regularly as new data is added to SRA and can be downloaded at will for the most up-to-date metadata.
Maintained by Jack Zhu. Last updated 3 months ago.
infrastructuresequencingdataimport
0.5 match 2 stars 7.81 score 200 scriptsreichlab
zoltr:Interface to the 'Zoltar' Forecast Repository API
'Zoltar' <https://www.zoltardata.com/> is a website that provides a repository of model forecast results in a standardized format and a central location. It supports storing, retrieving, comparing, and analyzing time series forecasts for prediction challenges of interest to the modeling community. This package provides functions for working with the 'Zoltar' API, including connecting and authenticating, getting meta information (projects, models, and forecasts, and truth), and uploading, downloading, and deleting forecast and truth data.
Maintained by Matthew Cornell. Last updated 12 days ago.
0.5 match 2 stars 7.58 score 175 scripts 3 dependentschristopherkenny
royale:Clash Royale API
R interface to the official API for Clash Royale <https://developer.clashroyale.com/#/>.
Maintained by Christopher T. Kenny. Last updated 1 year ago.
2.3 match 1.70 score 4 scriptsyuelyu21
SCIntRuler:Guiding the Integration of Multiple Single-Cell RNA-Seq Datasets
The accumulation of single-cell RNA-seq (scRNA-seq) studies highlights the potential benefits of integrating multiple datasets. By augmenting sample sizes and enhancing analytical robustness, integration can lead to more insightful biological conclusions. However, challenges arise due to the inherent diversity and batch discrepancies within and across studies. SCIntRuler, a novel R package, addresses these challenges by guiding the integration of multiple scRNA-seq datasets.
Maintained by Yue Lyu. Last updated 5 months ago.
sequencinggeneticvariabilitysinglecellcpp
0.8 match 2 stars 4.85 score 3 scriptsextremestats
DATAstudio:The Research Data Warehouse of Miguel de Carvalho
Pulls together a collection of datasets from Miguel de Carvalho's research articles, including, for example: - de Carvalho (2012) <doi:10.1016/j.jspi.2011.08.016>; - de Carvalho et al (2012) <doi:10.1080/03610926.2012.709905>; - de Carvalho et al (2012) <doi:10.1016/j.econlet.2011.09.007>; - de Carvalho and Davison (2014) <doi:10.1080/01621459.2013.872651>; - de Carvalho and Rua (2017) <doi:10.1016/j.ijforecast.2015.09.004>; - de Carvalho et al (2023) <doi:10.1002/sta4.560>; - de Carvalho et al (2022) <doi:10.1007/s13253-021-00469-9>; - Palacios et al (2024) <doi:10.1214/24-BA1420>.
Maintained by Miguel de Carvalho. Last updated 1 day ago.
3.8 match 1.00 score 2 scriptscran
SeqDetect:Sequence and Latent Process Detector
Sequence detector in this package contains a specific automaton model that can be used to learn and detect data and process sequences. Automaton model in this package is capable of learning and tracing sequences. Automaton model can be found in Krleža, Vrdoljak, Brčić (2019) <doi:10.1109/ACCESS.2019.2955245>. This research has been partly supported under Competitiveness and Cohesion Operational Programme from the European Regional and Development Fund, as part of the Integrated Anti-Fraud System project no. KK.01.2.1.01.0041. This research has also been partly supported by the European Regional Development Fund under the grant KK.01.1.1.01.0009.
Maintained by Dalibor Krleža. Last updated 5 years ago.
1.9 match 2.00 score 2 scriptsbioc
AMARETTO:Regulatory Network Inference and Driver Gene Evaluation using Integrative Multi-Omics Analysis and Penalized Regression
Integrating an increasing number of available multi-omics cancer data remains one of the main challenges to improve our understanding of cancer. One of the main challenges is using multi-omics data for identifying novel cancer driver genes. We have developed an algorithm, called AMARETTO, that integrates copy number, DNA methylation and gene expression data to identify a set of driver genes by analyzing cancer samples and connects them to clusters of co-expressed genes, which we define as modules. We applied AMARETTO in a pancancer setting to identify cancer driver genes and their modules on multiple cancer sites. AMARETTO captures modules enriched in angiogenesis, cell cycle and EMT, and modules that accurately predict survival and molecular subtypes. This allows AMARETTO to identify novel cancer driver genes directing canonical cancer pathways.
Maintained by Olivier Gevaert. Last updated 5 months ago.
statisticalmethoddifferentialmethylationgeneregulationgeneexpressionmethylationarraytranscriptionpreprocessingbatcheffectdataimportmrnamicroarraymicrornaarrayregressionclusteringrnaseqcopynumbervariationsequencingmicroarraynormalizationnetworkbayesianexonarrayonechanneltwochannelproprietaryplatformsalternativesplicingdifferentialexpressiondifferentialsplicinggenesetenrichmentmultiplecomparisonqualitycontroltimecourse
0.8 match 4.88 score 15 scriptseltebioinformatics
mulea:Enrichment Analysis Using Multiple Ontologies and False Discovery Rate
Background - Traditional gene set enrichment analyses are typically limited to a few ontologies and do not account for the interdependence of gene sets or terms, resulting in overcorrected p-values. To address these challenges, we introduce mulea, an R package offering comprehensive overrepresentation and functional enrichment analysis. Results - mulea employs a progressive empirical false discovery rate (eFDR) method, specifically designed for interconnected biological data, to accurately identify significant terms within diverse ontologies. mulea expands beyond traditional tools by incorporating a wide range of ontologies, encompassing Gene Ontology, pathways, regulatory elements, genomic locations, and protein domains. This flexibility enables researchers to tailor enrichment analysis to their specific questions, such as identifying enriched transcriptional regulators in gene expression data or overrepresented protein domains in protein sets. To facilitate seamless analysis, mulea provides gene sets (in standardised GMT format) for 27 model organisms, covering 22 ontology types from 16 databases and various identifiers resulting in almost 900 files. Additionally, the muleaData ExperimentData Bioconductor package simplifies access to these pre-defined ontologies. Finally, mulea's architecture allows for easy integration of user-defined ontologies, or GMT files from external sources (e.g., MSigDB or Enrichr), expanding its applicability across diverse research areas. Conclusions - mulea is distributed as a CRAN R package. It offers researchers a powerful and flexible toolkit for functional enrichment analysis, addressing limitations of traditional tools with its progressive eFDR and by supporting a variety of ontologies. Overall, mulea fosters the exploration of diverse biological questions across various model organisms.
Maintained by Tamas Stirling. Last updated 3 months ago.
annotationdifferentialexpressiongeneexpressiongenesetenrichmentgographandnetworkmultiplecomparisonpathwaysreactomesoftwaretranscriptionvisualizationenrichmentenrichment-analysisfunctional-enrichment-analysisgene-set-enrichmentontologiestranscriptomicscpp
0.5 match 28 stars 7.36 score 34 scriptsfawda123
SWMPr:Retrieving, Organizing, and Analyzing Estuary Monitoring Data
Tools for retrieving, organizing, and analyzing environmental data from the System Wide Monitoring Program of the National Estuarine Research Reserve System <https://cdmo.baruch.sc.edu/>. These tools address common challenges associated with continuous time series data for environmental decision making.
Maintained by Marcus W. Beck. Last updated 1 months ago.
0.5 match 13 stars 7.05 score 143 scripts 1 dependentsstatdivlab
rigr:Regression, Inference, and General Data Analysis Tools in R
A set of tools to streamline data analysis. Learning both R and introductory statistics at the same time can be challenging, and so we created 'rigr' to facilitate common data analysis tasks and enable learners to focus on statistical concepts. We provide easy-to-use interfaces for descriptive statistics, one- and two-sample inference, and regression analyses. 'rigr' output includes key information while omitting unnecessary details that can be confusing to beginners. Heteroscedasticity-robust ("sandwich") standard errors are returned by default, and multiple partial F-tests and tests for contrasts are easy to specify. A single regression function can fit both linear and generalized linear models, allowing students to more easily make connections between different classes of models.
Maintained by Amy D Willis. Last updated 9 months ago.
0.5 match 10 stars 7.09 score 39 scriptscmerow
rangeModelMetadata:Provides Templates for Metadata Files Associated with Species Range Models
Range Modeling Metadata Standards (RMMS) address three challenges: they (i) are designed for convenience to encourage use, (ii) accommodate a wide variety of applications, and (iii) are extensible to allow the community of range modelers to steer it as needed. RMMS are based on a data dictionary that specifies a hierarchical structure to catalog different aspects of the range modeling process. The dictionary balances a constrained, minimalist vocabulary to improve standardization with flexibility for users to provide their own values. Merow et al. (2019) <DOI:10.1111/geb.12993> describe the standards in more detail. Note that users who prefer to use the R package 'ecospat' can obtain it from <https://github.com/ecospat/ecospat>.
Maintained by Cory Merow. Last updated 8 months ago.
ecological-metadata-languageecological-modellingecological-modelsecologyspecies-distribution-modellingspecies-distributions
0.5 match 6 stars 6.96 score 16 scripts 3 dependentsropensci
waywiser:Ergonomic Methods for Assessing Spatial Models
Assessing predictive models of spatial data can be challenging, both because these models are typically built for extrapolating outside the original region represented by training data and due to potential spatially structured errors, with "hot spots" of higher than expected error clustered geographically due to spatial structure in the underlying data. Methods are provided for assessing models fit to spatial data, including approaches for measuring the spatial structure of model errors, assessing model predictions at multiple spatial scales, and evaluating where predictions can be made safely. Methods are particularly useful for models fit using the 'tidymodels' framework. Methods include Moran's I ('Moran' (1950) <doi:10.2307/2332142>), Geary's C ('Geary' (1954) <doi:10.2307/2986645>), Getis-Ord's G ('Ord' and 'Getis' (1995) <doi:10.1111/j.1538-4632.1995.tb00912.x>), agreement coefficients from 'Ji' and Gallo (2006) (<doi: 10.14358/PERS.72.7.823>), agreement metrics from 'Willmott' (1981) (<doi: 10.1080/02723646.1981.10642213>) and 'Willmott' 'et' 'al'. (2012) (<doi: 10.1002/joc.2419>), an implementation of the area of applicability methodology from 'Meyer' and 'Pebesma' (2021) (<doi:10.1111/2041-210X.13650>), and an implementation of multi-scale assessment as described in 'Riemann' 'et' 'al'. (2010) (<doi:10.1016/j.rse.2010.05.010>).
Maintained by Michael Mahoney. Last updated 23 hours ago.
spatialspatial-analysistidymodelstidyverse
0.5 match 37 stars 6.93 score 19 scriptsbioc
rawDiag:Brings Orbitrap Mass Spectrometry Data to Life; Fast and Colorful
Optimizing methods for liquid chromatography coupled to mass spectrometry (LC-MS) poses a nontrivial challenge. The rawDiag package facilitates rational method optimization by generating MS operator-tailored diagnostic plots of scan-level metadata. The package is designed for use on the R shell or as a Shiny application on the Orbitrap instrument PC.
Maintained by Christian Panse. Last updated 4 months ago.
massspectrometryproteomicsmetabolomicsinfrastructuresoftwareshinyappsfastmass-spectrometrymultiplatformorbitrapvisualization
0.5 match 36 stars 6.71 score 18 scriptsbioc
escheR:Unified multi-dimensional visualizations with Gestalt principles
The creation of effective visualizations is a fundamental component of data analysis. In biomedical research, new challenges are emerging to visualize multi-dimensional data in a 2D space, but current data visualization tools have limited capabilities. To address this problem, we leverage Gestalt principles to improve the design and interpretability of multi-dimensional data in 2D data visualizations, layering aesthetics to display multiple variables. The proposed visualization can be applied to spatially-resolved transcriptomics data, but also broadly to data visualized in 2D space, such as embedding visualizations. We provide this open source R package escheR, which is built off of the state-of-the-art ggplot2 visualization framework and can be seamlessly integrated into genomics toolboxes and workflows.
Maintained by Boyi Guo. Last updated 5 months ago.
spatialsinglecelltranscriptomicsvisualizationsoftwaremultidimensionalsingle-cellspatial-omics
0.5 match 6 stars 6.74 score 153 scripts 1 dependentsbioc
Modstrings:Working with modified nucleotide sequences
Representing nucleotide modifications in a nucleotide sequence is usually done via special characters from a number of sources. This makes such sequences challenging to work with in R and the Biostrings package. The Modstrings package implements this functionality for RNA and DNA sequences containing modified nucleotides by translating the characters internally in order to work with the infrastructure of the Biostrings package. For this, the ModRNAString and ModDNAString classes and their derivatives, along with functions to construct and modify these objects despite the encoding issues, are implemented. In addition, the conversion from sequences to list-like location information (and the reverse operation) is implemented as well.
Maintained by Felix G.M. Ernst. Last updated 5 months ago.
dataimportdatarepresentationinfrastructuresequencingsoftwarebioconductorbiostringsdnadna-modificationsmodified-nucleotidesnucleotidesrnarna-modification-alphabetrna-modificationssequences
0.5 match 1 stars 6.64 score 5 scripts 8 dependentscefet-rj-dal
daltoolbox:Leveraging Experiment Lines to Data Analytics
The natural increase in the complexity of current research experiments and data demands better tools to enhance productivity in Data Analytics. The package is a framework designed to address the modern challenges in data analytics workflows. The package is inspired by Experiment Line concepts. It aims to provide seamless support for users in developing their data mining workflows by offering a uniform data model and method API. It enables the integration of various data mining activities, including data preprocessing, classification, regression, clustering, and time series prediction. It also offers options for hyper-parameter tuning and supports integration with existing libraries and languages. Overall, the package provides researchers with a comprehensive set of functionalities for data science, promoting ease of use, extensibility, and integration with various tools and libraries. Information on Experiment Line is based on Ogasawara et al. (2009) <doi:10.1007/978-3-642-02279-1_20>.
Maintained by Eduardo Ogasawara. Last updated 1 months ago.
0.5 match 1 stars 6.65 score 536 scripts 4 dependentssrkobakian
sugarbag:Create Tessellated Hexagon Maps
Create a hexagon tile map display from spatial polygons. Each polygon is represented by a hexagon tile, placed as close to its original centroid as possible, with a focus on maintaining spatial relationship to a focal point. Developed to aid visualisation and analysis of spatial distributions across Australia, which can be challenging due to the concentration of the population on the coast and wide open interior.
Maintained by Dianne Cook. Last updated 2 years ago.
0.5 match 42 stars 6.52 score 53 scriptsepinowcast
epidist:Estimate Epidemiological Delay Distributions With brms
Understanding and accurately estimating epidemiological delay distributions is important for public health policy. These estimates influence epidemic situational awareness, control strategies, and resource allocation. This package provides methods to address the key challenges in estimating these distributions, including truncation, interval censoring, and dynamical biases. These issues are frequently overlooked, resulting in biased conclusions. Built on top of 'brms', it allows for flexible modelling including time-varying spatial components and partially pooled estimates of demographic characteristics.
Maintained by Sam Abbott. Last updated 11 days ago.
0.5 match 14 stars 6.52 score 7 scriptsvkrakovna
sbfc:Selective Bayesian Forest Classifier
An MCMC algorithm for simultaneous feature selection and classification, and visualization of the selected features and feature interactions. An implementation of SBFC by Krakovna, Du and Liu (2015), <arXiv:1506.02371>.
Maintained by Viktoriya Krakovna. Last updated 3 years ago.
3.3 match 1.00 score 4 scriptsradicalcommecol
cxr:A Toolbox for Modelling Species Coexistence in R
Recent developments in modern coexistence theory have advanced our understanding of how species are able to persist and co-occur with other species at varying abundances. However, applying this mathematical framework to empirical data is still challenging, precluding a larger adoption of the theoretical tools developed by empiricists. This package provides a complete toolbox for modelling interaction effects between species and calculating fitness and niche differences. The functions are flexible, may accept covariates, and different fitting algorithms can be used. A full description of the underlying methods is available in García-Callejas, D., Godoy, O., and Bartomeus, I. (2020) <doi:10.1111/2041-210X.13443>. Furthermore, the package provides a series of functions to calculate dynamics for stage-structured populations across sites.
Maintained by David Garcia-Callejas. Last updated 1 months ago.
0.5 match 10 stars 6.51 score 27 scriptsbioc
SpliceWiz:interactive analysis and visualization of alternative splicing in R
The analysis and visualization of alternative splicing (AS) events from RNA sequencing data remains challenging. SpliceWiz is a user-friendly and performance-optimized R package for AS analysis that processes alignment BAM files to quantify read counts across splice junctions, performs IRFinder-based intron retention quantitation, and supports novel splicing event identification. We introduce a novel visualization for AS using normalized coverage, thereby allowing visualization of differential AS across conditions. SpliceWiz features a shiny-based GUI facilitating interactive data exploration of results including gene ontology enrichment. It is performance optimized with multi-threaded processing of BAM files and a new COV file format for fast recall of sequencing coverage. Overall, SpliceWiz streamlines AS analysis, enabling reliable identification of functionally relevant AS events for further characterization.
Maintained by Alex Chit Hei Wong. Last updated 6 days ago.
softwaretranscriptomicsrnaseqalternativesplicingcoveragedifferentialsplicingdifferentialexpressionguisequencingcppopenmp
0.5 match 16 stars 6.41 score 8 scriptsropensci
epubr:Read EPUB File Metadata and Text
The 'epubr' package provides functions supporting the reading and parsing of internal e-book content from EPUB files. E-book metadata and text content are parsed separately and joined together in a tidy, nested tibble data frame. E-book formatting is not completely standardized across all literature. It can be challenging to curate parsed e-book content across an arbitrary collection of e-books perfectly and in completely general form, to yield a singular, consistently formatted output. Many EPUB files do not even contain all the same pieces of information in their respective metadata. EPUB file parsing functionality in this package is intended for relatively general application to arbitrary EPUB e-books. However, poorly formatted e-books or e-books with highly uncommon formatting may not work with this package. There may even be cases where an EPUB file has DRM or some other property that makes it impossible to read with 'epubr'. Text is read 'as is' for the most part. The only nominal changes are minor substitutions, for example curly quotes changed to straight quotes. Substantive changes are expected to be performed subsequently by the user as part of their text analysis. Additional text cleaning can be performed at the user's discretion, such as with functions from packages like 'tm' or 'qdap'.
Maintained by Matthew Leonawicz. Last updated 6 months ago.
epubepub-filesepub-formatpeer-reviewed
0.5 match 24 stars 6.37 score 49 scriptssentometricsresearch
sentometrics:An Integrated Framework for Textual Sentiment Time Series Aggregation and Prediction
Optimized prediction based on textual sentiment, accounting for the intrinsic challenge that sentiment can be computed and pooled across texts and time in various ways. See Ardia et al. (2021) <doi:10.18637/jss.v099.i02>.
Maintained by Samuel Borms. Last updated 4 years ago.
nlppredictionsentiment-analysistext-miningtime-seriesopenblascppopenmp
0.5 match 83 stars 6.09 score 49 scriptscran
dsample:Discretization-Based Direct Random Sample Generation
A discretization-based random sampling algorithm that is useful for complex models in high dimensions is implemented. The normalizing constant of a target distribution is not needed. Posterior summaries are compared with those by 'OpenBUGS'. The method is described in Wang and Lee (2014) <doi:10.1016/j.csda.2013.06.011> and exercised in Lee (2009) <http://hdl.handle.net/1993/21352>.
Maintained by Chel Hee Lee. Last updated 2 years ago.
1.2 match 2.70 scorebioc
knowYourCG:Functional analysis of DNA methylome datasets
KnowYourCG (KYCG) is a supervised learning framework designed for the functional analysis of DNA methylation data. Unlike existing tools that focus on genes or genomic intervals, KnowYourCG directly targets CpG dinucleotides, featuring automated supervised screenings of diverse biological and technical influences, including sequence motifs, transcription factor binding, histone modifications, replication timing, cell-type-specific methylation, and trait-epigenome associations. KnowYourCG addresses the challenges of data sparsity in various methylation datasets, including low-pass Nanopore sequencing, single-cell DNA methylomes, 5-hydroxymethylation profiles, spatial DNA methylation maps, and array-based datasets for epigenome-wide association studies and epigenetic clocks.
Maintained by Goldberg David. Last updated 2 months ago.
epigeneticsdnamethylationsequencingsinglecellspatialmethylationarrayzlib
0.5 match 2 stars 6.10 score 4 scriptsmodeloriented
arenar:Arena for the Exploration and Comparison of any ML Models
Generates data for challenging machine learning models in 'Arena' <https://arena.drwhy.ai> - an interactive web application. You can start the server with XAI (Explainable Artificial Intelligence) plots to be generated on-demand, or precalculate and auto-upload a data file alongside a shareable 'Arena' URL.
Maintained by Piotr Piątyszek. Last updated 4 years ago.
axplainable-artificial-intelligenceemaexplainabilityexplanatory-model-analysisimlinteractive-xaiinterpretabilityxai
0.5 match 31 stars 5.94 score 14 scriptsbioc
Dino:Normalization of Single-Cell mRNA Sequencing Data
Dino normalizes single-cell, mRNA sequencing data to correct for technical variation, particularly sequencing depth, prior to downstream analysis. The approach produces a matrix of corrected expression for which the dependency between sequencing depth and the full distribution of normalized expression is removed; many existing methods aim to remove only the dependency between sequencing depth and the mean of the normalized expression. This is particularly useful in the context of highly sparse datasets such as those produced by 10X Genomics and other unique molecular identifier (UMI) based microfluidics protocols, for which the depth-dependent proportion of zeros in the raw expression data can otherwise present a challenge.
Maintained by Jared Brown. Last updated 5 months ago.
softwarenormalizationrnaseqsinglecellsequencinggeneexpressiontranscriptomicsregressioncellbasedassays
0.5 match 9 stars 6.02 score 13 scriptsbioc
dar:Differential Abundance Analysis by Consensus
Differential abundance testing in microbiome data challenges both parametric and non-parametric statistical methods, due to its sparsity, high variability and compositional nature. Microbiome-specific statistical methods often assume classical distribution models or take into account compositional specifics. These produce results that range within the specificity vs sensitivity space in such a way that type I and type II errors are difficult to ascertain in real microbiome data when a single method is used. Recently, a consensus approach based on multiple differential abundance (DA) methods was suggested in order to increase robustness. With dar, you can use dplyr-like pipeable sequences of DA methods and then apply different consensus strategies. In this way we can obtain more reliable results in a fast, consistent and reproducible way.
Maintained by Francesc Catala-Moll. Last updated 3 days ago.
softwaresequencingmicrobiomemetagenomicsmultiplecomparisonnormalizationbioconductorbiomarker-discoverydifferential-abundance-analysisfeature-selectionmicrobiologyphyloseq
0.5 match 2 stars 5.98 score 8 scriptsbioc
transcriptR:An Integrative Tool for ChIP- And RNA-Seq Based Primary Transcripts Detection and Quantification
The differences in the RNA types being sequenced have an impact on the resulting sequencing profiles. mRNA-seq data is enriched with reads derived from exons, while GRO-, nucRNA- and chrRNA-seq demonstrate a substantially broader coverage of both exonic and intronic regions. The presence of intronic reads in GRO-seq type of data makes it possible to use it to computationally identify and quantify all de novo continuous regions of transcription distributed across the genome. This type of data, however, is more challenging to interpret and less commonly used than mRNA-seq. One of the challenges for primary transcript detection concerns the simultaneous transcription of closely spaced genes, which needs to be properly divided into individually transcribed units. The R package transcriptR combines RNA-seq data with ChIP-seq data of histone modifications that mark active Transcription Start Sites (TSSs), such as H3K4me3 or H3K9/14Ac, to overcome this challenge. The advantage of this approach over the use of, for example, gene annotations is that this approach is data driven and therefore able to deal also with novel and case specific events. Furthermore, the integration of ChIP- and RNA-seq data allows the identification of all known and novel active transcription start sites within a given sample.
Maintained by Armen R. Karapetyan. Last updated 5 months ago.
immunooncologytranscriptionsoftwaresequencingrnaseqcoverage
0.9 match 3.30 score 2 scriptsimbs-hl
fuseMLR:Fusing Machine Learning in R
Recent technological advances have enabled the simultaneous collection of multi-omics data, i.e., different types or modalities of molecular data, presenting challenges for integrative prediction modeling due to the heterogeneous, high-dimensional nature and possible missing modalities of some individuals. We introduce this package for late integrative prediction modeling, enabling modality-specific variable selection and prediction modeling, followed by the aggregation of the modality-specific predictions to train a final meta-model. This package facilitates conducting late integration predictive modeling in a systematic, structured, and reproducible way.
Maintained by Cesaire J. K. Fouodo. Last updated 7 days ago.
0.5 match 6 stars 5.80 score 3 scriptsicra
ediblecity:Modeling Urban Agriculture at City Scale
The purpose of this package is to estimate the potential of urban agriculture to contribute to addressing several urban challenges at the city-scale. Within this aim, we selected 8 indicators directly related to one or several urban challenges. Also, a function is provided to compute new scenarios of urban agriculture. Methods are described by Pueyo-Ros, Comas & Corominas (2023) <doi:10.12688/openreseurope.16054.1>.
Maintained by Josep Pueyo-Ros. Last updated 1 year ago.
0.8 match 3.70 score 10 scriptsbioc
DMCHMM:Differentially Methylated CpG using Hidden Markov Model
A pipeline for identifying differentially methylated CpG sites using Hidden Markov Model in bisulfite sequencing data. DNA methylation studies have enabled researchers to understand methylation patterns and their regulatory roles in biological processes and disease. However, only a limited number of statistical approaches have been developed to provide formal quantitative analysis. Specifically, a few available methods do identify differentially methylated CpG (DMC) sites or regions (DMR), but they suffer from limitations that arise mostly due to challenges inherent in bisulfite sequencing data. These challenges include: (1) that read-depths vary considerably among genomic positions and are often low; (2) both methylation and autocorrelation patterns change as regions change; and (3) CpG sites are distributed unevenly. Furthermore, there are several methodological limitations: almost none of these tools is capable of comparing multiple groups and/or working with missing values, and only a few allow continuous or multiple covariates. The last of these is of great interest among researchers, as the goal is often to find which regions of the genome are associated with several exposures and traits. To tackle these issues, we have developed an efficient DMC identification method based on Hidden Markov Models (HMMs) called “DMCHMM” which is a three-step approach (model selection, prediction, testing) aiming to address the aforementioned drawbacks.
Maintained by Farhad Shokoohi. Last updated 5 months ago.
differentialmethylationsequencinghiddenmarkovmodelcoverage
0.8 match 3.78 score 3 scriptscadam00
prior3D:3D Prioritization Algorithm
Three-dimensional systematic conservation planning, conducting nested prioritization analyses across multiple depth levels and ensuring efficient resource allocation throughout the water column. It provides a structured workflow designed to address biodiversity conservation and management challenges in the 3 dimensions, while facilitating users’ choices and parameterization (Doxa et al. 2025 <doi:10.1016/j.ecolmodel.2024.110919>).
Maintained by Christos Adam. Last updated 2 months ago.
biodiversityconservationconservation-planningdepthmarine-spatial-planningmultidimensional-environmentsprioritization
0.5 match 6 stars 5.62 score 3 scriptsjohnbaums
hues:Distinct Colour Palettes Based on 'iwanthue'
Creating effective colour palettes for figures is challenging. This package generates and plots palettes of optimally distinct colours in perceptually uniform colour space, based on 'iwanthue' <http://tools.medialab.sciences-po.fr/iwanthue/>. This is done through k-means clustering of CIE Lab colour space, according to user-selected constraints on hue, chroma, and lightness.
Maintained by John Baumgartner. Last updated 5 years ago.
0.5 match 34 stars 5.46 score 170 scriptsjuanv66x
qvirus:Quantum Computing for Analyzing CD4 Lymphocytes and Antiretroviral Therapy
Resources, tutorials, and code snippets dedicated to exploring the intersection of quantum computing and artificial intelligence (AI) in the context of analyzing Cluster of Differentiation 4 (CD4) lymphocytes and optimizing antiretroviral therapy (ART) for human immunodeficiency virus (HIV). With the emergence of quantum artificial intelligence and the development of small-scale quantum computers, there's an unprecedented opportunity to revolutionize the understanding of HIV dynamics and treatment strategies. This project leverages the R package 'qsimulatR' (Ostmeyer and Urbach, 2023, <https://CRAN.R-project.org/package=qsimulatR>), a quantum computer simulator, to explore these applications in quantum computing techniques, addressing the challenges in studying CD4 lymphocytes and enhancing ART efficacy.
Maintained by Juan Pablo Acuña González. Last updated 13 days ago.
0.5 match 5.43 score 15 scriptsjapilo
colorednoise:Simulate Temporally Autocorrelated Populations
Temporally autocorrelated populations are correlated in their vital rates (growth, death, etc.) from year to year. It is very common for populations, whether they be bacteria, plants, or humans, to be temporally autocorrelated. This poses a challenge for stochastic population modeling, because a temporally correlated population will behave differently from an uncorrelated one. This package provides tools for simulating populations with white noise (no temporal autocorrelation), red noise (positive temporal autocorrelation), and blue noise (negative temporal autocorrelation). The algebraic formulation for autocorrelated noise comes from Ruokolainen et al. (2009) <doi:10.1016/j.tree.2009.04.009>. Models for unstructured populations and for structured populations (matrix models) are available.
Maintained by July Pilowsky. Last updated 11 months ago.
0.5 match 10 stars 5.43 score 18 scriptsigordot
clustermole:Unbiased Single-Cell Transcriptomic Data Cell Type Identification
Assignment of cell type labels to single-cell RNA sequencing (scRNA-seq) clusters is often a time-consuming process that involves manual inspection of the cluster marker genes complemented with a detailed literature search. This is especially challenging when unexpected or poorly described populations are present. The clustermole R package provides methods to query thousands of human and mouse cell identity markers sourced from a variety of databases.
Maintained by Igor Dolgalev. Last updated 1 year ago.
cell-typecell-type-annotationcell-type-classificationcell-type-identificationcell-type-matchinggene-expression-signaturesscrna-seqsingle-cell
0.5 match 13 stars 5.37 score 36 scriptstimothy-barry
ondisc:Algorithms and data structures for large single-cell expression matrices
Single-cell datasets are growing in size, posing challenges as well as opportunities for genomics researchers. `ondisc` is an R package that facilitates analysis of large-scale single-cell data out-of-core on a laptop or distributed across tens to hundreds of processors on a cluster or cloud. In both of these settings, `ondisc` requires only a few gigabytes of memory, even if the input data are tens of gigabytes in size. `ondisc` is mainly oriented toward single-cell CRISPR screen analysis, but `ondisc` can also be used for single-cell differential expression and single-cell co-expression analyses. `ondisc` is powered by several new, efficient algorithms for manipulating and querying large, sparse expression matrices.
Maintained by Timothy Barry. Last updated 11 months ago.
dataimportsinglecelldifferentialexpressioncrisprzlibcpp
0.5 match 11 stars 5.13 score 62 scriptsropensci
qcoder:Lightweight Qualitative Coding
A free, lightweight, open source option for analyzing text-based qualitative data. Enables analysis of interview transcripts, observation notes, memos, and other sources. Supports the work of social scientists, historians, humanists, and other researchers who use qualitative methods. Addresses the unique challenges faced in analyzing qualitative data. Provides opportunities for researchers who otherwise might not develop software to build software development skills.
Maintained by Elin Waring. Last updated 3 years ago.
0.5 match 134 stars 5.05 score 13 scriptsbioc
spiky:Spike-in calibration for cell-free MeDIP
spiky implements methods and model generation for cfMeDIP (cell-free methylated DNA immunoprecipitation) with spike-in controls. CfMeDIP is an enrichment protocol which avoids destructive conversion of scarce template, making it ideal as a "liquid biopsy," but creating certain challenges in comparing results across specimens, subjects, and experiments. The use of synthetic spike-in standard oligos allows diagnostics performed with cfMeDIP to quantitatively compare samples across subjects, experiments, and time points in both relative and absolute terms.
Maintained by Tim Triche. Last updated 5 months ago.
differentialmethylationdnamethylationnormalizationpreprocessingqualitycontrolsequencing
0.5 match 2 stars 4.90 score 3 scriptsbioc
immunotation:Tools for working with diverse immune genes
MHC (major histocompatibility complex) molecules are cell surface complexes that present antigens to T cells. The repertoire of antigens presented in a given genetic background largely depends on the sequence of the encoded MHC molecules, and thus, in humans, on the highly variable HLA (human leukocyte antigen) genes of the hyperpolymorphic HLA locus. More than 28,000 different HLA alleles have been reported, with significant differences in allele frequencies between human populations worldwide. Reproducible and consistent annotation of HLA alleles in large-scale bioinformatics workflows remains challenging, because the available reference databases and software tools often use different HLA naming schemes. The package immunotation provides tools for consistent annotation of HLA genes in typical immunoinformatics workflows, such as the prediction of MHC-presented peptides in different human donors. Converter functions that provide mappings between different HLA naming schemes are based on the MHC restriction ontology (MRO). The package also provides automated access to HLA allele frequencies in worldwide human reference populations stored in the Allele Frequency Net Database.
Maintained by Katharina Imkeller. Last updated 5 months ago.
softwareimmunooncologybiomedicalinformaticsgeneticsannotation
0.5 match 8 stars 4.90 score 3 scriptsyannabraham
hilbertSimilarity:Hilbert Similarity Index for High Dimensional Data
Quantifying similarity between high-dimensional single cell samples is challenging, and usually requires some simplifying hypothesis to be made. By transforming the high dimensional space into a high dimensional grid, the number of cells in each sub-space of the grid is characteristic of a given sample. Using a Hilbert curve each sample can be visualized as a simple density plot, and the distance between samples can be calculated from the distribution of cells using the Jensen-Shannon distance. Bins that correspond to significant differences between samples can be identified using a simple bootstrap procedure.
Maintained by Yann Abraham. Last updated 5 years ago.
0.5 match 5 stars 4.74 score 11 scriptsjosesamos
clc:CORINE Land Cover Data and Styles
Streamline the management, analysis, and visualization of CORINE Land Cover data. Addresses challenges associated with its classification system and related styles, such as color mappings and descriptive labels.
Maintained by Jose Samos. Last updated 3 months ago.
0.5 match 4.52 score 11 scripts 1 dependentsegeulgen
PANACEA:Personalized Network-Based Anti-Cancer Therapy Evaluation
Identification of the most appropriate pharmacotherapy for each patient based on genomic alterations is a major challenge in personalized oncology. 'PANACEA' is a collection of personalized anti-cancer drug prioritization approaches utilizing network methods. The methods utilize personalized "driverness" scores from 'driveR' to rank drugs, mapping these onto a protein-protein interaction network. The "distance-based" method scores each drug based on these scores and distances between drugs and genes to rank given drugs. The "RWR" method propagates these scores via a random-walk with restart framework to rank the drugs. The methods are described in detail in Ulgen E, Ozisik O, Sezerman OU. 2023. PANACEA: network-based methods for pharmacotherapy prioritization in personalized oncology. Bioinformatics <doi:10.1093/bioinformatics/btad022>.
Maintained by Ege Ulgen. Last updated 2 years ago.
drugnetwork-analysisoncologypersonalized-medicine
0.5 match 10 stars 4.70 score 3 scriptsbioc
zitools:Analysis of zero-inflated count data
zitools allows for zero-inflated count data analysis by either using down-weighting of excess zeros or by replacing an appropriate proportion of excess zeros with NA. Through overloading frequently used statistical functions (such as mean, median, standard deviation), plotting functions (such as boxplots or heatmap) or differential abundance tests, it allows a wide range of downstream analyses for zero-inflated data in a less biased manner. This becomes applicable in the context of microbiome analyses, where the data is often overdispersed and zero-inflated, therefore making data analysis extremely challenging.
Maintained by Carlotta Meyring. Last updated 5 months ago.
softwarestatisticalmethodmicrobiome
0.5 match 4.60 score 6 scriptsbioc
NoRCE:NoRCE: Noncoding RNA Sets Cis Annotation and Enrichment
While some non-coding RNAs (ncRNAs) are assigned critical regulatory roles, most remain functionally uncharacterized. This presents a challenge whenever an interesting set of ncRNAs needs to be analyzed in a functional context. Transcripts located close-by on the genome are often regulated together. This genomic proximity on the sequence can hint at a functional association. We present a tool, NoRCE, that performs cis enrichment analysis for a given set of ncRNAs. Enrichment is carried out using the functional annotations of the coding genes located proximal to the input ncRNAs. Other biologically relevant information such as topologically associating domain (TAD) boundaries, co-expression patterns, and miRNA target prediction information can be incorporated to conduct a richer enrichment analysis. To this end, NoRCE includes several relevant datasets as part of its data repository, including cell-line specific TAD boundaries, functional gene sets, and expression data for coding & ncRNAs specific to cancer. Additionally, the users can utilize custom data files in their investigation. Enrichment results can be retrieved in a tabular format or visualized in several different ways. NoRCE is currently available for the following species: human, mouse, rat, zebrafish, fruit fly, worm, and yeast.
Maintained by Gulden Olgun. Last updated 5 months ago.
biologicalquestiondifferentialexpressiongenomeannotationgenesetenrichmentgenetargetgenomeassemblygo
0.5 match 1 stars 4.60 score 6 scriptskoenniem
mpathsenser:Process and Analyse Data from m-Path Sense
Overcomes one of the major challenges in mobile (passive) sensing, namely being able to pre-process the raw data that comes from a mobile sensing app, specifically 'm-Path Sense' <https://m-path.io>. The main task of 'mpathsenser' is therefore to read 'm-Path Sense' JSON files into a database and provide several convenience functions to aid in data processing.
Maintained by Koen Niemeijer. Last updated 22 days ago.
0.5 match 1 stars 4.48 score 6 scriptsbioc
HiCBricks:Framework for Storing and Accessing Hi-C Data Through HDF Files
HiCBricks is a library designed for handling large high-resolution Hi-C datasets. Over the years, the Hi-C field has experienced a rapid increase in the size and complexity of datasets. HiCBricks is meant to overcome the challenges related to the analysis of such large datasets within the R environment. HiCBricks offers user-friendly and efficient solutions for handling large high-resolution Hi-C datasets. The package provides an R/Bioconductor framework with the bricks to build more complex data analysis pipelines and algorithms. HiCBricks already incorporates example algorithms for calling domain boundaries and functions for high quality data visualization.
Maintained by Koustav Pal. Last updated 5 months ago.
dataimportinfrastructuresoftwaretechnologysequencinghic
0.5 match 4.48 score 9 scripts 1 dependentspaul-haimerl
PAGFL:Joint Estimation of Latent Groups and Group-Specific Coefficients in Panel Data Models
Latent group structures are a common challenge in panel data analysis. Disregarding group-level heterogeneity can introduce bias. Conversely, estimating individual coefficients for each cross-sectional unit is inefficient and may lead to high uncertainty. This package addresses the issue of unobservable group structures by implementing the pairwise adaptive group fused Lasso (PAGFL) by Mehrabani (2023) <doi:10.1016/j.jeconom.2022.12.002>. PAGFL identifies latent group structures and group-specific coefficients in a single step. On top of that, we extend the PAGFL to time-varying coefficient functions.
Maintained by Paul Haimerl. Last updated 21 days ago.
classificationpanel-data-modeltime-varying-coefficientsopenblascppopenmp
0.5 match 2 stars 4.43 score 3 scriptsbioc
PRONE:The PROteomics Normalization Evaluator
High-throughput omics data are often affected by systematic biases introduced throughout all the steps of a clinical study, from sample collection to quantification. Normalization methods aim to adjust for these biases to make the actual biological signal more prominent. However, selecting an appropriate normalization method is challenging due to the wide range of available approaches. Therefore, a comparative evaluation of unnormalized and normalized data is essential in identifying an appropriate normalization strategy for a specific data set. This R package provides different functions for preprocessing, normalizing, and evaluating different normalization approaches. Furthermore, normalization methods can be evaluated on downstream steps, such as differential expression analysis and statistical enrichment analysis. Spike-in data sets with known ground truth and real-world data sets of biological experiments acquired by either tandem mass tag (TMT) or label-free quantification (LFQ) can be analyzed.
Maintained by Lis Arend. Last updated 19 days ago.
proteomicspreprocessingnormalizationdifferentialexpressionvisualizationdata-analysisevaluation
0.5 match 2 stars 4.38 score 9 scriptsbioc
Uniquorn:Identification of cancer cell lines based on their weighted mutational/ variational fingerprint
'Uniquorn' enables users to identify cancer cell lines. Cancer cell line misidentification and cross-contamination represent a significant challenge for cancer researchers. Identification is vital and, in this package, is based on the locations/loci of somatic and germline mutations/variations. The input format is vcf/vcf.gz and the files have to contain a single cancer cell line sample (i.e. a single member/genotype/gt column in the vcf file).
Maintained by Raik Otto. Last updated 5 months ago.
immunooncologystatisticalmethodwholegenomeexomeseq
0.5 match 4.30 scorebioc
gmoviz:Seamless visualization of complex genomic variations in GMOs and edited cell lines
Genetically modified organisms (GMOs) and cell lines are widely used models in all kinds of biological research. As part of characterising these models, DNA sequencing technology and bioinformatics analyses are used systematically to study their genomes. Therefore, large volumes of data are generated and various algorithms are applied to analyse this data, which introduces a challenge in representing all findings in an informative and concise manner. `gmoviz` provides users with an easy way to visualise and facilitate the explanation of complex genomic editing events on a larger, biologically-relevant scale.
Maintained by Kathleen Zeglinski. Last updated 5 months ago.
visualizationsequencinggeneticvariabilitygenomicvariationcoverage
0.5 match 4.30 score 9 scriptsdrkowal
SeBR:Semiparametric Bayesian Regression Analysis
Monte Carlo sampling algorithms for semiparametric Bayesian regression analysis. These models feature a nonparametric (unknown) transformation of the data paired with widely-used regression models including linear regression, spline regression, quantile regression, and Gaussian processes. The transformation enables broader applicability of these key models, including for real-valued, positive, and compactly-supported data with challenging distributional features. The samplers prioritize computational scalability and, for most cases, Monte Carlo (not MCMC) sampling for greater efficiency. Details of the methods and algorithms are provided in Kowal and Wu (2024) <doi:10.1080/01621459.2024.2395586>.
Maintained by Dan Kowal. Last updated 8 days ago.
0.5 match 1 stars 4.30 score 3 scriptsbioc
OSAT:OSAT: Optimal Sample Assignment Tool
A sizable genomics study such as a microarray study often involves the use of multiple batches (groups) of experiments due to practical complications. To minimize batch effects, a careful experiment design should ensure the even distribution of biological groups and confounding factors across batches. OSAT (Optimal Sample Assignment Tool) is developed to facilitate the allocation of collected samples to different batches. With minimum steps, it produces a setup that optimizes the even distribution of samples in groups of biological interest into different batches, reducing the confounding or correlation between batches and the biological variables of interest. It can also optimize the even distribution of confounding factors across batches. Our tool can handle challenging instances where incomplete and unbalanced sample collections are involved as well as ideal balanced RCBD. OSAT provides a number of predefined layouts for some of the most commonly used genomics platforms. The related paper can be found at http://www.biomedcentral.com/1471-2164/13/689 .
Maintained by Li Yan. Last updated 5 months ago.
datarepresentationvisualizationexperimentaldesignqualitycontrol
0.5 match 4.30 score 3 scriptsbioc
GIGSEA:Genotype Imputed Gene Set Enrichment Analysis
We present the Genotype-imputed Gene Set Enrichment Analysis (GIGSEA), a novel method that uses GWAS-and-eQTL-imputed trait-associated differential gene expression to interrogate gene set enrichment for the trait-associated SNPs. By incorporating eQTL from large gene expression studies, e.g. GTEx, GIGSEA appropriately addresses such challenges for SNP enrichment as gene size, gene boundary, SNP distal regulation, and multiple-marker regulation. The weighted linear regression model, taking as weights both imputation accuracy and model completeness, is used to perform the enrichment test, properly adjusting the bias due to redundancy in different gene sets. The permutation test, furthermore, is used to evaluate the significance of enrichment, whose efficiency can be largely elevated by expressing the computationally intensive part in terms of large matrix operations. We have shown the appropriate type I error rates for GIGSEA (<5%), and the preliminary results also demonstrate its good performance to uncover the real signal.
Maintained by Shijia Zhu. Last updated 5 months ago.
genesetenrichmentsnpvariantannotationgeneexpressiongeneregulationregressiondifferentialexpression
0.5 match 4.30 score 2 scripts
subroy13
rsvddpd:Robust Singular Value Decomposition using Density Power Divergence
Computing singular value decomposition with robustness is a challenging task. This package provides an implementation of computing robust SVD using density power divergence (<arXiv:2109.10680>). It combines the idea of robustness and efficiency in estimation based on a tuning parameter. It also provides utility functions to simulate various scenarios to compare performances of different algorithms.
Maintained by Subhrajyoty Roy. Last updated 2 years ago.
0.5 match 3 stars 4.18 score 6 scripts
atbounds
ATbounds:Bounding Treatment Effects by Limited Information Pooling
Estimation and inference methods for bounding average treatment effects (on the treated) that are valid under an unconfoundedness assumption. The bounds are designed to be robust in challenging situations, for example, when the conditioning variables take on a large number of different values in the observed sample, or when the overlap condition is violated. This robustness is achieved by only using limited "pooling" of information across observations. For more details, see the paper by Lee and Weidner (2021), "Bounding Treatment Effects by Pooling Limited Information across Observations," <arXiv:2111.05243>.
Maintained by Sokbae Lee. Last updated 3 years ago.
causal-inferencelack-of-overlaplimited-overlappartial-identificationtreatment-effectsunconfoundedness-assumption
0.5 match 3 stars 4.18 score 6 scripts
thomaswiemann
civ:Categorical Instrumental Variables
Implementation of the categorical instrumental variable (CIV) estimator proposed by Wiemann (2023) <arXiv:2311.17021>. CIV allows for optimal instrumental variable estimation in settings with relatively few observations per category. To obtain valid inference in these challenging settings, CIV leverages a regularization assumption that implies existence of a latent categorical variable with fixed finite support achieving the same first stage fit as the observed instrument.
Maintained by Thomas Wiemann. Last updated 1 years ago.
0.5 match 2 stars 4.00 score 5 scripts
mroman-ibs
FuzzySimRes:Simulation and Resampling Methods for Epistemic Fuzzy Data
Random simulation of fuzzy numbers is still a challenging problem. The aim of this package is to provide the respective procedures to simulate fuzzy random variables, especially in the case of the piecewise linear fuzzy numbers (PLFNs, see Coroianua et al. (2013) <doi:10.1016/j.fss.2013.02.005> for further details). Additionally, the special resampling algorithms known as the epistemic bootstrap are provided (see Grzegorzewski and Romaniuk (2022) <doi:10.34768/amcs-2022-0021>, Grzegorzewski and Romaniuk (2022) <doi:10.1007/978-3-031-08974-9_39>) together with the functions to apply statistical tests and estimate various characteristics based on the epistemic bootstrap. The package also includes a real-life data set of epistemic fuzzy triangular numbers. The fuzzy numbers used in this package are consistent with the 'FuzzyNumbers' package.
Maintained by Maciej Romaniuk. Last updated 7 months ago.
0.5 match 4.02 score 35 scripts 1 dependents
bioc
MethPed:A DNA methylation classifier tool for the identification of pediatric brain tumor subtypes
Classification of pediatric tumors into biologically defined subtypes is challenging, and multifaceted approaches are needed. For this aim, we developed a diagnostic classifier based on DNA methylation profiles. We offer MethPed as an easy-to-use toolbox that allows researchers and clinical diagnosticians to test single samples as well as large cohorts for subclass prediction of pediatric brain tumors. The current version of MethPed can classify the following tumor diagnoses/subgroups: Diffuse Intrinsic Pontine Glioma (DIPG), Ependymoma, Embryonal tumors with multilayered rosettes (ETMR), Glioblastoma (GBM), Medulloblastoma (MB) - Group 3 (MB_Gr3), Group 4 (MB_Gr4), Group WNT (MB_WNT), Group SHH (MB_SHH) and Pilocytic Astrocytoma (PiloAstro).
Maintained by Helena Carén. Last updated 5 months ago.
immunooncologydnamethylationclassificationepigenetics
0.5 match 4.00 score 1 scripts
bioc
POWSC:Simulation, power evaluation, and sample size recommendation for single cell RNA-seq
Determining the sample size for adequate power to detect statistical significance is a crucial step at the design stage for high-throughput experiments. Even though a number of methods and tools are available for sample size calculation for microarray and RNA-seq in the context of differential expression (DE), this topic in the field of single-cell RNA sequencing is understudied. Moreover, the unique data characteristics present in scRNA-seq such as sparsity and heterogeneity increase the challenge. We propose POWSC, a simulation-based method, to provide power evaluation and sample size recommendation for single-cell RNA sequencing DE analysis. POWSC consists of a data simulator that creates realistic expression data, and a power assessor that provides a comprehensive evaluation and visualization of the power and sample size relationship.
Maintained by Kenong Su. Last updated 5 months ago.
differentialexpressionimmunooncologysinglecellsoftware
0.5 match 4.00 score 7 scripts
bioc
GSEAmining:Make Biological Sense of Gene Set Enrichment Analysis Outputs
Gene Set Enrichment Analysis is a very powerful and interesting computational method that allows an easy correlation between differentially expressed genes and biological processes. Unfortunately, although it was designed to help researchers interpret gene expression data, it can generate huge amounts of results whose biological meaning can be difficult to interpret. Many available tools rely on the hierarchically structured Gene Ontology (GO) classification to reduce redundancy in the results. However, due to the popularity of GSEA, many more gene set collections, such as those in the Molecular Signatures Database, are emerging. Since these collections are not organized like those in GO, their usage for GSEA does not always give a straightforward answer or, in other words, getting all the meaningful information can be challenging with the currently available tools. For these reasons, GSEAmining was born to be an easy tool to create reproducible reports to help researchers make biological sense of GSEA outputs. Given the results of GSEA, GSEAmining clusters the different gene set collections based on the presence of the same genes in the leading edge (core) subset. Leading edge subsets are those genes that contribute most to the enrichment score of each collection of genes or gene sets. For this reason, gene sets that participate in similar biological processes should share genes in common and in turn cluster together. After that, GSEAmining is able to identify and represent for each cluster: - The most enriched terms in the names of gene sets (as wordclouds) - The most enriched genes in the leading edge subsets (as bar plots). In each case, positive and negative enrichments are shown in different colors so it is easy to distinguish biological processes or genes that may be of interest in that particular study.
Maintained by Oriol Arqués. Last updated 5 months ago.
genesetenrichmentclusteringvisualization
0.5 match 4.00 score 7 scripts
dgkf
parttime:Partial Datetime Handling
Datetimes and timestamps are invariably an imprecise notation, with any partial representation implying some amount of uncertainty. To handle this, 'parttime' provides classes for embedding partial missingness as a central part of its datetime classes. This central feature allows for more ergonomic use of datetimes for challenging datetime computation, including calculations of overlapping date ranges, imputations, and more thoughtful handling of ambiguity that arises from uncertain time zones. This package was developed first and foremost with pharmaceutical applications in mind, but aims to be agnostic to application to accommodate general use cases just as conveniently.
Maintained by Doug Kelkhoff. Last updated 1 years ago.
0.5 match 17 stars 3.93 score 3 scripts
celevitz
touRnamentofchampions:Tournament of Champions Data
Several datasets which describe the challenges and results of competitions in Tournament of Champions. This data is useful for practicing data wrangling, graphing, and analyzing how each season of Tournament of Champions played out.
Maintained by Levitz Carly. Last updated 9 days ago.
0.5 match 3.70 score
imalagaris
RCTRecruit:Non-Parametric Recruitment Prediction for Randomized Clinical Trials
Accurate prediction of subject recruitment for Randomized Clinical Trials (RCT) remains an ongoing challenge. Many previous prediction models rely on parametric assumptions. We present functions for non-parametric RCT recruitment prediction under several scenarios.
Maintained by Ioannis Malagaris. Last updated 2 months ago.
0.5 match 1 stars 3.65 score 3 scripts
mncube
idmact:Interpreting Differences Between Mean ACT Scores
Interpreting the differences between mean scale scores across various forms of an assessment can be challenging. This difficulty arises from different mappings between raw scores and scale scores, complex mathematical relationships, adjustments based on judgmental procedures, and diverse equating functions applied to different assessment forms. An alternative method involves running simulations to explore the effect of incrementing raw scores on mean scale scores. The 'idmact' package provides an implementation of this approach based on the algorithm detailed in Schiel (1998) <https://www.act.org/content/dam/act/unsecured/documents/ACT_RR98-01.pdf> which was developed to help interpret differences between mean scale scores on the American College Testing (ACT) assessment. The function idmact_subj() within the package offers a framework for running simulations on subject-level scores. In contrast, the idmact_comp() function provides a framework for conducting simulations on composite scores.
Maintained by Mackson Ncube. Last updated 2 years ago.
assessmentmeasurementpsychometricsscale
0.5 match 3.70 score 4 scripts
bioc
SCAN.UPC:Single-channel array normalization (SCAN) and Universal exPression Codes (UPC)
SCAN is a microarray normalization method to facilitate personalized-medicine workflows. Rather than processing microarray samples as groups, which can introduce biases and present logistical challenges, SCAN normalizes each sample individually by modeling and removing probe- and array-specific background noise using only data from within each array. SCAN can be applied to one-channel (e.g., Affymetrix) or two-channel (e.g., Agilent) microarrays. The Universal exPression Codes (UPC) method is an extension of SCAN that estimates whether a given gene/transcript is active above background levels in a given sample. The UPC method can be applied to one-channel or two-channel microarrays as well as to RNA-Seq read counts. Because UPC values are represented on the same scale and have an identical interpretation for each platform, they can be used for cross-platform data integration.
Maintained by Stephen R. Piccolo. Last updated 5 months ago.
immunooncologysoftwaremicroarraypreprocessingrnaseqtwochannelonechannel
0.5 match 3.48 score 15 scripts
gshs-ornl
revengc:Reverse Engineering Summarized Data
Decoupled (e.g. separate averages) and censored (e.g. > 100 species) variables are continually reported by many well-established organizations (e.g. World Health Organization (WHO), Centers for Disease Control and Prevention (CDC), World Bank, and various national censuses). The challenge therefore is to infer what the original data could have been given summarized information. We present an R package that reverse engineers decoupled and/or censored count data with two main functions. The cnbinom.pars() function estimates the average and dispersion parameter of a censored univariate frequency table. The rec() function reverse engineers summarized data into an uncensored bivariate table of probabilities.
Maintained by Samantha Duchscherer. Last updated 6 years ago.
0.5 match 5 stars 3.44 score 11 scripts
egpivo
QuantRegGLasso:Adaptively Weighted Group Lasso for Semiparametric Quantile Regression Models
Implements an adaptively weighted group Lasso procedure for simultaneous variable selection and structure identification in varying coefficient quantile regression models and additive quantile regression models with ultra-high dimensional covariates. The methodology, grounded in a strong sparsity condition, establishes selection consistency under certain weight conditions. To address the challenge of tuning parameter selection in practice, a BIC-type criterion named high-dimensional information criterion (HDIC) is proposed. The Lasso procedure, guided by HDIC-determined tuning parameters, maintains selection consistency. Theoretical findings are strongly supported by simulation studies. (Toshio Honda, Ching-Kang Ing, Wei-Ying Wu, 2019, <DOI:10.3150/18-BEJ1091>).
Maintained by Wen-Ting Wang. Last updated 5 months ago.
admmgroup-lassohigh-dimensionalquantile-regressionrcpprcpparmadilloopenblascpp
0.5 match 2 stars 3.30 score 2 scripts
pridiltal
clap:Detecting Class Overlapping Regions in Multidimensional Data
The issue of overlapping regions in multidimensional data arises when different classes or clusters share similar feature representations, making it challenging to delineate distinct boundaries between them accurately. This package provides methods for detecting and visualizing these overlapping regions using partitional clustering techniques based on nearest neighbor distances.
Maintained by Priyanga Dilini Talagala. Last updated 9 months ago.
0.5 match 1 stars 3.18 score 2 scripts
jrhub
spinBayes:Semi-Parametric Gene-Environment Interaction via Bayesian Variable Selection
Many complex diseases are known to be affected by the interactions between genetic variants and environmental exposures beyond the main genetic and environmental effects. Existing Bayesian methods for gene-environment (G×E) interaction studies are challenged by the high-dimensional nature of the study and the complexity of environmental influences. We have developed a novel and powerful semi-parametric Bayesian variable selection method that can accommodate linear and nonlinear G×E interactions simultaneously (Ren et al. (2020) <doi:10.1002/sim.8434>). Furthermore, the proposed method can conduct structural identification by distinguishing nonlinear interactions from the main-effects-only case within the Bayesian framework. Spike-and-slab priors are incorporated on both individual and group levels to shrink coefficients corresponding to irrelevant main and interaction effects to zero exactly. The Markov chain Monte Carlo algorithms of the proposed and alternative methods are efficiently implemented in C++.
Maintained by Jie Ren. Last updated 1 months ago.
bayesian-variable-selectiongene-environment-interactionshigh-dimensional-datasemi-parametric-modelingopenblascppopenmp
0.5 match 1 stars 3.18 score 3 scripts
pwkraft
discursive:Measuring Discursive Sophistication in Open-Ended Survey Responses
A simple approach to measure political sophistication based on open-ended survey responses. Discursive sophistication captures the complexity of individual attitude expression by quantifying its relative size, range, and constraint. For more information on the measurement approach see: Kraft, Patrick W. 2023. "Women Also Know Stuff: Challenging the Gender Gap in Political Sophistication." American Political Science Review (forthcoming).
Maintained by Patrick Kraft. Last updated 2 years ago.
0.5 match 2 stars 3.00 score 5 scripts
jienagu
dataMojo:Reshape Data Table
A grammar of data manipulation with 'data.table', providing a consistent series of utility functions that help you solve the most common data manipulation challenges.
Maintained by Jiena McLellan. Last updated 2 years ago.
0.5 match 2.88 score 15 scripts
lcbc-uio
nettskjema.tsd:Decrypt and Organise Nettskjema Data Within TSD
Working with Nettskjema (<https://nettskjema.no/>) data inside TSD can be challenging. This package aims to aid users in managing their incoming Nettskjema data by decrypting the data and storing the Nettskjema data and metadata in convenient and standardised ways. The package's functionality currently works only on the Linux VMs of TSD, and only for version 1 of the Nettskjema data delivery to TSD.
Maintained by Athanasia Mo Mowinckel. Last updated 3 years ago.
0.5 match 2.70 score 3 scripts
sigbertklinke
exams.forge:Support for Compiling Examination Tasks using the 'exams' Package
The main aim is to further facilitate the creation of exercises based on the package 'exams' by Grün, B., and Zeileis, A. (2009) <doi:10.18637/jss.v029.i10>. Creating effective student exercises involves challenges such as creating appropriate data sets and ensuring access to intermediate values for accurate explanation of solutions. The functionality includes the generation of univariate and bivariate data including simple time series, functions for theoretical distributions and their approximation, statistical and mathematical calculations for tasks in basic statistics courses as well as general tasks such as string manipulation, LaTeX/HTML formatting and the editing of XML task files for 'Moodle'.
Maintained by Sigbert Klinke. Last updated 8 months ago.
0.5 match 2.70 score 1 scripts
davezes
mactivate:Multiplicative Activation
Provides methods and classes for adding m-activation ("multiplicative activation") layers to MLR or multivariate logistic regression models. M-activation layers created in this library detect and add input interaction (polynomial) effects into a predictive model. M-activation can detect high-order interactions -- a traditionally non-trivial challenge. Details concerning application, methodology, and relevant survey literature can be found in this library's vignette, "About."
Maintained by Dave Zes. Last updated 4 years ago.
0.5 match 2.68 score 12 scripts
rajkumpismb
PCAPAM50:Enhanced 'PAM50' Subtyping of Breast Cancer
Accurate classification of breast cancer tumors based on gene expression data is not a trivial task, and it lacks standard practices. The 'PAM50' classifier, which uses 50 gene centroid correlation distances to classify tumors, faces challenges with balancing estrogen receptor (ER) status and gene centering. The 'PCAPAM50' package leverages principal component analysis and iterative 'PAM50' calls to create a gene expression-based ER-balanced subset for gene centering, avoiding the use of protein expression-based ER data and resulting in enhanced breast cancer subtyping.
Maintained by Praveen-Kumar Raj-Kumar. Last updated 2 months ago.
0.5 match 2.48 score 3 scripts
imarkonis
csa:A Cross-Scale Analysis Tool for Model-Observation Visualization and Integration
Integration of Earth system data from various sources is a challenging task. Apart from their qualitative heterogeneity, different data records exist for describing similar Earth system processes at different spatio-temporal scales. Data inter-comparison and validation are usually performed at a single spatial or temporal scale, which could hamper the identification of potential discrepancies at other scales. The 'csa' package offers a simple, yet efficient, graphical method for synthesizing and comparing observed and modelled data across a range of spatio-temporal scales. Instead of focusing on specific scales, such as annual means or original grid resolution, we examine how their statistical properties change across the spatio-temporal continuum.
Maintained by Yannis Markonis. Last updated 5 years ago.
0.5 match 1 stars 2.48 score 3 scripts
cran
Numero:Statistical Framework to Define Subgroups in Complex Datasets
High-dimensional datasets that do not exhibit a clear intrinsic clustered structure pose a challenge to conventional clustering algorithms. For this reason, we developed an unsupervised framework that helps scientists to better subgroup their datasets based on visual cues, please see Gao S, Mutter S, Casey A, Makinen V-P (2019) Numero: a statistical framework to define multivariable subgroups in complex population-based datasets, Int J Epidemiology, 48:369-37, <doi:10.1093/ije/dyy113>. The framework includes the necessary functions to construct a self-organizing map of the data, to evaluate the statistical significance of the observed data patterns, and to visualize the results.
Maintained by Ville-Petteri Makinen. Last updated 6 months ago.
0.5 match 2.30 score
andriyprotsak5
UAHDataScienceUC:Learn Clustering Techniques Through Examples and Code
A comprehensive educational package combining clustering algorithms with detailed step-by-step explanations. Provides implementations of both traditional (hierarchical, k-means) and modern (Density-Based Spatial Clustering of Applications with Noise (DBSCAN), Gaussian Mixture Models (GMM), genetic k-means) clustering methods as described in Ezugwu et al. (2022) <doi:10.1016/j.engappai.2022.104743>. Includes educational datasets highlighting different clustering challenges, based on 'scikit-learn' examples (Pedregosa et al., 2011) <https://jmlr.csail.mit.edu/papers/v12/pedregosa11a.html>. Features detailed algorithm explanations, visualizations, and weighted distance calculations for enhanced learning.
Maintained by Andriy Protsak Protsak. Last updated 29 days ago.
0.5 match 2.30 score
katilingban
ennet:Utilities to Extract and Analyse Text Data from the Emergency Nutrition Network Forum
The Emergency Nutrition Network or en-net forum is the go-to online forum for field practitioners requiring prompt technical advice for operational challenges for which answers are not readily accessible in current guidelines. The questions and the corresponding answers raised within en-net can provide insight into what the key topics of discussion are within the nutrition sector. This package provides utility functions for the extraction, processing and analysis of text data from the online forum.
Maintained by Ernest Guevarra. Last updated 2 years ago.
0.5 match 2 stars 2.08 score 12 scripts
cran
SOFIA:Making Sophisticated and Aesthetical Figures in R
Software that leverages the capabilities of Circos by manipulating data, preparing configuration files, and running the Perl-native Circos directly from the R environment with minimal user intervention. Circos is a novel software that addresses the challenges in visualizing genetic data by creating circular ideograms composed of tracks of heatmaps, scatter plots, line plots, histograms, links between common markers, glyphs, text, etc. Please see <http://www.circos.ca>.
Maintained by Luis Diaz-Garcia. Last updated 8 years ago.
0.5 match 2.00 score
sunnypig1988
BCSub:A Bayesian Semiparametric Factor Analysis Model for Subtype Identification (Clustering)
Gene expression profiles are commonly utilized to infer disease subtypes and many clustering methods can be adopted for this task. However, existing clustering methods may not perform well when genes are highly correlated and many uninformative genes are included for clustering. To deal with these challenges, we develop a novel clustering method in the Bayesian setting. This method, called BCSub, adopts an innovative semiparametric Bayesian factor analysis model to reduce the dimension of the data to a few factor scores for clustering. Specifically, the factor scores are assumed to follow the Dirichlet process mixture model in order to induce clustering.
Maintained by Jiehuan Sun. Last updated 8 years ago.
0.5 match 2.00 score 2 scripts
cran
beanz:Bayesian Analysis of Heterogeneous Treatment Effect
It is vital to assess the heterogeneity of treatment effects (HTE) when making health care decisions for an individual patient or a group of patients. Nevertheless, it remains challenging to evaluate HTE based on information collected from clinical studies that are often designed and conducted to evaluate the efficacy of a treatment for the overall population. The Bayesian framework offers a principled and flexible approach to estimate and compare treatment effects across subgroups of patients defined by their characteristics. This package allows users to explore a wide range of Bayesian HTE analysis models, and produce posterior inferences about HTE. See Wang et al. (2018) <DOI:10.18637/jss.v085.i07> for further details.
Maintained by Chenguang Wang. Last updated 2 years ago.
0.5 match 1 stars 2.00 score 7 scripts
cran
UAHDataScienceUC:Learn Clustering Techniques Through Examples and Code
A comprehensive educational package combining clustering algorithms with detailed step-by-step explanations. Provides implementations of both traditional (hierarchical, k-means) and modern (Density-Based Spatial Clustering of Applications with Noise (DBSCAN), Gaussian Mixture Models (GMM), genetic k-means) clustering methods as described in Ezugwu et al. (2022) <doi:10.1016/j.engappai.2022.104743>. Includes educational datasets highlighting different clustering challenges, based on 'scikit-learn' examples (Pedregosa et al., 2011) <https://jmlr.csail.mit.edu/papers/v12/pedregosa11a.html>. Features detailed algorithm explanations, visualizations, and weighted distance calculations for enhanced learning.
Maintained by Andriy Protsak Protsak. Last updated 29 days ago.
0.5 match 2.00 score