Showing 200 of 234 results

henrikbengtsson

R.utils: Various Programming Utilities

Utility functions useful when programming and developing R packages.
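As a taste of the kind of helpers included (the choice of functions here is our illustration; both ship with the package):

    library(R.utils)
    capitalize("hello world")               # "Hello world"
    withTimeout(Sys.sleep(2), timeout = 1,  # interrupt long-running code
                onTimeout = "warning")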

Maintained by Henrik Bengtsson. Last updated 1 year ago.

5.6 match 63 stars 13.74 score 5.7k scripts 814 dependents

r-spatial

spdep: Spatial Dependence: Weighting Schemes, Statistics

A collection of functions to create spatial weights matrix objects from polygon 'contiguities', from point patterns by distance and tessellations, for summarizing these objects, and for permitting their use in spatial data analysis, including regional aggregation by minimum spanning tree; a collection of tests for spatial 'autocorrelation', including global 'Moran's I' and 'Geary's C' proposed by 'Cliff' and 'Ord' (1973, ISBN: 0850860369) and (1981, ISBN: 0850860814), the 'Hubert/Mantel' general cross-product statistic, Empirical Bayes estimates and the 'Assunção/Reis' (1999) <doi:10.1002/(SICI)1097-0258(19990830)18:16%3C2147::AID-SIM179%3E3.0.CO;2-I> Index, 'Getis/Ord' G ('Getis' and 'Ord' 1992) <doi:10.1111/j.1538-4632.1992.tb00261.x> and multicoloured join count statistics, 'APLE' ('Li' et al.) <doi:10.1111/j.1538-4632.2007.00708.x>, local 'Moran's I', 'Geary's C' ('Anselin' 1995) <doi:10.1111/j.1538-4632.1995.tb00338.x> and 'Getis/Ord' G ('Ord' and 'Getis' 1995) <doi:10.1111/j.1538-4632.1995.tb00912.x>, 'saddlepoint' approximations ('Tiefelsdorf' 2002) <doi:10.1111/j.1538-4632.2002.tb01084.x> and exact tests for global and local 'Moran's I' ('Bivand' et al. 2009) <doi:10.1016/j.csda.2008.07.021> and 'LOSH' local indicators of spatial heteroscedasticity ('Ord' and 'Getis') <doi:10.1007/s00168-011-0492-y>. The implementation of most of these measures is described in 'Bivand' and 'Wong' (2018) <doi:10.1007/s11749-018-0599-x>, with further extensions in 'Bivand' (2022) <doi:10.1111/gean.12319>. 'Lagrange' multiplier tests for spatial dependence in linear models are provided ('Anselin' et al. 1996) <doi:10.1016/0166-0462(95)02111-6>, as are 'Rao' score tests for hypothesised spatial 'Durbin' models based on linear models ('Koley' and 'Bera' 2023) <doi:10.1080/17421772.2023.2256810>. An implementation of local indicators for categorical data (LICD), based on 'Carrer et al.' (2021) <doi:10.1016/j.jas.2020.105306> and 'Bivand et al.' (2017) <doi:10.1016/j.spasta.2017.03.003>, was added in 1.3-7. From 'spdep' and 'spatialreg' versions >= 1.2-1, the model fitting functions previously present in this package are defunct in 'spdep' and may be found in 'spatialreg'.
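A minimal sketch of the core weights-then-test workflow, assuming polys is an sf polygon layer with a numeric column x (the object names are ours):

    library(spdep)
    nb <- poly2nb(polys)              # neighbours from polygon contiguities
    lw <- nb2listw(nb, style = "W")   # row-standardised spatial weights
    moran.test(polys$x, lw)           # global Moran's I test
    localmoran(polys$x, lw)           # local Moran's I for each region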

Maintained by Roger Bivand. Last updated 1 month ago.

spatial-autocorrelation spatial-dependence spatial-weights

1.8 match 131 stars 16.59 score 6.0k scripts 106 dependents

green-striped-gecko

dartR: Importing and Analysing 'SNP' and 'Silicodart' Data Generated by Genome-Wide Restriction Fragment Analysis

Functions are provided that facilitate the import and analysis of 'SNP' (single nucleotide polymorphism) and 'silicodart' (presence/absence) data. The main focus is on data generated by 'DArT' (Diversity Arrays Technology); however, data from other sequencing platforms can be used once 'SNP' or related fragment presence/absence data from any source is imported. Genetic datasets are stored in a derived 'genlight' format (package 'adegenet') that allows very compact storage of data and metadata. Functions are available for importing and exporting 'SNP' and 'silicodart' data, for reporting on and filtering by various criteria (e.g. 'CallRate', heterozygosity, reproducibility, maximum allele frequency). Additional functions are available for visualization (e.g. Principal Coordinates Analysis) and for creating a spatial representation using maps. 'dartR' also supports analysis with third-party software packages such as 'newhybrid', 'structure', 'NeEstimator' and 'blast'. Since version 2.0.3, simulation functions are also implemented that allow forward simulation of 'SNP' dynamics under different population and evolutionary dynamics. Comprehensive tutorials and support can be found at our 'github' repository: github.com/green-striped-gecko/dartR/. If you want to cite 'dartR', you can find the information by typing citation('dartR') in the console.
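A hedged sketch of the import-filter-ordinate flow (the file name is a placeholder):

    library(dartR)
    gl <- gl.read.dart(filename = "snps.csv")       # DArT SNP data as a genlight object
    gl <- gl.filter.callrate(gl, threshold = 0.95)  # drop loci with low call rate
    pc <- gl.pcoa(gl)                               # Principal Coordinates Analysis
    gl.pcoa.plot(pc, gl)                            # plot the ordination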

Maintained by Bernd Gruber. Last updated 5 days ago.

3.3 match 34 stars 7.41 score

thothorn

maxstat: Maximally Selected Rank Statistics

Maximally selected rank statistics with several p-value approximations.
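For example, a maximally selected rank statistic for a numeric response y and a candidate cutpoint variable x (the data frame df is our assumption):

    library(maxstat)
    maxstat.test(y ~ x, data = df,
                 smethod = "Wilcoxon",  # rank statistic to maximise over cutpoints
                 pmethod = "HL")        # Hothorn-Lausen p-value approximation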

Maintained by Torsten Hothorn. Last updated 8 years ago.

3.1 match 1 star 7.69 score 107 scripts 59 dependents

brodieg

diffobj: Diffs for R Objects

Generate a colorized diff of two R objects for an intuitive visualization of their differences.
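A minimal example of what the package produces:

    library(diffobj)
    a <- 1:10
    b <- c(1:5, 50, 7:10)
    diffPrint(target = a, current = b)  # colorized diff of the printed output
    diffObj(a, b)                       # picks the most informative diff mode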

Maintained by Brodie Gaslam. Last updated 3 years ago.

diff

1.8 match 231 stars 13.17 score 107 scripts 494 dependents

fishr-core-team

FSA: Simple Fisheries Stock Assessment Methods

A variety of simple fish stock assessment methods.
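As one example among the included methods, a catch-curve estimate of mortality (a data frame df with age and catch columns is our assumption):

    library(FSA)
    cc <- catchCurve(catch ~ age, data = df, ages2use = 2:7)
    summary(cc)  # estimates of instantaneous (Z) and annual (A) mortality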

Maintained by Derek H. Ogle. Last updated 2 months ago.

fish fisheries fisheries-management fisheries-stock-assessment population-dynamics stock-assessment

1.7 match 69 stars 11.16 score 1.7k scripts 6 dependents

traversc

stringfish: Alt String Implementation

Provides an extendable, performant and multithreaded 'alt-string' implementation backed by 'C++' vectors and strings.
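A small sketch of the alt-string API:

    library(stringfish)
    x <- convert_to_sf(c("apple", "banana", "cherry"))  # alt-string vector
    sf_grepl(x, "an")   # pattern matching on alt-strings
    sf_toupper(x)       # vectorised case conversion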

Maintained by Travers Ching. Last updated 5 months ago.

pcre2 cpp

1.5 match 67 stars 10.14 score 14 scripts 57 dependents

mrc-ide

gonovax: Deterministic Compartmental Model of Gonorrhoea with Vaccination

Model for gonorrhoea vaccination, using odin.

Maintained by Lilith Whittles. Last updated 18 days ago.

3.0 match 3 stars 4.56 score

r-lib

marquee: Markdown Parser and Renderer for R Graphics

Provides the means to parse and render markdown text with grid, along with facilities to define the styling of the text.
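A minimal sketch, assuming the marquee_grob()/classic_style() entry points:

    library(marquee)
    library(grid)
    md <- "Some **bold** and *italic* markdown"
    g <- marquee_grob(md, style = classic_style())  # parse and style the markdown
    grid.newpage(); grid.draw(g)                    # render it with grid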

Maintained by Thomas Lin Pedersen. Last updated 4 days ago.

cpp

1.2 match 86 stars 8.59 score 28 scripts 1 dependent

felixfan

FinCal: Time Value of Money, Time Series Analysis and Computational Finance

Package for time value of money calculation, time series analysis and computational finance.
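For instance, three of the basic calculations (the numbers are illustrative):

    library(FinCal)
    fv(r = 0.05, n = 10, pv = -1000)             # future value of 1000 invested at 5%
    npv(r = 0.08, cf = c(-1000, 300, 400, 500))  # net present value of a cash-flow series
    irr(cf = c(-1000, 300, 400, 500, 600))       # internal rate of return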

Maintained by Felix Yanhui Fan. Last updated 8 years ago.

1.7 match 23 stars 6.14 score 203 scripts 1 dependent

glgrabow

spreval: Evaluation of Sprinkler Irrigation Uniformity and Efficiency

Processing and analysis of field-collected or simulated sprinkler system catch data (depths) to characterize irrigation uniformity and efficiency using standard and other measures. Standard measures include the Christiansen coefficient of uniformity (CU) as found in Christiansen, J.E. (1942, ISBN: 0138779295, "Irrigation by Sprinkling"); and distribution uniformity (DU), potential efficiency of the low quarter (PELQ), and application efficiency of the low quarter (AELQ), which are implementations of measures of the same notation in Keller, J. and Merriam, J.L. (1978) "Farm Irrigation System Evaluation: A Guide for Management" <https://pdf.usaid.gov/pdf_docs/PNAAG745.pdf>. spreval::DU.lh is similar to spreval::DU but is the distribution uniformity of the low half instead of the low quarter as in DU. spreval::PELQT is a version of spreval::PELQ adapted for traveling systems instead of lateral-move or solid-set sprinkler systems. The function spreval::eff is analogous to the method used to compute application efficiency for furrow irrigation presented in Walker, W. and Skogerboe, G.V. (1987, ISBN: 0138779295, "Surface Irrigation: Theory and Practice"), which uses piecewise integration of infiltrated depth compared against soil-moisture deficit (SMD), when the argument "target" is set equal to SMD. The other functions contained in the package provide graphical representation of sprinkler system uniformity, and other standard univariate parametric and non-parametric statistical measures as applied to sprinkler system catch depths. A sample data set of field test data, spreval::catchcan (catch depths), is provided and is used in examples and vignettes. Agricultural systems are emphasized, but this package can be used for landscape irrigation evaluation, and a landscape (turf) vignette is included as an example application.
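A heavily hedged sketch using the functions and sample data named above; the exact argument form is our assumption, so check the package help:

    library(spreval)
    data(catchcan)   # bundled field-test catch depths
    # "depths" below stands for a numeric vector of catch depths taken from catchcan
    DU(depths)       # distribution uniformity, low quarter
    DU.lh(depths)    # distribution uniformity, low half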

Maintained by Garry Grabow. Last updated 3 years ago.

1.7 match 4.30 score 9 scripts

poissonconsulting

subfoldr2: Save and Load R Objects

Facilitates saving and loading R objects, data frames, tables, plots, text blocks and numbers to subfolders.
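A hedged sketch of the save/load pattern, assuming the package's sbf_-prefixed API:

    library(subfoldr2)
    sbf_set_main("output")           # root folder for saved artifacts (assumed helper)
    sbf_save_object(mtcars, "cars")  # save an R object into a subfolder
    cars <- sbf_load_object("cars")  # load it back by name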

Maintained by Joe Thorley. Last updated 30 days ago.

1.8 match 2 stars 3.70 score 5 scripts

jimbrig

jimstools: Tools for R

What the package does (one paragraph).

Maintained by Jimmy Briggs. Last updated 3 years ago.

functions personal utility

1.8 match 2 stars 3.00 score 2 scripts

ropensci

git2rdata: Store and Retrieve Data.frames in a Git Repository

The git2rdata package writes and reads dataframes as plain text files. A metadata file stores important information. 1) Storing metadata allows the classes of variables to be maintained. By default, git2rdata optimizes the data for file storage. The optimization is most effective on data containing factors. The optimization makes the data less human readable. The user can turn this off when they prefer a human readable format over smaller files. Details on the implementation are available in vignette("plain_text", package = "git2rdata"). 2) Storing metadata also allows smaller row-based diffs between two consecutive commits. This is a useful feature when storing data as plain text files under version control. Details on this part of the implementation are available in vignette("version_control", package = "git2rdata"). Although git2rdata was envisioned with a git workflow in mind, you can use it in combination with other version control systems like subversion or mercurial. 3) git2rdata is a useful tool in a reproducible and traceable workflow. vignette("workflow", package = "git2rdata") gives a toy example. 4) vignette("efficiency", package = "git2rdata") provides some insight into the efficiency of file storage, git repository size and speed for writing and reading.
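A minimal example of the write/read round trip:

    library(git2rdata)
    root <- tempfile("git2rdata"); dir.create(root)
    write_vc(iris, file = "iris", root = root,
             sorting = "Sepal.Length")  # writes the data file plus its metadata file
    head(read_vc("iris", root = root))  # variable classes restored from the metadata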

Maintained by Thierry Onkelinx. Last updated 2 months ago.

reproducible-research version-control

0.5 match 99 stars 10.03 score 216 scripts 4 dependents

bristol-vaccine-centre

avoncap: AvonCap Study Analysis

A work-in-progress set of functions for loading and wrangling the AvonCap data set.

Maintained by Rob Challen. Last updated 4 months ago.

1.8 match 2.34 score 11 scripts

metinbulus

pwrss: Statistical Power and Sample Size Calculation Tools

Statistical power and minimum required sample size calculations for (1) testing a proportion (one-sample) against a constant, (2) testing a mean (one-sample) against a constant, (3) testing difference between two proportions (independent samples), (4) testing difference between two means or groups (parametric and non-parametric tests for independent and paired samples), (5) testing a correlation (one-sample) against a constant, (6) testing difference between two correlations (independent samples), (7) testing a single coefficient in multiple linear regression, logistic regression, and Poisson regression (with standardized or unstandardized coefficients, with no covariates or covariate adjusted), (8) testing an indirect effect (with standardized or unstandardized coefficients, with no covariates or covariate adjusted) in the mediation analysis (Sobel, Joint, and Monte Carlo tests), (9) testing an R-squared against zero in linear regression, (10) testing an R-squared difference against zero in hierarchical regression, (11) testing an eta-squared or f-squared (for main and interaction effects) against zero in analysis of variance (could be one-way, two-way, and three-way), (12) testing an eta-squared or f-squared (for main and interaction effects) against zero in analysis of covariance (could be one-way, two-way, and three-way), (13) testing an eta-squared or f-squared (for between, within, and interaction effects) against zero in one-way repeated measures analysis of variance (with non-sphericity correction and repeated measures correlation), and (14) testing goodness-of-fit or independence for contingency tables. The alternative hypothesis can be formulated as "not equal", "less", "greater", "non-inferior", "superior", or "equivalent" in (1), (2), (3), and (4); as "not equal", "less", or "greater" in (5), (6), (7) and (8); but always as "greater" in (9), (10), (11), (12), (13), and (14). Reference: Bulus and Polat (2023) <https://osf.io/ua5fc>.
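For instance, the minimum required sample size for a two-group mean comparison, item (4) above (the numbers are illustrative):

    library(pwrss)
    pwrss.t.2means(mu1 = 0.5, mu2 = 0,  # standardized group means
                   sd1 = 1, sd2 = 1,
                   power = 0.80, alpha = 0.05,
                   alternative = "not equal")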

Maintained by Metin Bulus. Last updated 4 months ago.

0.8 match 1 star 4.67 score 57 scripts

olgaviedma

LadderFuelsR: Automated Tool for Vertical Fuel Continuity Analysis using Airborne Laser Scanning Data

Set of tools for analyzing vertical fuel continuity at the tree level using Airborne Laser Scanning data. The workflow consists of: 1) calculating the vertical height profiles of each segmented tree; 2) identifying gaps and fuel layers; 3) estimating the distance between fuel layers; and 4) retrieving the fuel layers' base height and depth. Additionally, other functions recalculate previous metrics after considering distances greater than a certain threshold. Moreover, the package: i) calculates the percentage of Leaf Area Density (LAD) comprised in each fuel layer, ii) removes fuel layers with a LAD percentage of less than 10, and iii) recalculates the distances among the remaining ones. On the other hand, it identifies the crown base height (CBH) based on different criteria: the fuel layer with the highest LAD percentage and the fuel layers located at the largest and at the last distance. When there is only one fuel layer, it also identifies the CBH by performing a segmented linear regression (breaking points) on the cumulative sum of LAD as a function of height. Finally, a collection of plotting functions is provided to represent: i) the initial gaps and fuel layers; ii) the fuel layers' base heights, depths and gaps with distances greater than a certain threshold; and iii) the CBH based on different criteria. The methods implemented in this package are original and have not been published elsewhere.

Maintained by Olga Viedma. Last updated 5 months ago.

ladderfuelsr

0.5 match 7 stars 4.80 score 4 scripts

haghish

mlim: Single and Multiple Imputation with Automated Machine Learning

Machine learning algorithms have been used for performing single missing data imputation and, most recently, multiple imputation. However, this is the first attempt to use automated machine learning algorithms for performing both single and multiple imputation. Automated machine learning is a procedure for fine-tuning the model automatically, performing a random search for a model that results in less error, without overfitting the data. The main idea is to allow the model to set its own parameters for imputing each variable separately instead of setting fixed predefined parameters to impute all variables of the dataset. Using automated machine learning, the package fine-tunes an Elastic Net (default) or Gradient Boosting, Random Forest, Deep Learning, Extreme Gradient Boosting, or Stacked Ensemble machine learning model (from one or a combination of other supported algorithms) for imputing the missing observations. This procedure has been implemented for the first time by this package and is expected to outperform other packages for imputing missing data that do not fine-tune their models. The multiple imputation is implemented via bootstrapping, without letting the duplicated observations harm the cross-validation procedure, which is the way imputed variables are evaluated. Most notably, the package implements an automated procedure for imputing imbalanced data (the class rarity problem), which happens when a factor variable has a level that is far more prevalent than the other(s). This is known to result in biased predictions, and hence biased imputation of missing data. However, the autobalancing procedure ensures that instead of focusing on maximizing accuracy (classification error) in imputing factor variables, a fairer procedure and imputation method is practiced.
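A hedged single-imputation sketch; mlim() is the entry point, and mlim.na() (used here to poke holes in a toy dataset) is our assumption from the package docs:

    library(mlim)
    dfNA <- mlim.na(iris, p = 0.1, seed = 2022)  # add 10% missing values for the demo
    imp  <- mlim(dfNA, m = 1, seed = 2022)       # fine-tuned ELNET single imputation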

Maintained by E. F. Haghish. Last updated 8 months ago.

automatic-machine-learning automl classimbalance data-science elastic-net extreme-gradient-boosting gbm glm gradient-boosting gradient-boosting-machine imputation imputation-algorithm imputation-methods machine-learning missing-data multipleimputation stack-ensemble

0.5 match 31 stars 4.49 score 7 scripts

izmirlig

pwrFDR: FDR Power

Computes average and TPX power under various BH-FDR-type sequential procedures. All of these procedures involve control of some summary of the distribution of the FDP, e.g. the proportion of discoveries which are false in a given experiment. The most widely known of these, the BH-FDR procedure, controls the FDR, which is the mean of the FDP. A lesser known procedure, due to Lehmann and Romano, controls the FDX, or the probability that the FDP exceeds a user-provided threshold. This is less conservative than FWE control procedures but much more conservative than the BH-FDR procedure. This package and the references supporting it introduce a new procedure for controlling the FDX, which we call the BH-FDX procedure. This procedure iteratively identifies, given alpha and lower threshold delta, an alpha* less than alpha at which BH-FDR guarantees FDX control. This uses asymptotic approximation and is only slightly more conservative than the BH-FDR procedure. Likewise, we can think of the power in multiple testing experiments in terms of a summary of the distribution of the True Positive Proportion (TPP), the proportion of truly non-null tests that are called significant. The package will compute power, sample size, or any other missing parameter required for power defined as (i) the mean of the TPP (average power) or (ii) the probability that the TPP exceeds a given value, lambda (TPX power), via asymptotic approximation. All supplied theoretical results are also obtainable via simulation. The suggested approach is to narrow in on a design via the theoretical approaches and then make final adjustments/verify the results by simulation. The theoretical results are described in Izmirlian, G. (2020), Statistics and Probability Letters, <doi:10.1016/j.spl.2020.108713>, and an applied paper describing the methodology with a simulation study is in preparation. See citation("pwrFDR").
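A heavily hedged sketch of an average-power computation; the argument names are our assumption from the package documentation:

    library(pwrFDR)
    # effect size, per-group sample size, proportion of non-null tests, FDR level
    pwrFDR(effect.size = 0.79, n.sample = 42, r.1 = 0.05, alpha = 0.15)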

Maintained by Grant Izmirlian. Last updated 3 months ago.

0.8 match 2.58 score 19 scripts

meenakshi-kushwaha

mmaqshiny: Explore Air Quality Mobile-Monitoring Data

Mobile monitoring, or sensors on a mobile platform, is an increasingly popular approach to measure high-resolution pollution data at the street level. Coupled with location data, spatial visualization of air-quality parameters helps detect localized areas of high air pollution, also called hotspots. In this approach, portable sensors are mounted on a vehicle and driven on predetermined routes to collect high-frequency data (1 Hz). 'mmaqshiny' analyses, visualizes and spatially maps high-resolution air-quality data collected by specific devices installed on a moving platform: 1 Hz data of PM2.5 (mass concentrations of particulate matter with size less than 2.5 microns), black carbon mass concentrations (BC), ultrafine particle number concentrations and carbon dioxide, along with GPS coordinates and relative humidity (RH) data, collected by popular portable instruments (TSI DustTrak-8530, Aethlabs microAeth-AE51, TSI CPC3007, LICOR Li-830, Garmin GPSMAP 64s and Omega USB RH probe, respectively). It incorporates device-specific cleaning and correction algorithms. RH correction is applied to DustTrak PM2.5 following Chakrabarti et al. (2004) <doi:10.1016/j.atmosenv.2004.03.007>. Provision is given to add linear regression coefficients for correcting the PM2.5 data (if required). BC data will be cleaned for vibration-generated noise by adopting the statistical procedure explained in Apte et al. (2011) <doi:10.1016/j.atmosenv.2011.05.028>, followed by a loading correction as suggested by Ban-Weiss et al. (2009) <doi:10.1021/es8021039>. For the number concentration data, provision is given for a dilution correction factor (if a diluter is used with the CPC3007; default value is 1). The package joins the raw, cleaned and corrected data from the above-mentioned instruments and outputs it as a downloadable csv file.

Maintained by Adithi R. Upadhya. Last updated 3 years ago.

0.5 match 5 stars 3.70 score 4 scripts

cran

meerva: Analysis of Data with Measurement Error Using a Validation Subsample

Sometimes data for analysis are obtained using more convenient or less expensive means yielding "surrogate" variables for what could be obtained more accurately, albeit with less convenience; or less conveniently or at more expense yielding "reference" variables, thought of as being measured without error. Analysis of the surrogate variables measured with error generally yields biased estimates when the objective is to make inference about the reference variables. Often it is thought that ignoring the measurement error in surrogate variables only biases effects toward the null hypothesis, but this need not be the case. Measurement errors may bias parameter estimates either toward or away from the null hypothesis. If one has a data set with surrogate variable data from the full sample, and also reference variable data from a randomly selected subsample, then one can assess the bias introduced by measurement error in parameter estimation, and use this information to derive improved estimates based upon all available data. Formulaically, these estimates based upon the reference variables from the validation subsample combined with the surrogate variables from the whole sample can be interpreted as starting with the estimate from reference variables in the validation subsample, and "augmenting" this with additional information from the surrogate variables. This suggests the term "augmented" estimate. The meerva package calculates these augmented estimates in the regression setting when there is a randomly selected subsample with both surrogate and reference variables. Measurement errors may be differential or non-differential, in any or all predictors (simultaneously) as well as in the outcome. The augmented estimates derive, in part, from the multivariate correlation between regression model parameter estimates from the reference variables and the surrogate variables, both from the validation subset. Because the validation subsample is chosen at random, any biases imposed by measurement error, whether non-differential or differential, are reflected in this correlation, and these correlations can be used to derive estimates for the reference variables using data from the whole sample. The main functions in the package are meerva.fit, which calculates estimates for a dataset, and meerva.sim.block, which simulates multiple datasets as described by the user and analyzes them, storing the regression coefficient estimates for inspection. The augmented estimates, as well as how measurement error may arise in practice, are described in more detail by Kremers WK (2021) <arXiv:2106.14063>; the approach is an extension of the works by Chen Y-H, Chen H. (2000) <doi:10.1111/1467-9868.00243>, Chen Y-H. (2002) <doi:10.1111/1467-9868.00324>, Wang X, Wang Q (2015) <doi:10.1016/j.jmva.2015.05.017> and Tong J, Huang J, Chubak J, et al. (2020) <doi:10.1093/jamia/ocz180>.
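The description names meerva.fit as the main entry point; a hedged sketch of its use (the argument names are our assumption), with reference data from the validation subsample and surrogate data from the whole sample:

    library(meerva)
    # x_val,  y_val:  reference predictors/outcome on the validation subsample
    # xs_val, ys_val: surrogates on that same subsample
    # xs_non, ys_non: surrogates on the rest of the sample
    fit <- meerva.fit(x_val, y_val, xs_val, ys_val, xs_non, ys_non)
    summary(fit)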

Maintained by Walter K Kremers. Last updated 3 years ago.

0.9 match 2.00 score

bioc

transcriptR: An Integrative Tool for ChIP- and RNA-Seq Based Primary Transcripts Detection and Quantification

The differences in the RNA types being sequenced have an impact on the resulting sequencing profiles. mRNA-seq data is enriched with reads derived from exons, while GRO-, nucRNA- and chrRNA-seq demonstrate substantially broader coverage of both exonic and intronic regions. The presence of intronic reads in GRO-seq-type data makes it possible to computationally identify and quantify all de novo continuous regions of transcription distributed across the genome. This type of data, however, is more challenging to interpret and less common in practice compared to mRNA-seq. One of the challenges for primary transcript detection concerns the simultaneous transcription of closely spaced genes, which needs to be properly divided into individually transcribed units. The R package transcriptR combines RNA-seq data with ChIP-seq data of histone modifications that mark active Transcription Start Sites (TSSs), such as H3K4me3 or H3K9/14Ac, to overcome this challenge. The advantage of this approach over the use of, for example, gene annotations is that it is data driven and therefore also able to deal with novel and case-specific events. Furthermore, the integration of ChIP- and RNA-seq data allows the identification of all known and novel active transcription start sites within a given sample.

Maintained by Armen R. Karapetyan. Last updated 5 months ago.

immunooncology transcription software sequencing rnaseq coverage

0.5 match 3.30 score 2 scripts

cmclean5

rSpectral: Spectral Modularity Clustering

Implements the network clustering algorithm described in Newman (2006) <doi:10.1103/PhysRevE.74.036104>. The complete iterative algorithm comprises two steps. In the first step, the network is expressed in terms of its leading eigenvalue and eigenvector and recursively partitioned into two communities. Partitioning occurs if the maximum positive eigenvalue is greater than the tolerance (10e-5) for the current partition, and if it results in a positive contribution to the Modularity. Given an initial separation using the leading-eigenvector step, 'rSpectral' then continues to maximise the change in Modularity using a fine-tuning step, or a variant thereof. The first stage here is to find the node which, when moved from one community to another, gives the maximum change in Modularity. This node's community is then fixed and we repeat the process until all nodes have been moved. The whole process is repeated from this new state until the change in Modularity, between the new and old state, is less than the predefined tolerance. A slight variant of the fine-tuning step, which can improve the speed of the calculation, is also provided. Instead of moving each node into each community in turn, we only consider moves of neighbouring nodes, found in different communities, to the community of the current node of interest. The two-step process is repeatedly applied to each new community found, subdividing each community into two new communities, until we are unable to find any division that results in a positive change in Modularity.

Maintained by Anatoly Sorokin. Last updated 2 years ago.

openblas cpp

0.5 match 1 star 3.18 score 9 scripts 1 dependent