Showing 200 of total 402 results (show query)

tidyverse

dplyr:A Grammar of Data Manipulation

A fast, consistent tool for working with data frame like objects, both in memory and out of memory.

Maintained by Hadley Wickham. Last updated 13 days ago.

data-manipulationgrammarcpp

5.3 match 4.8k stars 24.68 score 659k scripts 7.8k dependents

bioc

mixOmics:Omics Data Integration Project

Multivariate methods are well suited to large omics data sets where the number of variables (e.g. genes, proteins, metabolites) is much larger than the number of samples (patients, cells, mice). They have the appealing properties of reducing the dimension of the data by using instrumental variables (components), which are defined as combinations of all variables. Those components are then used to produce useful graphical outputs that enable better understanding of the relationships and correlation structures between the different data sets that are integrated. mixOmics offers a wide range of multivariate methods for the exploration and integration of biological datasets with a particular focus on variable selection. The package proposes several sparse multivariate models we have developed to identify the key variables that are highly correlated, and/or explain the biological outcome of interest. The data that can be analysed with mixOmics may come from high throughput sequencing technologies, such as omics data (transcriptomics, metabolomics, proteomics, metagenomics etc) but also beyond the realm of omics (e.g. spectral imaging). The methods implemented in mixOmics can also handle missing values without having to delete entire rows with missing data. A non exhaustive list of methods include variants of generalised Canonical Correlation Analysis, sparse Partial Least Squares and sparse Discriminant Analysis. Recently we implemented integrative methods to combine multiple data sets: N-integration with variants of Generalised Canonical Correlation Analysis and P-integration with variants of multi-group Partial Least Squares.

Maintained by Eva Hamrud. Last updated 4 days ago.

immunooncologymicroarraysequencingmetabolomicsmetagenomicsproteomicsgenepredictionmultiplecomparisonclassificationregressionbioconductorgenomicsgenomics-datagenomics-visualizationmultivariate-analysismultivariate-statisticsomicsr-pkgr-project

4.0 match 182 stars 13.71 score 1.3k scripts 22 dependents

viadee

localICE:Local Individual Conditional Expectation

Local Individual Conditional Expectation ('localICE') is a local explanation approach from the field of eXplainable Artificial Intelligence (XAI). localICE is a model-agnostic XAI approach which provides three-dimensional local explanations for particular data instances. The approach is proposed in the master thesis of Martin Walter as an extension to ICE (see Reference). The three dimensions are the two features at the horizontal and vertical axes as well as the target represented by different colors. The approach is applicable for classification and regression problems to explain interactions of two features towards the target. For classification models, the number of classes can be more than two and each class is added as a different color to the plot. The given instance is added to the plot as two dotted lines according to the feature values. The localICE-package can explain features of type factor and numeric of any machine learning model. Automatically supported machine learning packages are 'mlr', 'randomForest', 'caret' or all other with an S3 predict function. For further model types from other libraries, a predict function has to be provided as an argument in order to get access to the model. Reference to the ICE approach: Alex Goldstein, Adam Kapelner, Justin Bleich, Emil Pitkin (2013) <arXiv:1309.6392>.

Maintained by Martin Walter. Last updated 5 years ago.

aiexplainable-aiggplotmachine-learningvisualization

8.4 match 7 stars 3.54 score 3 scripts

bioc

MatrixQCvis:Shiny-based interactive data-quality exploration for omics data

Data quality assessment is an integral part of preparatory data analysis to ensure sound biological information retrieval. We present here the MatrixQCvis package, which provides shiny-based interactive visualization of data quality metrics at the per-sample and per-feature level. It is broadly applicable to quantitative omics data types that come in matrix-like format (features x samples). It enables the detection of low-quality samples, drifts, outliers and batch effects in data sets. Visualizations include amongst others bar- and violin plots of the (count/intensity) values, mean vs standard deviation plots, MA plots, empirical cumulative distribution function (ECDF) plots, visualizations of the distances between samples, and multiple types of dimension reduction plots. Furthermore, MatrixQCvis allows for differential expression analysis based on the limma (moderated t-tests) and proDA (Wald tests) packages. MatrixQCvis builds upon the popular Bioconductor SummarizedExperiment S4 class and enables thus the facile integration into existing workflows. The package is especially tailored towards metabolomics and proteomics mass spectrometry data, but also allows to assess the data quality of other data types that can be represented in a SummarizedExperiment object.

Maintained by Thomas Naake. Last updated 5 months ago.

visualizationshinyappsguiqualitycontroldimensionreductionmetabolomicsproteomicstranscriptomics

5.2 match 4.74 score 4 scripts

growthcharts

brokenstick:Broken Stick Model for Irregular Longitudinal Data

Data on multiple individuals through time are often sampled at times that differ between persons. Irregular observation times can severely complicate the statistical analysis of the data. The broken stick model approximates each subject’s trajectory by one or more connected line segments. The times at which segments connect (breakpoints) are identical for all subjects and under control of the user. A well-fitting broken stick model effectively transforms individual measurements made at irregular times into regular trajectories with common observation times. Specification of the model requires three variables: time, measurement and subject. The model is a special case of the linear mixed model, with time as a linear B-spline and subject as the grouping factor. The main assumptions are: subjects are exchangeable, trajectories between consecutive breakpoints are straight, random effects follow a multivariate normal distribution, and unobserved data are missing at random. The package contains functions for fitting the broken stick model to data, for predicting curves in new data and for plotting broken stick estimates. The package supports two optimization methods, and includes options to structure the variance-covariance matrix of the random effects. The analyst may use the software to smooth growth curves by a series of connected straight lines, to align irregularly observed curves to a common time grid, to create synthetic curves at a user-specified set of breakpoints, to estimate the time-to-time correlation matrix and to predict future observations. See <doi:10.18637/jss.v106.i07> for additional documentation on background, methodology and applications.

Maintained by Stef van Buuren. Last updated 2 years ago.

b-splinegrowth-curveslinear-mixed-modelslongitudinal-data

3.2 match 9 stars 5.33 score 12 scripts

r-forge

stops:Structure Optimized Proximity Scaling

Methods that use flexible variants of multidimensional scaling (MDS) which incorporate parametric nonlinear distance transformations and trade-off the goodness-of-fit fit with structure considerations to find optimal hyperparameters, also known as structure optimized proximity scaling (STOPS) (Rusch, Mair & Hornik, 2023,<doi:10.1007/s11222-022-10197-w>). The package contains various functions, wrappers, methods and classes for fitting, plotting and displaying different 1-way MDS models with ratio, interval, ordinal optimal scaling in a STOPS framework. These cover essentially the functionality of the package smacofx, including Torgerson (classical) scaling with power transformations of dissimilarities, SMACOF MDS with powers of dissimilarities, Sammon mapping with powers of dissimilarities, elastic scaling with powers of dissimilarities, spherical SMACOF with powers of dissimilarities, (ALSCAL) s-stress MDS with powers of dissimilarities, r-stress MDS, MDS with powers of dissimilarities and configuration distances, elastic scaling powers of dissimilarities and configuration distances, Sammon mapping powers of dissimilarities and configuration distances, power stress MDS (POST-MDS), approximate power stress, Box-Cox MDS, local MDS, Isomap, curvilinear component analysis (CLCA), curvilinear distance analysis (CLDA) and sparsified (power) multidimensional scaling and (power) multidimensional distance analysis (experimental models from smacofx influenced by CLCA). All of these models can also be fit by optimizing over hyperparameters based on goodness-of-fit fit only (i.e., no structure considerations). The package further contains functions for optimization, specifically the adaptive Luus-Jaakola algorithm and a wrapper for Bayesian optimization with treed Gaussian process with jumps to linear models, and functions for various c-structuredness indices.

Maintained by Thomas Rusch. Last updated 2 months ago.

openjdk

3.0 match 1 stars 4.48 score 23 scripts

mrc-ide

odin2:Next generation odin

Temporary package for rewriting odin.

Maintained by Rich FitzJohn. Last updated 2 months ago.

2.0 match 5 stars 6.32 score 22 scripts

lleisong

itsdm:Isolation Forest-Based Presence-Only Species Distribution Modeling

Collection of R functions to do purely presence-only species distribution modeling with isolation forest (iForest) and its variations such as Extended isolation forest and SCiForest. See the details of these methods in references: Liu, F.T., Ting, K.M. and Zhou, Z.H. (2008) <doi:10.1109/ICDM.2008.17>, Hariri, S., Kind, M.C. and Brunner, R.J. (2019) <doi:10.1109/TKDE.2019.2947676>, Liu, F.T., Ting, K.M. and Zhou, Z.H. (2010) <doi:10.1007/978-3-642-15883-4_18>, Guha, S., Mishra, N., Roy, G. and Schrijvers, O. (2016) <https://proceedings.mlr.press/v48/guha16.html>, Cortes, D. (2021) <arXiv:2110.13402>. Additionally, Shapley values are used to explain model inputs and outputs. See details in references: Shapley, L.S. (1953) <doi:10.1515/9781400881970-018>, Lundberg, S.M. and Lee, S.I. (2017) <https://dl.acm.org/doi/abs/10.5555/3295222.3295230>, Molnar, C. (2020) <ISBN:978-0-244-76852-2>, Štrumbelj, E. and Kononenko, I. (2014) <doi:10.1007/s10115-013-0679-x>. itsdm also provides functions to diagnose variable response, analyze variable importance, draw spatial dependence of variables and examine variable contribution. As utilities, the package includes a few functions to download bioclimatic variables including 'WorldClim' version 2.0 (see Fick, S.E. and Hijmans, R.J. (2017) <doi:10.1002/joc.5086>) and 'CMCC-BioClimInd' (see Noce, S., Caporaso, L. and Santini, M. (2020) <doi:10.1038/s41597-020-00726-5>.

Maintained by Lei Song. Last updated 2 years ago.

isolation-forestoutlier-detectionpresence-onlymodelshapley-valuespecies-distribution-modelling

2.2 match 4 stars 5.59 score 65 scripts

jbgruber

askgpt:Asking GPT About R Stuff

A chat package connecting to API endpoints by 'OpenAI' (<https://platform.openai.com/>) to answer questions (about R).

Maintained by Johannes Gruber. Last updated 10 months ago.

2.0 match 56 stars 5.68 score 17 scripts

kapelner

GreedyExperimentalDesign:Greedy Experimental Design Construction

Computes experimental designs for a two-arm experiment with covariates via a number of methods: (0) complete randomization and randomization with forced-balance, (1) Greedily optimizing a balance objective function via pairwise switching. This optimization provides lower variance for the treatment effect estimator (and higher power) while preserving a design that is close to complete randomization. We return all iterations of the designs for use in a permutation test, (2) The second is via numerical optimization (via 'gurobi' which must be installed, see <https://www.gurobi.com/documentation/9.1/quickstart_windows/r_ins_the_r_package.html>) a la Bertsimas and Kallus, (3) rerandomization, (4) Karp's method for one covariate, (5) exhaustive enumeration to find the optimal solution (only for small sample sizes), (6) Binary pair matching using the 'nbpMatching' library, (7) Binary pair matching plus design number (1) to further optimize balance, (8) Binary pair matching plus design number (3) to further optimize balance, (9) Hadamard designs, (10) Simultaneous Multiple Kernels. In (1-9) we allow for three objective functions: Mahalanobis distance, Sum of absolute differences standardized and Kernel distances via the 'kernlab' library. This package is the result of a stream of research that can be found in Krieger, A, Azriel, D and Kapelner, A "Nearly Random Designs with Greatly Improved Balance" (2016) <arXiv:1612.02315>, Krieger, A, Azriel, D and Kapelner, A "Better Experimental Design by Hybridizing Binary Matching with Imbalance Optimization" (2021) <arXiv:2012.03330>.

Maintained by Adam Kapelner. Last updated 12 days ago.

cppopenjdk

1.7 match 4.16 score 16 scripts 1 dependents

ataher76

aLBI:Estimating Length-Based Indicators for Fish Stock

Provides tools for estimating length-based indicators from length frequency data to assess fish stock status and manage fisheries sustainably. Implements methods from Cope and Punt (2009) <doi:10.1577/C08-025.1> for data-limited stock assessment and Froese (2004) <doi:10.1111/j.1467-2979.2004.00144.x> for detecting overfishing using simple indicators. Key functions include: FrequencyTable(): Calculate the frequency table from the collected and also the extract the length frequency data from the frequency table with the upper length_range. A numeric value specifying the bin width for class intervals. If not provided, the bin width is automatically calculated using Sturges (1926) <doi:10.1080/01621459.1926.10502161> formula. CalPar(): Calculates various lengths used in fish stock assessment as biological length indicators such as asymptotic length (Linf), maximum length (Lmax), length at sexual maturity (Lm), and optimal length (Lopt). FishPar(): Calculates length-based indicators (LBIs) proposed by Froese (2004) <doi:10.1111/j.1467-2979.2004.00144.x> such as the percentage of mature fish (Pmat), percentage of optimal length fish (Popt), percentage of mega spawners (Pmega), and the sum of these as Pobj. This function also estimates confidence intervals for different lengths, visualizes length frequency distributions, and provides data frames containing calculated values. FishSS(): Makes decisions based on input from Cope and Punt (2009) <doi:10.1577/C08-025.1> and parameters calculated by FishPar() (e.g., Pobj, Pmat, Popt, LM_ratio) to determine stock status as target spawning biomass (TSB40) and limit spawning biomass (LSB25). These tools support fisheries management decisions by providing robust, data-driven insights.

Maintained by Ataher Ali. Last updated 4 months ago.

1.5 match 1 stars 4.60 score 7 scripts