R-universe search: clara

mmaechler

cluster:"Finding Groups in Data": Cluster Analysis Extended Rousseeuw et al.

Methods for cluster analysis. A much extended version of the original by Peter Rousseeuw, Anja Struyf and Mia Hubert, based on Kaufman and Rousseeuw (1990) "Finding Groups in Data".

Maintained by Martin Maechler. Last updated 5 days ago.

8.8 match 3 stars 11.98 score 14k scripts 2.2k dependents

clarahapp

MFPCA:Multivariate Functional Principal Component Analysis for Data Observed on Different Dimensional Domains

Calculate a multivariate functional principal component analysis for data observed on different dimensional domains. The estimation algorithm relies on univariate basis expansions for each element of the multivariate functional data (Happ & Greven, 2018) <doi:10.1080/01621459.2016.1273115>. Multivariate and univariate functional data objects are represented by S4 classes for this type of data implemented in the package 'funData'. For more details on the general concepts of both packages and a case study, see Happ-Kurz (2020) <doi:10.18637/jss.v093.i05>.

Maintained by Clara Happ-Kurz. Last updated 3 years ago.

fftw3

8.3 match 32 stars 6.89 score 203 scripts 4 dependents

clarahapp

funData:An S4 Class for Functional Data

S4 classes for univariate and multivariate functional data with utility functions. See <doi:10.18637/jss.v093.i05> for a detailed description of the package functionalities and its interplay with the MFPCA package for multivariate functional principal component analysis <https://CRAN.R-project.org/package=MFPCA>.

Maintained by Clara Happ-Kurz. Last updated 1 year ago.

8.5 match 14 stars 6.15 score 111 scripts 6 dependents

tidymodels

broom:Convert Statistical Objects into Tidy Tibbles

Summarizes key information about statistical objects in tidy tibbles. This makes it easy to report results, create plots and consistently work with large numbers of models at once. Broom provides three verbs that each provide different types of information about a model. tidy() summarizes information about model components such as coefficients of a regression. glance() reports information about an entire model, such as goodness of fit measures like AIC and BIC. augment() adds information about individual observations to a dataset, such as fitted values or influence measures.

Maintained by Simon Couch. Last updated 4 months ago.

modeling tidy-data

2.3 match 1.5k stars 21.56 score 37k scripts 1.4k dependents

musajajorge

makePalette:Make Palette

Functions that allow you to create your own color palette from an image, using mathematical algorithms.

Maintained by Jorge L. C. Musaja. Last updated 1 year ago.

clara color kmeans kmeans-algorithm kmeans-clustering kmeans-clustering-algorithm palette pam

12.8 match 5 stars 3.40 score 1 script

celehs

PheNorm:Unsupervised Gold-Standard Label Free Phenotyping Algorithm for EHR Data

The algorithm combines the most predictive variable, such as the count of the main International Classification of Diseases (ICD) codes, and other Electronic Health Record (EHR) features (e.g. health utilization and processed clinical note data), to obtain a score for accurate risk prediction and disease classification. In particular, it normalizes the surrogate to resemble a Gaussian mixture and leverages the remaining features through random corruption denoising. Background and details about the method can be found in Yu et al. (2018) <doi:10.1093/jamia/ocx111>.

Maintained by Clara-Lea Bonzel. Last updated 4 years ago.

8.2 match 5 stars 4.70 score 7 scripts

cantonfe

sitree:Single Tree Simulator

Framework to build an individual tree simulator.

Maintained by Clara Anton Fernandez. Last updated 1 year ago.

10.6 match 3.37 score 39 scripts 1 dependent

matthias-studer

WeightedCluster:Clustering of Weighted Data

Clusters state sequences and weighted data. It provides an optimized weighted PAM algorithm as well as functions for aggregating replicated cases, computing cluster quality measures for a range of clustering solutions and plotting (fuzzy) clusters of state sequences. Parametric bootstrap methods to validate typologies of sequences are also provided. Finally, it provides fuzzy and crisp CLARA algorithms to cluster large databases with sequence analysis.

Maintained by Matthias Studer. Last updated 3 months ago.

cpp

5.7 match 5.55 score 106 scripts 4 dependents
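Since CLARA is the search term here, the following is a minimal sketch of the general idea behind it: run a k-medoids (PAM-style) search on a few random subsamples, then keep the medoid set that gives the lowest total cost on the full data. It is written in Python purely for illustration and is not the implementation of any package listed on this page. The function names (`total_cost`, `pam`, `clara`), the naive initialisation, and the default sample size are all assumptions made for the example.

```python
import random


def total_cost(points, medoids, dist):
    """Sum of each point's distance to its nearest medoid."""
    return sum(min(dist(p, m) for m in medoids) for p in points)


def pam(points, k, dist):
    """Naive PAM-style swap search; intended for small inputs only."""
    # Assumption: start from the first k points instead of a BUILD step.
    medoids = list(points[:k])
    best = total_cost(points, medoids, dist)
    improved = True
    while improved:
        improved = False
        for i in range(k):
            for cand in points:
                if cand in medoids:
                    continue
                # Try replacing the i-th medoid with the candidate point.
                trial = medoids[:i] + [cand] + medoids[i + 1:]
                cost = total_cost(points, trial, dist)
                if cost < best:
                    medoids, best, improved = trial, cost, True
    return medoids


def clara(points, k, dist, n_samples=5, sample_size=None, seed=0):
    """Cluster subsamples with PAM and keep the best medoids overall."""
    rng = random.Random(seed)
    if sample_size is None:
        # Illustrative default; capped at the size of the data set.
        sample_size = min(len(points), 40 + 2 * k)
    best_medoids, best_cost = None, float("inf")
    for _ in range(n_samples):
        sample = rng.sample(points, sample_size)
        medoids = pam(sample, k, dist)
        # Medoids found on the sample are evaluated on the full data.
        cost = total_cost(points, medoids, dist)
        if cost < best_cost:
            best_medoids, best_cost = medoids, cost
    return best_medoids, best_cost
```

The swap search only ever sees `sample_size` points at a time, which is what keeps the approach affordable on large data; the price is that the result depends on the random samples drawn.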

nanxstats

ggsci:Scientific Journal and Sci-Fi Themed Color Palettes for 'ggplot2'

A collection of 'ggplot2' color palettes inspired by plots in scientific journals, data visualization libraries, science fiction movies, and TV shows.

Maintained by Nan Xiao. Last updated 9 months ago.

color-palettes data-visualization ggplot2 ggsci sci-fi scientific-journals visualization

1.6 match 680 stars 18.00 score 26k scripts 438 dependents

moderndive

moderndive:Tidyverse-Friendly Introductory Linear Regression

Datasets and wrapper functions for tidyverse-friendly introductory linear regression, used in "Statistical Inference via Data Science: A ModernDive into R and the Tidyverse" available at <https://moderndive.com/>.

Maintained by Albert Y. Kim. Last updated 3 months ago.

1.5 match 88 stars 11.35 score 1.8k scripts

bioc

zinbwave:Zero-Inflated Negative Binomial Model for RNA-Seq Data

Implements a general and flexible zero-inflated negative binomial model that can be used to provide a low-dimensional representation of single-cell RNA-seq data. The model accounts for zero inflation (dropouts), over-dispersion, and the count nature of the data. The model also accounts for the difference in library sizes and optionally for batch effects and/or other covariates, avoiding the need to pre-normalize the data.

Maintained by Davide Risso. Last updated 5 months ago.

immunooncology dimensionreduction geneexpression rnaseq software transcriptomics sequencing singlecell

1.6 match 43 stars 10.53 score 190 scripts 6 dependents

cantonfe

sitreeE:Sitree Extensions

Provides extensions for package 'sitree' for allometric variables, growth, mortality, recruitment, management, tree removal and external modifiers functions.

Maintained by Clara Anton Fernandez. Last updated 3 years ago.

8.5 match 1.95 score 18 scripts

roelandkindt

BiodiversityR:Package for Community Ecology and Suitability Analysis

Graphical User Interface (via the R-Commander) and utility functions (often based on the vegan package) for statistical analysis of biodiversity and ecological communities, including species accumulation curves, diversity indices, Renyi profiles, GLMs for analysis of species abundance and presence-absence, distance matrices, Mantel tests, and cluster, constrained and unconstrained ordination analysis. A book on biodiversity and community ecology analysis is available for free download from the website. In 2012, methods for (ensemble) suitability modelling and mapping were expanded in the package.

Maintained by Roeland Kindt. Last updated 2 months ago.

2.0 match 16 stars 7.42 score 390 scripts 2 dependents

integrated-inferences

CausalQueries:Make, Update, and Query Binary Causal Models

Users can declare causal models over binary nodes, update beliefs about causal types given data, and calculate arbitrary queries. Updating is implemented in 'stan'. See Humphreys and Jacobs, 2023, Integrated Inferences (<DOI: 10.1017/9781316718636>) and Pearl, 2009 Causality (<DOI:10.1017/CBO9780511803161>).

Maintained by Till Tietz. Last updated 23 days ago.

bayes causal dags mixedmethods stan cpp

1.5 match 27 stars 9.03 score 54 scripts

openpharma

crmPack:Object-Oriented Implementation of CRM Designs

Implements a wide range of model-based dose escalation designs, ranging from classical and modern continual reassessment methods (CRMs) based on dose-limiting toxicity endpoints to dual-endpoint designs taking into account a biomarker/efficacy outcome. The focus is on Bayesian inference, making it very easy to set up a new design with its own JAGS code. However, it is also possible to implement 3+3 designs for comparison or models with non-Bayesian estimation. The whole package is written in a modular form in the S4 class system, making it very flexible for adaptation to new models, escalation or stopping rules. Further details are presented in Sabanes Bove et al. (2019) <doi:10.18637/jss.v089.i10>.

Maintained by Daniel Sabanes Bove. Last updated 2 months ago.

jags cpp

1.5 match 21 stars 7.79 score 208 scripts

mattmar

rasterdiv:Diversity Indices for Numerical Matrices

Provides methods to calculate diversity indices on numerical matrices based on information theory, as described in Rocchini, Marcantonio and Ricotta (2017) <doi:10.1016/j.ecolind.2016.07.039>, and Rocchini et al. (2021) <doi:10.1101/2021.01.23.427872>.

Maintained by Matteo Marcantonio. Last updated 20 days ago.

1.5 match 15 stars 7.65 score 44 scripts 1 dependent

biorgeo

bioregion:Comparison of Bioregionalisation Methods

The main purpose of this package is to propose a transparent methodological framework to compare bioregionalisation methods based on hierarchical and non-hierarchical clustering algorithms (Kreft & Jetz (2010) <doi:10.1111/j.1365-2699.2010.02375.x>) and network algorithms (Lenormand et al. (2019) <doi:10.1002/ece3.4718> and Leroy et al. (2019) <doi:10.1111/jbi.13674>).

Maintained by Maxime Lenormand. Last updated 11 days ago.

biogeography bioregion bioregionalization cpp

1.9 match 7 stars 6.27 score 11 scripts

beanumber

teamcolors:Color Palettes for Pro Sports Teams

Provides color palettes corresponding to professional and amateur sports teams. These can be useful in creating data graphics that are themed for particular teams.

Maintained by Benjamin S. Baumer. Last updated 5 months ago.

1.6 match 48 stars 6.46 score 202 scripts

declaredesign

DesignLibrary:Library of Research Designs

A simple interface to build designs using the package 'DeclareDesign'. In one line of code, users can specify the parameters of individual designs and diagnose their properties. The designers can also be used to compare performance of a given design across a range of combinations of parameters, such as effect size, sample size, and assignment probabilities.

Maintained by Jasper Cooper. Last updated 1 month ago.

1.6 match 30 stars 6.30 score 144 scripts

cran

sparsevb:Spike-and-Slab Variational Bayes for Linear and Logistic Regression

Implements variational Bayesian algorithms to perform scalable variable selection for sparse, high-dimensional linear and logistic regression models. Features include a novel prioritized updating scheme, which uses a preliminary estimator of the variational means during initialization to generate an updating order prioritizing large, more relevant, coefficients. Sparsity is induced via spike-and-slab priors with either Laplace or Gaussian slabs. By default, the heavier-tailed Laplace density is used. Formal derivations of the algorithms and asymptotic consistency results may be found in Kolyan Ray and Botond Szabo (JASA 2020) and Kolyan Ray, Botond Szabo, and Gabriel Clara (NeurIPS 2020).

Maintained by Gabriel Clara. Last updated 2 months ago.

openblas cpp openmp

9.7 match 1.00 score 10 scripts

bioc

kissDE:Retrieves Condition-Specific Variants in RNA-Seq Data

Retrieves condition-specific variants in RNA-seq data (SNVs, alternative splicing events, indels). It has been developed as a post-treatment of 'KisSplice' but can also be used with the user's own data.

Maintained by Aurélie Siberchicot. Last updated 5 months ago.

alternativesplicing differentialsplicing experimentaldesign genomicvariation rnaseq transcriptomics

1.5 match 3 stars 5.98 score 7 scripts

awkena

qrlabelr:Generate Machine- And Human-Readable Plot Labels for Experiments

A no-frills open-source solution for designing plot labels affixed with QR codes. It features 'EasyQrlabelr', a 'BrAPI-compliant' 'shiny' app that simplifies the process of plot label design for non-R users. It builds on the methods described by Wu 'et al.' (2020) <doi:10.1111/2041-210X.13405>.

Maintained by Alexander Wireko Kena. Last updated 24 days ago.

experiments field-plots labels-generator qr-code shiny-apps

1.5 match 18 stars 5.94 score 16 scripts

eleonoraarnone

fdaPDE:Physics-Informed Spatial and Functional Data Analysis

An implementation of regression models with partial differential regularizations, making use of the Finite Element Method. The models efficiently handle data distributed over irregularly shaped domains and can comply with various conditions at the boundaries of the domain. A priori information about the spatial structure of the phenomenon under study can be incorporated in the model via the differential regularization. See Sangalli, L. M. (2021) <doi:10.1111/insr.12444> "Spatial Regression With Partial Differential Equation Regularisation" for an overview. Release 1.1-9 requires R (>= 4.2.0) to be installed on Windows machines.

Maintained by Eleonora Arnone. Last updated 2 months ago.

cpp

1.5 match 1 star 3.73 score 267 scripts

jrmccombs

RHPCBenchmark:Benchmarks for High-Performance Computing Environments

Microbenchmarks for determining the run time performance of aspects of the R programming environment and packages relevant to high-performance computation. The benchmarks are divided into three categories: dense matrix linear algebra kernels, sparse matrix linear algebra kernels, and machine learning functionality.

Maintained by James McCombs. Last updated 8 years ago.

1.7 match 3.02 score 21 scripts

mikldk

disclapmix:Discrete Laplace Mixture Inference using the EM Algorithm

Make inference in a mixture of discrete Laplace distributions using the EM algorithm. This can e.g. be used for modelling the distribution of Y chromosomal haplotypes as described in [1, 2] (refer to the URL section).

Maintained by Mikkel Meyer Andersen. Last updated 2 years ago.

cpp

1.1 match 4.32 score 14 scripts

chrhennig

fpc:Flexible Procedures for Clustering

Various methods for clustering and cluster validation. Fixed point clustering. Linear regression clustering. Clustering by merging Gaussian mixture components. Symmetric and asymmetric discriminant projections for visualisation of the separation of groupings. Cluster validation statistics for distance based clustering including corrected Rand index. Standardisation of cluster validation statistics by random clusterings and comparison between many clustering methods and numbers of clusters based on this. Cluster-wise cluster stability assessment. Methods for estimation of the number of clusters: Calinski-Harabasz, Tibshirani and Walther's prediction strength, Fang and Wang's bootstrap stability. Gaussian/multinomial mixture fitting for mixed continuous/categorical variables. Variable-wise statistics for cluster interpretation. DBSCAN clustering. Interface functions for many clustering methods implemented in R, including estimating the number of clusters with kmeans, pam and clara. Modality diagnosis for Gaussian mixtures. For an overview see package?fpc.

Maintained by Christian Hennig. Last updated 6 months ago.

0.5 match 11 stars 9.25 score 2.6k scripts 70 dependents

ybrugnara

dataresqc:C3S Quality Control Tools for Historical Climate Data

Quality control and formatting tools developed for the Copernicus Data Rescue Service. The package includes functions to handle the Station Exchange Format (SEF), various statistical tests for climate data at daily and sub-daily resolution, as well as functions to plot the data. For more information and documentation see <https://datarescue.climate.copernicus.eu/st_data-quality-control>.

Maintained by Yuri Brugnara. Last updated 2 years ago.

1.6 match 2.70 score 7 scripts

ligophorus

FuzzyQ:Fuzzy Quantification of Common and Rare Species

Fuzzy clustering of species in an ecological community as common or rare based on their abundance and occupancy. It also includes functions to compute confidence intervals of classification metrics and plot results. See Balbuena et al. (2020, <doi:10.1101/2020.08.12.247502>).

Maintained by Juan A. Balbuena. Last updated 4 years ago.

1.5 match 2.70 score 7 scripts

laylaparast

OSsurvival:Assessing Surrogacy with a Censored Outcome

Identifies the optimal transformation of a surrogate marker and estimates the proportion of treatment explained (PTE) by the optimally-transformed surrogate at an earlier time point when the primary outcome of interest is a censored time-to-event outcome; details are described in Wang et al. (2021) <doi:10.1002/sim.9185>.

Maintained by Layla Parast. Last updated 2 years ago.

1.6 match 1.00 score 3 scripts

jdmde

parallelpam:Parallel Partitioning-Around-Medoids (PAM) for Big Sets of Data

Application of the Partitioning-Around-Medoids (PAM) clustering algorithm described in Schubert, E. and Rousseeuw, P.J.: "Fast and eager k-medoids clustering: O(k) runtime improvement of the PAM, CLARA, and CLARANS algorithms." Information Systems, vol. 101, p. 101804, (2021). <doi:10.1016/j.is.2021.101804>. It uses a binary format for storing and retrieving matrices developed for the 'jmatrix' package, but the functionality of 'jmatrix' is included here, so you do not need to install it. It is also used by the package 'scellpam', so if you have installed 'scellpam', you do not need to install this package. PAM can be applied to data sets whose dissimilarity matrix is very large; it has been tested with up to 100,000 points. This is possible thanks to code developed for another package, 'jmatrix', which does not load the matrix into 'R' memory (which would force it to be of double type) but reads it from disk, allowing the use of float (or even smaller data types). Moreover, the dissimilarity matrix is calculated in parallel, using multiple threads, if the computer has several cores. The initial part of the PAM algorithm can be done with the BUILD or LAB algorithms; the BUILD algorithm has been implemented in parallel. The optimization phase implements the FastPAM1 algorithm, also in parallel. Finally, calculation of the silhouette is available and also implemented in parallel.

Maintained by Juan Domingo. Last updated 8 months ago.

cpp

0.5 match 2.60 score 6 scripts
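The silhouette mentioned in the description above can be illustrated with a short, serial sketch. For each point it compares the mean distance to its own cluster (a) with the smallest mean distance to any other cluster (b) and reports (b - a) / max(a, b). The code is Python for illustration only and is not taken from 'parallelpam'; the function name `silhouette_widths` and the zero value for singleton clusters are assumptions of this example.

```python
def silhouette_widths(points, labels, dist):
    """Per-point silhouette widths; requires at least two clusters."""
    # Group point indices by cluster label.
    clusters = {}
    for idx, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(idx)

    widths = []
    for i, lab in enumerate(labels):
        own = clusters[lab]
        if len(own) == 1:
            # Assumed convention: a singleton cluster gets width 0.
            widths.append(0.0)
            continue
        # a: mean distance to the other members of the same cluster.
        a = sum(dist(points[i], points[j]) for j in own if j != i) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster.
        b = min(
            sum(dist(points[i], points[j]) for j in members) / len(members)
            for other, members in clusters.items()
            if other != lab
        )
        denom = max(a, b)
        widths.append((b - a) / denom if denom > 0 else 0.0)
    return widths
```

Values near 1 indicate a point that sits well inside its cluster, values near 0 a point on the border between two clusters, and negative values a point that is on average closer to another cluster.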