Showing 26 of 26 results
ropensci
skimr: Compact and Flexible Summaries of Data
A simple to use summary function that can be used with pipes and displays nicely in the console. The default summary statistics may be modified by the user as can the default formatting. Support for data frames and vectors is included, and users can implement their own skim methods for specific object types as described in a vignette. Default summaries include support for inline spark graphs. Instructions for managing these on specific operating systems are given in the "Using skimr" vignette and the README.
Maintained by Elin Waring. Last updated 2 months ago.
peer-reviewed, ropensci, summary-statistics, unconf, unconf17
1.1k stars 16.80 score 18k scripts 14 dependents
mlr-org
mlr3: Machine Learning in R - Next Generation
Efficient, object-oriented programming on the building blocks of machine learning. Provides 'R6' objects for tasks, learners, resamplings, and measures. The package is geared towards scalability and larger datasets by supporting parallelization and out-of-memory data-backends like databases. While 'mlr3' focuses on the core computational operations, add-on packages provide additional functionality.
Maintained by Marc Becker. Last updated 18 days ago.
classification, data-science, machine-learning, mlr3, regression
972 stars 14.86 score 2.3k scripts 35 dependents
friendly
matlib: Matrix Functions for Teaching and Learning Linear Algebra and Multivariate Statistics
A collection of matrix functions for teaching and learning matrix linear algebra as used in multivariate statistical methods. Many of these functions are designed for tutorial purposes in learning matrix algebra ideas using R. In some cases, functions are provided for concepts available elsewhere in R, but where the function call or name is not obvious. In other cases, functions are provided to show or demonstrate an algorithm. In addition, a collection of functions are provided for drawing vector diagrams in 2D and 3D and for rendering matrix expressions and equations in LaTeX.
Maintained by Michael Friendly. Last updated 15 days ago.
diagrams, linear-equations, matrix, matrix-functions, matrix-visualizer, vector, vignette
65 stars 12.89 score 900 scripts 11 dependents
tidyverse
multidplyr: A Multi-Process 'dplyr' Backend
Partition a data frame across multiple worker processes to provide simple multicore parallelism.
Maintained by Hadley Wickham. Last updated 8 months ago.
645 stars 10.82 score 460 scripts 5 dependents
nacnudus
unpivotr: Unpivot Complex and Irregular Data Layouts
Tools for converting data from complex or irregular layouts to a columnar structure. For example, tables with multilevel column or row headers, or spreadsheets. Header and data cells are selected by their contents and position, as well as formatting and comments where available, and are associated with one another by their proximity in given directions. Functions for data frames and HTML tables are provided.
Maintained by Duncan Garmonsway. Last updated 2 months ago.
186 stars 10.35 score 368 scripts 3 dependents
ludvigolsen
groupdata2: Creating Groups from Data
Methods for dividing data into groups. Create balanced partitions and cross-validation folds. Perform time series windowing and general grouping and splitting of data. Balance existing groups with up- and downsampling or collapse them to fewer groups.
Maintained by Ludvig Renbo Olsen. Last updated 3 months ago.
balance, cross-validation, data, data-frame, fold, group-factor, groups, participants, partition, split, staircase
27 stars 9.04 score 338 scripts 7 dependents
mayer79
splitTools: Tools for Data Splitting
Fast, lightweight toolkit for data splitting. Data sets can be partitioned into disjoint groups (e.g. into training, validation, and test) or into (repeated) k-folds for subsequent cross-validation. Besides basic splits, the package supports stratified, grouped as well as blocked splitting. Furthermore, cross-validation folds for time series data can be created. See e.g. Hastie et al. (2001) <doi:10.1007/978-0-387-84858-7> for the basic background on data partitioning and cross-validation.
Maintained by Michael Mayer. Last updated 1 month ago.
cross-validation, machine-learning, time-series, validation
13 stars 8.15 score 169 scripts 4 dependents
polmine
polmineR: Verbs and Nouns for Corpus Analysis
Package for corpus analysis using the Corpus Workbench ('CWB', <https://cwb.sourceforge.io>) as an efficient back end for indexing and querying large corpora. The package offers functionality to flexibly create subcorpora and to carry out basic statistical operations (count, co-occurrences etc.). The original full text of documents can be reconstructed and inspected at any time. Beyond that, the package is intended to serve as an interface to packages implementing advanced statistical procedures. Respective data structures (document-term matrices, term-co-occurrence matrices etc.) can be created based on the indexed corpora.
Maintained by Andreas Blaette. Last updated 1 year ago.
49 stars 7.96 score 311 scripts
cwatson
brainGraph: Graph Theory Analysis of Brain MRI Data
A set of tools for performing graph theory analysis of brain MRI data. It works with data from a Freesurfer analysis (cortical thickness, volumes, local gyrification index, surface area), diffusion tensor tractography data (e.g., from FSL) and resting-state fMRI data (e.g., from DPABI). It contains a graphical user interface for graph visualization and data exploration, along with several functions for generating useful figures.
Maintained by Christopher G. Watson. Last updated 1 year ago.
brain-connectivity, brain-imaging, complex-networks, connectome, connectomics, fmri, graph-theory, mri, network-analysis, neuroimaging, neuroscience, statistics, tractography
188 stars 7.86 score 107 scripts 3 dependents
uscbiostats
partition: Agglomerative Partitioning Framework for Dimension Reduction
A fast and flexible framework for agglomerative partitioning. 'partition' uses an approach called Direct-Measure-Reduce to create new variables that maintain the user-specified minimum level of information. Each reduced variable is also interpretable: the original variables map to one and only one variable in the reduced data set. 'partition' is flexible, as well: how variables are selected to reduce, how information loss is measured, and the way data is reduced can all be customized. 'partition' is based on the Partition framework discussed in Millstein et al. (2020) <doi:10.1093/bioinformatics/btz661>.
Maintained by Malcolm Barrett. Last updated 5 months ago.
data-reduction, dimensionality-reduction, partitional-clustering, openblas, cpp
36 stars 7.72 score 27 scripts 1 dependent
rbgramacy
tgp: Bayesian Treed Gaussian Process Models
Bayesian nonstationary, semiparametric nonlinear regression and design by treed Gaussian processes (GPs) with jumps to the limiting linear model (LLM). Special cases also implemented include Bayesian linear models, CART, treed linear models, stationary separable and isotropic GPs, and GP single-index models. Provides 1-d and 2-d plotting functions (with projection and slice capabilities) and tree drawing, designed for visualization of tgp-class output. Sensitivity analysis and multi-resolution models are supported. Sequential experimental design and adaptive sampling functions are also provided, including ALM, ALC, and expected improvement. The latter supports derivative-free optimization of noisy black-box functions. For details and tutorials, see Gramacy (2007) <doi:10.18637/jss.v019.i09> and Gramacy & Taddy (2010) <doi:10.18637/jss.v033.i06>.
Maintained by Robert B. Gramacy. Last updated 7 months ago.
9 stars 7.36 score 203 scripts 12 dependents
mjskay
ggblend: Blending and Compositing Algebra for 'ggplot2'
Algebra of operations for blending, copying, adjusting, and compositing layers in 'ggplot2'. Supports copying and adjusting the aesthetics or parameters of an existing layer, partitioning a layer into multiple pieces for re-composition, applying affine transformations to layers, and combining layers (or partitions of layers) using blend modes (including commutative blend modes, like multiply and darken). Blend mode support is particularly useful for creating plots with overlapping groups where the layer drawing order does not change the output; see Kindlmann and Scheidegger (2014) <doi:10.1109/TVCG.2014.2346325>.
Maintained by Matthew Kay. Last updated 2 years ago.
186 stars 6.30 score 71 scripts 1 dependent
beerda
nuggets: Extensible Data Pattern Searching Framework
Extensible framework for subgroup discovery (Atzmueller (2015) <doi:10.1002/widm.1144>), contrast patterns (Chen (2022) <doi:10.48550/arXiv.2209.13556>), emerging patterns (Dong (1999) <doi:10.1145/312129.312191>), association rules (Agrawal (1994) <https://www.vldb.org/conf/1994/P487.PDF>) and conditional correlations (Hájek (1978) <doi:10.1007/978-3-642-66943-9>). Both crisp (Boolean, binary) and fuzzy data are supported. It generates conditions in the form of elementary conjunctions, evaluates them on a dataset and checks the induced sub-data for interesting statistical properties. A user-defined function may be supplied and evaluated on each generated condition to search for custom patterns.
Maintained by Michal Burda. Last updated 17 days ago.
association-rule-mining, contrast-pattern-mining, data-mining, fuzzy, knowledge-discovery, pattern-recognition, cpp, openmp
2 stars 5.38 score 10 scripts
simonmoulds
lulcc: Land Use Change Modelling in R
Classes and methods for spatially explicit land use change modelling in R.
Maintained by Simon Moulds. Last updated 5 years ago.
41 stars 5.37 score 38 scripts
waternumbers
anomalous: Anomaly Detection using the CAPA and PELT Algorithms
Implementations of the univariate CAPA <doi:10.1002/sam.11586> and PELT <doi:10.1080/01621459.2012.737745> algorithms along with various cost functions for different distributions and models. The modular design, using R6 classes, favours ease of extension (for example user-written cost functions) over the performance of other implementations (e.g. <doi:10.32614/CRAN.package.changepoint>, <doi:10.32614/CRAN.package.anomaly>).
Maintained by Paul Smith. Last updated 4 months ago.
4.61 score 18 scripts
ricardomourarpm
PSinference: Inference for Released Plug-in Sampling Single Synthetic Dataset
Provides inference procedures for singly imputed synthetic data generated via plug-in sampling under the multivariate normal model, including the generalized variance, the sphericity test, the test for independence between two subsets of variables, and the test for the regression of one set of variables on the other. For more details see Klein et al. (2021) <doi:10.1007/s13571-019-00215-9>.
Maintained by Ricardo Moura. Last updated 6 months ago.
3 stars 4.13 score
rezamoammadi
liver: "Eating the Liver of Data Science"
Provides a suite of helper functions and a collection of datasets used in the book <https://uncovering-data-science.netlify.app>. It is designed to make data science techniques accessible to individuals with minimal coding experience. Inspired by an ancient Persian idiom, the package likens this learning process to "eating the liver of data science," symbolizing deep and immersive engagement with the field.
Maintained by Reza Mohammadi. Last updated 12 days ago.
4.13 score 67 scripts
bioc
clst: Classification by local similarity threshold
Package for modified nearest-neighbor classification based on calculation of a similarity threshold distinguishing within-group from between-group comparisons.
Maintained by Noah Hoffman. Last updated 5 months ago.
3.78 score 10 scripts 1 dependent
christophe314
longitudinalData: Longitudinal Data
Tools for longitudinal data and joint longitudinal data (used by packages kml and kml3d).
Maintained by Christophe Genolini. Last updated 6 months ago.
1 star 3.55 score 65 scripts 11 dependents
rcorradin
BNPmix: Bayesian Nonparametric Mixture Models
Functions to perform Bayesian nonparametric univariate and multivariate density estimation and clustering, by means of Pitman-Yor mixtures, and dependent Dirichlet process mixtures for partially exchangeable data. See Corradin et al. (2021) <doi:10.18637/jss.v100.i15> for more details.
Maintained by Riccardo Corradin. Last updated 3 years ago.
3.08 score 24 scripts
elilillyco
TSDT: Treatment-Specific Subgroup Detection Tool
Implements a method for identifying subgroups with superior response relative to the overall sample.
Maintained by Brian Denton. Last updated 3 months ago.
2.78 score 60 scripts
py-b
funprog: Functional Programming
Higher-order functions for data manipulation: sort or group data, given one or more auxiliary functions. Functions are inspired by other pure functional programming languages ('Haskell' mainly). The package also provides built-in function operators for creating compact anonymous functions, as well as the possibility to use the 'purrr' package syntax.
Maintained by Pierre-Yves Berrard. Last updated 4 years ago.
2.70 score 3 scripts
jgraux
DepLogo: Dependency Logo
Plots dependency logos from a set of aligned input sequences.
Maintained by Jan Grau. Last updated 1 year ago.
1 star 2.41 score 26 scripts
cran
intRvals: Analysis of Time-Ordered Event Data with Missed Observations
Calculates event rates and compares means and variances of groups of interval data corrected for missed arrival observations.
Maintained by Adriaan M. Dokter. Last updated 3 years ago.
1.70 score
glamb85
genpathmox: Pathmox Approach Segmentation Tree Analysis
It provides an interesting solution for handling a high number of segmentation variables in partial least squares structural equation modeling. The package implements the "Pathmox" algorithm (Lamberti, Sanchez, and Aluja (2016) <doi:10.1002/asmb.2168>), including the F-coefficient test (Lamberti, Sanchez, and Aluja (2017) <doi:10.1002/asmb.2270>) to detect the path coefficients responsible for the identified differences. The package also allows running the hybrid multi-group approach (Lamberti (2021) <doi:10.1007/s11135-021-01096-9>).
Maintained by Giuseppe Lamberti. Last updated 1 year ago.
1.32 score 21 scripts