R-universe search: topic:data-analysis

sfirke

janitor:Simple Tools for Examining and Cleaning Dirty Data

The main janitor functions can: perfectly format data.frame column names; provide quick counts of variable combinations (i.e., frequency tables and crosstabs); and explore duplicate records. Other janitor functions nicely format the tabulation results. These tabulate-and-report functions approximate popular features of SPSS and Microsoft Excel. This package follows the principles of the "tidyverse" and works well with the pipe function %>%. janitor was built with beginning-to-intermediate R users in mind and is optimized for user-friendliness.

Maintained by Sam Firke. Last updated 3 months ago.

data-analysis data-cleaning data-science dirty-data excel pivot-tables spss tabulations tidyverse

1.4k stars 19.40 score 35k scripts 231 dependents

sebkrantz

collapse:Advanced and Fast Data Transformation

A C/C++ based package for advanced data transformation and statistical computing in R that is extremely fast, class-agnostic, robust and programmer friendly. Core functionality includes a rich set of S3 generic grouped and weighted statistical functions for vectors, matrices and data frames, which provide efficient low-level vectorizations, OpenMP multithreading, and skip missing values by default. These are integrated with fast grouping and ordering algorithms (also callable from C), and efficient data manipulation functions. The package also provides a flexible and rigorous approach to time series and panel data in R. It further includes fast functions for common statistical procedures, detailed (grouped, weighted) summary statistics, powerful tools to work with nested data, fast data object conversions, functions for memory efficient R programming, and helpers to effectively deal with variable labels, attributes, and missing data. It is well integrated with base R classes, 'dplyr'/'tibble', 'data.table', 'sf', 'units', 'plm' (panel-series and data frames), and 'xts'/'zoo'.

Maintained by Sebastian Krantz. Last updated 8 days ago.

data-aggregation data-analysis data-manipulation data-processing data-science data-transformation econometrics high-performance panel-data scientific-computing statistics time-series weighted weights cpp openmp

672 stars 16.68 score 708 scripts 99 dependents

aphalo

ggpmisc:Miscellaneous Extensions to 'ggplot2'

Extensions to 'ggplot2' respecting the grammar of graphics paradigm. Statistics: locate and tag peaks and valleys; label plot with the equation of a fitted polynomial or other types of models; labels with P-value, R^2 or adjusted R^2 or information criteria for fitted models; label with ANOVA table for fitted models; label with summary for fitted models. Model fit classes for which suitable methods are provided by package 'broom' and 'broom.mixed' are supported. Scales and stats to build volcano and quadrant plots based on outcomes, fold changes, p-values and false discovery rates.

Maintained by Pedro J. Aphalo. Last updated 2 hours ago.

data-analysis dataviz ggplot2-annotations ggplot2-stats statistics

107 stars 13.64 score 4.4k scripts 14 dependents

bioc

plyranges:A fluent interface for manipulating GenomicRanges

A dplyr-like interface for interacting with the common Bioconductor classes Ranges and GenomicRanges. By providing a grammatical and consistent way of manipulating these classes, their accessibility for new Bioconductor users is hopefully increased.

Maintained by Michael Love. Last updated 10 days ago.

infrastructure datarepresentation workflowstep coverage bioconductor data-analysis dplyr genomic-ranges genomics tidy-data

144 stars 12.66 score 1.9k scripts 20 dependents

boxuancui

DataExplorer:Automate Data Exploration and Treatment

Automated data exploration process for analytic tasks and predictive modeling, so that users could focus on understanding data and extracting insights. The package scans and analyzes each variable, and visualizes them with typical graphical techniques. Common data processing methods are also available to treat and format data.

Maintained by Boxuan Cui. Last updated 1 year ago.

data-analysis data-exploration data-science eda visualization

523 stars 11.21 score 2.2k scripts

acclab

dabestr:Data Analysis using Bootstrap-Coupled Estimation

Data Analysis using Bootstrap-Coupled ESTimation. Estimation statistics is a simple framework that avoids the pitfalls of significance testing. It uses familiar statistical concepts: means, mean differences, and error bars. More importantly, it focuses on the effect size of one's experiment/intervention, as opposed to a false dichotomy engendered by P values. An estimation plot has two key features: 1. It presents all datapoints as a swarmplot, which orders each point to display the underlying distribution. 2. It presents the effect size as a bootstrap 95% confidence interval on separate but aligned axes. Estimation plots are introduced in Ho et al., Nature Methods 2019, 1548-7105. <doi:10.1038/s41592-019-0470-3>. The free-to-view PDF is located at <https://www.nature.com/articles/s41592-019-0470-3.epdf?author_access_token=Euy6APITxsYA3huBKOFBvNRgN0jAjWel9jnR3ZoTv0Pr6zJiJ3AA5aH4989gOJS_dajtNr1Wt17D0fh-t4GFcvqwMYN03qb8C33na_UrCUcGrt-Z0J9aPL6TPSbOxIC-pbHWKUDo2XsUOr3hQmlRew%3D%3D>.

Maintained by Yishan Mai. Last updated 1 year ago.

data-analysis data-visualization estimation statistics

214 stars 9.80 score 142 scripts
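
The bootstrap confidence interval mentioned in the dabestr description can be illustrated with a minimal sketch. This is not dabestr's code or API: it is a plain percentile bootstrap in Python, the function name `bootstrap_mean_diff_ci` and the sample data are hypothetical, and the package itself may use a different interval construction.

```python
import random
import statistics


def bootstrap_mean_diff_ci(control, treatment, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile-bootstrap CI for mean(treatment) - mean(control).

    Hypothetical helper for illustration; a fixed seed keeps it reproducible.
    """
    rng = random.Random(seed)
    diffs = []
    for _ in range(n_resamples):
        # Resample each group with replacement, keeping the group sizes.
        c = rng.choices(control, k=len(control))
        t = rng.choices(treatment, k=len(treatment))
        diffs.append(statistics.fmean(t) - statistics.fmean(c))
    diffs.sort()
    lo = diffs[round((alpha / 2) * n_resamples)]
    hi = diffs[round((1 - alpha / 2) * n_resamples) - 1]
    observed = statistics.fmean(treatment) - statistics.fmean(control)
    return observed, lo, hi


# Made-up measurements, for illustration only.
control = [4.1, 5.0, 4.7, 5.3, 4.9, 5.1, 4.4, 5.6]
treatment = [6.2, 6.9, 7.1, 6.4, 7.4, 6.8, 6.1, 7.0]

observed, lo, hi = bootstrap_mean_diff_ci(control, treatment)
print(f"mean difference = {observed:.2f}, 95% CI [{lo:.2f}, {hi:.2f}]")
```

The interval reports the size of the effect and its uncertainty directly, which is the point the description makes against a reject/accept reading of P values.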

great-northern-diver

loon:Interactive Statistical Data Visualization

An extendable toolkit for interactive data visualization and exploration.

Maintained by R. Wayne Oldford. Last updated 2 years ago.

data-analysis data-science data-visualization exploratory-analysis exploratory-data-analysis high-dimensional-data interactive-graphics interactive-visualizations loon python statistical-analysis statistical-graphics statistics tcl-extension tk

48 stars 9.00 score 93 scripts 5 dependents

jpquast

protti:Bottom-Up Proteomics and LiP-MS Quality Control and Data Analysis Tools

Useful functions and workflows for proteomics quality control and data analysis of both limited proteolysis-coupled mass spectrometry (LiP-MS) (Feng et al. (2014) <doi:10.1038/nbt.2999>) and regular bottom-up proteomics experiments. Data generated with search tools such as 'Spectronaut', 'MaxQuant' and 'Proteome Discoverer' can be easily used due to the flexibility of the functions.

Maintained by Jan-Philipp Quast. Last updated 5 months ago.

data-analysis lip-ms mass-spectrometry omics protein proteomics systems-biology

63 stars 8.51 score 83 scripts

nceas

metajam:Easily Download Data and Metadata from 'DataONE'

A set of tools to foster the development of reproducible analytical workflows by simplifying the download of data and metadata from 'DataONE' (<https://www.dataone.org>) and easily importing this information into R.

Maintained by Julien Brun. Last updated 7 months ago.

data data-analysis metadata repositories

16 stars 8.21 score 75 scripts

psychbruce

bruceR:Broadly Useful Convenient and Efficient R Functions

Broadly useful convenient and efficient R functions that bring users concise and elegant R data analyses. This package includes easy-to-use functions for (1) basic R programming (e.g., set working directory to the path of currently opened file; import/export data from/to files in any format; print tables to Microsoft Word); (2) multivariate computation (e.g., compute scale sums/means/... with reverse scoring); (3) reliability analyses and factor analyses; (4) descriptive statistics and correlation analyses; (5) t-test, multi-factor analysis of variance (ANOVA), simple-effect analysis, and post-hoc multiple comparison; (6) tidy report of statistical models (to R Console and Microsoft Word); (7) mediation and moderation analyses (PROCESS); and (8) additional toolbox for statistics and graphics.

Maintained by Han-Wu-Shuang Bao. Last updated 10 months ago.

anova data-analysis data-science linear-models linear-regression multilevel-models statistics toolbox

176 stars 7.87 score 316 scripts 3 dependents

tbep-tech

tbeptools:Data and Indicators for the Tampa Bay Estuary Program

Several functions are provided for working with Tampa Bay Estuary Program data and indicators, including the water quality report card, tidal creek assessments, Tampa Bay Nekton Index, Tampa Bay Benthic Index, seagrass transect data, habitat report card, and fecal indicator bacteria. Additional functions are provided for miscellaneous tasks, such as reference library curation.

Maintained by Marcus Beck. Last updated 1 day ago.

data-analysis tampa-bay tbep water-quality

10 stars 7.86 score 133 scripts

gagolews

genieclust:Fast and Robust Hierarchical Clustering with Noise Points Detection

A retake on the Genie algorithm (Gagolewski, 2021 <DOI:10.1016/j.softx.2021.100722>), which is a robust hierarchical clustering method (Gagolewski, Bartoszuk, Cena, 2016 <DOI:10.1016/j.ins.2016.05.003>). It is now faster and more memory efficient; determining the whole cluster hierarchy for datasets of 10M points in low dimensional Euclidean spaces or 100K points in high-dimensional ones takes only a minute or so. Allows clustering with respect to mutual reachability distances so that it can act as a noise point detector or a robustified version of 'HDBSCAN*' (that is able to detect a predefined number of clusters and hence does not depend on the somewhat fragile 'eps' parameter). The package also features an implementation of inequality indices (e.g., Gini and Bonferroni), external cluster validity measures (e.g., the normalised clustering accuracy, the adjusted Rand index, the Fowlkes-Mallows index, and normalised mutual information), and internal cluster validity indices (e.g., the Calinski-Harabasz, Davies-Bouldin, Ball-Hall, Silhouette, and generalised Dunn indices). See also the 'Python' version of 'genieclust' available on 'PyPI', which supports sparse data, more metrics, and even larger datasets.

Maintained by Marek Gagolewski. Last updated 9 days ago.

cluster-analysis clustering clustering-algorithm data-analysis data-mining data-science genie hdbscan hierarchical-clustering hierarchical-clustering-algorithm machine-learning machine-learning-algorithms mlpack nmslib python python3 sparse cpp openmp

61 stars 7.33 score 13 scripts 5 dependents

airoldilab

sgd:Stochastic Gradient Descent for Scalable Estimation

A fast and flexible set of tools for large scale estimation. It features many stochastic gradient methods, built-in models, visualization tools, automated hyperparameter tuning, model checking, interval estimation, and convergence diagnostics.

Maintained by Junhyung Lyle Kim. Last updated 1 year ago.

big-data data-analysis gradient-descent statistics openblas cpp

62 stars 7.25 score 71 scripts

capitalone

dataCompareR:Compare Two Data Frames and Summarise the Difference

Easy comparison of two tabular data objects in R. Specifically designed to show differences between two sets of data in a useful way that should make it easier to understand the differences, and if necessary, help you work out how to remedy them. Aims to offer a more useful output than all.equal() when your two data sets do not match, but isn't intended to replace all.equal() as a way to test for equality.

Maintained by Sarah Johnston. Last updated 2 years ago.

compare-data data data-analysis data-science

76 stars 7.24 score 76 scripts

petolau

TSrepr:Time Series Representations

Methods for representations (i.e. dimensionality reduction, preprocessing, feature extraction) of time series to support more accurate and effective time series data mining. Non-data adaptive, data adaptive, model-based and data dictated (clipped) representation methods are implemented. Various normalisation methods (min-max, z-score, Box-Cox, Yeo-Johnson) and forecasting accuracy measures are also implemented.

Maintained by Peter Laurinec. Last updated 5 years ago.

data-analysis data-mining data-mining-algorithms data-science representation time-series time-series-analysis time-series-classification time-series-clustering time-series-data-mining time-series-representations cpp

97 stars 7.23 score 117 scripts
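
Two of the normalisation methods named in the TSrepr description, min-max and z-score, are simple enough to sketch directly. The Python below is only an illustration of the formulas, not the package's implementation; the function names are hypothetical, and using the sample standard deviation (n - 1 denominator) for the z-score is an assumption.

```python
import statistics


def min_max_normalise(values):
    """Rescale a series linearly to the interval [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant series has no range; map it to zeros by convention.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def z_score_normalise(values):
    """Centre a series on its mean and scale by its sample standard deviation."""
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)  # assumption: n - 1 denominator
    if sd == 0:
        return [0.0 for _ in values]
    return [(v - mean) / sd for v in values]


series = [2.0, 4.0, 6.0]
print(min_max_normalise(series))
print(z_score_normalise(series))
```

Normalising before computing a representation puts series recorded on different scales on a comparable footing for clustering or classification.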

great-northern-diver

loon.ggplot:A Grammar of Interactive Graphics

Provides a bridge between the 'loon' and 'ggplot2' packages. Extends the grammar of ggplot to add clauses to create interactive 'loon' plots. Existing ggplot(s) can be turned into interactive 'loon' plots and 'loon' plots into static ggplot(s); the function 'loon.ggplot()' is the bridge from one plot structure to the other.

Maintained by Zehao Xu. Last updated 11 months ago.

data-analysis ggplot ggplot-features graphics interactive-plots loon visualizations

24 stars 7.11 score 9 scripts 3 dependents

davidchall

ipaddress:Data Analysis for IP Addresses and Networks

Classes and functions for working with IP (Internet Protocol) addresses and networks, inspired by the Python 'ipaddress' module. Offers full support for both IPv4 and IPv6 (Internet Protocol versions 4 and 6) address spaces. It is specifically designed to work well with the 'tidyverse'.

Maintained by David Hall. Last updated 1 year ago.

cyber data-analysis ip-address ipv4 ipv6 vctrs cpp

32 stars 7.02 score 27 scripts 2 dependents
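
For readers unfamiliar with the Python 'ipaddress' module that the ipaddress entry cites as its inspiration, a short example of that standard-library module follows. It shows the Python side only and makes no claim about the R package's function names; the addresses are documentation and private-range examples chosen for illustration.

```python
import ipaddress

# Parse an IPv4 address and a network, then test membership.
addr = ipaddress.ip_address("192.168.0.10")
net = ipaddress.ip_network("192.168.0.0/24")
print(addr in net)
print(net.num_addresses)

# IPv6 addresses are handled through the same entry points.
v6 = ipaddress.ip_address("2001:db8::1")
print(v6.version)
print(v6.exploded)
```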

mingzehuang

latentcor:Fast Computation of Latent Correlations for Mixed Data

The first stand-alone R package for computation of latent correlation that takes into account all variable types (continuous/binary/ordinal/zero-inflated), comes with an optimized memory footprint, and is computationally efficient, essentially making latent correlation estimation almost as fast as rank-based correlation estimation. The estimation is based on latent copula Gaussian models. For continuous/binary types, see Fan, J., Liu, H., Ning, Y., and Zou, H. (2017). For ternary type, see Quan X., Booth J.G. and Wells M.T. (2018) <arXiv:1809.06255>. For truncated type or zero-inflated type, see Yoon G., Carroll R.J. and Gaynanova I. (2020) <doi:10.1093/biomet/asaa007>. For approximation method of computation, see Yoon G., Müller C.L. and Gaynanova I. (2021) <doi:10.1080/10618600.2021.1882468>. The latter method uses multi-linear interpolation originally implemented in the R package <https://cran.r-project.org/package=chebpol>.

Maintained by Mingze Huang. Last updated 3 years ago.

data-analysis data-mining data-processing data-science data-structures machine-learning mixed-types statistics

16 stars 6.65 score 46 scripts 1 dependent

serkor1

SLmetrics:Machine Learning Performance Evaluation on Steroids

Performance evaluation metrics for supervised and unsupervised machine learning, statistical learning and artificial intelligence applications. Core computations are implemented in 'C++' for scalability and efficiency.

Maintained by Serkan Korkmaz. Last updated 1 day ago.

cpp data-analysis data-science eigen3 machine-learning performance-metrics rcpp rcppeigen statistics supervised-learning cpp openmp

22 stars 6.56 score

jmgirard

circumplex:Analysis and Visualization of Circular Data

Circumplex models, which organize constructs in a circle around two underlying dimensions, are popular for studying interpersonal functioning, mood/affect, and vocational preferences/environments. This package provides tools for analyzing and visualizing circular data, including scoring functions for relevant instruments and a generalization of the bootstrapped structural summary method from Zimmermann & Wright (2017) <doi:10.1177/1073191115621795> and functions for creating publication-ready tables and figures from the results.

Maintained by Jeffrey Girard. Last updated 5 months ago.

circular circumplex data-analysis ggplot2 interpersonal psychology rcpparmadillo tidyverse openblas cpp openmp

11 stars 6.54 score 52 scripts

r-spark

sparklyr.flint:Sparklyr Extension for 'Flint'

This sparklyr extension makes 'Flint' time series library functionalities (<https://github.com/twosigma/flint>) easily accessible through R.

Maintained by Edgar Ruiz. Last updated 3 years ago.

apache-spark data-analysis data-mining data-science distributed distributed-computing flint remote-clusters spark sparklyr statistical-analysis statistics stats summarization summary-statistics time-series time-series-analysis twosigma-flint

9 stars 6.46 score 54 scripts

pgomba

MDPIexploreR:Web Scraping and Bibliometric Analysis of MDPI Journals

Provides comprehensive tools to scrape and analyze data from the MDPI journals. It allows users to extract metrics such as submission-to-acceptance times, article types, and whether articles are part of special issues. The package can also visualize this information through plots. Additionally, 'MDPIexploreR' offers tools to explore patterns of self-citations within articles and provides insights into guest-edited special issues.

Maintained by Pablo Gómez Barreiro. Last updated 10 days ago.

analysis data-analysis data-visualization mdpi metrics scientific-journals visualization web-scraping

20 stars 6.26 score 9 scripts

jrdnbradford

readMDTable:Read Markdown Tables into Tibbles

Efficient reading of raw markdown tables into tibbles. Designed to accept content from strings, files, and URLs with the ability to extract and read multiple tables from markdown for analysis.

Maintained by Jordan Bradford. Last updated 2 months ago.

data data-analysis data-analytics data-extraction data-mining data-science markdown markdown-parser markdown-table r-programming

7 stars 6.10 score 3 scripts 1 dependent

brad-cannell

freqtables:Make Quick Descriptive Tables for Categorical Variables

Quickly make tables of descriptive statistics (i.e., counts, percentages, confidence intervals) for categorical variables. This package is designed to work in a Tidyverse pipeline, and consideration has been given to get results from R to Microsoft Word® with minimal pain.

Maintained by Brad Cannell. Last updated 1 year ago.

categorical-data data-analysis descriptive-statistics epidemiology

12 stars 6.00 score 84 scripts

nelson-gon

mde:Missing Data Explorer

Correct identification and handling of missing data is one of the most important steps in any analysis. To aid this process, 'mde' provides a very easy to use yet robust framework to quickly get an idea of where the missing data lies and therefore find the most appropriate action to take. Graham WJ (2009) <doi:10.1146/annurev.psych.58.110405.085530>.

Maintained by Nelson Gonzabato. Last updated 3 years ago.

data-analysis data-cleaning data-exploration data-science datacleaner datacleaning exploratory-data-analysis missing missing-data missing-value-treatment missing-values missingness omit recode replace statistics

4 stars 5.61 score 34 scripts

jmaasch

sanzo:Color Palettes Based on the Works of Sanzo Wada

Inspired by the art and color research of Sanzo Wada (1883-1967), his "Dictionary Of Color Combinations" (2011, ISBN:978-4861522475), and the interactive site by Dain M. Blodorn Kim <https://github.com/dblodorn/sanzo-wada>, this package brings Wada's color combinations to R for easy use in data visualizations. This package honors 60 of Wada's color combinations: 20 duos, 20 trios, and 20 quads.

Maintained by Jacqueline Maasch. Last updated 5 years ago.

color-palettes data-analysis data-science data-visualization sanzo-wada visualizations

30 stars 5.41 score 17 scripts

tanaylab

naryn:Native Access Medical Record Retriever for High Yield Analytics

A toolkit for medical records data analysis. The 'naryn' package implements an efficient data structure for storing medical records, and provides a set of functions for data extraction, manipulation and analysis.

Maintained by Aviezer Lifshitz. Last updated 12 days ago.

data-analysis medical-records cpp

3 stars 5.38 score 4 scripts

c4tb

shinyExprPortal:A Configurable 'shiny' Portal for Sharing Analysis of Molecular Expression Data

Enables deploying configuration file-based 'shiny' apps with minimal programming for interactive exploration and analysis showcase of molecular expression data. For exploration, supports visualization of correlations between rows of an expression matrix and a table of observations, such as clinical measures, and comparison of changes in expression over time. For showcase, enables visualizing the results of differential expression from packages such as 'limma', co-expression modules from 'WGCNA' and lower dimensional projections.

Maintained by Rafael Henkin. Last updated 8 months ago.

bioinformatics data-analysis transcriptomics

5 stars 5.30 score 8 scripts

yuanchao-xu

gfer:Green Finance and Environmental Risk

Focuses on data collection, analysis, and visualization in green finance and environmental risk research. Main functions include environmental data collection from official websites such as MEP (Ministry of Environmental Protection of China, <https://www.mee.gov.cn>), identification of water-related projects, and environmental data visualization.

Maintained by Yuanchao Xu. Last updated 11 days ago.

corporate-social-responsibility csr data-analysis data-scraping environmental-risk green-finance stock-data

8 stars 5.11 score 16 scripts

rapidsurveys

oldr:An Implementation of Rapid Assessment Method for Older People

An implementation of the Rapid Assessment Method for Older People or RAM-OP <https://www.helpage.org/resource/rapid-assessment-method-for-older-people-ramop-manual/>. It provides various functions that allow the user to design and plan the assessment and analyse the collected data. RAM-OP provides accurate and reliable estimates of the needs of older people.

Maintained by Ernest Guevarra. Last updated 2 months ago.

assessment data-analysis odk ram-op rapid-assessment

2 stars 5.00 score 4 scripts

mlr-org

mlr3fda:Extending 'mlr3' to Functional Data Analysis

Extends the 'mlr3' ecosystem to functional analysis by adding support for irregular and regular functional data as defined in the 'tf' package. The package provides 'PipeOps' for preprocessing functional columns and for extracting scalar features, thereby allowing standard machine learning algorithms to be applied afterwards. Available operations include simple functional features such as the mean or maximum, smoothing, interpolation, flattening, and functional 'PCA'.

Maintained by Sebastian Fischer. Last updated 8 months ago.

data-analysis data-analysis-in-r data-science functional-data machine-learning mlr3

5 stars 4.95 score 5 scripts

mirzaghaderi

rtpcr:qPCR Data Analysis

Various methods are employed for statistical analysis and graphical presentation of real-time PCR (quantitative PCR or qPCR) data. 'rtpcr' handles amplification efficiency calculation, statistical analysis and graphical representation of real-time PCR data based on up to two reference genes. By accounting for amplification efficiency values, 'rtpcr' was developed using a general calculation method described by Ganger et al. (2017) <doi:10.1186/s12859-017-1949-5> and Taylor et al. (2019) <doi:10.1016/j.tibtech.2018.12.002>, covering both the Livak and Pfaffl methods. Based on the experimental conditions, the functions of the 'rtpcr' package use t-test (for experiments with a two-level factor), analysis of variance (ANOVA), analysis of covariance (ANCOVA) or analysis of repeated measure data to calculate the fold change (FC, Delta Delta Ct method) or relative expression (RE, Delta Ct method). The functions further provide standard errors and confidence intervals for means, apply statistical mean comparisons and present significance. To facilitate function application, different data sets were used as examples and the outputs were explained. The 'rtpcr' package also provides bar plots using various controlling arguments. The 'rtpcr' package is user-friendly and easy to work with and provides an applicable resource for analyzing real-time PCR data.

Maintained by Ghader Mirzaghaderi. Last updated 4 days ago.

data-analysis qpcr

1 star 4.90 score 3 scripts
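
The Livak and Pfaffl calculations named in the rtpcr description can be written out as a small worked example. This Python sketch uses the standard textbook formulas on single Ct values; it is not the package's code, the function and argument names are hypothetical, and it omits replicates, multiple reference genes and all of the statistical testing the package provides.

```python
def fold_change_livak(ct_target_ctrl, ct_ref_ctrl, ct_target_trt, ct_ref_trt):
    """Delta Delta Ct fold change, assuming 100% efficiency (base 2)."""
    delta_ctrl = ct_target_ctrl - ct_ref_ctrl   # Delta Ct, control sample
    delta_trt = ct_target_trt - ct_ref_trt      # Delta Ct, treated sample
    return 2.0 ** -(delta_trt - delta_ctrl)


def fold_change_pfaffl(ct_target_ctrl, ct_ref_ctrl, ct_target_trt, ct_ref_trt,
                       e_target=2.0, e_ref=2.0):
    """Efficiency-corrected ratio; e_* are amplification factors (2.0 = 100%)."""
    target_term = e_target ** (ct_target_ctrl - ct_target_trt)
    ref_term = e_ref ** (ct_ref_ctrl - ct_ref_trt)
    return target_term / ref_term


# Made-up Ct values: the target gene amplifies two cycles earlier after treatment.
print(fold_change_livak(25.0, 20.0, 23.0, 20.0))
print(fold_change_pfaffl(25.0, 20.0, 23.0, 20.0))
print(fold_change_pfaffl(25.0, 20.0, 23.0, 20.0, e_target=1.9))
```

With both amplification factors equal to 2 the two formulas agree; they diverge once a measured efficiency is supplied, which is why accounting for efficiency matters.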

rhenkin

visxhclust:A Shiny App for Visual Exploration of Hierarchical Clustering

A Shiny application and functions for visual exploration of hierarchical clustering with numeric datasets. Allows users to iteratively set hyperparameters, select features and evaluate results through various plots and computation of evaluation criteria.

Maintained by Rafael Henkin. Last updated 2 years ago.

clustering data-analysis data-science r-shiny shiny-apps

4 stars 4.86 score 12 scripts

sndmrc

BasketballAnalyzeR:Analysis and Visualization of Basketball Data

Contains data and code to accompany the book P. Zuccolotto and M. Manisera (2020) Basketball Data Science. Applications with R. CRC Press. ISBN 9781138600799.

Maintained by Marco Sandri. Last updated 2 years ago.

basketball-stats data-analysis data-science

35 stars 4.83 score 39 scripts

smaakage85

recorder:Toolkit to Validate New Data for a Predictive Model

A lightweight toolkit to validate new observations when computing their predictions with a predictive model. The validation process consists of two steps: (1) record relevant statistics and meta data of the variables in the original training data for the predictive model and (2) use these data to run a set of basic validation tests on the new set of observations.

Maintained by Lars Kjeldgaard. Last updated 6 years ago.

data-analysis machine-learning predictive-analytics predictive-modeling

4 stars 4.78 score 6 scripts
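
The two-step process in the recorder description (record statistics from the training data, then test new observations against them) can be sketched as follows. This Python sketch is illustrative only: the function names, the dictionary layout and the specific checks (missing value, type, training range, unseen level) are assumptions and do not reproduce the package's interface or its full set of tests.

```python
def record(training_rows):
    """Step 1: store the range of numeric columns and the levels of other columns."""
    stats = {}
    for col in training_rows[0]:
        values = [row[col] for row in training_rows]
        if all(isinstance(v, (int, float)) for v in values):
            stats[col] = {"kind": "numeric", "min": min(values), "max": max(values)}
        else:
            stats[col] = {"kind": "categorical", "levels": set(values)}
    return stats


def validate(stats, new_row):
    """Step 2: return the list of checks that one new observation fails."""
    failures = []
    for col, info in stats.items():
        value = new_row.get(col)
        if value is None:
            failures.append(f"{col}: missing")
        elif info["kind"] == "numeric":
            if not isinstance(value, (int, float)):
                failures.append(f"{col}: not numeric")
            elif not info["min"] <= value <= info["max"]:
                failures.append(f"{col}: outside training range")
        elif value not in info["levels"]:
            failures.append(f"{col}: level not seen in training")
    return failures


# Made-up training data, for illustration only.
training = [
    {"age": 23, "plan": "basic"},
    {"age": 41, "plan": "premium"},
    {"age": 35, "plan": "basic"},
]
stats = record(training)
print(validate(stats, {"age": 30, "plan": "basic"}))
print(validate(stats, {"age": 67, "plan": "trial"}))
```

Failed checks flag predictions that rest on inputs unlike anything the model saw during training.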

jatanrt

eprscope:Processing and Analysis of Electron Paramagnetic Resonance Data and Spectra in Chemistry

Processing, analysis and plotting of Electron Paramagnetic Resonance (EPR) spectra in chemistry. Even though the package is mainly focused on continuous wave (CW) EPR/ENDOR, many functions may also be used for the integrated forms of 1D PULSED EPR spectra. It is able to find the most important spectral characteristics like g-factor, linewidth, maximum of derivative or integral intensities and single/double integrals. This is especially important in spectral (time) series consisting of many EPR spectra like during variable temperature experiments, electrochemical or photochemical radical generation and/or decay. The package also enables processing of data/spectra for analytical (quantitative) purposes, namely determining how many radicals or paramagnetic centers can be found in the analyte/sample. The goal is to evaluate rate constants, considering different kinetic models, to describe the radical reactions. The key feature of the package resides in processing of the universal ASCII text formats (such as '.txt', '.csv' or '.asc') from scratch. No proprietary formats are used (except the MATLAB EasySpin outputs) and in such respect the package is in accordance with the FAIR data principles. Upon 'reading' (also providing automatic procedures for the most common EPR spectrometers) the spectral data are transformed into the universal R 'data frame' format. Subsequently, the EPR spectra can be visualized and are fully consistent either with the 'ggplot2' package or with the interactive formats based on 'plotly'. Additionally, simulations and fitting of the isotropic EPR spectra are also included in the package. Advanced simulation parameters provided by the MATLAB-EasySpin toolbox and results from the quantum chemical calculations like g-factor and hyperfine splitting/coupling constants (a/A) can be compared and summarized in table-format in order to analyze the EPR spectra in the most effective way.

Maintained by Ján Tarábek. Last updated 2 days ago.

chemistry data-analysis data-visualization epr esr fitting optimization programming-language reproducible-research scientific-plotting spectroscopy openjdk

4.76 score 7 scripts

gagolews

genie:Fast, Robust, and Outlier Resistant Hierarchical Clustering

Includes the reference implementation of Genie - a hierarchical clustering algorithm that links two point groups in such a way that an inequity measure (namely, the Gini index) of the cluster sizes does not significantly increase above a given threshold. This method most often outperforms many other data segmentation approaches in terms of clustering quality as tested on a wide range of benchmark datasets. At the same time, Genie retains the high speed of the single linkage approach, therefore it is also suitable for analysing larger data sets. For more details see (Gagolewski et al. 2016 <DOI:10.1016/j.ins.2016.05.003>). For an even faster and more feature-rich implementation, including, amongst others, noise point detection, see the 'genieclust' package (Gagolewski, 2021 <DOI:10.1016/j.softx.2021.100722>).

Maintained by Marek Gagolewski. Last updated 3 years ago.

cluster cluster-analysis clustering data-analysis data-mining data-science datascience genie hierarchical-clustering-algorithm machine-learning machine-learning-algorithms outliers cpp openmp

22 stars 4.55 score 16 scripts
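
The inequity measure in the genie description, the Gini index of the cluster sizes, can be computed in a few lines. The Python sketch below uses the normalised form of the index, which equals 0 for equal sizes and approaches 1 when one cluster holds nearly all points. It illustrates the measure only and is not the package's implementation; the function name and the threshold value are assumptions chosen for the example.

```python
def gini_index(sizes):
    """Normalised Gini index of a list of cluster sizes (0 = perfectly balanced)."""
    n = len(sizes)
    total = sum(sizes)
    if n < 2 or total == 0:
        return 0.0
    # Sum of absolute differences over all unordered pairs of clusters.
    pair_sum = sum(abs(a - b) for i, a in enumerate(sizes) for b in sizes[i + 1:])
    return pair_sum / ((n - 1) * total)


threshold = 0.3  # illustrative value, an assumption for this sketch

for sizes in ([5, 5, 5, 5], [1, 1, 1, 97]):
    g = gini_index(sizes)
    status = "above" if g > threshold else "within"
    print(sizes, round(g, 2), status, "threshold")
```

Monitoring this index during merging is what keeps the hierarchy from degenerating into one large cluster plus many singletons, the typical failure of plain single linkage.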

bioc

PRONE:The PROteomics Normalization Evaluator

High-throughput omics data are often affected by systematic biases introduced throughout all the steps of a clinical study, from sample collection to quantification. Normalization methods aim to adjust for these biases to make the actual biological signal more prominent. However, selecting an appropriate normalization method is challenging due to the wide range of available approaches. Therefore, a comparative evaluation of unnormalized and normalized data is essential in identifying an appropriate normalization strategy for a specific data set. This R package provides different functions for preprocessing, normalizing, and evaluating different normalization approaches. Furthermore, normalization methods can be evaluated on downstream steps, such as differential expression analysis and statistical enrichment analysis. Spike-in data sets with known ground truth and real-world data sets of biological experiments acquired by either tandem mass tag (TMT) or label-free quantification (LFQ) can be analyzed.

Maintained by Lis Arend. Last updated 9 days ago.

proteomics preprocessing normalization differentialexpression visualization data-analysis evaluation

2 stars 4.41 score 9 scripts

tbep-tech

peptools:Analysis Tools for Importing, Wrangling, and Summarizing Suffolk County Water Quality Data

Analysis tools for importing, wrangling, and summarizing Suffolk County water quality data. Functions are used to create reporting materials.

Maintained by Marcus Beck. Last updated 1 year ago.

data-analysis pep water-quality

4.33 score 54 scripts

timbeechey

clubpro:Classification Using Binary Procrustes Rotation

Implements a classification method described by Grice (2011, ISBN:978-0-12-385194-9) using binary Procrustes rotation, a simplified version of Procrustes rotation.

Maintained by Timothy Beechey. Last updated 10 months ago.

classification data-analysis psychology-experiments rcpp statistical-analysis statistics openblas cpp openmp

4.30 score 2 scripts

photosynq

PhotosynQ:Connect to PhotosynQ

Connect R to the PhotosynQ platform (<https://photosynq.org>). It allows users to log in and log out, as well as to retrieve project information and project data. It also transforms the received JSON objects into a data frame, which can be used for the final data analysis.

Maintained by Sebastian Kuhlgert. Last updated 4 years ago.

data-analysis photosynq rstudio

5 stars 4.24 score

petulla

readroper:Simply Read ASCII Single and Multicard Polling Datasets

A convenient way to read fixed-width ASCII polling datasets from providers like the Roper Center <https://ropercenter.cornell.edu>.

Maintained by Sam Petulla. Last updated 5 years ago.

ascii data-analysis polling-data

3 stars 4.18 score 3 scripts

globeandmail

upstartr:Utilities Powering the Globe and Mail's Data Journalism Template

Core functions necessary for using The Globe and Mail's R data journalism template, 'startr', along with utilities for day-to-day data journalism tasks, such as reading and writing files, producing graphics and cleaning up datasets.

Maintained by Tom Cardoso. Last updated 1 year ago.

data data-analysis data-journalism data-visualization journalism news

6 stars 4.14 score 46 scripts

devpsylab

petersenlab:A Collection of R Functions by the Petersen Lab

A collection of R functions that are widely used by the Petersen Lab. Included are functions for various purposes, including evaluating the accuracy of judgments and predictions, performing scoring of assessments, generating correlation matrices, conversion of data between various types, data management, psychometric evaluation, extensions related to latent variable modeling, various plotting capabilities, and other miscellaneous useful functions. By making the package available, we hope to make our methods reproducible and replicable by others and to help others perform their data processing and analysis methods more easily and efficiently. The codebase is provided in Petersen (2025) <doi:10.5281/zenodo.7602890> and on 'CRAN': <doi:10.32614/CRAN.package.petersenlab>. The package is described in "Principles of Psychological Assessment: With Applied Examples in R" (Petersen, 2024, 2025) <doi:10.1201/9781003357421>, <doi:10.25820/work.007199>, <doi:10.5281/zenodo.6466589>.

Maintained by Isaac T. Petersen. Last updated 1 month ago.

data-analysis data-analysis-in-r data-management psychometrics

1 star 4.10 score 1 script

jasdumas

ttbbeer:US Beer Statistics from TTB

The U.S. Department of the Treasury's Alcohol and Tobacco Tax and Trade Bureau (TTB) collects data and reports on monthly beer industry production and operations. This data package includes 10 years' worth (2006-2015) of data on materials used at U.S. breweries, in pounds, as reported on the Brewer's Report of Operations and the Quarterly Brewer's Report of Operations forms, ready for data analysis. This package also includes historical tax rates on distilled spirits, wine, beer, champagne, and tobacco products as individual data sets.

Maintained by Jasmine Dumas. Last updated 8 months ago.

beer-statistics data-analysis

23 stars 4.06 score 10 scripts

tbep-tech

tbeploads:Calculate Loading Data to Tampa Bay

Loading data from major sources to Tampa Bay are calculated on a monthly or annual basis. Major sources include domestic point source (reuse, end of pipe), industrial point source, material losses, non-point sources (MS4), atmospheric deposition, and groundwater.

Maintained by Marcus Beck. Last updated 8 months ago.

data-analysis loads tampa-bay tbep tbnmc water-quality

3.81 score 3 scripts

sherrisherry

cleandata:To Inspect and Manipulate Data; and to Keep Track of This Process

Functions to work with data frames to prepare data for further analysis. The functions for imputation, encoding, partitioning, and other manipulation can produce log files to keep track of the process.

Maintained by Sherry Zhao. Last updated 6 years ago.

data-analysis data-mining machine-learning wrangling

3 stars 3.72 score 35 scripts

timbeechey

opa:An Implementation of Ordinal Pattern Analysis

Quantifies hypothesis-to-data fit for repeated measures and longitudinal data, as described by Thorngate (1987) <doi:10.1016/S0166-4115(08)60083-7> and Grice et al. (2015) <doi:10.1177/2158244015604192>. Hypothesis and data are encoded as pairwise relative orderings, which are then compared to determine the percentage of orderings in the data that are matched by the hypothesis.

Maintained by Timothy Beechey. Last updated 1 year ago.

data-analysis hypothesis-testing longitudinal ordinal rcpp repeated-measures statistics cpp

1 star 3.70 score 2 scripts

flalom

drugsens:Automated Analysis of 'QuPath' Output Data and Metadata Extraction

A comprehensive toolkit for analyzing microscopy data output from 'QuPath' software. Provides functionality for automated data processing, metadata extraction, and statistical analysis of imaging results. The methodology implemented in this package is based on Labrosse et al. (2024) <doi:10.1016/j.xpro.2024.103274> "Protocol for quantifying drug sensitivity in 3D patient-derived ovarian cancer models", which describes the complete workflow for drug sensitivity analysis in patient-derived cancer models.

Maintained by Flavio Lombardo. Last updated 2 months ago.

data-analysis image-processing qupath workflow

3.48 score 1 script

leef-uzh

LEEF:Data Package Containing Only Data and Data Information

Setup package for the LEEF pipeline which loads / installs all necessary packages and functions to run the pipeline.

Maintained by Rainer M. Krug. Last updated 3 years ago.

data-analysis data-processing leef

2.95 score

ghurault

HuraultMisc:Guillem Hurault Functions' Library

Contains various functions for data analysis, notably helpers and diagnostics for Bayesian modelling using Stan.

Maintained by Guillem Hurault. Last updated 4 months ago.

bayesian-statistics data-analysis statistical-models

2.95 score 18 scripts

roaldarbol

anibehavr:Analyse Animal Behaviours

What the package does (one paragraph).

Maintained by Mikkel Roald-Arbøl. Last updated 11 months ago.

animal-behavior behavioural-states data-analysis

3 stars 2.78 score 2 scripts

amoneva

cacc:Conjunctive Analysis of Case Configurations

A set of functions to conduct Conjunctive Analysis of Case Configurations (CACC) as described in Miethe, Hart, and Regoeczi (2008) <doi:10.1007/s10940-008-9044-8>, and identify and quantify situational clustering in dominant case configurations as described in Hart (2019) <doi:10.1177/0011128719866123>. Initially conceived as an exploratory technique for multivariate analysis of categorical data, CACC has developed to include formal statistical tests that can be applied in a wide variety of contexts. This technique makes it possible to examine composite profiles of different units of analysis, as an alternative to variable-oriented methods.

Maintained by Asier Moneva. Last updated 6 months ago.

criminology data-analysis social-science

2.70 score 5 scripts

alphaprime7

tidyDenovix:Cleans Spectrophotometry Data Obtained from the Denovix DS-11 Instrument

Cleans spectrophotometry data obtained from the Denovix instrument. The package also provides an option to normalize the data in order to compare the quality of the samples obtained.

Maintained by Tingwei Adeck. Last updated 9 months ago.

data-analysis dna research rna spectrophotometry

1 star 2.70 score 2 scripts