Showing 54 of 54 results
sfirke
janitor: Simple Tools for Examining and Cleaning Dirty Data
The main janitor functions can: perfectly format data.frame column names; provide quick counts of variable combinations (i.e., frequency tables and crosstabs); and explore duplicate records. Other janitor functions nicely format the tabulation results. These tabulate-and-report functions approximate popular features of SPSS and Microsoft Excel. This package follows the principles of the "tidyverse" and works well with the pipe function %>%. janitor was built with beginning-to-intermediate R users in mind and is optimized for user-friendliness.
Maintained by Sam Firke. Last updated 3 months ago.
data-analysis, data-cleaning, data-science, dirty-data, excel, pivot-tables, spss, tabulations, tidyverse
1.4k stars 19.40 score 35k scripts 231 dependents
sebkrantz
collapse: Advanced and Fast Data Transformation
A C/C++ based package for advanced data transformation and statistical computing in R that is extremely fast, class-agnostic, robust and programmer friendly. Core functionality includes a rich set of S3 generic grouped and weighted statistical functions for vectors, matrices and data frames, which provide efficient low-level vectorizations, OpenMP multithreading, and skip missing values by default. These are integrated with fast grouping and ordering algorithms (also callable from C), and efficient data manipulation functions. The package also provides a flexible and rigorous approach to time series and panel data in R. It further includes fast functions for common statistical procedures, detailed (grouped, weighted) summary statistics, powerful tools to work with nested data, fast data object conversions, functions for memory efficient R programming, and helpers to effectively deal with variable labels, attributes, and missing data. It is well integrated with base R classes, 'dplyr'/'tibble', 'data.table', 'sf', 'units', 'plm' (panel-series and data frames), and 'xts'/'zoo'.
Maintained by Sebastian Krantz. Last updated 8 days ago.
data-aggregation, data-analysis, data-manipulation, data-processing, data-science, data-transformation, econometrics, high-performance, panel-data, scientific-computing, statistics, time-series, weighted, weights, cpp, openmp
672 stars 16.68 score 708 scripts 99 dependents
aphalo
ggpmisc: Miscellaneous Extensions to 'ggplot2'
Extensions to 'ggplot2' respecting the grammar of graphics paradigm. Statistics: locate and tag peaks and valleys; label plot with the equation of a fitted polynomial or other types of models; labels with P-value, R^2 or adjusted R^2 or information criteria for fitted models; label with ANOVA table for fitted models; label with summary for fitted models. Model fit classes for which suitable methods are provided by package 'broom' and 'broom.mixed' are supported. Scales and stats to build volcano and quadrant plots based on outcomes, fold changes, p-values and false discovery rates.
Maintained by Pedro J. Aphalo. Last updated 2 hours ago.
data-analysis, dataviz, ggplot2-annotations, ggplot2-stats, statistics
107 stars 13.64 score 4.4k scripts 14 dependents
bioc
plyranges: A fluent interface for manipulating GenomicRanges
A dplyr-like interface for interacting with the common Bioconductor classes Ranges and GenomicRanges. Providing a grammatical and consistent way of manipulating these classes should make them more accessible to new Bioconductor users.
Maintained by Michael Love. Last updated 10 days ago.
infrastructure, datarepresentation, workflowstep, coverage, bioconductor, data-analysis, dplyr, genomic-ranges, genomics, tidy-data
144 stars 12.66 score 1.9k scripts 20 dependents
boxuancui
DataExplorer: Automate Data Exploration and Treatment
Automated data exploration process for analytic tasks and predictive modeling, so that users could focus on understanding data and extracting insights. The package scans and analyzes each variable, and visualizes them with typical graphical techniques. Common data processing methods are also available to treat and format data.
Maintained by Boxuan Cui. Last updated 1 year ago.
data-analysis, data-exploration, data-science, eda, visualization
523 stars 11.21 score 2.2k scripts
acclab
dabestr: Data Analysis using Bootstrap-Coupled Estimation
Data Analysis using Bootstrap-Coupled ESTimation. Estimation statistics is a simple framework that avoids the pitfalls of significance testing. It uses familiar statistical concepts: means, mean differences, and error bars. More importantly, it focuses on the effect size of one's experiment/intervention, as opposed to a false dichotomy engendered by P values. An estimation plot has two key features: 1. It presents all datapoints as a swarmplot, which orders each point to display the underlying distribution. 2. It presents the effect size as a bootstrap 95% confidence interval on a separate but aligned axes. Estimation plots are introduced in Ho et al., Nature Methods 2019, 1548-7105. <doi:10.1038/s41592-019-0470-3>. The free-to-view PDF is located at <https://www.nature.com/articles/s41592-019-0470-3.epdf?author_access_token=Euy6APITxsYA3huBKOFBvNRgN0jAjWel9jnR3ZoTv0Pr6zJiJ3AA5aH4989gOJS_dajtNr1Wt17D0fh-t4GFcvqwMYN03qb8C33na_UrCUcGrt-Z0J9aPL6TPSbOxIC-pbHWKUDo2XsUOr3hQmlRew%3D%3D>.
Maintained by Yishan Mai. Last updated 1 year ago.
data-analysis, data-visualization, estimation, statistics
214 stars 9.80 score 142 scripts
great-northern-diver
loon: Interactive Statistical Data Visualization
An extendable toolkit for interactive data visualization and exploration.
Maintained by R. Wayne Oldford. Last updated 2 years ago.
data-analysis, data-science, data-visualization, exploratory-analysis, exploratory-data-analysis, high-dimensional-data, interactive-graphics, interactive-visualizations, loon, python, statistical-analysis, statistical-graphics, statistics, tcl-extension, tk
48 stars 9.00 score 93 scripts 5 dependents
jpquast
protti: Bottom-Up Proteomics and LiP-MS Quality Control and Data Analysis Tools
Useful functions and workflows for proteomics quality control and data analysis of both limited proteolysis-coupled mass spectrometry (LiP-MS) (Feng et al. (2014) <doi:10.1038/nbt.2999>) and regular bottom-up proteomics experiments. Data generated with search tools such as 'Spectronaut', 'MaxQuant' and 'Proteome Discoverer' can be easily used due to the flexibility of the functions.
Maintained by Jan-Philipp Quast. Last updated 5 months ago.
data-analysis, lip-ms, mass-spectrometry, omics, protein, proteomics, systems-biology
63 stars 8.51 score 83 scripts
nceas
metajam: Easily Download Data and Metadata from 'DataONE'
A set of tools to foster the development of reproducible analytical workflows by simplifying the download of data and metadata from 'DataONE' (<https://www.dataone.org>) and easily importing this information into R.
Maintained by Julien Brun. Last updated 7 months ago.
data, data-analysis, metadata, repositories
16 stars 8.21 score 75 scripts
psychbruce
bruceR: Broadly Useful Convenient and Efficient R Functions
Broadly useful convenient and efficient R functions that bring users concise and elegant R data analyses. This package includes easy-to-use functions for (1) basic R programming (e.g., set working directory to the path of currently opened file; import/export data from/to files in any format; print tables to Microsoft Word); (2) multivariate computation (e.g., compute scale sums/means/... with reverse scoring); (3) reliability analyses and factor analyses; (4) descriptive statistics and correlation analyses; (5) t-test, multi-factor analysis of variance (ANOVA), simple-effect analysis, and post-hoc multiple comparison; (6) tidy report of statistical models (to R Console and Microsoft Word); (7) mediation and moderation analyses (PROCESS); and (8) additional toolbox for statistics and graphics.
Maintained by Han-Wu-Shuang Bao. Last updated 10 months ago.
anova, data-analysis, data-science, linear-models, linear-regression, multilevel-models, statistics, toolbox
176 stars 7.87 score 316 scripts 3 dependents
tbep-tech
tbeptools: Data and Indicators for the Tampa Bay Estuary Program
Several functions are provided for working with Tampa Bay Estuary Program data and indicators, including the water quality report card, tidal creek assessments, Tampa Bay Nekton Index, Tampa Bay Benthic Index, seagrass transect data, habitat report card, and fecal indicator bacteria. Additional functions are provided for miscellaneous tasks, such as reference library curation.
Maintained by Marcus Beck. Last updated 1 day ago.
data-analysis, tampa-bay, tbep, water-quality
10 stars 7.86 score 133 scripts
gagolews
genieclust: Fast and Robust Hierarchical Clustering with Noise Points Detection
A retake on the Genie algorithm (Gagolewski, 2021 <DOI:10.1016/j.softx.2021.100722>), which is a robust hierarchical clustering method (Gagolewski, Bartoszuk, Cena, 2016 <DOI:10.1016/j.ins.2016.05.003>). It is now faster and more memory efficient; determining the whole cluster hierarchy for datasets of 10M points in low dimensional Euclidean spaces or 100K points in high-dimensional ones takes only a minute or so. Allows clustering with respect to mutual reachability distances so that it can act as a noise point detector or a robustified version of 'HDBSCAN*' (that is able to detect a predefined number of clusters and hence does not depend on the somewhat fragile 'eps' parameter). The package also features an implementation of inequality indices (e.g., Gini and Bonferroni), external cluster validity measures (e.g., the normalised clustering accuracy, the adjusted Rand index, the Fowlkes-Mallows index, and normalised mutual information), and internal cluster validity indices (e.g., the Calinski-Harabasz, Davies-Bouldin, Ball-Hall, Silhouette, and generalised Dunn indices). See also the 'Python' version of 'genieclust' available on 'PyPI', which supports sparse data, more metrics, and even larger datasets.
Maintained by Marek Gagolewski. Last updated 9 days ago.
cluster-analysis, clustering, clustering-algorithm, data-analysis, data-mining, data-science, genie, hdbscan, hierarchical-clustering, hierarchical-clustering-algorithm, machine-learning, machine-learning-algorithms, mlpack, nmslib, python, python3, sparse, cpp, openmp
61 stars 7.33 score 13 scripts 5 dependents
airoldilab
sgd: Stochastic Gradient Descent for Scalable Estimation
A fast and flexible set of tools for large scale estimation. It features many stochastic gradient methods, built-in models, visualization tools, automated hyperparameter tuning, model checking, interval estimation, and convergence diagnostics.
Maintained by Junhyung Lyle Kim. Last updated 1 year ago.
big-data, data-analysis, gradient-descent, statistics, openblas, cpp
62 stars 7.25 score 71 scripts
capitalone
dataCompareR: Compare Two Data Frames and Summarise the Difference
Easy comparison of two tabular data objects in R. Specifically designed to show differences between two sets of data in a useful way that should make it easier to understand the differences, and if necessary, help you work out how to remedy them. Aims to offer a more useful output than all.equal() when your two data sets do not match, but isn't intended to replace all.equal() as a way to test for equality.
Maintained by Sarah Johnston. Last updated 2 years ago.
compare-data, data, data-analysis, data-science
76 stars 7.24 score 76 scripts
petolau
TSrepr: Time Series Representations
Methods for representations (i.e. dimensionality reduction, preprocessing, feature extraction) of time series to support more accurate and effective time series data mining. Non-data adaptive, data adaptive, model-based and data dictated (clipped) representation methods are implemented. Various normalisation methods (min-max, z-score, Box-Cox, Yeo-Johnson) and forecasting accuracy measures are also implemented.
Maintained by Peter Laurinec. Last updated 5 years ago.
data-analysis, data-mining, data-mining-algorithms, data-science, representation, time-series, time-series-analysis, time-series-classification, time-series-clustering, time-series-data-mining, time-series-representations, cpp
97 stars 7.23 score 117 scripts
great-northern-diver
loon.ggplot: A Grammar of Interactive Graphics
Provides a bridge between the 'loon' and 'ggplot2' packages. Extends the grammar of ggplot to add clauses to create interactive 'loon' plots. Existing ggplot(s) can be turned into interactive 'loon' plots and 'loon' plots into static ggplot(s); the function 'loon.ggplot()' is the bridge from one plot structure to the other.
Maintained by Zehao Xu. Last updated 11 months ago.
data-analysis, ggplot, ggplot-features, graphics, interactive-plots, loon, visualizations
24 stars 7.11 score 9 scripts 3 dependents
davidchall
ipaddress: Data Analysis for IP Addresses and Networks
Classes and functions for working with IP (Internet Protocol) addresses and networks, inspired by the Python 'ipaddress' module. Offers full support for both IPv4 and IPv6 (Internet Protocol versions 4 and 6) address spaces. It is specifically designed to work well with the 'tidyverse'.
Maintained by David Hall. Last updated 1 year ago.
cyber, data-analysis, ip-address, ipv4, ipv6, vctrs, cpp
32 stars 7.02 score 27 scripts 2 dependents
mingzehuang
latentcor: Fast Computation of Latent Correlations for Mixed Data
The first stand-alone R package for computation of latent correlation that takes into account all variable types (continuous/binary/ordinal/zero-inflated), comes with an optimized memory footprint, and is computationally efficient, essentially making latent correlation estimation almost as fast as rank-based correlation estimation. The estimation is based on latent copula Gaussian models. For continuous/binary types, see Fan, J., Liu, H., Ning, Y., and Zou, H. (2017). For ternary type, see Quan X., Booth J.G. and Wells M.T. (2018) <arXiv:1809.06255>. For truncated type or zero-inflated type, see Yoon G., Carroll R.J. and Gaynanova I. (2020) <doi:10.1093/biomet/asaa007>. For approximation method of computation, see Yoon G., Müller C.L. and Gaynanova I. (2021) <doi:10.1080/10618600.2021.1882468>. The latter method uses multi-linear interpolation originally implemented in the R package <https://cran.r-project.org/package=chebpol>.
Maintained by Mingze Huang. Last updated 3 years ago.
data-analysis, data-mining, data-processing, data-science, data-structures, machine-learning, mixed-types, statistics
16 stars 6.65 score 46 scripts 1 dependent
serkor1
SLmetrics: Machine Learning Performance Evaluation on Steroids
Performance evaluation metrics for supervised and unsupervised machine learning, statistical learning and artificial intelligence applications. Core computations are implemented in 'C++' for scalability and efficiency.
Maintained by Serkan Korkmaz. Last updated 1 day ago.
cpp, data-analysis, data-science, eigen3, machine-learning, performance-metrics, rcpp, rcppeigen, statistics, supervised-learning, cpp, openmp
22 stars 6.56 score
jmgirard
circumplex: Analysis and Visualization of Circular Data
Circumplex models, which organize constructs in a circle around two underlying dimensions, are popular for studying interpersonal functioning, mood/affect, and vocational preferences/environments. This package provides tools for analyzing and visualizing circular data, including scoring functions for relevant instruments and a generalization of the bootstrapped structural summary method from Zimmermann & Wright (2017) <doi:10.1177/1073191115621795> and functions for creating publication-ready tables and figures from the results.
Maintained by Jeffrey Girard. Last updated 5 months ago.
circular, circumplex, data-analysis, ggplot2, interpersonal, psychology, rcpparmadillo, tidyverse, openblas, cpp, openmp
11 stars 6.54 score 52 scripts
r-spark
sparklyr.flint: Sparklyr Extension for 'Flint'
This sparklyr extension makes 'Flint' time series library functionalities (<https://github.com/twosigma/flint>) easily accessible through R.
Maintained by Edgar Ruiz. Last updated 3 years ago.
apache-spark, data-analysis, data-mining, data-science, distributed, distributed-computing, flint, remote-clusters, spark, sparklyr, statistical-analysis, statistics, stats, summarization, summary-statistics, time-series, time-series-analysis, twosigma-flint
9 stars 6.46 score 54 scripts
pgomba
MDPIexploreR: Web Scraping and Bibliometric Analysis of MDPI Journals
Provides comprehensive tools to scrape and analyze data from the MDPI journals. It allows users to extract metrics such as submission-to-acceptance times, article types, and whether articles are part of special issues. The package can also visualize this information through plots. Additionally, 'MDPIexploreR' offers tools to explore patterns of self-citations within articles and provides insights into guest-edited special issues.
Maintained by Pablo Gómez Barreiro. Last updated 10 days ago.
analysis, data-analysis, data-visualization, mdpi, metrics, scientific-journals, visualization, web-scraping
20 stars 6.26 score 9 scripts
jrdnbradford
readMDTable: Read Markdown Tables into Tibbles
Efficient reading of raw markdown tables into tibbles. Designed to accept content from strings, files, and URLs with the ability to extract and read multiple tables from markdown for analysis.
Maintained by Jordan Bradford. Last updated 2 months ago.
data, data-analysis, data-analytics, data-extraction, data-mining, data-science, markdown, markdown-parser, markdown-table, r-programming
7 stars 6.10 score 3 scripts 1 dependent
brad-cannell
freqtables: Make Quick Descriptive Tables for Categorical Variables
Quickly make tables of descriptive statistics (i.e., counts, percentages, confidence intervals) for categorical variables. This package is designed to work in a Tidyverse pipeline, and consideration has been given to get results from R to Microsoft Word® with minimal pain.
Maintained by Brad Cannell. Last updated 1 year ago.
categorical-data, data-analysis, descriptive-statistics, epidemiology
12 stars 6.00 score 84 scripts
nelson-gon
mde: Missing Data Explorer
Correct identification and handling of missing data is one of the most important steps in any analysis. To aid this process, 'mde' provides a very easy to use yet robust framework to quickly get an idea of where the missing data lies and therefore find the most appropriate action to take. Graham WJ (2009) <doi:10.1146/annurev.psych.58.110405.085530>.
Maintained by Nelson Gonzabato. Last updated 3 years ago.
data-analysis, data-cleaning, data-exploration, data-science, datacleaner, datacleaning, exploratory-data-analysis, missing, missing-data, missing-value-treatment, missing-values, missingness, omit, recode, replace, statistics
4 stars 5.61 score 34 scripts
jmaasch
sanzo: Color Palettes Based on the Works of Sanzo Wada
Inspired by the art and color research of Sanzo Wada (1883-1967), his "Dictionary Of Color Combinations" (2011, ISBN:978-4861522475), and the interactive site by Dain M. Blodorn Kim <https://github.com/dblodorn/sanzo-wada>, this package brings Wada's color combinations to R for easy use in data visualizations. This package honors 60 of Wada's color combinations: 20 duos, 20 trios, and 20 quads.
Maintained by Jacqueline Maasch. Last updated 5 years ago.
color-palettes, data-analysis, data-science, data-visualization, sanzo-wada, visualizations
30 stars 5.41 score 17 scripts
tanaylab
naryn: Native Access Medical Record Retriever for High Yield Analytics
A toolkit for medical records data analysis. The 'naryn' package implements an efficient data structure for storing medical records, and provides a set of functions for data extraction, manipulation and analysis.
Maintained by Aviezer Lifshitz. Last updated 12 days ago.
data-analysis, medical-records, cpp
3 stars 5.38 score 4 scripts
c4tb
shinyExprPortal: A Configurable 'shiny' Portal for Sharing Analysis of Molecular Expression Data
Enables deploying configuration file-based 'shiny' apps with minimal programming for interactive exploration and analysis showcase of molecular expression data. For exploration, supports visualization of correlations between rows of an expression matrix and a table of observations, such as clinical measures, and comparison of changes in expression over time. For showcase, enables visualizing the results of differential expression from packages such as 'limma', co-expression modules from 'WGCNA' and lower dimensional projections.
Maintained by Rafael Henkin. Last updated 8 months ago.
bioinformatics, data-analysis, transcriptomics
5 stars 5.30 score 8 scripts
yuanchao-xu
gfer: Green Finance and Environmental Risk
Focuses on data collection, analysis and visualization in green finance and environmental risk research and analysis. Main functions include environmental data collection from official websites such as MEP (Ministry of Environmental Protection of China, <https://www.mee.gov.cn>), identification of water-related projects, and environmental data visualization.
Maintained by Yuanchao Xu. Last updated 11 days ago.
corporate-social-responsibility, csr, data-analysis, data-scraping, environmental-risk, green-finance, stock-data
8 stars 5.11 score 16 scripts
rapidsurveys
oldr: An Implementation of Rapid Assessment Method for Older People
An implementation of the Rapid Assessment Method for Older People or RAM-OP <https://www.helpage.org/resource/rapid-assessment-method-for-older-people-ramop-manual/>. It provides various functions that allow the user to design and plan the assessment and analyse the collected data. RAM-OP provides accurate and reliable estimates of the needs of older people.
Maintained by Ernest Guevarra. Last updated 2 months ago.
assessment, data-analysis, odk, ram-op, rapid-assessment
2 stars 5.00 score 4 scripts
mlr-org
mlr3fda: Extending 'mlr3' to Functional Data Analysis
Extends the 'mlr3' ecosystem to functional analysis by adding support for irregular and regular functional data as defined in the 'tf' package. The package provides 'PipeOps' for preprocessing functional columns and for extracting scalar features, thereby allowing standard machine learning algorithms to be applied afterwards. Available operations include simple functional features such as the mean or maximum, smoothing, interpolation, flattening, and functional 'PCA'.
Maintained by Sebastian Fischer. Last updated 8 months ago.
data-analysis, data-analysis-in-r, data-science, functional-data, machine-learning, mlr3
5 stars 4.95 score 5 scripts
rhenkin
visxhclust: A Shiny App for Visual Exploration of Hierarchical Clustering
A Shiny application and functions for visual exploration of hierarchical clustering with numeric datasets. Allows users to iteratively set hyperparameters, select features and evaluate results through various plots and computation of evaluation criteria.
Maintained by Rafael Henkin. Last updated 2 years ago.
clustering, data-analysis, data-science, r-shiny, shiny-apps
4 stars 4.86 score 12 scripts
sndmrc
BasketballAnalyzeR: Analysis and Visualization of Basketball Data
Contains data and code to accompany the book P. Zuccolotto and M. Manisera (2020) Basketball Data Science. Applications with R. CRC Press. ISBN 9781138600799.
Maintained by Marco Sandri. Last updated 2 years ago.
basketball-stats, data-analysis, data-science
35 stars 4.83 score 39 scripts
smaakage85
recorder: Toolkit to Validate New Data for a Predictive Model
A lightweight toolkit to validate new observations when computing their predictions with a predictive model. The validation process consists of two steps: (1) record relevant statistics and meta data of the variables in the original training data for the predictive model and (2) use these data to run a set of basic validation tests on the new set of observations.
Maintained by Lars Kjeldgaard. Last updated 6 years ago.
data-analysis, machine-learning, predictive-analytics, predictive-modeling
4 stars 4.78 score 6 scripts
jatanrt
eprscope: Processing and Analysis of Electron Paramagnetic Resonance Data and Spectra in Chemistry
Processing, analysis and plotting of Electron Paramagnetic Resonance (EPR) spectra in chemistry. Even though the package is mainly focused on continuous wave (CW) EPR/ENDOR, many functions may also be used for the integrated forms of 1D PULSED EPR spectra. It is able to find the most important spectral characteristics like g-factor, linewidth, maximum of derivative or integral intensities and single/double integrals. This is especially important in spectral (time) series consisting of many EPR spectra like during variable temperature experiments, electrochemical or photochemical radical generation and/or decay. The package also enables processing of data/spectra for analytical (quantitative) purposes, namely determining how many radicals or paramagnetic centers can be found in the analyte/sample. The goal is to evaluate rate constants, considering different kinetic models, to describe the radical reactions. The key feature of the package resides in processing of the universal ASCII text formats (such as '.txt', '.csv' or '.asc') from scratch. No proprietary formats are used (except the MATLAB EasySpin outputs) and in this respect the package is in accordance with the FAIR data principles. Upon 'reading' (also providing automatic procedures for the most common EPR spectrometers) the spectral data are transformed into the universal R 'data frame' format. Subsequently, the EPR spectra can be visualized and are fully consistent either with the 'ggplot2' package or with the interactive formats based on 'plotly'. Additionally, simulations and fitting of the isotropic EPR spectra are also included in the package. Advanced simulation parameters provided by the MATLAB-EasySpin toolbox and results from the quantum chemical calculations like g-factor and hyperfine splitting/coupling constants (a/A) can be compared and summarized in table-format in order to analyze the EPR spectra in the most effective way.
Maintained by Ján Tarábek. Last updated 2 days ago.
chemistry, data-analysis, data-visualization, epr, esr, fitting, optimization, programming-language, reproducible-research, scientific-plotting, spectroscopy, openjdk
4.76 score 7 scripts
gagolews
genie: Fast, Robust, and Outlier Resistant Hierarchical Clustering
Includes the reference implementation of Genie - a hierarchical clustering algorithm that links two point groups in such a way that an inequity measure (namely, the Gini index) of the cluster sizes does not significantly increase above a given threshold. This method most often outperforms many other data segmentation approaches in terms of clustering quality as tested on a wide range of benchmark datasets. At the same time, Genie retains the high speed of the single linkage approach, therefore it is also suitable for analysing larger data sets. For more details see (Gagolewski et al. 2016 <DOI:10.1016/j.ins.2016.05.003>). For an even faster and more feature-rich implementation, including, amongst others, noise point detection, see the 'genieclust' package (Gagolewski, 2021 <DOI:10.1016/j.softx.2021.100722>).
Maintained by Marek Gagolewski. Last updated 3 years ago.
cluster, cluster-analysis, clustering, data-analysis, data-mining, data-science, datascience, genie, hierarchical-clustering-algorithm, machine-learning, machine-learning-algorithms, outliers, cpp, openmp
22 stars 4.55 score 16 scripts
bioc
PRONE: The PROteomics Normalization Evaluator
High-throughput omics data are often affected by systematic biases introduced throughout all the steps of a clinical study, from sample collection to quantification. Normalization methods aim to adjust for these biases to make the actual biological signal more prominent. However, selecting an appropriate normalization method is challenging due to the wide range of available approaches. Therefore, a comparative evaluation of unnormalized and normalized data is essential in identifying an appropriate normalization strategy for a specific data set. This R package provides different functions for preprocessing, normalizing, and evaluating different normalization approaches. Furthermore, normalization methods can be evaluated on downstream steps, such as differential expression analysis and statistical enrichment analysis. Spike-in data sets with known ground truth and real-world data sets of biological experiments acquired by either tandem mass tag (TMT) or label-free quantification (LFQ) can be analyzed.
Maintained by Lis Arend. Last updated 9 days ago.
proteomics, preprocessing, normalization, differentialexpression, visualization, data-analysis, evaluation
2 stars 4.41 score 9 scripts
tbep-tech
peptools: Analysis Tools for Importing, Wrangling, and Summarizing Suffolk County Water Quality Data
Analysis tools for importing, wrangling, and summarizing Suffolk County water quality data. Functions are used to create reporting materials.
Maintained by Marcus Beck. Last updated 1 year ago.
4.33 score 54 scripts
timbeechey
clubpro: Classification Using Binary Procrustes Rotation
Implements a classification method described by Grice (2011, ISBN:978-0-12-385194-9) using binary procrustes rotation; a simplified version of procrustes rotation.
Maintained by Timothy Beechey. Last updated 10 months ago.
classification, data-analysis, psychology-experiments, rcpp, statistical-analysis, statistics, openblas, cpp, openmp
4.30 score 2 scripts
photosynq
PhotosynQ: Connect to PhotosynQ
Connect R to the PhotosynQ platform (<https://photosynq.org>). It allows users to log in and log out, as well as to receive project information and project data. It also transforms the received JSON objects into a data frame, which can be used for the final data analysis.
Maintained by Sebastian Kuhlgert. Last updated 4 years ago.
5 stars 4.24 score
petulla
readroper: Simply Read ASCII Single and Multicard Polling Datasets
A convenient way to read fixed-width ASCII polling datasets from providers like the Roper Center <https://ropercenter.cornell.edu>.
Maintained by Sam Petulla. Last updated 5 years ago.
ascii, data-analysis, polling-data
3 stars 4.18 score 3 scripts
globeandmail
upstartr: Utilities Powering the Globe and Mail's Data Journalism Template
Core functions necessary for using The Globe and Mail's R data journalism template, 'startr', along with utilities for day-to-day data journalism tasks, such as reading and writing files, producing graphics and cleaning up datasets.
Maintained by Tom Cardoso. Last updated 1 year ago.
data, data-analysis, data-journalism, data-visualization, journalism, news
6 stars 4.14 score 46 scripts
devpsylab
petersenlab: A Collection of R Functions by the Petersen Lab
A collection of R functions that are widely used by the Petersen Lab. Included are functions for various purposes, including evaluating the accuracy of judgments and predictions, performing scoring of assessments, generating correlation matrices, conversion of data between various types, data management, psychometric evaluation, extensions related to latent variable modeling, various plotting capabilities, and other miscellaneous useful functions. By making the package available, we hope to make our methods reproducible and replicable by others and to help others perform their data processing and analysis methods more easily and efficiently. The codebase is provided in Petersen (2025) <doi:10.5281/zenodo.7602890> and on 'CRAN': <doi:10.32614/CRAN.package.petersenlab>. The package is described in "Principles of Psychological Assessment: With Applied Examples in R" (Petersen, 2024, 2025) <doi:10.1201/9781003357421>, <doi:10.25820/work.007199>, <doi:10.5281/zenodo.6466589>.
Maintained by Isaac T. Petersen. Last updated 1 month ago.
data-analysis, data-analysis-in-r, data-management, psychometrics
1 star 4.10 score 1 script
jasdumas
ttbbeer: US Beer Statistics from TTB
U.S. Department of the Treasury, Alcohol and Tobacco Tax and Trade Bureau (TTB) collects data and reports on monthly beer industry production and operations. This data package includes a collection of 10 years (2006 - 2015) worth of data on materials used at U.S. breweries in pounds reported by the Brewer's Report of Operations and the Quarterly Brewer's Report of Operations forms, ready for data analysis. This package also includes historical tax rates on distilled spirits, wine, beer, champagne, and tobacco products as individual data sets.
Maintained by Jasmine Dumas. Last updated 8 months ago.
23 stars 4.06 score 10 scripts
tbep-tech
tbeploads: Calculate Loading Data to Tampa Bay
Loading data from major sources to Tampa Bay are calculated on a monthly or annual basis. Major sources include domestic point source (reuse, end of pipe), industrial point source, material losses, non-point sources (MS4), atmospheric deposition, and groundwater.
Maintained by Marcus Beck. Last updated 8 months ago.
data-analysis loads tampa-bay tbep tbnmc water-quality
3.81 score 3 scripts
sherrisherry
cleandata:To Inspect and Manipulate Data; and to Keep Track of This Process
Functions to work with data frames to prepare data for further analysis. The functions for imputation, encoding, partitioning, and other manipulation can produce log files to keep track of process.
Maintained by Sherry Zhao. Last updated 6 years ago.
data-analysis data-mining machine-learning wrangling
3 stars 3.72 score 35 scripts
timbeechey
opa:An Implementation of Ordinal Pattern Analysis
Quantifies hypothesis to data fit for repeated measures and longitudinal data, as described by Thorngate (1987) <doi:10.1016/S0166-4115(08)60083-7> and Grice et al., (2015) <doi:10.1177/2158244015604192>. Hypothesis and data are encoded as pairwise relative orderings which are then compared to determine the percentage of orderings in the data that are matched by the hypothesis.
Maintained by Timothy Beechey. Last updated 1 year ago.
data-analysis hypothesis-testing longitudinal ordinal rcpp repeated-measures statistics cpp
1 star 3.70 score 2 scripts
flalom
drugsens:Automated Analysis of 'QuPath' Output Data and Metadata Extraction
A comprehensive toolkit for analyzing microscopy data output from 'QuPath' software. Provides functionality for automated data processing, metadata extraction, and statistical analysis of imaging results. The methodology implemented in this package is based on Labrosse et al. (2024) <doi:10.1016/j.xpro.2024.103274> "Protocol for quantifying drug sensitivity in 3D patient-derived ovarian cancer models", which describes the complete workflow for drug sensitivity analysis in patient-derived cancer models.
Maintained by Flavio Lombardo. Last updated 2 months ago.
data-analysis image-processing qupath workflow
3.48 score 1 script
leef-uzh
LEEF:Data Package Containing Only Data and Data Information
Setup package for the LEEF pipeline which loads / installs all necessary packages and functions to run the pipeline.
Maintained by Rainer M. Krug. Last updated 3 years ago.
data-analysis data-processing leef
2.95 score
ghurault
HuraultMisc:Guillem Hurault Functions' Library
Contains various functions for data analysis, notably helpers and diagnostics for Bayesian modelling using Stan.
Maintained by Guillem Hurault. Last updated 4 months ago.
bayesian-statistics data-analysis statistical-models
2.95 score 18 scripts
roaldarbol
anibehavr:Analyse Animal Behaviours
What the package does (one paragraph).
Maintained by Mikkel Roald-Arbøl. Last updated 11 months ago.
animal-behavior behavioural-states data-analysis
3 stars 2.78 score 2 scripts
amoneva
cacc:Conjunctive Analysis of Case Configurations
A set of functions to conduct Conjunctive Analysis of Case Configurations (CACC) as described in Miethe, Hart, and Regoeczi (2008) <doi:10.1007/s10940-008-9044-8>, and identify and quantify situational clustering in dominant case configurations as described in Hart (2019) <doi:10.1177/0011128719866123>. Initially conceived as an exploratory technique for multivariate analysis of categorical data, CACC has developed to include formal statistical tests that can be applied in a wide variety of contexts. This technique allows examining composite profiles of different units of analysis in an alternative way to variable-oriented methods.
Maintained by Asier Moneva. Last updated 6 months ago.
criminology data-analysis social-science
2.70 score 5 scripts
alphaprime7
tidyDenovix:Cleans Spectrophotometry Data Obtained from the Denovix DS-11 Instrument
Cleans spectrophotometry data obtained from the Denovix instrument. The package also provides an option to normalize the data in order to compare the quality of the samples obtained.
Maintained by Tingwei Adeck. Last updated 9 months ago.
data-analysis dna research rna spectrophotometry
1 star 2.70 score 2 scripts