Showing 54 of total 54 results
end-to-end-provenance
provSummarizeR:Summarizes Provenance Related to Inputs and Outputs of a Script or Console Commands
Reads the provenance created by the execution of a script or console session and collected by the 'rdtLite' or 'rdt' packages (or other tools producing compatible PROV JSON output), and provides a human-readable summary identifying the input and output files, the scripts used (if any), the errors and warnings produced, and the environment in which the code was executed. It can also optionally package all the files into a zip file. The exact format of the PROV JSON file created by 'rdtLite' and 'rdt' is described in <https://github.com/End-to-end-provenance/ExtendedProvJson>. More information about 'rdtLite' and associated tools is available at <https://github.com/End-to-end-provenance/> and in Lerner, Boose, and Perez (2018), Using Introspection to Collect Provenance in R, Informatics, <doi:10.3390/informatics5010012>.
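For orientation, a minimal sketch of a typical call, assuming the package's documented prov.summarize() family of functions:
    library(rdtLite)
    prov.run("analysis.R")             # collect provenance while running a script
    library(provSummarizeR)
    prov.summarize()                   # summarize the most recently collected provenance
    # prov.summarize.file("prov.json") reads a saved PROV JSON file instead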
Maintained by Emery Boose. Last updated 3 years ago.
19.1 match 4.18 score 7 scripts
end-to-end-provenance
rdtLite:Provenance Collector
Defines functions that can be used to collect provenance as an 'R' script executes or during a console session. The output is a text file in 'PROV-JSON' format.
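A minimal sketch of the two documented usage modes (script execution and console session):
    library(rdtLite)
    # Mode 1: run a script while collecting provenance
    prov.run("analysis.R")
    # Mode 2: collect provenance for an interactive console session
    prov.init()
    x <- rnorm(100)
    summary(x)
    prov.quit()    # ends collection and writes the PROV-JSON file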
Maintained by Barbara Lerner. Last updated 3 years ago.
21.6 match 2 stars 3.56 score 36 scripts
billdenney
PKNCA:Perform Pharmacokinetic Non-Compartmental Analysis
Compute standard Non-Compartmental Analysis (NCA) parameters for typical pharmacokinetic analyses and summarize them.
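A minimal sketch of the standard workflow, assuming hypothetical concentration (d_conc) and dosing (d_dose) data frames:
    library(PKNCA)
    conc_obj <- PKNCAconc(d_conc, conc ~ time | subject)   # concentration-time data
    dose_obj <- PKNCAdose(d_dose, dose ~ time | subject)   # dosing data
    data_obj <- PKNCAdata(conc_obj, dose_obj)              # combine, with default intervals
    results  <- pk.nca(data_obj)                           # compute AUC, Cmax, half-life, ...
    summary(results)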
Maintained by Bill Denney. Last updated 16 days ago.
nca, noncompartmental-analysis, pharmacokinetics
5.4 match 73 stars 12.61 score 214 scripts 4 dependents
cnathe
Rlabkey:Data Exchange Between R and 'LabKey' Server
The 'LabKey' client library for R makes it easy for R users to load live data from a 'LabKey' Server, <https://www.labkey.com/>, into the R environment for analysis, provided users have permissions to read the data. It also enables R users to insert, update, and delete records stored on a 'LabKey' Server, provided they have appropriate permissions to do so.
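A minimal sketch of reading live data; the server, folder, schema, and query names are placeholders:
    library(Rlabkey)
    rows <- labkey.selectRows(
      baseUrl    = "https://myserver.labkey.com",   # placeholder server
      folderPath = "/MyProject",
      schemaName = "lists",
      queryName  = "Demographics"
    )
    head(rows)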
Maintained by Cory Nathe. Last updated 3 days ago.
15.8 match 4.25 score 388 scripts 1 dependents
cboettig
neonstore:NEON Data Store
The National Ecological Observatory Network (NEON) provides access to its numerous data products through its REST API, <https://data.neonscience.org/data-api/>. This package provides a high-level user interface for downloading and storing NEON data products. Unlike 'neonUtilities', this package will avoid repeated downloading, provides persistent storage, and improves performance. 'neonstore' can also construct a local 'duckdb' database of stacked tables, making it possible to work with tables that are far too big to fit into memory.
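A minimal sketch; the product id and table name below are illustrative values, not defaults:
    library(neonstore)
    neon_download(product = "DP1.10003.001")      # breeding landbird counts (example id)
    neon_index()                                  # inventory of locally stored files
    birds <- neon_read("brd_countdata-expanded")  # read a stacked table (example name)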
Maintained by Carl Boettiger. Last updated 11 months ago.
database, ecology, neon-data, provenance
10.0 match 9 stars 6.67 score 143 scripts 11 dependents
end-to-end-provenance
provViz:Provenance Visualizer
Displays provenance graphically for provenance collected by the 'rdt' or 'rdtLite' packages, or other tools providing compatible PROV JSON output. The exact format of the JSON created by 'rdt' and 'rdtLite' is described in <https://github.com/End-to-end-provenance/ExtendedProvJson>. More information about rdtLite and associated tools is available at <https://github.com/End-to-end-provenance/> and Barbara Lerner, Emery Boose, and Luis Perez (2018), Using Introspection to Collect Provenance in R, Informatics, <doi: 10.3390/informatics5010012>.
Maintained by Barbara Lerner. Last updated 3 years ago.
18.1 match 3.48 score 2 scripts 1 dependents
end-to-end-provenance
provTraceR:Uses Provenance to Trace File Lineage for One or More R Scripts
Uses provenance collected by the 'rdtLite' package or a comparable tool to display information about input files, output files, and exchanged files for a single R script or a series of R scripts.
Maintained by Emery Boose. Last updated 5 years ago.
16.7 match 3.70 score 4 scripts
dataobservatory-eu
dataset:Create Data Frames that are Easier to Exchange and Reuse
The aim of the 'dataset' package is to make tidy datasets easier to release, exchange and reuse. It organizes and formats data frame 'R' objects into well-referenced, well-described, interoperable datasets in a release- and reuse-ready form.
Maintained by Daniel Antal. Last updated 20 days ago.
7.9 match 15 stars 7.81 score 76 scripts 1 dependents
blernermhc
provDebugR:A Time-Travelling Debugger
Uses provenance post-execution to help the user understand and debug their script by providing functions to look at intermediate steps and data values, their forwards and backwards lineage, and to understand the steps leading up to warning and error messages. 'provDebugR' uses provenance produced by 'rdtLite' (available on CRAN), stored in PROV-JSON format.
Maintained by Barbara Lerner. Last updated 4 years ago.
16.0 match 3.64 score 22 scripts
end-to-end-provenance
provExplainR:Compare Provenance Collections to Explain Changed Script Outputs
Inspects provenance collected by the 'rdt' or 'rdtLite' packages, or other tools providing compatible PROV JSON output created by the execution of a script, and finds differences between two provenance collections. Factors under examination include the hardware and software used to execute the script, versions of attached libraries, use of global variables, modified inputs and outputs, and changes in main and sourced scripts. Based on the detected changes, 'provExplainR' can be used to study how these factors affect the behavior of the script and to suggest likely causes of differing script results. More information about 'rdtLite' and associated tools is available at <https://github.com/End-to-end-provenance/> and in Barbara Lerner, Emery Boose, and Luis Perez (2018), Using Introspection to Collect Provenance in R, Informatics, <doi:10.3390/informatics5010012>.
Maintained by Barbara Lerner. Last updated 3 years ago.
19.2 match 3.00 score 8 scripts
emitanaka
edibble:Encapsulating Elements of Experimental Design
A system to facilitate designing comparative (and non-comparative) experiments using the grammar of experimental designs <https://emitanaka.org/edibble-book/>. An experimental design is treated as an intermediate, mutable object that is built progressively by fundamental experimental components like units, treatments, and their relation. The system aids in experimental planning, management and workflow.
Maintained by Emi Tanaka. Last updated 4 months ago.
6.6 match 217 stars 7.43 score 62 scripts
ropensci
bowerbird:Keep a Collection of Sparkly Data Resources
Tools to get and maintain a data repository from third-party data providers.
Maintained by Ben Raymond. Last updated 5 days ago.
ropensci, antarctic, southern ocean, data, environmental, satellite, climate, peer-reviewed
6.3 match 50 stars 7.16 score 16 scripts 1 dependents
tesselle
nexus:Sourcing Archaeological Materials by Chemical Composition
Exploration and analysis of compositional data in the framework of Aitchison (1986, ISBN: 978-94-010-8324-9). This package provides tools for chemical fingerprinting and source tracking of ancient materials.
Maintained by Nicolas Frerebeau. Last updated 12 days ago.
archaeology, archaeological-science, archaeometry, compositional-data, provenance-studies
7.5 match 5.21 score 26 scripts 1 dependents
ropensci
rdataretriever:R Interface to the Data Retriever
Provides an R interface to the Data Retriever <https://retriever.readthedocs.io/en/latest/> via the Data Retriever's command line interface. The Data Retriever automates the tasks of finding, downloading, and cleaning public datasets, and then stores them in a local database.
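A minimal sketch (the Data Retriever command-line tool must already be installed); 'iris' is a placeholder dataset name:
    library(rdataretriever)
    rdataretriever::datasets()                     # list datasets known to the Retriever
    iris_tables <- rdataretriever::fetch("iris")   # download, clean, and load into R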
Maintained by Henry Senyondo. Last updated 8 months ago.
data, data-science, database, datasets, science
4.8 match 46 stars 7.70 score 36 scripts
canmod
iidda.api:IIDDA API
R Bindings for the IIDDA API.
Maintained by Steve Walker. Last updated 4 months ago.
6.0 match 5.29 score 10 scripts
ropensci
drake:A Pipeline Toolkit for Reproducible Computation at Scale
A general-purpose computational engine for data analysis, drake rebuilds intermediate data objects when their dependencies change, and it skips work when the results are already up to date. Not every execution starts from scratch: there is native support for parallel and distributed computing, and completed projects have tangible evidence that they are reproducible. Extensive documentation, from beginner-friendly tutorials to practical examples and more, is available at the reference website <https://docs.ropensci.org/drake/> and the online manual <https://books.ropensci.org/drake/>.
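A minimal sketch of a drake plan; the file and variable names are placeholders:
    library(drake)
    plan <- drake_plan(
      raw  = read.csv(file_in("data.csv")),   # file_in() registers the file dependency
      fit  = lm(y ~ x, data = raw),
      summ = summary(fit)
    )
    make(plan)   # builds all targets; a second make(plan) skips up-to-date work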
Maintained by William Michael Landau. Last updated 3 months ago.
data-science, drake, high-performance-computing, makefile, peer-reviewed, pipeline, reproducibility, reproducible-research, ropensci, workflow
2.0 match 1.3k stars 11.49 score 1.7k scripts 1 dependents
bioc
HuBMAPR:Interface to 'HuBMAP'
'HuBMAP' provides an open, global bio-molecular atlas of the human body at the cellular level. The `datasets()`, `samples()`, `donors()`, `publications()`, and `collections()` functions retrieve the information for each of these entity types. `*_details()` functions are available for individual entries of each entity type. `*_derived()` functions are available for retrieving derived datasets or samples for individual entries of each entity type. Data files can be accessed using `bulk_data_transfer()`.
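A minimal sketch using the entity-level functions named above:
    library(HuBMAPR)
    ds  <- datasets()    # one row per HuBMAP dataset
    smp <- samples()     # sample-level metadata
    dnr <- donors()      # donor-level metadata
    # *_details() and *_derived() variants drill into a single entry,
    # and bulk_data_transfer() retrieves the underlying data files.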
Maintained by Christine Hou. Last updated 1 month ago.
software, singlecell, dataimport, thirdpartyclient, spatial, infrastructure, bioconductor-package, client, hubmap, rstudio
3.8 match 3 stars 5.80 score 1 scripts
blernermhc
provParseR:Pulls Information from Prov.Json Files
R functions to access provenance information collected by 'rdt' or 'rdtLite'. The information is stored inside a 'ProvInfo' object and can be accessed through a collection of functions that will return the requested data. The exact format of the JSON created by 'rdt' and 'rdtLite' is described in <https://github.com/End-to-end-provenance/ExtendedProvJson>.
Maintained by Barbara Lerner. Last updated 3 years ago.
5.0 match 3.20 score 21 scripts 5 dependents
bioc
BiocPkgTools:Collection of simple tools for learning about Bioconductor Packages
Bioconductor has a rich ecosystem of metadata around packages, usage, and build status. This package is a simple collection of functions to access that metadata from R. The goal is to expose metadata for data mining and value-added functionality such as package searching, text mining, and analytics on packages.
Maintained by Sean Davis. Last updated 12 days ago.
software, infrastructure, bioconductor, metadata
2.0 match 21 stars 7.67 score 68 scripts
canmod
iidda:Processing Infectious Disease Datasets in IIDDA.
Part of an open toolchain for processing infectious disease datasets available through the IIDDA data repository.
Maintained by Steve Walker. Last updated 4 months ago.
2.3 match 6.07 score 133 scripts 3 dependents
ropensci
EDIutils:An API Client for the Environmental Data Initiative Repository
A client for the Environmental Data Initiative repository REST API. The 'EDI' data repository <https://portal.edirepository.org/nis/home.jsp> is for publication and reuse of ecological data with emphasis on metadata accuracy and completeness. It is built upon the 'PASTA+' software stack <https://pastaplus-core.readthedocs.io/en/latest/index.html#> and was developed in collaboration with the US 'LTER' Network <https://lternet.edu/>. 'EDIutils' includes functions to search and access existing data, evaluate and upload new data, and assist other data management tasks common to repository users.
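A minimal sketch of searching and reading data; the query string and package identifier are placeholders:
    library(EDIutils)
    res  <- search_data_packages(query = 'q="water+temperature"&fl=packageid,title')
    ents <- read_data_entity_names(packageId = "edi.100.1")   # placeholder id
    raw  <- read_data_entity(packageId = "edi.100.1",
                             entityId  = ents$entityId[1])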
Maintained by Colin Smith. Last updated 1 year ago.
ecology, eml-metadata, open-access, open-data, research-data-management, research-data-repository
2.0 match 10 stars 6.47 score 117 scripts
bioc
ndexr:NDEx R client library
This package offers an interface to NDEx servers, e.g. the public server at http://ndexbio.org/. It can retrieve and save networks via the API. Networks are offered as RCX objects and as igraph representations.
Maintained by Florian Auer. Last updated 5 months ago.
2.0 match 9 stars 6.44 score 38 scripts
terminological
dtrackr:Track your Data Pipelines
Track and document 'dplyr' data pipelines. As you filter, mutate, and join your way through a data set, 'dtrackr' seamlessly keeps track of your data flow and makes publication-ready documentation of a data pipeline simple.
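A minimal sketch, assuming the track()/comment()/flowchart() verbs from the package documentation:
    library(dplyr)
    library(dtrackr)
    out <- iris %>%
      track() %>%                              # start recording the pipeline
      comment("{.count} rows at start") %>%
      filter(Species != "setosa") %>%
      comment("{.count} rows after filter")
    out %>% flowchart()                        # render the recorded steps as a flowchart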
Maintained by Robert Challen. Last updated 5 months ago.
1.3 match 69 stars 8.78 score 362 scripts 1 dependents
guillaumepressiat
pmeasyr:PMSI Data with R
Import of PMSI data. Archive management. Data formats from 2011 onwards. Connection and interface with a database. 'requetr'. Valuation of RSA and RAPSS records.
Maintained by Guillaume Pressiat. Last updated 13 days ago.
1.6 match 20 stars 6.76 score 53 scripts
ropensci
datapack:A Flexible Container to Transport and Manipulate Data and Associated Resources
Provides a flexible container to transport and manipulate complex sets of data. These data may consist of multiple data files and associated metadata and ancillary files. Individual data objects have associated system-level metadata, and data files are linked together using the OAI-ORE standard resource map, which describes the relationships between the files. The OAI-ORE standard is described at <https://www.openarchives.org/ore/>. Data packages can be serialized and transported as structured files that have been created following the BagIt specification. The BagIt specification is described at <https://tools.ietf.org/html/draft-kunze-bagit-08>.
Maintained by Matthew B. Jones. Last updated 3 years ago.
1.2 match 44 stars 8.56 score 195 scripts 4 dependents
green-striped-gecko
dartR.captive:Analysing 'SNP' Data to Support Captive Breeding
Functions are provided that facilitate the analysis of SNP (single nucleotide polymorphism) data to answer questions regarding captive breeding and relatedness between individuals. 'dartR.captive' is part of the 'dartRverse' suite of packages. Gruber et al. (2018) <doi:10.1111/1755-0998.12745>. Mijangos et al. (2022) <doi:10.1111/2041-210X.13918>.
Maintained by Bernd Gruber. Last updated 27 days ago.
5.0 match 1 stars 2.00 score 3 scripts
predictiveecology
LandR:Landscape Ecosystem Modelling in R
Utilities for 'LandR' suite of landscape simulation models. These models simulate forest vegetation dynamics based on LANDIS-II, and incorporate fire and insect disturbance, as well as other important ecological processes. Models are implemented as 'SpaDES' modules.
Maintained by Eliot J B McIntire. Last updated 4 days ago.
ecological-modelling, landscape-ecosystem-modelling, spades
1.7 match 17 stars 6.07 score 12 scripts 4 dependents
nceas
metajam:Easily Download Data and Metadata from 'DataONE'
A set of tools to foster the development of reproducible analytical workflows by simplifying the download of data and metadata from 'DataONE' (<https://www.dataone.org>) and easily importing this information into R.
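A minimal sketch; the data URL is a placeholder for a real DataONE object URL:
    library(metajam)
    pkg_dir <- download_d1_data(
      data_url = "https://cn.dataone.org/cn/v2/resolve/urn:uuid:example",  # placeholder
      path     = tempdir()
    )
    obj <- read_d1_files(pkg_dir)   # list holding the data and its metadata together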
Maintained by Julien Brun. Last updated 7 months ago.
data, data-analysis, metadata, repositories
1.1 match 16 stars 8.21 score 75 scripts
bristol-vaccine-centre
avoncap:AvonCap Study Analysis
A work-in-progress set of functions for loading and wrangling the AvonCap data set.
Maintained by Rob Challen. Last updated 3 months ago.
3.6 match 2.34 score 11 scripts
nlmixr2
lbfgsb3c:Limited Memory BFGS Minimizer with Bounds on Parameters with optim() 'C' Interface
Interfacing to Nocedal et al.'s L-BFGS-B.3.0 (see <http://users.iems.northwestern.edu/~nocedal/lbfgsb.html>) limited-memory BFGS minimizer with bounds on parameters. This is a fork of 'lbfgsb3'. It registers an 'R'-compatible 'C' interface to L-BFGS-B.3.0 that uses the same function types and optimization as the optim() function (see Writing R Extensions and the source for details). This package also adds more stopping criteria and allows the adjustment of more tolerances.
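A minimal sketch of the optim()-style calling convention, minimizing the Rosenbrock function:
    library(lbfgsb3c)
    rosen <- function(x) 100 * (x[2] - x[1]^2)^2 + (1 - x[1])^2
    res <- lbfgsb3c(par = c(-1.2, 1), fn = rosen,
                    lower = c(-2, -2), upper = c(2, 2))
    res$par   # should be close to c(1, 1)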
Maintained by Matthew L Fidler. Last updated 6 months ago.
1.1 match 1 stars 7.33 score 17 scripts 16 dependents
famuvie
breedR:Statistical Methods for Forest Genetic Resources Analysts
Statistical tools to build predictive models for the breeders' community. It aims to assess the genetic value of individuals under a number of situations, including spatial autocorrelation, genetic/environment interaction and competition. It is under active development as part of the Trees4Future project, developed particularly with forest genetic trials in mind, but it can be used for animals or other situations as well.
Maintained by Facundo Muñoz. Last updated 8 months ago.
1.3 match 33 stars 5.44 score 24 scripts
sbg
biocompute:Create and Manipulate BioCompute Objects
Tools to create, validate, and export BioCompute Objects described in King et al. (2019) <doi:10.17605/osf.io/h59uh>. Users can encode information in data frames, and compose BioCompute Objects from the domains defined by the standard. A checksum validator and a JSON schema validator are provided. This package also supports exporting BioCompute Objects as JSON, PDF, HTML, or 'Word' documents, and exporting to cloud-based platforms.
Maintained by Soner Koc. Last updated 9 months ago.
biocompute, biocompute-objects, bioinformatics, science-communication, sevenbridges, standardization, workflow
1.7 match 3 stars 4.07 score 13 scripts
blernermhc
provGraphR:Creates Adjacency Matrices for Lineage Searches
Creates and manages a provenance graph corresponding to the provenance created by the 'rdtLite' package, which collects provenance from R scripts. 'rdtLite' is available on CRAN. The provenance format is an extension of the W3C PROV JSON format (<https://www.w3.org/Submission/2013/SUBM-prov-json-20130424/>). The extended JSON provenance format is described in <https://github.com/End-to-end-provenance/ExtendedProvJson>.
Maintained by Barbara Lerner. Last updated 3 years ago.
3.1 match 2.18 score 4 scripts 1 dependents
ropensci
DataPackageR:Construct Reproducible Analytic Data Sets as R Packages
A framework to help construct R data packages in a reproducible manner. Potentially time consuming processing of raw data sets into analysis ready data sets is done in a reproducible manner and decoupled from the usual 'R CMD build' process so that data sets can be processed into R objects in the data package and the data package can then be shared, built, and installed by others without the need to repeat computationally costly data processing. The package maintains data provenance by turning the data processing scripts into package vignettes, as well as enforcing documentation and version checking of included data objects. Data packages can be version controlled on 'GitHub', and used to share data for manuscripts, collaboration and reproducible research.
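A minimal sketch of the scaffold-and-build cycle; the script and object names are placeholders:
    library(DataPackageR)
    datapackage_skeleton(
      name = "MyData", path = tempdir(),
      code_files     = "process_raw.R",   # processing script, turned into a vignette
      r_object_names = "mydata"           # object(s) the script must create
    )
    package_build(file.path(tempdir(), "MyData"))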
Maintained by Dave Slager. Last updated 6 months ago.
0.5 match 156 stars 9.38 score 72 scripts
philipmostert
PointedSDMs:Fit Models Derived from Point Processes to Species Distributions using 'inlabru'
Integrated species distribution modeling is a rising field in quantitative ecology thanks to significant rises in the quantity of data available, increases in computational speed, and the proven benefits of using such models. Despite this, general software to help ecologists construct such models in an easy-to-use framework is lacking. We therefore introduce the R package 'PointedSDMs', which provides the tools to help ecologists set up integrated models and perform inference on them. There are also functions within the package to help run spatial cross-validation for model selection, as well as generic plotting and predicting functions. An introduction to these methods is discussed in Isaac, Jarzyna, Keil, Dambly, Boersch-Supan, Browning, Freeman, Golding, Guillera-Arroita, Henrys, Jarvis, Lahoz-Monfort, Pagel, Pescott, Schmucki, Simmonds and O'Hara (2020) <doi:10.1016/j.tree.2019.08.006>.
Maintained by Philip Mostert. Last updated 2 months ago.
0.5 match 25 stars 8.57 score 50 scripts 1 dependents
gmbecker
switchr:Installing, Managing, and Switching Between Distinct Sets of Installed Packages
Provides an abstraction for managing, installing, and switching between sets of installed R packages. This allows users to maintain multiple package libraries simultaneously, e.g. to maintain strict, package-version-specific reproducibility of many analyses, or work within a development/production release paradigm. Introduces a generalized package installation process which supports multiple repository and non-repository sources and tracks package provenance.
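A minimal sketch; 'projectA' is a placeholder library name:
    library(switchr)
    switchTo("projectA")   # create (if needed) and switch to an isolated library
    # ...install and use packages pinned for project A...
    switchBack()           # restore the previously active library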
Maintained by Gabriel Becker. Last updated 2 years ago.
0.5 match 59 stars 6.49 score 52 scripts
ropensci
dendroNetwork:Create Networks of Dendrochronological Series using Pairwise Similarity
Creating dendrochronological networks based on the similarity between tree-ring series or chronologies. The package includes various functions to compare tree-ring curves building upon the 'dplR' package. The networks can be used to visualise and understand the relations between tree-ring curves. These networks are also very useful to estimate the provenance of wood as described in Visser (2021) <DOI:10.5334/jcaa.79> or wood-use within a structure/context/site as described in Visser and Vorst (2022) <DOI:10.1163/27723194-bja10014>.
Maintained by Ronald Visser. Last updated 1 month ago.
visualization, graphandnetwork, thirdpartyclient, network, archaeology, dendrochronology, dendroprovenance, network-analysis, tree-rings
0.5 match 7 stars 6.05 score 9 scripts
bioc
ppcseq:Probabilistic Outlier Identification for RNA Sequencing Generalized Linear Models
Relative transcript abundance has proven to be a valuable tool for understanding the function of genes in biological systems. For the differential analysis of transcript abundance using RNA sequencing data, the negative binomial model is by far the most frequently adopted. However, common methods that are based on a negative binomial model are not robust to extreme outliers, which we found to be abundant in public datasets. So far, no rigorous and probabilistic methods for detection of outliers have been developed for RNA sequencing data, leaving the identification mostly to visual inspection. Recent advances in Bayesian computation allow large-scale comparison of observed data against its theoretical distribution given in a statistical model. Here we propose ppcseq, a key quality-control tool for identifying transcripts that include outlier data points in differential expression analysis, which do not follow a negative binomial distribution. Applying ppcseq to analyse several publicly available datasets using popular tools, we show that from 3 to 10 percent of differentially abundant transcripts across algorithms and datasets had statistics inflated by the presence of outliers.
Maintained by Stefano Mangiola. Last updated 5 months ago.
rnaseq, differentialexpression, geneexpression, normalization, clustering, qualitycontrol, sequencing, transcription, transcriptomics, bayesian-inference, deseq2, edger, negative-binomial, outliers, stan, cpp
0.5 match 7 stars 5.65 score 16 scripts
larsvancutsem
piratings:Calculate Pi Ratings for Teams Competing in Sport Matches
Calculate and optimize dynamic performance ratings of association football teams competing in matches, in accordance with the method used in the research paper "Determining the level of ability of football teams by dynamic ratings based on the relative discrepancies in scores between adversaries", by Dr. Constantinou and Dr. Fenton. This dynamic rating system has proven to provide superior results for predicting association football outcomes. The research paper can be found at <http://www.constantinou.info/downloads/papers/pi-ratings.pdf>.
Maintained by Lars Van Cutsem. Last updated 6 years ago.
0.5 match 13 stars 5.29 score 9 scripts
tidylab
R6P:Design Patterns in R
Build robust and maintainable software with object-oriented design patterns in R. Design patterns abstract and present in neat, well-defined components and interfaces the experience of many software designers and architects over many years of solving similar problems. These are solutions that have withstood the test of time with respect to re-usability, flexibility, and maintainability. 'R6P' provides abstract base classes with examples for a few known design patterns. The patterns were selected by their applicability to analytic projects in R. Using these patterns in R projects has proven effective in dealing with the complexity that data-driven applications possess.
Maintained by Harel Lustiger. Last updated 3 months ago.
0.5 match 10 stars 4.88 score 2 scripts 5 dependents
fairdatapipeline
rDataPipeline:Functions to Interact with the 'FAIR Data Pipeline'
R implementation of the 'FAIR Data Pipeline API'. The 'FAIR Data Pipeline' is intended to enable tracking of provenance of FAIR (findable, accessible, interoperable and reusable) data used in epidemiological modelling.
Maintained by Ryan Field. Last updated 3 months ago.
0.5 match 4 stars 4.52 score 11 scripts
krashkov
pcSteiner:Convenient Tool for Solving the Prize-Collecting Steiner Tree Problem
The Prize-Collecting Steiner Tree problem asks for a subgraph connecting a given set of vertices that includes the most expensive nodes and the least expensive edges. Since the problem is proven to be NP-hard, no exact and efficient algorithm exists. This package provides convenient functionality for obtaining an approximate solution to this problem using a loopy belief propagation algorithm.
Maintained by Aleksei Krasikov. Last updated 5 years ago.
graph-algorithms, r-language, steiner-tree, steiner-tree-problem
0.5 match 2 stars 4.00 score 3 scripts
cboettig
taxalight:A Lightweight and Lightning-Fast Taxonomic Naming Interface
Creates a local Lightning Memory-Mapped Database ('LMDB') of many commonly used taxonomic authorities and provides functions that can quickly query this data. Supported taxonomic authorities include the Integrated Taxonomic Information System ('ITIS'), National Center for Biotechnology Information ('NCBI'), Global Biodiversity Information Facility ('GBIF'), Catalogue of Life ('COL'), and Open Tree Taxonomy ('OTT'). Name and identifier resolution using 'LMDB' can be hundreds of times faster than either relational databases or internet-based queries. Precise data provenance information for data derived from naming providers is also included.
Maintained by Carl Boettiger. Last updated 4 years ago.
0.5 match 5 stars 3.40 score 4 scripts
wpihongzhang
TFisher:Optimal Thresholding Fisher's P-Value Combination Method
We provide the cumulative distribution function (CDF), quantile, and statistical power calculator for a collection of thresholding Fisher's p-value combination methods, including Fisher's p-value combination method, the truncated product method and, in particular, the soft-thresholding Fisher's p-value combination method, which is proven to be optimal in some contexts of signal detection. The p-value calculator for the omnibus version of these tests is also included. For reference, please see Hong Zhang and Zheyang Wu, "TFisher Tests: Optimal and Adaptive Thresholding for Combining p-Values", submitted.
Maintained by Hong Zhang. Last updated 7 years ago.
0.5 match 3.34 score 18 scripts 15 dependents
korydjohnson
rai:Revisiting-Alpha-Investing for Polynomial Regression
A modified implementation of stepwise regression that greedily searches the space of interactions among features in order to build polynomial regression models. Furthermore, the hypothesis tests conducted are valid post model selection due to the use of a revisiting procedure that implements an alpha-investing rule. As a result, the set of rejected sequential hypotheses is proven to control the marginal false discovery rate. When not searching for polynomials, the package provides a statistically valid algorithm to run and terminate stepwise regression. For more information, see Johnson, Stine, and Foster (2019) <arXiv:1510.06322>.
Maintained by Kory D. Johnson. Last updated 3 years ago.
0.5 match 3 stars 3.18 score 7 scripts
audreh
mergeTrees:Aggregating Trees
Aggregates a set of trees with the same leaves to create a consensus tree. The trees are typically obtained via hierarchical clustering, hence the hclust format is used to encode both the aggregated trees and the final consensus tree. The method is exact and proven to be O(nq log(n)), where n is the number of individuals and q the number of trees to aggregate.
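A minimal sketch, assuming the package's main mergeTrees() entry point applied to a list of hclust objects:
    library(mergeTrees)
    # Two clusterings of the same individuals, from different variable subsets
    h1 <- hclust(dist(iris[, 1:2]))
    h2 <- hclust(dist(iris[, 3:4]))
    cons <- mergeTrees(list(h1, h2))   # consensus tree, returned as hclust
    plot(cons)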
Maintained by Audrey Hulot. Last updated 6 years ago.
0.5 match 2 stars 3.00 score 7 scripts
ivanlizaga
fingerPro:Sediment Source Fingerprinting
Quantifies the provenance of the sediments in a catchment or study area. Based on a comprehensive characterization of the sediment sources and the end sediment mixtures, a mixing model algorithm is applied to the sediment mixtures in order to estimate the relative contribution of each potential source. The package includes several statistical methods, such as the Kruskal-Wallis test, discriminant function analysis ('DFA'), and principal component plots ('PCA'), to select the optimal subset of tracer properties. The variability within each sediment source is also considered to estimate the statistical distribution of the sources' contributions.
Maintained by Ivan Lizaga. Last updated 7 years ago.
0.5 match 1.11 score 13 scripts
ghahn-hsph
fastOnlineCpt:Online Multivariate Changepoint Detection
Implementation of a simple algorithm designed for online multivariate changepoint detection of a mean in sparse changepoint settings. The algorithm is based on a modified cusum statistic and guarantees control of the type I error for any false discoveries, while featuring O(1) time and O(1) memory updates per series as well as a proven detection delay.
Maintained by Georg Hahn. Last updated 4 years ago.
0.5 match 1.00 score 8 scripts
aurora-torrente
briKmeans:Package for Brik, Fabrik and Fdebrik Algorithms to Initialise Kmeans
Implementation of the BRIk, FABRIk and FDEBRIk algorithms to initialise k-means. These methods are intended for the clustering of multivariate and functional data, respectively. They make use of the Modified Band Depth and bootstrap to identify appropriate initial seeds for k-means, which are proven to be better options than many techniques in the literature; see Torrente and Romo (2021) <doi:10.1007/s00357-020-09372-3>. It makes use of the functions kma and kma.similarity from the archived package fdakma, by Alice Parodi et al.
Maintained by Aurora Torrente. Last updated 3 years ago.
0.5 match 1.00 score
cran
kpcaIG:Variables Interpretability with Kernel PCA
The kernelized version of principal component analysis (KPCA) has proven to be a valid nonlinear alternative for tackling the nonlinearity of biological sample spaces. However, it poses new challenges in terms of the interpretability of the original variables. 'kpcaIG' aims to provide a tool to select the most relevant variables based on the kernel PCA representation of the data, as in Briscik et al. (2023) <doi:10.1186/s12859-023-05404-y>. It also includes functions for 2D and 3D visualization of the original variables (as arrows) onto the kernel principal component axes, highlighting the contribution of the most important ones.
Maintained by Mitja Briscik. Last updated 8 months ago.
0.5 match 1 stars 1.00 score