Showing 200 of 234 results

henrikbengtsson

R.utils: Various Programming Utilities

Utility functions useful when programming and developing R packages.
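As a taste of the kind of helpers included (the choice of functions here is our illustration; both ship with the package):

    library(R.utils)
    capitalize("hello world")               # "Hello world"
    withTimeout(Sys.sleep(2), timeout = 1,  # interrupt long-running code
                onTimeout = "warning")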

Maintained by Henrik Bengtsson. Last updated 1 year ago.

5.6 match 63 stars 13.74 score 5.7k scripts 814 dependents

r-spatial

spdep: Spatial Dependence: Weighting Schemes, Statistics

A collection of functions to create spatial weights matrix objects from polygon 'contiguities', from point patterns by distance and tessellations, for summarizing these objects, and for permitting their use in spatial data analysis, including regional aggregation by minimum spanning tree; a collection of tests for spatial 'autocorrelation', including global 'Moran's I' and 'Geary's C' proposed by 'Cliff' and 'Ord' (1973, ISBN: 0850860369) and (1981, ISBN: 0850860814), the 'Hubert/Mantel' general cross-product statistic, Empirical Bayes estimates and the 'Assunção/Reis' (1999) <doi:10.1002/(SICI)1097-0258(19990830)18:16%3C2147::AID-SIM179%3E3.0.CO;2-I> Index, 'Getis/Ord' G ('Getis' and 'Ord' 1992) <doi:10.1111/j.1538-4632.1992.tb00261.x> and multicoloured join count statistics, 'APLE' ('Li' et al.) <doi:10.1111/j.1538-4632.2007.00708.x>, local 'Moran's I', 'Geary's C' ('Anselin' 1995) <doi:10.1111/j.1538-4632.1995.tb00338.x> and 'Getis/Ord' G ('Ord' and 'Getis' 1995) <doi:10.1111/j.1538-4632.1995.tb00912.x>, 'saddlepoint' approximations ('Tiefelsdorf' 2002) <doi:10.1111/j.1538-4632.2002.tb01084.x> and exact tests for global and local 'Moran's I' ('Bivand' et al. 2009) <doi:10.1016/j.csda.2008.07.021> and 'LOSH' local indicators of spatial heteroscedasticity ('Ord' and 'Getis') <doi:10.1007/s00168-011-0492-y>. The implementation of most of these measures is described in 'Bivand' and 'Wong' (2018) <doi:10.1007/s11749-018-0599-x>, with further extensions in 'Bivand' (2022) <doi:10.1111/gean.12319>. 'Lagrange' multiplier tests for spatial dependence in linear models are provided ('Anselin' et al. 1996) <doi:10.1016/0166-0462(95)02111-6>, as are 'Rao' score tests for hypothesised spatial 'Durbin' models based on linear models ('Koley' and 'Bera' 2023) <doi:10.1080/17421772.2023.2256810>. An implementation of local indicators for categorical data (LICD), based on 'Carrer et al.' (2021) <doi:10.1016/j.jas.2020.105306> and 'Bivand et al.' (2017) <doi:10.1016/j.spasta.2017.03.003>, was added in 1.3-7. From 'spdep' and 'spatialreg' versions >= 1.2-1, the model fitting functions previously present in this package are defunct in 'spdep' and may be found in 'spatialreg'.
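A minimal sketch of the core weights-then-test workflow, assuming polys is an sf polygon layer with a numeric column x (the object names are ours):

    library(spdep)
    nb <- poly2nb(polys)              # neighbours from polygon contiguities
    lw <- nb2listw(nb, style = "W")   # row-standardised spatial weights
    moran.test(polys$x, lw)           # global Moran's I test
    localmoran(polys$x, lw)           # local Moran's I for each region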

Maintained by Roger Bivand. Last updated 1 month ago.

spatial-autocorrelation spatial-dependence spatial-weights

1.8 match 131 stars 16.59 score 6.0k scripts 106 dependents

green-striped-gecko

dartR: Importing and Analysing 'SNP' and 'Silicodart' Data Generated by Genome-Wide Restriction Fragment Analysis

Functions are provided that facilitate the import and analysis of 'SNP' (single nucleotide polymorphism) and 'silicodart' (presence/absence) data. The main focus is on data generated by 'DArT' (Diversity Arrays Technology); however, data from other sequencing platforms can be used once 'SNP' or related fragment presence/absence data from any source is imported. Genetic datasets are stored in a derived 'genlight' format (package 'adegenet') that allows very compact storage of data and metadata. Functions are available for importing and exporting 'SNP' and 'silicodart' data, for reporting on and filtering by various criteria (e.g. 'CallRate', heterozygosity, reproducibility, maximum allele frequency). Additional functions are available for visualization (e.g. Principal Coordinates Analysis) and for creating a spatial representation using maps. 'dartR' also supports analysis with third-party software packages such as 'newhybrid', 'structure', 'NeEstimator' and 'blast'. Since version 2.0.3, simulation functions are also implemented that allow forward simulation of 'SNP' dynamics under different population and evolutionary dynamics. Comprehensive tutorials and support can be found at our 'github' repository: github.com/green-striped-gecko/dartR/. If you want to cite 'dartR', you can find the information by typing citation('dartR') in the console.
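A hedged sketch of the import-filter-ordinate flow (the file name is a placeholder):

    library(dartR)
    gl <- gl.read.dart(filename = "snps.csv")       # DArT SNP data as a genlight object
    gl <- gl.filter.callrate(gl, threshold = 0.95)  # drop loci with low call rate
    pc <- gl.pcoa(gl)                               # Principal Coordinates Analysis
    gl.pcoa.plot(pc, gl)                            # plot the ordination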

Maintained by Bernd Gruber. Last updated 5 days ago.

3.3 match 34 stars 7.41 score

thothorn

maxstat: Maximally Selected Rank Statistics

Maximally selected rank statistics with several p-value approximations.
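For example, a maximally selected rank statistic for a numeric response y and a candidate cutpoint variable x (the data frame df is our assumption):

    library(maxstat)
    maxstat.test(y ~ x, data = df,
                 smethod = "Wilcoxon",  # rank statistic to maximise over cutpoints
                 pmethod = "HL")        # Hothorn-Lausen p-value approximation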

Maintained by Torsten Hothorn. Last updated 8 years ago.

3.1 match 1 star 7.69 score 107 scripts 59 dependents

brodieg

diffobj: Diffs for R Objects

Generate a colorized diff of two R objects for an intuitive visualization of their differences.
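A minimal example of what the package produces:

    library(diffobj)
    a <- 1:10
    b <- c(1:5, 50, 7:10)
    diffPrint(target = a, current = b)  # colorized diff of the printed output
    diffObj(a, b)                       # picks the most informative diff mode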

Maintained by Brodie Gaslam. Last updated 3 years ago.

diff

1.8 match 231 stars 13.17 score 107 scripts 494 dependents

fishr-core-team

FSA: Simple Fisheries Stock Assessment Methods

A variety of simple fish stock assessment methods.
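As one example among the included methods, a catch-curve estimate of mortality (a data frame df with age and catch columns is our assumption):

    library(FSA)
    cc <- catchCurve(catch ~ age, data = df, ages2use = 2:7)
    summary(cc)  # estimates of instantaneous (Z) and annual (A) mortality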

Maintained by Derek H. Ogle. Last updated 2 months ago.

fish fisheries fisheries-management fisheries-stock-assessment population-dynamics stock-assessment

1.7 match 69 stars 11.16 score 1.7k scripts 6 dependents

traversc

stringfish: Alt String Implementation

Provides an extendable, performant and multithreaded 'alt-string' implementation backed by 'C++' vectors and strings.
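A small sketch of the alt-string API:

    library(stringfish)
    x <- convert_to_sf(c("apple", "banana", "cherry"))  # alt-string vector
    sf_grepl(x, "an")   # pattern matching on alt-strings
    sf_toupper(x)       # vectorised case conversion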

Maintained by Travers Ching. Last updated 5 months ago.

pcre2 cpp

1.5 match 67 stars 10.14 score 14 scripts 57 dependents

mrc-ide

gonovax: Deterministic Compartmental Model of Gonorrhoea with Vaccination

Model for gonorrhoea vaccination, using odin.

Maintained by Lilith Whittles. Last updated 18 days ago.

3.0 match 3 stars 4.56 score

r-lib

marquee: Markdown Parser and Renderer for R Graphics

Provides the means to parse and render markdown text with grid, along with facilities to define the styling of the text.
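A minimal sketch, assuming the marquee_grob()/classic_style() entry points:

    library(marquee)
    library(grid)
    md <- "Some **bold** and *italic* markdown"
    g <- marquee_grob(md, style = classic_style())  # parse and style the markdown
    grid.newpage(); grid.draw(g)                    # render it with grid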

Maintained by Thomas Lin Pedersen. Last updated 4 days ago.

cpp

1.2 match 86 stars 8.59 score 28 scripts 1 dependent

felixfan

FinCal: Time Value of Money, Time Series Analysis and Computational Finance

Package for time value of money calculation, time series analysis and computational finance.
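For instance, three of the basic calculations (the numbers are illustrative):

    library(FinCal)
    fv(r = 0.05, n = 10, pv = -1000)             # future value of 1000 invested at 5%
    npv(r = 0.08, cf = c(-1000, 300, 400, 500))  # net present value of a cash-flow series
    irr(cf = c(-1000, 300, 400, 500, 600))       # internal rate of return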

Maintained by Felix Yanhui Fan. Last updated 8 years ago.

1.7 match 23 stars 6.14 score 203 scripts 1 dependent

glgrabow

spreval: Evaluation of Sprinkler Irrigation Uniformity and Efficiency

Processing and analysis of field-collected or simulated sprinkler system catch data (depths) to characterize irrigation uniformity and efficiency using standard and other measures. Standard measures include the Christiansen coefficient of uniformity (CU) as found in Christiansen, J.E. (1942, ISBN: 0138779295, "Irrigation by Sprinkling"); and distribution uniformity (DU), potential efficiency of the low quarter (PELQ), and application efficiency of the low quarter (AELQ), which are implementations of measures of the same notation in Keller, J. and Merriam, J.L. (1978) "Farm Irrigation System Evaluation: A Guide for Management" <https://pdf.usaid.gov/pdf_docs/PNAAG745.pdf>. spreval::DU.lh is similar to spreval::DU but is the distribution uniformity of the low half instead of the low quarter as in DU. spreval::PELQT is a version of spreval::PELQ adapted for traveling systems instead of lateral-move or solid-set sprinkler systems. The function spreval::eff is analogous to the method used to compute application efficiency for furrow irrigation presented in Walker, W. and Skogerboe, G.V. (1987, ISBN: 0138779295, "Surface Irrigation: Theory and Practice"), which uses piecewise integration of infiltrated depth compared against soil-moisture deficit (SMD), when the argument "target" is set equal to SMD. The other functions contained in the package provide graphical representation of sprinkler system uniformity, and other standard univariate parametric and non-parametric statistical measures as applied to sprinkler system catch depths. A sample data set of field test data, spreval::catchcan (catch depths), is provided and is used in examples and vignettes. Agricultural systems are emphasized, but this package can be used for landscape irrigation evaluation, and a landscape (turf) vignette is included as an example application.
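A heavily hedged sketch using the functions and sample data named above; the exact argument form is our assumption, so check the package help:

    library(spreval)
    data(catchcan)   # bundled field-test catch depths
    # "depths" below stands for a numeric vector of catch depths taken from catchcan
    DU(depths)       # distribution uniformity, low quarter
    DU.lh(depths)    # distribution uniformity, low half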

Maintained by Garry Grabow. Last updated 3 years ago.

1.7 match 4.30 score 9 scripts

poissonconsulting

subfoldr2: Save and Load R Objects

Facilitates saving and loading R objects, data frames, tables, plots, text blocks and numbers to subfolders.
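A hedged sketch of the save/load pattern, assuming the package's sbf_-prefixed API:

    library(subfoldr2)
    sbf_set_main("output")           # root folder for saved artifacts (assumed helper)
    sbf_save_object(mtcars, "cars")  # save an R object into a subfolder
    cars <- sbf_load_object("cars")  # load it back by name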

Maintained by Joe Thorley. Last updated 30 days ago.

1.8 match 2 stars 3.70 score 5 scripts

jimbrig

jimstools: Tools for R

What the package does (one paragraph).

Maintained by Jimmy Briggs. Last updated 3 years ago.

functions personal utility

1.8 match 2 stars 3.00 score 2 scripts

ropensci

git2rdata: Store and Retrieve Data.frames in a Git Repository

The git2rdata package writes and reads dataframes as plain text files. A metadata file stores important information. 1) Storing metadata allows the classes of variables to be maintained. By default, git2rdata optimizes the data for file storage. The optimization is most effective on data containing factors. The optimization makes the data less human readable. The user can turn this off when they prefer a human readable format over smaller files. Details on the implementation are available in vignette("plain_text", package = "git2rdata"). 2) Storing metadata also allows smaller row-based diffs between two consecutive commits. This is a useful feature when storing data as plain text files under version control. Details on this part of the implementation are available in vignette("version_control", package = "git2rdata"). Although git2rdata was envisioned with a git workflow in mind, you can use it in combination with other version control systems like subversion or mercurial. 3) git2rdata is a useful tool in a reproducible and traceable workflow. vignette("workflow", package = "git2rdata") gives a toy example. 4) vignette("efficiency", package = "git2rdata") provides some insight into the efficiency of file storage, git repository size and speed for writing and reading.
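A minimal example of the write/read round trip:

    library(git2rdata)
    root <- tempfile("git2rdata"); dir.create(root)
    write_vc(iris, file = "iris", root = root,
             sorting = "Sepal.Length")  # writes the data file plus its metadata file
    head(read_vc("iris", root = root))  # variable classes restored from the metadata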

Maintained by Thierry Onkelinx. Last updated 2 months ago.

reproducible-research version-control

0.5 match 99 stars 10.03 score 216 scripts 4 dependents

bristol-vaccine-centre

avoncap: AvonCap Study Analysis

A work-in-progress set of functions for loading and wrangling the AvonCap data set.

Maintained by Rob Challen. Last updated 4 months ago.

1.8 match 2.34 score 11 scripts

metinbulus

pwrss: Statistical Power and Sample Size Calculation Tools

Statistical power and minimum required sample size calculations for (1) testing a proportion (one-sample) against a constant, (2) testing a mean (one-sample) against a constant, (3) testing difference between two proportions (independent samples), (4) testing difference between two means or groups (parametric and non-parametric tests for independent and paired samples), (5) testing a correlation (one-sample) against a constant, (6) testing difference between two correlations (independent samples), (7) testing a single coefficient in multiple linear regression, logistic regression, and Poisson regression (with standardized or unstandardized coefficients, with no covariates or covariate adjusted), (8) testing an indirect effect (with standardized or unstandardized coefficients, with no covariates or covariate adjusted) in the mediation analysis (Sobel, Joint, and Monte Carlo tests), (9) testing an R-squared against zero in linear regression, (10) testing an R-squared difference against zero in hierarchical regression, (11) testing an eta-squared or f-squared (for main and interaction effects) against zero in analysis of variance (could be one-way, two-way, and three-way), (12) testing an eta-squared or f-squared (for main and interaction effects) against zero in analysis of covariance (could be one-way, two-way, and three-way), (13) testing an eta-squared or f-squared (for between, within, and interaction effects) against zero in one-way repeated measures analysis of variance (with non-sphericity correction and repeated measures correlation), and (14) testing goodness-of-fit or independence for contingency tables. The alternative hypothesis can be formulated as "not equal", "less", "greater", "non-inferior", "superior", or "equivalent" in (1), (2), (3), and (4); as "not equal", "less", or "greater" in (5), (6), (7) and (8); but always as "greater" in (9), (10), (11), (12), (13), and (14). Reference: Bulus and Polat (2023) <https://osf.io/ua5fc>.
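For instance, the minimum required sample size for a two-group mean comparison, item (4) above (the numbers are illustrative):

    library(pwrss)
    pwrss.t.2means(mu1 = 0.5, mu2 = 0,  # standardized group means
                   sd1 = 1, sd2 = 1,
                   power = 0.80, alpha = 0.05,
                   alternative = "not equal")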

Maintained by Metin Bulus. Last updated 4 months ago.

0.8 match 1 star 4.67 score 57 scripts

olgaviedma

LadderFuelsR: Automated Tool for Vertical Fuel Continuity Analysis using Airborne Laser Scanning Data

Set of tools for analyzing vertical fuel continuity at the tree level using Airborne Laser Scanning data. The workflow consists of: 1) calculating the vertical height profiles of each segmented tree; 2) identifying gaps and fuel layers; 3) estimating the distance between fuel layers; and 4) retrieving the fuel layers' base height and depth. Additionally, other functions recalculate previous metrics after considering distances greater than a certain threshold. Moreover, the package: i) calculates the percentage of Leaf Area Density (LAD) comprised in each fuel layer, ii) removes fuel layers with a LAD percentage of less than 10, and iii) recalculates the distances among the remaining ones. On the other hand, it identifies the crown base height (CBH) based on different criteria: the fuel layer with the highest LAD percentage and the fuel layers located at the largest and at the last distance. When there is only one fuel layer, it also identifies the CBH by performing a segmented linear regression (breaking points) on the cumulative sum of LAD as a function of height. Finally, a collection of plotting functions is provided to represent: i) the initial gaps and fuel layers; ii) the fuel layers' base heights, depths and gaps with distances greater than a certain threshold; and iii) the CBH based on different criteria. The methods implemented in this package are original and have not been published elsewhere.

Maintained by Olga Viedma. Last updated 5 months ago.

ladderfuelsr

0.5 match 7 stars 4.80 score 4 scripts

haghish

mlim: Single and Multiple Imputation with Automated Machine Learning

Machine learning algorithms have been used for performing single missing data imputation and, most recently, multiple imputation. However, this is the first attempt to use automated machine learning algorithms for performing both single and multiple imputation. Automated machine learning is a procedure for fine-tuning the model automatically, performing a random search for a model that results in less error, without overfitting the data. The main idea is to allow the model to set its own parameters for imputing each variable separately instead of setting fixed predefined parameters to impute all variables of the dataset. Using automated machine learning, the package fine-tunes an Elastic Net (default) or Gradient Boosting, Random Forest, Deep Learning, Extreme Gradient Boosting, or Stacked Ensemble machine learning model (from one or a combination of other supported algorithms) for imputing the missing observations. This procedure has been implemented for the first time by this package and is expected to outperform other packages for imputing missing data that do not fine-tune their models. The multiple imputation is implemented via bootstrapping, without letting the duplicated observations harm the cross-validation procedure, which is the way imputed variables are evaluated. Most notably, the package implements an automated procedure for imputing imbalanced data (the class rarity problem), which happens when a factor variable has a level that is far more prevalent than the other(s). This is known to result in biased predictions, and hence biased imputation of missing data. However, the autobalancing procedure ensures that instead of focusing on maximizing accuracy (classification error) in imputing factor variables, a fairer procedure and imputation method is practiced.
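A hedged single-imputation sketch; mlim() is the entry point, and mlim.na() (used here to poke holes in a toy dataset) is our assumption from the package docs:

    library(mlim)
    dfNA <- mlim.na(iris, p = 0.1, seed = 2022)  # add 10% missing values for the demo
    imp  <- mlim(dfNA, m = 1, seed = 2022)       # fine-tuned ELNET single imputation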

Maintained by E. F. Haghish. Last updated 8 months ago.

automatic-machine-learning automl classimbalance data-science elastic-net extreme-gradient-boosting gbm glm gradient-boosting gradient-boosting-machine imputation imputation-algorithm imputation-methods machine-learning missing-data multipleimputation stack-ensemble

0.5 match 31 stars 4.49 score 7 scripts

izmirlig

pwrFDR: FDR Power

Computes average and TPX power under various BH-FDR-type sequential procedures. All of these procedures involve control of some summary of the distribution of the FDP, e.g. the proportion of discoveries which are false in a given experiment. The most widely known of these, the BH-FDR procedure, controls the FDR, which is the mean of the FDP. A lesser known procedure, due to Lehmann and Romano, controls the FDX, or the probability that the FDP exceeds a user-provided threshold. This is less conservative than FWE control procedures but much more conservative than the BH-FDR procedure. This package and the references supporting it introduce a new procedure for controlling the FDX, which we call the BH-FDX procedure. This procedure iteratively identifies, given alpha and lower threshold delta, an alpha* less than alpha at which BH-FDR guarantees FDX control. This uses asymptotic approximation and is only slightly more conservative than the BH-FDR procedure. Likewise, we can think of the power in multiple testing experiments in terms of a summary of the distribution of the True Positive Proportion (TPP), the proportion of truly non-null tests that are called significant. The package will compute power, sample size, or any other missing parameter required for power defined as (i) the mean of the TPP (average power) or (ii) the probability that the TPP exceeds a given value, lambda (TPX power), via asymptotic approximation. All supplied theoretical results are also obtainable via simulation. The suggested approach is to narrow in on a design via the theoretical approaches and then make final adjustments/verify the results by simulation. The theoretical results are described in Izmirlian, G. (2020), Statistics and Probability Letters, <doi:10.1016/j.spl.2020.108713>, and an applied paper describing the methodology with a simulation study is in preparation. See citation("pwrFDR").
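A heavily hedged sketch of an average-power computation; the argument names are our assumption from the package documentation:

    library(pwrFDR)
    # effect size, per-group sample size, proportion of non-null tests, FDR level
    pwrFDR(effect.size = 0.79, n.sample = 42, r.1 = 0.05, alpha = 0.15)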

Maintained by Grant Izmirlian. Last updated 3 months ago.

0.8 match 2.58 score 19 scripts

meenakshi-kushwaha

mmaqshiny: Explore Air Quality Mobile-Monitoring Data

Mobile monitoring, or sensors on a mobile platform, is an increasingly popular approach to measure high-resolution pollution data at the street level. Coupled with location data, spatial visualization of air-quality parameters helps detect localized areas of high air pollution, also called hotspots. In this approach, portable sensors are mounted on a vehicle and driven on predetermined routes to collect high-frequency data (1 Hz). 'mmaqshiny' analyses, visualizes and spatially maps high-resolution air-quality data collected by specific devices installed on a moving platform: 1 Hz data of PM2.5 (mass concentrations of particulate matter with size less than 2.5 microns), black carbon mass concentrations (BC), ultrafine particle number concentrations and carbon dioxide, along with GPS coordinates and relative humidity (RH) data, collected by popular portable instruments (TSI DustTrak-8530, Aethlabs microAeth-AE51, TSI CPC3007, LICOR Li-830, Garmin GPSMAP 64s and Omega USB RH probe, respectively). It incorporates device-specific cleaning and correction algorithms. RH correction is applied to DustTrak PM2.5 following Chakrabarti et al. (2004) <doi:10.1016/j.atmosenv.2004.03.007>. Provision is given to add linear regression coefficients for correcting the PM2.5 data (if required). BC data will be cleaned for vibration-generated noise by adopting the statistical procedure explained in Apte et al. (2011) <doi:10.1016/j.atmosenv.2011.05.028>, followed by a loading correction as suggested by Ban-Weiss et al. (2009) <doi:10.1021/es8021039>. For the number concentration data, provision is given for a dilution correction factor (if a diluter is used with the CPC3007; default value is 1). The package joins the raw, cleaned and corrected data from the above-mentioned instruments and outputs it as a downloadable csv file.

Maintained by Adithi R. Upadhya. Last updated 3 years ago.

0.5 match 5 stars 3.70 score 4 scripts

cran

meerva: Analysis of Data with Measurement Error Using a Validation Subsample

Sometimes data for analysis are obtained using more convenient or less expensive means yielding "surrogate" variables for what could be obtained more accurately, albeit with less convenience; or less conveniently or at more expense yielding "reference" variables, thought of as being measured without error. Analysis of the surrogate variables measured with error generally yields biased estimates when the objective is to make inference about the reference variables. Often it is thought that ignoring the measurement error in surrogate variables only biases effects toward the null hypothesis, but this need not be the case. Measurement errors may bias parameter estimates either toward or away from the null hypothesis. If one has a data set with surrogate variable data from the full sample, and also reference variable data from a randomly selected subsample, then one can assess the bias introduced by measurement error in parameter estimation, and use this information to derive improved estimates based upon all available data. Formulaically, these estimates based upon the reference variables from the validation subsample combined with the surrogate variables from the whole sample can be interpreted as starting with the estimate from reference variables in the validation subsample, and "augmenting" this with additional information from the surrogate variables. This suggests the term "augmented" estimate. The meerva package calculates these augmented estimates in the regression setting when there is a randomly selected subsample with both surrogate and reference variables. Measurement errors may be differential or non-differential, in any or all predictors (simultaneously) as well as in the outcome. The augmented estimates derive, in part, from the multivariate correlation between regression model parameter estimates from the reference variables and the surrogate variables, both from the validation subset. Because the validation subsample is chosen at random, any biases imposed by measurement error, whether non-differential or differential, are reflected in this correlation, and these correlations can be used to derive estimates for the reference variables using data from the whole sample. The main functions in the package are meerva.fit, which calculates estimates for a dataset, and meerva.sim.block, which simulates multiple datasets as described by the user and analyzes them, storing the regression coefficient estimates for inspection. The augmented estimates, as well as how measurement error may arise in practice, are described in more detail by Kremers WK (2021) <arXiv:2106.14063>; the approach is an extension of the works by Chen Y-H, Chen H. (2000) <doi:10.1111/1467-9868.00243>, Chen Y-H. (2002) <doi:10.1111/1467-9868.00324>, Wang X, Wang Q (2015) <doi:10.1016/j.jmva.2015.05.017> and Tong J, Huang J, Chubak J, et al. (2020) <doi:10.1093/jamia/ocz180>.
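The description names meerva.fit as the main entry point; a hedged sketch of its use (the argument names are our assumption), with reference data from the validation subsample and surrogate data from the whole sample:

    library(meerva)
    # x_val,  y_val:  reference predictors/outcome on the validation subsample
    # xs_val, ys_val: surrogates on that same subsample
    # xs_non, ys_non: surrogates on the rest of the sample
    fit <- meerva.fit(x_val, y_val, xs_val, ys_val, xs_non, ys_non)
    summary(fit)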

Maintained by Walter K Kremers. Last updated 3 years ago.

0.9 match 2.00 score

bioc

transcriptR: An Integrative Tool for ChIP- and RNA-Seq Based Primary Transcripts Detection and Quantification

The differences in the RNA types being sequenced have an impact on the resulting sequencing profiles. mRNA-seq data is enriched with reads derived from exons, while GRO-, nucRNA- and chrRNA-seq demonstrate substantially broader coverage of both exonic and intronic regions. The presence of intronic reads in GRO-seq-type data makes it possible to computationally identify and quantify all de novo continuous regions of transcription distributed across the genome. This type of data, however, is more challenging to interpret and less common in practice compared to mRNA-seq. One of the challenges for primary transcript detection concerns the simultaneous transcription of closely spaced genes, which needs to be properly divided into individually transcribed units. The R package transcriptR combines RNA-seq data with ChIP-seq data of histone modifications that mark active Transcription Start Sites (TSSs), such as H3K4me3 or H3K9/14Ac, to overcome this challenge. The advantage of this approach over the use of, for example, gene annotations is that it is data driven and therefore also able to deal with novel and case-specific events. Furthermore, the integration of ChIP- and RNA-seq data allows the identification of all known and novel active transcription start sites within a given sample.

Maintained by Armen R. Karapetyan. Last updated 5 months ago.

immunooncology transcription software sequencing rnaseq coverage

0.5 match 3.30 score 2 scripts

cmclean5

rSpectral: Spectral Modularity Clustering

Implements the network clustering algorithm described in Newman (2006) <doi:10.1103/PhysRevE.74.036104>. The complete iterative algorithm comprises two steps. In the first step, the network is expressed in terms of its leading eigenvalue and eigenvector and recursively partitioned into two communities. Partitioning occurs if the maximum positive eigenvalue is greater than the tolerance (10e-5) for the current partition, and if it results in a positive contribution to the Modularity. Given an initial separation using the leading-eigenvector step, 'rSpectral' then continues to maximise the change in Modularity using a fine-tuning step, or a variant thereof. The first stage here is to find the node which, when moved from one community to another, gives the maximum change in Modularity. This node's community is then fixed and we repeat the process until all nodes have been moved. The whole process is repeated from this new state until the change in Modularity, between the new and old state, is less than the predefined tolerance. A slight variant of the fine-tuning step, which can improve the speed of the calculation, is also provided. Instead of moving each node into each community in turn, we only consider moves of neighbouring nodes, found in different communities, to the community of the current node of interest. The two-step process is repeatedly applied to each new community found, subdividing each community into two new communities, until we are unable to find any division that results in a positive change in Modularity.

Maintained by Anatoly Sorokin. Last updated 2 years ago.

openblas cpp

0.5 match 1 star 3.18 score 9 scripts 1 dependent