R-universe search: centroiding

braverock

PortfolioAnalytics:Portfolio Analysis, Including Numerical Methods for Optimization of Portfolios

Portfolio optimization and analysis routines and graphics.

Maintained by Brian G. Peterson. Last updated 4 hours ago.

16.9 match 87 stars 11.60 score 626 scripts 2 dependents

bnaras

pamr:Pam: Prediction Analysis for Microarrays

Some functions for sample classification in microarrays.

Maintained by Balasubramanian Narasimhan. Last updated 9 months ago.

22.8 match 7.98 score 256 scripts 14 dependents

asardaes

dtwclust:Time Series Clustering Along with Optimizations for the Dynamic Time Warping Distance

Time series clustering along with optimized techniques related to the Dynamic Time Warping distance and its corresponding lower bounds. Implementations of partitional, hierarchical, fuzzy, k-Shape and TADPole clustering are available. Functionality can be easily extended with custom distance measures and centroid definitions. Implementations of DTW barycenter averaging, a distance based on global alignment kernels, and the soft-DTW distance and centroid routines are also provided. All included distance functions have custom loops optimized for the calculation of cross-distance matrices, including parallelization support. Several cluster validity indices are included.

Maintained by Alexis Sarda. Last updated 8 months ago.

clustering dtw time-series openblas cpp

11.7 match 262 stars 12.35 score 406 scripts 14 dependents

bioc

Spectra:Spectra Infrastructure for Mass Spectrometry Data

The Spectra package defines an efficient infrastructure for storing and handling mass spectrometry spectra and functionality to subset, process, visualize and compare spectra data. It provides different implementations (backends) to store mass spectrometry data. These comprise backends tuned for fast data access and processing and backends for very large data sets ensuring a small memory footprint.

Maintained by RforMassSpectrometry Package Maintainer. Last updated 25 days ago.

infrastructure proteomics massspectrometry metabolomics bioconductor hacktoberfest mass-spectrometry

10.0 match 41 stars 13.01 score 254 scripts 35 dependents

itsleeds

od:Manipulate and Map Origin-Destination Data

The aim of 'od' is to provide tools and example datasets for working with origin-destination ('OD') datasets of the type used to describe aggregate urban mobility patterns (Carey et al. 1981) <doi:10.1287/trsc.15.1.32>. The package builds on functions for working with 'OD' data in the package 'stplanr', (Lovelace and Ellison 2018) <doi:10.32614/RJ-2018-053> with a focus on computational efficiency and support for the 'sf' class system (Pebesma 2018) <doi:10.32614/RJ-2018-009>. With few dependencies and a simple class system based on data frames, the package is intended to facilitate efficient analysis of 'OD' datasets and to provide a place for developing new functions. The package enables the creation and analysis of geographic entities representing large scale mobility patterns, from daily travel between zones in cities to migration between countries.

Maintained by Robin Lovelace. Last updated 6 months ago.

15.0 match 33 stars 8.50 score 96 scripts 6 dependents

satijalab

SeuratObject:Data Structures for Single Cell Data

Defines S4 classes for single-cell genomic data and associated information, such as dimensionality reduction embeddings, nearest-neighbor graphs, and spatially-resolved coordinates. Provides data access methods and R-native hooks to ensure the Seurat object is familiar to other R users. See Satija R, Farrell J, Gennert D, et al (2015) <doi:10.1038/nbt.3192>, Macosko E, Basu A, Satija R, et al (2015) <doi:10.1016/j.cell.2015.05.002>, and Stuart T, Butler A, et al (2019) <doi:10.1016/j.cell.2019.05.031> for more details.

Maintained by Paul Hoffman. Last updated 2 years ago.

cpp

9.5 match 25 stars 11.69 score 1.2k scripts 88 dependents

ropensci

spatsoc:Group Animal Relocation Data by Spatial and Temporal Relationship

Detects spatial and temporal groups in GPS relocations (Robitaille et al. (2019) <doi:10.1111/2041-210X.13215>). It can be used to convert GPS relocations to gambit-of-the-group format to build proximity-based social networks. In addition, the randomizations function provides data-stream randomization methods suitable for GPS data.

Maintained by Alec L. Robitaille. Last updated 2 months ago.

animal gps network social spatial

10.8 match 24 stars 9.97 score 145 scripts 3 dependents

bioc

Cardinal:A mass spectrometry imaging toolbox for statistical analysis

Implements statistical & computational tools for analyzing mass spectrometry imaging datasets, including methods for efficient pre-processing, spatial segmentation, and classification.

Maintained by Kylie Ariel Bemis. Last updated 3 months ago.

software infrastructure proteomics lipidomics massspectrometry imagingmassspectrometry immunooncology normalization clustering classification regression

10.3 match 48 stars 10.32 score 200 scripts

rspatial

terra:Spatial Data Analysis

Methods for spatial data analysis with vector (points, lines, polygons) and raster (grid) data. Methods for vector data include geometric operations such as intersect and buffer. Raster methods include local, focal, global, zonal and geometric operations. The predict and interpolate methods facilitate the use of regression type (interpolation, machine learning) models for spatial prediction, including with satellite remote sensing data. Processing of very large files is supported. See the manual and tutorials on <https://rspatial.org/> to get started. 'terra' replaces the 'raster' package ('terra' can do more, and it is faster and easier to use).

Maintained by Robert J. Hijmans. Last updated 3 hours ago.

geospatial raster spatial vector onetbb proj gdal geos cpp

6.0 match 560 stars 17.62 score 17k scripts 857 dependents

ludvigolsen

rearrr:Rearranging Data

Arrange data by a set of methods. Use rearrangers to reorder data points and mutators to change their values. From basic utilities, to centering the greatest value, to swirling in 3-dimensional space, 'rearrr' enables creativity when plotting and experimenting with data.

Maintained by Ludvig Renbo Olsen. Last updated 26 days ago.

arrange cluster expand forming generate ggplot2 order plotting-in-r roll rotate shaping swirl transformations

10.6 match 24 stars 7.26 score 128 scripts 8 dependents

andrewljackson

SIBER:Stable Isotope Bayesian Ellipses in R

Fits bi-variate ellipses to stable isotope data using Bayesian inference with the aim being to describe and compare their isotopic niche.

Maintained by Andrew Jackson. Last updated 10 months ago.

community-ecology ecology niche-modelling stable-isotopes jags cpp

8.4 match 37 stars 9.15 score 187 scripts 1 dependent

bioc

MSnbase:Base Functions and Classes for Mass Spectrometry and Proteomics

MSnbase provides infrastructure for manipulation, processing and visualisation of mass spectrometry and proteomics data, ranging from raw to quantitative and annotated data.

Maintained by Laurent Gatto. Last updated 18 days ago.

immunooncology infrastructure proteomics massspectrometry qualitycontrol dataimport bioconductor bioinformatics mass-spectrometry proteomics-data visualisation cpp

6.0 match 131 stars 12.76 score 772 scripts 36 dependents

jhstaudacher

CoopGame:Important Concepts of Cooperative Game Theory

The theory of cooperative games with transferable utility offers useful insights into the way parties can share gains from cooperation and secure sustainable agreements, see e.g. one of the books by Chakravarty, Mitra and Sarkar (2015, ISBN:978-1107058798) or by Driessen (1988, ISBN:978-9027727299) for more details. A comprehensive set of tools for cooperative game theory with transferable utility is provided. Users can create special families of cooperative games, such as bankruptcy games, cost sharing games and weighted voting games. There are functions to check various game properties and to compute five different set-valued solution concepts for cooperative games. A large number of point-valued solution concepts is available reflecting the diverse application areas of cooperative game theory. Some of these point-valued solution concepts can be used to analyze weighted voting games and measure the influence of individual voters within a voting body. There are routines for visualizing both set-valued and point-valued solutions in the case of three or four players.

Maintained by Jochen Staudacher. Last updated 4 years ago.

18.1 match 4.10 score 424 scripts 1 dependent

iandryden

shapes:Statistical Shape Analysis

Routines for the statistical analysis of landmark shapes, including Procrustes analysis, graphical displays, principal components analysis, permutation and bootstrap tests, thin-plate spline transformation grids and comparing covariance matrices. See Dryden, I.L. and Mardia, K.V. (2016). Statistical shape analysis, with Applications in R (2nd Edition), John Wiley and Sons.

Maintained by Ian Dryden. Last updated 5 days ago.

8.5 match 7 stars 8.61 score 225 scripts 24 dependents

rspatial

geosphere:Spherical Trigonometry

Spherical trigonometry for geographic applications. That is, compute distances and related measures for angular (longitude/latitude) locations.

Maintained by Robert J. Hijmans. Last updated 6 months ago.

cpp

5.3 match 36 stars 13.80 score 5.7k scripts 119 dependents

tidymodels

recipes:Preprocessing and Feature Engineering Steps for Modeling

A recipe prepares your data for modeling. We provide an extensible framework for pipeable sequences of feature engineering steps that provide preprocessing tools to be applied to data. Statistical parameters for the steps can be estimated from an initial data set and then applied to other data sets. The resulting processed output can then be used as inputs for statistical or machine learning models.

Maintained by Max Kuhn. Last updated 5 hours ago.

3.8 match 586 stars 18.79 score 7.2k scripts 381 dependents

ropensci

CoordinateCleaner:Automated Cleaning of Occurrence Records from Biological Collections

Automated flagging of common spatial and temporal errors in biological and paleontological collection data, for use in conservation, ecology and paleontology. Includes automated tests to easily flag (and exclude) records assigned to country or province centroid, the open ocean, the headquarters of the Global Biodiversity Information Facility, urban areas or the location of biodiversity institutions (museums, zoos, botanical gardens, universities). Furthermore identifies per species outlier coordinates, zero coordinates, identical latitude/longitude and invalid coordinates. Also implements an algorithm to identify data sets with a significant proportion of rounded coordinates. Especially suited for large data sets. The reference for the methodology is: Zizka et al. (2019) <doi:10.1111/2041-210X.13152>.

Maintained by Alexander Zizka. Last updated 1 year ago.

6.0 match 82 stars 10.93 score 306 scripts 3 dependents

mikeblazanin

gcplyr:Wrangle and Analyze Growth Curve Data

Easy wrangling and model-free analysis of microbial growth curve data, as commonly output by plate readers. Tools for reshaping common plate reader outputs into 'tidy' formats and merging them with design information, making data easy to work with using 'gcplyr' and other packages. Also streamlines common growth curve processing steps, like smoothing and calculating derivatives, and facilitates model-free characterization and analysis of growth data. See methods at <https://mikeblazanin.github.io/gcplyr/>.

Maintained by Mike Blazanin. Last updated 2 months ago.

dplyr ggplot2 tidyverse

8.2 match 30 stars 7.53 score 75 scripts

neurodata

lolR:Linear Optimal Low-Rank Projection

Supervised learning techniques designed for the situation when the dimensionality exceeds the sample size have a tendency to overfit as the dimensionality of the data increases. To remedy this high-dimensionality, low-sample-size (HDLSS) situation, we attempt to learn a lower-dimensional representation of the data before learning a classifier. That is, we project the data to a situation where the dimensionality is more manageable, and then are able to better apply standard classification or clustering techniques since we will have fewer dimensions to overfit. A number of previous works have focused on how to strategically reduce dimensionality in the unsupervised case, yet in the supervised HDLSS regime, few works have attempted to devise dimensionality reduction techniques that leverage the labels associated with the data. In this package and the associated manuscript Vogelstein et al. (2017) <arXiv:1709.01233>, we provide several methods for feature extraction, some utilizing labels and some not, along with easily extensible utilities to simplify cross-validative efforts to identify the best feature extraction method. Additionally, we include a series of adaptable benchmark simulations to serve as a standard for future investigative efforts into supervised HDLSS. Finally, we produce a comprehensive comparison of the included algorithms across a range of benchmark simulations and real data applications.

Maintained by Eric Bridgeford. Last updated 4 years ago.

8.4 match 20 stars 7.28 score 80 scripts

itsleeds

pct:Propensity to Cycle Tool

Functions and example data to teach and increase the reproducibility of the methods and code underlying the Propensity to Cycle Tool (PCT), a research project and web application hosted at <https://www.pct.bike/>. For an academic paper on the methods, see Lovelace et al (2017) <doi:10.5198/jtlu.2016.862>.

Maintained by Robin Lovelace. Last updated 27 days ago.

8.9 match 20 stars 6.54 score

ropensci

geojsonio:Convert Data from and to 'GeoJSON' or 'TopoJSON'

Convert data to 'GeoJSON' or 'TopoJSON' from various R classes, including vectors, lists, data frames, shape files, and spatial classes. 'geojsonio' does not aim to replace packages like 'sp', 'rgdal', 'rgeos', but rather aims to be a high level client to simplify conversions of data from and to 'GeoJSON' and 'TopoJSON'.

Maintained by Michael Mahoney. Last updated 1 year ago.

geojson topojson geospatial conversion data input-output io

5.3 match 151 stars 10.83 score 2.9k scripts 13 dependents

bioc

SpatialFeatureExperiment:Integrating SpatialExperiment with Simple Features in sf

A new S4 class integrating Simple Features with the R package sf to bring geospatial data analysis methods based on vector data to spatial transcriptomics. Also implements management of spatial neighborhood graphs and geometric operations. This package builds upon SpatialExperiment and SingleCellExperiment, hence methods for these parent classes can still be used.

Maintained by Lambda Moses. Last updated 2 months ago.

datarepresentation transcriptomics spatial

6.0 match 49 stars 9.40 score 322 scripts 1 dependent

bioc

ProtGenerics:Generic infrastructure for Bioconductor mass spectrometry packages

S4 generic functions and classes needed by Bioconductor proteomics packages.

Maintained by Laurent Gatto. Last updated 3 months ago.

infrastructure proteomics massspectrometry bioconductor mass-spectrometry metabolomics

6.0 match 8 stars 9.36 score 4 scripts 188 dependents

momx

Momocs:Morphometrics using R

The goal of 'Momocs' is to provide a complete, convenient, reproducible and open-source toolkit for 2D morphometrics. It includes most common 2D morphometrics approaches on outlines, open outlines, configurations of landmarks, traditional morphometrics, and facilities for data preparation, manipulation and visualization with a consistent grammar throughout. It allows reproducible, complex morphometrics analyses, and other morphometrics approaches should be easy to plug in, or develop from, on top of this canvas.

Maintained by Vincent Bonhomme. Last updated 1 year ago.

morphometrics

7.6 match 51 stars 7.42 score 346 scripts

jmsigner

amt:Animal Movement Tools

Manage and analyze animal movement data. The functionality of 'amt' includes methods to calculate home ranges, track statistics (e.g. step lengths, speed, or turning angles), prepare data for fitting habitat selection analyses, and simulation of space-use from fitted step-selection functions.

Maintained by Johannes Signer. Last updated 5 months ago.

5.3 match 41 stars 10.54 score 418 scripts

blosloos

enviPat:Isotope Pattern, Profile and Centroid Calculation for Mass Spectrometry

Fast and very memory-efficient calculation of isotope patterns, subsequent convolution to theoretical envelopes (profiles) plus valley detection and centroidization or intensoid calculation. Batch processing, resolution interpolation, wrapper, adduct calculations and molecular formula parsing. Loos, M., Gerber, C., Corona, F., Hollender, J., Singer, H. (2015) <doi:10.1021/acs.analchem.5b00941>.

Maintained by Martin Loos. Last updated 8 months ago.

8.7 match 7 stars 6.35 score 48 scripts 7 dependents

spatstat

spatstat.geom:Geometrical Functionality of the 'spatstat' Family

Defines spatial data types and supports geometrical operations on them. Data types include point patterns, windows (domains), pixel images, line segment patterns, tessellations and hyperframes. Capabilities include creation and manipulation of data (using command line or graphical interaction), plotting, geometrical operations (rotation, shift, rescale, affine transformation), convex hull, discretisation and pixellation, Dirichlet tessellation, Delaunay triangulation, pairwise distances, nearest-neighbour distances, distance transform, morphological operations (erosion, dilation, closing, opening), quadrat counting, geometrical measurement, geometrical covariance, colour maps, calculus on spatial domains, Gaussian blur, level sets of images, transects of images, intersections between objects, minimum distance matching. (Excludes spatial data on a network, which are supported by the package 'spatstat.linnet'.)

Maintained by Adrian Baddeley. Last updated 7 days ago.

classes-and-objects distance-calculation geometry geometry-processing images mensuration plotting point-patterns spatial-data spatial-data-analysis

4.5 match 7 stars 12.14 score 241 scripts 229 dependents

opengeos

whitebox:'WhiteboxTools' R Frontend

An R frontend for the 'WhiteboxTools' library, which is an advanced geospatial data analysis platform developed by Prof. John Lindsay at the University of Guelph's Geomorphometry and Hydrogeomatics Research Group. 'WhiteboxTools' can be used to perform common geographical information systems (GIS) analysis operations, such as cost-distance analysis, distance buffering, and raster reclassification. Remote sensing and image processing tasks include image enhancement (e.g. panchromatic sharpening, contrast adjustments), image mosaicing, numerous filtering operations, simple classification (k-means), and common image transformations. 'WhiteboxTools' also contains advanced tooling for spatial hydrological analysis (e.g. flow-accumulation, watershed delineation, stream network analysis, sink removal), terrain analysis (e.g. common terrain indices such as slope, curvatures, wetness index, hillshading; hypsometric analysis; multi-scale topographic position analysis), and LiDAR data processing. Suggested citation: Lindsay (2016) <doi:10.1016/j.cageo.2016.07.003>.

Maintained by Andrew Brown. Last updated 6 months ago.

geomorphometry geoprocessing geospatial gis hydrology remote-sensing rstudio

5.6 match 173 stars 9.65 score 203 scripts 2 dependents

srkobakian

sugarbag:Create Tessellated Hexagon Maps

Create a hexagon tile map display from spatial polygons. Each polygon is represented by a hexagon tile, placed as close to its original centroid as possible, with a focus on maintaining spatial relationship to a focal point. Developed to aid visualisation and analysis of spatial distributions across Australia, which can be challenging due to the concentration of the population on the coast and wide open interior.

Maintained by Dianne Cook. Last updated 2 years ago.

7.8 match 42 stars 6.52 score 53 scripts

bioc

GSgalgoR:An Evolutionary Framework for the Identification and Study of Prognostic Gene Expression Signatures in Cancer

A multi-objective optimization algorithm for disease sub-type discovery based on a non-dominated sorting genetic algorithm. The 'Galgo' framework combines the advantages of clustering algorithms for grouping heterogeneous 'omics' data and the searching properties of genetic algorithms for feature selection. The algorithm searches for the optimal number of clusters, considering the features that maximize the survival difference between sub-types while keeping cluster consistency high.

Maintained by Carlos Catania. Last updated 5 months ago.

geneexpression transcription clustering classification survival

9.1 match 15 stars 5.48 score 6 scripts

swarm-lab

swaRm:Processing Collective Movement Data

Function library for processing collective movement data (e.g. fish schools, ungulate herds, baboon troops) collected from GPS trackers or computer vision tracking software.

Maintained by Simon Garnier. Last updated 1 year ago.

animal-behavior animal-behaviour collective-behavior collective-behaviour

8.9 match 21 stars 5.50 score 8 scripts 1 dependent

dadongz

OncoSubtype:Predict Cancer Subtypes Based on TCGA Data using Machine Learning Method

Provide functionality for cancer subtyping using nearest centroids or machine learning methods based on TCGA data.

Maintained by Dadong Zhang. Last updated 1 year ago.

12.5 match 1 star 3.70 score 1 script

roelandkindt

BiodiversityR:Package for Community Ecology and Suitability Analysis

Graphical User Interface (via the R-Commander) and utility functions (often based on the vegan package) for statistical analysis of biodiversity and ecological communities, including species accumulation curves, diversity indices, Renyi profiles, GLMs for analysis of species abundance and presence-absence, distance matrices, Mantel tests, and cluster, constrained and unconstrained ordination analysis. A book on biodiversity and community ecology analysis is available for free download from the website. In 2012, methods for (ensemble) suitability modelling and mapping were expanded in the package.

Maintained by Roeland Kindt. Last updated 2 months ago.

6.2 match 17 stars 7.13 score 390 scripts 2 dependents

aiorazabala

qmethod:Analysis of Subjective Perspectives Using Q Methodology

Analysis of Q methodology, used to identify distinct perspectives existing within a group. This methodology is used across social, health and environmental sciences to understand diversity of attitudes, discourses, or decision-making styles (for more information, see <https://qmethod.org/>). A single function runs the full analysis. Each step can be run separately using the corresponding functions: for automatic flagging of Q-sorts (manual flagging is optional), for statement scores, for distinguishing and consensus statements, and for general characteristics of the factors. The package allows choosing either principal components or centroid factor extraction, manual or automatic flagging, a number of mathematical methods for rotation (or none), and a number of correlation coefficients for the initial correlation matrix, among many other options. Additional functions are available to import and export data (from raw *.CSV, 'HTMLQ' and 'FlashQ' *.CSV, 'PQMethod' *.DAT and 'easy-htmlq' *.JSON files), to print and plot, to import raw data from individual *.CSV files, and to make printable cards. The package also offers functions to print Q cards and to generate Q distributions for study administration. See further details in the package documentation, and in the web pages below, which include a cookbook, guidelines for more advanced analysis (how to perform manual flagging or change the sign of factors), data management, and a graphical user interface (GUI) for online and offline use.

Maintained by Aiora Zabala. Last updated 1 year ago.

7.2 match 38 stars 6.03 score 47 scripts

bioc

tidytof:Analyze High-dimensional Cytometry Data Using Tidy Data Principles

This package implements an interactive, scientific analysis pipeline for high-dimensional cytometry data built using tidy data principles. It is specifically designed to play well with both the tidyverse and Bioconductor software ecosystems, with functionality for reading/writing data files, data cleaning, preprocessing, clustering, visualization, modeling, and other quality-of-life functions. tidytof implements a "grammar" of high-dimensional cytometry data analysis.

Maintained by Timothy Keyes. Last updated 5 months ago.

singlecell flowcytometry bioinformatics cytometry data-science single-cell tidyverse cpp

6.0 match 18 stars 7.24 score 35 scripts

jeffreyevans

spatialEco:Spatial Analysis and Modelling Utilities

Utilities to support spatial data manipulation, query, sampling and modelling in ecological applications. Functions include models for species population density, spatial smoothing, multivariate separability, point process model for creating pseudo-absences and sub-sampling, quadrant-based sampling and analysis, auto-logistic modeling, sampling models, cluster optimization, statistical exploratory tools and raster-based metrics.

Maintained by Jeffrey S. Evans. Last updated 29 days ago.

biodiversity conservation ecology r-spatial raster spatial vector

4.5 match 110 stars 9.55 score 736 scripts 2 dependents

robinlovelace

simodels:Flexible Framework for Developing Spatial Interaction Models

Develop spatial interaction models (SIMs). SIMs predict the amount of interaction, for example number of trips per day, between geographic entities representing trip origins and destinations. Contains functions for creating origin-destination datasets from geographic input datasets and calculating movement between origin-destination pairs with constrained, production-constrained, and attraction-constrained models (Wilson 1979) <doi:10.1068/a030001>.

Maintained by Robin Lovelace. Last updated 25 days ago.

6.0 match 18 stars 6.90 score 11 scripts

cpanse

protViz:Visualizing and Analyzing Mass Spectrometry Related Data in Proteomics

Helps with quality checks, visualizations and analysis of mass spectrometry data, coming from proteomics experiments. The package is developed, tested and used at the Functional Genomics Center Zurich <https://fgcz.ch>. We use this package mainly for prototyping, teaching, and having fun with proteomics data. But it can also be used to do data analysis for small scale data sets.

Maintained by Christian Panse. Last updated 1 year ago.

fun mass-spectrometry peptide-identification proteomics quantification visualization cpp

5.1 match 11 stars 7.88 score 72 scripts 2 dependents

cran

flexclust:Flexible Cluster Algorithms

The main function kcca implements a general framework for k-centroids cluster analysis supporting arbitrary distance measures and centroid computation. Further cluster methods include hard competitive learning, neural gas, and QT clustering. There are numerous visualization methods for cluster results (neighborhood graphs, convex cluster hulls, barcharts of centroids, ...), and bootstrap methods for the analysis of cluster stability.

Maintained by Bettina Grün. Last updated 1 month ago.

6.7 match 3 stars 5.99 score 53 dependents
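
The k-centroids idea described in the flexclust entry above (alternating assignment and centroid updates, with both the distance and the centroid rule supplied by the caller) can be sketched in a few lines. This is a minimal Python illustration under stated assumptions, not flexclust's kcca implementation: the names k_centroids, manhattan and coordinate_median are hypothetical, and the initialisation and stopping rule are the simplest ones that work.

```python
import random
from statistics import median


def k_centroids(points, k, dist, centroid, max_iter=50, seed=0):
    """Generic k-centroids clustering with caller-supplied dist() and centroid().

    Hypothetical helper for illustration; not the flexclust API.
    """
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # start from k distinct data points
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest center.
        new_labels = [
            min(range(k), key=lambda j: dist(p, centers[j])) for p in points
        ]
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
        # Update step: recompute each center from its current members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old center if a cluster is empty
                centers[j] = centroid(members)
    return labels, centers


def manhattan(a, b):
    """L1 distance between two equal-length tuples."""
    return sum(abs(x - y) for x, y in zip(a, b))


def coordinate_median(members):
    """Coordinate-wise median, the natural centroid for L1 distance."""
    return tuple(median(coords) for coords in zip(*members))


points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
labels, centers = k_centroids(points, 2, manhattan, coordinate_median)
```

Swapping in a squared Euclidean distance and a coordinate-wise mean would turn the same loop into plain k-means, which is the point of keeping the two functions pluggable.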

barnhilldave

TML:Tropical Geometry Tools for Machine Learning

Suite of tropical geometric tools for use in machine learning applications. These methods may be summarized in the following references: Yoshida, et al. (2022) <arxiv:2209.15045>, Barnhill et al. (2023) <arxiv:2303.02539>, Barnhill and Yoshida (2023) <doi:10.3390/math11153433>, Aliatimis et al. (2023) <arXiv:2306.08796>, Yoshida et al. (2022) <arXiv:2206.04206>, and Yoshida et al. (2019) <doi:10.1007/s11538-018-0493-4>.

Maintained by David Barnhill. Last updated 8 months ago.

11.4 match 3 stars 3.48 score 1 script

adamlilith

fasterRaster:Faster Raster and Spatial Vector Processing Using 'GRASS GIS'

Processing of large-in-memory/large-on-disk rasters and spatial vectors using 'GRASS GIS' <https://grass.osgeo.org/>. Most functions in the 'terra' package are recreated. Processing of medium-sized and smaller spatial objects will nearly always be faster using 'terra' or 'sf', but for large-in-memory/large-on-disk objects, 'fasterRaster' may be faster. To use most of the functions, you must have the stand-alone version (not the 'OSGeoW4' installer version) of 'GRASS GIS' 8.0 or higher.

Maintained by Adam B. Smith. Last updated 5 days ago.

aspect distance fragmentation fragmentation-indices gis grass grass-gis raster raster-projection rasterize slope topography vectorization

5.0 match 57 stars 7.68 score 8 scripts

ecospat

ecospat:Spatial Ecology Miscellaneous Methods

Collection of R functions and data sets for the support of spatial ecology analyses with a focus on pre, core and post modelling analyses of species distribution, niche quantification and community assembly. Written by current and former members and collaborators of the ecospat group of Antoine Guisan, Department of Ecology and Evolution (DEE) and Institute of Earth Surface Dynamics (IDYST), University of Lausanne, Switzerland. Read Di Cola et al. (2016) <doi:10.1111/ecog.02671> for details.

Maintained by Olivier Broennimann. Last updated 2 months ago.

4.0 match 32 stars 9.35 score 418 scripts 1 dependent

dernst

flexord:Flexible Clustering of Ordinal and Mixed-with-Ordinal Data

Extends the capabilities for flexible partitioning and model-based clustering available in the packages 'flexclust' and 'flexmix' to handle ordinal and mixed-with-ordinal data types via new distance, centroid and driver functions that make various assumptions regarding ordinality. Using them within the flex-scheme allows for easy comparisons across methods.

Maintained by Lena Ortega Menjivar. Last updated 6 days ago.

6.6 match 2 stars 5.51 score

vegandevs

vegan:Community Ecology Package

Ordination methods, diversity analysis and other functions for community and vegetation ecologists.

Maintained by Jari Oksanen. Last updated 1 month ago.

ecological-modelling ecology ordination fortran openblas

1.9 match 476 stars 19.40 score 15k scripts 445 dependents

emf-creaf

vegclust:Fuzzy Clustering of Vegetation Data

A set of functions to: (1) perform fuzzy clustering of vegetation data (De Caceres et al, 2010) <doi:10.1111/j.1654-1103.2010.01211.x>; (2) assess ecological community similarity on the basis of structure and composition (De Caceres et al, 2013) <doi:10.1111/2041-210X.12116>.

Maintained by Miquel De Cáceres. Last updated 8 months ago.

5.7 match 2 stars 6.27 score 52 scripts 6 dependents

kylebittinger

usedist:Distance Matrix Utilities

Functions to re-arrange, extract, and work with distances.

Maintained by Kyle Bittinger. Last updated 10 months ago.

5.4 match 14 stars 6.63 score 169 scripts 6 dependents

topepo

caret:Classification and Regression Training

Misc functions for training and plotting classification and regression models.

Maintained by Max Kuhn. Last updated 4 months ago.

1.8 match 1.6k stars 19.24 score 61k scripts 303 dependents

bblonder

hypervolume:High Dimensional Geometry, Set Operations, Projection, and Inference Using Kernel Density Estimation, Support Vector Machines, and Convex Hulls

Estimates the shape and volume of high-dimensional datasets and performs set operations: intersection / overlap, union, unique components, inclusion test, and hole detection. Uses stochastic geometry approach to high-dimensional kernel density estimation, support vector machine delineation, and convex hull generation. Applications include modeling trait and niche hypervolumes and species distribution modeling.

Maintained by Benjamin Blonder. Last updated 2 months ago.

openblas cpp

3.5 match 23 stars 9.69 score 211 scripts 7 dependents

adafede

CentroidR:CentroidR

CentroidR provides the infrastructure to centroid profile spectra.

Maintained by Adriano Rutz. Last updated 3 days ago.

centroiding spectra

12.4 match 2.60 score

epivec

TDLM:Systematic Comparison of Trip Distribution Laws and Models

The main purpose of this package is to propose a rigorous framework to fairly compare trip distribution laws and models as described in Lenormand et al. (2016) <doi:10.1016/j.jtrangeo.2015.12.008>.

Maintained by Maxime Lenormand. Last updated 26 days ago.

6.6 match 2 stars 4.85 score 3 scripts

jmadinlab

habtools:Tools and Metrics for 3D Surfaces and Objects

A collection of functions for sampling and simulating 3D surfaces and objects and estimating metrics like rugosity, fractal dimension, convexity, sphericity, circularity, second moments of area and volume, and more.

Maintained by Nina Schiettekatte. Last updated 26 days ago.

5.2 match 12 stars 6.10 score 9 scripts

murrayefford

secr:Spatially Explicit Capture-Recapture

Functions to estimate the density and size of a spatially distributed animal population sampled with an array of passive detectors, such as traps, or by searching polygons or transects. Models incorporating distance-dependent detection are fitted by maximizing the likelihood. Tools are included for data manipulation and model selection.

Maintained by Murray Efford. Last updated 5 days ago.

cpp

3.0 match 3 stars 10.06 score 410 scripts 5 dependents

inseefr

btb:Beyond the Border - Kernel Density Estimation for Urban Geography

The kernelSmoothing() function allows you to square and smooth geolocated data. It calculates a classical (conservative) kernel smoothing or a geographically weighted median. The function has four major call modes: (1) kernelSmoothing(obs, epsg, cellsize, bandwidth) for a classical kernel smoothing on an automatic grid; (2) kernelSmoothing(obs, epsg, cellsize, bandwidth, quantiles) for a geographically weighted median on an automatic grid; (3) kernelSmoothing(obs, epsg, cellsize, bandwidth, centroids) for a classical kernel smoothing on a user grid; (4) kernelSmoothing(obs, epsg, cellsize, bandwidth, quantiles, centroids) for a geographically weighted median on a user grid. References: Brunsdon et al. (2002), "Geographically weighted summary statistics: a framework for localised exploratory data analysis", Computers, Environment and Urban Systems <doi:10.1016/S0198-9715(01)00009-6>; Diggle (2003), "Statistical Analysis of Spatial and Spatio-Temporal Point Patterns", Third Edition, pp. 83-86 <doi:10.1080/13658816.2014.937718>.

Maintained by Solène Colin. Last updated 15 days ago.

statistical-package cpp

4.6 match 15 stars 6.50 score 15 scripts

ropensci

eph:Argentina's Permanent Household Survey Data and Manipulation Utilities

Tools to download and manipulate the Permanent Household Survey from Argentina (EPH is the Spanish acronym for Permanent Household Survey). For example, get_microdata() downloads the datasets, get_poverty_lines() downloads the official poverty baskets, and calculate_poverty() determines whether a household is in poverty, following the official methodology. organize_panels() concatenates observations from different periods, and organize_labels() adds the official labels to the data. The implemented methods are based on INDEC (2016) <http://www.estadistica.ec.gba.gov.ar/dpe/images/SOCIEDAD/EPH_metodologia_22_pobreza.pdf>. As this package works with the Argentinian Permanent Household Survey and its main audience is from this country, the documentation was written in Spanish.

Maintained by Carolina Pradier. Last updated 8 months ago.

eph indec mercado-de-trabajo rstatses

3.5 match 59 stars 8.38 score 255 scripts

christopherkenny

geomander:Geographic Tools for Studying Gerrymandering

A compilation of tools to complete common tasks for studying gerrymandering. This focuses on the geographic tool side of common problems, such as linking different levels of spatial units or estimating how to break up units. Functions exist for creating redistricting-focused data for the US.

Maintained by Christopher T. Kenny. Last updated 1 month ago.

cpp

3.6 match 14 stars 7.81 score 191 scripts 1 dependent

roaldarbol

animovement:An R toolbox for analysing animal movement across space and time

An R toolbox for analysing animal movement across space and time.

Maintained by Mikkel Roald-Arbøl. Last updated 3 months ago.

animal-behaviour animal-movement neuroethology neuroscience

5.5 match 10 stars 4.81 score 8 scripts

tguillerme

dispRity:Measuring Disparity

A modular package for measuring disparity (multidimensional space occupancy). Disparity can be calculated from any matrix defining a multidimensional space. The package provides a set of implemented metrics to measure properties of the space and allows users to provide and test their own metrics. The package also provides functions for looking at disparity in a serial way (e.g. disparity through time) or per groups as well as visualising the results. Finally, this package provides several statistical tests for disparity analysis.

Maintained by Thomas Guillerme. Last updated 14 days ago.

disparity ecology multidimensionality palaeobiology

3.0 match 26 stars 8.65 score 220 scripts 1 dependent

jayanilakshika

quollr:Visualising How Nonlinear Dimension Reduction Warps Your Data

Constructs a model in 2D space from 2D embedding data and then lifts it to the high-dimensional space. Additionally, it provides tools to visualize the model in 2D space and to overlay the fitted model on data using the tour technique. Furthermore, it facilitates the generation of summaries of high-dimensional distributions.

Maintained by Jayani P.G. Lakshika. Last updated 2 days ago.

5.7 match 3 stars 4.48 score 7 scripts

cran

smfishHmrf:Hidden Markov Random Field for Spatial Transcriptomic Data

Discovery of spatial patterns with Hidden Markov Random Field. This package is designed for spatial transcriptomic data and single molecule fluorescent in situ hybridization (FISH) data such as sequential fluorescence in situ hybridization (seqFISH) and multiplexed error-robust fluorescence in situ hybridization (MERFISH). The methods implemented in this package are described in Zhu et al. (2018) <doi:10.1038/nbt.4260>.

Maintained by Qian Zhu. Last updated 4 years ago.

15.0 match 1.70 score

sapfluxnet

sapfluxnetr:Working with 'Sapfluxnet' Project Data

Access, modify, aggregate and plot data from the 'Sapfluxnet' project (<http://sapfluxnet.creaf.cat>), the first global database of sap flow measurements.

Maintained by Victor Granda. Last updated 2 years ago.

3.9 match 25 stars 6.57 score 49 scripts

benbruyneel

proteinDiscover:ProteinDiscover

Provides an interface to the data contained in Proteome Discoverer (Thermo Scientific) results.

Maintained by Ben Bruyneel. Last updated 1 year ago.

mass-spectrometry proteomics proteomics-data-analysis

7.5 match 2 stars 3.00 score 2 scripts

trelliscope

trelliscope:Create Interactive Multi-Panel Displays

Trelliscope enables interactive exploration of data frames of visualizations.

Maintained by Ryan Hafen. Last updated 7 months ago.

visualization

3.5 match 29 stars 6.43 score 117 scripts

ropengov

geofi:Access Finnish Geospatial Data

Designed to simplify geospatial data access from the Statistics Finland Web Feature Service API <https://geo.stat.fi/geoserver/index.html>, the geofi package offers researchers and analysts a set of tools to obtain and harmonize administrative spatial data for a wide range of applications, from urban planning to environmental research. The package contains annually updated time series of municipality key datasets that can be used for data aggregation and language translations.

Maintained by Markus Kainu. Last updated 2 months ago.

ropengov ggplot2

2.8 match 20 stars 8.17 score 61 scripts

kaiaragaki

classifyBLCA:What the Package Does (One Line, Title Case)

What the package does (one paragraph).

Maintained by Kai Aragaki. Last updated 2 years ago.

13.1 match 1.70 score

inlabru-org

fmesher:Triangle Meshes and Related Geometry Tools

Generate planar and spherical triangle meshes, compute finite element calculations for 1- and 2-dimensional flat and curved manifolds with associated basis function spaces, methods for lines and polygons, and transparent handling of coordinate reference systems and coordinate transformation, including 'sf' and 'sp' geometries. The core 'fmesher' library code was originally part of the 'INLA' package, and implements parts of "Triangulations and Applications" by Hjelle and Daehlen (2006) <doi:10.1007/3-540-33261-8>.

Maintained by Finn Lindgren. Last updated 18 hours ago.

cpp

1.9 match 16 stars 11.28 score 261 scripts 26 dependents

bioc

MsCoreUtils:Core Utils for Mass Spectrometry Data

MsCoreUtils defines low-level functions for mass spectrometry data and is independent of any high-level data structures. These functions include mass spectra processing functions (noise estimation, smoothing, binning, baseline estimation), quantitative aggregation functions (median polish, robust summarisation, ...), missing data imputation, data normalisation (quantiles, vsn, ...), and miscellaneous helper functions that are used across high-level data structures within the R for Mass Spectrometry packages.

Maintained by RforMassSpectrometry Package Maintainer. Last updated 11 days ago.

infrastructure proteomics massspectrometry metabolomics bioconductor mass-spectrometry utils

2.0 match 16 stars 10.57 score 41 scripts 71 dependents

teunbrand

ggh4x:Hacks for 'ggplot2'

A 'ggplot2' extension that does a variety of little helpful things. The package extends 'ggplot2' facets through customisation, by setting individual scales per panel, resizing panels and providing nested facets. Also allows multiple colour and fill scales per plot. Also hosts a smaller collection of stats, geoms and axis guides.

Maintained by Teun van den Brand. Last updated 13 days ago.

ggplot-extension ggplot2

1.5 match 617 stars 14.06 score 4.4k scripts 21 dependents

josiahparry

rsgeo:An Interface to Rust's 'geo' Library

An R interface to the GeoRust crates 'geo' and 'geo-types' providing access to geometry primitives and algorithms.

Maintained by Josiah Parry. Last updated 8 months ago.

rust cargo

5.3 match 47 stars 3.96 score 13 scripts

bioc

CelliD:Unbiased Extraction of Single Cell gene signatures using Multiple Correspondence Analysis

CelliD is a clustering-free multivariate statistical method for the robust extraction of per-cell gene signatures from single-cell RNA-seq. CelliD allows unbiased cell identity recognition across different donors, tissues-of-origin, model organisms and single-cell omics protocols. The package can also be used to explore functional pathways enrichment in single cell data.

Maintained by Akira Cortal. Last updated 5 months ago.

rnaseq singlecell dimensionreduction clustering genesetenrichment geneexpression atacseq openblas cpp openmp

4.3 match 4.85 score 70 scripts

pbs-software

PBSmapping:Mapping Fisheries Data and Spatial Analysis Tools

This software has evolved from fisheries research conducted at the Pacific Biological Station (PBS) in 'Nanaimo', British Columbia, Canada. It extends the R language to include two-dimensional plotting features similar to those commonly available in a Geographic Information System (GIS). Embedded C code speeds algorithms from computational geometry, such as finding polygons that contain specified point events or converting between longitude-latitude and Universal Transverse Mercator (UTM) coordinates. Additionally, we include 'C++' code developed by Angus Johnson for the 'Clipper' library, data for a global shoreline, and other data sets in the public domain. Under the user's R library directory '.libPaths()', specifically in './PBSmapping/doc', a complete user's guide is offered and should be consulted to use package functions effectively.

Maintained by Rowan Haigh. Last updated 6 months ago.

cpp

2.0 match 11 stars 10.16 score 652 scripts 9 dependents

mikejohnson51

AOI:Areas of Interest

A consistent tool kit for forward and reverse geocoding and defining boundaries for spatial analysis.

Maintained by Mike Johnson. Last updated 1 year ago.

aoi area-of-interest bounding-boxes gis spatial subset

4.0 match 37 stars 4.98 score 174 scripts 1 dependent

dusadrian

venn:Draw Venn Diagrams

A close to zero dependency package to draw and display Venn diagrams up to 7 sets, and any Boolean union of set intersections.

Maintained by Adrian Dusa. Last updated 6 months ago.

2.0 match 30 stars 9.90 score 508 scripts 13 dependents

plangfelder

WGCNA:Weighted Correlation Network Analysis

Functions necessary to perform Weighted Correlation Network Analysis on high-dimensional data as originally described in Horvath and Zhang (2005) <doi:10.2202/1544-6115.1128> and Langfelder and Horvath (2008) <doi:10.1186/1471-2105-9-559>. Includes functions for rudimentary data cleaning, construction of correlation networks, module identification, summarization, and relating of variables and modules to sample traits. Also includes a number of utility functions for data manipulation and visualization.

Maintained by Peter Langfelder. Last updated 6 months ago.

cpp

2.0 match 54 stars 9.65 score 5.3k scripts 32 dependents

bioc

CMA:Synthesis of microarray-based classification

This package provides a comprehensive collection of various microarray-based classification algorithms both from Machine Learning and Statistics. Variable Selection, Hyperparameter tuning, Evaluation and Comparison can be performed combined or stepwise in a user-friendly environment.

Maintained by Roman Hornung. Last updated 5 months ago.

classification decisiontree

3.8 match 5.09 score 61 scripts

bioc

matter:Out-of-core statistical computing and signal processing

Toolbox for larger-than-memory scientific computing and visualization, providing efficient out-of-core data structures using files or shared memory, for dense and sparse vectors, matrices, and arrays, with applications to nonuniformly sampled signals and images.

Maintained by Kylie A. Bemis. Last updated 4 months ago.

infrastructure datarepresentation dataimport dimensionreduction preprocessing cpp

2.0 match 57 stars 9.52 score 64 scripts 2 dependents

qile0317

APackOfTheClones:Visualization of Clonal Expansion for Single Cell Immune Profiles

Visualize clonal expansion via circle-packing. 'APackOfTheClones' extends 'scRepertoire' to produce a publication-ready visualization of clonal expansion at a single cell resolution, by representing expanded clones as differently sized circles. The method was originally implemented by Murray Christian and Ben Murrell in the following immunology study: Ma et al. (2021) <doi:10.1126/sciimmunol.abg6356>.

Maintained by Qile Yang. Last updated 4 months ago.

clonal-analysis immune-repertoire immune-system scrna-seq scrnaseq seurat single-cell single-cell-genomics cpp

2.9 match 15 stars 6.45 score 15 scripts

bioc

ggcyto:Visualize Cytometry data with ggplot

With the dedicated fortify method implemented for flowSet, ncdfFlowSet and GatingSet classes, both raw and gated flow cytometry data can be plotted directly with ggplot. The ggcyto wrapper and some custom layers also make it easy to add gates and population statistics to the plot.

Maintained by Mike Jiang. Last updated 5 months ago.

immunooncology flowcytometry cellbasedassays infrastructure visualization

1.7 match 58 stars 11.25 score 362 scripts 5 dependents

welch-lab

rliger:Linked Inference of Genomic Experimental Relationships

Uses an extension of nonnegative matrix factorization to identify shared and dataset-specific factors. See Welch J, Kozareva V, et al (2019) <doi:10.1016/j.cell.2019.05.006>, and Liu J, Gao C, Sodicoff J, et al (2020) <doi:10.1038/s41596-020-0391-8> for more details.

Maintained by Yichen Wang. Last updated 3 months ago.

nonnegative-matrix-factorization single-cell openblas cpp

1.7 match 408 stars 10.77 score 334 scripts 1 dependent

astamm

fdacluster:Joint Clustering and Alignment of Functional Data

Implementations of the k-means, hierarchical agglomerative and DBSCAN clustering methods for functional data which allow for jointly aligning and clustering curves. It supports functional data defined on one-dimensional domains but possibly taking values in multivariate codomains. It supports functional data defined in arrays but also via the 'fd' and 'funData' classes for functional data defined in the 'fda' and 'funData' packages respectively. It currently supports shift, dilation and affine warping functions for functional data defined on the real line and uses the SRVF framework to handle boundary-preserving warping for functional data defined on a specific interval. Main reference for the k-means algorithm: Sangalli L.M., Secchi P., Vantini S., Vitelli V. (2010) "k-mean alignment for curve clustering" <doi:10.1016/j.csda.2009.12.008>. Main reference for the SRVF framework: Tucker, J. D., Wu, W., & Srivastava, A. (2013) "Generative models for functional data using phase and amplitude separation" <doi:10.1016/j.csda.2012.12.001>.

Maintained by Aymeric Stamm. Last updated 2 months ago.

openblas cpp openmp

3.0 match 5 stars 6.14 score 31 scripts 1 dependent

immunogenomics

harmony:Fast, Sensitive, and Accurate Integration of Single Cell Data

Implementation of the Harmony algorithm for single cell integration, described in Korsunsky et al <doi:10.1038/s41592-019-0619-0>. Package includes a standalone Harmony function and interfaces to external frameworks.

Maintained by Ilya Korsunsky. Last updated 5 months ago.

algorithm data-integration scrna-seq openblas cpp

1.3 match 554 stars 13.74 score 5.5k scripts 8 dependents

admahood

neonPlantEcology:Process NEON Plant Data for Ecological Analysis

Downloading and organizing plant presence and percent cover data from the National Ecological Observatory Network <https://www.neonscience.org>.

Maintained by Adam Mahood. Last updated 3 months ago.

3.6 match 8 stars 5.08 score 7 scripts

zarquon42b

Morpho:Calculations and Visualisations Related to Geometric Morphometrics

A toolset for Geometric Morphometrics and mesh processing. This includes (among other stuff) mesh deformations based on reference points, permutation tests, detection of outliers, processing of sliding semi-landmarks and semi-automated surface landmark placement.

Maintained by Stefan Schlager. Last updated 5 months ago.

openblas cpp openmp

1.8 match 51 stars 10.01 score 218 scripts 13 dependents

rozetasimonovska

SDPDmod:Spatial Dynamic Panel Data Modeling

Spatial model calculation for static and dynamic panel data models, weights matrix creation and Bayesian model comparison. Bayesian model comparison methods were described by 'LeSage' (2014) <doi:10.1016/j.spasta.2014.02.002>. The 'Lee'-'Yu' transformation approach is described in 'Yu', 'De Jong' and 'Lee' (2008) <doi:10.1016/j.jeconom.2008.08.002>, 'Lee' and 'Yu' (2010) <doi:10.1016/j.jeconom.2009.08.001> and 'Lee' and 'Yu' (2010) <doi:10.1017/S0266466609100099>.

Maintained by Rozeta Simonovska. Last updated 12 months ago.

3.6 match 5 stars 4.98 score 19 scripts

ethanyxu

ADPclust:Fast Clustering Using Adaptive Density Peak Detection

An implementation of ADPclust clustering procedures (Fast Clustering Using Adaptive Density Peak Detection). The work builds and improves upon the idea of Rodriguez and Laio (2014) <DOI:10.1126/science.1242072>. ADPclust clusters data by finding density peaks in a density-distance plot generated from local multivariate Gaussian density estimation. It includes an automatic centroid selection and parameter optimization algorithm, which finds the number of clusters and cluster centroids by comparing average silhouettes on a grid of testing clustering results. It also includes an interactive algorithm that allows the user to manually select cluster centroids from a two-dimensional "density-distance plot". The research article associated with this package is Wang, Xiao-Feng, and Yifan Xu (2015), "Fast clustering using adaptive density peak detection", Statistical Methods in Medical Research <DOI:10.1177/0962280215609948>, url: <http://smm.sagepub.com/content/early/2015/10/15/0962280215609948.abstract>.

Maintained by Ethan Yifan Xu. Last updated 3 years ago.

3.2 match 10 stars 5.34 score 44 scripts

bioc

smoppix:Analyze Single Molecule Spatial Omics Data Using the Probabilistic Index

Test for univariate and bivariate spatial patterns in spatial omics data with single-molecule resolution. The tests implemented allow for analysis of nested designs and are automatically calibrated to different biological specimens. Tests for aggregation, colocalization, gradients and vicinity to cell edge or centroid are provided.

Maintained by Stijn Hawinkel. Last updated 1 month ago.

transcriptomics spatial singlecell cpp

3.3 match 1 star 5.10 score 4 scripts

chrhennig

fpc:Flexible Procedures for Clustering

Various methods for clustering and cluster validation. Fixed point clustering. Linear regression clustering. Clustering by merging Gaussian mixture components. Symmetric and asymmetric discriminant projections for visualisation of the separation of groupings. Cluster validation statistics for distance based clustering including corrected Rand index. Standardisation of cluster validation statistics by random clusterings and comparison between many clustering methods and numbers of clusters based on this. Cluster-wise cluster stability assessment. Methods for estimation of the number of clusters: Calinski-Harabasz, Tibshirani and Walther's prediction strength, Fang and Wang's bootstrap stability. Gaussian/multinomial mixture fitting for mixed continuous/categorical variables. Variable-wise statistics for cluster interpretation. DBSCAN clustering. Interface functions for many clustering methods implemented in R, including estimating the number of clusters with kmeans, pam and clara. Modality diagnosis for Gaussian mixtures. For an overview see package?fpc.

Maintained by Christian Hennig. Last updated 6 months ago.

1.8 match 11 stars 9.32 score 2.6k scripts 69 dependents

tomasfryda

h2o:R Interface for the 'H2O' Scalable Machine Learning Platform

R interface for 'H2O', the scalable open source machine learning platform that offers parallelized implementations of many supervised and unsupervised machine learning algorithms such as Generalized Linear Models (GLM), Gradient Boosting Machines (including XGBoost), Random Forests, Deep Neural Networks (Deep Learning), Stacked Ensembles, Naive Bayes, Generalized Additive Models (GAM), ANOVA GLM, Cox Proportional Hazards, K-Means, PCA, ModelSelection, Word2Vec, as well as a fully automatic machine learning algorithm (H2O AutoML).

Maintained by Tomas Fryda. Last updated 1 year ago.

2.0 match 3 stars 8.20 score 7.8k scripts 11 dependents

bart1

move:Visualizing and Analyzing Animal Track Data

Contains functions to access movement data stored in 'movebank.org' as well as tools to visualize and statistically analyze animal movement data, including functions to calculate dynamic Brownian Bridge Movement Models. Move helps address movement ecology questions.

Maintained by Bart Kranstauber. Last updated 4 months ago.

cpp

1.9 match 8.70 score 690 scripts 3 dependents

bioc

MsBackendSql:SQL-based Mass Spectrometry Data Backend

SQL-based mass spectrometry (MS) data backend that also supports storage and handling of very large data sets. Objects from this package are supposed to be used with the Spectra Bioconductor package. With its minimal memory footprint, MsBackendSql thus provides an alternative MS data representation for very large or remote MS data sets.

Maintained by Johannes Rainer. Last updated 15 days ago.

infrastructure massspectrometry metabolomics dataimport proteomics

3.0 match 4 stars 5.41 score 16 scripts

computationalstylistics

stylo:Stylometric Multivariate Analyses

Supervised and unsupervised multivariate methods, supplemented by GUI and some visualizations, to perform various analyses in the field of computational stylistics, authorship attribution, etc. For further reference, see Eder et al. (2016), <https://journal.r-project.org/archive/2016/RJ-2016-007/index.html>. You are also encouraged to visit the Computational Stylistics Group's website <https://computationalstylistics.github.io/>, where a reasonable amount of information about the package and related projects is provided.

Maintained by Maciej Eder. Last updated 3 months ago.

1.9 match 187 stars 8.58 score 462 scripts

freezenik

BayesX:R Utilities Accompanying the Software Package BayesX

Functions for exploring and visualising estimation results obtained with BayesX, a free software for estimating structured additive regression models (<https://www.uni-goettingen.de/de/bayesx/550513.html>). In addition, functions for reading, writing and manipulating map objects that are required in spatial analyses performed with BayesX.

Maintained by Nikolaus Umlauf. Last updated 1 year ago.

4.3 match 3.71 score 48 scripts 3 dependents

cran

sda:Shrinkage Discriminant Analysis and CAT Score Variable Selection

Provides an efficient framework for high-dimensional linear and diagonal discriminant analysis with variable selection. The classifier is trained using James-Stein-type shrinkage estimators and predictor variables are ranked using correlation-adjusted t-scores (CAT scores). Variable selection error is controlled using false non-discovery rates or higher criticism.

Maintained by Korbinian Strimmer. Last updated 3 years ago.

4.9 match 3.21 score 3 dependents

jdtuck

fdasrvf:Elastic Functional Data Analysis

Performs alignment, PCA, and modeling of multidimensional and unidimensional functions using the square-root velocity framework (Srivastava et al., 2011 <doi:10.48550/arXiv.1103.3817> and Tucker et al., 2014 <DOI:10.1016/j.csda.2012.12.001>). This framework allows for elastic analysis of functional data through phase and amplitude separation.

Maintained by J. Derek Tucker. Last updated 1 month ago.

openblas cpp openmp

2.0 match 13 stars 7.79 score 83 scripts 3 dependents

glenndavis52

munsellinterpol:Interpolate Munsell Renotation Data from Hue Value/Chroma to CIE/RGB

Methods for interpolating data in the Munsell color system following the ASTM D-1535 standard. Hues and chromas with decimal values can be interpolated and converted to/from the Munsell color system and CIE xyY, CIE XYZ, CIE Lab, CIE Luv, or RGB. Includes ISCC-NBS color block lookup. Based on the work by Paul Centore, "The Munsell and Kubelka-Munk Toolbox".

Maintained by Glenn Davis. Last updated 2 months ago.

3.4 match 2 stars 4.31 score 43 scripts 2 dependents

cran

NCSampling:Nearest Centroid (NC) Sampling

Provides functionality for performing Nearest Centroid (NC) Sampling. The NC sampling procedure was developed for forestry applications and selects plots for ground measurement so as to maximize the efficiency of imputation estimates. It uses multiple auxiliary variables and multivariate clustering to search for an optimal sample. Further details are given in Melville G. & Stone C. (2016) <doi:10.1080/00049158.2016.1218265>.

Maintained by Gavin Melville. Last updated 8 years ago.

14.4 match 1.00 score

m-py

anticlust:Subset Partitioning via Anticlustering

The method of anticlustering partitions a pool of elements into groups (i.e., anticlusters) with the goal of maximizing between-group similarity or within-group heterogeneity. The anticlustering approach thereby reverses the logic of cluster analysis that strives for high within-group homogeneity and clear separation between groups. Computationally, anticlustering is accomplished by maximizing instead of minimizing a clustering objective function, such as the intra-cluster variance (used in k-means clustering) or the sum of pairwise distances within clusters. The main function anticlustering() gives access to optimal and heuristic anticlustering methods described in Papenberg and Klau (2021; <doi:10.1037/met0000301>), Brusco et al. (2020; <doi:10.1111/bmsp.12186>), Papenberg (2024; <doi:10.1111/bmsp.12315>), and Papenberg et al. (2025; <doi:10.1101/2025.03.03.641320>). The optimal algorithms require that an integer linear programming solver is installed. This package will install 'lpSolve' (<https://cran.r-project.org/package=lpSolve>) as a default solver, but it is also possible to use the package 'Rglpk' (<https://cran.r-project.org/package=Rglpk>), which requires the GNU linear programming kit (<https://www.gnu.org/software/glpk/glpk.html>), the package 'Rsymphony' (<https://cran.r-project.org/package=Rsymphony>), which requires the SYMPHONY ILP solver (<https://github.com/coin-or/SYMPHONY>), or the commercial solver Gurobi, which provides its own R package that is not available via CRAN (<https://www.gurobi.com/downloads/>). 'Rglpk', 'Rsymphony', 'gurobi' and their system dependencies have to be manually installed by the user because they are only suggested dependencies. Full access to the bicriterion anticlustering method proposed by Brusco et al. (2020) is given via the function bicriterion_anticlustering(), while kplus_anticlustering() implements the full functionality of the k-plus anticlustering approach proposed by Papenberg (2024). Some other functions are available to solve classical clustering problems. The function balanced_clustering() applies a cluster analysis under size constraints, i.e., creates equal-sized clusters. The function matching() can be used for (unrestricted, bipartite, or K-partite) matching. The function wce() can be used to optimally solve the (weighted) cluster editing problem, also known as correlation clustering, clique partitioning problem or transitivity clustering.

Maintained by Martin Papenberg. Last updated 5 days ago.

1.5 match 34 stars 9.27 score 60 scripts 2 dependents

tgoodbody

sgsR:Structurally Guided Sampling

Structurally guided sampling (SGS) approaches for airborne laser scanning (ALS; LIDAR). Primary functions provide means to generate data-driven stratifications & methods for allocating samples. Intermediate functions for calculating and extracting important information about input covariates and samples are also included. Processing outcomes are intended to help forest and environmental management practitioners better optimize field sample placement as well as assess and augment existing sample networks in the context of data distributions and conditions. ALS data is the primary intended use case, however any rasterized remote sensing data can be used, enabling data-driven stratifications and sampling approaches.

Maintained by Tristan RH Goodbody. Last updated 29 days ago.

1.9 match 46 stars 7.50 score 34 scripts

trevorld

affiner:A Finer Way to Render 3D Illustrated Objects in 'grid' Using Affine Transformations

Dilate, permute, project, reflect, rotate, shear, and translate 2D and 3D points. Supports parallel projections including oblique projections such as the cabinet projection as well as axonometric projections such as the isometric projection. Use 'grid's "affine transformation" feature to render illustrated flat surfaces.

Maintained by Trevor L. Davis. Last updated 4 months ago.

2.0 match 9 stars 6.91 score 1 script 5 dependents

mobiodiv

mobr:Measurement of Biodiversity

Functions for calculating metrics for the measurement of biodiversity and its changes across scales, treatments, and gradients. The methods implemented in this package are described in: Chase, J.M., et al. (2018) <doi:10.1111/ele.13151>, McGlinn, D.J., et al. (2019) <doi:10.1111/2041-210X.13102>, McGlinn, D.J., et al. (2020) <doi:10.1101/851717>, and McGlinn, D.J., et al. (2023) <doi:10.1101/2023.09.19.558467>.

Maintained by Daniel McGlinn. Last updated 12 days ago.

biodiversity conservation ecology rarefaction species statistics

1.6 match 23 stars 8.65 score 93 scripts

tidymodels

tidyclust:A Common API to Clustering

A common interface for specifying clustering models, in the same style as 'parsnip'. Creates a unified interface across different functions and computational engines.

Maintained by Emil Hvitfeldt. Last updated 2 months ago.

1.9 match 112 stars 7.21 score 139 scripts

vtshen

AHM:Additive Heredity Model: Method for the Mixture-of-Mixtures Experiments

An implementation of the additive heredity model for the mixture-of-mixtures experiments of Shen et al. (2019) in Technometrics <doi:10.1080/00401706.2019.1630010>. The additive heredity model considers an additive structure to inherently connect the major components with the minor components. The additive heredity model has a meaningful interpretation for the estimated model because of the hierarchical and heredity principles applied and the nonnegative garrote technique used for variable selection.

Maintained by Sumin Shen. Last updated 6 years ago.

5.0 match 2.70 score 2 scripts

bioc

scmap:A tool for unsupervised projection of single cell RNA-seq data

Single-cell RNA-seq (scRNA-seq) is widely used to investigate the composition of complex tissues since the technology allows researchers to define cell-types using unsupervised clustering of the transcriptome. However, due to differences in experimental methods and computational analyses, it is often challenging to directly compare the cells identified in two different experiments. scmap is a method for projecting cells from a scRNA-seq experiment on to the cell-types or individual cells identified in a different experiment.

Maintained by Vladimir Kiselev. Last updated 5 months ago.

immunooncology singlecell software classification supportvectormachine rnaseq visualization transcriptomics datarepresentation transcription sequencing preprocessing geneexpression dataimport bioconductor-package human-cell-atlas projection-mapping single-cell-rna-seq openblas cpp

1.5 match 95 stars 8.82 score 172 scripts

pdil

usmapdata:Mapping Data for 'usmap' Package

Provides a container for data used by the 'usmap' package. The data used by 'usmap' has been extracted into this package so that the file size of the 'usmap' package can be reduced greatly. The data in this package will be updated roughly once per year as new map data files are provided by the US Census Bureau.

Maintained by Paolo Di Lorenzo. Last updated 25 days ago.

counties data fips mapping states usa

2.0 match 5 stars 6.59 score 35 scripts 3 dependents

bioc

HiCDOC:A/B compartment detection and differential analysis

HiCDOC normalizes intrachromosomal Hi-C matrices, uses unsupervised learning to predict A/B compartments from multiple replicates, and detects significant compartment changes between experiment conditions. It provides a collection of functions assembled into a pipeline to filter and normalize the data, predict the compartments and visualize the results. It accepts several types of data: tabular `.tsv` files, Cooler `.cool` or `.mcool` files, Juicer `.hic` files or HiC-Pro `.matrix` and `.bed` files.

Maintained by Maigné Élise. Last updated 4 months ago.

hic dna3dstructure normalization sequencing software clustering cpp

2.3 match 4 stars 5.86 score 6 scripts 1 dependents

bioc

geva:Gene Expression Variation Analysis (GEVA)

Statistical methods to evaluate variations of differential expression (DE) between multiple biological conditions. It takes into account the fold-changes and p-values from previous differential expression (DE) results that use large-scale data (*e.g.*, microarray and RNA-seq) and evaluates which genes would react in response to the distinct experiments. This evaluation involves a unique pipeline of statistical methods, including weighted summarization, quantile detection, cluster analysis, and ANOVA tests, in order to classify a subset of relevant genes whose DE is similar or dependent on certain biological factors.

Maintained by Itamar José Guimarães Nunes. Last updated 5 months ago.

classification differentialexpression geneexpression microarray multiplecomparison rnaseq systemsbiology transcriptomics

3.0 match 2 stars 4.30 score 4 scripts

bioc

cola:A Framework for Consensus Partitioning

Subgroup classification is a basic task in genomic data analysis, especially for gene expression and DNA methylation data analysis. It can also be used to test the agreement to known clinical annotations, or to test whether there exist significant batch effects. The cola package provides a general framework for subgroup classification by consensus partitioning. It has the following features: 1. It modularizes the consensus partitioning processes so that various methods can be easily integrated. 2. It provides rich visualizations for interpreting the results. 3. It allows running multiple methods at the same time and provides functionalities to compare results in a straightforward way. 4. It provides a new method to extract features which are more efficient to separate subgroups. 5. It automatically generates detailed reports for the complete analysis. 6. It allows applying consensus partitioning in a hierarchical manner.

Maintained by Zuguang Gu. Last updated 2 months ago.

clustering geneexpression classification software consensus-clustering cpp

1.7 match 61 stars 7.49 score 112 scripts

bioc

Mfuzz:Soft clustering of omics time series data

The Mfuzz package implements noise-robust soft clustering of omics time-series data, including transcriptomic, proteomic or metabolomic data. It is based on the use of c-means clustering. For convenience, it includes a graphical user interface.

Maintained by Matthias Futschik. Last updated 5 months ago.

microarray clustering timecourse preprocessing visualization

1.6 match 7.64 score 338 scripts 4 dependents

gavinrozzi

zipcodeR:Data & Functions for Working with US ZIP Codes

Make working with ZIP codes in R painless with an integrated dataset of U.S. ZIP codes and functions for working with them. Search ZIP codes by multiple geographies, including state, county, city & across time zones. Also included are functions for relating ZIP codes to Census data, geocoding & distance calculations.

Maintained by Gavin Rozzi. Last updated 1 years ago.

1.7 match 80 stars 7.31 score 176 scripts

pgiraudoux

pgirmess:Spatial Analysis and Data Mining for Field Ecologists

Set of tools for reading, writing and transforming spatial and seasonal data, model selection and specific statistical tests for ecologists. It includes functions to interpolate regular positions of points between landmarks, to discretize polylines into regular point positions, link distant observations to points and convert a bounding box into a spatial object. It also provides miscellaneous functions for field ecologists such as spatial statistics and inference on diversity indexes, and writing a data.frame with Chinese characters.

Maintained by Patrick Giraudoux. Last updated 1 years ago.

1.7 match 5 stars 7.29 score 422 scripts 2 dependents

tscnlab

LightLogR:Process Data from Wearable Light Loggers and Optical Radiation Dosimeters

Import, processing, validation, and visualization of personal light exposure measurement data from wearable devices. The package implements features such as the import of data and metadata files, conversion of common file formats, validation of light logging data, verification of crucial metadata, calculation of common parameters, and semi-automated analysis and visualization.

Maintained by Johannes Zauner. Last updated 1 months ago.

dosimetry light time-series-analysis wearable-devices wearable-sensors

2.0 match 12 stars 5.88 score 28 scripts

swfsc

eSDM:Ensemble Tool for Predictions from Species Distribution Models

A tool which allows users to create and evaluate ensembles of species distribution model (SDM) predictions. Functionality is offered through R functions or a GUI (R Shiny app). This tool can assist users in identifying spatial uncertainties and making informed conservation and management decisions. The package is further described in Woodman et al (2019) <doi:10.1111/2041-210X.13283>.

Maintained by Sam Woodman. Last updated 6 months ago.

1.9 match 11 stars 6.07 score 24 scripts

ptarroso

phylin:Spatial Interpolation of Genetic Data

The spatial interpolation of genetic distances between samples is based on a modified kriging method that accepts a genetic distance matrix and generates a map of probability of lineage presence. This package also offers tools to generate a map of potential contact zones between groups with user-defined thresholds in the tree to account for old and recent divergence. Additionally, it has functions for IDW interpolation using genetic data and midpoints.

Maintained by Pedro Tarroso. Last updated 5 years ago.

3.8 match 2.99 score 49 scripts

cepardot

FactoClass:Combination of Factorial Methods and Cluster Analysis

Some functions of 'ade4' and 'stats' are combined in order to obtain a partition of the rows of a data table, with columns representing variables of scales: quantitative, qualitative or frequency. First, a principal axes method is performed and then, a combination of Ward agglomerative hierarchical classification and K-means is performed, using some of the first coordinates obtained from the previous principal axes method. In order to permit different weights of the elements to be clustered, the function 'kmeansW', programmed in C++, is included. It is a modification of 'kmeans'. Some graphical functions include the option: 'gg=FALSE'. When 'gg=TRUE', they use the 'ggplot2' and 'ggrepel' packages to avoid the super-position of the labels.

Maintained by Campo Elias Pardo. Last updated 1 years ago.

cpp

5.0 match 2.21 score 163 scripts

szymonnowakowski

hclust1d:Hierarchical Clustering of Univariate (1d) Data

Univariate agglomerative hierarchical clustering with a comprehensive list of choices of a linkage function in O(n*log n) time. The better algorithmic time complexity is paired with an efficient 'C++' implementation.

Maintained by Szymon Nowakowski. Last updated 2 years ago.

cpp

2.2 match 3 stars 4.95 score 9 scripts 1 dependents

cran

PracTools:Designing and Weighting Survey Samples

Functions and datasets to support Valliant, Dever, and Kreuter (2018), <doi:10.1007/978-3-319-93632-1>, "Practical Tools for Designing and Weighting Survey Samples". Contains functions for sample size calculation for survey samples using stratified or clustered one-, two-, and three-stage sample designs, and single-stage audit sample designs. Functions are included that will group geographic units accounting for distances apart and measures of size. Other functions compute variance components for multistage designs and sample sizes in two-phase designs. A number of example data sets are included.

Maintained by Richard Valliant. Last updated 9 months ago.

3.4 match 1 stars 3.18 score 1 dependents

biometris

douconca:Double Constrained Correspondence Analysis for Trait-Environment Analysis in Ecology

Double constrained correspondence analysis (dc-CA) analyzes (multi-)trait (multi-)environment ecological data by using the 'vegan' package and native R code. The two-step algorithm of ter Braak et al. (2018) is used throughout. This algorithm combines and extends community- (sample-) and species-level analyses, i.e. the usual community weighted means (CWM)-based regression analysis and the species-level analysis of species-niche centroids (SNC)-based regression analysis. The two steps use canonical correspondence analysis to regress the abundance data on to the traits and (weighted) redundancy analysis to regress the CWM of the orthonormalized traits on to the environmental predictors. The function dc_CA() has an option to divide the abundance data of a site by the site total, giving equal site weights. This division has the advantage that the multivariate analysis corresponds with an unweighted (multi-trait) community-level analysis, instead of being weighted. The first step of the algorithm uses vegan::cca(). The second step uses wrda() but vegan::rda() if the site weights are equal. This version has a predict() function. For details see ter Braak et al. 2018 <doi:10.1007/s10651-017-0395-x>.

Maintained by Bart-Jan van Rossum. Last updated 4 months ago.

correspondence-analysis ecology ecology-modeling multi-environment multi-trait

2.1 match 5.00 score 6 scripts

valentint

rda:Shrunken Centroids Regularized Discriminant Analysis

Provides functions implementing the shrunken centroids regularized discriminant analysis for classification purposes in high-dimensional data. The method is described in Guo et al. (2013) <doi:10.1093/biostatistics/kxj035>.

Maintained by Valentin Todorov. Last updated 2 years ago.

3.5 match 3.02 score 21 scripts

bioc

TrajectoryUtils:Single-Cell Trajectory Analysis Utilities

Implements low-level utilities for single-cell trajectory analysis, primarily intended for re-use inside higher-level packages. Includes a function to create a cluster-level minimum spanning tree and data structures to hold pseudotime inference results.

Maintained by Aaron Lun. Last updated 5 months ago.

geneexpression singlecell

1.8 match 5.91 score 16 scripts 9 dependents

cran

centiserve:Find Graph Centrality Indices

Calculates centrality indices additional to the 'igraph' package centrality functions.

Maintained by Mahdi Jalili. Last updated 8 years ago.

5.1 match 1 stars 2.08 score 1 dependents

matildabrown

rWCVP:Generating Summaries, Reports and Plots from the World Checklist of Vascular Plants

A companion to the World Checklist of Vascular Plants (WCVP). It includes functions to generate maps and species lists, as well as match names to the WCVP. For more details and to cite the package, see: Brown M.J.M., Walker B.E., Black N., Govaerts R., Ondo I., Turner R., Nic Lughadha E. (in press). "rWCVP: A companion R package to the World Checklist of Vascular Plants". New Phytologist.

Maintained by Matilda Brown. Last updated 1 years ago.

1.7 match 22 stars 6.17 score 45 scripts 1 dependents

josiahparry

sdf:What the Package Does (One Line, Title Case)

What the package does (one paragraph).

Maintained by Josiah Parry. Last updated 2 years ago.

3.3 match 27 stars 3.13 score 6 scripts

bioc

CatsCradle:This package provides methods for analysing spatial transcriptomics data and for discovering gene clusters

This package addresses two broad areas. It allows for in-depth analysis of spatial transcriptomic data by identifying tissue neighbourhoods. These are contiguous regions of tissue surrounding individual cells. 'CatsCradle' allows for the categorisation of neighbourhoods by the cell types contained in them and the genes expressed in them. In particular, it produces Seurat objects whose individual elements are neighbourhoods rather than cells. In addition, it enables the categorisation and annotation of genes by producing Seurat objects whose elements are genes.

Maintained by Michael Shapiro. Last updated 15 days ago.

biologicalquestion statisticalmethod geneexpression singlecell transcriptomics spatial

1.6 match 3 stars 6.52 score

cidm-ph

ggautomap:Create Maps from a Column of Place Names

Mapping tools that convert place names to coordinates on the fly. These 'ggplot2' extensions make maps from a data frame where one of the columns contains place names, without having to directly work with the underlying geospatial data and tools. The corresponding map data must be registered with 'cartographer' either by the user or by another package.

Maintained by Carl Suster. Last updated 1 years ago.

data-visualization geospatial ggplot-extension ggplot2

2.0 match 24 stars 5.08 score 5 scripts

kaiaragaki

reclanc:A Revival of the ClaNC Algorithm

Classification of microarrays to nearest centroids (ClaNC) <doi:10.1093/bioinformatics/bti756> selects optimal genes for centroids, similar to Prediction Analysis for Microarrays (PAM) but using fewer corrective factors, resulting in greater sensitivity and accuracy. Unfortunately, the original source of ClaNC can no longer be found. 'reclanc' reimplements this algorithm, with the additional benefit of increased interoperability with standard data structures and modeling ecosystems.

Maintained by Kai Aragaki. Last updated 8 months ago.

2.6 match 3.85 score 5 scripts

nikkrieger

USpopcenters:United States Centers of Population (Centroids)

Centers of population (centroid) data for census areas in the United States.

Maintained by Nik Krieger. Last updated 2 years ago.

3.6 match 1 stars 2.70 score 2 scripts

ropensci

weatherOz:An API Client for Australian Weather and Climate Data Resources

Provides automated downloading, parsing and formatting of weather data for Australia through API endpoints provided by the Department of Primary Industries and Regional Development ('DPIRD') of Western Australia and by the Science and Technology Division of the Queensland Government's Department of Environment and Science ('DES'). Also retrieves precis and coastal forecasts from the Bureau of Meteorology ('BOM') of the Australian government, and downloads and imports radar and satellite imagery files. 'DPIRD' weather data are accessed through public 'APIs' provided by 'DPIRD', <https://www.agric.wa.gov.au/weather-api-20>, providing access to weather station data from the 'DPIRD' weather station network. Australia-wide weather data are based on data from the Australian Bureau of Meteorology ('BOM') and accessed through 'SILO' (Scientific Information for Land Owners) Jeffrey et al. (2001) <doi:10.1016/S1364-8152(01)00008-1>. 'DPIRD' data are made available under a Creative Commons Attribution 3.0 Licence (CC BY 3.0 AU) <https://creativecommons.org/licenses/by/3.0/au/deed.en>. SILO data are released under a Creative Commons Attribution 4.0 International licence (CC BY 4.0) <https://creativecommons.org/licenses/by/4.0/>. 'BOM' data are (c) Australian Government Bureau of Meteorology and released under a Creative Commons (CC) Attribution 3.0 licence or Public Access Licence ('PAL') as appropriate, see <http://www.bom.gov.au/other/copyright.shtml> for further details.

Maintained by Rodrigo Pires. Last updated 1 months ago.

dpird bom meteorological-data weather-forecast australia weather weather-data meteorology western-australia australia-bureau-of-meteorology western-australia-agriculture australia-agriculture australia-climate australia-weather api-client climate data rainfall weather-api

1.1 match 31 stars 8.47 score 40 scripts

icosa-grid

icosa:Global Triangular and Penta-Hexagonal Grids Based on Tessellated Icosahedra

Implementation of icosahedral grids in three dimensions. The spherical-triangular tessellation can be set to create grids with custom resolutions. Both the primary triangular and their inverted penta-hexagonal grids can be calculated. Additional functions are provided that allow plotting of the grids and associated data, the interaction of the grids with other raster and vector objects, and treating the grids as graphs.

Maintained by Adam T. Kocsis. Last updated 8 months ago.

grid cpp

1.8 match 4 stars 5.41 score 65 scripts

flaviomoc

divraster:Diversity Metrics Calculations for Rasterized Data

Alpha and beta diversity for taxonomic (TD), functional (FD), and phylogenetic (PD) dimensions based on rasters. Spatial and temporal beta diversity can be partitioned into replacement and richness difference components. It also calculates standardized effect size for FD and PD alpha diversity and the average individual traits across multilayer rasters. The layers of the raster represent species, while the cells represent communities. Methods details can be found at Cardoso et al. 2022 <https://CRAN.R-project.org/package=BAT> and Heming et al. 2023 <https://CRAN.R-project.org/package=SESraster>.

Maintained by Flávio M. M. Mota. Last updated 15 days ago.

1.8 match 10 stars 5.40 score 7 scripts

berndbischl

tspmeta:Instance Feature Calculation and Evolutionary Instance Generation for the Traveling Salesman Problem

Instance feature calculation and evolutionary instance generation for the traveling salesman problem. Also contains code to "morph" two TSP instances into each other and to conveniently run a couple of solvers on TSP instances.

Maintained by Bernd Bischl. Last updated 9 years ago.

2.3 match 5 stars 4.08 score 24 scripts

datalowe

synr:Explore and Process Synesthesia Consistency Test Data

Explore synesthesia consistency test data, calculate consistency scores, and classify participant data as valid or invalid.

Maintained by Lowe Wilsson. Last updated 1 years ago.

data-cleaning synesthesia

1.7 match 5.32 score 139 scripts

weksi-budiaji

kmed:Distance-Based k-Medoids

Algorithms for distance-based k-medoids clustering: simple and fast k-medoids, ranked k-medoids, and increasing number of clusters in k-medoids. Calculates distances for mixed variable data such as Gower, Podani, Wishart, Huang, Harikumar-PV, and Ahmad-Dey. Cluster validation applies internal and relative criteria. The internal criteria include the silhouette index and shadow values. The relative criterion applies a bootstrap procedure producing a heatmap with a flexible reordering matrix algorithm such as complete, ward, or average linkages. The cluster result can be plotted in a marked barplot or pca biplot.

Maintained by Weksi Budiaji. Last updated 3 years ago.

2.9 match 3.15 score 141 scripts

clancylabuiuc

moRphomenses:Geometric Morphometric Tools to Align, Scale, and Compare "Shape" of Menstrual Cycle Hormones

Mitteroecker & Gunz (2009) <doi:10.1007/s11692-009-9055-x> describe how geometric morphometric methods allow researchers to quantify the size and shape of physical biological structures. We provide tools to extend geometric morphometric principles to the study of non-physical structures, hormone profiles, as outlined in Ehrlich et al (2021) <doi:10.1002/ajpa.24514>. Easily transform daily measures into multivariate landmark-based data. Includes custom functions to apply multivariate methods for data exploration as well as hypothesis testing. Also includes 'shiny' web app to streamline data exploration. Developed to study menstrual cycle hormones but functions have been generalized and should be applicable to any biomarker over any time period.

Maintained by Daniel Ehrlich. Last updated 3 months ago.

2.3 match 2 stars 4.04 score 4 scripts

achubaty

grainscape:Landscape Connectivity, Habitat, and Protected Area Networks

Given a landscape resistance surface, creates minimum planar graph (Fall et al. (2007) <doi:10.1007/s10021-007-9038-7>) and grains of connectivity (Galpern et al. (2012) <doi:10.1111/j.1365-294X.2012.05677.x>) models that can be used to calculate effective distances for landscape connectivity at multiple scales. Documentation is provided by several vignettes, and a paper (Chubaty, Galpern & Doctolero (2020) <doi:10.1111/2041-210X.13350>).

Maintained by Alex M Chubaty. Last updated 2 months ago.

habitat-connectivity landscape-connectivity spatial-graphs cpp

1.3 match 19 stars 6.76 score 20 scripts

o1iv3r

ClustImpute:K-Means Clustering with Build-in Missing Data Imputation

This k-means algorithm is able to cluster data with missing values and as a by-product completes the data set. The implementation can deal with missing values in multiple variables and is computationally efficient since it iteratively uses the current cluster assignment to define a plausible distribution for missing value imputation. Weights are used to shrink early random draws for missing values (i.e., draws based on the cluster assignments after few iterations) towards the global mean of each feature. This shrinkage slowly fades out after a fixed number of iterations to reflect the increasing credibility of cluster assignments. See the vignette for details.

Maintained by Oliver Pfaffel. Last updated 4 years ago.

1.8 match 7 stars 4.96 score 13 scripts

jrosen48

prcr:Person-Centered Analysis

Provides an easy-to-use yet adaptable set of tools to conduct person-centered analysis using a two-step clustering procedure. As described in Bergman and El-Khouri (1999) <DOI:10.1002/(SICI)1521-4036(199910)41:6%3C753::AID-BIMJ753%3E3.0.CO;2-K>, hierarchical clustering is performed to determine the initial partition for the subsequent k-means clustering procedure.

Maintained by Joshua M Rosenberg. Last updated 5 years ago.

1.9 match 5 stars 4.65 score 18 scripts
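
The two-step procedure described in the prcr entry above (hierarchical clustering to obtain an initial partition, then k-means started from that partition) can be sketched as follows. This is a minimal, language-neutral illustration in plain Python, not prcr's R API: the function names `ward_partition`, `kmeans` and `two_step_cluster` are hypothetical, and the choice of Ward's criterion for the hierarchical step is an assumption made for the example.

```python
from itertools import combinations


def mean(points):
    """Coordinate-wise mean of a non-empty list of equal-length tuples."""
    n = len(points)
    return tuple(sum(p[d] for p in points) / n for d in range(len(points[0])))


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def ward_partition(points, k):
    """Step 1 (assumed Ward linkage): merge clusters until only k remain."""
    clusters = [[p] for p in points]
    while len(clusters) > k:
        best = None
        for i, j in combinations(range(len(clusters)), 2):
            a, b = clusters[i], clusters[j]
            # Ward cost: increase in within-cluster sum of squares on merging.
            cost = len(a) * len(b) / (len(a) + len(b)) * sq_dist(mean(a), mean(b))
            if best is None or cost < best[0]:
                best = (cost, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # j > i, so index i is unaffected
    return clusters


def kmeans(points, centers, max_iter=100):
    """Step 2: Lloyd's algorithm started from the supplied centers."""
    centers = list(centers)
    labels = None
    for _ in range(max_iter):
        new_labels = [
            min(range(len(centers)), key=lambda c: sq_dist(p, centers[c]))
            for p in points
        ]
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
        for c in range(len(centers)):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old center if a cluster empties
                centers[c] = mean(members)
    return labels, centers


def two_step_cluster(points, k):
    """Hierarchical partition supplies the starting centers for k-means."""
    initial = ward_partition(points, k)
    return kmeans(points, [mean(c) for c in initial])


# Toy data: two well-separated groups of three points each.
points = [(0.0, 0.1), (0.2, 0.0), (0.1, 0.3), (5.0, 5.1), (5.2, 4.9), (4.9, 5.3)]
labels, centers = two_step_cluster(points, k=2)
```

Seeding k-means from a hierarchical partition makes the result deterministic for a given data set, which avoids the run-to-run variation that random starting centers can introduce.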

azvoleff

gfcanalysis:Tools for Working with Hansen et al. Global Forest Change Dataset

Supports analyses using the Global Forest Change dataset released by Hansen et al. gfcanalysis was originally written for the Tropical Ecology Assessment and Monitoring (TEAM) Network. For additional details on the Global Forest Change dataset, see: Hansen, M. et al. 2013. "High-Resolution Global Maps of 21st-Century Forest Cover Change." Science 342 (15 November): 850-53. The forest change data and more information on the product are available at <http://earthenginepartners.appspot.com>.

Maintained by Matthew Cooper. Last updated 1 years ago.

1.7 match 17 stars 4.93 score 33 scripts

apwheele

ptools:Tools for Poisson Data

Functions used for analyzing count data, mostly crime counts. Includes checking difference in two Poisson counts (e-test), checking the fit for a Poisson distribution, small sample tests for counts in bins, Weighted Displacement Difference test (Wheeler and Ratcliffe, 2018) <doi:10.1186/s40163-018-0085-5>, to evaluate crime changes over time in treated/control areas. Additionally includes functions for aggregating spatial data and spatial feature engineering.

Maintained by Andrew Wheeler. Last updated 1 years ago.

crime-analysis criminal-justice criminology

1.9 match 5 stars 4.44 score 11 scripts

alfodefalco

dPCP:Automated Analysis of Multiplex Digital PCR Data

The automated clustering and quantification of the digital PCR data is based on the combination of 'DBSCAN' (Hahsler et al. (2019) <doi:10.18637/jss.v091.i01>) and 'c-means' (Bezdek et al. (1981) <doi:10.1007/978-1-4757-0450-1>) algorithms. The analysis is independent of multiplexing geometry, dPCR system, and input amount. The details about input data and parameters are available in the vignette.

Maintained by Alfonso De Falco. Last updated 2 years ago.

1.9 match 2 stars 4.36 score 23 scripts

cran

cba:Clustering for Business Analytics

Implements clustering techniques such as Proximus and Rock, utility functions for efficient computation of cross distances and data manipulation.

Maintained by Christian Buchta. Last updated 8 months ago.

2.3 match 3.62 score 3 dependents

kaerosen

tilemaps:Generate Tile Maps

Implements an algorithm for generating maps, known as tile maps, in which each region is represented by a single tile of the same shape and size. The algorithm was first proposed in "Generating Tile Maps" by Graham McNeill and Scott Hale (2017) <doi:10.1111/cgf.13200>. Functions allow users to generate, plot, and compare square or hexagon tile maps.

Maintained by Kaelyn Rosenberg. Last updated 1 years ago.

1.5 match 45 stars 5.35 score 8 scripts

cran

spcosa:Spatial Coverage Sampling and Random Sampling from Compact Geographical Strata

Spatial coverage sampling and random sampling from compact geographical strata created by k-means. See Walvoort et al. (2010) <doi:10.1016/j.cageo.2010.04.005> for details.

Maintained by Dennis Walvoort. Last updated 2 years ago.

openjdk

2.3 match 2 stars 3.48 score 1 dependents

ahfoss

kamila:Methods for Clustering Mixed-Type Data

Implements methods for clustering mixed-type data, specifically combinations of continuous and nominal data. Special attention is paid to the often-overlooked problem of equitably balancing the contribution of the continuous and categorical variables. This package implements KAMILA clustering, a novel method for clustering mixed-type data in the spirit of k-means clustering. It does not require dummy coding of variables, and is efficient enough to scale to rather large data sets. Also implemented is Modha-Spangler clustering, which uses a brute-force strategy to maximize the cluster separation simultaneously in the continuous and categorical variables. For more information, see Foss, Markatou, Ray, & Heching (2016) <doi:10.1007/s10994-016-5575-7> and Foss & Markatou (2018) <doi:10.18637/jss.v083.i13>.

Maintained by Alexander Foss. Last updated 2 years ago.

cpp

1.8 match 16 stars 4.25 score 22 scripts

andriyprotsak5

UAHDataScienceUC:Learn Clustering Techniques Through Examples and Code

A comprehensive educational package combining clustering algorithms with detailed step-by-step explanations. Provides implementations of both traditional (hierarchical, k-means) and modern (Density-Based Spatial Clustering of Applications with Noise (DBSCAN), Gaussian Mixture Models (GMM), genetic k-means) clustering methods as described in Ezugwu et al. (2022) <doi:10.1016/j.engappai.2022.104743>. Includes educational datasets highlighting different clustering challenges, based on 'scikit-learn' examples (Pedregosa et al., 2011) <https://jmlr.csail.mit.edu/papers/v12/pedregosa11a.html>. Features detailed algorithm explanations, visualizations, and weighted distance calculations for enhanced learning.

Maintained by Andriy Protsak Protsak. Last updated 1 months ago.

2.3 match 3.30 score

shubhamdutta26

mapindiatools:Mapping Data for 'mapindia' Package

Provides a container for data used by the 'mapindia' package. The data used by 'mapindia' has been extracted into this package so that the file size of the 'mapindia' package can be reduced considerably. The data in this package will be updated when the latest data is available.

Maintained by Shubham Dutta. Last updated 5 months ago.

2.0 match 3.65 score 1 dependents

bioc

xenLite:Simple classes and methods for managing Xenium datasets

Define a relatively light class for managing Xenium data using Bioconductor. Address use of parquet for coordinates, SpatialExperiment for assay and sample data. Address serialization and use of cloud storage.

Maintained by Vincent Carey. Last updated 5 months ago.

infrastructure

1.6 match 1 stars 4.48 score 4 scripts

bioc

Rmagpie:MicroArray Gene-expression-based Program In Error rate estimation

Microarray Classification is designed for both biologists and statisticians. It offers the ability to train a classifier on a labelled microarray dataset and to then use that classifier to predict the class of new observations. A range of modern classifiers are available, including support vector machines (SVMs), nearest shrunken centroids (NSCs)... Advanced methods are provided to estimate the predictive error rate and to report the subset of genes which appear essential in discriminating between classes.

Maintained by Camille Maumet. Last updated 5 months ago.

microarray classification

2.2 match 3.30 score 1 scripts

maxar

IncDTW:Incremental Calculation of Dynamic Time Warping

The Dynamic Time Warping (DTW) distance measure for time series allows non-linear alignments of time series to match similar patterns in time series of different lengths and/or different speeds. IncDTW is characterized by (1) the incremental calculation of DTW (reduces runtime complexity to a linear level for updating the DTW distance) - especially for live data streams or subsequence matching, (2) the vector based implementation of DTW which is faster because no matrices are allocated (reduces the space complexity from a quadratic to a linear level in the number of observations) - for all runtime intensive DTW computations, (3) the subsequence matching algorithm runDTW, that efficiently finds the k-NN to a query pattern in a long time series, and (4) C++ at the heart. For details about DTW see the original paper "Dynamic programming algorithm optimization for spoken word recognition" by Sakoe and Chiba (1978) <DOI:10.1109/TASSP.1978.1163055>. For details about this package, Dynamic Time Warping and Incremental Dynamic Time Warping please see "IncDTW: An R Package for Incremental Calculation of Dynamic Time Warping" by Leodolter et al. (2021) <doi:10.18637/jss.v099.i09>.

Maintained by Maximilian Leodolter. Last updated 3 years ago.

cpp

3.3 match 2.18 score 15 scripts
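
The "vector based" idea in the IncDTW description above (no full cost matrix, so memory grows linearly with the series length) can be sketched as follows. This is an illustrative Python sketch, not IncDTW's R or C++ interface: the function name `dtw_distance_linear_space` is hypothetical, and the absolute-difference local cost and the symmetric step pattern (match, insertion, deletion) are assumptions chosen for brevity.

```python
import math


def dtw_distance_linear_space(query, reference):
    """DTW distance between two numeric sequences.

    Only the previous and the current row of the cumulative cost
    matrix are kept, so memory use is O(len(reference)) instead of
    O(len(query) * len(reference)).
    """
    n, m = len(query), len(reference)
    if n == 0 or m == 0:
        raise ValueError("both series must be non-empty")

    # Row 0 of the cumulative cost matrix: only the origin is reachable.
    prev = [math.inf] * (m + 1)
    prev[0] = 0.0

    for i in range(1, n + 1):
        curr = [math.inf] * (m + 1)
        for j in range(1, m + 1):
            # Assumed local cost: absolute difference of the two values.
            cost = abs(query[i - 1] - reference[j - 1])
            # Cheapest way to reach cell (i, j) from its three neighbours.
            curr[j] = cost + min(prev[j], curr[j - 1], prev[j - 1])
        prev = curr

    return prev[m]
```

Because each row depends only on the row before it, the full matrix never has to be allocated. The warping path itself cannot be recovered from this reduced state, which is the usual trade-off of the two-row approach.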

dustinstoltz

text2map:R Tools for Text Matrices, Embeddings, and Networks

This is a collection of functions optimized for working with various kinds of text matrices. Focusing on the text matrix as the primary object - represented either as a base R dense matrix or a 'Matrix' package sparse matrix - allows for a consistent and intuitive interface that stays close to the underlying mathematical foundation of computational text analysis. In particular, the package includes functions for working with word embeddings, text networks, and document-term matrices. Methods developed in Stoltz and Taylor (2019) <doi:10.1007/s42001-019-00048-6>, Taylor and Stoltz (2020) <doi:10.1007/s42001-020-00075-8>, Taylor and Stoltz (2020) <doi:10.15195/v7.a23>, and Stoltz and Taylor (2021) <doi:10.1016/j.poetic.2021.101567>.

Maintained by Dustin Stoltz. Last updated 4 months ago.

1.8 match 3.82 score 22 scripts

jackdunnnz

iai:Interface to 'Interpretable AI' Modules

An interface to the algorithms of 'Interpretable AI' <https://www.interpretable.ai> from the R programming language. 'Interpretable AI' provides various modules, including 'Optimal Trees' for classification, regression, prescription and survival analysis, 'Optimal Imputation' for missing data imputation and outlier detection, and 'Optimal Feature Selection' for exact sparse regression. The 'iai' package is an open-source project. The 'Interpretable AI' software modules are proprietary products, but free academic and evaluation licenses are available.

Maintained by Jack Dunn. Last updated 5 months ago.

3.4 match 1 stars 2.00 score 7 scripts

dieghernan

arcgeocoder:Geocoding with the 'ArcGIS' REST API Service

Lite interface for finding locations of addresses or businesses around the world using the 'ArcGIS' REST API service <https://developers.arcgis.com/rest/geocode/api-reference/overview-world-geocoding-service.htm>. Address text can be converted to location candidates and a location can be converted into an address. No API key required.

Maintained by Diego Hernangómez. Last updated 6 days ago.

geocoding arcgis address reverse-geocoding api-wrapper api-rest arcgis-api gis

1.2 match 2 stars 5.59 score 15 scripts

cran

bangladesh:Provides Ready to Use Shapefiles for Geographical Map of Bangladesh

Usually, it is difficult to plot choropleth maps for Bangladesh in 'R'. The 'bangladesh' package provides ready-to-use shapefiles for different administrative regions of Bangladesh (e.g., Division, District, Upazila, and Union). This package helps users to draw thematic maps of administrative regions of Bangladesh easily as it comes with the 'sf' objects for the boundaries. It also provides functions allowing users to efficiently get specific area maps and center coordinates for regions. Users can also search for a specific area and calculate the centroids of those areas.

Maintained by Musaddiqur Rahman Ovi. Last updated 2 years ago.

2.4 match 1 stars 2.70 score

cran

OasisR:Outright Tool for the Analysis of Spatial Inequalities and Segregation

A comprehensive set of indexes and tests for social segregation analysis, as described in Tivadar (2019) - 'OasisR': An R Package to Bring Some Order to the World of Segregation Measurement <doi:10.18637/jss.v089.i07>. The package is the most complete existing tool and it clarifies many ambiguities and errors regarding the definition of segregation indices. Additionally, 'OasisR' introduces several resampling methods that enable testing their statistical significance (randomization tests, bootstrapping, and jackknife methods).

Maintained by Mihai Tivadar. Last updated 5 months ago.

3.4 match 2 stars 1.78 score 1 dependents

fherla

sarp.snowprofile.alignment:Snow Profile Alignment, Aggregation, and Clustering

Snow profiles describe the vertical (1D) stratigraphy of layered snow with different layer characteristics, such as grain type, hardness, deposition date, and many more. Hence, they represent a data format similar to multivariate time series containing categorical, ordinal, and numerical data types. Use this package to align snow profiles by matching their individual layers based on Dynamic Time Warping (DTW). The aligned profiles can then be assessed with an independent, global similarity measure that is geared towards avalanche hazard assessment. Finally, through exploiting data aggregation and clustering methods, the similarity measure provides the foundation for grouping and summarizing snow profiles according to similar hazard conditions. In particular, this package allows for averaging large numbers of snow profiles with DTW Barycenter Averaging and thereby facilitates the computation of individual layer distributions and summary statistics that are relevant for avalanche forecasting purposes. For more background information refer to Herla, Horton, Mair, and Haegeli (2021) <doi:10.5194/gmd-14-239-2021>, Herla, Mair, and Haegeli (2022) <doi:10.5194/tc-16-3149-2022>, and Horton, Herla, and Haegeli (2024) <doi:10.5194/egusphere-2024-1609>.

Maintained by Florian Herla. Last updated 7 months ago.

1.8 match 3.45 score 14 scripts

timcdlucas

paleomorph:Geometric Morphometric Tools for Paleobiology

Fill missing symmetrical data with mirroring, calculate Procrustes alignments with or without scaling, and compute standard or vector correlation and covariance matrices (congruence coefficients) of 3D landmarks. Tolerates missing data for all analyses.

Maintained by Tim Lucas. Last updated 8 years ago.

morphometrics paleobiology procrustes statistical-analysis

1.7 match 4 stars 3.60 score 20 scripts

joycekang

symphony:Efficient and Precise Single-Cell Reference Atlas Mapping

Implements the Symphony single-cell reference building and query mapping algorithms and additional functions described in Kang et al <https://www.nature.com/articles/s41467-021-25957-x>.

Maintained by Joyce Kang. Last updated 2 years ago.

openblas cpp

1.5 match 3.83 score 134 scripts

amessbee

rsdepth:Ray Shooting Depth (i.e. RS Depth) Functions for Bivariate Analysis

Ray Shooting Depth functions are provided for bivariate analysis. This mainly includes functions for computing the bivariate depth as well as RS median. Drawing functions for depth bags are also provided.

Maintained by Mudassir Shabbir. Last updated 3 years ago.

cpp

5.3 match 1.04 score 11 scripts

somenv

SOMEnv:SOM Algorithm for the Analysis of Multivariate Environmental Data

Analysis of multivariate environmental high frequency data by Self-Organizing Map and k-means clustering algorithms. Through its graphical user interface, it provides a convenient way to process, with the self-organizing map algorithm, rather large datasets (txt files up to 100 MB) obtained from high-frequency environmental monitoring by sensors/instruments. The functions present in the package are based on 'kohonen' and 'openair' packages implemented by functions embedding Vesanto et al. (2001) <http://www.cis.hut.fi/projects/somtoolbox/package/papers/techrep.pdf> heuristic rules for map initialization parameters, k-means clustering algorithm and map features visualization. Cluster profiles visualization as well as graphs dedicated to the visualization of time-dependent variables Licen et al. (2020) <doi:10.4209/aaqr.2019.08.0414> are provided.

Maintained by Sabina Licen. Last updated 4 years ago.

2.0 match 1 stars 2.70 score

yhenryli

PAC:Partition-Assisted Clustering and Multiple Alignments of Networks

Implements partition-assisted clustering and multiple alignments of networks. It 1) utilizes partition-assisted clustering to find robust and accurate clusters and 2) discovers coherent relationships of clusters across multiple samples. It is particularly useful for analyzing single-cell data sets. Please see Li et al. (2017) <doi:10.1371/journal.pcbi.1005875> for a detailed description of the method.

Maintained by Ye Henry Li. Last updated 4 years ago.

cpp

1.6 match 3.30 score 7 scripts

cran

IDmeasurer:Assessment of Individual Identity in Animal Signals

Provides tools for assessment and quantification of individual identity information in animal signals. This package accompanies a research article by Linhart et al. (2019) <doi:10.1101/546143>: "Measuring individual identity information in animal signals: Overview and performance of available identity metrics".

Maintained by Pavel Linhart. Last updated 6 years ago.

1.8 match 2.70 score

jsl5-code

mixexp:Design and Analysis of Mixture Experiments

Functions for creating designs for mixture experiments, making ternary contour plots, and making mixture effect plots.

Maintained by John Lawson. Last updated 5 months ago.

fortran

1.8 match 1 stars 2.75 score 31 scripts 2 dependents

federicogiorgi

corto:Inference of Gene Regulatory Networks

We present 'corto' (Correlation Tool), a simple package to infer gene regulatory networks and visualize master regulators from gene expression data using DPI (Data Processing Inequality) and bootstrapping to recover edges. An initial step is performed to calculate all significant edges between a list of source nodes (centroids) and target genes. Then all triplets containing two centroids and one target are tested in a DPI step which removes edges. A bootstrapping process then calculates the robustness of the network, eventually re-adding edges previously removed by DPI. The algorithm has been optimized to run outside a computing cluster, using a fast correlation implementation. The package finally provides functions to calculate network enrichment analysis from RNA-Seq and ATAC-Seq signatures as described in the article by Giorgi lab (2020) <doi:10.1093/bioinformatics/btaa223>.

Maintained by Federico M. Giorgi. Last updated 2 years ago.

0.8 match 20 stars 6.25 score 59 scripts

travis-barton

LilRhino:For Implementation of Feed Reduction, Learning Examples, NLP and Code Management

This is for code management functions, NLP tools, a Monty Hall simulator, and for implementing my own variable reduction technique called Feed Reduction. The Feed Reduction technique is not yet published, but is merely a tool for implementing a series of binary neural networks meant for reducing data into N dimensions, where N is the number of possible values of the response variable.

Maintained by Travis Barton. Last updated 3 years ago.

1.7 match 1 stars 2.78 score 12 scripts

sophiekersting

treeDbalance:Computation of 3D Tree Imbalance

The main goal of the R package 'treeDbalance' is to provide functions for the computation of several measurements of 3D node imbalance and their respective 3D tree imbalance indices, as well as to introduce the new 'phylo3D' format for rooted 3D tree objects. Moreover, it encompasses an example dataset of 3D models of 63 beans in 'phylo3D' format. Please note that this R package was developed alongside the project described in the manuscript 'Measuring 3D tree imbalance of plant models using graph-theoretical approaches' by M. Fischer, S. Kersting, and L. Kühn (2023) <arXiv:2307.14537>, which provides precise mathematical definitions of the measurements. Furthermore, the package contains several helpful functions, for example, some auxiliary functions for computing the ancestors, descendants, and depths of the nodes, which ensures that the computations can be done in linear time. Most functions of 'treeDbalance' require as input a rooted tree in the 'phylo3D' format, an extended 'phylo' format (as introduced in the R package 'ape' 1.9 in November 2006). Such a 'phylo3D' object must have at least two new attributes next to those required by the 'phylo' format: 'node.coord', the coordinates of the nodes, as well as 'edge.weight', the literal weight or volume of the edges. Optional attributes are 'edge.diam', the diameter of the edges, and 'edge.length', the length of the edges. For visualization purposes one can also specify 'edge.type', which ranges from normal cylinder to bud to leaf, as well as 'edge.color' to change the color of the edge depiction. This project was supported by the joint research project DIG-IT! funded by the European Social Fund (ESF), reference: ESF/14-BM-A55-0017/19, and the Ministry of Education, Science and Culture of Mecklenburg-Western Pomerania, Germany, as well as by the project ArtIGROW, which is a part of the WIR!-Alliance 'ArtIFARM – Artificial Intelligence in Farming' funded by the German Federal Ministry of Education and Research (FKZ: 03WIR4805).

Maintained by Sophie Kersting. Last updated 2 years ago.

4.0 match 1.00 score

cran

overlapptest:Test Overlapping of Polygons Against Random Rotation

Tests the observed overlapping polygon area in a collection of polygons against a null model of random rotation, as explained in De la Cruz et al. (2017) <doi:10.13140/RG.2.2.12825.72801>.

Maintained by Marcelino de la Cruz. Last updated 2 years ago.

2.0 match 2.00 score

bioc

TSCAN:Tools for Single-Cell Analysis

Provides methods to perform trajectory analysis based on a minimum spanning tree constructed from cluster centroids. Computes pseudotemporal cell orderings by mapping cells in each cluster (or new cells) to the closest edge in the tree. Uses linear modelling to identify differentially expressed genes along each path through the tree. Several plotting and interactive visualization functions are also implemented.

Maintained by Zhicheng Ji. Last updated 5 months ago.

geneexpression visualization gui

0.5 match 7.58 score 207 scripts 3 dependents

bewicklab

HybridMicrobiomes:Analysis of Host-Associated Microbiomes from Hybrid Organisms

A set of tools to analyze and visualize the relationships between host-associated microbiomes of hybrid organisms and those of their progenitor species. Though not necessary, installing the microViz package is recommended as a check for phyloseq objects. To install microViz from R Universe use the following command: install.packages("microViz", repos = c(davidbarnett = "https://david-barnett.r-universe.dev", getOption("repos"))). To install microViz from GitHub use the following commands: install.packages("devtools") followed by devtools::install_github("david-barnett/microViz").

Maintained by Sharon Bewick. Last updated 1 year ago.

software phyloseq

3.7 match 1.00 score

s-u

fastshp:Fast routines for handling large ESRI shapefiles (.shp)

Routines for handling large ESRI shapefiles (.shp). This includes reading, thinning of points and matching of points to containing shapes. The main aim of this package is to provide the speed to support large shapefiles (millions of points). It is several orders of magnitude faster than some other shapefile packages.

Maintained by Simon Urbanek. Last updated 7 years ago.

cpp

1.9 match 9 stars 1.95 score 8 scripts

cran

SpatialAcc:Spatial Accessibility Measures

Provides a set of spatial accessibility measures from a set of locations (demand) to another set of locations (supply). It aims, among others, to support research on spatial accessibility to health care facilities. Includes the locations and some characteristics of major public hospitals in Greece.

Maintained by Stamatis Kalogirou. Last updated 12 months ago.

3.6 match 1.00 score

viroli

quantileDA:Quantile Classifier

Code for centroid, median and quantile classifiers.

Maintained by Cinzia Viroli. Last updated 1 year ago.

2.5 match 1.00 score 10 scripts

jungaeleeb

distanceHD:Distance Metrics for High-Dimensional Clustering

We provide three distance metrics for measuring the separation between two clusters in high-dimensional spaces. The first metric is the centroid distance, which calculates the Euclidean distance between the centers of the two groups. The second is a ridge Mahalanobis distance, which incorporates a ridge correction constant, alpha, to ensure that the covariance matrix is invertible. The third metric is the maximal data piling distance, which computes the orthogonal distance between the affine spaces spanned by each class. These three distances are asymptotically interconnected and are applicable in tasks such as discrimination, clustering, and outlier detection in high-dimensional settings.

Maintained by Jung Ae Lee. Last updated 2 months ago.

2.4 match 1.00 score
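
The first metric in the distanceHD description above, the centroid distance, is the Euclidean distance between the two group centers. A minimal Python sketch of that definition follows; it is an illustration only, not the distanceHD R interface, and the function names `centroid` and `centroid_distance` are hypothetical. The ridge Mahalanobis and maximal data piling distances are not covered here.

```python
import math


def centroid(points):
    """Coordinate-wise mean of a non-empty list of equal-length points."""
    dim = len(points[0])
    return [sum(p[k] for p in points) / len(points) for k in range(dim)]


def centroid_distance(group_a, group_b):
    """Euclidean distance between the centroids of two groups."""
    return math.dist(centroid(group_a), centroid(group_b))


if __name__ == "__main__":
    a = [[0.0, 0.0], [2.0, 0.0]]  # centroid (1, 0)
    b = [[1.0, 3.0], [1.0, 5.0]]  # centroid (1, 4)
    print(centroid_distance(a, b))
```

The sketch uses `math.dist`, which requires Python 3.8 or later.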

musajajorge

mapsPERU:Maps of Peru

Information of the centroids and geographical limits of the regions, departments, provinces and districts of Peru.

Maintained by Jorge L. C. Musaja. Last updated 2 years ago.

maps peru

0.6 match 17 stars 4.04 score 13 scripts

andrefujita

cemco:Fit 'CemCO' Algorithm

'CemCO' algorithm, a model-based (Gaussian) clustering algorithm that removes/minimizes the effects of undesirable covariates during the clustering process both in cluster centroids and in cluster covariance structures (Relvas C. & Fujita A., (2020) <arXiv:2004.02333>).

Maintained by Andre Fujita. Last updated 2 years ago.

2.2 match 1.00 score

bioc

PDATK:Pancreatic Ductal Adenocarcinoma Tool-Kit

Pancreatic ductal adenocarcinoma (PDA) has a relatively poor prognosis and is one of the most lethal cancers. Molecular classification of gene expression profiles holds the potential to identify meaningful subtypes which can inform therapeutic strategy in the clinical setting. The Pancreatic Cancer Adenocarcinoma Tool-Kit (PDATK) provides an S4 class-based interface for performing unsupervised subtype discovery, cross-cohort meta-clustering, gene-expression-based classification, and subsequent survival analysis to identify prognostically useful subtypes in pancreatic cancer and beyond. Two novel methods, Consensus Subtypes in Pancreatic Cancer (CSPC) and Pancreatic Cancer Overall Survival Predictor (PCOSP) are included for consensus-based meta-clustering and overall-survival prediction, respectively. Additionally, four published subtype classifiers and three published prognostic gene signatures are included to allow users to easily recreate published results, apply existing classifiers to new data, and benchmark the relative performance of new methods. The use of existing Bioconductor classes as input to all PDATK classes and methods enables integration with existing Bioconductor datasets, including the 21 pancreatic cancer patient cohorts available in the MetaGxPancreas data package. PDATK has been used to replicate results from Sandhu et al (2019) [https://doi.org/10.1200/cci.18.00102] and an additional paper is in the works using CSPC to validate subtypes from the included published classifiers, both of which use the data available in MetaGxPancreas. The inclusion of subtype centroids and prognostic gene signatures from these and other publications will enable researchers and clinicians to classify novel patient gene expression data, allowing the direct clinical application of the classifiers included in PDATK. 
Overall, PDATK provides a rich set of tools to identify and validate useful prognostic and molecular subtypes based on gene-expression data, benchmark new classifiers against existing ones, and apply discovered classifiers on novel patient data to inform clinical decision making.

Maintained by Benjamin Haibe-Kains. Last updated 5 months ago.

geneexpression pharmacogenetics pharmacogenomics software classification survival clustering geneprediction

0.5 match 1 stars 4.31 score 17 scripts

cran

EcotoneFinder:Characterising and Locating Ecotones and Communities

Analytical methods to locate and characterise ecotones, ecosystems and environmental patchiness along ecological gradients. Methods are implemented for isolated sampling or for space/time series. It includes Detrended Correspondence Analysis (Hill & Gauch (1980) <doi:10.1007/BF00048870>), fuzzy clustering (De Cáceres et al. (2010) <doi:10.1080/01621459.1963.10500845>), biodiversity indices (Jost (2006) <doi:10.1111/j.2006.0030-1299.14714.x>), and network analyses (Epskamp et al. (2012) <doi:10.18637/jss.v048.i04>) - as well as tools to explore the number of clusters in the data. Functions to produce synthetic ecological datasets are also provided.

Maintained by Antoine Bagnaro. Last updated 4 years ago.

2.0 match 1.00 score

cran

Allspice:RNA-Seq Profile Classifier

We developed a lightweight machine learning tool for RNA profiling of acute lymphoblastic leukemia (ALL); however, it can be used for any problem where multiple classes need to be identified from multi-dimensional data. The methodology is described in Makinen V-P, Rehn J, Breen J, Yeung D, White DL (2022) Multi-cohort transcriptomic subtyping of B-cell acute lymphoblastic leukemia, International Journal of Molecular Sciences 23:4574, <doi:10.3390/ijms23094574>. The classifier contains optimized mean profiles of the classes (centroids) as observed in the training data, and new samples are matched to these centroids using the shortest Euclidean distance. Centroids derived from a dataset of 1,598 ALL patients are included, but users can train the models with their own data as well. The output includes both numerical and visual presentations of the classification results. Samples with mixed features from multiple classes or atypical values are also identified.

Maintained by Ville-Petteri Makinen. Last updated 2 years ago.

0.9 match 2.00 score
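
The matching step described for Allspice above (class centroids learned from training data, new samples assigned to the centroid at the shortest Euclidean distance) is the general nearest-centroid technique. The Python sketch below illustrates that technique only. It is not the Allspice R interface, the names `fit_centroids` and `predict_label` are hypothetical, and it uses plain class means, whereas the package describes its mean profiles as optimized.

```python
import math
from collections import defaultdict


def fit_centroids(samples, labels):
    """Return {label: centroid}, each centroid being the plain class mean."""
    grouped = defaultdict(list)
    for sample, label in zip(samples, labels):
        grouped[label].append(sample)
    return {
        label: [sum(column) / len(rows) for column in zip(*rows)]
        for label, rows in grouped.items()
    }


def predict_label(centroids, sample):
    """Label of the centroid closest to `sample` in Euclidean distance."""
    return min(centroids, key=lambda label: math.dist(centroids[label], sample))


if __name__ == "__main__":
    train = [[0.0, 0.0], [0.0, 2.0], [10.0, 10.0], [12.0, 10.0]]
    classes = ["A", "A", "B", "B"]
    model = fit_centroids(train, classes)
    print(predict_label(model, [1.0, 1.0]))
```

A sample roughly equidistant from several centroids is what the description calls a sample with mixed features; comparing the two smallest distances would be one simple way to flag such cases, though the package's own criterion is not stated in this listing.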

cran

SpatialVx:Spatial Forecast Verification

Spatial forecast verification refers to verifying weather forecasts when the verification set (forecast and observations) is on a spatial field, usually a high-resolution gridded spatial field. Most of the functions here require the forecast and observed fields to be gridded and on the same grid. For a thorough review of most of the methods in this package, please see Gilleland et al. (2009) <doi: 10.1175/2009WAF2222269.1> and for a tutorial on some of the main functions available here, see Gilleland (2022) <doi: 10.5065/4px3-5a05>.

Maintained by Eric Gilleland. Last updated 4 months ago.

1.8 match 1 stars 1.00 score

doer0

mixOofA:Design and Analysis of Order-of-Addition Mixture Experiments

A facility to generate various classes of fractional designs for order-of-addition experiments namely fractional order-of-additions orthogonal arrays, see Voelkel, Joseph G. (2019). "The design of order-of-addition experiments." Journal of Quality Technology 51:3, 230-241, <doi:10.1080/00224065.2019.1569958>. Provides facility to construct component orthogonal arrays, see Jian-Feng Yang, Fasheng Sun and Hongquan Xu (2020). "A Component Position Model, Analysis and Design for Order-of-Addition Experiments." Technometrics, <doi:10.1080/00401706.2020.1764394>. Supports generation of fractional designs for order-of-addition mixture experiments. Analysis of data from order-of-addition mixture experiments is also supported.

Maintained by Baidya Nath Mandal. Last updated 8 months ago.

1.8 match 1.00 score

cran

drclust:Simultaneous Clustering and (or) Dimensionality Reduction

Methods for simultaneous clustering and dimensionality reduction such as: Double k-means, Reduced k-means, Factorial k-means, Clustering with Disjoint PCA but also methods for exclusively dimensionality reduction: Disjoint PCA, Disjoint FA. The statistical methods implemented refer to the following articles: de Soete G., Carroll J. (1994) "K-means clustering in a low-dimensional Euclidean space" <doi:10.1007/978-3-642-51175-2_24> ; Vichi M. (2001) "Double k-means Clustering for Simultaneous Classification of Objects and Variables" <doi:10.1007/978-3-642-59471-7_6> ; Vichi M., Kiers H.A.L. (2001) "Factorial k-means analysis for two-way data" <doi:10.1016/S0167-9473(00)00064-5> ; Vichi M., Saporta G. (2009) "Clustering and disjoint principal component analysis" <doi:10.1016/j.csda.2008.05.028> ; Vichi M. (2017) "Disjoint factor analysis with cross-loadings" <doi:10.1007/s11634-016-0263-9>.

Maintained by Ionel Prunila. Last updated 11 months ago.

openblas cpp openmp

1.8 match 1.00 score

r-forge

anacor:Simple and Canonical Correspondence Analysis

Performs simple and canonical CA (covariates on rows/columns) on a two-way frequency table (with missings) by means of SVD. Different scaling methods (standard, centroid, Benzecri, Goodman) as well as various plots including confidence ellipsoids are provided.

Maintained by Patrick Mair. Last updated 7 days ago.

0.5 match 3.40 score 21 scripts

bioc

cancerclass:Development and validation of diagnostic tests from high-dimensional molecular data

The classification protocol starts with a feature selection step and continues with nearest-centroid classification. The accuracy of the predictor can be evaluated using training and test set validation, leave-one-out cross-validation or in a multiple random validation protocol. Methods for calculation and visualization of continuous prediction scores allow users to balance sensitivity and specificity and define a cutoff value according to clinical requirements.

Maintained by Daniel Kosztyla. Last updated 5 months ago.

cancer microarray classification visualization

0.5 match 3.30 score 10 scripts

rajkumpismb

PCAPAM50:Enhanced 'PAM50' Subtyping of Breast Cancer

Accurate classification of breast cancer tumors based on gene expression data is not a trivial task, and it lacks standard practices. The 'PAM50' classifier, which uses 50 gene centroid correlation distances to classify tumors, faces challenges with balancing estrogen receptor (ER) status and gene centering. The 'PCAPAM50' package leverages principal component analysis and iterative 'PAM50' calls to create a gene expression-based ER-balanced subset for gene centering, avoiding the use of protein expression-based ER data and resulting in enhanced breast cancer subtyping.

Maintained by Praveen-Kumar Raj-Kumar. Last updated 3 months ago.

0.5 match 2.48 score 3 scripts

cran

inaparc:Initialization Algorithms for Partitioning Cluster Analysis

Partitioning clustering algorithms divide data sets into k subsets or partitions, so-called clusters. They require initialization procedures to start. Initialization of cluster prototypes is one such procedure for most partitioning algorithms. Cluster prototypes are the centers of clusters, i.e. centroids or medoids, representing the clusters in a data set. To initialize cluster prototypes, the package 'inaparc' contains a set of functions implementing several linear time-complexity and loglinear time-complexity methods in addition to some novel techniques. Initialization of fuzzy membership degree matrices is another important task for starting the probabilistic and possibilistic partitioning algorithms. To initialize the membership degree matrices required by these algorithms, a number of functions based on traditional and novel initialization techniques are also available in the package 'inaparc'.

Maintained by Zeynel Cebeci. Last updated 3 years ago.

0.5 match 2.18 score 5 dependents

cran

ddc:Distance Density Clustering Algorithm

A distance density clustering (DDC) algorithm in R. DDC uses dynamic time warping (DTW) to compute a similarity matrix, based on which cluster centers and cluster assignments are found. DDC inherits dynamic time warping (DTW) arguments and constraints. The cluster centers are centroid points that are calculated using the DTW Barycenter Averaging (DBA) algorithm. The clustering process is divisive. At each iteration, cluster centers are updated and data is reassigned to cluster centers. Early stopping is possible. The output includes cluster centers and clustering assignment, as described in the paper (Ma et al (2017) <doi:10.1109/ICDMW.2017.11>).

Maintained by Ruizhe Ma. Last updated 2 years ago.

0.5 match 2.00 score

lutzhamel

popsom7:A Fast, User-Friendly Implementation of Self-Organizing Maps (SOMs)

Methods for building self-organizing maps (SOMs) with a number of distinguishing features such as automatic centroid detection and cluster visualization using starbursts. For more details see the paper "Improved Interpretability of the Unified Distance Matrix with Connected Components" by Hamel and Brown (2011) in <ISBN:1-60132-168-6>. The package provides user-friendly access to two models we construct: (a) a SOM model and (b) a centroid based clustering model. The package also exposes a number of quality metrics for the quantitative evaluation of the map, Hamel (2016) <doi:10.1007/978-3-319-28518-4_4>. Finally, we reintroduced our fast, vectorized training algorithm for SOM with substantial improvements. It is about an order of magnitude faster than the canonical, stochastic C implementation <doi:10.1007/978-3-030-01057-7_60>.

Maintained by Lutz Hamel. Last updated 30 days ago.

fortran

0.8 match 1.30 score 2 scripts

ottaviaepifania

shortIRT:Procedures Based on Item Response Theory Models for the Development of Short Test Forms

Implement different Item Response Theory (IRT) based procedures for the development of static short test forms (STFs) from a test. Two main procedures are considered, specifically the typical IRT-based procedure for the development of STF, and a recently introduced procedure (Epifania, Anselmi & Robusto, 2022 <doi:10.1007/978-3-031-27781-8_7>). The procedures differ in how the most informative items are selected for the inclusion in the STF, either by considering their item information functions without considering any specific level of the latent trait (typical procedure) or by considering their informativeness with respect to specific levels of the latent trait, denoted as theta targets (the newly introduced procedure). Regarding the latter procedure, three methods are implemented for the definition of the theta targets: (i) theta targets are defined by segmenting the latent trait in equal intervals and considering the midpoint of each interval (equal interval procedure, eip), (ii) by clustering the latent trait to obtain unequal intervals and considering the centroids of the clusters as the theta targets (unequal intervals procedure, uip), and (iii) by letting the user set the specific theta targets of interest (user-defined procedure, udp). For further details on the procedure, please refer to Epifania, Anselmi & Robusto (2022) <doi:10.1007/978-3-031-27781-8_7>.

Maintained by Ottavia M. Epifania. Last updated 1 year ago.

0.5 match 1.70 score 3 scripts

pavlomozharovskyi

TukeyRegion:Tukey Region and Median

Tukey regions are polytopes in the Euclidean space, viz. upper-level sets of the Tukey depth function on given data. The bordering hyperplanes of a Tukey region are computed as well as its vertices, facets, centroid, and volume. In addition, the Tukey median set, which is the non-empty Tukey region having highest depth level, and its barycenter (= Tukey median) are calculated. Tukey regions are visualized in dimension two and three. For details see Liu, Mosler, and Mozharovskyi (2019, <doi:10.1080/10618600.2018.1546595>). See file LICENSE.note for additional license information.

Maintained by Pavlo Mozharovskyi. Last updated 2 years ago.

cpp

0.5 match 1.00 score

hdraisma

represent:Determine How Representative Two Multidimensional Data Sets are

Compute the values of various parameters evaluating how similar two multidimensional datasets' structures are in multidimensional space, as described in: Jouan-Rimbaud, D., Massart, D. L., Saby, C. A., Puel, C. (1998), <doi:10.1016/S0169-7439(98)00005-7>. The computed parameters evaluate three properties, namely, the direction of the data sets, the variance-covariance of the data points, and the location of the data sets' centroids. The package contains workhorse function jrparams(), as well as two helper functions Mboxtest() and JRsMahaldist(), and four example data sets.

Maintained by Harmen Draisma. Last updated 1 year ago.

0.5 match 1.00 score 5 scripts

yuanbofaith

protag:Search Tagged Peptides & Draw Highlighted Mass Spectra

In a typical protein labelling procedure, proteins are chemically tagged with a functional group, usually at specific sites, then digested into peptides, which are then analyzed using matrix-assisted laser desorption ionization - time of flight mass spectrometry (MALDI-TOF MS) to generate a peptide fingerprint. Relative to the control, peptides that are heavier by the mass of the labelling group are informative for sequence determination. Searching for peptides with such mass shifts, however, can be difficult. This package, designed to tackle this inconvenience, takes as input two or more MALDI-TOF MS mass lists, makes pairwise comparisons between the labeled groups and the control, and restores centroid mass spectra with highlighted peaks of interest for easier visual examination. In particular, peaks differentiated by the mass of the labelling group are defined as a “pair”, those with equal masses as a “match”, and all the other peaks as a “mismatch”. For more bioanalytical background information, refer to the following publications: Jingjing Deng (2015) <doi:10.1007/978-1-4939-2550-6_19>; Elizabeth Chang (2016) <doi:10.7171/jbt.16-2702-002>.

Maintained by Bo Yuan. Last updated 6 years ago.

0.5 match 1 stars 1.00 score 1 scripts