Showing 200 of 415 total results
jlmelville
rnndescent:Nearest Neighbor Descent Method for Approximate Nearest Neighbors
The Nearest Neighbor Descent method for finding approximate nearest neighbors by Dong and co-workers (2010) <doi:10.1145/1963405.1963487>. Based on the 'Python' package 'PyNNDescent' <https://github.com/lmcinnes/pynndescent>.
Maintained by James Melville. Last updated 8 months ago.
approximate-nearest-neighbor-search, cpp
57.6 match 11 stars 7.31 score 75 scripts
satijalab
Seurat:Tools for Single Cell Genomics
A toolkit for quality control, analysis, and exploration of single cell RNA sequencing data. 'Seurat' aims to enable users to identify and interpret sources of heterogeneity from single cell transcriptomic measurements, and to integrate diverse types of single cell data. See Satija R, Farrell J, Gennert D, et al (2015) <doi:10.1038/nbt.3192>, Macosko E, Basu A, Satija R, et al (2015) <doi:10.1016/j.cell.2015.05.002>, Stuart T, Butler A, et al (2019) <doi:10.1016/j.cell.2019.05.031>, and Hao, Hao, et al (2020) <doi:10.1101/2020.10.12.335331> for more details.
Maintained by Paul Hoffman. Last updated 1 year ago.
human-cell-atlas, single-cell-genomics, single-cell-rna-seq, cpp
19.3 match 2.4k stars 16.86 score 50k scripts 73 dependents
jlmelville
RcppHNSW:'Rcpp' Bindings for 'hnswlib', a Library for Approximate Nearest Neighbors
'Hnswlib' is a C++ library for Approximate Nearest Neighbors. This package provides a minimal R interface by relying on the 'Rcpp' package. See <https://github.com/nmslib/hnswlib> for more on 'hnswlib'. 'hnswlib' is released under Version 2.0 of the Apache License.
Maintained by James Melville. Last updated 3 months ago.
approximate-nearest-neighbor-search, hnsw, k-nearest-neighbors, knn, nearest-neighbor-search, nmslib, rcpp, cpp
30.1 match 36 stars 10.07 score 63 scripts 77 dependents
statnet
ergm:Fit, Simulate and Diagnose Exponential-Family Models for Networks
An integrated set of tools to analyze and simulate networks based on exponential-family random graph models (ERGMs). 'ergm' is a part of the Statnet suite of packages for network analysis. See Hunter, Handcock, Butts, Goodreau, and Morris (2008) <doi:10.18637/jss.v024.i03> and Krivitsky, Hunter, Morris, and Klumb (2023) <doi:10.18637/jss.v105.i06>.
Maintained by Pavel N. Krivitsky. Last updated 7 days ago.
18.0 match 100 stars 15.36 score 1.4k scripts 36 dependents
bioc
BiocNeighbors:Nearest Neighbor Detection for Bioconductor Packages
Implements exact and approximate methods for nearest neighbor detection, in a framework that allows them to be easily switched within Bioconductor packages or workflows. Exact searches can be performed using the k-means for k-nearest neighbors algorithm or with vantage point trees. Approximate searches can be performed using the Annoy or HNSW libraries. Searching on either Euclidean or Manhattan distances is supported. Parallelization is achieved for all methods by using BiocParallel. Functions are also provided to search for all neighbors within a given distance.
Maintained by Aaron Lun. Last updated 12 days ago.
25.9 match 10.14 score 646 scripts 89 dependents
satijalab
SeuratObject:Data Structures for Single Cell Data
Defines S4 classes for single-cell genomic data and associated information, such as dimensionality reduction embeddings, nearest-neighbor graphs, and spatially-resolved coordinates. Provides data access methods and R-native hooks to ensure the Seurat object is familiar to other R users. See Satija R, Farrell J, Gennert D, et al (2015) <doi:10.1038/nbt.3192>, Macosko E, Basu A, Satija R, et al (2015) <doi:10.1016/j.cell.2015.05.002>, and Stuart T, Butler A, et al (2019) <doi:10.1016/j.cell.2019.05.031> for more details.
Maintained by Paul Hoffman. Last updated 1 year ago.
20.9 match 25 stars 11.69 score 1.2k scripts 88 dependents
josiahparry
sfdep:Spatial Dependence for Simple Features
An interface to 'spdep' to integrate with 'sf' objects and the 'tidyverse'.
Maintained by Dexter Locke. Last updated 6 months ago.
30.7 match 130 stars 7.01 score 130 scripts
mhahsler
dbscan:Density-Based Spatial Clustering of Applications with Noise (DBSCAN) and Related Algorithms
A fast reimplementation of several density-based algorithms of the DBSCAN family. Includes the clustering algorithms DBSCAN (density-based spatial clustering of applications with noise) and HDBSCAN (hierarchical DBSCAN), the ordering algorithm OPTICS (ordering points to identify the clustering structure), shared nearest neighbor clustering, and the outlier detection algorithms LOF (local outlier factor) and GLOSH (global-local outlier score from hierarchies). The implementations use the kd-tree data structure (from library ANN) for faster k-nearest neighbor search. An R interface to fast kNN and fixed-radius NN search is also provided. Hahsler, Piekenbrock and Doran (2019) <doi:10.18637/jss.v091.i01>.
Maintained by Michael Hahsler. Last updated 2 months ago.
clustering, dbscan, density-based-clustering, hdbscan, lof, optics, cpp
13.6 match 321 stars 15.62 score 1.6k scripts 84 dependents
geodacenter
rgeoda:R Library for Spatial Data Analysis
Provides spatial data analysis functionalities including Exploratory Spatial Data Analysis, Spatial Cluster Detection and Clustering Analysis, Regionalization, etc. based on the C++ source code of 'GeoDa', which is an open-source software tool that serves as an introduction to spatial data analysis. The 'GeoDa' software and its documentation are available at <https://geodacenter.github.io>.
Maintained by Xun Li. Last updated 9 days ago.
dataanalysisgeodageospatialcpp
22.3 match 73 stars 7.85 score 179 scripts 1 dependents
eddelbuettel
RcppAnnoy:'Rcpp' Bindings for 'Annoy', a Library for Approximate Nearest Neighbors
'Annoy' is a small C++ library for Approximate Nearest Neighbors written for efficient memory usage as well as an ability to load from / save to disk. This package provides an R interface by relying on the 'Rcpp' package, exposing the same interface as the original Python wrapper to 'Annoy'. See <https://github.com/spotify/annoy> for more on 'Annoy'. 'Annoy' is released under Version 2.0 of the Apache License. Also included is a small Windows port of 'mmap' which is released under the MIT license.
Maintained by Dirk Eddelbuettel. Last updated 8 days ago.
annoy, nearest, nearest-neighbors, cpp
14.4 match 72 stars 11.97 score 57 scripts 147 dependents
klausvigo
kknn:Weighted k-Nearest Neighbors
Weighted k-Nearest Neighbors for Classification, Regression and Clustering.
Maintained by Klaus Schliep. Last updated 4 years ago.
14.7 match 23 stars 11.08 score 4.6k scripts 41 dependents
prodriguezsosa
conText:'a la Carte' on Text (ConText) Embedding Regression
A fast, flexible and transparent framework to estimate context-specific word and short document embeddings using the 'a la carte' embeddings approach developed by Khodak et al. (2018) <arXiv:1805.05388> and evaluate hypotheses about covariate effects on embeddings using the regression framework developed by Rodriguez et al. (2021) <https://github.com/prodriguezsosa/EmbeddingRegression>.
Maintained by Pedro L. Rodriguez. Last updated 11 months ago.
17.0 match 104 stars 9.40 score 1.7k scripts
igraph
igraph:Network Analysis and Visualization
Routines for simple graphs and network analysis. It can handle large graphs very well and provides functions for generating random and regular graphs, graph visualization, centrality methods and much more.
Maintained by Kirill Müller. Last updated 5 hours ago.
complex-networks, graph-algorithms, graph-theory, mathematics, network-analysis, network-graph, fortran, libxml2, glpk, openblas, cpp
6.8 match 582 stars 21.11 score 31k scripts 1.9k dependents
rich-iannone
DiagrammeR:Graph/Network Visualization
Build graph/network structures using functions for stepwise addition and deletion of nodes and edges. Work with data available in tables for bulk addition of nodes, edges, and associated metadata. Use graph selections and traversals to apply changes to specific nodes or edges. A wide selection of graph algorithms allows for the analysis of graphs. Visualize the graphs and take advantage of any aesthetic properties assigned to nodes and edges.
Maintained by Richard Iannone. Last updated 2 months ago.
graph, graph-functions, network-graph, property-graph, visualization
8.9 match 1.7k stars 15.18 score 3.8k scripts 87 dependents
paulnorthrop
donut:Nearest Neighbour Search with Variables on a Torus
Finds the k nearest neighbours in a dataset of specified points, adding the option to wrap certain variables on a torus. The user chooses the algorithm to use to find the nearest neighbours. Two such algorithms, provided by the packages 'RANN' <https://cran.r-project.org/package=RANN>, and 'nabor' <https://cran.r-project.org/package=nabor>, are suggested.
Maintained by Paul J. Northrop. Last updated 2 years ago.
degrees, donut, edges, knn-algorithm, knn-search, nabor, nearest, nearest-neighbor, nearest-neighbor-search, nearest-neighbors, nearest-neighbour-algorithm, nearest-neighbours, neighbors, periodicity, rann, torus, wrap
31.7 match 1 stars 4.18 score 5 scripts 1 dependents
cran
FNN:Fast Nearest Neighbor Search Algorithms and Applications
Cover-tree and kd-tree fast k-nearest neighbor search algorithms and related applications including KNN classification, regression and information measures are implemented.
Maintained by Shengqiao Li. Last updated 6 months ago.
13.0 match 5 stars 9.94 score 2.4k scripts 594 dependents
frbcesab
chessboard:Create Network Connections Based on Chess Moves
Provides functions to work with directed (asymmetric) and undirected (symmetric) spatial networks. It makes the creation of connectivity matrices easier, i.e. a binary matrix of dimension n x n, where n is the number of nodes (sampling units) indicating the presence (1) or the absence (0) of an edge (link) between pairs of nodes. Different network objects can be produced by 'chessboard': node list, neighbor list, edge list, connectivity matrix. It can also produce objects that will be used later in Moran's Eigenvector Maps (Dray et al. (2006) <doi:10.1016/j.ecolmodel.2006.02.015>) and Asymmetric Eigenvector Maps (Blanchet et al. (2008) <doi:10.1016/j.ecolmodel.2008.04.001>), methods available in the package 'adespatial' (Dray et al. (2023) <https://CRAN.R-project.org/package=adespatial>). This work is part of the FRB-CESAB working group Bridge <https://www.fondationbiodiversite.fr/en/the-frb-in-action/programs-and-projects/le-cesab/bridge/>.
Maintained by Nicolas Casajus. Last updated 1 year ago.
connectivity-matrix, directed-networks, neighborhood, network, one-dimensional-networks, spatial-networks, two-dimensional-networks, undirected-networks
25.5 match 4 stars 4.78 score
mlampros
KernelKnn:Kernel k Nearest Neighbors
Extends the simple k-nearest neighbors algorithm by incorporating numerous kernel functions and a variety of distance metrics. The package takes advantage of 'RcppArmadillo' to speed up the calculation of distances between observations.
Maintained by Lampros Mouselimis. Last updated 2 years ago.
cpp11, distance-metric, kernel-methods, knn, rcpparmadillo, openblas, cpp, openmp
12.4 match 17 stars 9.16 score 54 scripts 13 dependents
gmcmacran
dann:Discriminant Adaptive Nearest Neighbor Classification
Discriminant Adaptive Nearest Neighbor Classification is a variation of k nearest neighbors where the shape of the neighborhood is data driven. This package implements dann and sub_dann from Hastie (1996) <https://web.stanford.edu/~hastie/Papers/dann_IEEE.pdf>.
Maintained by Greg McMahan. Last updated 8 months ago.
28.7 match 3.74 score 37 scripts
bioc
CatsCradle:This package provides methods for analysing spatial transcriptomics data and for discovering gene clusters
This package addresses two broad areas. It allows for in-depth analysis of spatial transcriptomic data by identifying tissue neighbourhoods. These are contiguous regions of tissue surrounding individual cells. 'CatsCradle' allows for the categorisation of neighbourhoods by the cell types contained in them and the genes expressed in them. In particular, it produces Seurat objects whose individual elements are neighbourhoods rather than cells. In addition, it enables the categorisation and annotation of genes by producing Seurat objects whose elements are genes.
Maintained by Michael Shapiro. Last updated 1 month ago.
biologicalquestion, statisticalmethod, geneexpression, singlecell, transcriptomics, spatial
15.9 match 3 stars 6.50 score
tidymodels
dials:Tools for Creating Tuning Parameter Values
Many models contain tuning parameters (i.e. parameters that cannot be directly estimated from the data). These tools can be used to define objects for creating, simulating, or validating values for such parameters.
Maintained by Hannah Frick. Last updated 30 days ago.
7.1 match 114 stars 14.31 score 426 scripts 52 dependents
hannameyer
CAST:'caret' Applications for Spatial-Temporal Models
Supporting functionality to run 'caret' with spatial or spatial-temporal data. 'caret' is a frequently used package for model training and prediction using machine learning. CAST includes functions to improve spatial or spatial-temporal modelling tasks using 'caret'. It includes the newly suggested 'Nearest neighbor distance matching' cross-validation to estimate the performance of spatial prediction models and allows for spatial variable selection to select suitable predictor variables in view of their contribution to the spatial model performance. CAST further includes functionality to estimate the (spatial) area of applicability of prediction models. Methods are described in Meyer et al. (2018) <doi:10.1016/j.envsoft.2017.12.001>; Meyer et al. (2019) <doi:10.1016/j.ecolmodel.2019.108815>; Meyer and Pebesma (2021) <doi:10.1111/2041-210X.13650>; Milà et al. (2022) <doi:10.1111/2041-210X.13851>; Meyer and Pebesma (2022) <doi:10.1038/s41467-022-29838-9>; Linnenbrink et al. (2023) <doi:10.5194/egusphere-2023-1308>; Schumacher et al. (2024) <doi:10.5194/egusphere-2024-2730>. The package is described in detail in Meyer et al. (2024) <doi:10.48550/arXiv.2404.06978>.
Maintained by Hanna Meyer. Last updated 2 months ago.
autocorrelation, caret, feature-selection, machine-learning, overfitting, predictive-modeling, spatial, spatio-temporal, variable-selection
8.0 match 114 stars 11.97 score 298 scripts 1 dependents
jefferislab
RANN:Fast Nearest Neighbour Search (Wraps ANN Library) Using L2 Metric
Finds the k nearest neighbours for every point in a given dataset in O(N log N) time using Arya and Mount's ANN library (v1.1.3). There is support for approximate as well as exact searches, fixed radius searches and 'bd' as well as 'kd' trees. The distance is computed using the L2 (Euclidean) metric. Please see package 'RANN.L1' for the same functionality using the L1 (Manhattan, taxicab) metric.
Maintained by Gregory Jefferis. Last updated 7 months ago.
ann-library, nearest-neighbors, nearest-neighbours, cpp
7.5 match 58 stars 12.21 score 1.3k scripts 190 dependents
jeffreyevans
yaImpute:Nearest Neighbor Observation Imputation and Evaluation Tools
Performs nearest neighbor-based imputation using one or more alternative approaches to processing multivariate data. These include methods based on canonical correlation analysis, canonical correspondence analysis, and a multivariate adaptation of the random forest classification and regression techniques of Leo Breiman and Adele Cutler. Additional methods are also offered. The package includes functions for comparing the results from running alternative techniques, detecting imputation targets that are notably distant from reference observations, detecting and correcting for bias, bootstrapping and building ensemble imputations, and mapping results.
Maintained by Jeffrey S. Evans. Last updated 6 months ago.
12.3 match 3 stars 7.40 score 94 scripts 12 dependents
jfrench
smerc:Statistical Methods for Regional Counts
Implements statistical methods for analyzing the counts of areal data, with a focus on the detection of spatial clusters and clustering. The package has a heavy emphasis on spatial scan methods, which were first introduced by Kulldorff and Nagarwalla (1995) <doi:10.1002/sim.4780140809> and Kulldorff (1997) <doi:10.1080/03610929708831995>.
Maintained by Joshua French. Last updated 5 months ago.
12.9 match 3 stars 6.11 score 45 scripts 3 dependents
kisungyou
Rdimtools:Dimension Reduction and Estimation Methods
We provide linear and nonlinear dimension reduction techniques. Intrinsic dimension estimation methods for exploratory analysis are also provided. For more details on the package, see the paper by You and Shung (2022) <doi:10.1016/j.simpa.2022.100414>.
Maintained by Kisung You. Last updated 2 years ago.
dimension-estimation, dimension-reduction, manifold-learning, subspace-learning, openblas, cpp, openmp
9.3 match 52 stars 8.37 score 186 scripts 8 dependents
bioc
ChemmineR:Cheminformatics Toolkit for R
ChemmineR is a cheminformatics package for analyzing drug-like small molecule data in R. Its latest version contains functions for efficient processing of large numbers of molecules, physicochemical/structural property predictions, structural similarity searching, classification and clustering of compound libraries with a wide spectrum of algorithms. In addition, it offers visualization functions for compound clustering results and chemical structures.
Maintained by Thomas Girke. Last updated 5 months ago.
cheminformatics, biomedicalinformatics, pharmacogenetics, pharmacogenomics, microtitreplateassay, cellbasedassays, visualization, infrastructure, dataimport, clustering, proteomics, metabolomics, cpp
8.3 match 14 stars 9.42 score 253 scripts 12 dependents
michaeldorman
nngeo:k-Nearest Neighbor Join for Spatial Data
K-nearest neighbor search for projected and non-projected 'sf' spatial layers. Nearest neighbor search uses (1) C code from 'GeographicLib' for lon-lat point layers, (2) function knn() from package 'nabor' for projected point layers, or (3) function st_distance() from package 'sf' for line or polygon layers. The package also includes several other utility functions for spatial analysis.
Maintained by Michael Dorman. Last updated 11 months ago.
8.0 match 81 stars 9.70 score 600 scripts 6 dependents
bioc
GenomicDistributions:GenomicDistributions: fast analysis of genomic intervals with Bioconductor
If you have a set of genomic ranges, this package can help you with visualization and comparison. It produces several kinds of plots, for example: Chromosome distribution plots, which visualize how your regions are distributed over chromosomes; feature distance distribution plots, which visualize how your regions are distributed relative to a feature of interest, like Transcription Start Sites (TSSs); genomic partition plots, which visualize how your regions overlap given genomic features such as promoters, introns, exons, or intergenic regions. It also makes it easy to compare one set of ranges to another.
Maintained by Kristyna Kupkova. Last updated 5 months ago.
software, genomeannotation, genomeassembly, datarepresentation, sequencing, coverage, functionalgenomics, visualization
10.1 match 26 stars 7.44 score 25 scripts
lkremer
ggpointdensity:A Cross Between a 2D Density Plot and a Scatter Plot
A cross between a 2D density plot and a scatter plot, implemented as a 'ggplot2' geom. Points in the scatter plot are colored by the number of neighboring points. This is useful to visualize the 2D-distribution of points in case of overplotting.
Maintained by Lukas P. M. Kremer. Last updated 10 months ago.
2d-density-plot, density-visualization, geom, ggplot-extension, ggplot2, ggplot2-enhancements, ggplot2-geoms, neighboring-points, scatter-plot, visualization
8.0 match 424 stars 9.30 score 1.1k scripts 4 dependents
welch-lab
cytosignal:What the Package Does (One Line, Title Case)
What the package does (one paragraph).
Maintained by Jialin Liu. Last updated 7 days ago.
11.8 match 16 stars 5.95 score 6 scripts
topepo
caret:Classification and Regression Training
Misc functions for training and plotting classification and regression models.
Maintained by Max Kuhn. Last updated 3 months ago.
3.6 match 1.6k stars 19.24 score 61k scripts 303 dependents
jkrijthe
Rtsne:T-Distributed Stochastic Neighbor Embedding using a Barnes-Hut Implementation
An R wrapper around the fast T-distributed Stochastic Neighbor Embedding implementation by Van der Maaten (see <https://github.com/lvdmaaten/bhtsne/> for more information on the original implementation).
Maintained by Jesse Krijthe. Last updated 9 months ago.
5.0 match 256 stars 13.95 score 4.4k scripts 231 dependents
ssnn-airr
shazam:Immunoglobulin Somatic Hypermutation Analysis
Provides a computational framework for analyzing mutations in immunoglobulin (Ig) sequences. Includes methods for Bayesian estimation of antigen-driven selection pressure, mutational load quantification, building of somatic hypermutation (SHM) models, and model-dependent distance calculations. Also includes empirically derived models of SHM for both mice and humans. Citations: Gupta and Vander Heiden, et al (2015) <doi:10.1093/bioinformatics/btv359>, Yaari, et al (2012) <doi:10.1093/nar/gks457>, Yaari, et al (2013) <doi:10.3389/fimmu.2013.00358>, Cui, et al (2016) <doi:10.4049/jimmunol.1502263>.
Maintained by Susanna Marquez. Last updated 2 months ago.
8.4 match 7.43 score 222 scripts 2 dependents
jefferis
nabor:Wraps 'libnabo', a Fast K Nearest Neighbour Library for Low Dimensions
An R wrapper for 'libnabo', an exact or approximate k nearest neighbour library which is optimised for low dimensional spaces (e.g. 3D). 'libnabo' has speed and space advantages over the 'ANN' library wrapped by package 'RANN'. 'nabor' includes a knn function that is designed as a drop-in replacement for 'RANN' function nn2. In addition, objects which include the k-d tree search structure can be returned to speed up repeated queries of the same set of target points.
Maintained by Gregory Jefferis. Last updated 5 years ago.
7.5 match 22 stars 8.21 score 104 scripts 34 dependents
kjhealy
gssrdoc:Document General Social Survey Variable
The General Social Survey (GSS) is a long-running, mostly annual survey of US households. It is administered by the National Opinion Research Center (NORC). This package contains a tibble with information on the survey variables, together with every variable documented as an R help page. For more information on the GSS see <http://gss.norc.org>.
Maintained by Kieran Healy. Last updated 11 months ago.
26.7 match 2.28 score 38 scripts
dvrbts
labdsv:Ordination and Multivariate Analysis for Ecology
A variety of ordination and community analyses useful in analysis of data sets in community ecology. Includes many of the common ordination methods, with graphical routines to facilitate their interpretation, as well as several novel analyses.
Maintained by David W. Roberts. Last updated 2 years ago.
9.6 match 3 stars 6.08 score 452 scripts 13 dependents
ludovikcoba
rrecsys:Environment for Evaluating Recommender Systems
Processes standard recommendation datasets (e.g., a user-item rating matrix) as input and generates rating predictions and lists of recommended items. Standard algorithm implementations which are included in this package are the following: Global/Item/User-Average baselines, Weighted Slope One, Item-Based KNN, User-Based KNN, FunkSVD, BPR and weighted ALS. They can be assessed according to the standard offline evaluation methodology (Shani, et al. (2011) <doi:10.1007/978-0-387-85820-3_8>) for recommender systems using measures such as MAE, RMSE, Precision, Recall, F1, AUC, NDCG, RankScore and coverage measures. The package (Coba, et al. (2017) <doi:10.1007/978-3-319-60042-0_36>) is intended for rapid prototyping of recommendation algorithms and education purposes.
Maintained by Ludovik Çoba. Last updated 3 years ago.
8.3 match 23 stars 6.84 score 25 scripts
bioc
recountmethylation:Access and analyze public DNA methylation array data compilations
Resources for cross-study analyses of public DNAm array data from NCBI GEO repo, produced using Illumina's Infinium HumanMethylation450K (HM450K) and MethylationEPIC (EPIC) platforms. Provided functions enable download, summary, and filtering of large compilation files. Vignettes detail background about file formats, example analyses, and more. Note the disclaimer on package load and consult the main manuscripts for further info.
Maintained by Sean K Maden. Last updated 5 months ago.
dnamethylation, epigenetics, microarray, methylationarray, experimenthub
8.8 match 9 stars 6.28 score 9 scripts
r-lidar
lidR:Airborne LiDAR Data Manipulation and Visualization for Forestry Applications
Airborne LiDAR (Light Detection and Ranging) interface for data manipulation and visualization. Read/write 'las' and 'laz' files, computation of metrics in area based approach, point filtering, artificial point reduction, classification from geographic data, normalization, individual tree segmentation and other manipulations.
Maintained by Jean-Romain Roussel. Last updated 1 month ago.
als, forestry, las, laz, lidar, point-cloud, remote-sensing, openblas, cpp, openmp
3.8 match 623 stars 14.47 score 844 scripts 8 dependents
mlampros
nmslibR:Non Metric Space (Approximate) Library
A Non-Metric Space Library ('NMSLIB' <https://github.com/nmslib/nmslib>) wrapper, which according to the authors "is an efficient cross-platform similarity search library and a toolkit for evaluation of similarity search methods. The goal of the 'NMSLIB' <https://github.com/nmslib/nmslib> Library is to create an effective and comprehensive toolkit for searching in generic non-metric spaces. Being comprehensive is important, because no single method is likely to be sufficient in all cases. Also note that exact solutions are hardly efficient in high dimensions and/or non-metric spaces. Hence, the main focus is on approximate methods". The wrapper also includes Approximate Kernel k-Nearest-Neighbor functions based on the 'NMSLIB' <https://github.com/nmslib/nmslib> 'Python' Library.
Maintained by Lampros Mouselimis. Last updated 2 years ago.
approximate-nearest-neighbor-search, nmslib, non-metric, python, reticulate, cpp, openmp
10.1 match 12 stars 5.14 score 23 scripts
bioc
scrapper:Bindings to C++ Libraries for Single-Cell Analysis
Implements R bindings to C++ code for analyzing single-cell (expression) data, mostly from various libscran libraries. Each function performs an individual step in the single-cell analysis workflow, ranging from quality control to clustering and marker detection. It is mostly intended for other Bioconductor package developers to build more user-friendly end-to-end workflows.
Maintained by Aaron Lun. Last updated 5 days ago.
normalization, rnaseq, software, geneexpression, transcriptomics, singlecell, batcheffect, qualitycontrol, differentialexpression, featureextraction, principalcomponent, clustering, openblas, cpp
9.2 match 5.55 score 32 scripts
bioc
matter:Out-of-core statistical computing and signal processing
Toolbox for larger-than-memory scientific computing and visualization, providing efficient out-of-core data structures using files or shared memory, for dense and sparse vectors, matrices, and arrays, with applications to nonuniformly sampled signals and images.
Maintained by Kylie A. Bemis. Last updated 3 months ago.
infrastructure, datarepresentation, dataimport, dimensionreduction, preprocessing, cpp
5.3 match 57 stars 9.52 score 64 scripts 2 dependents
bioc
batchelor:Single-Cell Batch Correction Methods
Implements a variety of methods for batch correction of single-cell (RNA sequencing) data. This includes methods based on detecting mutually nearest neighbors, as well as several efficient variants of linear regression of the log-expression values. Functions are also provided to perform global rescaling to remove differences in depth between batches, and to perform a principal components analysis that is robust to differences in the numbers of cells across batches.
Maintained by Aaron Lun. Last updated 3 days ago.
sequencing, rnaseq, software, geneexpression, transcriptomics, singlecell, batcheffect, normalization, cpp
5.5 match 9.10 score 1.2k scripts 10 dependents
bioc
singleCellTK:Comprehensive and Interactive Analysis of Single Cell RNA-Seq Data
The Single Cell Toolkit (SCTK) in the singleCellTK package provides an interface to popular tools for importing, quality control, analysis, and visualization of single cell RNA-seq data. SCTK allows users to seamlessly integrate tools from various packages at different stages of the analysis workflow. A general "a la carte" workflow gives users access to multiple methods for data importing, calculation of general QC metrics, doublet detection, ambient RNA estimation and removal, filtering, normalization, batch correction or integration, dimensionality reduction, 2-D embedding, clustering, marker detection, differential expression, cell type labeling, pathway analysis, and data exporting. Curated workflows can be used to run Seurat and Celda. Streamlined quality control can be performed on the command line using the SCTK-QC pipeline. Users can analyze their data using commands in the R console or by using an interactive Shiny Graphical User Interface (GUI). Specific analyses or entire workflows can be summarized and shared with comprehensive HTML reports generated by Rmarkdown. Additional documentation and vignettes can be found at camplab.net/sctk.
Maintained by Joshua David Campbell. Last updated 24 days ago.
singlecell, geneexpression, differentialexpression, alignment, clustering, immunooncology, batcheffect, normalization, qualitycontrol, dataimport, gui
4.9 match 181 stars 10.16 score 252 scripts
bioc
RCy3:Functions to Access and Control Cytoscape
Visualize, analyze and explore networks using Cytoscape via R. Anything you can do using the graphical user interface of Cytoscape, you can now do with a single RCy3 function.
Maintained by Alex Pico. Last updated 5 months ago.
visualization, graphandnetwork, thirdpartyclient, network
3.6 match 52 stars 13.39 score 628 scripts 15 dependents
bioc
tidytof:Analyze High-dimensional Cytometry Data Using Tidy Data Principles
This package implements an interactive, scientific analysis pipeline for high-dimensional cytometry data built using tidy data principles. It is specifically designed to play well with both the tidyverse and Bioconductor software ecosystems, with functionality for reading/writing data files, data cleaning, preprocessing, clustering, visualization, modeling, and other quality-of-life functions. tidytof implements a "grammar" of high-dimensional cytometry data analysis.
Maintained by Timothy Keyes. Last updated 5 months ago.
singlecell, flowcytometry, bioinformatics, cytometry, data-science, single-cell, tidyverse, cpp
6.7 match 18 stars 7.24 score 35 scripts
kosukeimai
MatchIt:Nonparametric Preprocessing for Parametric Causal Inference
Selects matched samples of the original treated and control groups with similar covariate distributions -- can be used to match exactly on covariates, to match on propensity scores, or perform a variety of other matching procedures. The package also implements a series of recommendations offered in Ho, Imai, King, and Stuart (2007) <DOI:10.1093/pan/mpl013>. (The 'gurobi' package, which is not on CRAN, is optional and comes with an installation of the Gurobi Optimizer, available at <https://www.gurobi.com>.)
Maintained by Noah Greifer. Last updated 2 days ago.
3.2 match 220 stars 15.03 score 2.4k scripts 21 dependents
mlr-org
mlr3learners:Recommended Learners for 'mlr3'
Recommended Learners for 'mlr3'. Extends 'mlr3' with interfaces to essential machine learning packages on CRAN. This includes, but is not limited to: (penalized) linear and logistic regression, linear and quadratic discriminant analysis, k-nearest neighbors, naive Bayes, support vector machines, and gradient boosting.
Maintained by Marc Becker. Last updated 4 months ago.
classification, learners, machine-learning, mlr3, regression
4.1 match 91 stars 11.51 score 1.5k scripts 10 dependents
bioimaginggroup
bioimagetools:Tools for Microscopy Imaging
Tools for 3D imaging, mostly for biology/microscopy. Read and write TIFF stacks. Functions for segmentation, filtering and analyzing 3D point patterns.
Maintained by Volker Schmid. Last updated 3 years ago.
8.8 match 4 stars 5.30 score 33 scripts 1 dependentsjuanmartinsantos
rgnoisefilt:Elimination of Noisy Samples in Regression Datasets using Noise Filters
Traditional noise filtering methods aim at removing noisy samples from a classification dataset. This package adapts classic and recent filtering techniques for use in regression problems, and it also incorporates methods specifically designed for regression data. In order to do this, it uses approaches proposed in the specialized literature, such as Martin et al. (2021) [<doi:10.1109/ACCESS.2021.3123151>] and Arnaiz-Gonzalez et al. (2016) [<doi:10.1016/j.eswa.2015.12.046>]. Thus, the goal of the implemented noise filters is to eliminate samples with noise in regression datasets.
Maintained by Juan Martin. Last updated 1 years ago.
11.0 match 2 stars 4.00 score 3 scriptscran
rNeighborQTL:Interval Mapping for Quantitative Trait Loci Underlying Neighbor Effects
To enable quantitative trait loci mapping of neighbor effects, this package extends a single-marker regression to interval mapping. The theoretical background of the method is described in Sato et al. (2021) <doi:10.1093/g3journal/jkab017>.
Maintained by Yasuhiro Sato. Last updated 4 years ago.
22.0 match 2.00 score 3 scriptscran
rNeighborGWAS:Testing Neighbor Effects in Marker-Based Regressions
To incorporate neighbor genotypic identity into genome-wide association studies, the package provides a set of functions for variation partitioning and association mapping. The theoretical background of the method is described in Sato et al. (2021) <doi:10.1038/s41437-020-00401-w>.
Maintained by Yasuhiro Sato. Last updated 4 years ago.
17.7 match 2.48 score 15 scriptsprioritizr
prioritizr:Systematic Conservation Prioritization in R
Systematic conservation prioritization using mixed integer linear programming (MILP). It provides a flexible interface for building and solving conservation planning problems. Once built, conservation planning problems can be solved using a variety of commercial and open-source exact algorithm solvers. By using exact algorithm solvers, solutions can be generated that are guaranteed to be optimal (or within a pre-specified optimality gap). Furthermore, conservation problems can be constructed to optimize the spatial allocation of different management actions or zones, meaning that conservation practitioners can identify solutions that benefit multiple stakeholders. To solve large-scale or complex conservation planning problems, users should install the Gurobi optimization software (available from <https://www.gurobi.com/>) and the 'gurobi' R package (see Gurobi Installation Guide vignette for details). Users can also install the IBM CPLEX software (<https://www.ibm.com/products/ilog-cplex-optimization-studio/cplex-optimizer>) and the 'cplexAPI' R package (available at <https://github.com/cran/cplexAPI>). Additionally, the 'rcbc' R package (available at <https://github.com/dirkschumacher/rcbc>) can be used to generate solutions using the CBC optimization software (<https://github.com/coin-or/Cbc>). For further details, see Hanson et al. (2025) <doi:10.1111/cobi.14376>.
Maintained by Richard Schuster. Last updated 12 days ago.
biodiversityconservationconservation-planneroptimizationprioritizationsolverspatialcpp
3.5 match 124 stars 11.82 score 584 scripts 2 dependentsrcurtin
mlpack:'Rcpp' Integration for the 'mlpack' Library
A fast, flexible machine learning library, written in C++, that aims to provide fast, extensible implementations of cutting-edge machine learning algorithms. See also Curtin et al. (2023) <doi:10.21105/joss.05026>.
Maintained by Ryan Curtin. Last updated 3 months ago.
10.9 match 3.71 score 20 scripts 8 dependentspachadotdev
analogsea:Interface to 'DigitalOcean'
Provides a set of functions for interacting with the 'DigitalOcean' API <https://www.digitalocean.com/>, including creating images, destroying them, rebooting, getting details on regions, and available images.
Maintained by Mauricio Vargas. Last updated 2 years ago.
5.3 match 159 stars 7.56 score 100 scripts 1 dependentsklausvigo
phangorn:Phylogenetic Reconstruction and Analysis
Allows for estimation of phylogenetic trees and networks using Maximum Likelihood, Maximum Parsimony, distance methods and Hadamard conjugation (Schliep 2011). Offers methods for tree comparison, model selection and visualization of phylogenetic networks as described in Schliep et al. (2017).
Maintained by Klaus Schliep. Last updated 1 months ago.
softwaretechnologyqualitycontrolphylogenetic-analysisphylogeneticsopenblascpp
2.3 match 206 stars 16.69 score 2.5k scripts 135 dependentsnsaph-software
GPCERF:Gaussian Processes for Estimating Causal Exposure Response Curves
Provides a non-parametric Bayesian framework based on Gaussian process priors for estimating causal effects of a continuous exposure and detecting change points in the causal exposure response curves using observational data. Ren, B., Wu, X., Braun, D., Pillai, N., & Dominici, F. (2021). "Bayesian modeling for exposure response curve via gaussian processes: Causal effects of exposure to air pollution on health outcomes." arXiv preprint <doi:10.48550/arXiv.2105.03454>.
Maintained by Boyu Ren. Last updated 11 months ago.
5.9 match 9 stars 6.33 score 16 scriptsfridleylab
spatialTIME:Spatial Analysis of Vectra Immunofluorescent Data
Visualization and analysis of Vectra Immunofluorescent data. Options for calculating both the univariate and bivariate Ripley's K are included. Calculations are performed using a permutation-based approach presented by Wilson et al. <doi:10.1101/2021.04.27.21256104>.
Maintained by Fridley Lab. Last updated 7 months ago.
6.1 match 4 stars 6.08 score 30 scriptsghtaranto
scapesClassification:User-Defined Classification of Raster Surfaces
Series of algorithms to translate users' mental models of seascapes, landscapes and, more generally, of geographic features into computer representations (classifications). Spaces and geographic objects are classified with user-defined rules taking into account spatial data as well as spatial relationships among different classes and objects.
Maintained by Gerald H. Taranto. Last updated 3 years ago.
classification-algorithmobject-detectionrasterspatial
8.7 match 1 stars 4.22 score 33 scriptsgpiras
sphet:Estimation of Spatial Autoregressive Models with and without Heteroskedastic Innovations
Functions for fitting Cliff-Ord-type spatial autoregressive models with and without heteroskedastic innovations using Generalized Method of Moments estimation are provided. Some support is available for fitting spatial HAC models, and for fitting with non-spatial endogenous variables using instrumental variables.
Maintained by Gianfranco Piras. Last updated 7 days ago.
4.9 match 8 stars 7.43 score 188 scripts 3 dependentsgrasia
knnp:Time Series Prediction using K-Nearest Neighbors Algorithm (Parallel)
Two main functionalities are provided. One of them is predicting values with k-nearest neighbors algorithm and the other is optimizing the parameters k and d of the algorithm. These are carried out in parallel using multiple threads.
Maintained by Daniel Bastarrica Lacalle. Last updated 5 years ago.
knearest-neighbor-algorithmparalleltime-series-forecasting
13.1 match 1 stars 2.70 score 8 scriptsjulianfaraway
faraway:Datasets and Functions for Books by Julian Faraway
Books are "Linear Models with R" published 1st Ed. August 2004, 2nd Ed. July 2014, 3rd Ed. February 2025 by CRC press, ISBN 9781439887332, and "Extending the Linear Model with R" published by CRC press in 1st Ed. December 2005 and 2nd Ed. March 2016, ISBN 9781584884248 and "Practical Regression and ANOVA in R" contributed documentation on CRAN (now very dated).
Maintained by Julian Faraway. Last updated 1 months ago.
3.8 match 29 stars 9.43 score 1.7k scripts 1 dependentsrspatial
terra:Spatial Data Analysis
Methods for spatial data analysis with vector (points, lines, polygons) and raster (grid) data. Methods for vector data include geometric operations such as intersect and buffer. Raster methods include local, focal, global, zonal and geometric operations. The predict and interpolate methods facilitate the use of regression type (interpolation, machine learning) models for spatial prediction, including with satellite remote sensing data. Processing of very large files is supported. See the manual and tutorials on <https://rspatial.org/> to get started. 'terra' replaces the 'raster' package ('terra' can do more, and it is faster and easier to use).
Maintained by Robert J. Hijmans. Last updated 4 hours ago.
geospatialrasterspatialvectoronetbbprojgdalgeoscpp
2.0 match 559 stars 17.64 score 17k scripts 851 dependentsbioc
HGC:A fast hierarchical graph-based clustering method
HGC (short for Hierarchical Graph-based Clustering) is an R package for conducting hierarchical clustering on large-scale single-cell RNA-seq (scRNA-seq) data. The key idea is to construct a dendrogram of cells on their shared nearest neighbor (SNN) graph. HGC provides functions for building graphs and for conducting hierarchical clustering on the graph. Users with an old R version can visit https://github.com/XuegongLab/HGC/tree/HGC4oldRVersion to get the HGC package built for R 3.6.
Maintained by XGlab. Last updated 5 months ago.
singlecellsoftwareclusteringrnaseqgraphandnetworkdnaseqcpp
7.5 match 4.70 score 25 scriptsjeffreyevans
spatialEco:Spatial Analysis and Modelling Utilities
Utilities to support spatial data manipulation, query, sampling and modelling in ecological applications. Functions include models for species population density, spatial smoothing, multivariate separability, point process model for creating pseudo-absences and sub-sampling, Quadrant-based sampling and analysis, auto-logistic modeling, sampling models, cluster optimization, statistical exploratory tools and raster-based metrics.
Maintained by Jeffrey S. Evans. Last updated 14 days ago.
biodiversityconservationecologyr-spatialrasterspatialvector
3.7 match 110 stars 9.55 score 736 scripts 2 dependentskharchenkolab
N2R:Fast and Scalable Approximate k-Nearest Neighbor Search Methods using 'N2' Library
Implements methods to perform fast approximate K-nearest neighbor search on input matrix. Algorithm based on the 'N2' implementation of an approximate nearest neighbor search using hierarchical Navigable Small World (NSW) graphs. The original algorithm is described in "Efficient and Robust Approximate Nearest Neighbor Search Using Hierarchical Navigable Small World Graphs", Y. Malkov and D. Yashunin, <doi:10.1109/TPAMI.2018.2889473>, <arXiv:1603.09320>.
Maintained by Evan Biederstedt. Last updated 1 years ago.
6.9 match 10 stars 5.08 score 3 scripts 2 dependentsbioc
IRanges:Foundation of integer range manipulation in Bioconductor
Provides efficient low-level and highly reusable S4 classes for storing, manipulating and aggregating over annotated ranges of integers. Implements an algebra of range operations, including efficient algorithms for finding overlaps and nearest neighbors. Defines efficient list-like classes for storing, transforming and aggregating large grouped data, i.e., collections of atomic vectors and DataFrames.
Maintained by Hervé Pagès. Last updated 1 months ago.
infrastructuredatarepresentationbioconductor-packagecore-package
2.3 match 22 stars 15.09 score 2.1k scripts 1.8k dependentsbiooss
sensitivity:Global Sensitivity Analysis of Model Outputs and Importance Measures
A collection of functions for sensitivity analysis of model outputs (factor screening, global sensitivity analysis and robustness analysis), for variable importance measures of data, as well as for interpretability of machine learning models. Most of the functions have to be applied on scalar output, but several functions support multi-dimensional outputs.
Maintained by Bertrand Iooss. Last updated 7 months ago.
5.0 match 17 stars 6.74 score 472 scripts 8 dependentstidymodels
recipes:Preprocessing and Feature Engineering Steps for Modeling
A recipe prepares your data for modeling. We provide an extensible framework for pipeable sequences of feature engineering steps that provide preprocessing tools to be applied to data. Statistical parameters for the steps can be estimated from an initial data set and then applied to other data sets. The resulting processed output can then be used as inputs for statistical or machine learning models.
Maintained by Max Kuhn. Last updated 6 days ago.
1.8 match 584 stars 18.71 score 7.2k scripts 380 dependentsreddertar
smotefamily:A Collection of Oversampling Techniques for Class Imbalance Problem Based on SMOTE
A collection of various oversampling techniques developed from SMOTE is provided. SMOTE is an oversampling technique which synthesizes a new minority instance between a pair of one minority instance and one of its K nearest neighbors. Other techniques adopt this concept with other criteria in order to generate a balanced dataset for the class imbalance problem.
Maintained by Wacharasak Siriseriwan. Last updated 1 years ago.
5.7 match 2 stars 5.93 score 512 scripts 8 dependentsplangfelder
WGCNA:Weighted Correlation Network Analysis
Functions necessary to perform Weighted Correlation Network Analysis on high-dimensional data as originally described in Horvath and Zhang (2005) <doi:10.2202/1544-6115.1128> and Langfelder and Horvath (2008) <doi:10.1186/1471-2105-9-559>. Includes functions for rudimentary data cleaning, construction of correlation networks, module identification, summarization, and relating of variables and modules to sample traits. Also includes a number of utility functions for data manipulation and visualization.
Maintained by Peter Langfelder. Last updated 6 months ago.
3.5 match 54 stars 9.65 score 5.3k scripts 32 dependentsbioc
Biobase:Biobase: Base functions for Bioconductor
Functions that are needed by many other packages or which replace R functions.
Maintained by Bioconductor Package Maintainer. Last updated 5 months ago.
infrastructurebioconductor-packagecore-package
2.0 match 9 stars 16.45 score 6.6k scripts 1.8k dependentstidymodels
parsnip:A Common API to Modeling and Analysis Functions
A common interface is provided to allow users to specify a model without having to remember the different argument names across different functions or computational engines (e.g. 'R', 'Spark', 'Stan', 'H2O', etc).
Maintained by Max Kuhn. Last updated 5 days ago.
2.0 match 612 stars 16.37 score 3.4k scripts 69 dependentsemmanuelparadis
ape:Analyses of Phylogenetics and Evolution
Functions for reading, writing, plotting, and manipulating phylogenetic trees, analyses of comparative data in a phylogenetic framework, ancestral character analyses, analyses of diversification and macroevolution, computing distances from DNA sequences, reading and writing nucleotide sequences as well as importing from BioConductor, and several tools such as Mantel's test, generalized skyline plots, graphical exploration of phylogenetic data (alex, trex, kronoviz), estimation of absolute evolutionary rates and clock-like trees using mean path lengths and penalized likelihood, dating trees with non-contemporaneous sequences, translating DNA into AA sequences, and assessing sequence alignments. Phylogeny estimation can be done with the NJ, BIONJ, ME, MVR, SDM, and triangle methods, and several methods handling incomplete distance matrices (NJ*, BIONJ*, MVR*, and the corresponding triangle method). Some functions call external applications (PhyML, Clustal, T-Coffee, Muscle) whose results are returned into R.
Maintained by Emmanuel Paradis. Last updated 23 hours ago.
1.9 match 64 stars 17.22 score 13k scripts 599 dependentsfreezenik
BayesX:R Utilities Accompanying the Software Package BayesX
Functions for exploring and visualising estimation results obtained with BayesX, a free software for estimating structured additive regression models (<https://www.uni-goettingen.de/de/bayesx/550513.html>). In addition, functions for reading, writing and manipulating map objects that are required in spatial analyses performed with BayesX.
Maintained by Nikolaus Umlauf. Last updated 1 years ago.
8.6 match 3.71 score 48 scripts 3 dependentsr-forge
RobAStBase:Robust Asymptotic Statistics
Base S4-classes and functions for robust asymptotic statistics.
Maintained by Matthias Kohl. Last updated 2 months ago.
6.3 match 4.96 score 64 scripts 4 dependentsbioc
GenomicRanges:Representation and manipulation of genomic intervals
The ability to efficiently represent and manipulate genomic annotations and alignments is playing a central role when it comes to analyzing high-throughput sequencing data (a.k.a. NGS data). The GenomicRanges package defines general purpose containers for storing and manipulating genomic intervals and variables defined along a genome. More specialized containers for representing and manipulating short alignments against a reference genome, or a matrix-like summarization of an experiment, are defined in the GenomicAlignments and SummarizedExperiment packages, respectively. Both packages build on top of the GenomicRanges infrastructure.
Maintained by Hervé Pagès. Last updated 4 months ago.
geneticsinfrastructuredatarepresentationsequencingannotationgenomeannotationcoveragebioconductor-packagecore-package
1.8 match 44 stars 17.75 score 13k scripts 1.3k dependentsbioc
RAIDS:Accurate Inference of Genetic Ancestry from Cancer Sequences
This package implements specialized algorithms that enable genetic ancestry inference from various cancer sequences sources (RNA, Exome and Whole-Genome sequences). This package also implements a simulation algorithm that generates synthetic cancer-derived data. This code and analysis pipeline was designed and developed for the following publication: Belleau, P et al. Genetic Ancestry Inference from Cancer-Derived Molecular Data across Genomic and Transcriptomic Platforms. Cancer Res 1 January 2023; 83 (1): 49–58.
Maintained by Pascal Belleau. Last updated 5 months ago.
geneticssoftwaresequencingwholegenomeprincipalcomponentgeneticvariabilitydimensionreductionbiocviewsancestrycancer-genomicsexome-sequencinggenomicsinferencer-languagerna-seqrna-sequencingwhole-genome-sequencing
5.0 match 5 stars 6.23 score 19 scriptsfranciscomartinezdelrio
tsfknn:Time Series Forecasting Using Nearest Neighbors
Allows forecasting time series using nearest neighbors regression Francisco Martinez, Maria P. Frias, Maria D. Perez-Godoy and Antonio J. Rivera (2019) <doi:10.1007/s10462-017-9593-z>. When the forecasting horizon is higher than 1, two multi-step ahead forecasting strategies can be used. The model built is autoregressive, that is, it is only based on the observations of the time series. The nearest neighbors used in a prediction can be consulted and plotted.
Maintained by Francisco Martinez. Last updated 1 years ago.
5.6 match 11 stars 5.54 score 63 scriptsfreezenik
R2BayesX:Estimate Structured Additive Regression Models with 'BayesX'
An R interface to estimate structured additive regression (STAR) models with 'BayesX'.
Maintained by Nikolaus Umlauf. Last updated 1 years ago.
8.6 match 1 stars 3.55 score 118 scripts 1 dependentsl-ramirez-lopez
resemble:Memory-Based Learning in Spectral Chemometrics
Functions for dissimilarity analysis and memory-based learning (MBL, a.k.a local modeling) in complex spectral data sets. Most of these functions are based on the methods presented in Ramirez-Lopez et al. (2013) <doi:10.1016/j.geoderma.2012.12.014>.
Maintained by Leonardo Ramirez-Lopez. Last updated 2 years ago.
chemoinformaticschemometricsinfrared-spectroscopylazy-learninglocal-regressionmachine-learningmemory-based-learningnirpedometricssoil-spectroscopyspectral-dataspectral-libraryspectroscopyopenblascppopenmp
5.2 match 20 stars 5.91 score 27 scriptsanimint
animint2:Animated Interactive Grammar of Graphics
Functions are provided for defining animated, interactive data visualizations in R code, and rendering on a web page. The 2018 Journal of Computational and Graphical Statistics paper, <doi:10.1080/10618600.2018.1513367> describes the concepts implemented.
Maintained by Toby Hocking. Last updated 28 days ago.
3.4 match 64 stars 8.87 score 173 scriptsr-spatial
spdep:Spatial Dependence: Weighting Schemes, Statistics
A collection of functions to create spatial weights matrix objects from polygon 'contiguities', from point patterns by distance and tessellations, for summarizing these objects, and for permitting their use in spatial data analysis, including regional aggregation by minimum spanning tree; a collection of tests for spatial 'autocorrelation', including global 'Morans I' and 'Gearys C' proposed by 'Cliff' and 'Ord' (1973, ISBN: 0850860369) and (1981, ISBN: 0850860814), 'Hubert/Mantel' general cross product statistic, Empirical Bayes estimates and 'Assunção/Reis' (1999) <doi:10.1002/(SICI)1097-0258(19990830)18:16%3C2147::AID-SIM179%3E3.0.CO;2-I> Index, 'Getis/Ord' G ('Getis' and 'Ord' 1992) <doi:10.1111/j.1538-4632.1992.tb00261.x> and multicoloured join count statistics, 'APLE' ('Li 'et al.' ) <doi:10.1111/j.1538-4632.2007.00708.x>, local 'Moran's I', 'Gearys C' ('Anselin' 1995) <doi:10.1111/j.1538-4632.1995.tb00338.x> and 'Getis/Ord' G ('Ord' and 'Getis' 1995) <doi:10.1111/j.1538-4632.1995.tb00912.x>, 'saddlepoint' approximations ('Tiefelsdorf' 2002) <doi:10.1111/j.1538-4632.2002.tb01084.x> and exact tests for global and local 'Moran's I' ('Bivand et al.' 2009) <doi:10.1016/j.csda.2008.07.021> and 'LOSH' local indicators of spatial heteroscedasticity ('Ord' and 'Getis') <doi:10.1007/s00168-011-0492-y>. The implementation of most of these measures is described in 'Bivand' and 'Wong' (2018) <doi:10.1007/s11749-018-0599-x>, with further extensions in 'Bivand' (2022) <doi:10.1111/gean.12319>. 'Lagrange' multiplier tests for spatial dependence in linear models are provided ('Anselin et al'. 1996) <doi:10.1016/0166-0462(95)02111-6>, as are 'Rao' score tests for hypothesised spatial 'Durbin' models based on linear models ('Koley' and 'Bera' 2023) <doi:10.1080/17421772.2023.2256810>. A local indicators for categorical data (LICD) implementation based on 'Carrer et al.' (2021) <doi:10.1016/j.jas.2020.105306> and 'Bivand et al.' (2017) <doi:10.1016/j.spasta.2017.03.003> was added in 1.3-7. From 'spdep' and 'spatialreg' versions >= 1.2-1, the model fitting functions previously present in this package are defunct in 'spdep' and may be found in 'spatialreg'.
Maintained by Roger Bivand. Last updated 19 days ago.
spatial-autocorrelationspatial-dependencespatial-weights
1.8 match 131 stars 16.62 score 6.0k scripts 107 dependentsmadr0008
mldr.resampling:Resampling Algorithms for Multi-Label Datasets
Collection of the state-of-the-art multi-label resampling algorithms. The objective of these algorithms is to achieve balance in multi-label datasets.
Maintained by Miguel Ángel Dávila. Last updated 1 years ago.
11.0 match 1 stars 2.70 score 7 scriptsbioc
SummarizedExperiment:A container (S4 class) for matrix-like assays
The SummarizedExperiment container contains one or more assays, each represented by a matrix-like object of numeric or other mode. The rows typically represent genomic ranges of interest and the columns represent samples.
Maintained by Hervé Pagès. Last updated 5 months ago.
geneticsinfrastructuresequencingannotationcoveragegenomeannotationbioconductor-packagecore-package
1.8 match 34 stars 16.85 score 8.6k scripts 1.2k dependentsmobiodiv
mobr:Measurement of Biodiversity
Functions for calculating metrics for the measurement of biodiversity and its changes across scales, treatments, and gradients. The methods implemented in this package are described in: Chase, J.M., et al. (2018) <doi:10.1111/ele.13151>, McGlinn, D.J., et al. (2019) <doi:10.1111/2041-210X.13102>, McGlinn, D.J., et al. (2020) <doi:10.1101/851717>, and McGlinn, D.J., et al. (2023) <doi:10.1101/2023.09.19.558467>.
Maintained by Daniel McGlinn. Last updated 5 months ago.
biodiversityconservationecologyrarefactionspeciesstatistics
3.4 match 23 stars 8.59 score 93 scriptstkonopka
umap:Uniform Manifold Approximation and Projection
Uniform manifold approximation and projection is a technique for dimension reduction. The algorithm was described by McInnes and Healy (2018) in <arXiv:1802.03426>. This package provides an interface for two implementations. One is written from scratch, including components for nearest-neighbor search and for embedding. The second implementation is a wrapper for 'python' package 'umap-learn' (requires separate installation, see vignette for more details).
Maintained by Tomasz Konopka. Last updated 11 months ago.
dimensionality-reductionumapcpp
2.2 match 132 stars 12.74 score 3.6k scripts 43 dependentsthomasp85
densityClust:Clustering by Fast Search and Find of Density Peaks
An improved implementation (based on k-nearest neighbors) of the density peak clustering algorithm, originally described by Alex Rodriguez and Alessandro Laio (Science, 2014 vol. 344). It can handle large datasets (> 100,000 samples) very efficiently. It was initially implemented by Thomas Lin Pedersen, with inputs from Sean Hughes and later improved by Xiaojie Qiu to handle large datasets with kNNs.
Maintained by Thomas Lin Pedersen. Last updated 1 years ago.
3.9 match 153 stars 7.14 score 75 scriptsbioc
GenomicFeatures:Query the gene models of a given organism/assembly
Extract the genomic locations of genes, transcripts, exons, introns, and CDS, for the gene models stored in a TxDb object. A TxDb object is a small database that contains the gene models of a given organism/assembly. Bioconductor provides a small collection of TxDb objects in the form of ready-to-install TxDb packages for the most commonly studied organisms. Additionally, the user can easily make a TxDb object (or package) for the organism/assembly of their choice by using the tools from the txdbmaker package.
Maintained by H. Pagès. Last updated 4 months ago.
geneticsinfrastructureannotationsequencinggenomeannotationbioconductor-packagecore-package
1.8 match 26 stars 15.34 score 5.3k scripts 339 dependentsbesanson
FastKNN:Fast k Nearest Neighbor
These are various functions related to the k Nearest Neighbor classifier. The distance matrix is supplied as an input, which makes the computation faster and allows distances other than Euclidean.
Maintained by Gaston Besanson. Last updated 10 years ago.
6.7 match 3.97 score 62 scripts 1 dependentsjdonaldson
tsne:T-Distributed Stochastic Neighbor Embedding for R (t-SNE)
A "pure R" implementation of the t-SNE algorithm.
Maintained by Justin Donaldson. Last updated 6 years ago.
2.8 match 58 stars 9.35 score 656 scripts 13 dependentsbbbruce
nncc:Nearest Neighbors Matching of Case-Control Data
Provides nearest-neighbors matching and analysis of case-control data. Cui, Z., Marder, E. P., Click, E. S., Hoekstra, R. M., & Bruce, B. B. (2022) <doi:10.1097/EDE.0000000000001504>.
Maintained by Beau Bruce. Last updated 1 years ago.
9.7 match 2.70 score 3 scriptsjuanv66x
viralx:Explainers for Regression Models in HIV Research
A dedicated viral-explainer model tool designed to empower researchers in the field of HIV research, particularly in viral load and CD4 (Cluster of Differentiation 4) lymphocytes regression modeling. Drawing inspiration from the 'tidymodels' framework for rigorous model building of Max Kuhn and Hadley Wickham (2020) <https://www.tidymodels.org>, and the 'DALEXtra' tool for explainability by Przemyslaw Biecek (2020) <doi:10.48550/arXiv.2009.13248>. It aims to facilitate interpretable and reproducible research in biostatistics and computational biology for the benefit of understanding HIV dynamics.
Maintained by Juan Pablo Acuña González. Last updated 4 months ago.
8.6 match 3.00 score 1 scriptsjoeguinness
GpGp:Fast Gaussian Process Computation Using Vecchia's Approximation
Functions for fitting and doing predictions with Gaussian process models using Vecchia's (1988) approximation. Package also includes functions for reordering input locations, finding ordered nearest neighbors (with help from 'FNN' package), grouping operations, and conditional simulations. Covariance functions for spatial and spatial-temporal data on Euclidean domains and spheres are provided. The original approximation is due to Vecchia (1988) <http://www.jstor.org/stable/2345768>, and the reordering and grouping methods are from Guinness (2018) <doi:10.1080/00401706.2018.1437476>. Model fitting employs a Fisher scoring algorithm described in Guinness (2019) <doi:10.48550/arXiv.1905.08374>.
Maintained by Joseph Guinness. Last updated 5 months ago.
4.1 match 10 stars 6.16 score 160 scripts 6 dependentsbioc
wpm:Well Plate Maker
The Well-Plate Maker (WPM) is a shiny application deployed as an R package. Functions for a command-line/script use are also available. The WPM allows users to generate well plate maps to carry out their experiments while improving the handling of batch effects. In particular, it helps control the "plate effect" thanks to its ability to randomize samples over multiple well plates. The algorithm for placing the samples is inspired by the backtracking algorithm: the samples are placed at random while respecting specific spatial constraints.
Maintained by Helene Borges. Last updated 5 months ago.
guiproteomicsmassspectrometrybatcheffectexperimentaldesign
5.3 match 6 stars 4.78 score 7 scriptsdfsp-spirit
fsbrain:Managing and Visualizing Brain Surface Data
Provides high-level access to neuroimaging data from standard software packages like 'FreeSurfer' <http://freesurfer.net/> on the level of subjects and groups. Load morphometry data, surfaces and brain parcellations based on atlases. Mask data using labels, load data for specific atlas regions only, and visualize data and statistical results directly in 'R'.
Maintained by Tim Schäfer. Last updated 4 months ago.
3dbraindtifreesurfermeshmrineuroimagingresearchsurfacevisualizationvoxel
3.9 match 66 stars 6.47 score 15 scriptsegenn
rtemis:Machine Learning and Visualization
Advanced Machine Learning and Visualization. Unsupervised Learning (Clustering, Decomposition), Supervised Learning (Classification, Regression), Cross-Decomposition, Bagging, Boosting, Meta-models. Static and interactive graphics.
Maintained by E.D. Gennatas. Last updated 1 months ago.
data-sciencedata-visualizationmachine-learningmachine-learning-libraryvisualization
3.5 match 145 stars 7.09 score 50 scripts 2 dependentsspatlyu
tidyrgeoda:A tidy interface for rgeoda
An interface for 'rgeoda' to integrate with 'sf' objects and the 'tidyverse'.
Maintained by Wenbo Lv. Last updated 7 months ago.
geocomputationgeoinformaticsgisciencespatial-analysisspatial-statistics
4.9 match 16 stars 5.11 score 5 scriptspromerpr
scanstatistics:Space-Time Anomaly Detection using Scan Statistics
Detection of anomalous space-time clusters using the scan statistics methodology. Focuses on prospective surveillance of data streams, scanning for clusters with ongoing anomalies. Hypothesis testing is made possible by Monte Carlo simulation. Allévius (2018) <doi:10.21105/joss.00515>.
Maintained by Paul Romer Present. Last updated 2 years ago.
5.1 match 1 stars 4.81 score 43 scriptsbioc
scran:Methods for Single-Cell RNA-Seq Data Analysis
Implements miscellaneous functions for interpretation of single-cell RNA-seq data. Methods are provided for assignment of cell cycle phase, detection of highly variable and significantly correlated genes, identification of marker genes, and other common tasks in routine single-cell analysis workflows.
Maintained by Aaron Lun. Last updated 5 months ago.
immunooncologynormalizationsequencingrnaseqsoftwaregeneexpressiontranscriptomicssinglecellclusteringbioconductor-packagehuman-cell-atlassingle-cell-rna-seqopenblascpp
1.9 match 41 stars 13.14 score 7.6k scripts 36 dependentschrhennig
prabclus:Functions for Clustering and Testing of Presence-Absence, Abundance and Multilocus Genetic Data
Distance-based parametric bootstrap tests for clustering with spatial neighborhood information. Some distance measures, Clustering of presence-absence, abundance and multilocus genetic data for species delimitation, nearest neighbor based noise detection. Genetic distances between communities. Tests whether various distance-based regressions are equal. Try package?prabclus for an overview.
Maintained by Christian Hennig. Last updated 6 months ago.
4.1 match 1 stars 5.99 score 90 scripts 71 dependentsbioc
MetaNeighbor:Single cell replicability analysis
MetaNeighbor allows users to quantify cell type replicability across datasets using neighbor voting.
Maintained by Stephan Fischer. Last updated 5 months ago.
immunooncologygeneexpressiongomultiplecomparisonsinglecelltranscriptomics
4.1 match 5.89 score 78 scriptsdbolotov
neighbr:Classification, Regression, Clustering with K Nearest Neighbors
Classification, regression, and clustering with k nearest neighbors algorithm. Implements several distance and similarity measures, covering continuous and logical features. Outputs ranked neighbors. Most features of this package are directly based on the PMML specification for KNN.
Maintained by Dmitriy Bolotov. Last updated 5 years ago.
9.8 match 2.48 score 30 scriptsrozetasimonovska
SDPDmod:Spatial Dynamic Panel Data Modeling
Spatial model calculation for static and dynamic panel data models, weights matrix creation and Bayesian model comparison. Bayesian model comparison methods were described by 'LeSage' (2014) <doi:10.1016/j.spasta.2014.02.002>. The 'Lee'-'Yu' transformation approach is described in 'Yu', 'De Jong' and 'Lee' (2008) <doi:10.1016/j.jeconom.2008.08.002>, 'Lee' and 'Yu' (2010) <doi:10.1016/j.jeconom.2009.08.001> and 'Lee' and 'Yu' (2010) <doi:10.1017/S0266466609100099>.
Maintained by Rozeta Simonovska. Last updated 11 months ago.
4.8 match 5 stars 4.98 score 19 scriptstanaylab
misha:Toolkit for Analysis of Genomic Data
A toolkit for analysis of genomic data. The 'misha' package implements an efficient data structure for storing genomic data, and provides a set of functions for data extraction, manipulation and analysis. Some of the 2D genome algorithms were described in Yaffe and Tanay (2011) <doi:10.1038/ng.947>.
Maintained by Aviezer Lifshitz. Last updated 6 days ago.
4.0 match 4 stars 5.86 scorebioc
xcms:LC-MS and GC-MS Data Analysis
Framework for processing and visualization of chromatographically separated and single-spectra mass spectral data. Imports from AIA/ANDI NetCDF, mzXML, mzData and mzML files. Preprocesses data for high-throughput, untargeted analyte profiling.
Maintained by Steffen Neumann. Last updated 3 days ago.
immunooncologymassspectrometrymetabolomicsbioconductorfeature-detectionmass-spectrometrypeak-detectioncpp
1.7 match 196 stars 14.31 score 984 scripts 11 dependentsmnwright
bnnSurvival:Bagged k-Nearest Neighbors Survival Prediction
Implements a bootstrap aggregated (bagged) version of the k-nearest neighbors survival probability prediction method (Lowsky et al. 2013). In addition to the bootstrapping of training samples, the features can be subsampled in each base learner to break the correlation between them. The Rcpp package is used to speed up the computation.
Maintained by Marvin N. Wright. Last updated 8 years ago.
8.7 match 1 stars 2.70 score 5 scriptssfcheung
modelbpp:Model BIC Posterior Probability
Fits the neighboring models of a fitted structural equation model and assesses the model uncertainty of the fitted model based on BIC posterior probabilities, using the method presented in Wu, Cheung, and Leung (2020) <doi:10.1080/00273171.2019.1574546>.
Maintained by Shu Fai Cheung. Last updated 6 months ago.
lavaanmodel-comparisonmodel-comparison-and-selectionmodel-selectionstructural-equation-modeling
5.2 match 4.54 score 2 scriptspaobranco
UBL:An Implementation of Re-Sampling Approaches to Utility-Based Learning for Both Classification and Regression Tasks
Provides a set of functions that can be used to obtain better predictive performance on cost-sensitive and cost/benefits tasks (for both regression and classification). This includes re-sampling approaches that modify the original data set biasing it towards the user preferences.
Maintained by Paula Branco. Last updated 3 months ago.
3.5 match 33 stars 6.39 score 165 scripts 1 dependentsfguenther
LSAfun:Applied Latent Semantic Analysis (LSA) Functions
Provides functions that allow for convenient working with vector space models of semantics/distributional semantic models/word embeddings. Originally built for LSA models (hence the name), but can be used for all such vector-based models. For actually building a vector semantic space, use the package 'lsa' or other specialized software. Downloadable semantic spaces can be found at <https://sites.google.com/site/fritzgntr/software-resources>.
Maintained by Fritz Guenther. Last updated 1 years ago.
6.9 match 1 stars 3.18 score 85 scripts 1 dependentskharchenkolab
sccore:Core Utilities for Single-Cell RNA-Seq
Core utilities for single-cell RNA-seq data analysis. Contained within are utility functions for working with differential expression (DE) matrices and count matrices, a collection of functions for manipulating and plotting data via 'ggplot2', and functions to work with cell graphs and cell embeddings. Graph-based methods include embedding kNN cell graphs into a UMAP <doi:10.21105/joss.00861>, collapsing vertices of each cluster in the graph, and propagating graph labels.
Maintained by Evan Biederstedt. Last updated 1 years ago.
3.4 match 12 stars 6.44 score 36 scripts 9 dependentsstuart-lab
Signac:Analysis of Single-Cell Chromatin Data
A framework for the analysis and exploration of single-cell chromatin data. The 'Signac' package contains functions for quantifying single-cell chromatin data, computing per-cell quality control metrics, dimension reduction and normalization, visualization, and DNA sequence motif analysis. Reference: Stuart et al. (2021) <doi:10.1038/s41592-021-01282-5>.
Maintained by Tim Stuart. Last updated 7 months ago.
atacbioinformaticssingle-cellzlibcpp
1.8 match 349 stars 12.19 score 3.7k scripts 1 dependentsjlmelville
uwot:The Uniform Manifold Approximation and Projection (UMAP) Method for Dimensionality Reduction
An implementation of the Uniform Manifold Approximation and Projection dimensionality reduction by McInnes et al. (2018) <doi:10.48550/arXiv.1802.03426>. It also provides means to transform new data and to carry out supervised dimensionality reduction. An implementation of the related LargeVis method of Tang et al. (2016) <doi:10.48550/arXiv.1602.00370> is also provided. This is a complete re-implementation in R (and C++, via the 'Rcpp' package): no Python installation is required. See the uwot website (<https://github.com/jlmelville/uwot>) for more documentation and examples.
Maintained by James Melville. Last updated 20 days ago.
dimensionality-reductionumapcpp
1.3 match 328 stars 15.74 score 2.0k scripts 140 dependentsfelixthestudent
cellpypes:Cell Type Pipes for Single-Cell RNA Sequencing Data
Annotate single-cell RNA sequencing data manually based on marker gene thresholds. Find cell type rules (gene+threshold) through exploration, use the popular piping operator '%>%' to reconstruct complex cell type hierarchies. 'cellpypes' models technical noise to find positive and negative cells for a given expression threshold and returns cell type labels or pseudobulks. Cite this package as Frauhammer (2022) <doi:10.5281/zenodo.6555728> and visit <https://github.com/FelixTheStudent/cellpypes> for tutorials and newest features.
Maintained by Felix Frauhammer. Last updated 1 years ago.
celltype-annotationclassification-algorithmscrnaseqsingle-cell-rna-seq
4.7 match 51 stars 4.41 score 8 scriptsbioc
Cardinal:A mass spectrometry imaging toolbox for statistical analysis
Implements statistical & computational tools for analyzing mass spectrometry imaging datasets, including methods for efficient pre-processing, spatial segmentation, and classification.
Maintained by Kylie Ariel Bemis. Last updated 3 months ago.
softwareinfrastructureproteomicslipidomicsmassspectrometryimagingmassspectrometryimmunooncologynormalizationclusteringclassificationregression
2.0 match 47 stars 10.34 score 200 scriptsbioc
Banksy:Spatial transcriptomic clustering
Banksy is an R package that incorporates spatial information to cluster cells in a feature space (e.g. gene expression). To incorporate spatial information, BANKSY computes the mean neighborhood expression and azimuthal Gabor filters that capture gene expression gradients. These features are combined with the cell's own expression to embed cells in a neighbor-augmented product space which can then be clustered, allowing for accurate and spatially-aware cell typing and tissue domain segmentation.
Maintained by Joseph Lee. Last updated 13 days ago.
clusteringspatialsinglecellgeneexpressiondimensionreductionclustering-algorithmsingle-cell-omicsspatial-omics
2.3 match 90 stars 9.03 score 248 scriptsdfriend21
quadtree:Region Quadtrees for Spatial Data
Provides functionality for working with raster-like quadtrees (also called “region quadtrees”), which allow for variable-sized cells. The package allows for flexibility in the quadtree creation process. Several functions defining how to split and aggregate cells are provided, and custom functions can be written for both of these processes. In addition, quadtrees can be created using other quadtrees as “templates”, so that the new quadtree's structure is identical to the template quadtree. The package also includes functionality for modifying quadtrees, querying values, saving quadtrees to a file, and calculating least-cost paths using the quadtree as a resistance surface.
Maintained by Derek Friend. Last updated 2 years ago.
3.2 match 19 stars 6.34 score 58 scriptsthothorn
ipred:Improved Predictors
Improved predictive models by indirect classification and bagging for classification, regression and survival problems as well as resampling based estimators of prediction error.
Maintained by Torsten Hothorn. Last updated 8 months ago.
1.9 match 10.76 score 3.3k scripts 411 dependentscogbrainhealthlab
VertexWiseR:Simplified Vertex-Wise Analyses of Whole-Brain and Hippocampal Surface
Provides functions to run statistical analyses on surface-based neuroimaging data, computing measures including cortical thickness and surface area of the whole-brain and of the hippocampi. It can make use of 'FreeSurfer', 'fMRIprep' and 'HCP' preprocessed datasets and 'HippUnfold' hippocampal segmentation outputs for a given sample by restructuring the data values into a single file. The single file can then be used by the package for analyses independently from its base dataset and without need for its access.
Maintained by Charly Billaud. Last updated 7 days ago.
3.4 match 1 stars 5.84 score 12 scriptscran
e1071:Misc Functions of the Department of Statistics, Probability Theory Group (Formerly: E1071), TU Wien
Functions for latent class analysis, short time Fourier transform, fuzzy clustering, support vector machines, shortest path computation, bagged clustering, naive Bayes classifier, generalized k-nearest neighbour ...
Maintained by David Meyer. Last updated 6 months ago.
1.8 match 29 stars 11.26 score 2.0k dependentscmlmagneville
mFD:Compute and Illustrate the Multiple Facets of Functional Diversity
Computes functional trait-based distances between pairs of species gathered in assemblages, allowing several functional spaces to be built. The package computes functional diversity indices that assess the distribution of species (and of their dominance) in a given functional space for each assemblage, and the overlap between assemblages in a given functional space, see: Chao et al. (2018) <doi:10.1002/ecm.1343>, Maire et al. (2015) <doi:10.1111/geb.12299>, Mouillot et al. (2013) <doi:10.1016/j.tree.2012.10.004>, Mouillot et al. (2014) <doi:10.1073/pnas.1317625111>, Ricotta and Szeidl (2009) <doi:10.1016/j.tpb.2009.10.001>. Graphical outputs are included. Visit the 'mFD' website for more information, documentation and examples.
Maintained by Camille Magneville. Last updated 3 months ago.
2.7 match 26 stars 7.35 score 61 scriptsbioc
DepecheR:Determination of essential phenotypic elements of clusters in high-dimensional entities
The purpose of this package is to identify traits in a dataset that can separate groups. This is done on two levels. First, clustering is performed, using an implementation of sparse K-means. Secondly, the generated clusters are used to predict outcomes of groups of individuals based on their distribution of observations in the different clusters. As certain clusters with separating information will be identified, and these clusters are defined by a sparse number of variables, this method can reduce the complexity of data, to only emphasize the data that actually matters.
Maintained by Jakob Theorell. Last updated 5 months ago.
softwarecellbasedassaystranscriptiondifferentialexpressiondatarepresentationimmunooncologytranscriptomicsclassificationclusteringdimensionreductionfeatureextractionflowcytometryrnaseqsinglecellvisualizationcpp
3.8 match 5.18 score 15 scriptsmikeasilva
simplegraphdb:A Simple Graph Database
This is a graph database in 'SQLite'. It is inspired by Denis Papathanasiou's Python simple-graph project on 'GitHub'.
Maintained by Michael Silva. Last updated 4 years ago.
5.2 match 7 stars 3.75 score 16 scriptsrnabioco
valr:Genome Interval Arithmetic
Read and manipulate genome intervals and signals. Provides functionality similar to command-line tool suites within R, enabling interactive analysis and visualization of genome-scale data. Riemondy et al. (2017) <doi:10.12688/f1000research.11997.1>.
Maintained by Kent Riemondy. Last updated 8 days ago.
bedtoolsgenomeinterval-arithmeticcpp
2.0 match 90 stars 9.69 score 227 scriptsyuelyu21
SCIntRuler:Guiding the Integration of Multiple Single-Cell RNA-Seq Datasets
The accumulation of single-cell RNA-seq (scRNA-seq) studies highlights the potential benefits of integrating multiple datasets. By augmenting sample sizes and enhancing analytical robustness, integration can lead to more insightful biological conclusions. However, challenges arise due to the inherent diversity and batch discrepancies within and across studies. SCIntRuler, a novel R package, addresses these challenges by guiding the integration of multiple scRNA-seq datasets.
Maintained by Yue Lyu. Last updated 5 months ago.
sequencinggeneticvariabilitysinglecellcpp
4.0 match 2 stars 4.85 score 3 scriptstnagler
kdecopula:Kernel Smoothing for Bivariate Copula Densities
Provides fast implementations of kernel smoothing techniques for bivariate copula densities, in particular density estimation and resampling.
Maintained by Thomas Nagler. Last updated 7 years ago.
3.4 match 8 stars 5.63 score 31 scripts 1 dependentsbioc
annotate:Annotation for microarrays
Using R environments for annotation.
Maintained by Bioconductor Package Maintainer. Last updated 5 months ago.
1.6 match 11.41 score 812 scripts 243 dependentsemilhvitfeldt
fastTextR:An Interface to the 'fastText' Library
An interface to the 'fastText' library <https://github.com/facebookresearch/fastText>. The package can be used for text classification and to learn word vectors. An example of how to use 'fastTextR' can be found in the 'README' file.
Maintained by Emil Hvitfeldt. Last updated 1 years ago.
3.3 match 4 stars 5.50 score 44 scripts 2 dependentsjcatwood
nntmvn:Draw Samples of Truncated Multivariate Normal Distributions
Draw samples from truncated multivariate normal distribution using the sequential nearest neighbor (SNN) method introduced in "Scalable Sampling of Truncated Multivariate Normals Using Sequential Nearest-Neighbor Approximation" <doi:10.48550/arXiv.2406.17307>.
Maintained by Jian Cao. Last updated 1 months ago.
6.4 match 2.85 score 3 scriptsrezakj
iCellR:Analyzing High-Throughput Single Cell Sequencing Data
A toolkit that allows scientists to work with data from single cell sequencing technologies such as scRNA-seq, scVDJ-seq, scATAC-seq, CITE-Seq and Spatial Transcriptomics (ST). Single (i) Cell R package ('iCellR') provides unprecedented flexibility at every step of the analysis pipeline, including normalization, clustering, dimensionality reduction, imputation, visualization, and so on. Users can design both unsupervised and supervised models to best suit their research. In addition, the toolkit provides 2D and 3D interactive visualizations, differential expression analysis, filters based on cells, genes and clusters, data merging, normalizing for dropouts, data imputation methods, correcting for batch differences, pathway analysis, tools to find marker genes for clusters and conditions, predict cell types and pseudotime analysis. See Khodadadi-Jamayran, et al (2020) <doi:10.1101/2020.05.05.078550> and Khodadadi-Jamayran, et al (2020) <doi:10.1101/2020.03.31.019109> for more details.
Maintained by Alireza Khodadadi-Jamayran. Last updated 8 months ago.
10xgenomics3dbatch-normalizationcell-type-classificationcite-seqclusteringclustering-algorithmdiffusion-mapsdropouticellrimputationintractive-graphnormalizationpseudotimescrna-seqscvdj-seqsingel-cell-sequencingumapcpp
3.3 match 121 stars 5.56 score 7 scripts 1 dependentsluukvdmeer
sfnetworks:Tidy Geospatial Networks
Provides a tidy approach to spatial network analysis, in the form of classes and functions that enable a seamless interaction between the network analysis package 'tidygraph' and the spatial analysis package 'sf'.
Maintained by Lucas van der Meer. Last updated 3 months ago.
geospatial-networksnetwork-analysisrspatialsimple-featuresspatial-analysisspatial-data-sciencespatial-networkstidygraphtidyverse
1.9 match 372 stars 9.63 score 332 scripts 6 dependentsbioc
bluster:Clustering Algorithms for Bioconductor
Wraps common clustering algorithms in an easily extended S4 framework. Backends are implemented for hierarchical, k-means and graph-based clustering. Several utilities are also provided to compare and evaluate clustering results.
Maintained by Aaron Lun. Last updated 5 months ago.
immunooncologysoftwaregeneexpressiontranscriptomicssinglecellclusteringcpp
1.9 match 9.43 score 636 scripts 51 dependentsbioc
celda:CEllular Latent Dirichlet Allocation
Celda is a suite of Bayesian hierarchical models for clustering single-cell RNA-sequencing (scRNA-seq) data. It is able to perform "bi-clustering" and simultaneously cluster genes into gene modules and cells into cell subpopulations. It also contains DecontX, a novel Bayesian method to computationally estimate and remove RNA contamination in individual cells without empty droplet information. A variety of scRNA-seq data visualization functions is also included.
Maintained by Joshua Campbell. Last updated 28 days ago.
singlecellgeneexpressionclusteringsequencingbayesianimmunooncologydataimportcppopenmp
1.6 match 147 stars 10.47 score 256 scripts 2 dependentsmlampros
OpenImageR:An Image Processing Toolkit
Incorporates functions for image preprocessing, filtering and image recognition. The package takes advantage of 'RcppArmadillo' to speed up computationally intensive functions. The histogram of oriented gradients descriptor is a modification of the 'findHOGFeatures' function of the 'SimpleCV' computer vision platform, the average_hash(), dhash() and phash() functions are based on the 'ImageHash' python library. The Gabor Feature Extraction functions are based on 'Matlab' code of the paper, "CloudID: Trustworthy cloud-based and cross-enterprise biometric identification" by M. Haghighat, S. Zonouz, M. Abdel-Mottaleb, Expert Systems with Applications, vol. 42, no. 21, pp. 7905-7916, 2015, <doi:10.1016/j.eswa.2015.06.025>. The 'SLIC' and 'SLICO' superpixel algorithms were explained in detail in (i) "SLIC Superpixels Compared to State-of-the-art Superpixel Methods", Radhakrishna Achanta, Appu Shaji, Kevin Smith, Aurelien Lucchi, Pascal Fua, and Sabine Suesstrunk, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 34, num. 11, p. 2274-2282, May 2012, <doi:10.1109/TPAMI.2012.120> and (ii) "SLIC Superpixels", Radhakrishna Achanta, Appu Shaji, Kevin Smith, Aurelien Lucchi, Pascal Fua, and Sabine Suesstrunk, EPFL Technical Report no. 149300, June 2010.
Maintained by Lampros Mouselimis. Last updated 2 years ago.
filteringgabor-feature-extractiongabor-filtershog-featuresimageimage-hashingprocessingrcpparmadillorecognitionslicslicosuperpixelsopenblascppopenmp
1.7 match 60 stars 9.86 score 358 scripts 8 dependentsnetworkgroupr
fastnet:Large-Scale Social Network Analysis
We present an implementation of the algorithms required to simulate large-scale social networks and retrieve their most relevant metrics.
Maintained by Nazrul Shaikh. Last updated 8 years ago.
5.0 match 5 stars 3.37 score 47 scriptsmichaelhallquist
ggbrain:Create Images of Volumetric Brain Data in NIfTI Format Using 'ggplot2' Syntax
A 'ggplot2'-consistent approach to generating 2D displays of volumetric brain imaging data. Display data from multiple NIfTI images using standard 'ggplot2' conventions such as scales, limits, and themes to control the appearance of displays. The resulting plots are returned as 'patchwork' objects, inheriting from 'ggplot', allowing for any standard modifications of display aesthetics supported by 'ggplot2'.
Maintained by Michael Hallquist. Last updated 25 days ago.
3.3 match 2 stars 5.03 score 18 scriptsr-spatial
rgee:R Bindings for Calling the 'Earth Engine' API
Earth Engine <https://earthengine.google.com/> client library for R. All of the 'Earth Engine' API classes, modules, and functions are made available. Additional functions implemented include importing (exporting) of Earth Engine spatial objects, extraction of time series, interactive map display, assets management interface, and metadata display. See <https://r-spatial.github.io/rgee/> for further details.
Maintained by Cesar Aybar. Last updated 4 days ago.
earth-engineearthenginegoogle-earth-enginegoogleearthenginespatial-analysisspatial-data
1.2 match 715 stars 13.77 score 1.9k scripts 3 dependentsjfrench
smacpod:Statistical Methods for the Analysis of Case-Control Point Data
Statistical methods for analyzing case-control point data. Methods include the ratio of kernel densities, the difference in K Functions, the spatial scan statistic, and q nearest neighbors of cases.
Maintained by Joshua French. Last updated 5 months ago.
4.4 match 3.69 score 49 scriptscomputationalstylistics
stylo:Stylometric Multivariate Analyses
Supervised and unsupervised multivariate methods, supplemented by GUI and some visualizations, to perform various analyses in the field of computational stylistics, authorship attribution, etc. For further reference, see Eder et al. (2016), <https://journal.r-project.org/archive/2016/RJ-2016-007/index.html>. You are also encouraged to visit the Computational Stylistics Group's website <https://computationalstylistics.github.io/>, where a reasonable amount of information about the package and related projects is provided.
Maintained by Maciej Eder. Last updated 2 months ago.
1.9 match 187 stars 8.58 score 462 scriptsmlr-org
mlr3extralearners:Extra Learners For mlr3
Extra learners for use in mlr3.
Maintained by Sebastian Fischer. Last updated 4 months ago.
1.8 match 94 stars 9.16 score 474 scriptssunweisurrey
snn:Stabilized Nearest Neighbor Classifier
Implement K-nearest neighbor classifier, weighted nearest neighbor classifier, bagged nearest neighbor classifier, optimal weighted nearest neighbor classifier and stabilized nearest neighbor classifier, and perform model selection via 5 fold cross-validation for them. This package also provides functions for computing the classification error and classification instability of a classification procedure.
Maintained by Wei Sun. Last updated 10 years ago.
15.3 match 1.04 score 11 scriptsbioc
sesame:SEnsible Step-wise Analysis of DNA MEthylation BeadChips
Tools for analyzing Illumina Infinium DNA methylation arrays. SeSAMe provides utilities to support analyses of multiple generations of Infinium DNA methylation BeadChips, including preprocessing, quality control, visualization and inference. SeSAMe features accurate detection calling, intelligent inference of ethnicity, sex and advanced quality control routines.
Maintained by Wanding Zhou. Last updated 2 months ago.
dnamethylationmethylationarraypreprocessingqualitycontrolbioinformaticsdna-methylationmicroarray
1.8 match 69 stars 9.08 score 258 scripts 1 dependentscran
SKNN:A Super K-Nearest Neighbor (SKNN) Classification Algorithm
A Super K-Nearest Neighbor classification method that uses kernel density to describe the weight of the distance between a training observation and the testing sample.
Maintained by Yarong Yang. Last updated 5 months ago.
8.8 match 1.78 scorebioc
qmtools:Quantitative Metabolomics Data Processing Tools
The qmtools (quantitative metabolomics tools) package provides basic tools for processing quantitative metabolomics data with the standard SummarizedExperiment class. This includes functions for imputation, normalization, feature filtering, feature clustering, dimension-reduction, and visualization to help users prepare data for statistical analysis. This package also offers a convenient way to compute empirical Bayes statistics for which metabolic features are different between two sets of study samples. Several functions in this package could also be used in other types of omics data.
Maintained by Jaehyun Joo. Last updated 5 months ago.
metabolomicspreprocessingnormalizationdimensionreductionmassspectrometry
3.6 match 1 stars 4.30 score 5 scriptsbioc
rifi:'rifi' analyses data from rifampicin time series created by microarray or RNAseq
'rifi' analyses data from rifampicin time series created by microarray or RNAseq. 'rifi' is a transcriptome data analysis tool for the holistic identification of transcription and decay associated processes. The decay constants and the delay of the onset of decay are fitted for each probe/bin. Subsequently, probes/bins of equal properties are combined into segments by dynamic programming, independent of an existing genome annotation. This allows detection of transcript segments of different stability or transcriptional events within one annotated gene. In addition to the classic decay constant/half-life analysis, 'rifi' detects processing sites, transcription pausing sites, internal transcription start sites in operons, sites of partial transcription termination in operons, identifies areas of likely transcriptional interference by the collision mechanism and gives an estimate of the transcription velocity. All data are integrated to give an estimate of continuous transcriptional units, i.e. operons. Comprehensive output tables and visualizations of the full genome result and the individual fits for all probes/bins are produced.
Maintained by Jens Georg. Last updated 5 months ago.
rnaseqdifferentialexpressiongeneregulationtranscriptomicsregressionmicroarraysoftware
3.3 match 4.60 score 1 scriptscpanse
protViz:Visualizing and Analyzing Mass Spectrometry Related Data in Proteomics
Helps with quality checks, visualizations and analysis of mass spectrometry data, coming from proteomics experiments. The package is developed, tested and used at the Functional Genomics Center Zurich <https://fgcz.ch>. We use this package mainly for prototyping, teaching, and having fun with proteomics data. But it can also be used to do data analysis for small scale data sets.
Maintained by Christian Panse. Last updated 1 years ago.
funmass-spectrometrypeptide-identificationproteomicsquantificationvisualizationcpp
1.9 match 11 stars 7.88 score 72 scripts 2 dependentsbioc
Sconify:A toolkit for performing KNN-based statistics for flow and mass cytometry data
This package does k-nearest neighbor based statistics and visualizations with flow and mass cytometry data. This gives tSNE maps "fold change" functionality and provides a data quality metric by assessing manifold overlap between fcs files expected to be the same. Other applications using this package include imputation, marker redundancy, and testing the relative information loss of lower dimension embeddings compared to the original manifold.
Maintained by Tyler J Burns. Last updated 5 months ago.
immunooncologysinglecellflowcytometrysoftwaremultiplecomparisonvisualization
3.1 match 4.74 score 11 scriptschristopherkenny
geomander:Geographic Tools for Studying Gerrymandering
A compilation of tools to complete common tasks for studying gerrymandering. This focuses on the geographic tool side of common problems, such as linking different levels of spatial units or estimating how to break up units. Functions exist for creating redistricting-focused data for the US.
Maintained by Christopher T. Kenny. Last updated 19 days ago.
1.9 match 14 stars 7.81 score 191 scripts 1 dependentsbioc
SPIAT:Spatial Image Analysis of Tissues
SPIAT (Spatial Image Analysis of Tissues) is an R package with a suite of data processing, quality control, visualization and data analysis tools. SPIAT is compatible with data generated from single-cell spatial proteomics platforms (e.g. OPAL, CODEX, MIBI, cellprofiler). SPIAT reads spatial data in the form of X and Y coordinates of cells, marker intensities and cell phenotypes. SPIAT includes six analysis modules that allow visualization, calculation of cell colocalization, categorization of the immune microenvironment relative to tumor areas, analysis of cellular neighborhoods, and the quantification of spatial heterogeneity, providing a comprehensive toolkit for spatial data analysis.
Maintained by Yuzhou Feng. Last updated 1 days ago.
biomedicalinformaticscellbiologyspatialclusteringdataimportimmunooncologyqualitycontrolsinglecellsoftwarevisualization
1.7 match 22 stars 8.59 score 69 scriptsbioc
scDblFinder:scDblFinder
The scDblFinder package gathers various methods for the detection and handling of doublets/multiplets in single-cell sequencing data (i.e. multiple cells captured within the same droplet or reaction volume). It includes methods formerly found in the scran package, the new fast and comprehensive scDblFinder method, and a reimplementation of the Amulet detection method for single-cell ATAC-seq.
Maintained by Pierre-Luc Germain. Last updated 2 months ago.
preprocessingsinglecellrnaseqatacseqdoubletssingle-cell
1.2 match 184 stars 12.34 score 888 scripts 1 dependentsbrian-j-smith
MachineShop:Machine Learning Models and Tools
Meta-package for statistical and machine learning with a unified interface for model fitting, prediction, performance assessment, and presentation of results. Approaches for model fitting and prediction of numerical, categorical, or censored time-to-event outcomes include traditional regression models, regularization methods, tree-based methods, support vector machines, neural networks, ensembles, data preprocessing, filtering, and model tuning and selection. Performance metrics are provided for model assessment and can be estimated with independent test sets, split sampling, cross-validation, or bootstrap resampling. Resample estimation can be executed in parallel for faster processing and nested in cases of model tuning and selection. Modeling results can be summarized with descriptive statistics; calibration curves; variable importance; partial dependence plots; confusion matrices; and ROC, lift, and other performance curves.
Maintained by Brian J Smith. Last updated 7 months ago.
classification-modelsmachine-learningpredictive-modelingregression-modelssurvival-models
1.8 match 62 stars 7.95 score 121 scriptstpetzoldt
simecol:Simulation of Ecological (and Other) Dynamic Systems
An object oriented framework to simulate ecological (and other) dynamic systems. It can be used for differential equations, individual-based (or agent-based) and other models as well. It supports structuring of simulation scenarios (to avoid copy and paste) and aims to improve readability and re-usability of code.
Maintained by Thomas Petzoldt. Last updated 7 months ago.
3.0 match 4.76 score 190 scriptsbioc
imcRtools:Methods for imaging mass cytometry data analysis
This R package supports the handling and analysis of imaging mass cytometry and other highly multiplexed imaging data. The main functionality includes reading in single-cell data after image segmentation and measurement, data formatting to perform channel spillover correction and a number of spatial analysis approaches. First, cell-cell interactions are detected via spatial graph construction; these graphs can be visualized with cells representing nodes and interactions representing edges. Furthermore, per cell, its direct neighbours are summarized to allow spatial clustering. Per image/grouping level, interactions between types of cells are counted, averaged and compared against random permutations. In that way, types of cells that interact more (attraction) or less (avoidance) frequently than expected by chance are detected.
Maintained by Daniel Schulz. Last updated 5 months ago.
immunooncologysinglecellspatialdataimportclusteringimcsingle-cell
1.9 match 24 stars 7.58 score 126 scriptstidymodels
spatialsample:Spatial Resampling Infrastructure
Functions and classes for spatial resampling to use with the 'rsample' package, such as spatial cross-validation (Brenning, 2012) <doi:10.1109/IGARSS.2012.6352393>. The scope of 'rsample' and 'spatialsample' is to provide the basic building blocks for creating and analyzing resamples of a spatial data set, but neither package includes functions for modeling or computing statistics. The resampled spatial data sets created by 'spatialsample' do not contain much overhead in memory.
Maintained by Michael Mahoney. Last updated 6 months ago.
1.7 match 73 stars 8.19 score 118 scripts 2 dependentsbioc
dandelionR:Single-cell Immune Repertoire Trajectory Analysis in R
dandelionR is an R package for performing single-cell immune repertoire trajectory analysis, based on the original Python implementation. It provides the necessary functions to interface with scRepertoire and a custom implementation of an absorbing Markov chain for pseudotime inference, inspired by the Palantir Python package.
Maintained by Kelvin Tuong. Last updated 14 days ago.
softwareimmunooncologysinglecell
2.4 match 8 stars 5.81 score 7 scriptsfsavje
distances:Tools for Distance Metrics
Provides tools for constructing, manipulating and using distance metrics.
Maintained by Fredrik Savje. Last updated 1 years ago.
2.0 match 17 stars 6.92 score 117 scripts 12 dependentsadeverse
adegraphics:An S4 Lattice-Based Package for the Representation of Multivariate Data
Graphical functionalities for the representation of multivariate data. It is a complete re-implementation of the functions available in the 'ade4' package.
Maintained by Aurélie Siberchicot. Last updated 8 months ago.
1.3 match 9 stars 10.37 score 386 scripts 6 dependentszcolburn
Bioi:Biological Image Analysis
Single linkage clustering and connected component analyses are often performed on biological images. 'Bioi' provides a set of functions for performing these tasks. This functionality is implemented in several key functions that can extend from 1 to many dimensions. The single linkage clustering method implemented here can be used on n-dimensional data sets, while connected component analyses are limited to 3 or fewer dimensions.
Maintained by Zachary Colburn. Last updated 5 years ago.
biological-data-analysisbiologycellcppimage-analysismicroscopycpp
3.6 match 3.81 score 13 scriptsmtrupiano1
knnwtsim:K Nearest Neighbor Forecasting with a Tailored Similarity Metric
Functions to implement K Nearest Neighbor forecasting using a weighted similarity metric tailored to the problem of forecasting univariate time series where recent observations, seasonal patterns, and exogenous predictors are all relevant in predicting future observations of the series in question. For more information on the formulation of this similarity metric please see Trupiano (2021) <arXiv:2112.06266>.
Maintained by Matthew Trupiano. Last updated 3 years ago.
forecastingknn-regressionmachine-learningtime-series
5.1 match 1 stars 2.70 score 2 scriptskenaho1
asbio:A Collection of Statistical Tools for Biologists
Contains functions from: Aho, K. (2014) Foundational and Applied Statistics for Biologists using R. CRC/Taylor and Francis, Boca Raton, FL, ISBN: 978-1-4398-7338-0.
Maintained by Ken Aho. Last updated 2 months ago.
1.9 match 5 stars 7.32 score 310 scripts 3 dependentsbioc
systemPipeTools:Tools for data visualization
The systemPipeTools package extends the widely used systemPipeR (SPR) workflow environment with an enhanced toolkit for data visualization, including utilities to automate the data visualization for analysis of differentially expressed genes (DEGs). systemPipeTools provides data transformation and data exploration functions via scatterplots, hierarchical clustering heatmaps, principal component analysis, multidimensional scaling, generalized principal components, t-Distributed Stochastic Neighbor Embedding (t-SNE), and MA and volcano plots. All these utilities can be integrated with the modular design of the systemPipeR environment, which allows users to easily substitute any of these features with custom alternatives.
Maintained by Daniela Cassol. Last updated 5 months ago.
infrastructuredataimportsequencingqualitycontrolreportwritingexperimentaldesignclusteringdifferentialexpressionmultidimensionalscalingprincipalcomponent
3.4 match 4.00 score 4 scriptskisungyou
Riemann:Learning with Data on Riemannian Manifolds
We provide a variety of algorithms for manifold-valued data, including Fréchet summaries, hypothesis testing, clustering, visualization, and other learning tasks. See Bhattacharya and Bhattacharya (2012) <doi:10.1017/CBO9781139094764> for general exposition to statistics on manifolds.
Maintained by Kisung You. Last updated 2 years ago.
3.7 match 10 stars 3.70 score 8 scriptsbnprks
BPCells:Single Cell Counts Matrices to PCA
Efficient operations for single cell ATAC-seq fragments and RNA counts matrices. Interoperable with standard file formats, and introduces efficient bit-packed formats that allow large storage savings and increased read speeds.
Maintained by Benjamin Parks. Last updated 1 months ago.
1.8 match 184 stars 7.48 score 172 scriptssharifrahmanie
MBMethPred:Medulloblastoma Subgroups Prediction
Utilizing a combination of machine learning models (Random Forest, Naive Bayes, K-Nearest Neighbor, Support Vector Machines, Extreme Gradient Boosting, and Linear Discriminant Analysis) and a deep Artificial Neural Network model, 'MBMethPred' can predict medulloblastoma subgroups, including wingless (WNT), sonic hedgehog (SHH), Group 3, and Group 4 from DNA methylation beta values. See Sharif Rahmani E, Lawarde A, Lingasamy P, Moreno SV, Salumets A and Modhukur V (2023), MBMethPred: a computational framework for the accurate classification of childhood medulloblastoma subgroups using data integration and AI-based approaches. Front. Genet. 14:1233657. <doi:10.3389/fgene.2023.1233657> for more details.
Maintained by Edris Sharif Rahmani. Last updated 1 years ago.
3.6 match 3.70 score 1 scriptsbioc
ncGTW:Alignment of LC-MS Profiles by Neighbor-wise Compound-specific Graphical Time Warping with Misalignment Detection
The purpose of ncGTW is to help XCMS for LC-MS data alignment. Currently, ncGTW can detect the misaligned feature groups by XCMS, and the user can choose to realign these feature groups by ncGTW or not.
Maintained by Chiung-Ting Wu. Last updated 5 months ago.
softwaremassspectrometrymetabolomicsalignmentcpp
2.7 match 8 stars 4.90 score 3 scriptsbioc
BioNERO:Biological Network Reconstruction Omnibus
BioNERO aims to integrate all aspects of biological network inference in a single package, including data preprocessing, exploratory analyses, network inference, and analyses for biological interpretations. BioNERO can be used to infer gene coexpression networks (GCNs) and gene regulatory networks (GRNs) from gene expression data. Additionally, it can be used to explore topological properties of protein-protein interaction (PPI) networks. GCN inference relies on the popular WGCNA algorithm. GRN inference is based on the "wisdom of the crowds" principle, which consists in inferring GRNs with multiple algorithms (here, CLR, GENIE3 and ARACNE) and calculating the average rank for each interaction pair. As all steps of network analyses are included in this package, BioNERO spares users from having to learn the syntaxes of several packages and how to communicate between them. Finally, users can also identify consensus modules across independent expression sets and calculate intra and interspecies module preservation statistics between different networks.
Maintained by Fabricio Almeida-Silva. Last updated 5 months ago.
softwaregeneexpressiongeneregulationsystemsbiologygraphandnetworkpreprocessingnetworknetworkinference
1.7 match 27 stars 7.78 score 50 scripts 1 dependentschavent
ClustOfVar:Clustering of Variables
Cluster analysis of a set of variables. Variables can be quantitative, qualitative or a mixture of both.
Maintained by Marie Chavent. Last updated 5 years ago.
2.0 match 7 stars 6.47 score 142 scripts 2 dependentsthomasp85
particles:A Graph Based Particle Simulator Based on D3-Force
Simulating particle movement in 2D space has many applications. The 'particles' package implements a particle simulator based on the ideas behind the 'd3-force' 'JavaScript' library. 'particles' implements all forces defined in 'd3-force' as well as others such as vector fields, traps, and attractors.
Maintained by Thomas Lin Pedersen. Last updated 3 months ago.
d3jsgraph-layoutnetworknetwork-visualizationparticlessimulationcpp
1.8 match 119 stars 7.19 score 43 scriptssweinand
pricelevels:Spatial Price Level Comparisons
Price comparisons within or between countries provide an overall measure of the relative difference in prices, often denoted as price levels. This package provides index number methods for such price comparisons (e.g., The World Bank, 2011, <doi:10.1596/978-0-8213-9728-2>). Moreover, it contains functions for sampling and characterizing price data.
Maintained by Sebastian Weinand. Last updated 10 months ago.
index-numbersprice-comparisonspatial-analysis
3.0 match 4.30 score 2 scriptsrmgpanw
gtexr:Query the GTEx Portal API
A convenient R interface to the Genotype-Tissue Expression (GTEx) Portal API. For more information on the API, see <https://gtexportal.org/api/v2/redoc>.
Maintained by Alasdair Warwick. Last updated 6 months ago.
api-wrapperbioinformaticseqtlgtexsqtl
2.0 match 5 stars 6.41 score 5 scriptscefet-rj-dal
daltoolbox:Leveraging Experiment Lines to Data Analytics
The natural increase in the complexity of current research experiments and data demands better tools to enhance productivity in Data Analytics. The package is a framework designed to address the modern challenges in data analytics workflows. The package is inspired by Experiment Line concepts. It aims to provide seamless support for users in developing their data mining workflows by offering a uniform data model and method API. It enables the integration of various data mining activities, including data preprocessing, classification, regression, clustering, and time series prediction. It also offers options for hyper-parameter tuning and supports integration with existing libraries and languages. Overall, the package provides researchers with a comprehensive set of functionalities for data science, promoting ease of use, extensibility, and integration with various tools and libraries. Information on Experiment Line is based on Ogasawara et al. (2009) <doi:10.1007/978-3-642-02279-1_20>.
Maintained by Eduardo Ogasawara. Last updated 1 months ago.
1.9 match 1 stars 6.65 score 536 scripts 4 dependentsswarm-lab
swaRm:Processing Collective Movement Data
Function library for processing collective movement data (e.g. fish schools, ungulate herds, baboon troops) collected from GPS trackers or computer vision tracking software.
Maintained by Simon Garnier. Last updated 1 years ago.
animal-behavioranimal-behaviourcollective-behaviorcollective-behaviour
2.3 match 21 stars 5.50 score 8 scripts 1 dependentscran
RSDA:R to Symbolic Data Analysis
Symbolic Data Analysis (SDA) was proposed by Professor Edwin Diday in 1987. The main purpose of SDA is to substitute the set of rows (cases) in the data table for a concept (second order statistical unit). This package implements, for the symbolic case, certain techniques of automatic classification, as well as some linear models.
Maintained by Oldemar Rodriguez. Last updated 1 years ago.
3.8 match 1 stars 3.26 score 3 dependentsbioc
scde:Single Cell Differential Expression
The scde package implements a set of statistical methods for analyzing single-cell RNA-seq data. scde fits individual error models for single-cell RNA-seq measurements. These models can then be used for assessment of differential expression between groups of cells, as well as other types of analysis. The scde package also contains the pagoda framework which applies pathway and gene set overdispersion analysis to identify and characterize putative cell subpopulations based on transcriptional signatures. The overall approach to the differential expression analysis is detailed in the following publication: "Bayesian approach to single-cell differential expression analysis" (Kharchenko PV, Silberstein L, Scadden DT, Nature Methods, doi: 10.1038/nmeth.2967). The overall approach to subpopulation identification and characterization is detailed in the following pre-print: "Characterizing transcriptional heterogeneity through pathway and gene set overdispersion analysis" (Fan J, Salathia N, Liu R, Kaeser G, Yung Y, Herman J, Kaper F, Fan JB, Zhang K, Chun J, and Kharchenko PV, Nature Methods, doi:10.1038/nmeth.3734).
Maintained by Evan Biederstedt. Last updated 5 months ago.
immunooncologyrnaseqstatisticalmethoddifferentialexpressionbayesiantranscriptionsoftwareanalysisbioinformaticsheterogenityngssingle-celltranscriptomicsopenblascppopenmp
1.6 match 173 stars 7.53 score 141 scriptsjkim82133
TDA:Statistical Tools for Topological Data Analysis
Tools for Topological Data Analysis. The package focuses on statistical analysis of persistent homology and density clustering. For that, this package provides an R interface for the efficient algorithms of the C++ libraries 'GUDHI' <https://project.inria.fr/gudhi/software/>, 'Dionysus' <https://www.mrzv.org/software/dionysus/>, and 'PHAT' <https://bitbucket.org/phat-code/phat/>. This package also implements methods from Fasy et al. (2014) <doi:10.1214/14-AOS1252> and Chazal et al. (2015) <doi:10.20382/jocg.v6i2a8> for analyzing the statistical significance of persistent homology features.
Maintained by Jisu Kim. Last updated 1 months ago.
1.7 match 9 stars 7.18 score 204 scripts 5 dependentsmatrix-profile-foundation
tsmp:Time Series with Matrix Profile
A toolkit implementing the Matrix Profile concept that was created by CS-UCR <http://www.cs.ucr.edu/~eamonn/MatrixProfile.html>.
Maintained by Francisco Bischoff. Last updated 3 years ago.
algorithmmatrix-profilemotif-searchtime-seriescpp
1.7 match 72 stars 7.29 score 179 scripts 1 dependentsyuimaproject
yuima:The YUIMA Project Package for SDEs
Simulation and Inference for SDEs and Other Stochastic Processes.
Maintained by Stefano M. Iacus. Last updated 3 days ago.
1.7 match 9 stars 7.26 score 92 scripts 2 dependentstoduckhanh
bcROCsurface:Bias-Corrected Methods for Estimating the ROC Surface of Continuous Diagnostic Tests
Bias-corrected estimation methods for the receiver operating characteristic (ROC) surface and the volume under the ROC surface (VUS) under the missing at random (MAR) assumption.
Maintained by Duc-Khanh To. Last updated 1 years ago.
3.5 match 3.45 score 14 scriptsmsq-123
CovidMutations:Mutation Analysis and Assay Validation Toolkit for COVID-19 (Coronavirus Disease 2019)
A feasible framework for mutation analysis and reverse transcription polymerase chain reaction (RT-PCR) assay evaluation of COVID-19, including mutation profile visualization, statistics and mutation ratio of each assay. The mutation ratio is conducive to evaluating the coverage of RT-PCR assays in large-sized samples <doi:10.20944/preprints202004.0529.v1>.
Maintained by Shaoqian Ma. Last updated 5 years ago.
2.8 match 4 stars 4.30 score 6 scriptsjoycekang
symphony:Efficient and Precise Single-Cell Reference Atlas Mapping
Implements the Symphony single-cell reference building and query mapping algorithms and additional functions described in Kang et al <https://www.nature.com/articles/s41467-021-25957-x>.
Maintained by Joyce Kang. Last updated 2 years ago.
3.1 match 3.83 score 134 scriptsconnordonegan
geostan:Bayesian Spatial Analysis
For spatial data analysis; provides exploratory spatial analysis tools, spatial regression, spatial econometric, and disease mapping models, model diagnostics, and special methods for inference with small area survey data (e.g., the American Community Survey (ACS)) and censored population health monitoring data. Models are pre-specified using the Stan programming language, a platform for Bayesian inference using Markov chain Monte Carlo (MCMC). References: Carpenter et al. (2017) <doi:10.18637/jss.v076.i01>; Donegan (2021) <doi:10.31219/osf.io/3ey65>; Donegan (2022) <doi:10.21105/joss.04716>; Donegan, Chun and Hughes (2020) <doi:10.1016/j.spasta.2020.100450>; Donegan, Chun and Griffith (2021) <doi:10.3390/ijerph18136856>; Morris et al. (2019) <doi:10.1016/j.sste.2019.100301>.
Maintained by Connor Donegan. Last updated 3 months ago.
bayesianbayesian-inferencebayesian-statisticsepidemiologymodelingpublic-healthrspatialspatialstancpp
1.3 match 80 stars 8.80 score 46 scriptskdonnay
geomerge:Geospatial Data Integration
Geospatial data integration framework that merges raster, spatial polygon, and (dynamic) spatial points data into a spatial (panel) data frame at any geographical resolution.
Maintained by Karsten Donnay. Last updated 1 years ago.
3.9 match 2.93 score 17 scriptsbioc
VarCon:VarCon: an R package for retrieving neighboring nucleotides of an SNV
VarCon is an R package which converts the positional information from the annotation of a single nucleotide variation (SNV) (either referring to the coding sequence or the reference genomic sequence). It retrieves the genomic reference sequence around the position of the single nucleotide variation. To assess whether the SNV could potentially influence binding of splicing regulatory proteins, VarCon calculates the HEXplorer score as an estimation. In addition, VarCon reports splice site strengths of splice sites within the retrieved genomic sequence and any changes due to the SNV.
Maintained by Johannes Ptok. Last updated 5 months ago.
functionalgenomicsalternativesplicing
2.9 match 4.00 score 5 scriptskwilliams83
ldbod:Local Density-Based Outlier Detection
Flexible procedures to compute local density-based outlier scores for ranking outliers. Both exact and approximate nearest neighbor search can be implemented, while also accommodating multiple neighborhood sizes and four different local density-based methods. It allows for referencing a random subsample of the input data or a user specified reference data set to compute outlier scores against, so both unsupervised and semi-supervised outlier detection can be implemented.
Maintained by Kristopher Williams. Last updated 8 years ago.
3.8 match 2 stars 3.00 score 3 scriptsbioc
cydar:Using Mass Cytometry for Differential Abundance Analyses
Identifies differentially abundant populations between samples and groups in mass cytometry data. Provides methods for counting cells into hyperspheres, controlling the spatial false discovery rate, and visualizing changes in abundance in the high-dimensional marker space.
Maintained by Aaron Lun. Last updated 5 months ago.
immunooncologyflowcytometrymultiplecomparisonproteomicssinglecellcpp
2.0 match 5.64 score 48 scriptsmatloff
qeML:Quick and Easy Machine Learning Tools
The letters 'qe' in the package title stand for "quick and easy," alluding to the convenience goal of the package. We bring together a variety of machine learning (ML) tools from standard R packages, providing wrappers with a simple, convenient, and uniform interface.
Maintained by Norm Matloff. Last updated 26 days ago.
1.3 match 41 stars 8.41 score 48 scripts 1 dependentspfh
langevitour:Langevin Tour
An HTML widget that randomly tours 2D projections of numerical data. A random walk through projections of the data is shown. The user can manipulate the plot to use specified axes, or turn on Guided Tour mode to find an informative projection of the data. Groups within the data can be hidden or shown, as can particular axes. Points can be brushed, and the selection can be linked to other widgets using crosstalk. The underlying method to produce the random walk and projection pursuit uses Langevin dynamics. The widget can be used from within R, or included in a self-contained R Markdown or Quarto document or presentation, or used in a Shiny app.
Maintained by Paul Harrison. Last updated 2 months ago.
javascript-applicationslangevin-dynamicstourvisualization
1.8 match 26 stars 6.41 score 22 scripts 1 dependentsmichaeldorman
starsExtra:Miscellaneous Functions for Working with 'stars' Rasters
Miscellaneous functions for working with 'stars' objects, mainly single-band rasters. Currently includes functions for: (1) focal filtering, (2) detrending of Digital Elevation Models, (3) calculating flow length, (4) calculating the Convergence Index, (5) calculating topographic aspect and topographic slope.
Maintained by Michael Dorman. Last updated 1 years ago.
1.7 match 25 stars 6.53 score 45 scripts 2 dependentscran
JGL:Performs the Joint Graphical Lasso for Sparse Inverse Covariance Estimation on Multiple Classes
The Joint Graphical Lasso is a generalized method for estimating Gaussian graphical models/ sparse inverse covariance matrices/ biological networks on multiple classes of data. We solve JGL under two penalty functions: The Fused Graphical Lasso (FGL), which employs a fused penalty to encourage inverse covariance matrices to be similar across classes, and the Group Graphical Lasso (GGL), which encourages similar network structure between classes. FGL is recommended over GGL for most applications. Reference: Danaher P, Wang P, Witten DM. (2013) <doi:10.1111/rssb.12033>.
Maintained by Patrick Danaher. Last updated 1 years ago.
4.1 match 1 stars 2.65 score 1 dependentsminoo-asty
CINNA:Deciphering Central Informative Nodes in Network Analysis
Computing, comparing, and demonstrating top informative centrality measures within a network. "CINNA: an R/CRAN package to decipher Central Informative Nodes in Network Analysis" provides a comprehensive overview of the package functionality; see Ashtiani et al. (2018) <doi:10.1093/bioinformatics/bty819>.
Maintained by Minoo Ashtiani. Last updated 2 years ago.
3.3 match 1 stars 3.29 score 98 scripts