Showing 13 of 13 results
tiledb-inc
tiledb: Modern Database Engine for Complex Data Based on Multi-Dimensional Arrays
The modern database 'TileDB' introduces a powerful on-disk format for storing and accessing any complex data based on multi-dimensional arrays. It supports dense and sparse arrays, dataframes and key-value stores, cloud storage ('S3', 'GCS', 'Azure'), chunked arrays, multiple compression, encryption and checksum filters, parallel I/O, data versioning ('time travel'), metadata and groups, and uses a fully multi-threaded implementation. It is implemented as an embeddable cross-platform C++ library with APIs for several languages, plus integrations. This package provides the R support.
Maintained by Isaiah Norton. Last updated 12 hours ago.
array hdfs s3 storage-manager tiledb cpp
108 stars 11.79 score 306 scripts 4 dependents
kharchenkolab
pagoda2: Single Cell Analysis and Differential Expression
Analyzing and interactively exploring large-scale single-cell RNA-seq datasets. 'pagoda2' primarily performs normalization and differential gene expression analysis, with an interactive application for exploring single-cell RNA-seq datasets. It performs basic tasks such as cell size normalization, gene variance normalization, and can be used to identify subpopulations and run differential expression within individual samples. 'pagoda2' was written to rapidly process modern large-scale scRNAseq datasets of approximately 1e6 cells. The companion web application allows users to explore which gene expression patterns form the different subpopulations within their data. The package also serves as the primary method for preprocessing data for conos, <https://github.com/kharchenkolab/conos>. This package interacts with data available through the 'p2data' package, which is available in a 'drat' repository. To access this data package, see the instructions at <https://github.com/kharchenkolab/pagoda2>. The size of the 'p2data' package is approximately 6 MB.
Maintained by Evan Biederstedt. Last updated 1 year ago.
scrna-seq single-cell single-cell-rna-seq transcriptomics openblas cpp openmp
223 stars 8.00 score 282 scripts
kharchenkolab
conos: Clustering on Network of Samples
Wires together large collections of single-cell RNA-seq datasets, which allows for both the identification of recurrent cell clusters and the propagation of information between datasets in multi-sample or atlas-scale collections. 'Conos' focuses on the uniform mapping of homologous cell types across heterogeneous sample collections. For instance, users could investigate a collection of dozens of peripheral blood samples from cancer patients combined with dozens of controls, which perhaps includes samples of a related tissue such as lymph nodes. This package interacts with data available through the 'conosPanel' package, which is available in a 'drat' repository. To access this data package, see the instructions at <https://github.com/kharchenkolab/conos>. The size of the 'conosPanel' package is approximately 12 MB.
Maintained by Evan Biederstedt. Last updated 1 year ago.
batch-correction scrna-seq single-cell-rna-seq openblas cpp openmp
205 stars 7.33 score 258 scripts
lcrawlab
mvMAPIT: Multivariate Genome Wide Marginal Epistasis Test
Epistasis, commonly defined as the interaction between genetic loci, is known to play an important role in the phenotypic variation of complex traits. As a result, many statistical methods have been developed to identify genetic variants that are involved in epistasis, and nearly all of these approaches carry out this task by focusing on analyzing one trait at a time. Previous studies have shown that jointly modeling multiple phenotypes can often dramatically increase statistical power for association mapping. In this package, we present the 'multivariate MArginal ePIstasis Test' ('mvMAPIT') – a multi-outcome generalization of a recently proposed epistatic detection method which seeks to detect marginal epistasis or the combined pairwise interaction effects between a given variant and all other variants. By searching for marginal epistatic effects, one can identify genetic variants that are involved in epistasis without the need to identify the exact partners with which the variants interact – thus, potentially alleviating much of the statistical and computational burden associated with conventional explicit search-based methods. Our proposed 'mvMAPIT' builds upon this strategy by taking advantage of correlation structure between traits to improve the identification of variants involved in epistasis. We formulate 'mvMAPIT' as a multivariate linear mixed model and develop a multi-trait variance component estimation algorithm for efficient parameter inference and P-value computation. Together with reasonable model approximations, our proposed approach is scalable to moderately sized genome-wide association studies. Crawford et al. (2017) <doi:10.1371/journal.pgen.1006869>. Stamp et al. (2023) <doi:10.1093/g3journal/jkad118>.
Maintained by Julian Stamp. Last updated 5 months ago.
cpp epistasis epistasis-analysis gwas gwas-tools linear-mixed-models mapit mvmapit variance-components openblas cpp openmp
11 stars 6.90 score 17 scripts 1 dependent
bioc
TileDBArray: Using TileDB as a DelayedArray Backend
Implements a DelayedArray backend for reading and writing dense or sparse arrays in the TileDB format. The resulting TileDBArrays are compatible with all Bioconductor pipelines that can accept DelayedArray instances.
Maintained by Aaron Lun. Last updated 5 months ago.
datarepresentation infrastructure software
10 stars 6.89 score 26 scripts 1 dependent
bioc
AlphaMissenseR: Accessing AlphaMissense Data Resources in R
The AlphaMissense publication <https://www.science.org/doi/epdf/10.1126/science.adg7492> outlines how a variant of AlphaFold / DeepMind was used to predict missense variant pathogenicity. Supporting data on Zenodo <https://zenodo.org/record/10813168> include, for instance, 71M variants across hg19 and hg38 genome builds. The 'AlphaMissenseR' package allows ready access to the data, downloading individual files to DuckDB databases for exploration and integration into *R* and *Bioconductor* workflows.
Maintained by Martin Morgan. Last updated 5 months ago.
snp annotation functionalgenomics structuralprediction transcriptomics variantannotation geneprediction immunooncology
8 stars 6.86 score 10 scripts
chanzuckerberg
cellxgene.census: CZ CELLxGENE Discover Cell Census
API to facilitate the use of the CZ CELLxGENE Discover Census. For more information about the API and the project visit https://github.com/chanzuckerberg/cellxgene-census/
Maintained by Chan Zuckerberg Initiative Foundation. Last updated 6 months ago.
96 stars 6.60 score 15 scripts
ygeunkim
bvhar: Bayesian Vector Heterogeneous Autoregressive Modeling
Tools to model and forecast multivariate time series, including the Bayesian Vector heterogeneous autoregressive (VHAR) model by Kim & Baek (2023) (<doi:10.1080/00949655.2023.2281644>). 'bvhar' can model Vector Autoregressive (VAR), VHAR, Bayesian VAR (BVAR), and Bayesian VHAR (BVHAR) models.
Maintained by Young Geun Kim. Last updated 28 days ago.
bayesian bayesian-econometrics bvar eigen forecasting har pybind11 python rcppeigen time-series vector-autoregression cpp openmp
6 stars 6.42 score 25 scripts
lcrawlab
smer: Sparse Marginal Epistasis Test
The Sparse Marginal Epistasis Test is a computationally efficient genetics method which detects statistical epistasis in complex traits; see Stamp et al. (2025, <doi:10.1101/2025.01.11.632557>) for details.
Maintained by Julian Stamp. Last updated 2 months ago.
genomewideassociation epistasis genetics snp linearmixedmodel cpp epistasis-analysis epistatis gwas gwas-tools mapit zlib cpp openmp
1 star 4.95 score 8 scripts
eddelbuettel
spdl: Easier Use of 'RcppSpdlog' Functions via Wrapper
Logging functions in 'RcppSpdlog' provide access to the logging functionality from the 'spdlog' 'C++' library. This package offers shorter convenience wrappers for the 'R' functions which match the 'C++' functions, for example 'spdl::debug()' at the debug level. The actual formatting is done by the 'fmt::format()' function from the 'fmtlib' library (which is also 'std::format()' in 'C++20' or later).
Maintained by Dirk Eddelbuettel. Last updated 10 months ago.
2 stars 4.78 score 1 script 6 dependents
kharchenkolab
N2R: Fast and Scalable Approximate k-Nearest Neighbor Search Methods using 'N2' Library
Implements methods to perform fast approximate K-nearest neighbor search on an input matrix. The algorithm is based on the 'N2' implementation of an approximate nearest neighbor search using hierarchical Navigable Small World (NSW) graphs. The original algorithm is described in "Efficient and Robust Approximate Nearest Neighbor Search Using Hierarchical Navigable Small World Graphs", Y. Malkov and D. Yashunin, <doi:10.1109/TPAMI.2018.2889473>, <arXiv:1603.09320>.
Maintained by Evan Biederstedt. Last updated 1 year ago.
10 stars 4.78 score 3 scripts 2 dependents
bioc
beachmat.tiledb: beachmat bindings for TileDB-backed matrices
Extends beachmat to initialize tatami matrices from TileDB-backed arrays. This allows C++ code in downstream packages to directly call the TileDB C/C++ library to access array data, without the need for block processing via DelayedArray. Developers only need to import this package to automatically extend the capabilities of beachmat::initializeCpp to TileDBArray instances.
Maintained by Aaron Lun. Last updated 3 months ago.
datarepresentation dataimport infrastructure cpp
4.65 score 4 scripts
rogiersbart
ra: A minimal TileDB-backed lazy multi-dimensional array implementation with metadata
The {ra} package provides a wrapper around the low-level {tiledb} API and {jsonlite} to implement a minimal lazy multi-dimensional array with arbitrary metadata support.
Maintained by Bart Rogiers. Last updated 9 months ago.
1.70 score