Showing 14 of total 14 results
jonasrieger
ldaPrototype:Prototype of Multiple Latent Dirichlet Allocation Runs
Determine a prototype from a number of runs of Latent Dirichlet Allocation (LDA) by measuring their similarities with S-CLOP: a procedure to select the LDA run with the highest mean pairwise similarity, measured by S-CLOP (Similarity of multiple sets by Clustering with Local Pruning), to all other runs. LDA runs are specified by their assignments, which lead to estimators for the distribution parameters. Repeated runs lead to different results, which we address by choosing the most representative LDA run as the prototype.
Maintained by Jonas Rieger. Last updated 2 years ago.
latent-dirichlet-allocation lda model-selection modelselection reliability text-mining textdata topic-model topic-models topic-similarities topicmodeling topicmodelling
10.0 match 8 stars 4.44 score 23 scripts 1 dependents
tomasfryda
h2o:R Interface for the 'H2O' Scalable Machine Learning Platform
R interface for 'H2O', the scalable open source machine learning platform that offers parallelized implementations of many supervised and unsupervised machine learning algorithms such as Generalized Linear Models (GLM), Gradient Boosting Machines (including XGBoost), Random Forests, Deep Neural Networks (Deep Learning), Stacked Ensembles, Naive Bayes, Generalized Additive Models (GAM), ANOVA GLM, Cox Proportional Hazards, K-Means, PCA, ModelSelection, Word2Vec, as well as a fully automatic machine learning algorithm (H2O AutoML).
Maintained by Tomas Fryda. Last updated 1 year ago.
4.3 match 3 stars 8.20 score 7.8k scripts 11 dependents
davidrusi
mombf:Model Selection with Bayesian Methods and Information Criteria
Model selection and averaging for regression and mixtures, including Bayesian model selection and information criteria (BIC, EBIC, AIC, GIC).
Maintained by David Rossell. Last updated 1 month ago.
3.0 match 7 stars 7.89 score 73 scripts 1 dependents
mpierrejean
jointseg:Joint Segmentation of Multivariate (Copy Number) Signals
Methods for fast segmentation of multivariate signals into piecewise constant profiles and for generating realistic copy-number profiles. A typical application is the joint segmentation of total DNA copy numbers and allelic ratios obtained from Single Nucleotide Polymorphism (SNP) microarrays in cancer studies. The methods are described in Pierre-Jean, Rigaill and Neuvial (2015) <doi:10.1093/bib/bbu026>.
Maintained by Morgane Pierre-Jean. Last updated 6 years ago.
3.0 match 6 stars 6.50 score 44 scripts 2 dependents
tdhock
penaltyLearning:Penalty Learning
Implementations of algorithms from "Learning Sparse Penalties for Change-point Detection using Max Margin Interval Regression" by Hocking, Rigaill, Vert, and Bach <http://proceedings.mlr.press/v28/hocking13.html>, published in the proceedings of ICML 2013.
Maintained by Toby Dylan Hocking. Last updated 6 months ago.
3.0 match 16 stars 6.13 score 129 scripts 2 dependents
aagillet
MorphoRegions:Analysis of Regionalization Patterns in Serially Homologous Structures
Computes the optimal number of regions (or subdivisions) and their positions in serial structures without a priori assumptions, and visualizes the results. After reducing data dimensionality with the built-in function for data ordination, regions are fitted as segmented linear regressions along the serial structure. Every region boundary position and increasing number of regions are iteratively fitted, and the best model (number of regions and boundary positions) is selected with an information criterion. This package expands on the previous 'regions' package (Jones et al. (2018) <doi:10.1126/science.aar3126>) with improved computation and more fitting and plotting options.
Maintained by Amandine Gillet. Last updated 4 months ago.
3.3 match 4.30 score 6 scripts
bioc
INSPEcT:Modeling RNA synthesis, processing and degradation with RNA-seq data
INSPEcT (INference of Synthesis, Processing and dEgradation rates from Transcriptomic data) analyzes RNA-seq data in time-course experiments or steady-state conditions, with or without the support of nascent RNA data.
Maintained by Stefano de Pretis. Last updated 5 months ago.
sequencing rnaseq generegulation timecourse systemsbiology
3.0 match 4.38 score 9 scripts
haghish
autoEnsemble:Automated Stacked Ensemble Classifier for Severe Class Imbalance
An AutoML algorithm is developed to construct homogeneous or heterogeneous stacked ensemble models using specified base-learners. Various criteria are employed to identify optimal models, enhancing diversity among them and resulting in more robust stacked ensembles. The algorithm optimizes the model by incorporating an increasing number of top-performing models to create a diverse combination. Presently, only models from 'h2o.ai' are supported.
Maintained by E. F. Haghish. Last updated 12 hours ago.
ai algorithm automated-machine-learning automl automl-algorithms ensemble ensemble-learning h2o h2oai machine-learning machinelearning metalearning stack-ensemble stacked-ensembles stacking
3.0 match 5 stars 4.20 score 21 scripts
bioc
STATegRa:Classes and methods for multi-omics data integration
Classes and tools for multi-omics data integration.
Maintained by David Gomez-Cabrero. Last updated 5 months ago.
software statisticalmethod clustering dimensionreduction principalcomponent
3.0 match 4.15 score 3 scripts
vsousa
poolABC:Approximate Bayesian Computation with Pooled Sequencing Data
Provides functions to simulate Pool-seq data under models of demographic formation and to import Pool-seq data from real populations. Implements two ABC algorithms for performing parameter estimation and model selection using Pool-seq data. Cross-validation can also be performed to assess the accuracy of ABC estimates and model choice. Carvalho et al. (2022) <doi:10.1111/1755-0998.13834>.
Maintained by João Carvalho. Last updated 2 years ago.
3.3 match 1 star 3.70 score 3 scripts
bioc
DaMiRseq:Data Mining for RNA-seq data: normalization, feature selection and classification
The DaMiRseq package offers a tidy pipeline of data mining procedures to identify transcriptional biomarkers and exploit them for both binary and multi-class classification purposes. The package accepts any kind of data presented as a table of raw counts and allows including both continuous and factorial variables that occur with the experimental setting. A series of functions enables the user to clean up the data by filtering genomic features and samples, to adjust data by identifying and removing unwanted sources of variation (i.e. batches and confounding factors), and to select the best predictors for modeling. Finally, a "stacking" ensemble learning technique is applied to build a robust classification model. Every step includes a checkpoint that the user may exploit to assess the effects of data management by looking at diagnostic plots, such as clustering and heatmaps, RLE boxplots, MDS or correlation plots.
Maintained by Mattia Chiesa. Last updated 5 months ago.
sequencing rnaseq classification immunooncology openjdk
2.3 match 5.32 score 7 scripts 1 dependents
hhhelfer
HCmodelSets:Regression with a Large Number of Potential Explanatory Variables
Software for performing the reduction, exploratory and model selection phases of the procedure proposed by Cox, D.R. and Battey, H.S. (2017) <doi:10.1073/pnas.1703764114> for sparse regression when the number of potential explanatory variables far exceeds the sample size. The software supports linear regression, likelihood-based fitting of generalized linear regression models and the proportional hazards model fitted by partial likelihood.
Maintained by H. Battey. Last updated 2 years ago.
2.3 match 2 stars 4.00 score 5 scripts