Showing 9 of 9 results
easystats
correlation: Methods for Correlation Analysis
Lightweight package for computing different kinds of correlations, such as partial correlations, Bayesian correlations, multilevel correlations, polychoric correlations, biweight correlations, distance correlations and more. Part of the 'easystats' ecosystem. References: Makowski et al. (2020) <doi:10.21105/joss.02306>.
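As a rough illustration of one of the correlation types named above, the sketch below computes a first-order partial correlation between `x` and `y` controlling for a single variable `z`, using the textbook formula based on pairwise Pearson correlations. It is plain Python, not the 'correlation' package API; the function names `pearson` and `partial_correlation` are hypothetical.

```python
from math import sqrt

def pearson(a, b):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b))
    var_a = sum((u - mean_a) ** 2 for u in a)
    var_b = sum((v - mean_b) ** 2 for v in b)
    return cov / sqrt(var_a * var_b)

def partial_correlation(x, y, z):
    """Correlation of x and y after removing the linear effect of z.

    Uses r_xy.z = (r_xy - r_xz * r_yz) / sqrt((1 - r_xz^2) * (1 - r_yz^2)).
    Assumes neither x nor y is perfectly correlated with z.
    """
    r_xy, r_xz, r_yz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (r_xy - r_xz * r_yz) / sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))
```

A call such as `partial_correlation(x, y, z)` takes three equal-length lists; controlling for more than one variable would need a different approach (for example, regressing out the covariates), which this sketch does not cover.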
Maintained by Brenton M. Wiernik. Last updated 28 days ago.
bayesian, bayesian-correlations, biserial, cor, correlation, correlation-analysis, correlations, easystats, gamma, gaussian-graphical-models, hacktoberfest, matrix, multilevel-correlations, outliers, partial, partial-correlations, regression, robust, spearman
439 stars 14.23 score 672 scripts 10 dependents
convexfi
fitHeavyTail: Mean and Covariance Matrix Estimation under Heavy Tails
Robust estimation methods for the mean vector, scatter matrix, and covariance matrix (if it exists) from data (possibly containing NAs) under multivariate heavy-tailed distributions such as angular Gaussian (via Tyler's method), Cauchy, and Student's t distributions. Additionally, a factor model structure can be specified for the covariance matrix. The latest revision also includes the multivariate skewed t distribution. The package is based on the papers: Sun, Babu, and Palomar (2014); Sun, Babu, and Palomar (2015); Liu and Rubin (1995); Zhou, Liu, Kumar, and Palomar (2019); Pascal, Ollila, and Palomar (2021).
Maintained by Daniel P. Palomar. Last updated 2 years ago.
cauchy, covariance-estimation, covariance-matrix, heavy-tailed-distributions, outliers, robust-estimation, student-t, tyler
22 stars 6.27 score 28 scripts 1 dependents
dppalomar
imputeFin: Imputation of Financial Time Series with Missing Values and/or Outliers
Missing values often occur in financial data for a variety of reasons (errors in the collection process or in the processing stage, lack of asset liquidity, lack of reporting of funds, etc.). However, most data analysis methods expect complete data and cannot be employed with missing values. One convenient way to deal with this issue without having to redesign the data analysis method is to impute the missing values. This package provides an efficient way to impute the missing values by modeling the time series with a random walk or an autoregressive (AR) model, which is convenient for modeling log-prices and log-volumes in financial data. In the current version, the imputation is univariate (so no asset correlation is used). In addition, outliers can be detected and removed. The package is based on the paper: J. Liu, S. Kumar, and D. P. Palomar (2019). Parameter Estimation of Heavy-Tailed AR Model With Missing Data Via Stochastic EM. IEEE Trans. on Signal Processing, vol. 67, no. 8, pp. 2159-2172. <doi:10.1109/TSP.2019.2899816>.
Maintained by Daniel P. Palomar. Last updated 4 years ago.
financial-data, missing-values, outliers, time-series
25 stars 5.80 score 25 scripts
christiangoueguel
HotellingEllipse: Hotelling’s T-Squared Statistic and Ellipse
Functions to calculate the Hotelling’s T-squared statistic and corresponding confidence ellipses. Provides the semi-axes of the Hotelling’s T-squared ellipses at 95% and 99% confidence levels. Enables users to obtain the coordinates in two or three dimensions at user-defined confidence levels, allowing for the construction of 2D or 3D ellipses with customized confidence levels. Bro and Smilde (2014) <DOI:10.1039/c3ay41907j>. Brereton (2016) <DOI:10.1002/cem.2763>.
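For orientation, the sketch below computes a per-observation T-squared value in two dimensions as the squared Mahalanobis distance from the sample mean, using the sample covariance matrix. This is one common definition and is only an assumption here: the package itself may work on component scores and apply additional scaling. The function name `hotelling_t2_2d` is hypothetical and not part of 'HotellingEllipse'.

```python
def hotelling_t2_2d(points):
    """Return one T-squared value per (x, y) observation.

    T2_i = d_i^T S^{-1} d_i, where d_i is the deviation of observation i
    from the sample mean and S is the sample covariance (divisor n - 1).
    Assumes at least 3 points that do not all lie on one line.
    """
    n = len(points)
    mean_x = sum(p[0] for p in points) / n
    mean_y = sum(p[1] for p in points) / n
    dev = [(p[0] - mean_x, p[1] - mean_y) for p in points]
    s_xx = sum(dx * dx for dx, _ in dev) / (n - 1)
    s_yy = sum(dy * dy for _, dy in dev) / (n - 1)
    s_xy = sum(dx * dy for dx, dy in dev) / (n - 1)
    det = s_xx * s_yy - s_xy ** 2
    if det <= 0:
        raise ValueError("covariance matrix is singular")
    # Closed-form inverse of the 2x2 covariance matrix, applied to each deviation.
    return [(s_yy * dx * dx - 2 * s_xy * dx * dy + s_xx * dy * dy) / det
            for dx, dy in dev]
```

Observations with large values lie far from the bulk of the data relative to its spread, which is the basis for drawing a confidence ellipse around the scores.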
Maintained by Christian L. Goueguel. Last updated 3 months ago.
confidence-ellipse, hotelling-ellipse, hotelling-s-t-square, hotelling-t2, hotellings-t2-distribution, multivariate-distribution, outliers, partial-least-squares-regression, pca, pls, principal-component-analysis
7 stars 5.29 score 14 scripts
fbartos
RoBTT: Robust Bayesian T-Test
An implementation of Bayesian model-averaged t-tests that allows users to draw inferences about the presence versus absence of an effect, variance heterogeneity, and potential outliers. The 'RoBTT' package estimates ensembles of models created by combining competing hypotheses and applies Bayesian model averaging using posterior model probabilities. Users can obtain model-averaged posterior distributions and inclusion Bayes factors, accounting for uncertainty in the data-generating process (Maier et al., 2024, <doi:10.3758/s13423-024-02590-5>). The package also provides a truncated likelihood version of the model-averaged t-test, enabling users to exclude potential outliers without introducing bias (Godmann et al., 2024, <doi:10.31234/osf.io/j9f3s>). Users can specify a wide range of informative priors for all parameters of interest. The package offers convenient functions for summary, visualization, and fit diagnostics.
Maintained by František Bartoš. Last updated 5 months ago.
bayesian, model-averaging, outliers, t-test, cpp
3 stars 5.26 score 9 scripts
talegari
solitude: An Implementation of Isolation Forest
Isolation forest is an anomaly detection method introduced in the paper "Isolation-Based Anomaly Detection" (Liu, Ting and Zhou <doi:10.1145/2133360.2133363>).
Maintained by Komala Sheshachala Srikanth. Last updated 4 years ago.
isolation-forest, outliers, rpackages
24 stars 5.23 score 70 scripts 1 dependents
gagolews
genie: Fast, Robust, and Outlier Resistant Hierarchical Clustering
Includes the reference implementation of Genie - a hierarchical clustering algorithm that links two point groups in such a way that an inequity measure (namely, the Gini index) of the cluster sizes does not significantly increase above a given threshold. This method most often outperforms many other data segmentation approaches in terms of clustering quality as tested on a wide range of benchmark datasets. At the same time, Genie retains the high speed of the single linkage approach, therefore it is also suitable for analysing larger data sets. For more details see (Gagolewski et al. 2016 <DOI:10.1016/j.ins.2016.05.003>). For an even faster and more feature-rich implementation, including, amongst others, noise point detection, see the 'genieclust' package (Gagolewski, 2021 <DOI:10.1016/j.softx.2021.100722>).
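To make the inequity measure concrete, here is a minimal sketch of a normalized Gini index over cluster sizes: 0 when all clusters are the same size and 1 when a single cluster holds every point. The normalization by `(n - 1) * total` is an assumption about the exact form used, and `gini_index` is a hypothetical helper, not a function of the 'genie' package.

```python
def gini_index(sizes):
    """Normalized Gini index of a list of non-negative cluster sizes.

    Computed as the sum of absolute pairwise differences divided by
    (n - 1) * sum(sizes), so the result lies in [0, 1].
    """
    n = len(sizes)
    total = sum(sizes)
    if n < 2 or total == 0:
        return 0.0  # inequity is undefined here; treat it as none
    pairwise = sum(abs(sizes[i] - sizes[j])
                   for i in range(n) for j in range(i + 1, n))
    return pairwise / ((n - 1) * total)
```

In an algorithm of the kind described, a merge step would compare this value against the chosen threshold before deciding which pair of clusters to link; that control logic is not shown here.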
Maintained by Marek Gagolewski. Last updated 3 years ago.
cluster, cluster-analysis, clustering, data-analysis, data-mining, data-science, datascience, genie, hierarchical-clustering-algorithm, machine-learning, machine-learning-algorithms, outliers, cpp, openmp
22 stars 4.55 score 16 scripts
kvasilopoulos
transx: Transform Univariate Time Series
Univariate time series operations that follow an opinionated design. The main principle of 'transx' is to keep the number of observations the same: operations that would reduce this number must fill the resulting gap in the observations.
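The length-preserving principle can be illustrated with a lagged difference, which normally drops the first `lag` observations. The sketch below pads those positions with a fill value so the output has as many elements as the input. It is plain Python rather than the 'transx' API, and the name `diff_keep_length` and its parameters are hypothetical.

```python
def diff_keep_length(series, lag=1, fill=None):
    """Lagged difference that returns as many values as it receives.

    The first `lag` positions have no predecessor to subtract, so they
    are filled with `fill` instead of being dropped.
    """
    if lag < 1:
        raise ValueError("lag must be a positive integer")
    head = [fill] * min(lag, len(series))
    tail = [series[i] - series[i - lag] for i in range(lag, len(series))]
    return head + tail
```

For example, `diff_keep_length([1, 4, 9, 16])` yields `[None, 3, 5, 7]`, keeping the result aligned with the original series.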
Maintained by Kostas Vasilopoulos. Last updated 4 years ago.
detrend, filters, outliers, time-series, transx
3 stars 4.29 score 13 scripts
zcebeci
odetector: Outlier Detection Using Partitioning Clustering Algorithms
An object is called an "outlier" if it deviates remarkably from the other objects in a data set. Outlier detection is the process of finding outliers using methods based on distance measures, clustering, and spatial methods (Ben-Gal, 2005 <ISBN 0-387-24435-2>). It is an intensively studied research topic, used to identify novelties, frauds, anomalies, deviations, or exceptions, as well as to remove outliers during data processing. This package implements some novel approaches that detect outliers based on typicality degrees obtained with soft partitioning clustering algorithms such as Fuzzy C-means and its variants.
Maintained by Zeynel Cebeci. Last updated 2 years ago.
anomaly-detection, cluster-analysis, clustering, clustering-methods, data, datapreparation, datapreprocessing, exception-handling, fcm, fraud-detection, fuzzy-clustering, novelty-detection, outlier-detection, outlier-removal, outliers, partitioning, pcm, surprise-exploration
3.70 score 4 scripts