molic: An R package for multivariate outlier detection in contingency tables
Outlier detection is an important task in statistical analyses. What counts as an outlier is case-specific: in some applications it may be interpreted as natural extreme noise, whereas in others it may be the most interesting observation. The molic package has been written to facilitate the novel outlier detection method (Lindskou, Eriksen & Tvedebrink, 2019) in high-dimensional contingency tables. In other words, the method works for data sets in which all variables are categorical, meaning that they can only take on a finite set of values (also called levels).
sparta: Sparse Tables and their Algebra with a View Towards High Dimensional Graphical Models
A graphical model is a multivariate (potentially very high dimensional)
probabilistic model, which is formed by combining lower dimensional components.
Inference (computation of conditional probabilities) is based on message
passing algorithms that utilize conditional independence structures. In
graphical models for discrete variables with finite state spaces, there is a
fundamental problem in high dimensions: A discrete distribution is represented
by a table of values, and in high dimensions such tables can become
prohibitively large. In inference, such tables must be multiplied, which can
lead to even larger tables. The sparta package meets this challenge by
implementing methods that efficiently handle multiplication and
marginalization of sparse tables. The package was written in the R programming
language and is freely available from the Comprehensive R Archive Network
(CRAN). The companion package jti, also on CRAN, was developed to showcase the
potential of sparta in connection with the Junction Tree Algorithm. We show that
jti is able to handle highly complex graphical models which are otherwise
infeasible due to lack of computer memory, using sparta as a backend for table
operations.
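The two table operations named in this abstract can be illustrated with a minimal Python sketch. This is not the sparta API (sparta is an R package). The `SparseTable` class and the `multiply` and `marginalize` functions are hypothetical names introduced only for illustration, and the dictionary-based storage is an assumption about one simple way to keep only non-zero cells.

```python
from collections import defaultdict


class SparseTable:
    """Hypothetical sparse table: only non-zero cells are stored."""

    def __init__(self, variables, cells):
        self.variables = tuple(variables)
        # Drop zero cells so that only non-zero entries take up memory.
        self.cells = {key: val for key, val in cells.items() if val != 0}


def multiply(a, b):
    """Multiply two sparse tables, matching cells on their shared variables."""
    shared = [v for v in a.variables if v in b.variables]
    b_only = [v for v in b.variables if v not in a.variables]
    a_idx = [a.variables.index(v) for v in shared]
    b_idx = [b.variables.index(v) for v in shared]
    b_rest = [b.variables.index(v) for v in b_only]

    # Group b's cells by their configuration on the shared variables.
    buckets = defaultdict(list)
    for key, val in b.cells.items():
        buckets[tuple(key[i] for i in b_idx)].append((key, val))

    out = {}
    for a_key, a_val in a.cells.items():
        match = tuple(a_key[i] for i in a_idx)
        # Cells without a non-zero partner contribute nothing and are skipped.
        for b_key, b_val in buckets.get(match, []):
            out[a_key + tuple(b_key[i] for i in b_rest)] = a_val * b_val
    return SparseTable(a.variables + tuple(b_only), out)


def marginalize(table, sum_out):
    """Sum the variables in `sum_out` out of the table."""
    keep = [i for i, v in enumerate(table.variables) if v not in sum_out]
    out = defaultdict(float)
    for key, val in table.cells.items():
        out[tuple(key[i] for i in keep)] += val
    return SparseTable([table.variables[i] for i in keep], dict(out))


# p(A) and p(B | A); the zero cell (a1, b2) is simply never stored.
p_a = SparseTable(("A",), {("a1",): 0.25, ("a2",): 0.75})
p_b_given_a = SparseTable(
    ("A", "B"),
    {("a1", "b1"): 1.0, ("a2", "b1"): 0.5, ("a2", "b2"): 0.5},
)

joint = multiply(p_a, p_b_given_a)   # p(A, B), three non-zero cells
p_b = marginalize(joint, {"A"})      # p(B)
print(p_b.cells)
```

Because zero cells are never materialized, the product only visits pairs of non-zero cells that agree on the shared variables, which is the property that keeps memory use manageable when the dense table would be prohibitively large.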
Detecting Outliers in High-dimensional Data with Mixed Variable Types using Conditional Gaussian Regression Models
jti and sparta: Time and Space Efficient Packages For Model Based Prediction in Large Bayesian Networks
A Bayesian network is a multivariate (potentially very high dimensional) probabilistic model formed by combining lower-dimensional components. In Bayesian networks, the computation of conditional probabilities is fundamental for model-based predictions. This is usually done with message passing algorithms that utilize conditional independence structures. In this paper, we deal with a specific message passing algorithm that exploits a second structure called a junction tree and hence is known as the junction tree algorithm (JTA). In Bayesian networks for discrete variables with finite state spaces, there is a fundamental problem in high dimensions: A discrete distribution is represented by a table of values, and in high dimensions, such tables can become prohibitively large. In JTA, such tables must be multiplied, which can lead to even larger tables. The jti package meets this challenge by using the package sparta, which implements methods that efficiently handle multiplication and marginalization of sparse tables, as the backend for JTA. The two packages are written in the R programming language and are freely available from the Comprehensive R Archive Network