Application of an efficient Bayesian discretization method to biomedical data
Background
Several data mining methods require data that are discrete, and other methods often perform better with discrete data. We introduce an efficient Bayesian discretization (EBD) method for optimal discretization of variables that runs efficiently on high-dimensional biomedical datasets. The EBD method consists of two components, namely, a Bayesian score to evaluate discretizations and a dynamic programming search procedure to efficiently search the space of possible discretizations. We compared the performance of EBD to Fayyad and Irani's (FI) discretization method, which is commonly used for discretization.

Results
On 24 biomedical datasets obtained from high-throughput transcriptomic and proteomic studies, the classification performances of the C4.5 classifier and the naïve Bayes classifier were statistically significantly better when the predictor variables were discretized using EBD over FI. EBD was statistically significantly more stable to the variability of the datasets than FI. However, EBD was less robust, though not statistically significantly so, than FI and produced slightly more complex discretizations than FI.

Conclusions
On a range of biomedical datasets, a Bayesian discretization method (EBD) yielded better classification performance and stability but was less robust than the widely used FI discretization method. The EBD discretization method is easy to implement, permits the incorporation of prior knowledge and belief, and is sufficiently fast for application to high-dimensional data.
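The two components named in the abstract above, a score for candidate discretizations and a dynamic programming search over cut points, can be illustrated with a minimal sketch. This is not the EBD score from the paper: the interval score below is an assumed stand-in (the log marginal likelihood of class counts under a uniform Dirichlet prior), and the function names `interval_score` and `discretize` are hypothetical.

```python
import math

def interval_score(counts):
    # Assumed stand-in score (not the paper's EBD score): log marginal
    # likelihood of the class counts under a uniform Dirichlet prior.
    k = len(counts)
    n = sum(counts)
    return (math.lgamma(k)
            + sum(math.lgamma(c + 1) for c in counts)
            - math.lgamma(n + k))

def discretize(values, labels, num_classes):
    """Return the cut points of the partition that maximizes the summed score."""
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    # prefix[i][c] = number of samples with label c among the first i sorted samples
    prefix = [[0] * num_classes]
    for _, y in pairs:
        row = prefix[-1][:]
        row[y] += 1
        prefix.append(row)
    best = [0.0] + [-math.inf] * n   # best[j]: best score for the first j samples
    back = [0] * (n + 1)             # back[j]: start index of the last interval
    for j in range(1, n + 1):
        # An interval may end at j only if the next value differs (or j is the end).
        if j < n and pairs[j - 1][0] == pairs[j][0]:
            continue
        for i in range(j):
            if best[i] == -math.inf:
                continue
            counts = [prefix[j][c] - prefix[i][c] for c in range(num_classes)]
            score = best[i] + interval_score(counts)
            if score > best[j]:
                best[j] = score
                back[j] = i
    # Walk the back-pointers and place each cut midway between neighbouring values.
    cuts = []
    j = n
    while j > 0:
        i = back[j]
        if i > 0:
            cuts.append((pairs[i - 1][0] + pairs[i][0]) / 2)
        j = i
    return sorted(cuts)
```

For example, `discretize([1, 2, 3, 4, 5, 6, 7, 8], [0, 0, 0, 0, 1, 1, 1, 1], 2)` places a single cut between the two class blocks. The search is quadratic in the number of samples per variable, which is the usual cost of this kind of one-dimensional optimal partitioning.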
A traffic classification method using machine learning algorithm
Applying concepts of attack investigation from the IT industry, this work designs
a traffic classification method that combines data mining techniques with machine
learning algorithms to classify normal and malicious traffic. This classification will
help in learning about the unknown attacks faced by the IT industry. Traffic classification
is not a new concept; plenty of work has been done to classify network traffic for
heterogeneous applications. Existing techniques (payload-based, port-based,
and statistical) have their own pros and cons, which are discussed later in this
work, but classification using machine learning techniques is still an open field to explore and has provided very promising results so far.
A Partially Reflecting Random Walk on Spheres Algorithm for Electrical Impedance Tomography
In this work, we develop a probabilistic estimator for the voltage-to-current
map arising in electrical impedance tomography. This novel so-called partially
reflecting random walk on spheres estimator enables Monte Carlo methods to
compute the voltage-to-current map in an embarrassingly parallel manner, which
is an important issue with regard to the corresponding inverse problem. Our
method uses the well-known random walk on spheres algorithm inside subdomains
where the diffusion coefficient is constant and employs replacement techniques
motivated by finite difference discretization to deal with both mixed boundary
conditions and interface transmission conditions. We analyze the global bias
and the variance of the new estimator both theoretically and experimentally. In
a second step, the variance is considerably reduced via a novel control variate
conditional sampling technique.
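For readers unfamiliar with the building block mentioned above, here is a minimal sketch of the classical random walk on spheres algorithm. It is not the partially reflecting estimator of the paper: it assumes the simplest possible setting (Laplace's equation in the unit disk, constant diffusion coefficient, pure Dirichlet boundary data), and the names `walk_on_spheres` and `estimate` are hypothetical.

```python
import math
import random

def walk_on_spheres(x, y, boundary_value, eps=1e-3, rng=random):
    """One sample of the solution of Laplace's equation at (x, y) in the unit disk.

    Assumes (x, y) lies strictly inside the disk and Dirichlet data
    boundary_value(bx, by) is given on the unit circle.
    """
    while True:
        r = 1.0 - math.hypot(x, y)       # distance from the point to the boundary
        if r < eps:                       # close enough: stop in the eps-shell
            break
        theta = rng.uniform(0.0, 2.0 * math.pi)
        x += r * math.cos(theta)          # jump to a uniform point on the largest
        y += r * math.sin(theta)          # circle that fits inside the domain
    norm = math.hypot(x, y)
    return boundary_value(x / norm, y / norm)  # project onto the boundary

def estimate(x, y, boundary_value, num_walks=10000, seed=0):
    """Monte Carlo average over independent walks (trivially parallelizable)."""
    rng = random.Random(seed)
    total = sum(walk_on_spheres(x, y, boundary_value, rng=rng)
                for _ in range(num_walks))
    return total / num_walks
```

With boundary data `lambda bx, by: bx`, whose harmonic extension is u(x, y) = x, `estimate(0.3, 0.2, lambda bx, by: bx)` should land close to 0.3 up to Monte Carlo error and the small bias introduced by the stopping shell `eps`. Each walk is independent, which is what makes this family of estimators embarrassingly parallel.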
A Geometric Approach to Pairwise Bayesian Alignment of Functional Data Using Importance Sampling
We present a Bayesian model for pairwise nonlinear registration of functional
data. We use the Riemannian geometry of the space of warping functions to
define appropriate prior distributions and sample from the posterior using
importance sampling. A simple square-root transformation is used to simplify
the geometry of the space of warping functions, which allows for computation of
sample statistics, such as the mean and median, and a fast implementation of a
k-means clustering algorithm. These tools allow for efficient posterior
inference, where multiple modes of the posterior distribution corresponding to
multiple plausible alignments of the given functions are found. We also show
pointwise credible intervals to assess the uncertainty of the alignment
in different clusters. We validate this model using simulations and present
multiple examples on real data from different application domains including
biometrics and medicine.
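The square-root transformation mentioned above can be sketched as follows. The sketch assumes the standard construction in which a warping function γ on [0, 1] (nondecreasing, with γ(0) = 0 and γ(1) = 1) is mapped to ψ = sqrt(γ'), so that ψ has unit L2 norm and distances between warps reduce to arc lengths on a sphere. The discretization and the function names are illustrative assumptions, not the paper's implementation.

```python
import math

def sqrt_transform(gamma):
    """Map samples of a warping function on a uniform grid of [0, 1]
    to psi = sqrt(gamma'), using forward differences on each grid cell.

    Assumes gamma is nondecreasing; otherwise math.sqrt raises ValueError.
    """
    h = 1.0 / (len(gamma) - 1)
    return [math.sqrt((b - a) / h) for a, b in zip(gamma, gamma[1:])]

def l2_inner(psi1, psi2):
    """Riemann-sum approximation of the L2 inner product on [0, 1]."""
    h = 1.0 / len(psi1)
    return h * math.fsum(p * q for p, q in zip(psi1, psi2))

def warp_distance(gamma1, gamma2):
    """Arc-length distance between two warps on the unit sphere in L2."""
    inner = l2_inner(sqrt_transform(gamma1), sqrt_transform(gamma2))
    return math.acos(max(-1.0, min(1.0, inner)))  # clamp against rounding
```

Because the squared norm of ψ telescopes to γ(1) − γ(0) = 1, every warp lands on the unit sphere, where means, medians, and clustering can be computed with closed-form geodesics instead of working in the original nonlinear space of warping functions.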
A TV-Gaussian prior for infinite-dimensional Bayesian inverse problems and its numerical implementations
Many scientific and engineering problems require performing Bayesian
inference in function spaces, in which the unknowns are of infinite dimension.
In such problems, choosing an appropriate prior distribution is an important
task. In particular we consider problems where the function to infer is subject
to sharp jumps which render the commonly used Gaussian measures unsuitable. On
the other hand, the so-called total variation (TV) prior can only be defined in
a finite dimensional setting, and does not lead to a well-defined posterior
measure in function spaces. In this work we present a TV-Gaussian (TG) prior to
address such problems, where the TV term is used to detect sharp jumps of the
function, and the Gaussian distribution is used as a reference measure so that
it results in a well-defined posterior measure in the function space. We also
present an efficient Markov Chain Monte Carlo (MCMC) algorithm to draw samples
from the posterior distribution of the TG prior. With numerical examples we
demonstrate the performance of the TG prior and the efficiency of the proposed
MCMC algorithm.
Tensor Computation: A New Framework for High-Dimensional Problems in EDA
Many critical EDA problems suffer from the curse of dimensionality, i.e. the
very fast-scaling computational burden produced by a large number of parameters
and/or unknown variables. This phenomenon may be caused by multiple spatial or
temporal factors (e.g. 3-D field solver discretizations and multi-rate circuit
simulation), nonlinearity of devices and circuits, a large number of design or
optimization parameters (e.g. full-chip routing/placement and circuit sizing),
or extensive process variations (e.g. variability/reliability analysis and
design for manufacturability). The computational challenges generated by such
high dimensional problems are generally hard to handle efficiently with
traditional EDA core algorithms that are based on matrix and vector
computation. This paper presents "tensor computation" as an alternative general
framework for the development of efficient EDA algorithms and tools. A tensor
is a high-dimensional generalization of a matrix and a vector, and is a natural
choice for both storing and solving efficiently high-dimensional EDA problems.
This paper gives a basic tutorial on tensors, demonstrates some recent examples
of EDA applications (e.g., nonlinear circuit modeling and high-dimensional
uncertainty quantification), and suggests further open EDA problems where the
use of tensor computation could be of advantage.
Comment: 14 figures. Accepted by IEEE Trans. CAD of Integrated Circuits and Systems
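To make the storage argument in the abstract above concrete, the following sketch compares the number of values needed for a full d-way tensor with the number needed for a low-rank factorized form. The canonical polyadic (CP) format is used here purely as an assumed example of a compressed tensor format; the abstract does not say which formats the paper covers, and the function names are hypothetical.

```python
from math import prod

def full_storage(n, d):
    """Number of entries in a full d-way tensor with n points per dimension."""
    return n ** d

def cp_storage(n, d, rank):
    """Number of entries in a rank-`rank` CP representation: d factor
    matrices, each of size n x rank (an assumed example format)."""
    return d * n * rank

def cp_entry(factors, index):
    """Evaluate one tensor entry from CP factors without forming the tensor.

    factors[k] is an n x rank matrix (list of rows) for dimension k;
    index is a tuple with one position per dimension.
    """
    rank = len(factors[0][0])
    return sum(prod(factors[k][i][r] for k, i in enumerate(index))
               for r in range(rank))
```

With n = 10 points in each of d = 6 dimensions, the full tensor holds 10^6 values while a rank-3 CP form holds 180, and any single entry can still be recovered with `cp_entry`. The full count grows exponentially in d while the factorized count grows linearly, which is the sense in which such formats sidestep the curse of dimensionality when the underlying data has low rank.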