A Survey of Bayesian Statistical Approaches for Big Data
The modern era is characterised as an era of information or Big Data. This
has motivated a huge literature on new methods for extracting information and
insights from these data. A natural question is how these approaches differ
from those that were available prior to the advent of Big Data. We present a
review of published studies that present Bayesian statistical approaches
specifically for Big Data and discuss the reported and perceived benefits of
these approaches. We conclude by addressing the question of whether focusing
only on improving computational algorithms and infrastructure will be enough to
face the challenges of Big Data.
Bayesian Methods in Tensor Analysis
Tensors, also known as multidimensional arrays, are useful data structures in
machine learning and statistics. In recent years, Bayesian methods have emerged
as a popular direction for analyzing tensor-valued data since they provide a
convenient way to introduce sparsity into the model and conduct uncertainty
quantification. In this article, we provide an overview of frequentist and
Bayesian methods for solving tensor completion and regression problems, with a
focus on Bayesian methods. We review common Bayesian tensor approaches
including model formulation, prior assignment, posterior computation, and
theoretical properties. We also discuss potential future directions in this
field. Comment: 32 pages, 8 figures, 2 tables
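As a rough illustration of the tensor completion setting mentioned in this abstract, the sketch below evaluates a low-rank fit on observed entries only. The abstract does not say which decomposition the reviewed methods use; a rank-R CP (CANDECOMP/PARAFAC) form is assumed here, and the names `cp_entry` and `observed_squared_error` are hypothetical. A Bayesian treatment would additionally place priors on the factor matrices, which this sketch leaves out.

```python
from typing import List, Tuple

Matrix = List[List[float]]  # factor matrix: one row per index, one column per rank component


def cp_entry(A: Matrix, B: Matrix, C: Matrix, i: int, j: int, k: int) -> float:
    """Entry (i, j, k) of the assumed CP model: sum over r of A[i][r] * B[j][r] * C[k][r]."""
    return sum(a * b * c for a, b, c in zip(A[i], B[j], C[k]))


def observed_squared_error(
    observed: List[Tuple[int, int, int, float]],
    A: Matrix,
    B: Matrix,
    C: Matrix,
) -> float:
    """Sum of squared residuals, computed over the observed entries only.

    Unobserved entries are simply absent from `observed`, which is what
    distinguishes completion from fitting a fully observed tensor.
    """
    return sum((y - cp_entry(A, B, C, i, j, k)) ** 2 for i, j, k, y in observed)
```

Minimizing this residual over the factor matrices, or treating it as a Gaussian log-likelihood term under priors, is one way the completion problem can be posed; the specific formulations in the reviewed article may differ.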
Representation Learning: A Review and New Perspectives
The success of machine learning algorithms generally depends on data
representation, and we hypothesize that this is because different
representations can entangle and hide more or less the different explanatory
factors of variation behind the data. Although specific domain knowledge can be
used to help design representations, learning with generic priors can also be
used, and the quest for AI is motivating the design of more powerful
representation-learning algorithms implementing such priors. This paper reviews
recent work in the area of unsupervised feature learning and deep learning,
covering advances in probabilistic models, auto-encoders, manifold learning,
and deep networks. This motivates longer-term unanswered questions about the
appropriate objectives for learning good representations, for computing
representations (i.e., inference), and the geometrical connections between
representation learning, density estimation and manifold learning.
Representation learning for uncertainty-aware clinical decision support
Over the last decade, there has been an increasing trend towards digitalization in healthcare, where a growing amount of patient data is collected and stored electronically. These recorded data are known as electronic health records. They are the basis for state-of-the-art research on clinical decision support so that better patient care can be delivered with the help of advanced analytical techniques like machine learning. Among various technical fields in machine learning, representation learning is about learning good representations from raw data to extract useful information for downstream prediction tasks. Deep learning, a crucial class of methods in representation learning, has achieved great success in many fields such as computer vision and natural language processing. These technical breakthroughs would presumably further advance the research and development of data analytics in healthcare. This thesis addresses clinically relevant research questions by developing algorithms based on state-of-the-art representation learning techniques. When a patient visits the hospital, a physician will suggest a treatment in a deterministic manner. Meanwhile, uncertainty comes into play when the past statistics of treatment decisions from various physicians are analyzed, as they would possibly suggest different treatments, depending on their training and experiences. The uncertainty in clinical decision-making processes is the focus of this thesis. The models developed for supporting these processes will therefore have a probabilistic nature. More specifically, the predictions are predictive distributions in regression tasks and probability distributions over, e.g., different treatment decisions, in classification tasks. The first part of the thesis is concerned with prescriptive analytics to provide treatment recommendations. Apart from patient information and treatment decisions, the outcome after the respective treatment is included in learning treatment suggestions. 
The problem setting is known as learning individualized treatment rules and is formulated as a contextual bandit problem. A general framework for learning individualized treatment rules using data from observational studies is presented based on state-of-the-art representation learning techniques. Using various offline evaluation methods, it is shown that the treatment policy in our proposed framework performs better than both physicians and competitive baselines. Subsequently, the uncertainty-aware regression models in diagnostic and predictive analytics are studied. Uncertainty-aware deep kernel learning models are proposed, which allow the estimation of the predictive uncertainty by a pipeline of neural networks and a sparse Gaussian process. By considering the input data structure, respective models are developed for diagnostic medical image data and sequential electronic health records. Various pre-training methods from representation learning are adapted to investigate their impact on the proposed models. Through extensive experiments, it is shown that the proposed models deliver better performance than common architectures in most cases. More importantly, the uncertainty-awareness of the proposed models is illustrated by their systematically expressing higher confidence in more accurate predictions and lower confidence in less accurate ones. The last part of the thesis is about missing data imputation in descriptive analytics, which provides essential evidence for subsequent decision-making processes. Rather than traditional mean and median imputation, a more advanced solution based on generative adversarial networks is proposed. The presented method takes the categorical nature of patient features into consideration, which enables the stabilization of the adversarial training. It is shown that the proposed method improves predictive accuracy more than traditional imputation baselines.
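To make the offline evaluation of a treatment policy in a contextual bandit setting more concrete, here is a minimal sketch of inverse propensity scoring (IPS), one widely used offline estimator. The abstract does not state which offline evaluation methods the thesis uses, so IPS is an assumption here, as are the record layout and the name `ips_value`.

```python
from typing import Callable, Sequence, Tuple

# One logged interaction (hypothetical layout): patient context, treatment
# actually given, probability the logging policy gave that treatment, and
# the observed outcome (higher is better).
LoggedRecord = Tuple[Sequence[float], int, float, float]


def ips_value(
    logs: Sequence[LoggedRecord],
    target_policy: Callable[[Sequence[float]], int],
) -> float:
    """Inverse propensity scoring estimate of a deterministic policy's value."""
    if not logs:
        raise ValueError("logs must not be empty")
    total = 0.0
    for context, action, propensity, reward in logs:
        if propensity <= 0.0:
            raise ValueError("propensities must be positive")
        # Only records where the target policy agrees with the logged
        # treatment contribute; each is re-weighted by 1 / propensity.
        if target_policy(context) == action:
            total += reward / propensity
    return total / len(logs)
```

In observational data the propensities are usually unknown and must themselves be estimated, which is one reason several offline estimators are often compared rather than relying on a single one.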