17,249 research outputs found
Probabilistic Inference from Arbitrary Uncertainty using Mixtures of Factorized Generalized Gaussians
This paper presents a general and efficient framework for probabilistic
inference and learning from arbitrary uncertain information. It exploits the
calculation properties of finite mixture models, conjugate families and
factorization. Both the joint probability density of the variables and the
likelihood function of the (objective or subjective) observation are
approximated by a special mixture model, in such a way that any desired
conditional distribution can be directly obtained without numerical
integration. We have developed an extended version of the expectation
maximization (EM) algorithm to estimate the parameters of mixture models from
uncertain training examples (indirect observations). As a consequence, any
piece of exact or uncertain information about both input and output values is
consistently handled in the inference and learning stages. This ability,
extremely useful in certain situations, is not found in most alternative
methods. The proposed framework is formally justified from standard
probabilistic principles and illustrative examples are provided in the fields
of nonparametric pattern classification, nonlinear regression and pattern
completion. Finally, experiments on a real application and comparative results
over standard databases provide empirical evidence of the utility of the method
in a wide range of applications.
Methodological and empirical challenges in modelling residential location choices
The modelling of residential locations is a key element in land use and transport planning. There are significant empirical and methodological challenges inherent in such modelling, however, despite recent advances both in the availability of spatial datasets and in computational and choice modelling techniques.
One of the most important of these challenges concerns spatial aggregation. The housing market is characterised by the fact that it offers spatially and functionally heterogeneous products; as a result, if residential alternatives are represented as aggregated spatial units (as in conventional residential location models), the variability of dwelling attributes is lost, which may limit the predictive ability and policy sensitivity of the model. This thesis presents a modelling framework for residential location choice that addresses three key challenges: (i) the development of models at the dwelling-unit level, (ii) the treatment of spatial structure effects in such dwelling-unit level models, and (iii) problems associated with estimation in such modelling frameworks in the absence of disaggregated dwelling unit supply data. The proposed framework is applied to the residential location choice context in London.
Another important challenge in the modelling of residential locations is the choice set formation problem. Most models of residential location choice have been developed on the assumption that households consider all available alternatives when making location choices. Due to the high search costs associated with the housing market, however, and the limited capacity of households to process information, the validity of this assumption has been the subject of ongoing debate among researchers. There have been some attempts in the literature to incorporate the cognitive capacities of households within discrete choice models of residential location: for instance, by modelling households’ choice sets exogenously based on simplifying assumptions regarding their spatial search behaviour (e.g., an anchor-based search strategy) and their characteristics. By undertaking an empirical comparison of alternative models within the context of residential location choice in the Greater London area, this thesis investigates the feasibility and practicality of applying deterministic choice set formation approaches to capture the underlying search process of households. The thesis also investigates the uncertainty of choice sets in residential location choice modelling and proposes a simplified probabilistic choice set formation approach to model choice sets and choices simultaneously.
The dwelling-level modelling framework proposed in this research is practice-ready and can be used to estimate residential location choice models at the level of dwelling units without requiring independent and disaggregated dwelling supply data. The empirical comparison of alternative exogenous choice set formation approaches provides a guideline for modellers and land use planners to avoid inappropriate choice set formation approaches in practice. Finally, the proposed simplified choice set formation model can be applied to model the behaviour of households in online real estate environments.
Open Access
A survey of statistical network models
Networks are ubiquitous in science and have become a focal point for
discussion in everyday life. Formal statistical models for the analysis of
network data have emerged as a major topic of interest in diverse areas of
study, and most of these involve a form of graphical representation.
Probability models on graphs date back to 1959. Along with empirical studies in
social psychology and sociology from the 1960s, these early works generated an
active network community and a substantial literature in the 1970s. This effort
moved into the statistical literature in the late 1970s and 1980s, and the past
decade has seen a burgeoning network literature in statistical physics and
computer science. The growth of the World Wide Web and the emergence of online
networking communities such as Facebook, MySpace, and LinkedIn, and a host of
more specialized professional network communities have intensified interest in
the study of networks and network data. Our goal in this review is to provide
the reader with an entry point to this burgeoning literature. We begin with an
overview of the historical development of statistical network modeling and then
we introduce a number of examples that have been studied in the network
literature. Our subsequent discussion focuses on a number of prominent static
and dynamic network models and their interconnections. We emphasize formal
model descriptions, and pay special attention to the interpretation of
parameters and their estimation. We end with a description of some open
problems and challenges for machine learning and statistics.
Comment: 96 pages, 14 figures, 333 references
Inferring Anomalies from Data using Bayesian Networks
Existing studies on data mining have largely focused on the design of measures and algorithms to identify outliers in large, high-dimensional categorical and numeric databases. However, little emphasis has been placed on the interestingness of the reported outliers. One way to ascertain the interestingness and usefulness of a reported outlier is to make use of domain knowledge. In this thesis, we present measures to discover outliers based on background knowledge represented by a Bayesian network. Using causal relationships between attributes encoded in the Bayesian framework, we demonstrate that meaningful outliers, i.e., outliers which encode important or new information, are those which violate causal relationships encoded in the model. Depending upon the nature of the data, several approaches are proposed to identify and explain anomalies using Bayesian knowledge. Outliers are often identified as data points which are "rare", "isolated", or "far away from their nearest neighbors". We show that these characteristics may not be an accurate way of describing interesting outliers. Through a critical analysis of several existing outlier detection techniques, we show why there is a mismatch between outliers as entities described by these characteristics and "real" outliers as identified using the Bayesian approach. We show that the Bayesian approaches presented in this thesis have better accuracy in mining genuine outliers while keeping a low false positive rate compared to traditional outlier detection techniques.