17,249 research outputs found

    Probabilistic Inference from Arbitrary Uncertainty using Mixtures of Factorized Generalized Gaussians

    This paper presents a general and efficient framework for probabilistic inference and learning from arbitrary uncertain information. It exploits the calculation properties of finite mixture models, conjugate families and factorization. Both the joint probability density of the variables and the likelihood function of the (objective or subjective) observation are approximated by a special mixture model, in such a way that any desired conditional distribution can be directly obtained without numerical integration. We have developed an extended version of the expectation maximization (EM) algorithm to estimate the parameters of mixture models from uncertain training examples (indirect observations). As a consequence, any piece of exact or uncertain information about both input and output values is consistently handled in the inference and learning stages. This ability, extremely useful in certain situations, is not found in most alternative methods. The proposed framework is formally justified from standard probabilistic principles and illustrative examples are provided in the fields of nonparametric pattern classification, nonlinear regression and pattern completion. Finally, experiments on a real application and comparative results over standard databases provide empirical evidence of the utility of the method in a wide range of applications.

    Methodological and empirical challenges in modelling residential location choices

    The modelling of residential locations is a key element in land use and transport planning. There are significant empirical and methodological challenges inherent in such modelling, however, despite recent advances both in the availability of spatial datasets and in computational and choice modelling techniques. One of the most important of these challenges concerns spatial aggregation. The housing market is characterised by the fact that it offers spatially and functionally heterogeneous products; as a result, if residential alternatives are represented as aggregated spatial units (as in conventional residential location models), the variability of dwelling attributes is lost, which may limit the predictive ability and policy sensitivity of the model. This thesis presents a modelling framework for residential location choice that addresses three key challenges: (i) the development of models at the dwelling-unit level, (ii) the treatment of spatial structure effects in such dwelling-unit level models, and (iii) problems associated with estimation in such modelling frameworks in the absence of disaggregated dwelling unit supply data. The proposed framework is applied to the residential location choice context in London. Another important challenge in the modelling of residential locations is the choice set formation problem. Most models of residential location choices have been developed based on the assumption that households consider all available alternatives when they are making location choices. Due to the high search costs associated with the housing market, however, and the limited capacity of households to process information, the validity of this assumption has been a subject of ongoing debate among researchers.
There have been some attempts in the literature to incorporate the cognitive capacities of households within discrete choice models of residential location: for instance, by modelling households' choice sets exogenously based on simplifying assumptions regarding their spatial search behaviour (e.g., an anchor-based search strategy) and their characteristics. By undertaking an empirical comparison of alternative models within the context of residential location choice in the Greater London area, this thesis investigates the feasibility and practicality of applying deterministic choice set formation approaches to capture the underlying search process of households. The thesis also investigates the uncertainty of choice sets in residential location choice modelling and proposes a simplified probabilistic choice set formation approach to model choice sets and choices simultaneously. The dwelling-level modelling framework proposed in this research is practice-ready and can be used to estimate residential location choice models at the level of dwelling units without requiring independent and disaggregated dwelling supply data. The empirical comparison of alternative exogenous choice set formation approaches provides a guideline for modellers and land use planners to avoid inappropriate choice set formation approaches in practice. Finally, the proposed simplified choice set formation model can be applied to model the behaviour of households in online real estate environments.

    A survey of statistical network models

    Networks are ubiquitous in science and have become a focal point for discussion in everyday life. Formal statistical models for the analysis of network data have emerged as a major topic of interest in diverse areas of study, and most of these involve a form of graphical representation. Probability models on graphs date back to 1959. Along with empirical studies in social psychology and sociology from the 1960s, these early works generated an active network community and a substantial literature in the 1970s. This effort moved into the statistical literature in the late 1970s and 1980s, and the past decade has seen a burgeoning network literature in statistical physics and computer science. The growth of the World Wide Web and the emergence of online networking communities such as Facebook, MySpace, and LinkedIn, and a host of more specialized professional network communities, have intensified interest in the study of networks and network data. Our goal in this review is to provide the reader with an entry point to this burgeoning literature. We begin with an overview of the historical development of statistical network modeling and then we introduce a number of examples that have been studied in the network literature. Our subsequent discussion focuses on a number of prominent static and dynamic network models and their interconnections. We emphasize formal model descriptions, and pay special attention to the interpretation of parameters and their estimation. We end with a description of some open problems and challenges for machine learning and statistics. Comment: 96 pages, 14 figures, 333 references.

    Inferring Anomalies from Data using Bayesian Networks

    Existing studies on data mining have largely focused on the design of measures and algorithms to identify outliers in large and high-dimensional categorical and numeric databases. However, little emphasis has been placed on the interestingness of the reported outliers. One way to ascertain the interestingness and usefulness of a reported outlier is to make use of domain knowledge. In this thesis, we present measures to discover outliers based on background knowledge, represented by a Bayesian network. Using causal relationships between attributes encoded in the Bayesian framework, we demonstrate that meaningful outliers, i.e., outliers which encode important or new information, are those which violate causal relationships encoded in the model. Depending on the nature of the data, several approaches are proposed to identify and explain anomalies using Bayesian knowledge. Outliers are often identified as data points which are "rare", "isolated", or "far away from their nearest neighbors". We show that these characteristics may not be an accurate way of describing interesting outliers. Through a critical analysis of several existing outlier detection techniques, we show why there is a mismatch between outliers as entities described by these characteristics and "real" outliers as identified using a Bayesian approach. We show that the Bayesian approaches presented in this thesis have better accuracy in mining genuine outliers while keeping a low false positive rate compared to traditional outlier detection techniques.