
    Multivariate emulation of computer simulators: model selection and diagnostics with application to a humanitarian relief model

    We present a common framework for Bayesian emulation methodologies for multivariate-output simulators, or computer models, that employ either parametric linear models or nonparametric Gaussian processes. Novel diagnostics suitable for multivariate covariance-separable emulators are developed, and techniques to improve the adequacy of an emulator are discussed and implemented. A variety of emulators are compared for a humanitarian relief simulator, modelling aid missions to Sicily after a volcanic eruption and earthquake, and a sensitivity analysis is conducted to determine the sensitivity of the simulator output to changes in the input variables. The results from parametric and nonparametric emulators are compared in terms of prediction accuracy, uncertainty quantification and scientific interpretability.
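
    As a rough illustration of the kind of emulator and diagnostic described above, the following is a minimal sketch using scikit-learn's GaussianProcessRegressor as a stand-in for a multivariate-output emulator. The toy simulator, design points, and standardized-error check are illustrative assumptions, not the paper's covariance-separable formulation or its humanitarian relief model.

```python
# A minimal sketch, assuming scikit-learn is available: fit a GP emulator to
# runs of a (toy, hypothetical) multivariate-output simulator and check
# standardized prediction errors on held-out runs as a basic diagnostic.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, ConstantKernel
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

def toy_simulator(x):
    """Hypothetical 2-output simulator standing in for the real computer model."""
    return np.column_stack([np.sin(3 * x[:, 0]) + x[:, 1],
                            np.cos(2 * x[:, 1]) * x[:, 0]])

X = rng.uniform(0, 1, size=(80, 2))   # design points (simulator input settings)
Y = toy_simulator(X)                  # simulator outputs at those points

X_tr, X_val, Y_tr, Y_val = train_test_split(X, Y, test_size=0.25, random_state=0)

kernel = ConstantKernel(1.0) * RBF(length_scale=[0.2, 0.2])
gp = GaussianProcessRegressor(kernel=kernel, normalize_y=True).fit(X_tr, Y_tr)

# Diagnostic: standardized prediction errors on held-out runs; many values far
# outside roughly +/-2 suggest the emulator's uncertainty is poorly calibrated.
mean, std = gp.predict(X_val, return_std=True)
standardized_errors = (Y_val - mean) / std
print(np.abs(standardized_errors).max())
```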

    Effect of Suburban Transit Oriented Developments on Residential Property Values, MTI Report 08-07

    The development of successful TODs often encounters several barriers. These barriers include a lack of inter-jurisdictional cooperation, auto-oriented design that favors park-and-ride lots over ridership-generating uses, and community opposition. The community opposition may be more vocal in suburban areas, where residents of predominantly single-family neighborhoods may feel that the proposed high-density, mixed-use TOD will bring noise, air pollution, increased congestion and crime into their area. Community opposition has been instrumental in stopping many TOD projects in the San Francisco Bay Area. While community opposition to TODs has been pronounced, very little empirical research exists that indicates whether this opposition is well-founded. Economic theory suggests that if a TOD has a negative effect on the surrounding residential neighborhoods, then that effect should lower land prices and, in turn, housing prices in these neighborhoods. Similarly, an increase in housing prices would indicate a positive effect of TODs on the surrounding neighborhoods. This study empirically estimates the impact of four San Francisco Bay Area suburban TODs on single-family home sale prices. The study finds that the case-study suburban TODs either had no impact or had a positive impact on surrounding single-family home sale prices.
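
    A hedonic price regression of the kind implied by this estimation strategy could look like the sketch below. The data frame, columns (sqft, beds, near_tod) and coefficients are hypothetical illustrations, not the study's actual variables or estimates.

```python
# A minimal sketch, assuming statsmodels/pandas: a hedonic regression of log
# sale price on structural attributes plus a TOD-proximity indicator. All
# columns and coefficients below are hypothetical, for illustration only.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(1)
n = 500
df = pd.DataFrame({
    "sqft": rng.normal(1800, 400, n),     # living area (sq ft)
    "beds": rng.integers(2, 6, n),        # number of bedrooms
    "near_tod": rng.integers(0, 2, n),    # 1 if the home is near the TOD
})
# Synthetic prices with a small positive TOD premium built in for illustration.
df["log_price"] = (12.0 + 0.0004 * df["sqft"] + 0.05 * df["beds"]
                   + 0.03 * df["near_tod"] + rng.normal(0, 0.1, n))

model = smf.ols("log_price ~ sqft + beds + near_tod", data=df).fit()
# The sign and significance of `near_tod` is the quantity of interest:
# a negative estimate would indicate a price discount near the TOD,
# a positive one a premium.
print(model.params["near_tod"], model.pvalues["near_tod"])
```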

    The very same thing: Extending the object token concept to incorporate causal constraints on individual identity

    The contributions of feature recognition, object categorization, and recollection of episodic memories to the re-identification of a perceived object as the very same thing encountered in a previous perceptual episode are well understood in terms of both cognitive-behavioral phenomenology and neurofunctional implementation. Human beings do not, however, rely solely on features and context to re-identify individuals; in the presence of featural change and similarly-featured distractors, people routinely employ causal constraints to establish object identities. Based on available cognitive and neurofunctional data, the standard object-token based model of individual re-identification is extended to incorporate the construction of unobserved and hence fictive causal histories (FCHs) of observed objects by the pre-motor action planning system. Cognitive-behavioral and implementation-level predictions of this extended model and methods for testing them are outlined. It is suggested that functional deficits in the construction of FCHs are associated with clinical outcomes in both Autism Spectrum Disorders and later-stage Alzheimer's disease.

    PRESISTANT: Learning based assistant for data pre-processing

    Data pre-processing is one of the most time-consuming and relevant steps in a data analysis process (e.g., a classification task). A given data pre-processing operator (e.g., a transformation) can have a positive, negative or zero impact on the final result of the analysis. Expert users have the required knowledge to find the right pre-processing operators. Non-experts, however, are overwhelmed by the number of pre-processing operators, and it is challenging for them to find operators that would positively impact their analysis (e.g., increase the predictive accuracy of a classifier). Existing solutions either assume that users have expert knowledge, or they recommend pre-processing operators that are only "syntactically" applicable to a dataset, without taking into account their impact on the final analysis. In this work, we aim to assist non-expert users by recommending data pre-processing operators ranked according to their impact on the final analysis. We developed a tool, PRESISTANT, that uses Random Forests to learn the impact of pre-processing operators on the performance (e.g., predictive accuracy) of five different classification algorithms: J48, Naive Bayes, PART, Logistic Regression, and Nearest Neighbor. Extensive evaluations of the recommendations provided by our tool show that PRESISTANT can effectively help non-experts achieve improved results in their analytical tasks.
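
    The ranking idea can be sketched roughly as follows. The meta-features, operator list, and training data are hypothetical stand-ins, not PRESISTANT's actual implementation.

```python
# A minimal sketch of the ranking idea, assuming scikit-learn: learn from past
# experiments how much a pre-processing operator changes a classifier's
# accuracy on a dataset described by meta-features, then rank candidate
# operators for a new dataset by predicted gain. The meta-features, operator
# list, and training data are hypothetical, not PRESISTANT's actual setup.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
operators = ["standardize", "discretize", "log_transform", "pca"]

# Training meta-data: three dataset meta-features plus an operator id, paired
# with the observed accuracy change when that operator preceded, e.g., J48.
X_meta = rng.uniform(size=(400, 3))
op_ids = rng.integers(0, len(operators), 400).reshape(-1, 1)
X_train = np.hstack([X_meta, op_ids])
y_gain = rng.normal(0, 0.05, 400)   # observed accuracy deltas (synthetic)

model = RandomForestRegressor(n_estimators=200, random_state=0).fit(X_train, y_gain)

# Rank candidate operators for a new dataset by predicted impact on accuracy.
new_meta = np.array([0.4, 0.7, 0.5])
candidates = np.array([np.append(new_meta, i) for i in range(len(operators))])
predicted_gain = model.predict(candidates)
ranking = [operators[i] for i in np.argsort(predicted_gain)[::-1]]
print(ranking)
```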

    Predicting and simulating future land use pattern : a case study of Seremban district

    As long as rapid urbanization continues, driven by natural population growth, rural-urban migration due to social and economic push and pull factors, and the movement of urban populations from major city centres to urban fringe areas in search of more spacious, comfortable and environmentally friendly living environments, towns and cities will continue to grow and expand to accommodate the growing and complex demands of the people. Experience has shown that rapid and uncontrolled expansion of towns and cities leads to, among other things, deterioration in the quality of the urban environment, sprawl of urban development onto prime agricultural and forest areas, and cities starting to lose their identity. To prevent these phenomena from continuing, particularly in the Kuala Lumpur Conurbation Area, towns and cities need to be properly planned and managed so that their growth and expansion can be controlled in a sustainable manner. One of the strategies adopted to curb sprawling development is the delineation of urban growth limits (UGL): the limits of towns and cities are studied and identified so that urban development can be directed to areas specified as suitable for such development. The delineation of UGL has been included as an important task in the preparation of development plans. In line with this policy, a research study is being carried out to develop a spatial modelling framework for delineating UGL through the application and integration of spatial technologies, providing a basis for land use planners, managers and policy makers to formulate urban land use policies and monitor urban land use development. One of the main analyses involved in this task is to understand past urban land development trends and to predict and identify future urban growth areas in the selected study area. This paper highlights the integration of a statistical modelling technique, binary logistic regression analysis, with GIS technology for understanding and predicting urban growth patterns and areas, as applied to the District of Seremban, Negeri Sembilan. The results show that the urban land use pattern in the study area over the study period is significantly related to more than half of the predictors used in the analysis.
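
    The binary logistic regression step can be sketched as follows. The predictors (distance to roads, distance to town centre, slope) and the data are synthetic illustrations, not the Seremban dataset or the study's actual variables.

```python
# A minimal sketch of the binary logistic regression step, assuming
# scikit-learn: each grid cell gets spatial predictors and a 0/1 label for
# whether it converted to urban use over the study period; the fitted model
# yields a conversion-probability surface. Predictors and data are synthetic,
# not the Seremban dataset.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(2)
n_cells = 2000
dist_road = rng.uniform(0, 5000, n_cells)    # metres to nearest road
dist_town = rng.uniform(0, 15000, n_cells)   # metres to existing urban centre
slope = rng.uniform(0, 30, n_cells)          # terrain slope in degrees

# Synthetic conversion labels: cells near roads and towns convert more often.
logit = 1.5 - 0.0008 * dist_road - 0.0002 * dist_town - 0.05 * slope
converted = (rng.uniform(size=n_cells) < 1 / (1 + np.exp(-logit))).astype(int)

X = np.column_stack([dist_road, dist_town, slope])
model = LogisticRegression(max_iter=1000).fit(X, converted)

# Predicted conversion probabilities; in a GIS workflow these would be written
# back to a raster to map likely future urban growth areas.
print(model.predict_proba(X[:5])[:, 1])
```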

    Scalable Population Synthesis with Deep Generative Modeling

    Population synthesis is concerned with the generation of synthetic yet realistic representations of populations. It is a fundamental problem in the modeling of transport, where synthetic populations of micro-agents represent a key input to most agent-based models. In this paper, a new methodological framework for how to 'grow' pools of micro-agents is presented. The model framework adopts a deep generative modeling approach from machine learning based on a Variational Autoencoder (VAE). Compared to previous population synthesis approaches, including Iterative Proportional Fitting (IPF), Gibbs sampling and traditional generative models such as Bayesian Networks or Hidden Markov Models, the proposed method allows fitting the full joint distribution in high dimensions. The proposed methodology is compared with a conventional Gibbs sampler and a Bayesian Network using a large-scale Danish trip diary. It is shown that, while these two methods outperform the VAE in the low-dimensional case, they both suffer from scalability issues when the number of modeled attributes increases. It is also shown that the Gibbs sampler essentially replicates the agents from the original sample when the required conditional distributions are estimated as frequency tables. In contrast, the VAE addresses the problem of sampling zeros by generating agents that are virtually different from those in the original data but have similar statistical properties. The presented approach can support agent-based modeling at all levels by enabling richer synthetic populations with smaller zones and more detailed individual characteristics.
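
    A minimal VAE-based population synthesizer in the spirit of this framework might look like the sketch below (PyTorch). The attribute blocks, network sizes, and training data are illustrative assumptions, not the paper's architecture or the Danish trip-diary data.

```python
# A minimal sketch, assuming PyTorch: agents are concatenated one-hot
# attribute vectors, a small VAE learns their joint distribution, and new
# synthetic agents are decoded from draws of the latent prior. Attribute
# sizes, network widths, and data are illustrative, not the paper's setup.
import torch
import torch.nn as nn
import torch.nn.functional as F

attr_sizes = [4, 6, 3]                 # e.g. age band, income band, household type
x_dim, z_dim = sum(attr_sizes), 8

class PopVAE(nn.Module):
    def __init__(self):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(x_dim, 64), nn.ReLU())
        self.mu = nn.Linear(64, z_dim)
        self.logvar = nn.Linear(64, z_dim)
        self.dec = nn.Sequential(nn.Linear(z_dim, 64), nn.ReLU(), nn.Linear(64, x_dim))

    def forward(self, x):
        h = self.enc(x)
        mu, logvar = self.mu(h), self.logvar(h)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)  # reparameterization
        return self.dec(z), mu, logvar

def loss_fn(logits, x, mu, logvar):
    # Cross-entropy reconstruction per categorical attribute block plus KL term.
    recon, start = 0.0, 0
    for k in attr_sizes:
        recon = recon + F.cross_entropy(logits[:, start:start + k],
                                        x[:, start:start + k].argmax(1), reduction="sum")
        start += k
    kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
    return recon + kl

def random_agents(n):
    """Synthetic one-hot agents standing in for real microdata."""
    parts = [F.one_hot(torch.randint(0, k, (n,)), k).float() for k in attr_sizes]
    return torch.cat(parts, dim=1)

model = PopVAE()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
data = random_agents(2048)
for _ in range(200):
    logits, mu, logvar = model(data)
    loss = loss_fn(logits, data, mu, logvar)
    opt.zero_grad(); loss.backward(); opt.step()

# Generate synthetic agents: decode prior draws and take the argmax per block.
with torch.no_grad():
    logits = model.dec(torch.randn(5, z_dim))
    start = 0
    for k in attr_sizes:
        print(logits[:, start:start + k].argmax(1).tolist())
        start += k
```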

    Efficient Benchmarking of Algorithm Configuration Procedures via Model-Based Surrogates

    The optimization of algorithm (hyper-)parameters is crucial for achieving peak performance across a wide range of domains, ranging from deep neural networks to solvers for hard combinatorial problems. The resulting algorithm configuration (AC) problem has attracted much attention from the machine learning community. However, the proper evaluation of new AC procedures is hindered by two key hurdles. First, AC benchmarks are hard to set up. Second, and even more significantly, they are computationally expensive: a single run of an AC procedure involves many costly runs of the target algorithm whose performance is to be optimized in a given AC benchmark scenario. One common workaround is to optimize cheap-to-evaluate artificial benchmark functions (e.g., Branin) instead of actual algorithms; however, these have different properties than realistic AC problems. Here, we propose an alternative benchmarking approach that is similarly cheap to evaluate but much closer to the original AC problem: replacing expensive benchmarks by surrogate benchmarks constructed from AC benchmarks. These surrogate benchmarks approximate the response surface corresponding to true target algorithm performance using a regression model, and the original and surrogate benchmarks share the same (hyper-)parameter space. In our experiments, we construct and evaluate surrogate benchmarks for hyperparameter optimization as well as for AC problems that involve performance optimization of solvers for hard combinatorial problems, drawing training data from the runs of existing AC procedures. We show that our surrogate benchmarks capture the important characteristics of the AC scenarios from which they were derived, such as high- and low-performing regions, while being much easier to use and orders of magnitude cheaper to evaluate.
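
    The surrogate-benchmark idea can be sketched as follows. The two-parameter configuration space and the synthetic "true" performance function are illustrative assumptions, not one of the paper's AC scenarios.

```python
# A minimal sketch, assuming scikit-learn: fit a regression model from logged
# (configuration -> measured performance) data, then let AC or hyperparameter
# optimization procedures query the model instead of running the real target
# algorithm. The 2-parameter space and "true" performance function below are
# synthetic stand-ins, not one of the paper's AC scenarios.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(3)

def true_performance(cfg):
    """Stand-in for an expensive target-algorithm run (cheap synthetic function)."""
    return ((cfg[:, 0] - 0.3) ** 2 + np.abs(cfg[:, 1] - 0.7)
            + 0.05 * rng.normal(size=len(cfg)))

# Training data harvested from previous runs over the configuration space.
configs = rng.uniform(size=(1000, 2))
perf = true_performance(configs)
surrogate = RandomForestRegressor(n_estimators=200, random_state=0).fit(configs, perf)

# A benchmark query is now just a model prediction, orders of magnitude cheaper
# than running the real solver, so AC procedures can be compared quickly on the
# surrogate response surface.
candidate = rng.uniform(size=(1, 2))
print(surrogate.predict(candidate))
```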