284 research outputs found

    Detecting Events and Patterns in Large-Scale User Generated Textual Streams with Statistical Learning Methods

    A vast amount of textual web streams is influenced by events or phenomena emerging in the real world. The social web forms an excellent modern paradigm, where unstructured user generated content is published on a regular basis and on most occasions is freely distributed. The present Ph.D. Thesis deals with the problem of inferring information - or patterns in general - about events emerging in real life based on the contents of this textual stream. We show that it is possible to extract valuable information about social phenomena, such as an epidemic or even rainfall rates, by automatic analysis of the content published in Social Media, and in particular Twitter, using Statistical Machine Learning methods. An important intermediate task regards the formation and identification of features which characterise a target event; we select and use those textual features in several linear, non-linear and hybrid inference approaches, achieving good performance in terms of the applied loss function. By examining this rich data set further, we also propose methods for extracting various types of mood signals revealing how affective norms - at least within the social web's population - evolve during the day and how significant events emerging in the real world influence them. Lastly, we present some preliminary findings showing several spatiotemporal characteristics of this textual information as well as the potential of using it to tackle tasks such as the prediction of voting intentions. Comment: PhD thesis, 238 pages, 9 chapters, 2 appendices, 58 figures, 49 tables

    Systematic Biases in Weak Lensing Cosmology with the Dark Energy Survey

    PhD thesis submitted to the University of Manchester, School of Physics and Astronomy, August 2017. Abstract: This thesis presents a practical guide to applying shear measurements as a cosmological tool. We first present one of two science-ready galaxy shape catalogues from Year 1 of the Dark Energy Survey (DES Y1), which covers 1500 square degrees in four bands griz, with a median redshift of 0.59. We describe the shape measurement process implemented by the DES Y1 im3shape catalogue, which contains 21.9M high-quality r-band bulge/disc fits. In Chapter 3 a new suite of image simulations, referred to as hoopoe, is presented. The hoopoe dataset is tailored to DES Y1 and includes realistic blending, spatial masks and variation in the point spread function. We derive shear corrections, which we show are robust to changes in calibration method, galaxy binning and variance within the simulated dataset. Sources of systematic uncertainty in the simulation-based shear calibration are discussed, leading to a final estimate of the 1 sigma uncertainties in the residual multiplicative bias after calibration of 0.025. Chapter 4 describes an extension of the analysis on the hoopoe simulations into a detailed investigation of the impact of galaxy neighbours on shape measurement and shear cosmology. Four mechanisms by which neighbours can have a non-negligible influence on shear measurement are identified. These effects, if ignored, would contribute a net multiplicative bias of m ~ 0.03 - 0.09 in DES Y1, though the precise impact will depend on both the measurement code and the selection cuts applied. We use the cosmological inference pipeline of DES Y1 to explore the cosmological implications of neighbour bias and show that omitting blending from the calibration simulation for DES Y1 would bias the inferred clustering amplitude S8 = sigma_8 (Omega_m/0.3)^0.5 by 1.5 sigma towards low values. Finally, we use the hoopoe simulations to test the effect of neighbour-induced spatial correlations in the multiplicative bias. We find the cosmological impact to be subdominant to statistical error at the current level of precision. Another major uncertainty in shear cosmology is the accuracy of our ensemble redshift distributions. Chapter 5 presents a numerical investigation into the combined constraining power of cosmic shear, galaxy clustering and their cross-correlation in DES Y1, and the potential for internal calibration of redshift errors. Introducing a moderate uniform bias into the redshift distributions used to model the weak lensing (WL) galaxies is shown to produce a > 2 sigma bias in S8. We demonstrate that this cosmological bias can be eliminated by marginalising over redshift error nuisance parameters. Strikingly, the cosmological constraint of the combined dataset is largely undiminished by the loss of prior information on the WL distributions. We demonstrate that this implicit self-calibration is the result of complementary degeneracy directions in the combined data. In Chapter 6 we present the preliminary results of an investigation into galaxy intrinsic alignments. Using the DES Y1 data, we show a clear dependence of alignment amplitude on galaxy type, in agreement with previous results. We subject these findings to a series of initial robustness tests. We conclude with a short overview of the work presented, and discuss prospects for the future.
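For readability, the clustering-amplitude definition quoted in the abstract above can be typeset as follows. The symbols are the abstract's own; reading sigma_8 as the amplitude of matter fluctuations and Omega_m as the matter density parameter is the standard interpretation and is an assumption here, since the abstract does not spell them out.

```latex
S_8 = \sigma_8 \left( \frac{\Omega_m}{0.3} \right)^{0.5}
```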

    Efficient Decision Support Systems

    This series is directed to diverse managerial professionals who are leading the transformation of individual domains by using expert information and domain knowledge to drive decision support systems (DSSs). The series offers a broad range of subjects addressed in specific areas such as health care, business management, banking, agriculture, environmental improvement, natural resource and spatial management, aviation administration, and hybrid applications of information technology aimed at interdisciplinary issues. This book series is composed of three volumes: Volume 1 consists of general concepts and methodology of DSSs; Volume 2 consists of applications of DSSs in the biomedical domain; Volume 3 consists of hybrid applications of DSSs in multidisciplinary domains. The book shapes decision support strategies within the new infrastructure, helping readers make full use of creative technology to manipulate input data and to transform information into useful decisions for decision makers.

    Machine learning methods for discriminating natural targets in seabed imagery

    The research in this thesis concerns feature-based machine learning processes and methods for discriminating qualitative natural targets in seabed imagery. The applications considered typically involve time-consuming manual processing stages in an industrial setting. An aim of the research is to assist human analysts by expediting the tedious interpretative tasks, using machine methods. Some novel approaches are devised and investigated for solving the application problems. These investigations are compartmentalised in four coherent case studies linked by common underlying technical themes and methods. The first study addresses pockmark discrimination in a digital bathymetry model. Manual identification and mapping of even a relatively small number of these landform objects is an expensive process. A novel, supervised machine learning approach to automating the task is presented. The process maps the boundaries of ≈ 2000 pockmarks in seconds - a task that would take days for a human analyst to complete. The second case study investigates different feature creation methods for automatically discriminating sidescan sonar image textures characteristic of Sabellaria spinulosa colonisation. Results from a comparison of several textural feature creation methods on sonar waterfall imagery show that Gabor filter banks yield some of the best results. A further empirical investigation into the filter bank features created on sonar mosaic imagery leads to the identification of a useful configuration and filter parameter ranges for discriminating the target textures in the imagery. Feature saliency estimation is a vital stage in the machine process. Case study three concerns distance measures for the evaluation and ranking of features on sonar imagery. Two novel consensus methods for creating a more robust ranking are proposed. Experimental results show that the consensus methods can improve robustness over a range of feature parameterisations and various seabed texture classification tasks. The final case study is more qualitative in nature and brings together a number of ideas, applied to the classification of target regions in real-world sonar mosaic imagery. A number of technical challenges arose and these were surmounted by devising a novel, hybrid unsupervised method. This fully automated machine approach was compared with a supervised approach in an application to the problem of image-based sediment type discrimination. The hybrid unsupervised method produces a plausible class map in a few minutes of processing time. It is concluded that the versatile, novel process should be generalisable to the discrimination of other subjective natural targets in real-world seabed imagery, such as Sabellaria textures and pockmarks (with appropriate features and feature tuning). Further, the full automation of pockmark and Sabellaria discrimination is feasible within this framework.

    Analyzing Granger causality in climate data with time series classification methods

    Attribution studies in climate science aim to scientifically ascertain the influence of climatic variations on natural or anthropogenic factors. Many of those studies adopt the concept of Granger causality to infer statistical cause-effect relationships, while utilizing traditional autoregressive models. In this article, we investigate the potential of state-of-the-art time series classification techniques to enhance causal inference in climate science. We conduct a comparative experimental study of different types of algorithms on a large test suite that comprises a unique collection of datasets from the area of climate-vegetation dynamics. The results indicate that specialized time series classification methods are able to improve existing inference procedures. Substantial differences are observed among the methods that were tested.
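As a point of reference for the "traditional autoregressive models" the abstract mentions, the following is a minimal sketch of a linear Granger test, not the article's classification-based method. It compares a restricted autoregression of y on its own lags against a full model that also includes lags of x, and summarises the gain as an F statistic. The function names `lagged` and `granger_f`, the lag order `p=2`, and the synthetic data are all illustrative assumptions.

```python
import numpy as np


def lagged(series, p):
    """Columns are the series lagged by 1..p, aligned with series[p:]."""
    n = len(series)
    return np.column_stack([series[p - k : n - k] for k in range(1, p + 1)])


def granger_f(x, y, p=2):
    """F statistic for 'x Granger-causes y' with linear AR models of order p.

    Hypothetical helper: restricted model uses lags of y only,
    full model adds lags of x.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    target = y[p:]
    ones = np.ones((len(target), 1))
    restricted = np.hstack([ones, lagged(y, p)])
    full = np.hstack([restricted, lagged(x, p)])

    def rss(design):
        # Ordinary least squares fit, then residual sum of squares.
        coef, *_ = np.linalg.lstsq(design, target, rcond=None)
        resid = target - design @ coef
        return float(resid @ resid)

    rss_r, rss_f = rss(restricted), rss(full)
    dof = len(target) - full.shape[1]
    return ((rss_r - rss_f) / p) / (rss_f / dof)


# Synthetic example (assumed data): y depends on the previous value of x.
rng = np.random.default_rng(0)
x = rng.normal(size=500)
y = np.zeros(500)
y[1:] = 0.8 * x[:-1] + 0.1 * rng.normal(size=499)

print("F (x -> y):", granger_f(x, y))
print("F (y -> x):", granger_f(y, x))
```

A large F statistic in one direction and a small one in the other is the pattern a linear Granger test looks for; the article's contribution, by its own description, is to replace this autoregressive comparison with time series classification methods.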

    Towards More Nuanced Patient Management: Decomposing Readmission Risk with Survival Models

    Unplanned hospital readmissions are costly and associated with poorer patient outcomes. Overall readmission rates have also come to be used as performance metrics in reimbursement in healthcare policy, further motivating hospitals to identify and manage high-risk patients. Many models predicting readmission risk have been developed to facilitate the equitable measurement of readmission rates and to support hospital decision-makers in prioritising patients for interventions. However, these models consider the overall risk of readmission and are often restricted to a single time point. This work aims to develop the use of survival models to better support hospital decision-makers in managing readmission risk. First, semi-parametric statistical and nonparametric machine learning models are applied to adult patients admitted via the emergency department at Gold Coast University Hospital (n = 46,659) and Robina Hospital (n = 23,976) in Queensland, Australia. Overall model performance is assessed based on discrimination and calibration, as measured by time-dependent concordance and D-calibration. Second, a framework based on iterative hypothesis development and model fitting is proposed for decomposing readmission risk into persistent, patient-specific baselines and transient, care-related components using a sum of exponential hazards structure. Third, criteria for patient prioritisation based on the duration and magnitude of care-related risk components are developed. The extensibility of the framework and subsequent prioritisation criteria are considered for alternative populations, such as outpatient admissions and specific diagnosis groups, and different modelling techniques. Time-to-event models have rarely been applied to readmission modelling but can provide a rich description of the evolution of readmission risk post-discharge and support more nuanced patient management decisions than simple classification models.
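One possible reading of the "sum of exponential hazards structure" is sketched below. The abstract does not give the functional form, so this is an assumption: a constant patient-specific baseline hazard plus transient components that decay exponentially after discharge, h(t) = b + sum_k a_k exp(-t / tau_k). All function names and parameter values are hypothetical.

```python
import math


def hazard(t, baseline, components):
    """Assumed hazard: constant baseline plus exponentially decaying terms.

    components is a list of (magnitude, duration) pairs, i.e. (a_k, tau_k).
    """
    return baseline + sum(a * math.exp(-t / tau) for a, tau in components)


def cumulative_hazard(t, baseline, components):
    """Closed-form integral of the assumed hazard from 0 to t."""
    transient = sum(a * tau * (1.0 - math.exp(-t / tau)) for a, tau in components)
    return baseline * t + transient


def survival(t, baseline, components):
    """Probability of no readmission by time t under the assumed hazard."""
    return math.exp(-cumulative_hazard(t, baseline, components))


# Hypothetical patient: low persistent risk plus one care-related
# component of magnitude 0.05 per day that fades over about a week.
params = (0.01, [(0.05, 7.0)])
for day in (0, 7, 30):
    print(day, hazard(day, *params), survival(day, *params))
```

Under this reading, the magnitude a_k and duration tau_k of each transient term are the quantities that prioritisation criteria of the kind the abstract describes could be built on.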