
    AMICO galaxy clusters in KiDS-DR3: sample properties and selection function

    We present the first catalogue of galaxy cluster candidates derived from the third data release of the Kilo Degree Survey (KiDS-DR3). The sample of clusters has been produced using the Adaptive Matched Identifier of Clustered Objects (AMICO) algorithm. In this analysis AMICO uses only the luminosity and spatial distribution of galaxies, without considering colours. In this way, we prevent any selection effect related to the presence or absence of the red sequence in the clusters. The catalogue contains 7988 candidate galaxy clusters in the redshift range 0.1 < z < 0.8 down to S/N > 3.5, with a purity approaching 95% over the entire redshift range. In addition to the catalogue of galaxy clusters, we also provide a catalogue of galaxies with their probabilistic association to galaxy clusters. We quantify the sample purity, completeness and the uncertainties of the detection properties, such as richness, redshift, and position, by means of mock galaxy catalogues derived directly from the data. This preserves their statistical properties, including photo-z uncertainties, unknown absorption across the survey, missing data, and the spatial correlation of galaxies and galaxy clusters. Being based on the real data, such mock catalogues do not have to rely on the assumptions on which numerical simulations and semi-analytic models are based. This paper is the first of a series of papers in which we discuss the details and physical properties of the sample presented in this work. Comment: 16 pages, 14 figures, 3 tables, submitted to MNRA

    Autonomous Cleaning of Corrupted Scanned Documents - A Generative Modeling Approach

    We study the task of cleaning scanned text documents that are strongly corrupted by dirt such as manual line strokes, spilled ink, etc. We aim at autonomously removing dirt from a single letter-size page based only on the information the page contains. Our approach, therefore, has to learn character representations without supervision and requires a mechanism to distinguish learned representations from irregular patterns. To learn character representations, we use a probabilistic generative model parameterizing pattern features, feature variances, the features' planar arrangements, and pattern frequencies. The latent variables of the model describe pattern class, pattern position, and the presence or absence of individual pattern features. The model parameters are optimized using a novel variational EM approximation. After learning, the parameters represent, independently of their absolute position, planar feature arrangements and their variances. A quality measure defined on the learned representation then allows for an autonomous discrimination between regular character patterns and the irregular patterns making up the dirt. The irregular patterns can thus be removed to clean the document. For a full Latin alphabet we found that a single page does not contain sufficiently many character examples. However, we show that a page containing a lower number of character types, even if heavily corrupted by dirt, can be cleaned efficiently and autonomously based solely on the structural regularity of the characters it contains. In different examples using characters from different alphabets, we demonstrate the generality of the approach and discuss its implications for future developments. Comment: oral presentation and Google Student Travel Award; IEEE conference on Computer Vision and Pattern Recognition 201

    Outlier detection techniques for wireless sensor networks: A survey

    In the field of wireless sensor networks, measurements that significantly deviate from the normal pattern of sensed data are considered outliers. The potential sources of outliers include noise and errors, events, and malicious attacks on the network. Traditional outlier detection techniques are not directly applicable to wireless sensor networks due to the nature of sensor data and the specific requirements and limitations of wireless sensor networks. This survey provides a comprehensive overview of existing outlier detection techniques specifically developed for wireless sensor networks. Additionally, it presents a technique-based taxonomy and a comparative table to be used as a guideline for selecting a technique suitable for the application at hand, based on characteristics such as data type, outlier type, outlier identity, and outlier degree.

    Beyond Volume: The Impact of Complex Healthcare Data on the Machine Learning Pipeline

    From medical charts to national censuses, healthcare has traditionally operated under a paper-based paradigm. However, the past decade has marked a long and arduous transformation bringing healthcare into the digital age. Ranging from electronic health records, to digitized imaging and laboratory reports, to public health datasets, healthcare now generates an incredible amount of digital information. Such a wealth of data presents an exciting opportunity for integrated machine learning solutions to address problems across multiple facets of healthcare practice and administration. Unfortunately, the ability to derive accurate and informative insights requires more than the ability to execute machine learning models. Rather, a deeper understanding of the data on which the models are run is imperative for their success. While a significant effort has been undertaken to develop models able to process the volume of data obtained during the analysis of millions of digitized patient records, it is important to remember that volume represents only one aspect of the data. In fact, because it draws on an increasingly diverse set of sources, healthcare data presents an incredibly complex set of attributes that must be accounted for throughout the machine learning pipeline. This chapter focuses on highlighting such challenges, and is broken down into three distinct components, each representing a phase of the pipeline. We begin with attributes of the data accounted for during preprocessing, then move to considerations during model building, and end with challenges to the interpretation of model output. For each component, we present a discussion around data as it relates to the healthcare domain and offer insight into the challenges each may impose on the efficiency of machine learning techniques. Comment: Healthcare Informatics, Machine Learning, Knowledge Discovery: 20 Pages, 1 Figur

    Implementing Snow Load Monitoring to Control Reliability of a Stadium Roof

    This contribution shows how monitoring can be used to control the reliability of a structure that does not comply with the requirements of the Eurocodes. A general methodology to obtain cost-optimal decisions using limit state design, probabilistic reliability analysis and cost estimates is applied in a full-scale case study dealing with the roof of a stadium located in Northern Italy. The results demonstrate the potential of monitoring systems and probabilistic reliability analysis to support decisions regarding safety measures such as snow removal or temporary closure of the stadium.

    Outlier Detection Techniques For Wireless Sensor Networks: A Survey

    In the field of wireless sensor networks, measurements that significantly deviate from the normal pattern of sensed data are considered outliers. The potential sources of outliers include noise and errors, events, and malicious attacks on the network. Traditional outlier detection techniques are not directly applicable to wireless sensor networks due to the multivariate nature of sensor data and the specific requirements and limitations of wireless sensor networks. This survey provides a comprehensive overview of existing outlier detection techniques specifically developed for wireless sensor networks. Additionally, it presents a technique-based taxonomy and a decision tree to be used as a guideline for selecting a technique suitable for the application at hand, based on characteristics such as data type, outlier type, and outlier degree.