AMICO galaxy clusters in KiDS-DR3: sample properties and selection function
We present the first catalogue of galaxy cluster candidates derived from the
third data release of the Kilo Degree Survey (KiDS-DR3). The sample of clusters
has been produced using the Adaptive Matched Identifier of Clustered Objects
(AMICO) algorithm. In this analysis AMICO takes advantage of the luminosity and
spatial distribution of galaxies only, not considering colours. In this way, we
prevent any selection effect related to the presence or absence of the
red-sequence in the clusters. The catalogue contains 7988 candidate galaxy
clusters in the redshift range 0.1 < z < 0.8 down to S/N > 3.5 with a purity
approaching 95% over the entire redshift range. In addition to the catalogue of
galaxy clusters we also provide a catalogue of galaxies with their
probabilistic association to galaxy clusters. We quantify the sample purity,
completeness and the uncertainties of the detection properties, such as
richness, redshift, and position, by means of mock galaxy catalogues derived
directly from the data. This preserves their statistical properties including
photo-z uncertainties, unknown absorption across the survey, missing data,
spatial correlation of galaxies and galaxy clusters. Being based on the real
data, such mock catalogues do not have to rely on the assumptions on which
numerical simulations and semi-analytic models are based. This paper is the
first of a series of papers in which we discuss the details and physical
properties of the sample presented in this work.
Comment: 16 pages, 14 figures, 3 tables, submitted to MNRA
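As a rough illustration of the purity and completeness mentioned in the abstract above, the sketch below computes both from a list of matched pairs between detections and mock clusters. It is not the AMICO matching procedure: the function name, the inputs, and the assumption that matching has already been done are all hypothetical.

```python
def purity_completeness(n_detections, n_mock_clusters, matched_pairs):
    """Compute purity and completeness from pre-matched pairs.

    matched_pairs is a list of (detection_id, mock_cluster_id) tuples.
    How the pairs are matched is assumed here, not taken from the paper.
    """
    matched_detections = {det for det, _ in matched_pairs}
    matched_mocks = {mock for _, mock in matched_pairs}
    # Purity: fraction of detections that correspond to a mock cluster.
    purity = len(matched_detections) / n_detections
    # Completeness: fraction of mock clusters that were detected.
    completeness = len(matched_mocks) / n_mock_clusters
    return purity, completeness


# Hypothetical toy numbers: 4 detections, 5 mock clusters, 3 matches.
pairs = [(0, "a"), (1, "b"), (2, "c")]
print(purity_completeness(4, 5, pairs))  # (0.75, 0.6)
```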
Autonomous Cleaning of Corrupted Scanned Documents - A Generative Modeling Approach
We study the task of cleaning scanned text documents that are strongly
corrupted by dirt such as manual line strokes, spilled ink etc. We aim at
autonomously removing dirt from a single letter-size page based only on the
information the page contains. Our approach, therefore, has to learn character
representations without supervision and requires a mechanism to distinguish
learned representations from irregular patterns. To learn character
representations, we use a probabilistic generative model parameterizing pattern
features, feature variances, the features' planar arrangements, and pattern
frequencies. The latent variables of the model describe pattern class, pattern
position, and the presence or absence of individual pattern features. The model
parameters are optimized using a novel variational EM approximation. After
learning, the parameters represent, independently of their absolute position,
planar feature arrangements and their variances. A quality measure defined
based on the learned representation then allows for an autonomous
discrimination between regular character patterns and the irregular patterns
making up the dirt. The irregular patterns can thus be removed to clean the
document. For a full Latin alphabet we found that a single page does not
contain sufficiently many character examples. However, even if heavily
corrupted by dirt, we show that a page containing a lower number of character
types can efficiently and autonomously be cleaned solely based on the
structural regularity of the characters it contains. In different examples
using characters from different alphabets, we demonstrate generality of the
approach and discuss its implications for future developments.
Comment: oral presentation and Google Student Travel Award; IEEE conference on
Computer Vision and Pattern Recognition 201
Outlier detection techniques for wireless sensor networks: A survey
In the field of wireless sensor networks, measurements that significantly deviate from the normal pattern of sensed data are considered outliers. The potential sources of outliers include noise and errors, events, and malicious attacks on the network. Traditional outlier detection techniques are not directly applicable to wireless sensor networks due to the nature of sensor data and the specific requirements and limitations of such networks. This survey provides a comprehensive overview of existing outlier detection techniques specifically developed for wireless sensor networks. Additionally, it presents a technique-based taxonomy and a comparative table to be used as a guideline to select a technique suitable for the application at hand based on characteristics such as data type, outlier type, outlier identity, and outlier degree.
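To make the notion of a measurement that "significantly deviates from the normal pattern" concrete, here is a minimal sketch that flags readings using a modified z-score based on the median absolute deviation. This is a generic statistical rule chosen for illustration, not a technique taken from the survey; the function name, the threshold of 3.5, and the sample readings are assumptions.

```python
from statistics import median


def flag_outliers(readings, threshold=3.5):
    """Return indices of readings whose modified z-score exceeds threshold."""
    med = median(readings)
    # Median absolute deviation: a spread estimate that resists outliers.
    mad = median(abs(x - med) for x in readings)
    if mad == 0:
        # At least half the readings equal the median; flag anything else.
        return [i for i, x in enumerate(readings) if x != med]
    return [
        i for i, x in enumerate(readings)
        if 0.6745 * abs(x - med) / mad > threshold
    ]


# Hypothetical temperature readings from one sensor node.
temps = [21.1, 21.3, 20.9, 21.0, 35.7, 21.2, 21.1]
print(flag_outliers(temps))  # [4]
```

A rule like this treats each reading independently and ignores the spatial and temporal correlations between sensor nodes.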
Beyond Volume: The Impact of Complex Healthcare Data on the Machine Learning Pipeline
From medical charts to national census, healthcare has traditionally operated
under a paper-based paradigm. However, the past decade has marked a long and
arduous transformation bringing healthcare into the digital age. From
electronic health records to digitized imaging and laboratory reports to
public health datasets, healthcare now generates an incredible amount of
digital information. Such a wealth of data presents an exciting opportunity for
integrated machine learning solutions to address problems across multiple
facets of healthcare practice and administration. Unfortunately, the ability to
derive accurate and informative insights requires more than the ability to
execute machine learning models. Rather, a deeper understanding of the data on
which the models are run is imperative for their success. While a significant
effort has been undertaken to develop models able to process the volume of data
obtained during the analysis of millions of digitalized patient records, it is
important to remember that volume represents only one aspect of the data. In
fact, drawing on data from an increasingly diverse set of sources, healthcare
data presents an incredibly complex set of attributes that must be accounted
for throughout the machine learning pipeline. This chapter focuses on
highlighting such challenges, and is broken down into three distinct
components, each representing a phase of the pipeline. We begin with attributes
of the data accounted for during preprocessing, then move to considerations
during model building, and end with challenges to the interpretation of model
output. For each component, we present a discussion around data as it relates
to the healthcare domain and offer insight into the challenges each may impose
on the efficiency of machine learning techniques.
Comment: Healthcare Informatics, Machine Learning, Knowledge Discovery: 20
Pages, 1 Figur
Implementing Snow Load Monitoring to Control Reliability of a Stadium Roof
This contribution shows how monitoring can be
used to control the reliability of a structure that does not comply
with the requirements of the Eurocodes. A general
methodology to obtain cost-optimal decisions using limit
state design, probabilistic reliability analysis and cost
estimates is utilised in a full-scale case study dealing with
the roof of a stadium located in Northern Italy. The
results demonstrate the potential of monitoring systems
and probabilistic reliability analysis to support decisions
regarding safety measures such as snow removal, or
temporary closure of the stadium.
Outlier Detection Techniques For Wireless Sensor Networks: A Survey
In the field of wireless sensor networks, measurements that
significantly deviate from the normal pattern of sensed data are
considered as outliers. The potential sources of outliers include
noise and errors, events, and malicious attacks on the network.
Traditional outlier detection techniques are not directly
applicable to wireless sensor networks due to the multivariate
nature of sensor data and specific requirements and limitations of
the wireless sensor networks. This survey provides a comprehensive
overview of existing outlier detection techniques specifically
developed for the wireless sensor networks. Additionally, it
presents a technique-based taxonomy and a decision tree to be used
as a guideline to select a technique suitable for the application
at hand based on characteristics such as data type, outlier type, and
outlier degree.