Automated analysis of radar imagery of Venus: handling lack of ground truth
Lack of verifiable ground truth is a common problem in remote sensing image analysis. For example, consider the synthetic aperture radar (SAR) image data of Venus obtained by the Magellan spacecraft. Planetary scientists are interested in automatically cataloging the locations of all the small volcanoes in this data set; however, the task is very difficult and cannot be performed with perfect reliability even by human experts. Thus, training and evaluating the performance of an automatic algorithm on this data set must be handled carefully. We discuss the use of weighted free-response receiver-operating characteristics (wFROCs) for evaluating detection performance when the "ground truth" is subjective. In particular, we evaluate the relative detection performance of humans and automatic algorithms. Our experimental results indicate that proper assessment of the uncertainty in "ground truth" is essential in applications of this nature.
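The abstract does not spell out the wFROC construction, but the unweighted FROC idea makes the weighted variant easier to picture: sweep a score threshold over the candidate detections and, at each threshold, plot the fraction of reference volcanoes found against the mean number of false alarms per image; a weighted variant would additionally weight each hit by the experts' confidence in the corresponding label. The sketch below is a hypothetical Python illustration of the unweighted case, not the authors' implementation; `froc_points` and its inputs are invented names.

```python
import numpy as np

def froc_points(detections, n_truth, n_images):
    """Basic (unweighted) FROC sketch.

    detections: list of (score, is_hit) pairs, where is_hit says whether
      the detection matched a reference ("ground truth") volcano.
    n_truth: total number of reference volcanoes across all images.
    n_images: number of images scored.
    Returns (false_alarms_per_image, sensitivity) arrays, one point per
    detection, scanned in decreasing score order.
    """
    dets = sorted(detections, key=lambda d: -d[0])  # high scores first
    tps, fps, fa_rate, sens = 0, 0, [], []
    for score, is_hit in dets:
        if is_hit:
            tps += 1
        else:
            fps += 1
        fa_rate.append(fps / n_images)
        sens.append(tps / n_truth)
    return np.array(fa_rate), np.array(sens)

# Example: 4 detections over 2 images with 3 reference volcanoes.
fa, tpr = froc_points([(0.9, True), (0.8, False), (0.6, True), (0.2, False)],
                      n_truth=3, n_images=2)
print(list(zip(fa, tpr)))  # [(0.0, 1/3), (0.5, 1/3), (0.5, 2/3), (1.0, 2/3)]
```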
Attention focussing and anomaly detection in real-time systems monitoring
In real-time monitoring situations, more information is not necessarily better. When faced with complex emergency situations, operators can experience information overload that compromises their ability to react quickly and correctly. We describe an approach to focusing operator attention in real-time systems monitoring, based on a set of empirical and model-based measures for determining the relative importance of sensor data.
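The abstract does not define the importance measures themselves. As a hedged illustration of one plausible model-based measure, the sketch below scores each sensor by how far its reading deviates from a model's prediction, in units of expected noise, and ranks sensors so the most anomalous are surfaced to the operator first; all names and values are invented.

```python
import numpy as np

def attention_ranking(readings, predicted, noise_std):
    """Rank sensors by a simple model-based surprise score:
    how many standard deviations each reading deviates from the
    model's prediction. Higher = more worthy of operator attention.
    (Hypothetical stand-in for the paper's importance measures.)
    """
    scores = np.abs(readings - predicted) / noise_std
    order = np.argsort(-scores)  # most anomalous sensor index first
    return order, scores

readings  = np.array([101.2, 7.9, 430.0, 0.48])
predicted = np.array([100.0, 8.0, 390.0, 0.50])
noise_std = np.array([  2.0, 0.5,  10.0, 0.05])
order, scores = attention_ranking(readings, predicted, noise_std)
print(order)  # [2 0 3 1]: sensor 2 deviates by 4 sigma, flag it first
```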
A new generation of intelligent trainable tools for analyzing large scientific image databases
The focus of this paper is on the detection of natural, as opposed to human-made, objects. The distinction is important because, in the context of image analysis, natural objects tend to possess much greater variability in appearance than human-made objects. Hence, we shall focus primarily on the use of algorithms that 'learn by example' as the basis for image exploration. The 'learn by example' approach is potentially more widely applicable than model-based vision methods, since domain scientists find it easier to provide examples of what they are searching for than to describe a model.
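As a minimal sketch of the 'learn by example' idea (not the paper's actual system): a scientist labels a few image patches as object or background, each patch is flattened to a feature vector, and a generic classifier generalizes from those examples without any hand-written appearance model. The patch size, labels, and classifier choice below are assumptions for illustration.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Hypothetical learn-by-example setup: each training example is an
# image patch flattened to a feature vector, labeled by a scientist.
rng = np.random.default_rng(0)
object_patches     = rng.normal(0.8, 0.1, size=(20, 25))  # 5x5 patches, bright
background_patches = rng.normal(0.2, 0.1, size=(20, 25))  # 5x5 patches, dark

X = np.vstack([object_patches, background_patches])
y = np.array([1] * 20 + [0] * 20)  # 1 = object of interest, 0 = background

clf = KNeighborsClassifier(n_neighbors=3).fit(X, y)

# New candidate patches are classified by similarity to the labeled
# examples; no explicit geometric model of the object is written down.
candidates = rng.normal(0.75, 0.1, size=(3, 25))
print(clf.predict(candidates))  # likely [1 1 1]
```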
Using machine learning techniques to automate sky survey catalog generation
We describe the application of machine classification techniques to the development of an automated tool for the reduction of a large scientific data set. The 2nd Palomar Observatory Sky Survey provides comprehensive photographic coverage of the northern celestial hemisphere. The photographic plates are being digitized into images containing on the order of 10^7 galaxies and 10^8 stars. Since the size of this data set precludes manual analysis and classification of objects, our approach is to develop a software system which integrates independently developed techniques for image processing and data classification. Image processing routines are applied to identify and measure features of sky objects. Selected features are used to determine the classification of each object. GID3* and O-BTree, two inductive learning techniques, are used to automatically learn classification decision trees from examples. We describe the techniques used, the details of our specific application, and the initial encouraging results, which indicate that our approach is well suited to the problem. The benefits of the approach are increased data reduction throughput, consistency of classification, and the automated derivation of classification rules that will form an objective, examinable basis for classifying sky objects. Furthermore, astronomers will be freed from the tedium of an intensely visual task to pursue more challenging analysis and interpretation problems given automatically cataloged data.
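GID3* and O-BTree are not readily available today, so the sketch below uses scikit-learn's standard CART decision tree as a stand-in to show the same workflow: measured features in, an explicit and examinable classification tree out. The feature names and values are invented for illustration.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier, export_text

# Invented feature vectors: [area, ellipticity, core brightness fraction].
# In the real pipeline these come from image-processing measurements.
X = np.array([
    [ 5.0, 0.05, 0.90],   # compact, round, sharp core -> star
    [ 4.0, 0.10, 0.85],   # star
    [30.0, 0.40, 0.30],   # extended, elongated, diffuse -> galaxy
    [25.0, 0.55, 0.25],   # galaxy
    [ 6.0, 0.08, 0.88],   # star
    [40.0, 0.35, 0.20],   # galaxy
])
y = ["star", "star", "galaxy", "galaxy", "star", "galaxy"]

tree = DecisionTreeClassifier(max_depth=2).fit(X, y)

# The learned rules are explicit and examinable, which is the property
# the paper highlights as a basis for objective classification.
print(export_text(tree, feature_names=["area", "ellipticity", "core_frac"]))
print(tree.predict([[28.0, 0.45, 0.28]]))  # -> ['galaxy']
```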
Problem Formulation and Fairness
Formulating data science problems is an uncertain and difficult process. It requires various forms of discretionary work to translate high-level objectives or strategic goals into tractable problems, necessitating, among other things, the identification of appropriate target variables and proxies. While these choices are rarely self-evident, normative assessments of data science projects often take them for granted, even though different translations can raise profoundly different ethical concerns. Whether we consider a data science project fair often has as much to do with the formulation of the problem as any property of the resulting model. Building on six months of ethnographic fieldwork with a corporate data science team, and channeling ideas from sociology and history of science, critical data studies, and early writing on knowledge discovery in databases, we describe the complex set of actors and activities involved in problem formulation. Our research demonstrates that the specification and operationalization of the problem are always negotiated and elastic, and rarely worked out with explicit normative considerations in mind. In so doing, we show that careful accounts of everyday data science work can help us better understand how and why data science problems are posed in certain ways, and why specific formulations prevail in practice, even in the face of what might seem like normatively preferable alternatives. We conclude by discussing the implications of our findings, arguing that effective normative interventions will require attending to the practical work of problem formulation.
Conference on Fairness, Accountability, and Transparency (FAT* '19), January 29-31, 2019, Atlanta, GA, US
On the Handling of Continuous-Valued Attributes in Decision Tree Generation
We present a result applicable to classification learning algorithms that generate decision trees or rules using the information entropy minimization heuristic for discretizing continuous-valued attributes. The result serves to give a better understanding of the entropy measure, to point out that the behavior of the information entropy heuristic possesses desirable properties that justify its usage in a formal sense, and to improve the efficiency of evaluating continuous-valued attributes for cut value selection. Along with the formal proof, we present empirical results that demonstrate the theoretically expected reduction in evaluation effort for training data sets from real-world domains.
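The practical consequence of the result is that, when choosing a cut value for a continuous attribute, only "boundary points" (values lying between two adjacent examples with different class labels) need to be evaluated, since the entropy-minimizing cut always falls on one of them. The following is a minimal single-attribute sketch of that evaluation, not the paper's code:

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy of a class-label multiset, in bits."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def best_cut(values, labels):
    """Pick the cut minimizing the size-weighted entropy of the two sides.
    Per the paper's result, only boundary points (adjacent examples with
    different labels) need to be evaluated, not every midpoint.
    """
    pairs = sorted(zip(values, labels))
    n, best = len(pairs), (float("inf"), None)
    for i in range(1, n):
        if pairs[i - 1][1] == pairs[i][1]:
            continue                      # not a boundary point: skip
        cut = (pairs[i - 1][0] + pairs[i][0]) / 2
        left  = [l for _, l in pairs[:i]]
        right = [l for _, l in pairs[i:]]
        e = (len(left) * entropy(left) + len(right) * entropy(right)) / n
        best = min(best, (e, cut))
    return best  # (weighted entropy, cut value)

print(best_cut([1.0, 2.0, 3.0, 8.0, 9.0], ["a", "a", "a", "b", "b"]))
# -> (0.0, 5.5): the single boundary between the two class runs
```

Skipping the non-boundary midpoints in the loop above is the source of the reduction in evaluation effort that the abstract says the paper demonstrates empirically.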