28 research outputs found

    Automated analysis of radar imagery of Venus: handling lack of ground truth

    Get PDF
    Lack of verifiable ground truth is a common problem in remote sensing image analysis. For example, consider the synthetic aperture radar (SAR) image data of Venus obtained by the Magellan spacecraft. Planetary scientists are interested in automatically cataloging the locations of all the small volcanoes in this data set; however, the problem is very difficult and cannot be performed with perfect reliability even by human experts. Thus, training and evaluating the performance of an automatic algorithm on this data set must be handled carefully. We discuss the use of weighted free-response receiver-operating characteristics (wFROCs) for evaluating detection performance when the “ground truth” is subjective. In particular, we evaluate the relative detection performance of humans and automatic algorithms. Our experimental results indicate that proper assessment of the uncertainty in “ground truth” is essential in applications of this nature.
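
    As a rough illustration of the idea (a sketch, not the paper's actual procedure), the code below computes wFROC operating points for one image: as the score threshold is swept, each retained detection is matched to the nearest unclaimed ground-truth location within a fixed radius, and each matched truth contributes its subjective label weight. The function name, greedy matching rule, and radius parameter are all illustrative assumptions.

        import numpy as np

        def wfroc_points(detections, truths, radius=5.0):
            """Sketch of weighted FROC operating points for one image.

            detections: list of (x, y, score) candidate detections.
            truths:     list of (x, y, weight) subjective ground-truth
                        locations, where weight encodes label confidence
                        (e.g. degree of expert consensus).
            Returns (false-alarm counts, weighted true-positive fractions)
            as the score threshold is swept from high to low.
            """
            total_weight = sum(w for _, _, w in truths) or 1.0
            thresholds = sorted({s for _, _, s in detections}, reverse=True)
            fa_counts, wtp_fracs = [], []
            for t in thresholds:
                kept = [(x, y) for x, y, s in detections if s >= t]
                claimed, matched_weight, false_alarms = set(), 0.0, 0
                for x, y in kept:
                    # Greedily match to the nearest unclaimed truth in range.
                    best, best_d = None, radius
                    for i, (tx, ty, w) in enumerate(truths):
                        d = np.hypot(x - tx, y - ty)
                        if i not in claimed and d <= best_d:
                            best, best_d = i, d
                    if best is None:
                        false_alarms += 1
                    else:
                        claimed.add(best)
                        matched_weight += truths[best][2]
                fa_counts.append(false_alarms)
                wtp_fracs.append(matched_weight / total_weight)
            return np.array(fa_counts), np.array(wtp_fracs)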

    Attention focussing and anomaly detection in real-time systems monitoring

    Get PDF
    In real-time monitoring situations, more information is not necessarily better. When faced with complex emergency situations, operators can experience information overload that compromises their ability to react quickly and correctly. We describe an approach to focusing operator attention in real-time systems monitoring, based on a set of empirical and model-based measures for determining the relative importance of sensor data.
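
    A minimal sketch of how such measures might be combined, assuming a simple scheme in which each channel's "surprise" (deviation from a model-based prediction, normalized by its empirical variability) is scaled by a hand-assigned importance weight; the scoring formula and names are illustrative assumptions, not the paper's specific measures.

        import numpy as np

        def attention_ranking(readings, predicted, history_std, weight):
            """Rank sensor channels for operator attention (a sketch).

            readings:    current value per channel
            predicted:   model-based expected value per channel
            history_std: empirical standard deviation per channel
            weight:      hand-assigned per-channel importance
            Returns channel indices, most attention-worthy first.
            """
            surprise = np.abs(readings - predicted) / (history_std + 1e-9)
            return np.argsort(surprise * weight)[::-1]

        # Example: channel 2 is both surprising and heavily weighted,
        # so it is ranked first for the operator's attention.
        rank = attention_ranking(np.array([1.0, 5.2, 9.9]),
                                 np.array([1.0, 5.0, 3.0]),
                                 np.array([0.5, 0.5, 0.5]),
                                 np.array([1.0, 1.0, 2.0]))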

    A new generation of intelligent trainable tools for analyzing large scientific image databases

    Get PDF
    The focus of this paper is on the detection of natural, as opposed to human-made, objects. The distinction is important because, in the context of image analysis, natural objects tend to possess much greater variability in appearance than human-made objects. Hence, we shall focus primarily on the use of algorithms that 'learn by example' as the basis for image exploration. The 'learn by example' approach is potentially more widely applicable than model-based vision methods, since domain scientists find it easier to provide examples of what they are searching for than to describe a model explicitly.
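
    The learn-by-example workflow can be sketched as follows: a generic off-the-shelf classifier (a stand-in; the paper does not commit to this particular learner) is trained on expert-labeled example patches and then scores every window of a new image. All data, window sizes, and the classifier choice below are placeholder assumptions.

        import numpy as np
        from sklearn.ensemble import RandomForestClassifier

        rng = np.random.default_rng(0)

        # Placeholder stand-ins for expert-labeled 15x15 example patches:
        # label 1 = object of interest, label 0 = background.
        X_examples = rng.normal(size=(200, 15 * 15))
        y_examples = rng.integers(0, 2, size=200)

        clf = RandomForestClassifier(n_estimators=100, random_state=0)
        clf.fit(X_examples, y_examples)

        # Score every 15x15 window of a new image; high-probability
        # windows become candidate objects for the scientist to review.
        image = rng.normal(size=(128, 128))
        windows = [image[r:r + 15, c:c + 15].ravel()
                   for r in range(114) for c in range(114)]
        scores = clf.predict_proba(np.array(windows))[:, 1]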

    Using machine learning techniques to automate sky survey catalog generation

    Get PDF
    We describe the application of machine classification techniques to the development of an automated tool for the reduction of a large scientific data set. The 2nd Palomar Observatory Sky Survey provides comprehensive photographic coverage of the northern celestial hemisphere. The photographic plates are being digitized into images containing on the order of 10^7 galaxies and 10^8 stars. Since the size of this data set precludes manual analysis and classification of objects, our approach is to develop a software system which integrates independently developed techniques for image processing and data classification. Image processing routines are applied to identify and measure features of sky objects. Selected features are used to determine the classification of each object. GID3* and O-BTree, two inductive learning techniques, are used to automatically learn classification decision trees from examples. We describe the techniques used, the details of our specific application, and the initial encouraging results, which indicate that our approach is well-suited to the problem. The benefits of the approach are increased data reduction throughput, consistency of classification, and the automated derivation of classification rules that will form an objective, examinable basis for classifying sky objects. Furthermore, astronomers will be freed from the tedium of an intensely visual task to pursue more challenging analysis and interpretation problems, given automatically cataloged data.
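
    GID3* and O-BTree are not packaged in common libraries, but the shape of the pipeline can be sketched with scikit-learn's entropy-based decision tree as a stand-in learner; the feature names and data below are synthetic placeholders for measured plate-object features.

        import numpy as np
        from sklearn.tree import DecisionTreeClassifier, export_text

        # Hypothetical per-object features of the kind a plate catalog
        # might measure; values here are synthetic placeholders.
        feature_names = ["area", "magnitude", "ellipticity", "moment_ratio"]
        rng = np.random.default_rng(42)
        X = rng.normal(size=(500, 4))
        y = rng.integers(0, 2, size=500)   # 0 = star, 1 = galaxy

        tree = DecisionTreeClassifier(criterion="entropy", max_depth=4)
        tree.fit(X, y)

        # The learned tree doubles as an examinable set of rules, which is
        # the kind of objective classification basis the abstract describes.
        print(export_text(tree, feature_names=feature_names))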

    Problem Formulation and Fairness

    Full text link
    Formulating data science problems is an uncertain and difficult process. It requires various forms of discretionary work to translate high-level objectives or strategic goals into tractable problems, necessitating, among other things, the identification of appropriate target variables and proxies. While these choices are rarely self-evident, normative assessments of data science projects often take them for granted, even though different translations can raise profoundly different ethical concerns. Whether we consider a data science project fair often has as much to do with the formulation of the problem as with any property of the resulting model. Building on six months of ethnographic fieldwork with a corporate data science team, and channeling ideas from sociology and history of science, critical data studies, and early writing on knowledge discovery in databases, we describe the complex set of actors and activities involved in problem formulation. Our research demonstrates that the specification and operationalization of the problem are always negotiated and elastic, and rarely worked out with explicit normative considerations in mind. In so doing, we show that careful accounts of everyday data science work can help us better understand how and why data science problems are posed in certain ways, and why specific formulations prevail in practice, even in the face of what might seem like normatively preferable alternatives. We conclude by discussing the implications of our findings, arguing that effective normative interventions will require attending to the practical work of problem formulation.
    Comment: Conference on Fairness, Accountability, and Transparency (FAT* '19), January 29-31, 2019, Atlanta, GA, US

    On the Handling of Continuous-Valued Attributes in Decision Tree Generation

    Full text link
    We present a result applicable to classification learning algorithms that generate decision trees or rules using the information entropy minimization heuristic for discretizing continuous-valued attributes. The result serves to give a better understanding of the entropy measure, to point out that the information entropy heuristic possesses desirable properties that justify its usage in a formal sense, and to improve the efficiency of evaluating continuous-valued attributes for cut value selection. Along with the formal proof, we present empirical results that demonstrate the theoretically expected reduction in evaluation effort for training data sets from real-world domains.
    Peer reviewed. http://deepblue.lib.umich.edu/bitstream/2027.42/46964/1/10994_2004_Article_422458.pd
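
    A minimal sketch of the result's practical payoff: because entropy-minimizing cuts can only fall on boundary points (midpoints between adjacent sorted examples belonging to different classes), a cut-selection routine only needs to evaluate those candidates. The function below is an illustrative reconstruction, not the paper's code.

        import numpy as np
        from collections import Counter

        def entropy(labels):
            counts = np.array(list(Counter(labels).values()), dtype=float)
            p = counts / counts.sum()
            return -np.sum(p * np.log2(p))

        def best_cut(values, labels):
            """Entropy-minimizing cut for one continuous attribute,
            evaluating only boundary points per the paper's theorem."""
            order = np.argsort(values)
            v = np.asarray(values, dtype=float)[order]
            y = np.asarray(labels)[order]
            n = len(v)
            best_e, best_cut_value = np.inf, None
            for i in range(n - 1):
                # Skip non-boundary points: same class on both sides,
                # or equal values that no threshold can separate.
                if y[i] == y[i + 1] or v[i] == v[i + 1]:
                    continue
                cut = (v[i] + v[i + 1]) / 2.0
                e = ((i + 1) / n) * entropy(y[:i + 1]) \
                    + ((n - i - 1) / n) * entropy(y[i + 1:])
                if e < best_e:
                    best_e, best_cut_value = e, cut
            return best_cut_value, best_e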

    Advances in knowledge discovery and data mining

    No full text