Efficient Optimization of Performance Measures by Classifier Adaptation

Author: Li Nan
Tsang Ivor W.
Zhou Zhi-Hua
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 02/08/2012

In practical applications, machine learning algorithms are often needed to learn classifiers that optimize domain-specific performance measures. Previous research has focused on learning the needed classifier in isolation, yet learning a nonlinear classifier for nonlinear and nonsmooth performance measures is still hard. In this paper, rather than learning the needed classifier by optimizing a specific performance measure directly, we circumvent this problem by proposing a novel two-step approach called CAPO: first train nonlinear auxiliary classifiers with existing learning methods, and then adapt the auxiliary classifiers for specific performance measures. In the first step, auxiliary classifiers can be obtained efficiently with off-the-shelf learning algorithms. For the second step, we show that the classifier adaptation problem can be reduced to a quadratic programming problem, which is similar to linear SVMperf and can be solved efficiently. By exploiting nonlinear auxiliary classifiers, CAPO can generate a nonlinear classifier that optimizes a large variety of performance measures, including all performance measures based on the contingency table and AUC, while keeping high computational efficiency. Empirical studies show that CAPO is effective and computationally efficient, and that it is even more efficient than linear SVMperf.

Comment: 30 pages, 5 figures, to appear in IEEE Transactions on Pattern Analysis and Machine Intelligence, 201

arXiv.org e-Print Archive

CiteSeerX

Crossref

OPUS - University of Technology Sydney

Sensemaking Practices in the Everyday Work of AI/ML Software Engineering

Author: Book Matthias
Charlotte
Jantunen Sami
Sharp H.
Washizaki H.
Publication venue: eScholarship, University of California
Publication date: 27/06/2020

This paper considers sensemaking as it relates to everyday software engineering (SE) work practices and draws on a multi-year ethnographic study of SE projects at a large, global technology company building digital services infused with artificial intelligence (AI) and machine learning (ML) capabilities. Our findings highlight the breadth of sensemaking practices in AI/ML projects, noting developers' efforts to make sense of AI/ML environments (e.g., algorithms/methods and libraries), of AI/ML model ecosystems (e.g., pre-trained models and "upstream" models), and of business-AI relations (e.g., how the AI/ML service relates to the domain context and business problem at hand). This paper builds on recent scholarship drawing attention to the integral role of sensemaking in everyday SE practices by empirically investigating how and in what ways AI/ML projects present software teams with emergent sensemaking requirements and opportunities.

Crossref

eScholarship - University of California

Remote sensing for Mapping TSM concentration in Mahakam Delta: an analytical approach

Author: Budhiman S.
Hobma T. W.
Vekerdy Zoltán
Publication date: 01/01/2004

The Indonesian coastal zones have always been under heavy pressure, including from fisheries, oil industries and sea transportation. These activities carry a substantial risk of damaging the environment and destroying marine resources, leading to the need for an integrated management approach based on an environmental information system that is comprehensive and multi-disciplinary in nature. The Mahakam Delta has the same general problems as other coastal regions in Indonesia. The method is based on bio-optical modeling. The forward water analysis comprised laboratory measurements of water quality (TSM and Chl) and Inherent Optical Properties (IOPs) to derive Specific Inherent Optical Properties (SIOPs). The SIOPs (of water, TSM, Chl and CDOM) and the coefficients f and B were used to develop the R(0-) model. The inverse atmosphere analysis comprised the image preprocessing (i.e., geometric correction, atmospheric correction, air-water interface correction). The last step is the inverse water analysis, which comprised the development of the algorithm and the image processing to produce TSM concentration maps. The spectrometer measurements collected in the field were used to obtain the subsurface irradiance reflectance. The subsurface irradiance reflectance R(0-) is the ratio of upwelling (Ewu) to downwelling (Ewd) irradiance just beneath the water surface. There are some discrepancies between the R(0-) model and the R(0-) measured in the field, especially in the blue and NIR regions. The discrepancies could be because the Q factor (the angular distribution factor of spectral radiance) is still not completely understood. The model is very sensitive to a decrease in the proportional factor f and to an increase in the backscattering probability B. The results indicate that the red band of the satellite sensor is sensitive to higher TSM concentrations. For the Mahakam Delta, the red band algorithm was used to derive the TSM map, since higher TSM concentrations occurred in the delta.
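
The reflectance definition given in this abstract can be written compactly. The symbols follow the abstract (Ewu, Ewd); the explicit wavelength argument λ is an assumption added here for clarity, since the abstract discusses band-dependent behaviour (blue, red, NIR):

```latex
R(0^-,\lambda) = \frac{E_{wu}(\lambda)}{E_{wd}(\lambda)}
```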

Repository of the Academy's Library

University of Twente Research Information

Expert-augmented machine learning.

Author: Auerbach Andrew
Delgado Elier
Eaton Eric
Friedman Jerome H
Gennatas Efstathios D
Interian Yannet
Luna José Marcio
Pirracchio Romain
Reichmann Lara G
Simone Charles B
Solberg Timothy D
Ungar Lyle H
Valdes Gilmer
van der Laan Mark J
Publication venue: eScholarship, University of California
Publication date: 01/03/2020

Machine learning is proving invaluable across disciplines. However, its success is often limited by the quality and quantity of available data, while its adoption is limited by the level of trust afforded by given models. Human vs. machine performance is commonly compared empirically to decide whether a certain task should be performed by a computer or an expert. In reality, the optimal learning strategy may involve combining the complementary strengths of humans and machines. Here, we present expert-augmented machine learning (EAML), an automated method that guides the extraction of expert knowledge and its integration into machine-learned models. We used a large dataset of intensive-care patient data to derive 126 decision rules that predict hospital mortality. Using an online platform, we asked 15 clinicians to assess the relative risk of the subpopulation defined by each rule compared to the total sample. We compared the clinician-assessed risk to the empirical risk and found that, while clinicians agreed with the data in most cases, there were notable exceptions where they overestimated or underestimated the true risk. Studying the rules with greatest disagreement, we identified problems with the training data, including one miscoded variable and one hidden confounder. Filtering the rules based on the extent of disagreement between clinician-assessed risk and empirical risk, we improved performance on out-of-sample data and were able to train with less data. EAML provides a platform for automated creation of problem-specific priors, which help build robust and dependable machine-learning models in critical applications

eScholarship - University of California

Statistical Sources of Variable Selection Bias in Classification Tree Algorithms Based on the Gini Index

Author: Strobl Carolin
Publication date: 01/01/2005

Evidence for variable selection bias in classification tree algorithms based on the Gini Index is reviewed from the literature and embedded into a broader explanatory scheme: Variable selection bias in classification tree algorithms based on the Gini Index can be caused not only by the statistical effect of multiple comparisons, but also by an increasing estimation bias and variance of the splitting criterion when plug-in estimates of entropy measures like the Gini Index are employed. The relevance of these sources of variable selection bias in the different simulation study designs is examined. Variable selection bias due to the explored sources applies to all classification tree algorithms based on empirical entropy measures like the Gini Index, Deviance and Information Gain, and to both binary and multiway splitting algorithms
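
One of the sources named in this abstract, the estimation bias of the plug-in Gini Index, can be illustrated with a small simulation. The sketch below is not taken from the paper: the function names, class probabilities, sample sizes and repetition count are all illustrative assumptions. It relies only on the standard result that, for multinomial sampling, the plug-in estimate has expectation (n - 1)/n times the true Gini Index, so small nodes systematically look purer than they are.

```python
import random


def gini(proportions):
    """Gini Index 1 - sum(p_k^2) for a vector of class proportions."""
    return 1.0 - sum(p * p for p in proportions)


def plug_in_gini(labels, n_classes):
    """Plug-in estimate: insert observed class frequencies into the Gini formula."""
    n = len(labels)
    counts = [0] * n_classes
    for y in labels:
        counts[y] += 1
    return gini([c / n for c in counts])


def mean_plug_in_gini(true_probs, n, n_reps, seed=0):
    """Average the plug-in estimate over n_reps simulated samples of size n."""
    rng = random.Random(seed)
    classes = list(range(len(true_probs)))
    total = 0.0
    for _ in range(n_reps):
        sample = rng.choices(classes, weights=true_probs, k=n)
        total += plug_in_gini(sample, len(true_probs))
    return total / n_reps


if __name__ == "__main__":
    true_probs = [0.5, 0.5]  # hypothetical two-class node
    true_gini = gini(true_probs)
    for n in (5, 20, 100):  # hypothetical node sizes
        simulated = mean_plug_in_gini(true_probs, n, n_reps=20000)
        theoretical = (n - 1) / n * true_gini
        print(f"n={n:3d}  simulated={simulated:.4f}  "
              f"theoretical={theoretical:.4f}  true={true_gini:.4f}")
```

The simulated mean should sit below the true value and approach it as n grows. The sketch covers only the bias component; the variance and multiple-comparison effects mentioned in the abstract are not modelled.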

Open Access LMU

Expert-Augmented Machine Learning

Author: Auerbach A.
Delgado E.
Eaton E.
Friedman J. H.
Gennatas E. D.
Interian Y.
Pirracchio R.
Reichman L.
Simone C. B.
Solberg T. D.
Ungar L. H.
Valdes G.
Van der Laan M. J.
Publication date: 05/01/2021

Machine Learning is proving invaluable across disciplines. However, its success is often limited by the quality and quantity of available data, while its adoption is limited by the level of trust that models afford users. Human vs. machine performance is commonly compared empirically to decide whether a certain task should be performed by a computer or an expert. In reality, the optimal learning strategy may involve combining the complementary strengths of man and machine. Here we present Expert-Augmented Machine Learning (EAML), an automated method that guides the extraction of expert knowledge and its integration into machine-learned models. We use a large dataset of intensive care patient data to predict mortality and show that we can extract expert knowledge using an online platform, help reveal hidden confounders, improve generalizability on a different population and learn using less data. EAML presents a novel framework for high-performance and dependable machine learning in critical applications.

arXiv.org e-Print Archive