Efficient Optimization of Performance Measures by Classifier Adaptation
In practical applications, machine learning algorithms are often needed to
learn classifiers that optimize domain-specific performance measures.
Previous research has focused on learning the needed classifier in
isolation, yet learning a nonlinear classifier for nonlinear and nonsmooth
performance measures is still hard. In this paper, rather than learning the
needed classifier by optimizing a specific performance measure directly, we
circumvent this problem by proposing a novel two-step approach called CAPO:
first train nonlinear auxiliary classifiers with existing learning
methods, and then adapt the auxiliary classifiers to specific performance
measures. In the first step, auxiliary classifiers can be obtained efficiently
with off-the-shelf learning algorithms. For the second step, we show that
the classifier adaptation problem can be reduced to a quadratic programming
problem, which is similar to linear SVMperf and can be solved efficiently. By
exploiting nonlinear auxiliary classifiers, CAPO can generate nonlinear
classifiers that optimize a large variety of performance measures, including
all performance measures based on the contingency table and AUC, while
keeping high computational efficiency. Empirical studies show that CAPO is
effective and computationally efficient, and that it is even more efficient
than linear SVMperf.
Comment: 30 pages, 5 figures, to appear in IEEE Transactions on Pattern
Analysis and Machine Intelligence, 201
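As a rough illustration of the two families of measures named in this abstract, the sketch below computes F1 from contingency-table counts and AUC from classifier scores. It is not CAPO itself and does not reproduce its quadratic program; the function names are assumptions made for the example.

```python
from itertools import product


def contingency_table(labels, predictions):
    """Return (tp, fp, fn, tn) for binary labels and predictions in {0, 1}."""
    pairs = list(zip(labels, predictions))
    tp = sum(1 for y, p in pairs if y == 1 and p == 1)
    fp = sum(1 for y, p in pairs if y == 0 and p == 1)
    fn = sum(1 for y, p in pairs if y == 1 and p == 0)
    tn = sum(1 for y, p in pairs if y == 0 and p == 0)
    return tp, fp, fn, tn


def f1_score(tp, fp, fn):
    """F1 expressed directly in contingency-table counts."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties count as half a correct pair. Quadratic in the sample size,
    which is acceptable for a small illustration.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p, n in product(pos, neg)
    )
    return wins / (len(pos) * len(neg))
```

F1 depends on the scores only through thresholded predictions, while AUC depends on their ranking, which is why the two are usually treated as separate cases.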
Sensemaking Practices in the Everyday Work of AI/ML Software Engineering
This paper considers sensemaking as it relates to everyday software engineering (SE) work practices and draws on a multi-year ethnographic study of SE projects at a large, global technology company building digital services infused with artificial intelligence (AI) and machine learning (ML) capabilities. Our findings highlight the breadth of sensemaking practices in AI/ML projects, noting developers' efforts to make sense of AI/ML environments (e.g., algorithms/methods and libraries), of AI/ML model ecosystems (e.g., pre-trained models and "upstream" models), and of business-AI relations (e.g., how the AI/ML service relates to the domain context and business problem at hand). This paper builds on recent scholarship drawing attention to the integral role of sensemaking in everyday SE practices by empirically investigating how and in what ways AI/ML projects present software teams with emergent sensemaking requirements and opportunities.
Remote sensing for Mapping TSM concentration in Mahakam Delta: an analytical approach
The Indonesian coastal zones have always been under heavy pressure, including from
fisheries, oil industries and sea transportation. The presence of these activities carries a large
risk of damaging the environment and destroying marine resources,
leading to the need for an integrated management approach based on an environmental
information system that is comprehensive and multi-disciplinary in nature. The Mahakam Delta
has the same general problems as other coastal regions in Indonesia. The method is based on
bio-optical modeling. The forward water analysis comprised laboratory measurements of water
quality (TSM and Chl) and Inherent Optical Properties (IOPs) to derive Specific Inherent Optical
Properties (SIOPs). The SIOPs (of water, TSM, Chl and CDOM) and the coefficients f and B were used to
develop the R(0-) model. The inverse atmosphere analysis comprised image preprocessing (i.e.
geometric correction, atmospheric correction, air-water interface correction). The last step is
inverse water analysis, which comprised algorithm development and image processing to
produce TSM concentration maps. The spectrometer measurements collected in the field were
used to obtain the subsurface irradiance reflectance. The subsurface irradiance reflectance
R(0-) is the ratio of upwelling (Ewu) to downwelling irradiance (Ewd) just beneath the water
surface. There are some discrepancies between the modeled R(0-) and the R(0-) measured in the
field, especially in the blue and NIR regions. The discrepancies could be due
to the fact that the Q factor (the angular distribution factor of spectral radiance) is still not
completely understood. The model is very sensitive to a decrease in the proportional factor
f and to an increase in the backscattering probability B. The results indicate that the red band of
the satellite sensor is sensitive to higher TSM concentrations. For the Mahakam Delta, a red-band
algorithm was used to derive the TSM map, since higher TSM concentrations occurred in the delta.
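The definition of R(0-) given in this abstract can be sketched in a few lines. The first function follows the stated ratio of upwelling to downwelling irradiance. The second uses a commonly cited form, R(0-) = f * b_b / (a + b_b) with b_b = B * b; that form, and the parameter names a (absorption) and b (scattering), are assumptions for illustration and are not stated in the abstract.

```python
def subsurface_reflectance(e_wu, e_wd):
    """Measured R(0-) per wavelength: upwelling over downwelling irradiance.

    e_wu and e_wd are equal-length sequences of irradiance values taken
    just beneath the water surface, one value per wavelength.
    """
    if len(e_wu) != len(e_wd):
        raise ValueError("e_wu and e_wd must have the same length")
    return [up / down for up, down in zip(e_wu, e_wd)]


def modeled_reflectance(f, a, b, backscatter_probability):
    """Assumed model form: R(0-) = f * b_b / (a + b_b), with b_b = B * b.

    f is the proportional factor, a the total absorption coefficient,
    b the total scattering coefficient, and B the backscattering
    probability. This form is an assumption, not taken from the abstract.
    """
    b_b = backscatter_probability * b
    return f * b_b / (a + b_b)
```

Under this assumed form the modeled reflectance rises with both f and B, which gives one way to probe the sensitivity to those two parameters noted in the abstract.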
Expert-augmented machine learning.
Machine learning is proving invaluable across disciplines. However, its success is often limited by the quality and quantity of available data, while its adoption is limited by the level of trust afforded by given models. Human vs. machine performance is commonly compared empirically to decide whether a certain task should be performed by a computer or an expert. In reality, the optimal learning strategy may involve combining the complementary strengths of humans and machines. Here, we present expert-augmented machine learning (EAML), an automated method that guides the extraction of expert knowledge and its integration into machine-learned models. We used a large dataset of intensive-care patient data to derive 126 decision rules that predict hospital mortality. Using an online platform, we asked 15 clinicians to assess the relative risk of the subpopulation defined by each rule compared to the total sample. We compared the clinician-assessed risk to the empirical risk and found that, while clinicians agreed with the data in most cases, there were notable exceptions where they overestimated or underestimated the true risk. Studying the rules with greatest disagreement, we identified problems with the training data, including one miscoded variable and one hidden confounder. Filtering the rules based on the extent of disagreement between clinician-assessed risk and empirical risk, we improved performance on out-of-sample data and were able to train with less data. EAML provides a platform for automated creation of problem-specific priors, which help build robust and dependable machine-learning models in critical applications
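The rule-filtering step described in this abstract might look roughly like the sketch below. The abstract does not say how disagreement is measured or which rules are kept, so the absolute difference between the two risk values, the choice to keep low-disagreement rules, and all names here are assumptions.

```python
from dataclasses import dataclass


@dataclass
class Rule:
    """A decision rule with two relative-risk estimates (hypothetical type)."""
    name: str
    empirical_risk: float   # relative risk observed in the data
    clinician_risk: float   # relative risk as assessed by clinicians


def disagreement(rule):
    """Assumed disagreement measure: absolute gap between the two risks."""
    return abs(rule.clinician_risk - rule.empirical_risk)


def filter_rules(rules, max_disagreement):
    """Keep rules whose disagreement does not exceed the threshold."""
    return [r for r in rules if disagreement(r) <= max_disagreement]


def most_disputed(rules, k):
    """Return the k rules with the largest disagreement, for manual review."""
    return sorted(rules, key=disagreement, reverse=True)[:k]
```

Sorting by disagreement mirrors the step of studying the most disputed rules, which is where the abstract reports finding a miscoded variable and a hidden confounder.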
Statistical Sources of Variable Selection Bias in Classification Tree Algorithms Based on the Gini Index
Evidence for variable selection bias in classification tree algorithms based on the Gini Index is reviewed from the literature and embedded into a broader explanatory scheme: Variable selection bias in classification tree algorithms based on the Gini Index can be caused not only by the statistical effect of multiple comparisons, but also by an increasing estimation bias and variance of the splitting criterion when plug-in estimates of entropy measures like the Gini Index are employed. The relevance of these sources of variable selection bias in the different simulation study designs is examined. Variable selection bias due to the explored sources applies to all classification tree algorithms based on empirical entropy measures like the Gini Index, Deviance and Information Gain, and to both binary and multiway splitting algorithms
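One of the sources named in this abstract, the estimation bias of the plug-in Gini Index, can be checked directly in the two-class case. The sketch below computes the plug-in estimate and its exact expectation under binomial sampling; the function names are assumptions, and this is not the simulation design used in the paper.

```python
from collections import Counter
from math import comb


def gini_index(labels):
    """Plug-in Gini Index: 1 minus the sum of squared class frequencies."""
    n = len(labels)
    if n == 0:
        raise ValueError("labels must not be empty")
    counts = Counter(labels)
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def expected_plugin_gini(p, n):
    """Exact expectation of the plug-in Gini Index for two classes.

    p is the true probability of class 1 and n the sample size. With k
    samples of class 1, the plug-in estimate is 2 * (k/n) * (1 - k/n);
    the expectation sums this over the binomial distribution of k.
    """
    return sum(
        comb(n, k) * p**k * (1 - p) ** (n - k) * 2 * (k / n) * (1 - k / n)
        for k in range(n + 1)
    )


def true_gini(p):
    """True two-class Gini Index, 2 * p * (1 - p)."""
    return 2 * p * (1 - p)
```

The expectation works out to (n - 1) / n times the true value, so the plug-in estimate is biased downward and the bias is larger in small nodes.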
Expert-Augmented Machine Learning
Machine Learning is proving invaluable across disciplines. However, its
success is often limited by the quality and quantity of available data, while
its adoption is limited by the level of trust that models afford users. Human vs. machine
performance is commonly compared empirically to decide whether a certain task
should be performed by a computer or an expert. In reality, the optimal
learning strategy may involve combining the complementary strengths of man and
machine. Here we present Expert-Augmented Machine Learning (EAML), an automated
method that guides the extraction of expert knowledge and its integration into
machine-learned models. We use a large dataset of intensive care patient data
to predict mortality and show that we can extract expert knowledge using an
online platform, help reveal hidden confounders, improve generalizability on a
different population and learn using less data. EAML presents a novel framework
for high-performance and dependable machine learning in critical applications.