
    Efficient Optimization of Performance Measures by Classifier Adaptation

    In practical applications, machine learning algorithms are often needed to learn classifiers that optimize domain-specific performance measures. Previous research has focused on learning the needed classifier in isolation, yet learning a nonlinear classifier for nonlinear and nonsmooth performance measures remains hard. In this paper, rather than learning the needed classifier by optimizing a specific performance measure directly, we circumvent this problem by proposing a novel two-step approach called CAPO: first train nonlinear auxiliary classifiers with existing learning methods, and then adapt the auxiliary classifiers for specific performance measures. In the first step, auxiliary classifiers can be obtained efficiently with off-the-shelf learning algorithms. For the second step, we show that the classifier adaptation problem can be reduced to a quadratic programming problem, which is similar to linear SVMperf and can be solved efficiently. By exploiting nonlinear auxiliary classifiers, CAPO can generate a nonlinear classifier that optimizes a large variety of performance measures, including all performance measures based on the contingency table and AUC, while keeping high computational efficiency. Empirical studies show that CAPO is effective and computationally efficient, and that it is even more efficient than linear SVMperf. Comment: 30 pages, 5 figures, to appear in IEEE Transactions on Pattern Analysis and Machine Intelligence, 201
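To make the target measures concrete, the following is a minimal sketch of a contingency-table measure (F1) and of AUC, computed from the outputs of some classifier. It does not implement CAPO or its quadratic program; the function names and the toy labels are assumptions made for illustration only. Both measures depend on counts or pairwise rankings rather than on a smooth per-example loss, which is why optimizing them directly is hard.

```python
def contingency_table(y_true, y_pred):
    """Return (tp, fp, fn, tn) for binary labels in {0, 1}."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return tp, fp, fn, tn


def f1_score(y_true, y_pred):
    """F1 computed from the contingency table; 0.0 if there are no positives at all."""
    tp, fp, fn, _ = contingency_table(y_true, y_pred)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def auc(y_true, scores):
    """Fraction of (positive, negative) pairs ranked correctly; ties count as 0.5."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy data, not taken from the paper.
labels = [1, 1, 0, 0]
predictions = [1, 0, 1, 0]
scores = [0.9, 0.4, 0.6, 0.1]
print(contingency_table(labels, predictions), f1_score(labels, predictions), auc(labels, scores))
```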

    Sensemaking Practices in the Everyday Work of AI/ML Software Engineering

    This paper considers sensemaking as it relates to everyday software engineering (SE) work practices and draws on a multi-year ethnographic study of SE projects at a large, global technology company building digital services infused with artificial intelligence (AI) and machine learning (ML) capabilities. Our findings highlight the breadth of sensemaking practices in AI/ML projects, noting developers' efforts to make sense of AI/ML environments (e.g., algorithms/methods and libraries), of AI/ML model ecosystems (e.g., pre-trained models and "upstream" models), and of business-AI relations (e.g., how the AI/ML service relates to the domain context and business problem at hand). This paper builds on recent scholarship drawing attention to the integral role of sensemaking in everyday SE practices by empirically investigating how AI/ML projects present software teams with emergent sensemaking requirements and opportunities.

    Remote sensing for Mapping TSM concentration in Mahakam Delta: an analytical approach

    The Indonesian coastal zones have always been under heavy pressure, including from fisheries, the oil industry and sea transportation. These activities carry a large risk of damaging the environment and destroying marine resources, leading to the need for an integrated management approach based on an environmental information system that is comprehensive and multi-disciplinary in nature. The Mahakam Delta has the same general problems as other coastal regions in Indonesia. The method is based on bio-optical modeling. The forward water analysis comprised laboratory measurements of water quality (TSM and Chl) and Inherent Optical Properties (IOPs) to derive Specific Inherent Optical Properties (SIOPs). The SIOPs (of water, TSM, Chl and CDOM) and the coefficients f and B were used to develop the R(0-) model. The inverse atmosphere analysis comprised the image preprocessing (i.e., geometric correction, atmospheric correction and air-water interface correction). The last step is the inverse water analysis, which comprised the development of the algorithm and the image processing to produce TSM concentration maps. The spectrometer measurements collected in the field were used to obtain the subsurface irradiance reflectance. The subsurface irradiance reflectance R(0-) is the ratio of upwelling (Ewu) to downwelling (Ewd) irradiance just beneath the water surface. There are some discrepancies between the modeled R(0-) and the R(0-) measured in the field, especially in the blue and NIR regions. The discrepancies could be due to the fact that the Q factor (the angular distribution factor of spectral radiance) is still not completely understood. The model is very sensitive to a decrease in the proportionality factor f and to an increase in the backscattering probability B. The results indicate that the red band of the satellite sensor is sensitive to higher TSM concentrations. For the Mahakam Delta, the red band algorithm was used to derive the TSM map, since higher TSM concentrations occurred in the delta.
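The definition of R(0-) as the ratio of upwelling to downwelling irradiance just beneath the surface can be sketched as a per-wavelength computation. This is an illustration only: the function name, the dictionary layout keyed by wavelength in nanometres, and the sample irradiance values are assumptions and are not taken from the study.

```python
def subsurface_reflectance(e_wu, e_wd):
    """Compute R(0-) = Ewu / Ewd for each wavelength.

    e_wu, e_wd: dicts mapping wavelength (nm) to upwelling and downwelling
    irradiance just beneath the water surface, in the same units.
    """
    if e_wu.keys() != e_wd.keys():
        raise ValueError("Ewu and Ewd must be sampled at the same wavelengths")
    reflectance = {}
    for wavelength in sorted(e_wu):
        if e_wd[wavelength] <= 0:
            raise ValueError(f"non-positive downwelling irradiance at {wavelength} nm")
        reflectance[wavelength] = e_wu[wavelength] / e_wd[wavelength]
    return reflectance


# Hypothetical spectrometer readings (arbitrary but consistent units).
upwelling = {450: 0.02, 650: 0.06}
downwelling = {450: 1.0, 650: 2.0}
print(subsurface_reflectance(upwelling, downwelling))
```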

    Statistical Sources of Variable Selection Bias in Classification Tree Algorithms Based on the Gini Index

    Evidence for variable selection bias in classification tree algorithms based on the Gini Index is reviewed from the literature and embedded into a broader explanatory scheme: Variable selection bias in classification tree algorithms based on the Gini Index can be caused not only by the statistical effect of multiple comparisons, but also by an increasing estimation bias and variance of the splitting criterion when plug-in estimates of entropy measures like the Gini Index are employed. The relevance of these sources of variable selection bias in the different simulation study designs is examined. Variable selection bias due to the explored sources applies to all classification tree algorithms based on empirical entropy measures like the Gini Index, Deviance and Information Gain, and to both binary and multiway splitting algorithms.
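The multiple-comparisons source of bias can be illustrated with a small simulation. The sketch below is not the paper's study design: it assumes a binary response, two predictors that are both independent of the response, and a plug-in Gini gain maximized over all binary cutpoints. The function names, sample size, and number of trials are illustrative choices. Because the many-valued predictor offers many more candidate splits, its best gain tends to be larger even though neither predictor is informative.

```python
import random


def gini(labels):
    """Plug-in Gini index for binary labels in {0, 1}: 2 * p * (1 - p)."""
    n = len(labels)
    if n == 0:
        return 0.0
    p = sum(labels) / n
    return 2 * p * (1 - p)


def best_gini_gain(x, y):
    """Largest Gini gain over all binary cutpoints of predictor x."""
    n = len(y)
    parent = gini(y)
    best = 0.0
    for cut in sorted(set(x))[:-1]:  # the largest value cannot be a cutpoint
        left = [yi for xi, yi in zip(x, y) if xi <= cut]
        right = [yi for xi, yi in zip(x, y) if xi > cut]
        gain = parent - len(left) / n * gini(left) - len(right) / n * gini(right)
        best = max(best, gain)
    return best


def selection_rate(n=50, trials=200, seed=0):
    """Share of trials in which the many-valued noise predictor has the larger gain."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(trials):
        y = [rng.randint(0, 1) for _ in range(n)]
        x_binary = [rng.randint(0, 1) for _ in range(n)]  # one candidate split
        x_many = [rng.random() for _ in range(n)]         # up to n - 1 candidate splits
        if best_gini_gain(x_many, y) > best_gini_gain(x_binary, y):
            wins += 1
    return wins / trials


print(selection_rate())
```

An unbiased selection rule would pick each uninformative predictor about equally often; a rate well above one half reflects the preference for the predictor with more candidate splits.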

    Expert-Augmented Machine Learning

    Machine Learning is proving invaluable across disciplines. However, its success is often limited by the quality and quantity of available data, while its adoption is limited by the level of trust that models afford users. Human vs. machine performance is commonly compared empirically to decide whether a certain task should be performed by a computer or an expert. In reality, the optimal learning strategy may involve combining the complementary strengths of man and machine. Here we present Expert-Augmented Machine Learning (EAML), an automated method that guides the extraction of expert knowledge and its integration into machine-learned models. We use a large dataset of intensive care patient data to predict mortality and show that we can extract expert knowledge using an online platform, help reveal hidden confounders, improve generalizability on a different population and learn using less data. EAML presents a novel framework for high-performance and dependable machine learning in critical applications.