1,068 research outputs found

    Modelling Uncertainty in Black-box Classification Systems

    Get PDF
    [eng] Currently, thanks to the Big Data boom, the excellent results obtained by deep learning models and the strong digital transformation experienced over the last years, many companies have decided to incorporate machine learning models into their systems. Some companies have detected this opportunity and are making a portfolio of artificial intelligence services available to third parties in the form of application programming interfaces (APIs). Subsequently, developers include calls to these APIs to incorporate AI functionalities in their products. Although it is an option that saves time and resources, it is true that, in most cases, these APIs are displayed in the form of blackboxes, the details of which are unknown to their clients. The complexity of such products typically leads to a lack of control and knowledge of the internal components, which, in turn, can drive to potential uncontrolled risks. Therefore, it is necessary to develop methods capable of evaluating the performance of these black-boxes when applied to a specific application. In this work, we present a robust uncertainty-based method for evaluating the performance of both probabilistic and categorical classification black-box models, in particular APIs, that enriches the predictions obtained with an uncertainty score. This uncertainty score enables the detection of inputs with very confident but erroneous predictions while protecting against out of distribution data points when deploying the model in a productive setting. In the first part of the thesis, we develop a thorough revision of the concept of uncertainty, focusing on the uncertainty of classification systems. We review the existingrelated literature, describing the different approaches for modelling this uncertainty, its application to different use cases and some of its desirable properties. Next, we introduce the proposed method for modelling uncertainty in black-box settings. Moreover, in the last chapters of the thesis, we showcase the method applied to different domains, including NLP and computer vision problems. Finally, we include two reallife applications of the method: classification of overqualification in job descriptions and readability assessment of texts.[spa] La tesis propone un método para el cálculo de la incertidumbre asociada a las predicciones de APIs o librerías externas de sistemas de clasificación

    Selecting Negative Samples for PPI Prediction Using Hierarchical Clustering Methodology

    Get PDF
    Protein-protein interactions (PPIs) play a crucial role in cellular processes. In the present work, a new approach is proposed to construct a PPI predictor training a support vector machine model through a mutual information filter-wrapper parallel feature selection algorithm and an iterative and hierarchical clustering to select a relevance negative training set. By means of a selected suboptimum set of features, the constructed support vector machine model is able to classify PPIs with high accuracy in any positive and negative datasets

    Variable selection for Naive Bayes classification

    Get PDF
    The Naive Bayes has proven to be a tractable and efficient method for classification in multivariate analysis. However, features are usually correlated, a fact that violates the Naive Bayes' assumption of conditional independence, and may deteriorate the method's performance. Moreover, datasets are often characterized by a large number of features, which may complicate the interpretation of the results as well as slow down the method's execution. In this paper we propose a sparse version of the Naive Bayes classifier that is characterized by three properties. First, the sparsity is achieved taking into account the correlation structure of the covariates. Second, different performance measures can be used to guide the selection of features. Third, performance constraints on groups of higher interest can be included. Our proposal leads to a smart search, which yields competitive running times, whereas the flexibility in terms of performance measure for classification is integrated. Our findings show that, when compared against well-referenced feature selection approaches, the proposed sparse Naive Bayes obtains competitive results regarding accuracy, sparsity and running times for balanced datasets. In the case of datasets with unbalanced (or with different importance) classes, a better compromise between classification rates for the different classes is achieved.This research is partially supported by research grants and projects MTM2015-65915-R (Ministerio de Economia y Competitividad, Spain) and PID2019-110886RB-I00 (Ministerio de Ciencia, Innovacion y Universidades, Spain) , FQM-329 and P18-FR-2369 (Junta de Andalucia, Spain) , PR2019-029 (Universidad de Cadiz, Spain) , Fundacion BBVA and EC H2020 MSCA RISE NeEDS Project (Grant agreement ID: 822214) . This support is gratefully acknowledged. Documen
    • …
    corecore