63 research outputs found

    Identifying the Machine Learning Family from Black-Box Models

    We address the novel question of determining which kind of machine learning model is behind the predictions when we interact with a black-box model. This may allow us to identify families of techniques whose models exhibit similar vulnerabilities and strengths. In our method, we first consider how an adversary can systematically query a given black-box model (oracle) to label an artificially generated dataset. This labelled dataset is then used for training different surrogate models, each one trying to imitate the oracle's behaviour. The method has two different approaches. In the first, we assume that the family of the surrogate model that achieves the maximum Kappa metric against the oracle labels corresponds to the family of the oracle model. The other approach, based on machine learning, consists of learning a meta-model that is able to predict the model family of a new black-box model. We compare these two approaches experimentally, giving us insight into how explanatory and predictable our concept of family is.
    This material is based upon work supported by the Air Force Office of Scientific Research under award number FA9550-17-1-0287, the EU (FEDER), and the Spanish MINECO under grant TIN 2015-69175-C4-1-R, the Generalitat Valenciana PROMETEOII/2015/013. F. Martinez-Plumed was also supported by INCIBE under grant INCIBEI-2015-27345 (Ayudas para la excelencia de los equipos de investigacion avanzada en ciberseguridad). J. H-Orallo also received a Salvador de Madariaga grant (PRX17/00467) from the Spanish MECD for a research stay at the CFI, Cambridge, and a BEST grant (BEST/2017/045) from the GVA for another research stay at the CFI.
    Fabra-Boluda, R.; Ferri Ramírez, C.; Hernández-Orallo, J.; Martínez-Plumed, F.; Ramírez Quintana, MJ. (2018). Identifying the Machine Learning Family from Black-Box Models. Lecture Notes in Computer Science. 11160:55-65. https://doi.org/10.1007/978-3-030-00374-6_6
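
The first approach described in this abstract (query the oracle, train surrogates, pick the family with the highest Kappa) can be sketched as follows. This is only an illustration under stated assumptions, not the authors' implementation: the two toy surrogate families (`fit_stump`, `fit_centroid`), the uniform query sampling, the held-out split, and the hidden oracle are all hypothetical choices made for the example.

```python
import random

def cohen_kappa(a, b):
    """Cohen's kappa between two equal-length label sequences."""
    n = len(a)
    p_o = sum(x == y for x, y in zip(a, b)) / n  # observed agreement
    p_e = sum((a.count(l) / n) * (b.count(l) / n) for l in set(a) | set(b))
    # Kappa is undefined when chance agreement is 1; 0.0 is a convention chosen here.
    return 0.0 if p_e == 1.0 else (p_o - p_e) / (1.0 - p_e)

def fit_stump(X, y):
    """Toy 'tree-like' family: best single axis-aligned threshold (binary labels 0/1)."""
    best = None
    for j in range(len(X[0])):
        for t in sorted({x[j] for x in X}):
            for pol in (0, 1):
                hits = sum((pol if x[j] > t else 1 - pol) == lab for x, lab in zip(X, y))
                if best is None or hits > best[0]:
                    best = (hits, j, t, pol)
    _, j, t, pol = best
    return lambda x: pol if x[j] > t else 1 - pol

def fit_centroid(X, y):
    """Toy 'linear' family: assign the label of the nearest class centroid."""
    cents = {}
    for lab in set(y):
        pts = [x for x, l in zip(X, y) if l == lab]
        cents[lab] = [sum(col) / len(pts) for col in zip(*pts)]
    def predict(x):
        return min(cents, key=lambda l: sum((xi - ci) ** 2 for xi, ci in zip(x, cents[l])))
    return predict

def identify_family(oracle, families, n_queries=400, n_features=2, seed=0):
    """Label synthetic queries with the oracle, fit one surrogate per family,
    and return the family whose surrogate agrees best (max kappa) with the oracle."""
    rng = random.Random(seed)
    X = [[rng.random() for _ in range(n_features)] for _ in range(n_queries)]
    y = [oracle(x) for x in X]  # the only access to the black box is through queries
    half = n_queries // 2       # assumption: kappa is measured on held-out queries
    X_fit, y_fit, X_eval, y_eval = X[:half], y[:half], X[half:], y[half:]
    kappas = {}
    for name, fit in families.items():
        surrogate = fit(X_fit, y_fit)
        kappas[name] = cohen_kappa(y_eval, [surrogate(x) for x in X_eval])
    return max(kappas, key=kappas.get), kappas

# Hypothetical black box: internally a single threshold rule.
hidden_oracle = lambda x: 1 if x[0] > 0.3 else 0
families = {"stump": fit_stump, "centroid": fit_centroid}
best_family, kappas = identify_family(hidden_oracle, families)
print(best_family, kappas)
```

In this toy setting the threshold-based surrogate can reproduce the oracle's decision boundary almost exactly, while the centroid-based surrogate places its boundary elsewhere, so the Kappa ranking points to the oracle's own family.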

    Factors associated with mortality in HIV-infected and uninfected patients with pulmonary tuberculosis

    Background: HIV has fuelled the TB epidemic in sub-Saharan Africa. Mortality in patients co-infected with TB and HIV is high. Managing factors that influence mortality in TB patients might help reduce it. This study investigates factors associated with mortality in AFB smear-positive pulmonary TB (PTB) patients, including HIV sero-status, CD4 cell count, and laboratory, nutritional and demographic characteristics.
    Methods: We studied 887 sputum smear-positive PTB patients between 18 and 65 years of age receiving standard 8-month anti-TB treatment. Demographic, anthropometric and laboratory data, including HIV, CD4 and other tests, were collected at baseline and at regular intervals. Patients were followed for a median period of 2.5 years.
    Results: Of the 887 participants, 155 (17.5%) died, of whom 90.3% (140/155) were HIV-infected, a fatality of 29.7% (140/471) compared to 3.6% (15/416) among HIV-uninfected. HIV infection, age, low Karnofsky score, low CD4 cell count, low hemoglobin, high viral load, and oral thrush were significantly associated with high mortality in all patients.
    Conclusion: Mortality among HIV-infected TB patients is high despite the use of effective anti-TB therapy. Most deaths occur after successful completion of therapy, an indication that patients die from causes other than TB. HIV infection is the strongest independent predictor of mortality in this cohort.
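
The proportions reported in the Results can be re-derived from the counts given in the abstract. The short check below uses only those counts. The crude risk ratio at the end is not reported in the abstract; it is an unadjusted figure computed here purely for illustration and should not be read as the study's adjusted estimate.

```python
# Counts taken from the abstract.
total, deaths = 887, 155
hiv_pos, hiv_pos_deaths = 471, 140
hiv_neg, hiv_neg_deaths = 416, 15

# Sanity check: the two HIV groups account for all participants and all deaths.
assert hiv_pos + hiv_neg == total
assert hiv_pos_deaths + hiv_neg_deaths == deaths

overall_mortality = 100 * deaths / total              # reported as 17.5%
share_hiv_pos = 100 * hiv_pos_deaths / deaths         # reported as 90.3%
fatality_hiv_pos = 100 * hiv_pos_deaths / hiv_pos     # reported as 29.7%
fatality_hiv_neg = 100 * hiv_neg_deaths / hiv_neg     # reported as 3.6%

# Crude (unadjusted) risk ratio; an illustrative addition, not a figure from the study.
crude_risk_ratio = fatality_hiv_pos / fatality_hiv_neg

print(round(overall_mortality, 1), round(share_hiv_pos, 1),
      round(fatality_hiv_pos, 1), round(fatality_hiv_neg, 1))
print(round(crude_risk_ratio, 1))
```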

    Two Notch Ligands, Dll1 and Jag1, Are Differently Restricted in Their Range of Action to Control Neurogenesis in the Mammalian Spinal Cord

    Notch signalling regulates neuronal differentiation in the vertebrate nervous system. In addition to a widespread function in maintaining neural progenitors, Notch signalling has also been implicated in specific neuronal fate decisions. These functions are likely mediated by distinct Notch ligands, which show restricted expression patterns in the developing nervous system. Two ligands, in particular, are expressed in non-overlapping complementary domains of the embryonic spinal cord, with Jag1 being restricted to the V1 and dI6 progenitor domains, while Dll1 is expressed in the remaining domains. However, the specific contribution of different ligands to the regulation of neurogenesis in vertebrate embryos is still poorly understood. In this work, we investigated the role of Jag1 and Dll1 during spinal cord neurogenesis, using conditional knockout mice where the two genes are deleted in the neuroepithelium, singly or in combination. Our analysis showed that Jag1 deletion leads to a modest increase in V1 interneurons, while dI6 neurogenesis was unaltered. This mild Jag1 phenotype contrasts with the strong neurogenic phenotype detected in Dll1 mutants and led us to hypothesize that neighbouring Dll1-expressing cells signal to V1 and dI6 progenitors and restore neurogenesis in the absence of Jag1. Analysis of double Dll1;Jag1 mutant embryos revealed a stronger increase in V1-derived interneurons and overproduction of dI6 interneurons. In the presence of a functional Dll1 allele, V1 neurogenesis is restored to the levels detected in single Jag1 mutants, while dI6 neurogenesis returns to normal, thereby confirming that Dll1-mediated signalling compensates for Jag1 deletion in V1 and dI6 domains. Our results reveal that Dll1 and Jag1 are functionally equivalent in controlling the rate of neurogenesis within their expression domains. However, Jag1 can only activate Notch signalling within the V1 and dI6 domains, whereas Dll1 can signal to neural progenitors both inside and outside its domains of expression.

    Lifted graphical models: a survey

    Lifted graphical models provide a language for expressing dependencies between different types of entities, their attributes, and their diverse relations, as well as techniques for probabilistic reasoning in such multi-relational domains. In this survey, we review a general form for a lifted graphical model, a par-factor graph, and show how a number of existing statistical relational representations map to this formalism. We discuss inference algorithms, including lifted inference algorithms, that efficiently compute the answers to probabilistic queries over such models. We also review work in learning lifted graphical models from data. There is a growing need for statistical relational models (whether they go by that name or another), as we are inundated with data which is a mix of structured and unstructured, with entities and relations extracted in a noisy manner from text, and with the need to reason effectively with this data. We hope that this synthesis of ideas from many different research groups will provide an accessible starting point for new researchers in this expanding field.

    Extensive Transcriptional Regulation of Chromatin Modifiers during Human Neurodevelopment

    Epigenetic changes, including histone modifications or chromatin remodeling, are regulated by a large number of human genes. We developed a strategy to study the coordinate regulation of such genes, and to compare different cell populations or tissues. A set of 150 genes, comprising different classes of epigenetic modifiers, was compiled. This new tool was used initially to characterize changes during the differentiation of human embryonic stem cells (hESC) to central nervous system neuroectoderm progenitors (NEP). qPCR analysis showed that more than 60% of the examined transcripts were regulated, and >10% of them had a >5-fold increased expression. For comparison, we differentiated hESC to neural crest progenitors (NCP), a distinct peripheral nervous system progenitor population. Some epigenetic modifiers were regulated in the same direction in NEP and NCP, but distinct differences were also observed. For instance, the remodeling ATPase SMARCA2 was up-regulated >30-fold in NCP, while it remained unchanged in NEP; the ATP-dependent chromatin remodeler CHD7 was up-regulated in NEP, while it was down-regulated in NCP. To compare the neural precursor profiles with those of mature neurons, we analyzed the epigenetic modifiers in human cortical tissue. This resulted in the identification of 30 regulations shared between all cell types, such as the histone methyltransferase SETD7. We also identified new markers for post-mitotic neurons, like the arginine methyl transferase PRMT8 and the methyl transferase EZH1. Our findings suggest a hitherto unexpected extent of regulation, and a cell type-dependent specificity of epigenetic modifiers in neurodifferentiation.

    The FunGenES Database: A Genomics Resource for Mouse Embryonic Stem Cell Differentiation

    Embryonic stem (ES) cells have high self-renewal capacity and the potential to differentiate into a large variety of cell types. To investigate gene networks operating in pluripotent ES cells and their derivatives, the “Functional Genomics in Embryonic Stem Cells” consortium (FunGenES) has analyzed the transcriptome of mouse ES cells in eleven diverse settings representing sixty-seven experimental conditions. To better illustrate gene expression profiles in mouse ES cells, we have organized the results in an interactive database with a number of features and tools. Specifically, we have generated clusters of transcripts that behave the same way under the entire spectrum of the sixty-seven experimental conditions; we have assembled genes in groups according to their time of expression during successive days of ES cell differentiation; we have included expression profiles of specific gene classes such as transcription regulatory factors and Expressed Sequence Tags; transcripts have been arranged in “Expression Waves” and juxtaposed to genes with opposite or complementary expression patterns; we have designed search engines to display the expression profile of any transcript during ES cell differentiation; gene expression data have been organized in animated graphs of KEGG signaling and metabolic pathways; and finally, we have incorporated advanced functional annotations for individual genes or gene clusters of interest and links to microarray and genomic resources. The FunGenES database provides a comprehensive resource for studies into the biology of ES cells.