3 research outputs found

    Facing online challenges using learning classifier systems

    Recent advances in machine learning have fostered the design of competent algorithms that are able to learn and extract novel and useful information from data. Some of these techniques have been successfully applied to real-world problems in distinct technological, medical, scientific and industrial areas; problems that could not be handled by conventional analysis techniques, either because of their inherent complexity or because of the huge volumes of data involved. Following this initial success, current machine learning systems face increasingly difficult problems, which has raised research interest in systems that can tackle new real-world problems efficiently and scalably. One of the most appealing machine learning paradigms is that of Learning Classifier Systems (LCSs), classifier systems based on genetic algorithms, and more specifically Michigan-style LCSs, an open framework that combines an apportionment-of-credit mechanism with a knowledge discovery technique inspired by biological processes to evolve its internal knowledge. In this regard, LCSs mimic human experts by using rule lists to choose the best action for a given problem situation, acquiring their knowledge incrementally through experience as new information is presented over time.
LCSs have been applied with relative success to a wide set of real-world problems such as prostate cancer prediction or stock-market investment support, among many others. Furthermore, in some of these areas LCSs have demonstrated learning capacities that exceed those of human experts for the particular task. The purpose of this thesis is to explore the online learning nature of Michigan-style LCSs for mining large amounts of data in the form of continuous, high-speed and time-changing streams of information. Extracting knowledge from these data is often key to gaining a better understanding of the processes that the data describe. Learning from these data poses new challenges to traditional machine learning techniques, which are not typically designed to deal with continuous data streams in which concepts and noise levels may vary arbitrarily over time. This thesis takes the eXtended Classifier System (XCS), the most studied Michigan-style LCS and one of the most competent machine learning algorithms, as its starting point. The challenges addressed in this thesis are twofold. The first is to build a competent supervised system on the Michigan-style LCS framework that learns from data streams and reacts quickly to concept changes and noisy inputs. As many scientific and industrial applications generate vast amounts of unlabelled data, the second challenge is to apply the lessons learned from the first to the design of unsupervised Michigan-style LCSs that handle online problems without assuming any a priori structure in the input data.
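
The rule-based action selection described in the abstract above can be illustrated with a minimal sketch. It is not the system developed in the thesis: it assumes the ternary condition alphabet (`0`, `1`, and `#` as a wildcard) commonly used in XCS-style systems and a simple fitness-weighted vote, and the names `Rule`, `matches`, `choose_action` and all fitness values are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Rule:
    condition: str   # ternary string over {'0', '1', '#'}; '#' matches either bit
    action: int      # action (class label) advocated by the rule
    fitness: float   # hypothetical quality weight used when voting


def matches(rule: Rule, state: str) -> bool:
    """Return True if every non-wildcard symbol of the condition equals the input bit."""
    return len(rule.condition) == len(state) and all(
        c == "#" or c == s for c, s in zip(rule.condition, state)
    )


def choose_action(rules, state):
    """Pick the action with the largest summed fitness among matching rules."""
    votes = defaultdict(float)
    for rule in rules:
        if matches(rule, state):
            votes[rule.action] += rule.fitness
    if not votes:
        return None  # no rule covers this input
    return max(votes, key=votes.get)


# Illustrative rule population (values are made up for the example).
rules = [
    Rule("1#0", action=1, fitness=0.9),
    Rule("##0", action=0, fitness=0.4),
    Rule("0##", action=0, fitness=0.8),
]
```

In an online setting, a system of this kind would also update rule fitness after each example and use a genetic algorithm to discover new rules; both steps are omitted here.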

    Designing content-based adversarial perturbations and distributed one-class learning for images.

    PhD Theses. This thesis covers two privacy-related problems for images: designing adversarial perturbations that can be added to input images to protect the private content of images that a user shares with other users from undesirable automatic inference by classifiers, and training privacy-preserving classifiers on images that are distributed among their owners (image holders) and contain their private information. Adversarial images can be easily detected using denoising algorithms when high-frequency spatial perturbations are used, or can be noticed by humans when perturbations are large and irrelevant to the content of the images. In addition, adversarial images are not transferable to unseen classifiers, as the perturbations are small (in terms of the ℓp norm). In the first part of the thesis, we propose content-based adversarial perturbations that account for the content of the images (objects, colour, structure and details), human perception and the semantics of the class labels to address these limitations. Our adversarial colour perturbations selectively modify the colours of objects within chosen ranges that are perceived as natural by humans. In addition to these natural-looking adversarial images, our structure-aware perturbations exploit traditional image processing filters, such as detail enhancement and gamma correction filters, to generate enhanced adversarial images. We validate the proposed perturbations against three classifiers trained on ImageNet. Experiments show that, compared with seven state-of-the-art perturbations, the proposed perturbations are more robust and transferable and cause misclassification with a label that is semantically different from the label of the original image. Classifiers are often trained by relying on centralised collection and aggregation of images, which can raise significant privacy concerns by disclosing the sensitive information of image holders.
In the second part of the thesis, we propose a privacy-preserving technique, called distributed one-class learning, that enables training to take place on edge devices, so that image holders do not need to centralise their images. Each image holder can independently use their images to locally train a reconstructive adversarial network as their one-class classifier. As sending the model parameters to the service provider would reveal sensitive information, we secret-share the parameters between two non-colluding service providers. Then, we provide cryptographically private prediction services through a mixture of multi-party computation protocols to achieve substantial gains in complexity and speed. A major advantage of the proposed technique is that neither the image holders nor the service providers can access the parameters and images of other image holders. We quantify the benefits of the proposed technique and compare its performance with centralised training on three privacy-sensitive image-based tasks. Experiments show that the proposed technique achieves classification performance similar to that of non-private centralised training, while not violating the privacy of the image holders.
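
The abstract does not say which secret-sharing scheme is used. As an illustration of how a model parameter could be split between two non-colluding providers, the sketch below assumes two-party additive sharing over the integers modulo 2^32 with a fixed-point encoding of real values. `MODULUS`, `SCALE` and all function names are hypothetical choices for the example, not details from the thesis.

```python
import secrets
from typing import Tuple

MODULUS = 2 ** 32   # assumed ring size (not given in the abstract)
SCALE = 2 ** 16     # assumed fixed-point scaling factor


def encode(weight: float) -> int:
    """Map a real-valued parameter to a ring element using fixed-point encoding."""
    return round(weight * SCALE) % MODULUS


def decode(value: int) -> float:
    """Invert encode(), treating the upper half of the ring as negative numbers."""
    if value >= MODULUS // 2:
        value -= MODULUS
    return value / SCALE


def share(value: int) -> Tuple[int, int]:
    """Split a ring element into two shares; each share alone is uniformly random."""
    share_a = secrets.randbelow(MODULUS)
    share_b = (value - share_a) % MODULUS
    return share_a, share_b


def reconstruct(share_a: int, share_b: int) -> int:
    """Recombine the two shares; this requires both providers to cooperate."""
    return (share_a + share_b) % MODULUS


def add_shared(x: Tuple[int, int], y: Tuple[int, int]) -> Tuple[int, int]:
    """Add two shared values; each provider adds only the shares it holds."""
    return (x[0] + y[0]) % MODULUS, (x[1] + y[1]) % MODULUS
```

For example, `decode(reconstruct(*share(encode(-0.75))))` recovers `-0.75`, while either share on its own is a uniformly random value. Additions can be done locally on shares as shown; multiplications need an interactive multi-party protocol, which is outside the scope of this sketch.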

    Molecular Characterization of Metastatic Endometrial Cancer by Mass Spectrometry

    One of the most reliable prognostic factors in endometrial cancer is the presence of lymph node metastasis. Clinicians presently face the challenge that radiological imaging and conventional surgical-pathological variables such as tumour size, depth of invasion and grade of disease are unreliable in determining whether the endometrial cancer has metastasized. Although only 10% of endometrial cancer patients suffer from lymph node metastasis, the majority of them undergo lymphadenectomy, which can be associated with significant complications including lower extremity lymphedema. Based on the assumption that metastasis is mainly determined by the properties of the primary tumour and its interaction with the surrounding tissues, a tissue-based proteomic approach combining two complementary methods, peptide matrix-assisted laser desorption/ionisation mass spectrometry imaging (MALDI MSI) and liquid chromatography-tandem mass spectrometry (LC-MS/MS), was undertaken to identify molecular discriminators in primary endometrial cancers that correlate with lymph node metastasis. In a discovery approach, MALDI MSI was carried out on two tissue microarrays (TMAs) containing a total of 43 patients. After data acquisition, a canonical correlation analysis (CCA) based method was applied to rank the acquired m/z values by their power to discriminate between primary carcinomas with and without metastatic potential. The highly ranked m/z values classified 38 out of 43 patients (88.4%) correctly. The top discriminative m/z values were identified using a combination of in situ sequencing and LC-MS/MS from digested tumour samples. The differential abundance of the two identified proteins, plectin and α-actin-2, was further validated using data-independent acquisition LC-MS/MS and immunohistochemistry. In a targeted approach, we aimed to improve the preoperative prediction model for endometrial cancer metastasis.
From publicly available data and published research, we compiled a list of 60 target proteins with the potential to display differential abundance between primary endometrial cancers with lymph node metastasis and those without. Using data-dependent acquisition LC-MS/MS, we were able to detect 23 of these proteins in an independent cohort of endometrial cancer patients. Using data-independent acquisition LC-MS/MS, the differential abundance of 5 of those proteins was observed (p < 0.05). Upon validation by immunohistochemistry, our data indicate that annexin A2 is upregulated while annexin A1 and alpha actinin 4 are downregulated in primary endometrial cancers with lymph node metastasis versus those without. The results of this immunohistochemistry analysis were used to generate a predictive model of endometrial cancer metastasis. Additionally, a predictive model using the highly ranked m/z values identified by MALDI MSI was generated and compared with other models containing the histopathological variables. In this comparison, the MALDI MSI model showed significantly higher predictive accuracy than the model using immunohistochemistry data. Our results showed that the highly ranked m/z values identified from MALDI MSI data provide new independent prognostic information beyond the established risk factors. The developed molecular classification tool has the potential to predict which tumours have metastasized and therefore which patients would benefit from radical surgery, while sparing those who would not benefit from it and consequently decreasing the risk of post-surgical morbidity. In conclusion, these findings demonstrate a successful application of MALDI MSI for the identification of protein biomarkers of endometrial cancer metastasis. Thesis (Ph.D.) -- University of Adelaide, School of Biological Sciences, 201