6,514 research outputs found

    Benchmark of structured machine learning methods for microbial identification from mass-spectrometry data

    Microbial identification is a central issue in microbiology, in particular in the fields of infectious disease diagnosis and industrial quality control. The concept of species is tightly linked to the concept of biological and clinical classification, where the proximity between species is generally measured in terms of evolutionary distances and/or clinical phenotypes. Surprisingly, the information provided by this well-known hierarchical structure is rarely used by machine learning-based automatic microbial identification systems. Structured machine learning methods were recently proposed to take into account the structure embedded in a hierarchy and use it as additional a priori information, and could therefore help improve microbial identification systems. We test and compare several state-of-the-art machine learning methods for microbial identification on a new Matrix-Assisted Laser Desorption/Ionization Time-of-Flight mass spectrometry (MALDI-TOF MS) dataset. The benchmark includes both standard methods and structured methods that leverage knowledge of the underlying hierarchical structure in the learning process. Our results show that although some methods perform better than others, structured methods do not consistently perform better than their "flat" counterparts. We postulate that this is partly because standard methods already reach a high level of accuracy in this context, and because they mainly confuse species close to each other in the tree, a case where using the known hierarchy is not helpful.

    Rigid surface operators and S-duality: some proposals

    We study surface operators in the N=4 supersymmetric Yang-Mills theories with gauge groups SO(n) and Sp(2n). As recently shown by Gukov and Witten, these theories have a class of rigid surface operators which are expected to be related by S-duality. The rigid surface operators are of two types, unipotent and semisimple. We make explicit proposals for how the S-duality map should act on unipotent surface operators. We also discuss semisimple surface operators and make some proposals for certain subclasses of such operators. Comment: 27 pages. v2: minor changes, added reference.

    Roughness of molecular property landscapes and its impact on modellability

    In molecular discovery and drug design, structure-property relationships and activity landscapes are often qualitatively or quantitatively analyzed to guide the navigation of chemical space. The roughness (or smoothness) of these molecular property landscapes is one of their most studied geometric attributes, as it can characterize the presence of activity cliffs, with rougher landscapes generally expected to pose tougher optimization challenges. Here, we introduce a general, quantitative measure for describing the roughness of molecular property landscapes. The proposed roughness index (ROGI) is loosely inspired by the concept of fractal dimension and strongly correlates with the out-of-sample error achieved by machine learning models on numerous regression tasks. Comment: 17 pages, 6 figures, 2 tables (SI with 17 pages, 16 figures).

    Interoperability of fingerprint sensors and matching algorithms

    Biometric systems are widely deployed in governmental, military and commercial/civilian applications. There is a multitude of sensors and matching algorithms available from different vendors. This creates a competitive market for these products, which is good for consumers but emphasizes the importance of interoperability. In fingerprint recognition, interoperability is the ability of a system to work with a diverse set of fingerprint devices. Variations induced by fingerprint sensors include image resolution, scanning area, gray levels, etc. Such variations can affect the quality of the extracted features and cross-device matching performance. This is true even when dealing with fingerprint sensors of the same sensing technology. In this thesis, we perform a large-scale empirical study of the status of interoperability between fingerprint sensors and assess the performance consequences when interoperability is lacking. Additionally, we develop a method to increase interoperability in fingerprint-based recognition systems that deploy optical fingerprint sensors. A set of features to measure differences in fingerprint acquisition is designed and evaluated. Finally, different fusion schemes based on machine learning are tested and evaluated in order to exploit the designed set of features. Experimental results show that the proposed approach is able to reduce cross-device match error rates by a significant margin.

    Application of Graph Neural Networks and graph descriptors for graph classification

    Graph classification is an important area in both modern research and industry. Multiple applications, especially in chemistry and novel drug discovery, encourage rapid development of machine learning models in this area. To keep up with the pace of new research, proper experimental design, fair evaluation, and independent benchmarks are essential. The design of strong baselines is an indispensable element of such work. In this thesis, we explore multiple approaches to graph classification. We focus on Graph Neural Networks (GNNs), which have emerged as a de facto standard deep learning technique for graph representation learning. Classical approaches, such as graph descriptors and molecular fingerprints, are also addressed. We design a fair experimental evaluation protocol and choose an appropriate collection of datasets. This allows us to perform numerous experiments and rigorously analyze modern approaches. We arrive at many conclusions, which shed new light on the performance and quality of novel algorithms. We investigate the application of the Jumping Knowledge GNN architecture to graph classification, which proves to be an efficient tool for improving base graph neural network architectures. Multiple improvements to baseline models are also proposed and experimentally verified, which constitutes an important contribution to the field of fair model comparison. Comment: Master's thesis submitted at AGH University of Science and Technology.

    11th German Conference on Chemoinformatics (GCC 2015) : Fulda, Germany. 8-10 November 2015.


    Application of LANDSAT to the surveillance of lake eutrophication in the Great Lakes basin

    The author has identified the following significant results. A step-by-step procedure for establishing and monitoring the trophic status of inland lakes with the use of LANDSAT data, surface sampling, laboratory analysis, and aerial observations was demonstrated. The biomass was related to chlorophyll-a concentrations, water clarity, and trophic state. A procedure was developed for using surface sampling, LANDSAT data, and linear regression equations to produce a color-coded image of large lakes showing the distribution and concentrations of water quality parameters that cause eutrophication, as well as parameters that indicate its effects. Cover categories readily derived from LANDSAT were those for which loading rates were available and which were known to have major effects on the quality and quantity of runoff and on lake eutrophication. Urban, barren land, cropland, grassland, forest, wetlands, and water were included.

    Drug side-effect prediction using machine learning methods

    Drug toxicity (or adverse side effects) is a pressing health problem which is also an impediment to the development of therapeutically effective drugs. Despite many ongoing efforts to determine toxicity beforehand, computational prediction of drug side-effects remains a challenging task. This thesis presents an approach to predict side-effects by utilizing side-information sources for the drugs, while simultaneously comparing state-of-the-art machine learning methods to improve accuracy. Specifically, the thesis implements a data-analysis pipeline for obtaining side-information that is useful for the prediction task. The thesis then formulates drug side-effect prediction as a machine learning problem: given disease indications and structural features (as side-information sources) of drugs for which some measurements of side-effects exist, predict the side-effects of a new drug. As case studies, the prediction accuracies are compared for ten different side-effects using linear as well as non-linear machine learning methods. The thesis summarizes three key findings. First, the drug side-information sources are predictive of the side-effects. Second, non-linear methods show improved prediction accuracies compared to their linear analogs. Third, the integration of disease indications and structural features with a principled machine learning approach further improves the drug side-effect predictions. However, the current study limits the analysis by assuming that side-effects are independent. In the future, modeling the joint relationships of several side-effects could yield stronger predictions and a better understanding of the underlying biological mechanism.

    A machine learning based drug discovery pipeline: finding new therapies for Cystic Fibrosis

    Master's thesis, Bioinformatics and Computational Biology, Universidade de Lisboa, Faculdade de Ciências, 2019. Technological advances and the growing availability of public data have led to the development of robust machine learning-based methodologies for predicting compound activity. These methodologies are faster, more efficient, and less costly than traditional drug discovery methods. Cystic Fibrosis (CF) is a progressive autosomal disease for which new therapies are urgently needed. Mutations in the CFTR gene in CF patients lead to deficient production of the CFTR anion-transport membrane channel, causing ionic imbalances and abnormal fluid transport. CF affects several organs, most severely the lungs, and lung problems are usually the cause of premature death. The most prevalent and relevant mutation in CF is the deletion of phenylalanine 508 (F508del-CFTR). For this reason, the main drug discovery efforts are directed at correcting or mitigating the effects of this mutation. A methodology was created that uses machine learning classification and regression models, based on support vector machines and Random Forests, to discover compounds with therapeutic potential in CF from publicly accessible compound databases. The most promising compounds were selected and tested in the laboratory using immunofluorescence assays with automated high-throughput screening microscopy and analysis of their effect on F508del-CFTR, based on the efficiency of F508del-CFTR trafficking to the plasma membrane. The 10 compounds with the best results in this assay were validated by Western blot and compared with two known F508del-CFTR corrector compounds. Four compounds were identified as promising therapeutic compounds for CF.