Benchmark of structured machine learning methods for microbial identification from mass-spectrometry data
Microbial identification is a central issue in microbiology, in particular in
the fields of infectious diseases diagnosis and industrial quality control. The
concept of species is tightly linked to the concept of biological and clinical
classification, where the proximity between species is generally measured in
terms of evolutionary distances and/or clinical phenotypes. Surprisingly, the
information provided by this well-known hierarchical structure is rarely used
by machine learning-based automatic microbial identification systems.
Structured machine learning methods were recently proposed to take into
account the structure embedded in a hierarchy and use it as additional a
priori information; they could therefore improve microbial
identification systems. We test and compare several state-of-the-art machine
learning methods for microbial identification on a new Matrix-Assisted Laser
Desorption/Ionization Time-of-Flight mass spectrometry (MALDI-TOF MS) dataset.
We include in the benchmark both standard methods and structured methods, which leverage the
knowledge of the underlying hierarchical structure in the learning process. Our
results show that although some methods perform better than others, structured
methods do not consistently perform better than their "flat" counterparts. We
postulate that this is partly due to the fact that standard methods already
reach a high level of accuracy in this context, and that they mainly confuse
species close to each other in the tree, a case where using the known hierarchy
is not helpful.
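The explanation above turns on how far apart the confused species sit in the hierarchy. One way to make that measurable is to score each misidentification by its tree distance: the number of edges between the true and predicted labels through their lowest common ancestor. The sketch below assumes the hierarchy is stored as a child-to-parent dictionary; the taxonomy names (`species_A1`, `genus_A`, ...) and the helpers `tree_distance` and `mean_tree_error` are hypothetical and are not taken from the benchmark.

```python
def ancestors(node, parent):
    """Return the path from node up to the root, node first."""
    path = [node]
    while node in parent:
        node = parent[node]
        path.append(node)
    return path


def tree_distance(a, b, parent):
    """Number of edges between a and b via their lowest common ancestor."""
    path_a = ancestors(a, parent)
    depth_in_b = {n: i for i, n in enumerate(ancestors(b, parent))}
    for i, n in enumerate(path_a):
        if n in depth_in_b:  # first shared node is the lowest common ancestor
            return i + depth_in_b[n]
    raise ValueError("nodes do not share a root")


def mean_tree_error(y_true, y_pred, parent):
    """Average tree distance between true and predicted labels (0 = all correct)."""
    return sum(tree_distance(t, p, parent) for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical three-level taxonomy: root -> genus -> species.
PARENT = {
    "genus_A": "root", "genus_B": "root",
    "species_A1": "genus_A", "species_A2": "genus_A",
    "species_B1": "genus_B",
}
```

Under this metric, confusing two species of the same genus costs 2 while confusing species from different genera costs 4, so a classifier whose errors stay within a genus already scores well, which is consistent with the observation that a known hierarchy adds little in that regime.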
Rigid surface operators and S-duality: some proposals
We study surface operators in the N=4 supersymmetric Yang-Mills theories with
gauge groups SO(n) and Sp(2n). As recently shown by Gukov and Witten, these
theories have a class of rigid surface operators which are expected to be
related by S-duality. The rigid surface operators are of two types, unipotent
and semisimple. We make explicit proposals for how the S-duality map should act
on unipotent surface operators. We also discuss semisimple surface operators
and make some proposals for certain subclasses of such operators.
Comment: 27 pages. v2: minor changes, added references
Roughness of molecular property landscapes and its impact on modellability
In molecular discovery and drug design, structure-property relationships and
activity landscapes are often qualitatively or quantitatively analyzed to guide
the navigation of chemical space. The roughness (or smoothness) of these
molecular property landscapes is one of their most studied geometric
attributes, as it can characterize the presence of activity cliffs, with
rougher landscapes generally expected to pose tougher optimization challenges.
Here, we introduce a general, quantitative measure for describing the roughness
of molecular property landscapes. The proposed roughness index (ROGI) is
loosely inspired by the concept of fractal dimension and strongly correlates
with the out-of-sample error achieved by machine learning models on numerous
regression tasks.
Comment: 17 pages, 6 figures, 2 tables (SI with 17 pages, 16 figures)
Interoperability of fingerprint sensors and matching algorithms
Biometric systems are widely deployed in governmental, military and commercial/civilian applications. A multitude of sensors and matching algorithms is available from different vendors. This creates a competitive market for these products, which is good for consumers but underscores the importance of interoperability. In fingerprint recognition, interoperability is the ability of a system to work with a diverse set of fingerprint devices. Variations induced by fingerprint sensors include image resolution, scanning area, gray levels, etc. Such variations can affect the quality of the extracted features and, consequently, cross-device matching performance. This holds even for fingerprint sensors that use the same sensing technology. In this thesis, we perform a large-scale empirical study of the state of interoperability between fingerprint sensors and assess the performance consequences when interoperability is lacking. Additionally, we develop a method to increase interoperability in fingerprint-based recognition systems that deploy optical fingerprint sensors. A set of features to measure differences in fingerprint acquisition is designed and evaluated. Finally, different fusion schemes based on machine learning are tested and evaluated in order to exploit the designed set of features. Experimental results show that the proposed approach reduces cross-device match error rates by a significant margin.
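The abstract does not specify which fusion schemes were tested. As a generic illustration only, the sketch below shows score-level fusion with the weighted sum rule: scores from several sources (for example, a matcher score and a score derived from the acquisition-difference features) are min-max normalized and combined with fixed weights. The function names, the fixed weights, and the choice of sum-rule fusion are assumptions; they stand in for, and are not, the learned fusion schemes evaluated in the thesis.

```python
def min_max(scores):
    """Rescale a list of scores to [0, 1]; a constant list maps to all zeros."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [0.0 for _ in scores]
    return [(s - lo) / (hi - lo) for s in scores]


def weighted_sum_fusion(score_lists, weights):
    """Fuse several score sources for the same comparisons (hypothetical sum rule).

    score_lists[j][i] is the score that source j assigns to comparison i;
    weights[j] is the (assumed, hand-set) weight of source j.
    """
    normalized = [min_max(scores) for scores in score_lists]
    n = len(score_lists[0])
    return [sum(w * src[i] for w, src in zip(weights, normalized)) for i in range(n)]


def decide(fused_scores, threshold):
    """Accept a comparison as a genuine match if its fused score reaches the threshold."""
    return [s >= threshold for s in fused_scores]
```

A learned scheme would replace the fixed `weights` and `threshold` with parameters fitted on labeled genuine and impostor comparisons; the sum rule is shown only because it is the simplest fusion baseline.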
Application of Graph Neural Networks and graph descriptors for graph classification
Graph classification is an important area in both modern research and
industry. Multiple applications, especially in chemistry and novel drug
discovery, encourage rapid development of machine learning models in this area.
To keep up with the pace of new research, proper experimental design, fair
evaluation, and independent benchmarks are essential. The design of strong
baselines is an indispensable element of such work.
In this thesis, we explore multiple approaches to graph classification. We
focus on Graph Neural Networks (GNNs), which emerged as a de facto standard
deep learning technique for graph representation learning. Classical
approaches, such as graph descriptors and molecular fingerprints, are also
addressed. We design a fair experimental evaluation protocol and choose an
appropriate collection of datasets. This allows us to perform numerous experiments and
rigorously analyze modern approaches. We arrive at many conclusions, which shed
new light on the performance and quality of novel algorithms.
We investigate the application of the Jumping Knowledge GNN architecture to graph
classification, which proves to be an efficient tool for improving base graph
neural network architectures. Multiple improvements to baseline models are also
proposed and experimentally verified, which constitutes an important
contribution to the field of fair model comparison.
Comment: Master's thesis submitted at AGH University of Science and Technology
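The Jumping Knowledge idea mentioned above lets the final node representation draw on the output of every message-passing layer rather than only the last one. A minimal sketch, assuming untrained mean-aggregation layers with no weight matrices or nonlinearities (omitted to keep the example short), the concatenation variant of the Jumping Knowledge aggregation, and a sum readout to obtain a graph-level vector for classification; the graph, features, and function names are hypothetical, and this is not the architecture or configuration used in the thesis.

```python
def propagate(features, adjacency):
    """One untrained message-passing step: each node averages itself and its neighbours."""
    out = []
    for v, h in enumerate(features):
        group = [h] + [features[u] for u in adjacency[v]]
        out.append([sum(vals) / len(group) for vals in zip(*group)])
    return out


def jumping_knowledge_concat(features, adjacency, num_layers):
    """Node embeddings formed by concatenating the outputs of every layer."""
    layers = []
    h = features
    for _ in range(num_layers):
        h = propagate(h, adjacency)
        layers.append(h)
    return [[x for layer in layers for x in layer[v]] for v in range(len(features))]


def sum_readout(node_embeddings):
    """Graph-level vector: element-wise sum over all nodes."""
    return [sum(col) for col in zip(*node_embeddings)]


# Hypothetical 3-node path graph 0-1-2 with one input feature per node.
adjacency = [[1], [0, 2], [1]]
features = [[1.0], [0.0], [0.0]]
node_emb = jumping_knowledge_concat(features, adjacency, num_layers=2)
graph_vec = sum_readout(node_emb)
```

Each node embedding has `num_layers * feature_dim` entries, so shallow (local) and deeper (wider-context) views of the neighbourhood both reach the readout; in a trained model a classifier would be applied to `graph_vec`.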
Application of LANDSAT to the surveillance of lake eutrophication in the Great Lakes basin
The author has identified the following significant results. A step-by-step procedure for establishing and monitoring the trophic status of inland lakes with the use of LANDSAT data, surface sampling, laboratory analysis, and aerial observations was demonstrated. The biomass was related to chlorophyll-a concentrations, water clarity, and trophic state. A procedure was developed for using surface sampling, LANDSAT data, and linear regression equations to produce a color-coded image of large lakes showing the distribution and concentrations of water quality parameters that cause eutrophication, as well as of parameters that indicate its effects. The cover categories readily derived from LANDSAT were those for which loading rates were available and which were known to have major effects on the quality and quantity of runoff and on lake eutrophication. Urban land, barren land, cropland, grassland, forest, wetlands, and water were included.
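The regression step of the procedure can be sketched as follows: fit a linear equation between a LANDSAT band value and a surface-sampled water-quality parameter (such as chlorophyll-a) at the sampling stations, apply it to every pixel, and bin the predictions into color classes. The single-band model, the helper names, and the class thresholds are assumptions for illustration; the report's actual regression equations and categories are not reproduced here.

```python
from bisect import bisect_right


def fit_line(x, y):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx


def classify_pixels(band_values, slope, intercept, thresholds):
    """Predict the parameter for each pixel and return a color-class index per pixel.

    thresholds is a sorted list of hypothetical class boundaries;
    class 0 lies below the first boundary, class len(thresholds) above the last.
    """
    return [bisect_right(thresholds, slope * v + intercept) for v in band_values]
```

In this sketch the calibration (`fit_line`) uses only the sampled stations, while `classify_pixels` is applied to the whole scene, which is what allows a color-coded map of a large lake from a limited number of surface samples.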
Drug side-effect prediction using machine learning methods
Drug toxicity (or adverse side effects) is a pressing health problem which is also an impediment to the development of therapeutically effective drugs. Despite many ongoing efforts to determine toxicity beforehand, computational prediction of drug side-effects remains a challenging task.
This thesis presents an approach to predict side-effects by utilizing side-information sources for the drugs, while simultaneously comparing state-of-the-art machine learning methods to improve accuracy. Specifically, the thesis implements a data-analysis pipeline for obtaining side-information that is useful for the prediction task. The thesis then formulates drug side-effect prediction as a machine learning problem: given disease indications and structural features (as side-information sources) of drugs for which some side-effect measurements exist, predict side-effects for a new drug.
As case studies, the prediction accuracies are compared for ten different side-effects using linear as well as non-linear machine learning methods. The thesis summarizes three key findings. First, the drug side-information sources are predictive of the side-effects. Second, non-linear methods show improved prediction accuracies as compared to their linear analogs. Third, the integration of disease indications and structural features with a principled machine learning approach further improves the drug side-effect predictions.
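One simple way to integrate the two side-information sources named above is to compute a similarity on each source, combine them, and predict a side-effect for a new drug from similar drugs with known labels. The sketch below assumes each drug is represented as two sets of binary feature indices (disease indications and structural fingerprint bits), uses Tanimoto similarity with a fixed mixing weight `w`, and predicts with a similarity-weighted average of labels. This nearest-neighbour-style predictor is a stand-in for illustration and is not one of the methods compared in the thesis; the dictionary keys and function names are hypothetical.

```python
def tanimoto(a, b):
    """Tanimoto (Jaccard) similarity between two sets of binary feature indices."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def combined_similarity(d1, d2, w=0.5):
    """Mix indication similarity and structural similarity with weight w (assumed)."""
    return (w * tanimoto(d1["indications"], d2["indications"])
            + (1 - w) * tanimoto(d1["structure"], d2["structure"]))


def predict_side_effect(new_drug, train_drugs, labels, w=0.5):
    """Similarity-weighted average of known labels (1 = side-effect observed, 0 = not)."""
    weights = [combined_similarity(new_drug, d, w) for d in train_drugs]
    total = sum(weights)
    if total == 0:
        # No similar drug at all: fall back to the base rate of the side-effect.
        return sum(labels) / len(labels)
    return sum(wt * y for wt, y in zip(weights, labels)) / total
```

Setting `w` to 1.0 or 0.0 recovers a single-source predictor, which gives a direct way to check whether combining indications and structure helps, in the spirit of the third finding above.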
However, the current study limits the analysis by assuming that side-effects are independent. In the future, modeling the joint relationships among several side-effects could yield stronger predictions and help to better understand the underlying biological mechanism.
A machine learning based drug discovery pipeline: finding new therapies for Cystic Fibrosis
Master's thesis, Bioinformatics and Computational Biology, Universidade de Lisboa, Faculdade de Ciências, 2019
Technological progress and the growing availability of public data have led to the development of robust machine-learning-based methodologies for predicting compound activity. These methodologies are faster, more efficient and less costly than traditional drug discovery methods. Cystic Fibrosis (CF) is a progressive autosomal disease for which new therapies are urgently needed. Mutations in the CFTR gene in CF patients lead to deficient production of the CFTR anion-transport membrane channel, causing ionic imbalances and abnormal fluid transport. CF affects several organs, the lungs most severely, and problems in the lungs are usually the cause of premature death. The most prevalent and relevant mutation in CF is the deletion of phenylalanine 508 (F508del-CFTR). For this reason, the main drug discovery efforts are directed at correcting or alleviating the effects of this mutation.
A methodology was developed using classification and regression machine learning models based on support vector machines and Random Forests to discover compounds with therapeutic potential for CF in publicly accessible compound databases. The most promising compounds were selected and tested in the laboratory for their effect on F508del-CFTR, using immunofluorescence assays with automated high-throughput screening microscopy and analysis, based on the trafficking efficiency of F508del-CFTR to the plasma membrane. The 10 compounds with the best results in this assay were validated by Western blot and compared with two known F508del-CFTR corrector compounds. Four compounds were identified as promising therapeutic compounds for CF.
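The step in which the most promising compounds are chosen from the models' predictions might be sketched as ranking compounds by a score averaged over the two model families and keeping the top k for laboratory testing. Averaging the SVM and Random Forest scores, the score dictionaries, and the function name `select_candidates` are assumptions; the abstract does not state how the predictions were combined.

```python
import heapq


def select_candidates(scores_svm, scores_rf, k):
    """Return the k compound IDs with the highest averaged model score (hypothetical rule).

    scores_svm and scores_rf map compound ID -> predicted activity score on a common scale.
    """
    combined = {cid: (scores_svm[cid] + scores_rf[cid]) / 2 for cid in scores_svm}
    return heapq.nlargest(k, combined, key=combined.get)
```

Averaging only makes sense if both models output scores on a comparable scale (for example, calibrated probabilities); otherwise rank-based combination would be a safer assumption.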