56 research outputs found

    Subspace-based dynamic selection for high-dimensional data

    The number of features collected has increased greatly in the past decade, particularly in medicine and life sciences, which brings both challenges and opportunities. Making reliable predictions, exploring associations and extracting meaningful information from high-dimensional data are some of the problems yet to be solved. Due to intrinsic properties of high-dimensional spaces such as distance concentration and hubness, traditional classification and clustering algorithms face difficult challenges. In general, a Multiple Classifier System (MCS) provides better classification accuracy than individual classifiers. Among the most promising approaches to MCS are Dynamic Selection (DS) methods, which select classifiers on the fly for each unknown test sample. The rationale is that not every classifier is an expert in predicting all samples; rather, each classifier or combination of classifiers is an expert in a different region of the feature space, and the quality of that region can significantly impact the overall performance. This thesis provides three major contributions. First, it shows that traditional DS methods fail to perform effectively on high-dimensional data sets because they use k-Nearest Neighbours (k-NN) to define the region of competence and, moreover, do not indicate which features are the most important for classification. Second, two frameworks are proposed, the Subspace-Based Dynamic Selection (SBDS) and the Classifier SBDS (cSBDS), which integrate characteristics of DS methods and subspace clustering. Subspace clustering methods localise their search for clusters and are able to uncover clusters that exist in multiple, possibly overlapping subspaces of features and/or samples. The subspace clustering approach separates the high-dimensional feature space into small feature spaces with a reduced number of features and samples in each one. The results indicate that the cSBDS framework performs statistically better than DS methods and majority voting on real-world and synthetic datasets. Third, we provide a comparison between the features selected by the cSBDS framework and feature importance methods. The results indicate that for high-dimensional datasets, the cSBDS framework is able to capture the most important features when the number of clusters per class is increased, while traditional feature importance methods lose this capability.
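
The traditional dynamic selection idea described in this abstract (a k-NN region of competence, then choosing the locally best classifier) can be sketched as follows. This is a minimal illustration of the generic baseline, not the SBDS or cSBDS frameworks; the function `select_and_predict`, the two toy classifiers and the validation points are all hypothetical and were invented for this example.

```python
import math

def select_and_predict(x, pool, val_X, val_y, k=3):
    """Predict x with the classifier that is most accurate on x's k nearest validation points."""
    # Region of competence: indices of the k validation samples closest to x (Euclidean distance).
    order = sorted(range(len(val_X)), key=lambda i: math.dist(x, val_X[i]))
    region = order[:k]

    def local_accuracy(clf):
        # Fraction of region samples that this classifier labels correctly.
        return sum(clf(val_X[i]) == val_y[i] for i in region) / len(region)

    # Ties are resolved in favour of the classifier that appears first in the pool.
    best = max(pool, key=local_accuracy)
    return best(x)

# Hypothetical pool: each toy classifier is reliable in a different part of a 2-D space.
def clf_left(p):
    return int(p[1] > 0)   # matches the toy labels where p[0] < 0

def clf_right(p):
    return int(p[0] > 1)   # matches the toy labels where p[0] >= 0

# Hypothetical validation set (invented data).
val_X = [(-1, 1), (-1, -1), (-2, 1), (0.5, 0), (2, 0), (2, 1)]
val_y = [1, 0, 1, 0, 1, 1]

pool = [clf_left, clf_right]
prediction_left = select_and_predict((-1.5, 0.5), pool, val_X, val_y)
prediction_right = select_and_predict((2, -0.5), pool, val_X, val_y)
```

Because the region of competence relies on distances computed over all features, it is exactly the step that the abstract says degrades in high-dimensional spaces.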

    Detecting danger in roads: an immune-inspired technique to identify heavy goods vehicles incident hot spots

    We report on the adaptation of an immune-inspired instance selection technique to solve a real-world big data problem of determining vehicle incident hot spots. The technique, which is inspired by the immune system's self-regulation mechanism, was originally conceptualised to eliminate very similar instances in data classification tasks. We adapt the method to detect hot spots from a telematics data set containing hundreds of thousands of data points indicating incident locations involving heavy goods vehicles (HGVs) across the United Kingdom. The objective is to provide HGV drivers with information regarding areas of high likelihood of incidents in order to continuously improve road safety and vehicle economy. The problem presents several challenges and constraints. An accurate view of the hot spots must be produced in a timely manner. In addition, the solution is required to be adaptable and dynamic, as thousands of new incidents are added to the database daily. Furthermore, the impact on hot spots after informing drivers about their existence has to be considered. Our solution successfully addresses these constraints. It is fast, robust, and applicable to all the incident types investigated. The method is also self-adjustable, which means that if the hot spots’ configuration changes with time, the algorithm automatically evolves to present the most current topology. Our solution has been implemented by a telematics company to improve HGV safety in the United Kingdom.

    An Immune-Inspired Technique to Identify Heavy Goods Vehicles Incident Hot Spots

    We report on the adaptation of an immune-inspired instance selection technique to solve a real-world big data problem of determining vehicle incident hot spots. The technique, which is inspired by the immune system's self-regulation mechanism, was originally conceptualised to eliminate very similar instances in data classification tasks. We adapt the method to detect hot spots from a telematics data set containing hundreds of thousands of data points indicating incident locations involving heavy goods vehicles (HGVs) across the United Kingdom. The objective is to provide HGV drivers with information regarding areas of high likelihood of incidents in order to continuously improve road safety and vehicle economy. The problem presents several challenges and constraints. An accurate view of the hot spots must be produced in a timely manner. In addition, the solution is required to be adaptable and dynamic, as thousands of new incidents are added to the database daily. Furthermore, the impact on hot spots after informing drivers about their existence has to be considered. Our solution successfully addresses these constraints. It is fast, robust, and applicable to all the incident types investigated. The method is also self-adjustable, which means that if the hot spots’ configuration changes with time, the algorithm automatically evolves to present the most current topology. Our solution has been implemented by a telematics company to improve HGV safety in the United Kingdom.

    Mass spectrometry and machine learning for the accurate diagnosis of benzylpenicillin and multidrug resistance of Staphylococcus aureus in bovine mastitis

    Staphylococcus aureus is a serious human and animal pathogen, exhibiting an extraordinary capacity for acquiring new antibiotic resistance traits in the pathogen population worldwide. The development of fast, affordable and effective diagnostic solutions capable of discriminating between antibiotic-resistant and susceptible S. aureus strains would be of huge benefit for effective disease detection and treatment. Here we develop a diagnostic solution that uses Matrix-Assisted Laser Desorption/Ionisation–Time of Flight Mass Spectrometry (MALDI-TOF) and machine learning to identify signature profiles of antibiotic resistance to either multiple drugs or benzylpenicillin in S. aureus isolates. Using ten different supervised learning techniques, we have analysed a set of 82 S. aureus isolates collected from 67 cows diagnosed with bovine mastitis across 24 farms. For the multidrug phenotyping analysis, LDA, linear SVM, RBF SVM, logistic regression, naïve Bayes, MLP neural network and QDA had Cohen’s kappa values over 85.00%. For the benzylpenicillin phenotyping analysis, RBF SVM, MLP neural network, naïve Bayes, logistic regression, linear SVM, QDA, LDA, and random forests had Cohen’s kappa values over 85.00%. For benzylpenicillin, the diagnostic systems achieved up to (mean result ± standard deviation over 30 runs on the test set): accuracy = 97.54% ± 1.91%, sensitivity = 99.93% ± 0.25%, specificity = 95.04% ± 3.83%, and Cohen’s kappa = 95.04% ± 3.83%. Moreover, the diagnostic platform, complemented by a protein-protein network and a 3D structural protein information framework, allowed the identification of five molecular determinants underlying the susceptible and resistant profiles. Four proteins were able to classify multidrug-resistant and susceptible strains with 96.81% ± 0.43% accuracy. Five proteins, including the previous four, were able to classify benzylpenicillin-resistant and susceptible strains with 97.54% ± 1.91% accuracy. Our approach may open up new avenues for the development of a fast, affordable and effective day-to-day diagnostic solution, which would offer new opportunities for targeting resistant bacteria.
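
The evaluation metrics named in this abstract (accuracy, sensitivity, specificity and Cohen's kappa) all derive from a binary confusion matrix. The sketch below shows the standard definitions; the function name `binary_metrics` and the counts passed to it are hypothetical and are not data from the study.

```python
def binary_metrics(tp, fn, fp, tn):
    """Return accuracy, sensitivity, specificity and Cohen's kappa for a 2x2 confusion matrix."""
    n = tp + fn + fp + tn
    accuracy = (tp + tn) / n              # observed agreement between prediction and truth
    sensitivity = tp / (tp + fn)          # true positive rate (resistant isolates detected)
    specificity = tn / (tn + fp)          # true negative rate (susceptible isolates detected)

    # Agreement expected by chance, from the marginal totals of each class.
    p_pos = ((tp + fn) / n) * ((tp + fp) / n)
    p_neg = ((tn + fp) / n) * ((tn + fn) / n)
    p_e = p_pos + p_neg

    # Kappa rescales accuracy so that 0 means chance-level and 1 means perfect agreement.
    # Assumes p_e < 1, i.e. both classes occur in the data or in the predictions.
    kappa = (accuracy - p_e) / (1 - p_e)
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "kappa": kappa,
    }

# Invented counts for illustration only.
metrics = binary_metrics(tp=45, fn=5, fp=10, tn=40)
```

Kappa is reported alongside accuracy because it discounts the agreement a classifier would reach by chance, which matters when resistant and susceptible classes are imbalanced.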

    Evaluation of insulin resistance and lipid profile in Turner syndrome

    OBJECTIVE: To evaluate the presence of insulin resistance (IR) and changes in lipid profile in Turner syndrome (TS), and to assess the influence of age, karyotype, systemic arterial hypertension (SAH), height, weight, body mass index (BMI), and pubertal development. PATIENTS AND METHODS: A cross-sectional study of 35 TS patients (5 to 43 years) with karyotype-confirmed diagnosis and no previous use of anabolic steroids or hGH, with evaluation of blood pressure, pubertal development, anthropometric data, waist (W) and hip (H) measurements, W to H ratio (W/H), total cholesterol, HDL, triglycerides (TGC), LDL, insulin and glucose. HOMA and QUICKI indexes were calculated, as well as the glucose to insulin ratio (G/I). Data were analysed with the Mann-Whitney and Spearman tests. RESULTS: Ten patients were >20 years. Seventeen had a 45,X karyotype and 6 had structural aberrations; no differences in the variables were observed in relation to karyotype. Fifteen were nonpubertal and 20 pubertal; TGC and HOMA were significantly higher in puberty, while G/I was lower. Seven had normal height, 8 had BMI >25 kg/m² (6 between 25 and 30, and 2 >30), and 19 had W/H >0.85. Cholesterol levels were 180 ± 42 mg% (4 >240); HDL 57 ± 16 mg%; LDL 99 ± 34 mg%; TGC 108 ± 96 mg% (2 >200); HOMA 1.01 ± 0.71; QUICKI 0.4 ± 0.04; and G/I 23.5 ± 12.1 (2 <7.0). CONCLUSIONS: Changes in lipid profile were observed regardless of age group, karyotype, blood pressure and obesity, but were aggravated by IR, which was less frequent than described in the literature and appeared to be related to chronological age, obesity and estrogen replacement.
    Pages 278-285. Funding: Fundação de Amparo à Pesquisa do Estado de São Paulo (FAPESP); Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq).
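
The three insulin resistance indexes mentioned in this abstract can be computed from fasting glucose and fasting insulin. The sketch below uses the commonly cited formulas with glucose in mg/dL and insulin in µU/mL; the abstract does not state which units or formula variants the authors used, so these are assumptions, and the function names and the sample values are hypothetical.

```python
import math

def homa_ir(glucose_mg_dl, insulin_uu_ml):
    """HOMA-IR with glucose in mg/dL: glucose * insulin / 405 (assumed variant)."""
    return glucose_mg_dl * insulin_uu_ml / 405.0

def quicki(glucose_mg_dl, insulin_uu_ml):
    """QUICKI: 1 / (log10(insulin) + log10(glucose)), base-10 logarithms."""
    return 1.0 / (math.log10(insulin_uu_ml) + math.log10(glucose_mg_dl))

def glucose_insulin_ratio(glucose_mg_dl, insulin_uu_ml):
    """G/I: fasting glucose (mg/dL) divided by fasting insulin (µU/mL)."""
    return glucose_mg_dl / insulin_uu_ml

# Invented fasting values for illustration only (not patient data from the study).
glucose, insulin = 90.0, 5.0
indexes = {
    "HOMA": homa_ir(glucose, insulin),
    "QUICKI": quicki(glucose, insulin),
    "G/I": glucose_insulin_ratio(glucose, insulin),
}
```

Higher HOMA values and lower QUICKI or G/I values point towards insulin resistance, which is consistent with the abstract counting patients whose G/I fell below 7.0.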

    Eficiência de herbicidas na supressão de rebrote de touceiras de capim-amargoso (Efficiency of herbicides in suppressing regrowth of sourgrass clumps)

    Sourgrass (Digitaria insularis) is a species native to the tropical and subtropical regions of the Americas. In areas where glyphosate is used continuously, plants that have formed clumps with rhizomes become difficult to control, and regrowth may occur after herbicide treatments. This study evaluated the efficacy of different herbicide treatments in suppressing the regrowth of D. insularis. The experiment was conducted in Nova Aurora – PR, from July to August 2013. The treatments were check (no herbicide), glyphosate + clethodim, glyphosate + imazethapyr + clethodim and glyphosate + s-metolachlor. We used a CO2 precision sprayer fitted with ADIA 110.02 flat fan nozzle tips. Evaluations were performed at 21, 28 and 35 days after application (DAA). The results showed that all treatments were successful in suppressing regrowth of D. insularis compared with the check that received no herbicide.

    Genome-Scale Metabolic Models and Machine Learning Reveal Genetic Determinants of Antibiotic Resistance in Escherichia coli and Unravel the Underlying Metabolic Adaptation Mechanisms

    Antimicrobial resistance (AMR) is becoming one of the largest threats to public health worldwide, with the opportunistic pathogen Escherichia coli playing a major role in the AMR global health crisis. Unravelling the complex interplay between drug resistance and metabolic rewiring is key to understanding the ability of bacteria to adapt to new treatments and to developing new effective solutions to combat resistant infections. We developed a computational pipeline that combines machine learning with genome-scale metabolic models (GSMs) to elucidate the systemic relationships between genetic determinants of resistance and metabolism beyond annotated drug resistance genes. Our approach was used to identify genetic determinants of 12 AMR profiles for the opportunistic pathogenic bacterium E. coli. Then, to interpret the large number of identified genetic determinants, we applied a constraint-based approach using the GSM to predict the effects of genetic changes on growth, metabolite yields, and reaction fluxes. Our computational platform leads to multiple results. First, our approach corroborates 225 known AMR-conferring genes, 35 of which are known for the specific antibiotic. Second, integration with the GSM predicted 20 top-ranked genetic determinants (including accA, metK, fabD, fabG, murG, lptG, mraY, folP, and glmM) essential for growth, while a further 17 top-ranked genetic determinants linked AMR to auxotrophic behavior. Third, clusters of AMR-conferring genes affecting similar metabolic processes are revealed, which strongly suggests that metabolic adaptations in cell wall, energy, iron and nucleotide metabolism are associated with AMR. The computational solution can be used to study other human and animal pathogens.
    IMPORTANCE: Escherichia coli is a major public health concern given its increasing level of antibiotic resistance worldwide and extraordinary capacity to acquire and spread resistance via horizontal gene transfer with surrounding species and via mutations in its existing genome. E. coli also exhibits a large amount of metabolic pathway redundancy, which promotes resistance via metabolic adaptability. In this study, we developed a computational approach that integrates machine learning with metabolic modeling to understand the correlation between AMR and metabolic adaptation mechanisms in this model bacterium. Using our approach, we identified AMR genetic determinants associated with cell wall modifications for increased permeability, virulence factor manipulation of host immunity, reduction of oxidative stress toxicity, and changes to energy metabolism. Unravelling the complex interplay between antibiotic resistance and metabolic rewiring may open new opportunities to understand the ability of E. coli, and potentially of other human and animal pathogens, to adapt to new treatments.

    Whole-genome sequencing and gene sharing network analysis powered by machine learning identifies antibiotic resistance sharing between animals, humans and environment in livestock farming

    Anthropogenic environments, such as those created by intensive farming of livestock, have been proposed to provide ideal selection pressure for the emergence of antimicrobial-resistant Escherichia coli bacteria and antimicrobial resistance genes (ARGs) and for their spread to humans. Here, we performed a longitudinal study in a large-scale commercial poultry farm in China, collecting E. coli isolates from both farm and slaughterhouse and targeting animals, carcasses, workers and their households, and the environment. By using whole-genome phylogenetic analysis and network analysis based on single nucleotide polymorphisms (SNPs), we found highly interrelated non-pathogenic and pathogenic E. coli strains with phylogenetic intermixing, and a high prevalence of shared multidrug resistance profiles amongst livestock, humans and the environment. Through an original data processing pipeline which combines omics, machine learning, gene sharing network and mobile genetic elements analysis, we investigated resistance to 26 different antimicrobials and identified 361 genes associated with antimicrobial resistance (AMR) phenotypes; 58 of these were known AMR-associated genes and 35 were associated with multidrug resistance. We uncovered an extensive network of genes, correlated with AMR phenotypes, shared among livestock, humans, and farm and slaughterhouse environments. We also found several human, livestock and environmental isolates sharing closely related mobile genetic elements carrying ARGs across host species and environments. In a scenario where no consensus exists on how antibiotic use in livestock may affect antibiotic resistance in the human population, our findings provide novel insights into the broader epidemiology of antimicrobial resistance in livestock farming. Moreover, our original data analysis method has the potential to uncover AMR transmission pathways when applied to the study of other pathogens active in other anthropogenic environments characterised by complex interconnections between host species.

    Dissecting microbial communities and resistomes for interconnected humans, soil, and livestock

    A debate is currently ongoing as to whether intensive livestock farms may constitute reservoirs of clinically relevant antimicrobial resistance (AMR), thus posing a threat to surrounding communities. Here, combining shotgun metagenome sequencing, machine learning (ML), and culture-based methods, we focused on a poultry farm and connected slaughterhouse in China, investigating the gut microbiome of livestock, workers and their households, and microbial communities in carcasses and soil. For both the microbiomes and the resistomes in this study, differences were observed across environments and hosts. However, at a finer scale, several similar clinically relevant antimicrobial resistance genes (ARGs) and similar associated mobile genetic elements were found in both human and broiler chicken samples. Next, we focused on Escherichia coli, an important indicator for the surveillance of AMR on the farm. Strains of E. coli were found intermixed between humans and chickens. We observed that several ARGs present in the chicken faecal resistome were correlated with the resistance/susceptibility profiles of E. coli isolates cultured from the same samples. Finally, by using environmental sensing, these ARGs were found to be correlated with variations in environmental temperature and humidity. Our results show the importance of adopting a multi-domain and multi-scale approach when studying microbial communities and AMR in complex, interconnected environments.