3 research outputs found

    EndToEndML: An Open-Source End-to-End Pipeline for Machine Learning Applications

    Artificial intelligence (AI) techniques are widely applied in the life sciences. However, applying innovative AI techniques to understand and deconvolute biological complexity is hindered by the learning curve life scientists face in understanding and using computing languages. An open-source, user-friendly interface for AI models that does not require programming skills to analyze complex biological data will be extremely valuable to the bioinformatics community. With easy access to different sequencing technologies and increased interest in different 'omics' studies, the number of biological datasets being generated has increased, and analyzing these high-throughput datasets is computationally demanding. The majority of AI libraries today require advanced programming skills as well as machine learning, data preprocessing, and visualization skills. In this research, we propose a web-based end-to-end pipeline that is capable of preprocessing, training, evaluating, and visualizing machine learning (ML) models without manual intervention or coding expertise. By integrating traditional machine learning and deep neural network models with visualizations, our library assists in recognizing, classifying, clustering, and predicting a wide range of multi-modal, multi-sensor datasets, including images, languages, and one-dimensional numerical data, for drug discovery, pathogen classification, and medical diagnostics. Comment: 2024 7th International Conference on Information and Computer Technologies (ICICT)

    Exploring Pathogen Presence Prediction in Pastured Poultry Farms through Transformer-Based Models and Attention Mechanism Explainability

    In this study, we explore how transformer models, which are known for their attention mechanisms, can improve pathogen prediction in pastured poultry farming. By combining farm management practices with microbiome data, our model outperforms traditional prediction methods in terms of the F1 score, an evaluation metric for model performance, thus fulfilling an essential need in predictive microbiology. We also emphasize making our model’s predictions explainable. We introduce a novel approach for identifying feature importance using the model’s attention matrix and the PageRank algorithm, offering insights that enhance our comprehension of established techniques such as DeepLIFT. Our results showcase the efficacy of transformer models in pathogen prediction for food safety and mark a noteworthy contribution to the progress of explainable AI within the biomedical sciences. This study sheds light on the impact of effective farm management practices and highlights the importance of technological advancements in ensuring food safety.
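
The abstract above does not say how the attention matrix is aggregated or how PageRank is configured, so the following is only a minimal sketch of the general idea: treat a square feature-by-feature attention matrix as a weighted graph and rank features by their PageRank score. The function name `pagerank_importance`, the convention that `attention[i][j]` is the weight feature `i` places on feature `j`, the damping factor of 0.85, and the toy matrix are all assumptions made for illustration, not details from the study.

```python
def pagerank_importance(attention, damping=0.85, tol=1e-10, max_iter=1000):
    """Score features by PageRank over an attention matrix (illustrative sketch).

    attention[i][j] is assumed to be the weight feature i places on feature j.
    """
    n = len(attention)

    # Row-normalize so each row is a probability distribution over targets.
    # Rows that sum to zero are treated as attending uniformly to all features.
    transition = []
    for row in attention:
        total = sum(row)
        if total > 0:
            transition.append([w / total for w in row])
        else:
            transition.append([1.0 / n] * n)

    # Power iteration: a feature scores highly when highly scored
    # features direct much of their attention to it.
    scores = [1.0 / n] * n
    for _ in range(max_iter):
        updated = [
            (1 - damping) / n
            + damping * sum(transition[i][j] * scores[i] for i in range(n))
            for j in range(n)
        ]
        if sum(abs(u - s) for u, s in zip(updated, scores)) < tol:
            return updated
        scores = updated
    return scores


# Hypothetical 3-feature attention matrix in which every feature
# attends mostly to feature 0.
demo_attention = [
    [0.8, 0.1, 0.1],
    [0.7, 0.2, 0.1],
    [0.6, 0.1, 0.3],
]
demo_scores = pagerank_importance(demo_attention)
ranking = sorted(range(len(demo_scores)), key=lambda j: demo_scores[j], reverse=True)
```

Here `ranking` lists feature indices from most to least important. In practice the matrix would have to be derived from a trained transformer, for example by averaging over heads, layers, and samples; that choice is left open here because the abstract does not state it.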

    Predicting Salmonella MIC and Deciphering Genomic Determinants of Antibiotic Resistance and Susceptibility

    Salmonella spp., a leading cause of foodborne illness, is a formidable global menace due to escalating antimicrobial resistance (AMR). The evaluation of minimum inhibitory concentration (MIC) for antimicrobials is critical for characterizing AMR. The current whole genome sequencing (WGS)-based approaches for predicting MIC are hindered by both computational and feature identification constraints. We propose an innovative methodology called the “Genome Feature Extractor Pipeline” that integrates traditional machine learning (random forest, RF) with deep learning models (multilayer perceptron (MLP) and DeepLift) for WGS-based MIC prediction. We used a dataset from the National Antimicrobial Resistance Monitoring System (NARMS), comprising 4500 assembled genomes of nontyphoidal Salmonella, each annotated with MIC metadata for 15 antibiotics. Our pipeline involves the batch downloading of annotated genomes, the determination of feature importance using RF, Gini-index-based selection of crucial 10-mers, and their expansion to 20-mers. This is followed by an MLP network, with four hidden layers of 1024 neurons each, to predict MIC values. Using DeepLift, key 20-mers and associated genes influencing MIC are identified. The 10 most significant 20-mers for each antibiotic are listed, showcasing our ability to discern genomic features affecting Salmonella MIC prediction with enhanced precision. The methodology replaces binary indicators with k-mer counts, offering a more nuanced analysis. The combination of RF and MLP addresses the limitations of the existing WGS approach, providing a robust and efficient method for predicting MIC values in Salmonella that could potentially be applied to other pathogens.
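
To make the contrast between binary indicators and k-mer counts concrete, the following is a minimal sketch of how overlapping k-mers might be counted and turned into a feature row. It is not the Genome Feature Extractor Pipeline itself: the function names `count_kmers` and `to_feature_row`, the decision to skip k-mers containing non-ACGT characters, and the toy sequence are assumptions for illustration, and details such as reverse-complement handling are omitted.

```python
from collections import Counter

VALID_BASES = set("ACGT")


def count_kmers(sequence, k=10):
    """Count every overlapping k-mer in a DNA sequence.

    k-mers containing characters other than A, C, G, T (for example N)
    are skipped; this is an assumption of the sketch.
    """
    sequence = sequence.upper()
    counts = Counter()
    for start in range(len(sequence) - k + 1):
        kmer = sequence[start:start + k]
        if set(kmer) <= VALID_BASES:
            counts[kmer] += 1
    return counts


def to_feature_row(counts, vocabulary, binary=False):
    """Turn k-mer counts into a fixed-order feature row.

    With binary=True the row only records presence or absence;
    with binary=False it keeps the actual counts.
    """
    if binary:
        return [1 if counts[kmer] > 0 else 0 for kmer in vocabulary]
    return [counts[kmer] for kmer in vocabulary]


# Toy example with k=4 so the repeats are easy to see by eye.
toy_counts = count_kmers("ACGTACGTAC", k=4)
vocabulary = ["ACGT", "TACG", "AAAA"]
count_row = to_feature_row(toy_counts, vocabulary)
binary_row = to_feature_row(toy_counts, vocabulary, binary=True)
```

In the toy sequence, `ACGT` occurs twice and `TACG` once, so the count row distinguishes them while the binary row does not. Rows like these, built with k=10 over a shared vocabulary, are the kind of input a random forest could rank by Gini importance before a downstream network is trained.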