2,493 research outputs found

    BioWorkbench: A High-Performance Framework for Managing and Analyzing Bioinformatics Experiments

    Get PDF
    Advances in sequencing techniques have led to exponential growth in biological data, demanding the development of large-scale bioinformatics experiments. Because these experiments are computation- and data-intensive, they require high-performance computing (HPC) techniques and can benefit from specialized technologies such as Scientific Workflow Management Systems (SWfMS) and databases. In this work, we present BioWorkbench, a framework for managing and analyzing bioinformatics experiments. This framework automatically collects provenance data, including both performance data from workflow execution and data from the scientific domain of the workflow application. Provenance data can be analyzed through a web application that abstracts a set of queries to the provenance database, simplifying access to provenance information. We evaluate BioWorkbench using three case studies: SwiftPhylo, a phylogenetic tree assembly workflow; SwiftGECKO, a comparative genomics workflow; and RASflow, a RASopathy analysis workflow. We analyze each workflow from both computational and scientific domain perspectives, by using queries to a provenance and annotation database. Some of these queries are available as a pre-built feature of the BioWorkbench web application. Through the provenance data, we show that the framework is scalable and achieves high-performance, reducing up to 98% of the case studies execution time. We also show how the application of machine learning techniques can enrich the analysis process

    Navigating Healthcare Insights: A Birds Eye View of Explainability with Knowledge Graphs

    Full text link
    Knowledge graphs (KGs) are gaining prominence in Healthcare AI, especially in drug discovery and pharmaceutical research as they provide a structured way to integrate diverse information sources, enhancing AI system interpretability. This interpretability is crucial in healthcare, where trust and transparency matter, and eXplainable AI (XAI) supports decision making for healthcare professionals. This overview summarizes recent literature on the impact of KGs in healthcare and their role in developing explainable AI models. We cover KG workflow, including construction, relationship extraction, reasoning, and their applications in areas like Drug-Drug Interactions (DDI), Drug Target Interactions (DTI), Drug Development (DD), Adverse Drug Reactions (ADR), and bioinformatics. We emphasize the importance of making KGs more interpretable through knowledge-infused learning in healthcare. Finally, we highlight research challenges and provide insights for future directions.Comment: IEEE AIKE 2023, 8 Page

    BowSaw: inferring higher-order trait interactions associated with complex biological phenotypes

    Get PDF
    Machine learning is helping the interpretation of biological complexity by enabling the inference and classification of cellular, organismal and ecological phenotypes based on large datasets, e.g. from genomic, transcriptomic and metagenomic analyses. A number of available algorithms can help search these datasets to uncover patterns associated with specific traits, including disease-related attributes. While, in many instances, treating an algorithm as a black box is sufficient, it is interesting to pursue an enhanced understanding of how system variables end up contributing to a specific output, as an avenue towards new mechanistic insight. Here we address this challenge through a suite of algorithms, named BowSaw, which takes advantage of the structure of a trained random forest algorithm to identify combinations of variables (“rules”) frequently used for classification. We first apply BowSaw to a simulated dataset, and show that the algorithm can accurately recover the sets of variables used to generate the phenotypes through complex Boolean rules, even under challenging noise levels. We next apply our method to data from the integrative Human Microbiome Project and find previously unreported high-order combinations of microbial taxa putatively associated with Crohn’s disease. By leveraging the structure of trees within a random forest, BowSaw provides a new way of using decision trees to generate testable biological hypotheses.Accepted manuscrip

    An Architecture for Simplified and Automated Machine Learning

    Get PDF
    learning has been adopted by businesses to analyze their vast data in order to make strategic decision. However, knowledge in machine learning and technical skill are usually required to prepare data and perform machine learning tasks. This obstacle prevents smaller businesses with no technical knowledge to utilize machine learning. In this paper, we propose an architecture for simplified and automated machine learning process currently supporting the data classification task. The architecture includes a method for characterizing datasets, which allows for simplifying and automating machine learning model and hyperparameter selection based on historical execution configurations. Users can simply upload their datasets via a web browser, and the system will determine the possible models and their hyperparameter configurations for the users to choose from. The prototype shows the feasibility of the proposed architecture. Although the accuracy is still limited by the small execution history and the cleanliness of the input datasets, the architecture can minimize user involvement in the machine learning process so that non-technical users can perform data classification through a web browser without installing any software

    A Computational Approach to Predict the Severity of Breast Cancer through Machine Learning Algorithms

    Get PDF
    Breast cancer is one type of cancer which causes from breast tissue. A lump in the breast, skin dimpling, breast shape changes, fluid from the nipple, or a red scaly patch of skin are some of signs of breast cancer. In the world, cancer is one of the most leading causes of deaths among the women. Among the cancer diseases, breast cancer is especially a concern in women. Mammography is one of the methods for finding tumor in the breast. This method is utilized to detect the cancer which is helpful for the doctor or radiologists. Due to the inexperience�s in the field of cancer detection, the abnormality is missed by doctor or Radiologists. Segmentation is very expensive for doctor and radiologists to examine the data in the mammogram. In mammogram the accuracy rate is based on the image segmentation. The recent clustering techniques are presented in this paper for detection of breast cancer. These Classification algorithms have been mostly studied which is applied in a various application areas. To maximize the efficiency of the searching process various clustering techniques are recommended. In this paper, we have presented a survey of Classification techniques

    MatGD: Materials Graph Digitizer

    Full text link
    We have developed MatGD (Material Graph Digitizer), which is a tool for digitizing a data line from scientific graphs. The algorithm behind the tool consists of four steps: (1) identifying graphs within subfigures, (2) separating axes and data sections, (3) discerning the data lines by eliminating irrelevant graph objects and matching with the legend, and (4) data extraction and saving. From the 62,534 papers in the areas of batteries, catalysis, and MOFs, 501,045 figures were mined. Remarkably, our tool showcased performance with over 99% accuracy in legend marker and text detection. Moreover, its capability for data line separation stood at 66%, which is much higher compared to other existing figure mining tools. We believe that this tool will be integral to collecting both past and future data from publications, and these data can be used to train various machine learning models that can enhance material predictions and new materials discovery.Comment: 23 pages, 4 figure

    Machine Learning Made Easy (MLme): A Comprehensive Toolkit for Machine Learning-Driven Data Analysis.

    Get PDF
    BACKGROUND Machine learning (ML) has emerged as a vital asset for researchers to analyze and extract valuable information from complex datasets. However, developing an effective and robust ML pipeline can present a real challenge, demanding considerable time and effort, thereby impeding research progress. Existing tools in this landscape require a profound understanding of ML principles and programming skills. Furthermore, users are required to engage in the comprehensive configuration of their ML pipeline to obtain optimal performance. RESULTS To address these challenges, we have developed a novel tool called Machine Learning Made Easy (MLme) that streamlines the use of ML in research, specifically focusing on classification problems at present. By integrating four essential functionalities, namely Data Exploration, AutoML, CustomML, and Visualization, MLme fulfills the diverse requirements of researchers while eliminating the need for extensive coding efforts. To demonstrate the applicability of MLme, we conducted rigorous testing on six distinct datasets, each presenting unique characteristics and challenges. Our results consistently showed promising performance across different datasets, reaffirming the versatility and effectiveness of the tool. Additionally, by utilizing MLme's feature selection functionality, we successfully identified significant markers for CD8+ naive (BACH2), CD16+ (CD16), and CD14+ (VCAN) cell populations. CONCLUSION MLme serves as a valuable resource for leveraging machine learning (ML) to facilitate insightful data analysis and enhance research outcomes, while alleviating concerns related to complex coding scripts. The source code and a detailed tutorial for MLme are available at https://github.com/FunctionalUrology/MLme
    corecore