
    Machine Learning and Integrative Analysis of Biomedical Big Data.

    Recent developments in high-throughput technologies have accelerated the accumulation of massive amounts of omics data from multiple sources: genome, epigenome, transcriptome, proteome, metabolome, etc. Traditionally, data from each source (e.g., genome) is analyzed in isolation using statistical and machine learning (ML) methods. Integrative analysis of multi-omics and clinical data is key to new biomedical discoveries and advancements in precision medicine. However, data integration poses new computational challenges as well as exacerbating those associated with single-omics studies. Specialized computational approaches are required to effectively and efficiently perform integrative analysis of biomedical data acquired from diverse modalities. In this review, we discuss state-of-the-art ML-based approaches for tackling five specific computational challenges associated with integrative analysis: the curse of dimensionality, data heterogeneity, missing data, class imbalance, and scalability issues.
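
    As a concrete, deliberately simplified illustration of the challenges listed above, the sketch below uses scikit-learn (which the review does not prescribe) for early, concatenation-based integration of two hypothetical omics blocks: missing values are imputed, each block is scaled separately to soften heterogeneity, dimensionality is reduced by univariate selection, and class imbalance is handled through class weighting. All data, block sizes, and model choices are placeholders.

```python
# Hypothetical sketch (not from the review): early integration of two omics
# blocks, touching on several of the listed challenges: missing data
# (imputation), heterogeneity (per-block scaling), dimensionality
# (univariate selection), and class imbalance (class weighting).
import numpy as np
from sklearn.pipeline import Pipeline
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import StandardScaler
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.compose import ColumnTransformer
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n = 100
X_genome = rng.normal(size=(n, 500))                   # e.g. gene expression block
X_methyl = rng.normal(size=(n, 300))                   # e.g. methylation block
X_methyl[rng.random(X_methyl.shape) < 0.1] = np.nan    # simulate missing values
y = rng.integers(0, 2, size=n)                         # labels (imbalanced in practice)

X = np.hstack([X_genome, X_methyl])
genome_cols = list(range(500))
methyl_cols = list(range(500, 800))

# Pre-process each omics block separately, then concatenate the results.
per_block = ColumnTransformer([
    ("genome", StandardScaler(), genome_cols),
    ("methyl", Pipeline([("impute", SimpleImputer(strategy="mean")),
                         ("scale", StandardScaler())]), methyl_cols),
])

model = Pipeline([
    ("blocks", per_block),
    ("select", SelectKBest(f_classif, k=50)),                          # reduce dimensionality
    ("clf", LogisticRegression(class_weight="balanced", max_iter=1000)),  # handle imbalance
])

print(cross_val_score(model, X, y, cv=5, scoring="balanced_accuracy").mean())
```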

    Healthy kidney segmentation in the dce-mr images using a convolutional neural network and temporal signal characteristics

    Quantification of renal perfusion based on dynamic contrast-enhanced magnetic resonance imaging (DCE-MRI) requires determination of signal intensity time courses in the region of renal parenchyma. Selection of voxels representing the kidney must therefore be accomplished with special care and constitutes one of the major technical limitations that hampers wider use of this technique as a standard clinical routine. Manual segmentation of renal compartments, even if performed by experts, is a common source of decreased repeatability and reproducibility. In this paper, we present a processing framework for automatic kidney segmentation in DCE-MR images. The framework consists of two stages. First, kidney masks are generated using a convolutional neural network. Then, mask voxels are classified into one of three regions (cortex, medulla, or pelvis) based on DCE-MRI signal intensity time courses. The proposed approach was evaluated on a cohort of 10 healthy volunteers who underwent the DCE-MRI examination. MRI scanning was repeated in two sessions within a 10-day interval. For the semantic segmentation task we employed a classic U-Net architecture, whereas experiments on voxel classification were performed using three alternative algorithms (support vector machines, logistic regression, and extreme gradient boosting trees), among which the SVM produced the most accurate results. Both segmentation and classification steps were accomplished by a series of models, each trained separately for a given subject using the data from the other participants only. The mean accuracy of whole-kidney segmentation was 94% in terms of the IoU coefficient. Cortex, medulla, and pelvis were segmented with IoU ranging from 90 to 93%, depending on the tissue and body side. The results were also validated by comparing image-derived perfusion parameters with ground-truth measurements of glomerular filtration rate (GFR). The repeatability of GFR calculation, as assessed by the coefficient of variation, was 14.5% and 17.5% for the left and right kidney, respectively, and improved relative to manual segmentation. Reproducibility, in turn, was evaluated by measuring agreement between image-derived and iohexol-based GFR values. For the proposed automated segmentation method, the estimated mean absolute differences were 9.4 and 12.9 mL/min/1.73 m² for scanning sessions 1 and 2, respectively. The result for session 2 was comparable with manual segmentation, whereas for session 1 reproducibility in the automatic pipeline was weaker.
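
    The second stage described above, classifying mask voxels from their signal intensity time courses with a leave-one-subject-out strategy, can be sketched roughly as follows. This is not the authors' code; the data shapes, the scikit-learn SVM settings, and the IoU helper are assumptions for illustration only.

```python
# Rough sketch of the voxel-classification stage: label mask voxels as cortex,
# medulla, or pelvis from their DCE-MRI time courses with an SVM, trained
# leave-one-subject-out, and report a per-class IoU.
import numpy as np
from sklearn.svm import SVC
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline

def iou(pred, true, label):
    """Intersection over union for a single class label."""
    p, t = pred == label, true == label
    union = np.logical_or(p, t).sum()
    return np.logical_and(p, t).sum() / union if union else np.nan

def loso_voxel_classification(timecourses, labels, subject_ids):
    """timecourses: (n_voxels, n_timepoints); labels: 0=cortex, 1=medulla, 2=pelvis;
    subject_ids: (n_voxels,) subject each voxel belongs to."""
    preds = np.empty_like(labels)
    for s in np.unique(subject_ids):
        test = subject_ids == s
        clf = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0))
        clf.fit(timecourses[~test], labels[~test])   # train on all other subjects
        preds[test] = clf.predict(timecourses[test])
    return {name: iou(preds, labels, k)
            for k, name in enumerate(["cortex", "medulla", "pelvis"])}
```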

    The Protein-Protein Interaction tasks of BioCreative III: classification/ranking of articles and linking bio-ontology concepts to full text

    BACKGROUND: Determining the usefulness of biomedical text mining systems requires realistic task definitions and data selection criteria without artificial constraints, measuring performance aspects that go beyond traditional metrics. The BioCreative III Protein-Protein Interaction (PPI) tasks were motivated by such considerations, trying to address aspects including how the end user would oversee the generated output, for instance by providing ranked results or textual evidence for human interpretation, or by measuring the time saved through automated systems. Detecting articles describing complex biological events like PPIs was addressed in the Article Classification Task (ACT), where participants were asked to implement tools for detecting PPI-describing abstracts. To this end, the BCIII-ACT corpus was provided, which includes training, development and test sets of over 12,000 PPI-relevant and non-relevant PubMed abstracts labeled manually by domain experts, with the human classification times also recorded. The Interaction Method Task (IMT) went beyond abstracts and required mining for associations between more than 3,500 full-text articles and the interaction detection method ontology concepts that had been applied to detect the PPIs reported in them. RESULTS: A total of 11 teams participated in at least one of the two PPI tasks (10 in the ACT and 8 in the IMT), and a total of 62 persons were involved either as participants or in preparing data sets and evaluating these tasks. Per task, each team was allowed to submit five runs offline and another five online via the BioCreative Meta-Server. From the 52 runs submitted for the ACT, the highest Matthews Correlation Coefficient (MCC) score measured was 0.55 at an accuracy of 89%, and the best AUC iP/R was 68%. Most ACT teams explored machine learning methods; some of them also used lexical resources such as MeSH terms, PSI-MI concepts or particular lists of verbs and nouns, and some integrated NER approaches. For the IMT, a total of 42 runs were evaluated by comparing systems against manually generated annotations produced by curators from the BioGRID and MINT databases. The highest AUC iP/R achieved by any run was 53%, and the best MCC score was 0.55. For competitive systems with an acceptable recall (above 35%), the macro-averaged precision ranged between 50% and 80%, with a maximum F-score of 55%. CONCLUSIONS: The results of the ACT of BioCreative III indicate that classification of large unbalanced article collections reflecting the real class imbalance is still challenging. Nevertheless, text-mining tools that report ranked lists of relevant articles for manual selection can potentially reduce the time needed to identify half of the relevant articles to less than a quarter of the time required with unranked results. Detecting associations between full-text articles and interaction detection method PSI-MI terms (IMT) is more difficult than might be anticipated. This is due to the variability of method term mentions, errors resulting from pre-processing of articles provided as PDF files, and the heterogeneity and differing granularity of method term concepts encountered in the ontology. However, combining the sophisticated techniques developed by the participants with supporting evidence strings derived from the articles for human interpretation could result in practical modules for biological annotation workflows.
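
    To make the ACT setup concrete, the following sketch, which is not one of the participating systems, builds a minimal ranked PPI-abstract classifier from TF-IDF features and logistic regression and scores it with the Matthews Correlation Coefficient. The abstracts and labels are placeholders.

```python
# Toy ranked classification of PPI-relevant abstracts, in the spirit of the ACT.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import matthews_corrcoef
from sklearn.pipeline import make_pipeline
import numpy as np

# placeholder abstracts; 1 = PPI-relevant, 0 = not relevant
train_abstracts = ["the two proteins interact directly in vivo",
                   "we report a population survey of dietary habits"]
train_labels = np.array([1, 0])

model = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)),
                      LogisticRegression(max_iter=1000))
model.fit(train_abstracts, train_labels)

test_abstracts = ["these two proteins interact under stress conditions",
                  "a population survey of sleeping habits"]
test_labels = np.array([1, 0])

scores = model.predict_proba(test_abstracts)[:, 1]   # confidence used for ranking
ranking = np.argsort(-scores)                        # ranked list for manual curation
preds = (scores >= 0.5).astype(int)
print("ranking:", ranking, "MCC:", matthews_corrcoef(test_labels, preds))
```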

    Deep Learning in Medical Image Analysis

    The accelerating power of deep learning in diagnosing diseases will empower physicians and speed up decision making in clinical environments. Applications of modern medical instruments and digitalization of medical care have generated enormous amounts of medical images in recent years. In this big data arena, new deep learning methods and computational models for efficient data processing, analysis, and modeling of the generated data are crucially important for clinical applications and understanding the underlying biological process. This book presents and highlights novel algorithms, architectures, techniques, and applications of deep learning for medical image analysis

    Predicting Occurrence of the Term Sarcopenia with Semi-Supervised Machine Learning

    Sarcopenia is a medical condition that involves loss of muscle mass. It has been difficult to define and was only recently assigned an official medical code, so many medical records lack a coded diagnosis even though the clinical note text may discuss the condition or its symptoms. This thesis investigates the application of machine learning and natural language processing to clinical note text to see how well the term 'sarcopenia' can be predicted in notes from records concerning the condition. A variety of machine learning models combined with different features and text processing are tested against training data that mentions the term and test data that is coded for the condition, using small datasets from the Medical College of Wisconsin. This research showed that, based on the F1 score, no tested configuration or combination of features performed exceptionally well. Still, some models did show promise, especially those classifying with a support vector machine, as well as other classifiers such as decision trees, gradient boosting and logistic regression. While some of the ideas and approaches here did not perform well on the data studied, they provide insight and paths forward for extending them and applying them to larger and more precise datasets.
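
    A rough sketch of the kind of comparison described, not the thesis code itself, is shown below: the classifier families mentioned (SVM, decision tree, gradient boosting, logistic regression) are run on a TF-IDF representation of placeholder clinical note text and compared by cross-validated F1 score.

```python
# Hedged sketch: compare several classifier families on TF-IDF features of
# clinical note text, scored by F1, to predict whether a note discusses
# sarcopenia. The notes and labels below are placeholders.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.model_selection import cross_val_score

notes = ["patient shows progressive loss of muscle mass and reduced grip strength",
         "routine follow-up, no muscular complaints reported"] * 10   # placeholder notes
labels = [1, 0] * 10                                                  # 1 = discusses sarcopenia

for name, clf in [("svm", LinearSVC()),
                  ("decision tree", DecisionTreeClassifier()),
                  ("gradient boosting", GradientBoostingClassifier()),
                  ("logistic regression", LogisticRegression(max_iter=1000))]:
    model = make_pipeline(TfidfVectorizer(stop_words="english"), clf)
    f1 = cross_val_score(model, notes, labels, cv=5, scoring="f1").mean()
    print(f"{name}: F1 = {f1:.2f}")
```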

    Statistical Data Modeling and Machine Learning with Applications

    The modeling and processing of empirical data is one of the main subjects and goals of statistics. Nowadays, with the development of computer science, the extraction of useful, often hidden information and patterns from data sets of varying volume and complexity has been added to these goals. New and powerful statistical techniques with machine learning (ML) and data mining paradigms have been developed. To one degree or another, all of these techniques and algorithms originate from a rigorous mathematical basis, including probability theory and mathematical statistics, operational research, mathematical analysis, numerical methods, etc. Popular ML methods, such as artificial neural networks (ANN), support vector machines (SVM), decision trees, and random forests (RF), among others, have generated models that can be considered straightforward applications of optimization theory and statistical estimation. The wide arsenal of classical statistical approaches combined with powerful ML techniques allows many challenging and practical problems to be solved. This Special Issue belongs to the section “Mathematics and Computer Science”. Its aim is to present a brief collection of carefully selected papers on new and original methods, data analyses, case studies, comparative studies, and other research on the topic of statistical data modeling and ML, as well as their applications. Particular attention is given to, but not limited to, theories and applications in diverse areas such as computer science, medicine, engineering, banking, education, sociology, and economics, among others. The resulting palette of methods, algorithms, and applications for statistical modeling and ML presented in this Special Issue is expected to contribute to the further development of research in this area. We also believe that the new knowledge acquired here, as well as the applied results, will be attractive and useful for young scientists, doctoral students, and researchers from various scientific specialties.

    Machine Learning in Tribology

    Tribology has been, and continues to be, one of the most relevant fields, being present in almost all aspects of our lives. The understanding of tribology provides us with solutions for future technical challenges. At the root of all advances made so far are multitudes of precise experiments and an increasing number of advanced computer simulations across different scales and multiple physical disciplines. Based upon this sound and data-rich foundation, advanced data handling, analysis and learning methods can be developed and employed to expand existing knowledge. Therefore, modern machine learning (ML) or artificial intelligence (AI) methods provide opportunities to explore the complex processes in tribological systems and to classify or quantify their behavior in an efficient or even real-time way. Thus, their potential also goes beyond purely academic aspects into actual industrial applications. To help pave the way, this article collection aimed to present the latest research on ML or AI approaches for solving tribology-related issues, generating true added value beyond mere buzzwords. In this sense, this Special Issue can support researchers in identifying initial selections and best-practice solutions for ML in tribology.

    Image Processing and Analysis for Preclinical and Clinical Applications

    Radiomics is one of the most successful branches of research in the field of image processing and analysis, as it provides valuable quantitative information for personalized medicine. It has the potential to discover features of disease that cannot be appreciated with the naked eye in both preclinical and clinical studies. In general, all quantitative approaches based on biomedical images, such as positron emission tomography (PET), computed tomography (CT) and magnetic resonance imaging (MRI), have a positive clinical impact in the detection of biological processes and diseases as well as in predicting response to treatment. This Special Issue, “Image Processing and Analysis for Preclinical and Clinical Applications”, addresses some gaps in this field to improve the quality of research in the clinical and preclinical environment. It consists of fourteen peer-reviewed papers covering a range of topics and applications related to biomedical image processing and analysis.
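
    For readers new to radiomics, a minimal feature-extraction sketch is given below. The Special Issue does not prescribe any particular toolkit; pyradiomics and the file paths are assumptions used purely for illustration.

```python
# Minimal radiomics-style feature extraction sketch; file paths are placeholders.
from radiomics import featureextractor   # pip install pyradiomics

extractor = featureextractor.RadiomicsFeatureExtractor()
extractor.disableAllFeatures()
extractor.enableFeatureClassByName("firstorder")   # intensity statistics
extractor.enableFeatureClassByName("glcm")         # texture features

# image and segmentation mask in any SimpleITK-readable format (e.g. NRRD, NIfTI)
features = extractor.execute("ct_volume.nrrd", "tumor_mask.nrrd")
for name, value in features.items():
    if name.startswith("original_"):               # skip diagnostic metadata entries
        print(name, value)
```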

    Artificial Intelligence in Image-Based Screening, Diagnostics, and Clinical Care of Cardiopulmonary Diseases

    Cardiothoracic and pulmonary diseases are a significant cause of mortality and morbidity worldwide. The COVID-19 pandemic has highlighted the lack of access to clinical care, the overburdened medical system, and the potential of artificial intelligence (AI) in improving medicine. A variety of diseases affect the cardiopulmonary system, including lung cancers, heart disease, and tuberculosis (TB), in addition to COVID-19-related diseases. Screening, diagnosis, and management of cardiopulmonary diseases have become difficult owing to the limited availability of diagnostic tools and experts, particularly in resource-limited regions. Early screening, accurate diagnosis and staging of these diseases could play a crucial role in treatment and care, and potentially aid in reducing mortality. Radiographic imaging methods such as computed tomography (CT), chest X-rays (CXRs), and echo ultrasound (US) are widely used in screening and diagnosis. Research on image-based AI and machine learning (ML) methods can help in rapid assessment, serve as a surrogate for expert assessment, and reduce variability in human performance. In this Special Issue, “Artificial Intelligence in Image-Based Screening, Diagnostics, and Clinical Care of Cardiopulmonary Diseases”, we have highlighted exemplary primary research studies and literature reviews focusing on novel AI/ML methods and their application in image-based screening, diagnosis, and clinical management of cardiopulmonary diseases. We hope that these articles will help establish the advancements in AI.

    Scalable Text Mining with Sparse Generative Models

    The information age has brought a deluge of data. Much of this is in text form, insurmountable in scope for humans and incomprehensible in structure for computers. Text mining is an expanding field of research that seeks to utilize the information contained in vast document collections. General data mining methods based on machine learning face challenges with the scale of text data, posing a need for scalable text mining methods. This thesis proposes a solution to scalable text mining: generative models combined with sparse computation. A unifying formalization for generative text models is defined, bringing together research traditions that have used formally equivalent models but ignored parallel developments. This framework allows the use of methods developed in different processing tasks such as retrieval and classification, yielding effective solutions across different text mining tasks. Sparse computation using inverted indices is proposed for inference on probabilistic models. This reduces the computational complexity of common text mining operations according to sparsity, yielding probabilistic models with the scalability of modern search engines. The proposed combination provides sparse generative models: a solution for text mining that is general, effective, and scalable. Extensive experimentation on text classification and ranked retrieval datasets is conducted, showing that the proposed solution matches or outperforms the leading task-specific methods in effectiveness, with an order-of-magnitude decrease in classification times for Wikipedia article categorization with a million classes. The developed methods were further applied in two 2014 Kaggle data mining prize competitions with over a hundred competing teams, earning first and second places.
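
    The core idea of the thesis, inference on a generative text model through an inverted index so that the cost of scoring scales with sparsity, can be illustrated with the following toy sketch. It is not the thesis implementation: it assumes a multinomial Naive Bayes-style model with add-alpha smoothing, folds the contribution of unseen terms into a per-class base score, and omits class priors for brevity.

```python
# Toy sketch: score a multinomial Naive Bayes-style generative model through an
# inverted index, so a document only touches classes whose posting lists contain
# its terms. Class priors are omitted for brevity.
import math
from collections import defaultdict

def build_index(class_term_counts, vocab_size, alpha=1.0):
    """class_term_counts: {class: {term: count}}. Returns posting lists
    term -> [(class, delta_log_prob)] and per-class base log-probs."""
    index = defaultdict(list)
    base = {}
    for c, counts in class_term_counts.items():
        total = sum(counts.values()) + alpha * vocab_size
        default = math.log(alpha / total)          # log-prob of an unseen term
        base[c] = default
        for term, n in counts.items():
            lp = math.log((n + alpha) / total)
            index[term].append((c, lp - default))  # store only the sparse delta
    return index, base

def score(doc_terms, index, base):
    """doc_terms: {term: count}. Scores only classes reached via posting lists."""
    doc_len = sum(doc_terms.values())
    scores = defaultdict(float)
    for term, n in doc_terms.items():
        for c, delta in index.get(term, []):
            scores[c] += n * delta
    # add the dense part (base log-prob times document length) for touched classes
    return {c: s + doc_len * base[c] for c, s in scores.items()}

# toy usage
counts = {"sports": {"game": 5, "team": 3}, "finance": {"market": 4, "bank": 2}}
idx, base = build_index(counts, vocab_size=10)
print(max(score({"game": 2, "team": 1}, idx, base).items(), key=lambda kv: kv[1]))
```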