
    Machine Learning and Integrative Analysis of Biomedical Big Data.

    Recent developments in high-throughput technologies have accelerated the accumulation of massive amounts of omics data from multiple sources: genome, epigenome, transcriptome, proteome, metabolome, etc. Traditionally, data from each source (e.g., genome) are analyzed in isolation using statistical and machine learning (ML) methods. Integrative analysis of multi-omics and clinical data is key to new biomedical discoveries and advancements in precision medicine. However, data integration poses new computational challenges and exacerbates those associated with single-omics studies. Specialized computational approaches are required to effectively and efficiently perform integrative analysis of biomedical data acquired from diverse modalities. In this review, we discuss state-of-the-art ML-based approaches for tackling five specific computational challenges associated with integrative analysis: the curse of dimensionality, data heterogeneity, missing data, class imbalance and scalability issues.
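    As an illustration of how several of the listed challenges can be handled together, the hedged sketch below (not taken from the review; data and parameters are invented) combines median imputation for missing values, PCA against the curse of dimensionality, and class weighting for class imbalance in one scikit-learn pipeline on synthetic high-dimensional features.

```python
# Hedged sketch: one scikit-learn pipeline addressing several of the listed challenges
# on synthetic "multi-omics-like" features (all data and parameters are illustrative).
import numpy as np
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic high-dimensional, imbalanced dataset standing in for concatenated omics features.
X, y = make_classification(n_samples=300, n_features=2000, n_informative=30,
                           weights=[0.9, 0.1], random_state=0)
rng = np.random.default_rng(0)
X[rng.random(X.shape) < 0.05] = np.nan  # inject ~5% missing values

pipe = Pipeline([
    ("impute", SimpleImputer(strategy="median")),        # missing data
    ("scale", StandardScaler()),                         # heterogeneous feature scales
    ("reduce", PCA(n_components=50)),                    # curse of dimensionality
    ("clf", LogisticRegression(class_weight="balanced",  # class imbalance
                               max_iter=1000)),
])
print(cross_val_score(pipe, X, y, cv=5, scoring="balanced_accuracy").mean())
```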

    Transcriptomics in Toxicogenomics, Part III: Data Modelling for Risk Assessment

    Transcriptomics data are relevant to address a number of challenges in Toxicogenomics (TGx). After careful planning of exposure conditions and data preprocessing, TGx data can be used in predictive toxicology, where more advanced modelling techniques are applied. The large volume of molecular profiles produced by omics-based technologies allows the development and application of artificial intelligence (AI) methods in TGx. Indeed, the publicly available omics datasets are constantly increasing, together with a plethora of different methods made available to facilitate their analysis, interpretation and the generation of accurate and stable predictive models. In this review, we present the state of the art of data modelling applied to transcriptomics data in TGx. We show how benchmark dose (BMD) analysis can be applied to TGx data. We review read-across and adverse outcome pathway (AOP) modelling methodologies. We discuss how network-based approaches can be successfully employed to clarify the mechanism of action (MOA) or specific biomarkers of exposure. We also describe the main AI methodologies applied to TGx data to create predictive classification and regression models, and we address current challenges. Finally, we present a short description of deep learning (DL) and data integration methodologies applied in these contexts. Modelling of TGx data represents a valuable tool for more accurate chemical safety assessment. This review is the third part of a three-article series on Transcriptomics in Toxicogenomics.
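    For readers unfamiliar with benchmark dose modelling, the hedged sketch below shows one common formulation on synthetic single-gene dose-response data: a Hill model is fitted and the BMD is taken as the dose at which the fitted response departs from the control level by one control standard deviation. The model form, BMR definition and all numbers are assumptions for illustration, not the tooling used in the reviewed studies.

```python
# Hedged sketch of BMD estimation on synthetic single-gene dose-response data.
import numpy as np
from scipy.optimize import brentq, curve_fit

def hill(d, base, top, ec50, n):
    """Hill dose-response model."""
    return base + (top - base) * d**n / (ec50**n + d**n)

rng = np.random.default_rng(1)
doses = np.repeat([0.0, 0.1, 0.3, 1.0, 3.0, 10.0], 4)        # arbitrary units, 4 replicates
resp = hill(doses, base=1.0, top=3.0, ec50=1.5, n=2.0) + rng.normal(0, 0.1, doses.size)

params, _ = curve_fit(hill, doses, resp, p0=[1.0, 3.0, 1.0, 1.0],
                      bounds=([0, 0, 1e-3, 0.1], [10, 10, 100, 10]))
base_hat = params[0]
bmr = resp[doses == 0.0].std(ddof=1)                         # benchmark response = 1 control SD

# BMD: dose at which the fitted curve exceeds the fitted baseline by the BMR.
bmd = brentq(lambda d: hill(d, *params) - (base_hat + bmr), 1e-6, doses.max())
print(f"estimated BMD ~ {bmd:.2f} (arbitrary dose units)")
```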

    Exploring Patterns of Epigenetic Information With Data Mining Techniques

    Data mining, a part of the Knowledge Discovery in Databases (KDD) process, is the process of extracting patterns from large data sets by combining methods from statistics and artificial intelligence with database management. Analyses of epigenetic data have evolved towards genome-wide and high-throughput approaches, thus generating great amounts of data for which data mining is essential. Part of these data may contain patterns of epigenetic information which are mitotically and/or meiotically heritable, determining gene expression, cellular differentiation and cellular fate. Epigenetic lesions and genetic mutations are acquired by individuals during their life and accumulate with ageing. Both defects, together or individually, can result in a loss of control over cell growth and thus cause cancer development. Data mining techniques could then be used to extract such patterns. This work reviews some of the most important applications of data mining to epigenetics.
    Funding: Programa Iberoamericano de Ciencia y Tecnología para el Desarrollo (209RT-0366); Galicia, Consellería de Economía e Industria (10SIN105004PR); Instituto de Salud Carlos III (RD07/0067/000).
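    As a minimal, hypothetical example of one such data mining technique, the sketch below clusters synthetic DNA methylation beta values to recover groups of samples with similar epigenetic patterns; the data, group structure and parameters are invented for illustration and are not drawn from the reviewed work.

```python
# Hypothetical example: k-means clustering of synthetic DNA methylation beta values.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

rng = np.random.default_rng(42)
# 60 samples x 500 CpG sites; two latent groups with shifted methylation levels.
group_a = np.clip(rng.normal(0.3, 0.1, size=(30, 500)), 0, 1)
group_b = np.clip(rng.normal(0.7, 0.1, size=(30, 500)), 0, 1)
betas = np.vstack([group_a, group_b])

km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(betas)
print("cluster sizes:", np.bincount(km.labels_))
print("silhouette:", round(silhouette_score(betas, km.labels_), 3))
```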

    Epileptic Seizure Detection in EEGs by Using Random Tree Forest, Naïve Bayes and KNN Classification

    Epilepsy is a disease that attacks the nerves, and detecting it requires analysis of the results of an EEG test. In this study, we compared the naïve Bayes, random tree forest and K-nearest neighbour (KNN) classification algorithms for detecting epilepsy. The raw EEG data were pre-processed before feature extraction. We then trained the three algorithms: KNN classification, naïve Bayes classification and random tree forest. The last step was validation of the trained models. To compare the three classifiers, we calculated accuracy, sensitivity, specificity and precision. The best-performing classifier is KNN (accuracy: 92.7%), ahead of random tree forest (86.6%) and naïve Bayes (55.6%). KNN classification also gives the best precision (82.5%), compared with naïve Bayes (25.3%) and random tree forest (68.2%). For sensitivity, however, naïve Bayes is best (80.3%), compared with KNN (73.2%) and random tree forest (42.2%). For specificity, KNN classification gives 96.7%, followed by random tree forest (95.9%) and naïve Bayes (50.4%). The training time of naïve Bayes was 0.166030 s, that of random tree forest was 2.4094 s, and KNN was the slowest at 4.789 s. Overall, KNN classification gives better performance than naïve Bayes and random tree forest classification.
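    The comparison described above can be reproduced in outline with scikit-learn stand-ins for the three classifiers; the hedged sketch below uses synthetic features in place of the authors' extracted EEG features, so its scores will not match the reported ones.

```python
# Hedged sketch: comparing KNN, naive Bayes and a random forest on synthetic features
# standing in for extracted EEG features (scores will not match the paper's results).
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, confusion_matrix, precision_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier

X, y = make_classification(n_samples=2000, n_features=20, weights=[0.8, 0.2], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, stratify=y, random_state=0)

classifiers = [("KNN", KNeighborsClassifier(n_neighbors=5)),
               ("naive Bayes", GaussianNB()),
               ("random forest", RandomForestClassifier(n_estimators=100, random_state=0))]

for name, clf in classifiers:
    y_pred = clf.fit(X_tr, y_tr).predict(X_te)
    tn, fp, fn, tp = confusion_matrix(y_te, y_pred).ravel()
    print(f"{name}: accuracy={accuracy_score(y_te, y_pred):.3f} "
          f"sensitivity={tp / (tp + fn):.3f} specificity={tn / (tn + fp):.3f} "
          f"precision={precision_score(y_te, y_pred):.3f}")
```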

    Toward data science in biophotonics: biomedical investigations-based study

    Biophotonics aims to grasp and investigate the characteristics of biological samples based on their interaction with incident light. Over the past decades, numerous biophotonic technologies have been developed, delivering various sorts of biological and chemical information from the studied samples. Such information is usually contained in high-dimensional data that need to be translated into high-level information such as disease biomarkers. This translation is not straightforward, but it can be achieved using the advances in computer and data science. The scientific contributions presented in this thesis cover two main aspects of data science in biophotonics: the design of experiments and data-driven modeling and validation. For the design of experiments, the contributions focus on estimating the sample size required for group differentiation and on evaluating the influence of experimental factors on unbalanced multifactorial designs. Both methods were designed for multivariate data and were checked on Raman spectral datasets. Thereafter, automatic detection and identification were investigated for three diagnostic tasks by combining several image processing techniques with machine learning (ML) algorithms. In the first task, an improved ML pipeline to predict the antibiotic susceptibilities of E. coli bacteria was presented and evaluated based on bright-field microscopic images. Then, transfer learning-based classification of bladder cancer was demonstrated using blue-light cystoscopic images. Finally, different ML techniques and validation strategies were combined to perform the automatic detection of breast cancer based on a small-sized dataset of nonlinear multimodal images. The obtained results demonstrate the benefits of data science tools in improving the experimental planning and the translation of biophotonics-associated data into high-level information for various biophotonic technologies.
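    The transfer-learning step can be illustrated with a hedged sketch: a pretrained ResNet-18 backbone (an assumption; the thesis does not necessarily use this architecture) is frozen and only a new two-class head is trained, with random tensors standing in for preprocessed cystoscopic images.

```python
# Hedged sketch: transfer learning with a frozen, pretrained ResNet-18 backbone and a
# new two-class head; random tensors stand in for preprocessed 224x224 RGB frames.
import torch
import torch.nn as nn
from torchvision import models

model = models.resnet18(weights="IMAGENET1K_V1")   # ImageNet weights, downloaded on first use
for p in model.parameters():
    p.requires_grad = False                        # freeze the backbone
model.fc = nn.Linear(model.fc.in_features, 2)      # only this new layer is trained

optimizer = torch.optim.Adam(model.fc.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()

images = torch.randn(8, 3, 224, 224)               # dummy batch in place of real images
labels = torch.randint(0, 2, (8,))

model.train()
for _ in range(3):                                 # a few illustrative training steps
    optimizer.zero_grad()
    loss = criterion(model(images), labels)
    loss.backward()
    optimizer.step()
    print(f"loss: {loss.item():.3f}")
```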

    A Review: Effort Estimation Model for Scrum Projects using Supervised Learning

    Effort estimation in Agile is a critical component of the methodology, helping cross-functional teams plan and prioritize their work. Agile approaches have emerged in recent years as a more adaptable means of creating software projects because they consistently produce a workable product that is developed progressively, preventing projects from failing entirely. Agile software development enables teams to collaborate directly with clients and swiftly adjust to changing requirements, producing a result that is distinct, incremental and targeted. It has been noted that the present Scrum estimation approach relies heavily on historical data from previous projects and on expert opinion, while existing agile estimation methods such as analogy and planning poker become unpredictable in the absence of historical data and experts. User stories are used to estimate effort in the Agile approach, which has been adopted by 60–70% of software businesses. The goal of this study is to review a variety of strategies and techniques used to gauge and forecast effort. Additionally, the supervised machine learning methods most suited for predictive analysis are reviewed in this paper.
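    As a minimal illustration of the kind of supervised approach such reviews survey (not a method proposed in this paper), the sketch below predicts story points from user-story text with TF-IDF features and a random-forest regressor; the example stories and their historical points are invented.

```python
# Hypothetical illustration: predicting story points from user-story text with TF-IDF
# features and a random-forest regressor (stories and points are invented).
from sklearn.ensemble import RandomForestRegressor
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline

stories = [
    "As a user I want to reset my password via email",
    "As an admin I want to export monthly reports as CSV",
    "As a user I want to log in with two-factor authentication",
    "As a customer I want to filter products by price and rating",
]
story_points = [3, 5, 8, 5]   # historical estimates (illustrative)

model = make_pipeline(TfidfVectorizer(),
                      RandomForestRegressor(n_estimators=200, random_state=0))
model.fit(stories, story_points)

new_story = ["As a user I want to change my email address"]
print("predicted effort (story points):", round(float(model.predict(new_story)[0]), 1))
```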