2,586 research outputs found

    Toward a Standardized Strategy of Clinical Metabolomics for the Advancement of Precision Medicine

    Despite its tremendous success, pitfalls have been observed at every step of the clinical metabolomics workflow, impeding the internal validity of such studies. Furthermore, the demands on logistics, instrumentation, and computational resources for metabolic phenotyping studies have far exceeded expectations. In this conceptual review, we cover the barriers to a metabolomics-based clinical study and suggest potential solutions in the hope of enhancing study robustness, usability, and transferability. The importance of quality assurance and quality control procedures is discussed, followed by a practical rule of five phases, including two additional "pre-pre-" and "post-post-" analytical steps. We also elucidate the potential role of machine learning and argue that automated data-mining algorithms are indispensable for improving the quality of future research. Consequently, we propose a comprehensive metabolomics framework, along with a checklist refined from current guidelines and our previously published assessment, in an attempt to accurately translate achievements in metabolomics into clinical and epidemiological research. Furthermore, the integration of multifaceted multi-omics approaches, with metabolomics as the pillar, is urgently needed. Combined with social or nutritional factors, such approaches can yield complete omics profiles for a particular disease. Our discussion reflects the current obstacles and potential solutions in the growing use of metabolomics in clinical research to create the next-generation healthcare system.
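    One QA/QC practice that such a standardized workflow typically formalizes is filtering unreliable metabolite features against pooled quality-control injections. The sketch below is a minimal illustration of that idea, not the framework proposed in the paper; the 30% coefficient-of-variation cutoff and the column names are assumptions.

```python
import pandas as pd

def qc_cv_filter(intensities: pd.DataFrame, qc_columns, cv_threshold=0.30):
    """Keep features whose coefficient of variation (CV) across pooled
    QC injections is at or below the threshold (a common rule of thumb
    is CV <= 30%, assumed here)."""
    qc = intensities[qc_columns]                  # features x QC samples
    cv = qc.std(axis=1, ddof=1) / qc.mean(axis=1)
    return intensities.loc[cv <= cv_threshold]

# Hypothetical usage with made-up column names:
# filtered = qc_cv_filter(peak_table, qc_columns=["QC_1", "QC_2", "QC_3"])
```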

    Optimized neural architecture for automatic landslide detection from high-resolution airborne laser scanning data

    An accurate inventory map is a prerequisite for the analysis of landslide susceptibility, hazard, and risk. Field surveys, optical remote sensing, and synthetic aperture radar are the traditional techniques for landslide detection in tropical regions; however, they are time consuming and costly. In addition, the dense vegetation of tropical forests complicates the generation of an accurate landslide inventory map for these regions. Given its ability to penetrate vegetation cover, high-resolution airborne light detection and ranging (LiDAR) has been used to generate accurate landslide maps. This study proposes the use of recurrent neural networks (RNN) and multi-layer perceptron neural networks (MLP-NN) for landslide detection. These efficient neural architectures require little or no prior knowledge compared with traditional classification methods. The proposed methods were tested in the Cameron Highlands, Malaysia. Segmentation parameters were optimized using a supervised approach, and features were selected with correlation-based feature selection. The hyper-parameters of the network architectures were defined through a systematic grid search. The accuracies of the RNN and MLP-NN models in the analysis area were 83.33% and 78.38%, respectively; in the test area, they were 81.11% and 74.56%. These results indicate that the proposed models with optimized hyper-parameters produced the most accurate classification results, and that LiDAR-derived data, orthophotos, and textural features significantly affected those results. The proposed methods therefore have the potential to produce accurate and appropriate landslide inventories in tropical regions such as Malaysia.
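    As a rough illustration of the systematic grid search described above, the following sketch tunes an MLP classifier with scikit-learn. The synthetic features, grid values, and network sizes are assumptions for demonstration, not the study's actual LiDAR-derived inputs or search space.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Stand-in for LiDAR-derived, orthophoto, and textural features.
X, y = make_classification(n_samples=500, n_features=12, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

pipe = make_pipeline(StandardScaler(),
                     MLPClassifier(max_iter=2000, random_state=0))
grid = {
    "mlpclassifier__hidden_layer_sizes": [(16,), (32,), (32, 16)],
    "mlpclassifier__alpha": [1e-4, 1e-3, 1e-2],
    "mlpclassifier__learning_rate_init": [1e-3, 1e-2],
}
search = GridSearchCV(pipe, grid, cv=5).fit(X_train, y_train)
print(search.best_params_, search.score(X_test, y_test))
```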

    Novel methods based on regression techniques to analyze multistate models and high-dimensional omics data.

    The dissertation is based on four distinct research projects loosely interconnected by the common link of a regression framework. Chapter 1 provides an introductory outline of the problems addressed in the projects, a detailed review of prior work on them, and a brief discussion of our newly developed methodologies. Chapter 2 describes the first project, which identifies hidden subject-specific sources of heterogeneity in gene expression profiling analyses and adjusts for them with a technique based on Partial Least Squares (PLS) regression, ensuring more accurate inference on the expression pattern of the genes across two different varieties of samples. Chapter 3 focuses on the development of an R package based on Project 1 and its performance evaluation against other popular software for differential gene expression analysis. Chapter 4 covers the third project, which proposes a non-parametric regression method for estimating stage occupation probabilities at different time points in right-censored multistate model data, using an Inverse Probability of Censoring Weighted (IPCW) (Datta and Satten, 2001) version of the backfitting principle (Hastie and Tibshirani, 1992). Chapter 5 describes the fourth project, which tests for equality of the residual distributions of the right-censored waiting times of two groups of subjects, after adjusting for available covariate information, using an IPCW version of the Mann-Whitney U test.
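    The IPCW device used in Chapters 4 and 5 can be illustrated compactly: each uncensored observation is weighted by the inverse of the estimated probability of remaining uncensored at its observed time, where that probability comes from a Kaplan-Meier fit to the censoring distribution. The sketch below (using the lifelines package and simulated data) shows only this weighting step, not the dissertation's full estimators or tests.

```python
import numpy as np
from lifelines import KaplanMeierFitter

rng = np.random.default_rng(0)
t_event = rng.exponential(5.0, 200)        # latent event times
t_cens = rng.exponential(8.0, 200)         # latent censoring times
time = np.minimum(t_event, t_cens)
observed = (t_event <= t_cens).astype(int)

# Estimate the censoring survival function G by flipping the indicator.
km_cens = KaplanMeierFitter().fit(time, event_observed=1 - observed)
G = km_cens.survival_function_at_times(time).to_numpy()

# Uncensored subjects get weight 1/G(T); censored subjects get weight 0.
weights = np.where(observed == 1, 1.0 / np.clip(G, 1e-8, None), 0.0)
# `weights` could then enter a weighted Mann-Whitney-type statistic.
print(weights[:5])
```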

    Statistical analysis of high-dimensional biomedical data: a gentle introduction to analytical goals, common approaches and challenges

    Background: In high-dimensional data (HDD) settings, the number of variables associated with each observation is very large. Prominent examples of HDD in biomedical research include omics data with a large number of variables such as many measurements across the genome, proteome, or metabolome, as well as electronic health records data that have large numbers of variables recorded for each patient. The statistical analysis of such data requires knowledge and experience, sometimes with complex methods adapted to the respective research questions. Methods: Advances in statistical methodology and machine learning methods offer new opportunities for innovative analyses of HDD, but at the same time require a deeper understanding of some fundamental statistical concepts. Topic group TG9 "High-dimensional data" of the STRATOS (STRengthening Analytical Thinking for Observational Studies) initiative provides guidance for the analysis of observational studies, addressing particular statistical challenges and opportunities for the analysis of studies involving HDD. In this overview, we discuss key aspects of HDD analysis to provide a gentle introduction for non-statisticians and for classically trained statisticians with little experience specific to HDD. Results: The paper is organized with respect to subtopics that are most relevant for the analysis of HDD, in particular initial data analysis, exploratory data analysis, multiple testing, and prediction. For each subtopic, main analytical goals in HDD settings are outlined. For each of these goals, basic explanations for some commonly used analysis methods are provided. Situations are identified where traditional statistical methods cannot, or should not, be used in the HDD setting, or where adequate analytic tools are still lacking. Many key references are provided. Conclusions: This review aims to provide a solid statistical foundation for researchers, including statisticians and non-statisticians, who are new to research with HDD or simply want to better evaluate and understand the results of HDD analyses.
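    To make the multiple-testing subtopic concrete, here is a minimal sketch of the Benjamini-Hochberg false discovery rate procedure applied to many simulated variables; the data, effect size, and 5% FDR level are illustrative assumptions, not examples from the paper.

```python
import numpy as np
from scipy import stats
from statsmodels.stats.multitest import multipletests

rng = np.random.default_rng(1)
# 1,000 variables measured in two groups of 20 samples each;
# only the first 50 variables carry a true mean shift.
x = rng.normal(size=(1000, 20))
y = rng.normal(size=(1000, 20))
y[:50] += 1.0

pvals = stats.ttest_ind(x, y, axis=1).pvalue
reject, p_adj, _, _ = multipletests(pvals, alpha=0.05, method="fdr_bh")
print(f"{reject.sum()} of 1000 variables flagged at FDR 0.05")
```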

    Statistical Methods For Whole Transcriptome Sequencing: From Bulk Tissue To Single Cells

    RNA-Sequencing (RNA-Seq) has enabled detailed, unbiased profiling of whole transcriptomes with incredible throughput. Recent technological breakthroughs have pushed the frontiers of RNA expression measurement to the single-cell level (scRNA-Seq). With both bulk and single-cell RNA-Seq analyses, modeling the noise structure embedded in the data is crucial for drawing correct inference. In this dissertation, I developed a series of statistical methods to account for the technical variations specific to RNA-Seq experiments in the context of isoform- or gene-level differential expression analyses. In the first part of my dissertation, I developed MetaDiff (https://github.com/jiach/MetaDiff), a random-effects meta-regression model that allows the incorporation of uncertainty in isoform expression estimation into isoform differential expression analysis. This framework was further extended to detect splicing quantitative trait loci with RNA-Seq data. In the second part, I developed TASC (Toolkit for Analysis of Single-Cell data; https://github.com/scrna-seq/TASC), a hierarchical mixture model that explicitly adjusts for cell-to-cell technical differences in scRNA-Seq analysis using an empirical Bayes approach. This framework can be adapted to perform differential gene expression analysis. In the third part, I developed TASC-B, a method extended from TASC to model transcriptional-bursting-induced zero inflation. This model can identify and test for differences in the level of transcriptional bursting. Compared to existing methods, these new tools have been shown to better control the false discovery rate in situations where technical noise cannot be ignored, and they display superior power in both our simulation studies and real-world applications.
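    As a toy illustration of the zero-inflation problem TASC-B addresses (not TASC-B itself), the sketch below fits an intercept-only zero-inflated Poisson model to simulated single-cell counts with statsmodels; the simulation parameters and the choice of a Poisson rather than negative binomial base model are assumptions made for simplicity.

```python
import numpy as np
from statsmodels.discrete.count_model import ZeroInflatedPoisson

rng = np.random.default_rng(2)
n = 500
burst_off = rng.random(n) < 0.4                 # simulated "off" cells
counts = np.where(burst_off, 0, rng.poisson(4.0, n))

exog = np.ones((n, 1))                          # intercept-only model
model = ZeroInflatedPoisson(counts, exog, exog_infl=exog)
res = model.fit(maxiter=200, disp=False)
print(res.params)  # inflation logit followed by Poisson log-mean
```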

    Modelling activated sludge wastewater treatment plants using artificial intelligence techniques (fuzzy logic and neural networks)

    The activated sludge process (ASP) is the most commonly used biological wastewater treatment system. Mathematical modelling of this process is important for improving its treatment efficiency and thus the quality of the effluent released into the receiving water body, because models can help the operator predict the performance of the plant and take cost-effective, timely remedial actions that ensure consistent treatment efficiency and compliance with discharge consents. However, due to the highly complex and non-linear characteristics of this biological system, traditional mathematical modelling of the treatment process has remained a challenge. This thesis presents applications of Artificial Intelligence (AI) techniques to modelling the ASP: the Kohonen Self-Organising Map (KSOM), backpropagation artificial neural networks (BPANN), and the adaptive network-based fuzzy inference system (ANFIS). These techniques were compared, and hybrids between them were also investigated and tested. The study demonstrated that AI techniques offer a viable, flexible, and effective modelling alternative for the activated sludge system. The KSOM was found to be an attractive tool for data preparation because it easily accommodates missing data and outliers and is powerful at extracting salient features from raw data; as a consequence of the latter, the KSOM offers an excellent tool for visualising high-dimensional data. In addition, the KSOM was used to develop a software sensor to predict biological oxygen demand (BOD). This soft-sensor represents a significant advance in real-time BOD operational control, offering a very fast estimate of this important wastewater parameter compared with the traditional 5-day bioassay BOD test. Furthermore, hybrids of KSOM-ANN and KSOM-ANFIS were shown to yield markedly better model performance than the respective modelling paradigms on their own.
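    The soft-sensor idea can be sketched with a simple feed-forward network: predict BOD from parameters that are fast to measure online. The sketch below uses a scikit-learn MLP rather than the thesis's KSOM, and the input features and synthetic data are assumptions for illustration only.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(3)
n = 400
# Synthetic stand-ins for fast-to-measure inputs (e.g. COD, TSS, pH, flow).
X = rng.normal(size=(n, 4))
bod = 20 + 8 * X[:, 0] + 3 * X[:, 1] + rng.normal(0, 2, n)  # toy target

X_tr, X_te, y_tr, y_te = train_test_split(X, bod, random_state=0)
soft_sensor = make_pipeline(StandardScaler(),
                            MLPRegressor(hidden_layer_sizes=(16,),
                                         max_iter=5000, random_state=0))
soft_sensor.fit(X_tr, y_tr)
print(f"R^2 on held-out data: {soft_sensor.score(X_te, y_te):.2f}")
```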