
    Quantitative imaging in radiation oncology

    Artificially intelligent eyes, built on machine and deep learning technologies, can empower our capability to analyse patients’ images. By revealing information invisible to the human eye, we can build decision aids that help clinicians provide more effective treatment while reducing side effects. The power of these decision aids lies in their basis on biologically unique properties of each patient’s tumour, referred to as biomarkers. To fully translate this technology into the clinic, we need to overcome barriers related to the reliability of image-derived biomarkers, trust in AI algorithms, and the privacy-related issues that hamper the validation of these biomarkers. This thesis developed methodologies to address these issues, defining a road map for the responsible use of quantitative imaging in the clinic as a decision support system for better patient care.

    Small data: practical modeling issues in human-model -omic data

    This thesis is based on the following articles:
    Chapter 2: Holsbø, E., Perduca, V., Bongo, L.A., Lund, E. & Birmelé, E. (Manuscript). Stratified time-course gene preselection shows a pre-diagnostic transcriptomic signal for metastasis in blood cells: a proof of concept from the NOWAC study. Available at https://doi.org/10.1101/141325.
    Chapter 3: Bøvelstad, H.M., Holsbø, E., Bongo, L.A. & Lund, E. (Manuscript). A Standard Operating Procedure For Outlier Removal In Large-Sample Epidemiological Transcriptomics Datasets. Available at https://doi.org/10.1101/144519.
    Chapter 4: Holsbø, E. & Perduca, V. (2018). Shrinkage estimation of rate statistics. Case Studies in Business, Industry and Government Statistics 7(1), 14–25. Also available at http://hdl.handle.net/10037/14678.

    Human-model data are very valuable and important in biomedical research. Ethical and economic constraints limit access to such data, and consequently these datasets rarely comprise more than a few hundred observations. As measurements are comparatively cheap, the tendency is to measure as many things as possible for the few, valuable participants in a study. With -omics technologies it is cheap and simple to make hundreds of thousands of measurements simultaneously. In technical language, this few-observations, many-measurements setting is a high-dimensional problem. Most gene expression experiments measure the expression levels of 10 000–15 000 genes for fewer than 100 subjects. I refer to this as the small data setting.

    This dissertation is an exercise in practical data analysis as it happens in a large epidemiological cohort study. It comprises three main projects: (i) predictive modeling of breast cancer metastasis from whole-blood transcriptomics measurements; (ii) standardizing microarray data quality assessment in the Norwegian Women and Cancer (NOWAC) postgenome cohort; and (iii) shrinkage estimation of rates. These are all small data analyses, for various reasons.

    Predictive modeling in the small data setting is very challenging. Several modern methods are built to tackle high-dimensional data, but these methods need to be evaluated against one another when analyzing data in practice. Through the metastasis prediction work we learned first-hand that common practices in machine learning can be inefficient or even harmful, especially for small data. I outline some of the more important issues.

    In a large project such as NOWAC there is a need to centralize and disseminate knowledge and procedures. The standardization of NOWAC quality assessment was a project born of necessity. The standard operating procedure for outlier removal was developed so that preprocessing of the NOWAC microarray material happens in the same way every time. We take this procedure from an archaic R script that resided in people's email inboxes to a well-documented, open-source R package, and present the NOWAC guidelines for microarray quality control. The procedure is built around the inherent high value of a single observation.

    Small data are plagued by high variance. When working with small data it is usually profitable to bias models by shrinkage or by borrowing information from elsewhere. We present a pseudo-Bayesian estimator of rates in an informal crime-rate study (a generic sketch of the shrinkage idea follows this abstract). We exhibit the value of such procedures in a small data setting and demonstrate some novel considerations about the coverage properties of such a procedure.
    In short, I gather some common practices in predictive modeling as applied to small data and assess their practical implications. I argue that, with more focus on human-based datasets in biomedicine, these data require particular consideration within a small data paradigm to allow for reliable analysis. I present what I believe to be sensible guidelines.
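    The shrinkage idea mentioned above can be illustrated with a generic empirical-Bayes (Gamma-Poisson) estimator that pulls raw rates toward the pooled rate. This is a minimal sketch, not the pseudo-Bayesian estimator from Holsbø & Perduca (2018): the function name, the moment-based prior fit, and the toy numbers are all illustrative assumptions.

    ## Minimal sketch (R): empirical-Bayes shrinkage of rates y/n toward the
    ## pooled rate. The crude moment-based prior fit is for illustration only.
    shrink_rates <- function(y, n) {
      raw   <- y / n                                # unshrunk per-unit rates
      m     <- weighted.mean(raw, n)                # pooled rate = prior mean
      v     <- max(var(raw) - m / mean(n), 1e-12)   # rough between-unit variance
      alpha <- m^2 / v                              # Gamma prior shape
      beta  <- m / v                                # Gamma prior rate
      (y + alpha) / (n + beta)                      # posterior mean rates
    }

    ## Toy example: units with small exposure n are pulled hardest toward the
    ## pooled rate, trading a little bias for a large reduction in variance.
    set.seed(1)
    n <- c(120, 950, 40, 3000)                      # exposures (e.g. population)
    y <- rpois(length(n), lambda = 0.02 * n)        # observed event counts
    cbind(raw = y / n, shrunk = shrink_rates(y, n))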

    Addressing the challenges of uncertainty in regression models for high dimensional and heterogeneous data from observational studies

    The lack of replicability of research findings across scientific disciplines has gained wide attention in recent years and led to extensive discussion. In this 'replication crisis', different types of uncertainty, arising at different points of data collection and statistical analysis, play an important role. Nevertheless, their consequences are often ignored in current research practice, at the risk of low credibility and reliability of research findings. To analyse this problem and develop solutions, we define measurement uncertainty, sampling uncertainty, data pre-processing uncertainty, method uncertainty, and model uncertainty, and investigate them in particular in the context of regression analyses. We consider data from observational studies, focusing on high dimensionality and heterogeneous variables, two characteristics of growing importance. High-dimensional data, i.e., data with more variables than observations, play an important role in medical research, where large amounts of molecular data (omics data) can be collected with ever-decreasing expense and effort. Where several types of omics data are available, we are additionally faced with heterogeneity. Moreover, heterogeneous data can be found in many observational studies where data originate from different sources or where variables of different types are collected. This work comprises four contributions, each with a different approach to this topic and a different focus of investigation.

    Contribution 1 can be considered a practical example illustrating data pre-processing and method uncertainty in the context of prediction and variable selection from high-dimensional and heterogeneous data. In the first part of this paper, we introduce the development of priority-Lasso, a hierarchical method for prediction from multi-omics data. Priority-Lasso is based on the standard Lasso and assumes a pre-specified priority order for blocks of data. The idea is to successively fit Lasso models on these blocks and to take the linear predictor from each fit as an offset in the fit for the block with the next-lower priority (a schematic sketch of this idea follows the abstract). In the second part, we apply the method in a current study of acute myeloid leukemia (AML) and compare its performance to that of the standard Lasso. We illustrate data pre-processing and method uncertainty caused by different choices of variable definitions and of settings in the application of the method. These choices result in different effect estimates and thus in different prediction performances and selected variables.

    In the second contribution, we compare method uncertainty with sampling uncertainty in the context of variable selection and the ranking of omics biomarkers. For this purpose, we develop a user-friendly and versatile framework. We apply this framework to data from AML patients with high-dimensional and heterogeneous characteristics and explore three different scenarios: first, variable selection in multivariable regression based on multi-omics data; second, variable ranking based on variable importance measures from random forests; and third, identification of genes based on differential gene expression analysis. In contributions 3 and 4, we apply the vibration of effects framework, which was initially used to analyze model uncertainty in a large epidemiological study (NHANES), to assess and compare different types of uncertainty.
    These two contributions address in depth the methodological extension of this framework to further types of uncertainty. In contribution 3, we describe the extension of the vibration of effects framework to sampling and data pre-processing uncertainty. As a practical illustration, we take a large dataset from psychological research with a heterogeneous variable structure (the SAPA project) and examine sampling, model, and data pre-processing uncertainty in the context of logistic regression for varying sample sizes. Beyond comparing single types of uncertainty, we introduce a strategy for quantifying cumulative model and data pre-processing uncertainty and for analyzing their relative contributions to the total uncertainty via a variance decomposition. Finally, in contribution 4, we extend the vibration of effects framework to measurement uncertainty. In a practical example, we conduct a comparison study of sampling, model, and measurement uncertainty on the NHANES dataset in the context of survival analysis. We focus on different scenarios of measurement uncertainty that differ in the choice of variables considered to be measured with error. Moreover, we analyze the behavior of the different types of uncertainty with increasing sample size in a large simulation study.
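    The successive-offset idea behind priority-Lasso described above can be sketched with glmnet. This is a simplified illustration under stated assumptions (a continuous or binary outcome, a cross-validated penalty per block, and a made-up clinical/omics block split), not the authors' reference implementation; other outcome families would be handled analogously.

    ## Minimal sketch (R) of the priority-Lasso idea: fit one Lasso per block,
    ## in priority order, carrying the linear predictor forward as an offset.
    library(glmnet)

    fit_priority_lasso <- function(blocks, y, family = "gaussian") {
      ## 'blocks' is a list of numeric matrices (same rows = observations),
      ## ordered from highest to lowest priority.
      offset <- rep(0, length(y))
      fits   <- vector("list", length(blocks))
      for (b in seq_along(blocks)) {
        ## Lasso on block b; earlier blocks enter only through the fixed offset.
        fits[[b]] <- cv.glmnet(blocks[[b]], y, family = family, offset = offset)
        ## The cumulative linear predictor becomes the offset for the next block.
        offset <- as.numeric(predict(fits[[b]], newx = blocks[[b]],
                                     s = "lambda.min", newoffset = offset))
      }
      fits
    }

    ## Hypothetical usage: a small clinical block first, then a large omics block.
    ## set.seed(1); n <- 100
    ## clinical <- matrix(rnorm(n * 5),    n, 5)
    ## omics    <- matrix(rnorm(n * 2000), n, 2000)
    ## y        <- rnorm(n)
    ## fits     <- fit_priority_lasso(list(clinical, omics), y)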

    Quantitative imaging analysis: challenges and potentials


    Radiomics: Images are more than meets the eye
