5 research outputs found

    Robust selection of cancer survival signatures from high-throughput genomic data using two-fold subsampling

    Identifying relevant signatures for clinical patient outcome is a fundamental task in high-throughput studies. Signatures, composed of features such as mRNAs, miRNAs, SNPs or other molecular variables, are often non-overlapping, even though they have been identified from similar experiments considering samples with the same type of disease. The lack of a consensus is mostly due to the fact that sample sizes are far smaller than the numbers of candidate features to be considered, and therefore signature selection suffers from large variation. We propose a robust signature selection method that enhances the selection stability of penalized regression algorithms for predicting survival risk. Our method is based on an aggregation of multiple, possibly unstable, signatures obtained with the preconditioned lasso algorithm applied to random (internal) subsamples of a given cohort's data, where the aggregated signature is shrunken by a simple thresholding strategy. The resulting method, RS-PL, is conceptually simple and easy to apply, relying on parameters automatically tuned by cross-validation. Robust signature selection using RS-PL operates within an (external) subsampling framework to estimate the selection probabilities of features in multiple trials of RS-PL. These probabilities are used for identifying reliable features to be included in a signature. Our method was evaluated on microarray data sets from neuroblastoma, lung adenocarcinoma, and breast cancer patients, extracting robust and relevant signatures for predicting survival risk. Signatures obtained by our method achieved high prediction performance and robustness, consistently over the three data sets. Genes with high selection probability in our robust signatures have been reported as cancer-relevant. 
The ordering of predictor coefficients associated with signatures was well-preserved across multiple trials of RS-PL, demonstrating the capability of our method for identifying a transferable consensus signature. The software is available as the R package rsig on CRAN (http://cran.r-project.org).
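
The idea of estimating selection probabilities by repeated subsampling can be sketched as follows. This is only an illustration under stated assumptions, not the RS-PL method or the rsig package: the preconditioned lasso is replaced by a much simpler stand-in selector (top-k features by absolute correlation), the outcome is a continuous toy value rather than survival data, and all names (`select_top_k`, `selection_probabilities`, the 0.8 cut-off) are hypothetical.

```python
import random


def pearson_abs(xs, ys):
    """Absolute Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0
    return abs(sxy) / (sxx * syy) ** 0.5


def select_top_k(X, y, k):
    """Stand-in selector (assumption): the k features most correlated with y."""
    p = len(X[0])
    scores = [pearson_abs([row[j] for row in X], y) for j in range(p)]
    return set(sorted(range(p), key=lambda j: scores[j], reverse=True)[:k])


def selection_probabilities(X, y, k=2, trials=200, frac=0.5, seed=0):
    """Fraction of random subsamples in which each feature is selected."""
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    counts = [0] * p
    m = max(2, int(frac * n))
    for _ in range(trials):
        idx = rng.sample(range(n), m)  # subsample without replacement
        chosen = select_top_k([X[i] for i in idx], [y[i] for i in idx], k)
        for j in chosen:
            counts[j] += 1
    return [c / trials for c in counts]


# Synthetic toy data: feature 0 drives the outcome, the others are noise.
rng = random.Random(1)
X = [[rng.gauss(0, 1) for _ in range(6)] for _ in range(40)]
y = [2.0 * row[0] + rng.gauss(0, 0.5) for row in X]

probs = selection_probabilities(X, y)
# Keep only features selected in at least 80% of subsamples (arbitrary cut-off).
robust = [j for j, pr in enumerate(probs) if pr >= 0.8]
```

Features that are selected in nearly every subsample are the ones a thresholding step would retain. Noise features tend to be selected only occasionally and fall below the cut-off.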

    Statistical modelling of cardiovascular disease patients using Bayesian approaches

    This study focuses on the statistical modelling of cardiovascular disease (CVD) patients in Malaysia. A secondary dataset from the National Cardiovascular Disease Database-Acute Coronary Syndrome (NCVD-ACS) registry for the years 2006 to 2013 is utilised. Studies have shown that CVD affects males and females differently. Thus, a gender-specific analysis of risk factors and mortality among ST-Elevation Myocardial Infarction (STEMI) patients is needed. Initially, this study performed a standard multivariate logistic analysis to identify the risk factors associated with mortality for each gender and to compare differences, if any, among STEMI patients. The results showed that gender differences existed among STEMI patients. Even though females share the same risk factors as males, some risk factors relate only to females, which may have increased their tendency to develop CVD and their risk of mortality. An important contribution of this analysis is that it gives an understanding of possible gender-based differences in baseline characteristics, risk factors, treatments and outcomes, which will help cardiac care specialists improve the current management of patients with CVD. Next, Bayesian analysis is proposed to develop a prognostic model for STEMI patients. A Bayesian Markov Chain Monte Carlo (MCMC) simulation approach is applied. In addition, the parameter estimates from the proposed Bayesian model are compared with those from the frequentist model. The results showed that the proposed Bayesian modelling can deal correctly with the probabilities and provides posterior parameter estimates that have natural clinical interpretations. To this end, several programs for Bayesian model development and convergence diagnostics were developed using the Just Another Gibbs Sampler (JAGS) software through its R interface. 
In the final part of this study, a graphical probabilistic model framework defined using a Bayesian Network (BN) is proposed to identify and interpret the dependence structure between the predictors and health outcomes of STEMI patients. Two learning processes are involved in obtaining the BN model from the data, namely structural learning and parameter learning. From the structural learning, 25 and 20 arcs were considered significant for the males’ and females’ BNs respectively. A few variables, namely Killip class, renal disease and age group, were classified as key predictors, as they were the most influential variables directly associated with patient outcome status. Moreover, conditional probabilities for each feature were obtained. The novelty of this study is that it provides an indication of the strength of each arc in the network by exploiting the bootstrap resampling method in the structural learning. A graphical model is developed in which the relationships can be displayed in diagrammatic form and the cause-effect relationships can be illustrated. An important implication of this model is that it identifies dependencies based on the different features of variables. It can also include expert knowledge to improve predictability for data-driven research when information or resources regarding the variables are limited.
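
A rough sketch of how bootstrap resampling can indicate the strength of an arc: the strength is taken as the share of bootstrap resamples in which the two variables still appear dependent. This is not the thesis's procedure. Full structure learning is replaced here by a pairwise chi-square test on binary variables (3.84 is the 5% critical value for one degree of freedom), the data are synthetic, and the variable names are hypothetical.

```python
import random


def chi_square_2x2(pairs):
    """Pearson chi-square statistic for a list of (a, b) binary pairs."""
    n = len(pairs)
    counts = {(a, b): 0 for a in (0, 1) for b in (0, 1)}
    for a, b in pairs:
        counts[(a, b)] += 1
    row = {a: counts[(a, 0)] + counts[(a, 1)] for a in (0, 1)}
    col = {b: counts[(0, b)] + counts[(1, b)] for b in (0, 1)}
    stat = 0.0
    for (a, b), observed in counts.items():
        expected = row[a] * col[b] / n
        if expected > 0:
            stat += (observed - expected) ** 2 / expected
    return stat


def arc_strength(data, u, v, resamples=300, critical=3.84, seed=0):
    """Share of bootstrap resamples in which u and v test as dependent."""
    rng = random.Random(seed)
    n = len(data)
    hits = 0
    for _ in range(resamples):
        # Bootstrap: draw n records with replacement.
        sample = [data[rng.randrange(n)] for _ in range(n)]
        if chi_square_2x2([(r[u], r[v]) for r in sample]) > critical:
            hits += 1
    return hits / resamples


# Synthetic toy records: "outcome" depends on "risk_factor", not on "unrelated".
rng = random.Random(42)
data = []
for _ in range(300):
    risk = 1 if rng.random() < 0.3 else 0
    outcome = 1 if rng.random() < (0.6 if risk else 0.1) else 0
    unrelated = 1 if rng.random() < 0.5 else 0
    data.append({"risk_factor": risk, "outcome": outcome, "unrelated": unrelated})

strong = arc_strength(data, "risk_factor", "outcome")
weak = arc_strength(data, "unrelated", "outcome")
```

An arc that survives almost every resample, as `strong` does here, is well supported by the data. A low value such as `weak` suggests the dependence is an artefact of one particular sample.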

    Risks out of depth? : a study on the environmental impacts of seabed mining

    The oceans are facing increasing pressures from human activities. Growing industrialisation of the ocean space is giving room to both the expansion of existing and emergence of new ocean-based activities, with seabed mining one of the rapidly emerging sectors heralded as a solution to resource sufficiency. As ocean mining activities are still in exploratory stages, the development of seafloor mining is underpinned by high uncertainties on both the implementation of the activities and their consequences for the environment. Realising the full potential of the seas and oceans requires sustainable approaches to their economic development, mainly due to the issues related to the negative environmental effects, yet we lack tools and knowledge to comprehensively evaluate the impacts and further societal implications of emerging maritime sectors. To fill this gap, this thesis aims to provide a more detailed understanding of the environmental risks of seabed mining and how those risks are perceived. This thesis consists of four papers and draws on an interdisciplinary approach that includes quantitative and qualitative analyses, modelling, literature reviews and knowledge syntheses. Paper I synthesises how the environmental impacts of seabed mining have been studied in the past and draws on parallel industries, such as aggregate extraction, to increase the knowledge of the impacts on marine ecosystems. It underlines that most studies have assessed the impacts narrowly, with little appreciation of the uncertainties or cumulative effects. In this paper, I further reflect on areas that need development for comprehensive environmental risk assessments for seabed mining. Paper II contributes to the baseline information on marine mineral precipitates, estimating the distribution of ferromanganese (FeMn) concretions using spatial modelling techniques. 
In paper III, I develop a probabilistic modelling framework for assessing the risks of seabed mining through a series of interviews with a multidisciplinary group of experts. The risk model is then used to illustrate the impacts of FeMn concretion extraction on benthic fauna in the Baltic Sea, offering a quantitative means to highlight the many uncertainties around the impacts of mining. Paper IV examines whether people care about the impacts of human activities in remote locations. In this paper, I evaluate the dimensions of environmental care for the deep sea and relate this to the perceived risks of seafloor mining by comparing the deep sea to three other remote environments: Antarctica, the Moon, and remote terrestrial environments. The results of this work show that despite people’s low knowledge of the deep sea, people do care about mining activities harming deep-sea ecosystems, and that a stronger emotional connection to remote environments is positively connected to environmental care and perception of the severity of the risks of mining. This thesis contributes to a more comprehensive understanding of the environmental risks of seabed mining and advocates a more transparent approach to emerging industries and their risks. The combined findings of this work suggest that it is fundamental to both increase knowledge of the environment that will be affected by the risks, and to account for the underlying values and emotions towards the marine environment to fathom how those risks will be perceived. An improved appreciation of the risks of emerging maritime industries will be essential to avoid uncontrolled developments and to ensure good stewardship of the marine environment. Growing pressure on the use of marine areas increases the need to improve environmental impact assessment practices. Seabed mining is one of the rapidly developing sectors and is expected to help meet the growing demand for mineral resources, for example for the battery industry. 
Because seabed mining projects are still at the development stage, the exploitation of seabed mineral resources involves significant uncertainties, stemming both from how the activity will be implemented and from its effects on the marine environment. The aim of this doctoral thesis is to provide a more detailed understanding of the environmental risks of seabed mining. I focus in particular on the ferromanganese concretions of the Baltic Sea and on the environmental impacts of exploiting this seabed mineral resource, which is potentially significant both economically and in terms of its natural values. The work consists of four articles and is based on an interdisciplinary approach that includes both quantitative and qualitative analyses. In Article I, I examine how the environmental impacts of seabed mining have been studied previously and what knowledge gaps exist in current assessment practices. The results of the literature review show that most studies have assessed impacts narrowly, with little attention to uncertainties or cumulative effects. In Article II, we estimate the distribution of marine mineral resources in Finnish sea areas by examining the distribution of ferromanganese concretions using spatial modelling. According to these estimates, concretions occur on 11–20% of the seabed in Finnish sea areas, in all Finnish sea areas except the Bothnian Bay. In Article III, I develop a modelling framework for assessing the environmental risks of seabed mining. Drawing on probabilistic modelling and expert interviews, the work developed a new risk assessment method that can be used to examine the effects of mining on different parts of the marine ecosystem through cause-effect networks. The results show that unrestricted extraction can have extensive effects on the functioning of the marine ecosystem, which must be investigated before commercial extraction can be considered. 
In Article IV, I examine whether people care about the impacts of human activities in remote environments. I focus in particular on how people perceive the risks of deep-sea mining by comparing the deep sea with three other remote environments: Antarctica, the Moon, and remote terrestrial environments. The results show that the emotional connections and values evoked by an environment affect how much people care about environmental risks in remote environments of which they have no personal experience. The combined results of the work show that it is essential both to increase knowledge of the environment exposed to the risks arising from human activities and to take into account people's values and emotions towards the marine environment, in order to understand how these risks are perceived. A more comprehensive understanding of the risks of emerging human activities is essential to avoid uncontrolled industrialisation of marine areas and to ensure good stewardship of the marine environment.

    A BAYESIAN APPROACH TO LEARNING DECISION TREES FOR PATIENT-SPECIFIC MODELS

    A principal goal of precision medicine is to identify genomic factors that are predictive of outcomes in complex diseases, to provide better insight into their molecular mechanisms. Based on our current understanding, there are many genomic factors that are likely to be pathogenic in small subpopulations while being rare in the population as a whole. This research introduces a new machine learning method for discovering single nucleotide variants (SNVs), both common and rare, that in a given person are predictive of that person developing a disease or disease outcome. The new method described in this research constructs decision tree models, uses a Bayesian score to evaluate the models, and employs a person-specific search strategy to identify SNVs that are predictive in a subpopulation whose members are similar to the person of interest. This method, called the Personalized Decision Tree Algorithm (PDTA), works by constructing a decision tree model from the data and then identifying a path in the tree that has excellent prediction for the person of interest, or constructing a new path if none of the paths in the tree has excellent prediction. The PDTA was refined iteratively on synthetic data and was experimentally evaluated on five datasets. One of the datasets was synthetic, one was semi-synthetic, and three were biological datasets collected from patients with chronic pancreatitis that included one small genomic dataset, a whole exome dataset, and a whole exome dataset focused on patients with diabetes in chronic pancreatitis. The performance of the method was evaluated using area under the Receiver Operating Characteristic curve and F1 score, as well as the ability to retrieve known and unknown rare SNVs. The PDTA was found to be effective to varying degrees in the datasets that were evaluated, creating parsimonious genetic representations for patient-specific groups, with the potential to discover novel variants.
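
To illustrate what scoring a decision tree with a Bayesian score can look like, the sketch below uses the Beta-Binomial marginal likelihood of the binary outcomes in each leaf and sums the log scores over leaves. The abstract does not state which score the PDTA uses, so this choice, the uniform Beta(1, 1) prior, the function names, and the toy counts are all assumptions made for illustration.

```python
from math import lgamma


def log_beta(a, b):
    """Logarithm of the Beta function B(a, b)."""
    return lgamma(a) + lgamma(b) - lgamma(a + b)


def leaf_log_marginal(cases, controls, alpha=1.0, beta=1.0):
    """Log marginal likelihood of one leaf's binary outcomes under a
    Beta(alpha, beta) prior on the leaf's case probability."""
    return log_beta(alpha + cases, beta + controls) - log_beta(alpha, beta)


def tree_log_score(leaves):
    """Score a tree as the sum of its leaves' log marginal likelihoods."""
    return sum(leaf_log_marginal(cases, controls) for cases, controls in leaves)


# Toy counts as (cases, controls) per leaf; not taken from any real dataset.
no_split = [(50, 50)]                    # a single root leaf
informative = [(45, 5), (5, 45)]         # split on a variant that separates outcomes
uninformative = [(25, 25), (25, 25)]     # split on an irrelevant variant

scores = {
    "no_split": tree_log_score(no_split),
    "informative": tree_log_score(informative),
    "uninformative": tree_log_score(uninformative),
}
best = max(scores, key=scores.get)
```

The marginal likelihood rewards the informative split and slightly penalises the uninformative one relative to not splitting, because the extra leaf adds a parameter without improving the fit. That built-in penalty is what lets such a score favour parsimonious trees.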

    Facilitating and Enhancing Biomedical Knowledge Translation: An in Silico Approach to Patient-centered Pharmacogenomic Outcomes Research

    Current research paradigms such as traditional randomized controlled trials rely mostly on relatively narrow efficacy data, which results in high internal validity but low external validity. Given this fact and the need to address many complex real-world healthcare questions in short periods of time, alternative research designs and approaches should be considered in translational research. In silico modeling studies, along with longitudinal observational studies, are considered appropriate and feasible means to address the slow pace of translational research. There is therefore a need for an approach that tests newly discovered genetic tests via an in silico enhanced translational research model (iS-TR) to conduct patient-centered outcomes research and comparative effectiveness research (PCOR CER) studies. In this dissertation, it was hypothesized that retrospective electronic medical record (EMR) analysis and subsequent mathematical modeling and simulation prediction could facilitate and accelerate the process of generating and translating pharmacogenomic knowledge on the comparative effectiveness of anticoagulation treatment plan(s) tailored to well-defined target populations, eventually decreasing overall adverse risk and improving individual and population outcomes. To test this hypothesis, a simulation modeling framework (iS-TR) was proposed which takes advantage of the value of longitudinal EMRs to provide an effective approach to translate pharmacogenomic anticoagulation knowledge and conduct PCOR CER studies. The accuracy of the model was demonstrated by reproducing the outcomes of two major randomized clinical trials for individualizing warfarin dosing. A substantial hospital healthcare use case that demonstrates the value of iS-TR when addressing real-world anticoagulation PCOR CER challenges was also presented.