Personalized Medicine: Studies of Pharmacogenomics in Yeast and Cancer
Advances in microarray and sequencing technology have enabled the era of personalized medicine. With the increasing availability of genomic assays, clinicians have started to use the genetics and gene expression of patients to guide clinical care. Signatures of gene expression and genetic variation in genes have been associated with disease risks and response to clinical treatment. It is therefore not difficult to envision a future where each patient will have clinical care that is optimized based on his or her genetic background and genomic profiles. However, many challenges stand in the way of fully realizing the potential of personalized medicine, and we have yet to gain a better understanding of how to associate genomic data with phenotype. First, the human genome is very complex: more than 50 million sequence variants and more than 20,000 genes have been reported. Second, although genome-wide association studies (GWAS) over the last decade have associated many common genetic variants with common complex traits and diseases, most phenotypic variation remains unexplained, both at the level of the variants involved and the underlying mechanism. Finally, interaction between genetics and environment presents an additional layer of complexity governing phenotypic variation. Currently, much research is devoted to developing computational methods that help associate genomic features with phenotypic variation. Modeling techniques such as machine learning have been very useful in uncovering the intricate relationships between genomics and phenotype. Despite some early successes, the performance of most models is disappointing. Many models lack robustness and their predictions do not replicate. In addition, many successful models work as a black box, giving good predictions of phenotypic variation but unable to reveal the underlying mechanism.
In this thesis I propose two methods addressing this challenge. First, I describe an algorithm that focuses on identifying causal genomic features of phenotype. My approach assumes genomic features predictive of phenotype are more likely to be causal. The algorithm builds models that not only accurately predict the traits, but also uncover the molecular mechanisms that are responsible for these traits. The algorithm gains its power by combining regularized linear regression, causality testing and Bayesian statistics. I demonstrate the application of the algorithm on a yeast dataset, where genotype and gene expression are used to predict drug sensitivity and elucidate the underlying mechanisms. The accuracy and robustness of the algorithm are evaluated statistically and validated experimentally. The second part of the thesis takes on a much more complicated system: cancer. Genomic and drug sensitivity data of cancer cell lines have recently been made available. The challenge here is not only the increasing complexity of the system (e.g. size of genome), but also the fundamental differences between cancers and tissues. Different cancers or tissues provide different contexts influencing regulatory networks and signaling pathways. In order to account for this, I propose a method to associate contextual genomic features with drug sensitivity. The algorithm is based on information theory, Bayesian statistics, and transfer learning. The algorithm demonstrates the importance of context specificity in predictive modeling of cancer pharmacogenomics. The two complementary algorithms highlight the challenges faced in personalized medicine and the potential solutions. This thesis details the results and analysis that demonstrate the importance of causality and context specificity in predictive modeling of drug response, which will be crucial to bringing personalized medicine into practice.
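As a rough illustration of the regularized linear regression mentioned in this abstract, the sketch below fits an elastic net by cyclic coordinate descent in plain Python. It is not the thesis algorithm: the causality testing and Bayesian components are omitted, and the function names, the `alpha` and `l1_ratio` parameters, and the toy data are all assumptions made for this example. Features and response are assumed to be centered, so no intercept is fitted.

```python
def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold (the lasso proximal step)."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def elastic_net(X, y, alpha=0.5, l1_ratio=0.5, n_iter=100):
    """Minimize (1/2n)||y - Xw||^2 + alpha*l1_ratio*||w||_1
    + 0.5*alpha*(1 - l1_ratio)*||w||^2 by cyclic coordinate descent.

    X is a list of rows (samples), y a list of responses.
    Hypothetical helper for illustration only.
    """
    n, p = len(X), len(X[0])
    w = [0.0] * p
    residual = list(y)  # residual = y - Xw, and w starts at zero
    for _ in range(n_iter):
        for j in range(p):
            # Correlation of feature j with the residual that excludes j's own fit.
            rho = sum(X[i][j] * (residual[i] + X[i][j] * w[j]) for i in range(n)) / n
            z = sum(X[i][j] ** 2 for i in range(n)) / n
            new_wj = soft_threshold(rho, alpha * l1_ratio) / (z + alpha * (1 - l1_ratio))
            # Keep the residual in sync with the updated coefficient.
            delta = new_wj - w[j]
            if delta != 0.0:
                for i in range(n):
                    residual[i] -= X[i][j] * delta
                w[j] = new_wj
    return w


# Toy data: the response depends only on the first (hypothetical) genomic feature.
X = [[1.0, 1.0], [-1.0, 1.0], [1.0, -1.0], [-1.0, -1.0]]
y = [2.0, -2.0, 2.0, -2.0]
weights = elastic_net(X, y)
```

On this toy data the L1 penalty sets the coefficient of the uninformative second feature to exactly zero, which is the sparsity property that makes such models easier to interpret than a black-box predictor.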
Harnessing gene expression to identify the genetic basis of drug resistance
The advent of cost-effective genotyping and sequencing methods has recently made it possible to ask questions that address the genetic basis of phenotypic diversity and how natural variants interact with the environment. We developed Camelot (CAusal Modelling with Expression Linkage for cOmplex Traits), a statistical method that integrates genotype, gene expression and phenotype data to automatically build models that both predict complex quantitative phenotypes and identify genes that actively influence these traits. Camelot integrates genotype and gene expression data, both generated under a reference condition, to predict the response to entirely different conditions. We systematically applied our algorithm to data generated from a collection of yeast segregants, using genotype and gene expression data generated under drug-free conditions to predict the response to 94 drugs and experimentally confirmed 14 novel gene–drug interactions. Our approach is robust, applicable to other phenotypes and species, and has potential for applications in personalized medicine, for example, in predicting how an individual will respond to a previously unseen drug.
Large expert-curated database for benchmarking document similarity detection in biomedical literature search
Document recommendation systems for locating relevant literature have mostly relied on methods developed a decade ago. This is largely due to the lack of a large offline gold-standard benchmark of relevant documents that cover a variety of research fields such that newly developed literature search techniques can be compared, improved and translated into practice. To overcome this bottleneck, we have established the RElevant LIterature SearcH consortium consisting of more than 1500 scientists from 84 countries, who have collectively annotated the relevance of over 180 000 PubMed-listed articles with regard to their respective seed (input) article(s). The majority of annotations were contributed by highly experienced, original authors of the seed articles. The collected data cover 76% of all unique PubMed Medical Subject Headings descriptors. No systematic biases were observed across different experience levels, research fields or time spent on annotations. More importantly, annotations of the same document pairs contributed by different scientists were highly concordant. We further show that the three representative baseline methods used to generate recommended articles for evaluation (Okapi Best Matching 25, Term Frequency-Inverse Document Frequency and PubMed Related Articles) had similar overall performances. Additionally, we found that these methods each tend to produce distinct collections of recommended articles, suggesting that a hybrid method may be required to completely capture all relevant articles. The established database server located at https://relishdb.ict.griffith.edu.au is freely available for the downloading of annotation data and the blind testing of new methods. We expect that this benchmark will be useful for stimulating the development of new powerful techniques for title and title/abstract-based search engines for relevant articles in biomedical research. Peer reviewed
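To make one of the baselines named in this abstract concrete, the sketch below ranks candidate texts against a seed text by cosine similarity of Term Frequency-Inverse Document Frequency vectors. It is a minimal plain-Python illustration, not the consortium's implementation: the tokenizer, the smoothed IDF formula, the function names and the example texts are assumptions made here.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf_vectors(docs):
    """Return one sparse {term: weight} vector per document."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    # Smoothed IDF (an assumption; several variants are in common use).
    idf = {t: math.log((1 + n_docs) / (1 + c)) + 1.0 for t, c in df.items()}
    return [{t: tf * idf[t] for t, tf in Counter(tokens).items()}
            for tokens in tokenized]


def cosine(u, v):
    """Cosine similarity between two sparse vectors."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank_candidates(seed, candidates):
    """Return candidate indices ordered from most to least similar to the seed."""
    vectors = tfidf_vectors([seed] + candidates)
    scores = [cosine(vectors[0], v) for v in vectors[1:]]
    return sorted(range(len(candidates)), key=lambda i: scores[i], reverse=True)


seed = "yeast drug resistance gene expression"
candidates = [
    "electricity load forecasting with support vector machines",
    "gene expression predicts drug resistance in yeast segregants",
]
ranking = rank_candidates(seed, candidates)
```

Here the second candidate shares several terms with the seed while the first shares none, so the second is ranked first. A benchmark of annotated seed and candidate pairs is what allows such a ranking to be scored against expert judgments.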
Context Sensitive Modeling of Cancer Drug Sensitivity.
Recent screening of drug sensitivity in large panels of cancer cell lines provides a valuable resource towards developing algorithms that predict drug response. Since more samples provide increased statistical power, most approaches to prediction of drug sensitivity pool multiple cancer types together without distinction. However, pan-cancer results can be misleading due to the confounding effects of tissues or cancer subtypes. On the other hand, independent analysis for each cancer type is hampered by small sample size. To balance this trade-off, we present CHER (Contextual Heterogeneity Enabled Regression), an algorithm that builds predictive models for drug sensitivity by selecting predictive genomic features and deciding which ones should, and which should not, be shared across different cancers, tissues and drugs. CHER provides significantly more accurate models of drug sensitivity than comparable elastic-net-based models. Moreover, CHER provides better insight into the underlying biological processes by finding a sparse set of shared and type-specific genomic features.
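One generic way to let a single regression hold both shared and type-specific effects is to augment the design matrix with a per-context copy of every feature. The sketch below shows only that augmentation step. It is a standard feature-expansion trick and is not CHER itself; the function name, the context labels and the toy values are assumptions for illustration.

```python
def expand_with_context(X, contexts):
    """Append, for every context level, a copy of the features that is
    non-zero only for samples belonging to that context.

    Returns (expanded_rows, levels). The first len(X[0]) columns are the
    shared features; each later block of the same width belongs to one
    context level, in sorted order.
    """
    levels = sorted(set(contexts))
    n_features = len(X[0])
    expanded = []
    for row, context in zip(X, contexts):
        new_row = list(row)  # shared block
        for level in levels:
            if context == level:
                new_row.extend(row)  # active context-specific block
            else:
                new_row.extend([0.0] * n_features)  # inactive block
        expanded.append(new_row)
    return expanded, levels


# Two hypothetical cell lines with two genomic features each.
X = [[1.0, 2.0], [3.0, 4.0]]
contexts = ["breast", "lung"]
expanded, levels = expand_with_context(X, contexts)
```

A sparse regression fitted on the expanded matrix can then keep a coefficient in the shared block when a feature matters across cancer types, or in a single context block when it matters for only one, which loosely mirrors the pooling versus per-type trade-off described above.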
Load Forecasting Using Support Vector Machines: A Study on EUNITE Competition 2001
Load forecasting is usually done by constructing models on relevant information, such as climate and previous load demand data. In 2001 the EUNITE network organized a competition aiming at mid-term load forecasting (predicting the daily maximum load of the next 31 days). During the competition we proposed a support vector machine (SVM) model, which was the winning entry, to solve the problem. In this paper, we discuss in detail how SVM, a new learning technique, is successfully applied to load forecasting. In addition, motivated by the competition results and the approaches of other participants, more experiments and deeper analyses are conducted and presented here. Some important conclusions from the results are that temperature (or other types of climate information) might not be useful in such a mid-term load forecasting problem and that the introduction of time-series concepts may improve the forecasting.
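The time-series idea mentioned in this abstract can be illustrated by turning a load series into lagged training examples and then forecasting a 31-day horizon recursively, feeding each prediction back in as an input. The sketch below shows only that framing, in plain Python, and leaves the regressor pluggable: the paper uses an SVM, whereas the stand-in predictor here is a trivial placeholder, and all names and numbers are assumptions for this example.

```python
def make_lagged(series, n_lags):
    """Build (X, y) pairs where each target is predicted from the n_lags
    values that immediately precede it."""
    X, y = [], []
    for t in range(n_lags, len(series)):
        X.append(list(series[t - n_lags:t]))
        y.append(series[t])
    return X, y


def forecast_recursive(history, predict_one, n_lags, horizon=31):
    """Forecast `horizon` steps ahead, reusing each prediction as an input.

    `predict_one` is any callable mapping a window of n_lags values to the
    next value, e.g. the predict step of a trained regressor.
    """
    if len(history) < n_lags:
        raise ValueError("history must contain at least n_lags values")
    window = list(history[-n_lags:])
    forecasts = []
    for _ in range(horizon):
        next_value = predict_one(window)
        forecasts.append(next_value)
        window = window[1:] + [next_value]  # slide the window forward
    return forecasts


# Hypothetical daily maximum loads and a placeholder predictor (not an SVM).
loads = [1, 2, 3, 4, 5]
X, y = make_lagged(loads, n_lags=2)
forecasts = forecast_recursive([1, 2, 3], lambda w: w[-1] + 1, n_lags=2, horizon=3)
```

With this framing, no climate inputs are needed at prediction time, which fits the abstract's observation that temperature may not help in the mid-term setting.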
EUNITE Network Competition: Electricity Load Forecasting
The EUNITE network recently organized a world-wide competition on electricity load forecasting. This paper details our approaches and results, where the main machine learning technique used is the support vector machine.