15 research outputs found

    Analysing functional genomics data using novel ensemble, consensus and data fusion techniques

    Motivation: Rapid technological development in the biosciences and in computer science over the last decade has enabled the analysis of high-dimensional biological datasets on standard desktop computers. However, in spite of these technical advances, common properties of the new high-throughput experimental data, such as small sample sizes relative to the number of features, high noise levels and outliers, also pose novel challenges. Ensemble and consensus machine learning techniques and data integration methods can alleviate these issues, but often produce overly complex models which lack generalization capability and interpretability. The goal of this thesis was therefore to develop new approaches to combine algorithms and large-scale biological datasets, including novel approaches to integrate analysis types from different domains (e.g. statistics, topological network analysis, machine learning and text mining), to exploit their synergies in a manner that provides compact and interpretable models for inferring new biological knowledge. Main results: The main contributions of the doctoral project are new ensemble, consensus and cross-domain bioinformatics algorithms, and new analysis pipelines combining these techniques within a general framework. This framework is designed to enable the integrative analysis of both large-scale gene and protein expression data (including the tools ArrayMining, Top-scoring pathway pairs and RNAnalyze) and general gene and protein sets (including the tools TopoGSA, EnrichNet and PathExpand), by combining algorithms for different statistical learning tasks (feature selection, classification and clustering) in a modular fashion. Ensemble and consensus analysis techniques employed within the modules are redesigned such that the compactness and interpretability of the resulting models are optimized in addition to predictive accuracy and robustness. The framework was applied to real-world biomedical problems, with a focus on cancer biology, providing the following main results: (1) the identification of a novel tumour marker gene in collaboration with the Nottingham Queens Medical Centre, facilitating the distinction between two clinically important breast cancer subtypes (framework tool: ArrayMining); (2) the prediction of novel candidate disease genes for Alzheimer's disease and pancreatic cancer using an integrative analysis of cellular pathway definitions and protein interaction data (framework tool: PathExpand, collaboration with the Spanish National Cancer Centre); (3) the prioritization of associations between disease-related processes and other cellular pathways using a new rule-based classification method integrating gene expression data and pathway definitions (framework tool: Top-scoring pathway pairs); (4) the discovery of topological similarities between differentially expressed genes in cancers and cellular pathway definitions mapped to a molecular interaction network (framework tool: TopoGSA, collaboration with the Spanish National Cancer Centre). In summary, the framework combines the synergies of multiple cross-domain analysis techniques within a single easy-to-use software package and has provided new biological insights in a wide variety of practical settings.

    Méthodes et algorithmes pour l'amélioration de l'inférence de l'histoire évolutive des génomes (Methods and algorithms for improving the inference of the evolutionary history of genomes)

    Gene trees offer a proper framework for comparative genomics. Not only do they provide information about species evolution through speciation events, but they also capture gene family expansion and contraction by gene gains and losses. They are thus used to infer the evolutionary history of gene families and to accurately predict the orthologous relationships between genes, on which several biological analyses rely, such as the prediction of protein structure and function and phylogenetic analyses. Methods for inferring gene family evolution explicitly assume that gene trees are known without errors. However, standard phylogenetic methods for tree construction based on sequence data are well documented as error-prone. Gene trees constructed using these methods will usually introduce biases during the inference of gene family histories. In this thesis, we present new methods aiming to improve the quality of phylogenetic gene trees and thereby the accuracy of the underlying evolutionary histories of their corresponding gene families. We start by providing a framework to study genetic code deviations, one possible cause of annotation errors that can then propagate to phylogeny reconstruction. Our framework analyses coding sequences and tRNAs to predict codon reassignments, and is centred on a prediction algorithm called CoreTracker. We first show its efficiency, then apply it to green plant mitochondrial genomes. The second part of this thesis focuses on the development of efficient species-tree-aware methods for gene tree correction and construction. We present ProfileNJ, a fast and deterministic correction method that targets weakly supported branches of a gene tree. When applied to the gene families of the Ensembl Compara database, ProfileNJ produces an arguably better set of gene trees than the ones available in Ensembl Compara. We then use a different strategy, based on a genetic algorithm, that allows both construction and correction of gene trees. This second method, called GATC, treats the problem as a multi-objective optimisation problem in which we look for the set of gene trees that is optimal for both the sequence data and the information on gene family evolution through gene gain and loss. We show that this approach yields accurate trees and is suitable for the construction of reference datasets to benchmark other methods.

    A NOVEL COMPUTATIONAL FRAMEWORK FOR TRANSCRIPTOME ANALYSIS WITH RNA-SEQ DATA

    The advance of high-throughput sequencing technologies and their application to mRNA transcriptome sequencing (RNA-seq) have enabled comprehensive and unbiased profiling of the landscape of transcription in a cell. To address current limitations in the accuracy and scalability of transcriptome analysis, a novel computational framework has been developed for large-scale RNA-seq datasets with no dependence on transcript annotations. Starting directly from raw reads, a probabilistic approach is first applied to infer the best transcript fragment alignments from paired-end reads. Building on the identification of alternative splicing modules, the framework then performs precise and efficient differential analysis on automatically detected alternative splicing variants, which circumvents the need for full transcript reconstruction and quantification. Beyond the scope of classical group-wise analysis, a clustering scheme is further described for mining prominent consistency in transcription among samples, removing the restriction of presumed grouping. The performance of the framework has been demonstrated in a series of simulation studies and on real datasets, including The Cancer Genome Atlas (TCGA) breast cancer data. These applications suggest an opportunity to use differential transcription analysis to reveal variations in the mRNA transcriptome in response to cellular differentiation or the effects of disease.

    Knowledge Discovery with Bayesian Networks

    Ph.D. (Doctor of Philosophy)

    Genetic characterisation of a range of geographically distinct Helicoverpa armigera nucleopolyhedrovirus (HearNPV) isolates and evaluation of biological activity against South African populations of the African bollworm, Helicoverpa armigera (Hübner) (Lepidoptera: Noctuidae)

    The African bollworm, Helicoverpa armigera Hübner (Lepidoptera: Noctuidae), is a pest of economic and agricultural importance globally. It is a polyphagous pest that feeds on a wide range of host plants, including economically important crops. Its impact on agricultural systems makes its control a priority. The most common method of control is the use of chemical pesticides; however, continuous application of pesticides has resulted in the development of resistance. Biological control has been investigated and established as an effective method of control, either standalone or as part of an integrated pest management (IPM) system. The baculovirus Helicoverpa armigera nucleopolyhedrovirus (HearNPV) has shown promise in the control of H. armigera, and commercial formulations based on the virus are available in many global markets. However, the identification of novel HearNPV isolates will aid in the control of H. armigera as well as provide alternative isolates that may have better virulence. Three new HearNPV isolates were purified and identified from H. armigera cadavers collected at three geographically distinct South African locations, and were named HearNPV-Albany, HearNPV-KZN and HearNPV-Haygrove. The genomes of two of the isolates, HearNPV-Albany and HearNPV-KZN, were genetically characterised and compared to other geographically distinct HearNPV isolates. Virulence studies compared the new HearNPV isolates against the established commercial HearNPV formulations Helicovir™ and Helicovex® and against other geographically distinct HearNPV isolates, HearNPV-G4 and HearNPV-SP1. Two laboratory colonies were established: the Albany colony, from H. armigera collected in South African fields in the Belmont Valley near Grahamstown, and the Haygrove colony, provided by Haygrove Eden farm near George. Biological studies were carried out using the Albany H. armigera colony, comparing the rate of development, survival and fertility on green bell peppers, cabbage leaves and artificial diet. These studies recorded that development and survivorship were best on artificial diet. Regular quality control was required for the maintenance of the colony, and continuous generations of healthy larvae were eventually established. Diseased cadavers with signs of baculovirus infection were collected through bioprospecting from the KwaZulu-Natal Province in South Africa (labelled KZN isolate), from the Belmont Valley near Grahamstown (labelled Albany isolate) and from Haygrove Eden farm near George (labelled Haygrove isolate). A fourth isolate, a crude extract of occlusion bodies (OBs) first described by Whitlock, was also analysed and labelled Whitlock isolate. Occlusion bodies were extracted, purified and morphologically identified from the KZN, Albany, Haygrove and Whitlock isolates using TEM. Genomic DNA was extracted from the purified OBs, and PCR confirmed the identity of the OBs as HearNPV. Genomic analyses were performed on HearNPV-Albany and HearNPV-KZN through genetic characterisation and comparison with other geographically distinct HearNPV genomes, to confirm novelty and to establish potential genetic relationships between the isolates through evolutionary distances. HearNPV-Albany and HearNPV-KZN were successfully sequenced and identified as novel isolates, with unique fragment patterns and unique gene sequences arising from deletions or insertions when compared to other geographically distinct HearNPV isolates. This raised the potential for differences in biological activity against H. armigera larvae when tested through biological assays. The HearNPV-Whit genome assembly suffered from low-quality data, which resulted in many gaps and a failed assembly. The biological activity of HearNPV isolates from Spain, China and South Africa and of two commercial formulations was studied against the laboratory-established South African H. armigera colony. The LC50 values of the South African HearNPV isolates ranged from 7.7 × 10¹ OBs ml⁻¹ for the most effective to 3.2 × 10² OBs ml⁻¹ for the least effective. The Spanish and Chinese HearNPV isolates resulted in LC50 values of 2.0 × 10² OBs ml⁻¹ and 1.2 × 10¹ OBs ml⁻¹ respectively. The commercial formulations showed the lowest virulence observed, with LC50 values of 5.84 × 10² OBs ml⁻¹ and 9.0 × 10² OBs ml⁻¹ for Helicovex® and Helicovir™ respectively. In this study, novel South African HearNPV isolates were isolated and identified. Through characterisation and bioassays against South African H. armigera populations, the HearNPV isolates were shown to differ in virulence from geographically distinct isolates. This research indicates potential for the development of new H. armigera biopesticides based on the novel isolates, after field trial testing.

    Eighth Biennial Report: April 2005 – March 2007

    No full text

    Phenotype prediction based on microRNA expression profiles: a novel diagnostic tool for inflammatory bowel disease

    Inflammatory bowel disease (IBD) is a chronic relapsing disorder of the alimentary tract, encompassing Crohn's disease (CD) and ulcerative colitis (UC) as its two major subtypes. The diagnosis of inflammatory bowel disease remains a clinical challenge and involves the assessment of numerous parameters. A multitude of biomarkers has been proposed to complement the diagnostic process. However, none of them can be recommended for routine clinical practice. MicroRNAs (miRNAs) represent a class of short, non-coding RNAs that act as post-transcriptional regulators of gene expression. In this work we evaluate whether systematic miRNA or miRNA variant expression profiling, in conjunction with state-of-the-art machine learning techniques, is suitable as a non-invasive tool for the diagnosis of IBD. In a first study we employed microarray technology to determine expression levels of 863 miRNAs in whole blood samples drawn from a cohort of 314 individuals, in order to establish miRNA signatures that are informative for distinguishing CD and UC from each other as well as from healthy and inflammatory controls. In a second study we extended this approach by additionally incorporating expression profiles of miRNA variants. We employed next-generation sequencing technology to examine isomiR expression profiles drawn from a cohort of 515 individuals, comprising IBD cases as well as healthy and symptomatic controls. Incorporating distinctive isomiR signatures, we generated classifiers performing with median balanced accuracies of 78.57%/78.83% (untreated/treated CD vs. UC) and 100.00%/98.28% (untreated/treated CD or UC vs. HC), respectively. Hence, we provide sampling-based evidence for our models' superiority over established biomarkers.
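The classifier results in the abstract above are reported as balanced accuracies. As a minimal sketch of what that metric measures (the mean of the per-class recalls, which is not inflated by unequal class sizes), the following Python function computes it from labels. The function name and the toy labels are illustrative assumptions and are not taken from the thesis.

```python
from collections import defaultdict


def balanced_accuracy(y_true, y_pred):
    """Return the mean of per-class recall over the classes present in y_true."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("at least one sample is required")
    total = defaultdict(int)    # number of samples per true class
    correct = defaultdict(int)  # correctly predicted samples per true class
    for truth, predicted in zip(y_true, y_pred):
        total[truth] += 1
        if truth == predicted:
            correct[truth] += 1
    recalls = [correct[label] / total[label] for label in total]
    return sum(recalls) / len(recalls)


# Toy labels (hypothetical): four CD samples and two UC samples.
y_true = ["CD", "CD", "CD", "CD", "UC", "UC"]
y_pred = ["CD", "CD", "CD", "UC", "UC", "UC"]
score = balanced_accuracy(y_true, y_pred)  # recall(CD) = 0.75, recall(UC) = 1.0
```

On these toy labels the plain accuracy is 5/6, while the balanced accuracy is the average of 0.75 and 1.0.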

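    Détection des transferts horizontaux de gènes : modèles et algorithmes appliqués à l'évolution des espèces et des langues (Detection of horizontal gene transfers: models and algorithms applied to the evolution of species and languages)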

    Horizontal gene transfer (HGT, or lateral gene transfer) is a natural evolutionary mechanism that consists of the direct transfer of genetic material from one species to another. The possibility that horizontal gene transfer may play a key role in biological evolution represents a fundamental change, in recent years, in our perception of the general aspects of evolutionary biology. For example, bacteria and viruses possess sophisticated mechanisms for acquiring new genes by horizontal transfer, which allow them to adapt and evolve adequately in their environment. Until very recently, methods for detecting this mechanism relied essentially on sequence analysis and were very rarely automated. The evolution of organisms that have undergone HGT cannot be represented by acyclic phylogenetic trees; the appropriate representation is a network. In this thesis, we describe a new model of this evolutionary mechanism, based on the study of topological and metric differences between a species tree and a gene tree inferred for the same set of species. The resulting methods were applied to real datasets for which hypotheses of lateral gene transfer were plausible. Monte Carlo simulations were carried out to evaluate the quality of the results relative to existing methods. We also present a generalization of the complete horizontal transfer model that can detect partial transfers and identify mosaic genes. In this latter model, only a part of the gene is assumed to have been transferred. Finally, we present an application of these new methods to modelling the word borrowings that occurred during the evolution of the Indo-European languages.
    Author keywords: phylogenetic tree, reticulate network, horizontal gene transfer, least-squares criterion, Robinson and Foulds distance, bipartition dissimilarity, biolinguistics
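The abstract above compares a species tree with a gene tree inferred for the same species and lists the Robinson and Foulds distance among its keywords. As a rough illustration only, the sketch below computes a rooted (cluster-based) variant of that distance: the number of non-trivial clades found in one tree but not the other. The nested-tuple tree encoding, the function names and the example trees are assumptions made for this sketch; it does not reproduce the thesis's detection model.

```python
def clades(tree):
    """Return (all leaves, set of non-trivial clades) for a nested-tuple tree.

    A leaf is a string; an internal node is a tuple of child subtrees.
    """
    found = set()

    def visit(node):
        if isinstance(node, str):
            return frozenset([node])
        leaves = frozenset().union(*(visit(child) for child in node))
        found.add(leaves)
        return leaves

    all_leaves = visit(tree)
    found.discard(all_leaves)  # the root clade is shared by every tree
    return all_leaves, found


def rf_distance(tree_a, tree_b):
    """Count clades present in exactly one of two rooted trees on the same leaves."""
    leaves_a, clades_a = clades(tree_a)
    leaves_b, clades_b = clades(tree_b)
    if leaves_a != leaves_b:
        raise ValueError("both trees must be defined on the same set of species")
    return len(clades_a ^ clades_b)


# Hypothetical example: the gene tree groups A with C instead of A with B.
species_tree = ((("A", "B"), "C"), "D")
gene_tree = ((("A", "C"), "B"), "D")
distance = rf_distance(species_tree, gene_tree)
```

A distance of zero means the two topologies agree; a non-zero value signals a conflict between the gene tree and the species tree, which is the kind of discrepancy that transfer-detection methods then try to explain.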

    Reverse engineering of biological signaling networks via integration of data and knowledge using probabilistic graphical models

    Motivation: The postulate that biological molecules act together in intricate networks pioneered systems biology and popularized the study of approaches to reconstruct and understand these networks. Such networks give insight into the underlying biological processes and into diseases that involve aberrations in these pathways, such as cancer and neurodegenerative diseases. They can be reconstructed by two different approaches, namely data-driven and knowledge-driven methods, which raises the critical question of which to rely on. Relying completely on data-driven approaches brings the risk of overfitting, whereas an entirely knowledge-driven approach yields no new information or knowledge. This thesis presents a hybrid approach that integrates high-throughput data and biological knowledge to reverse-engineer the structure of biological networks in a probabilistic way, and showcases the resulting improvement. Accomplishments: The current work aims to learn networks from perturbation data. It extends the existing Nested Effects Models (NEMs) for pathway reconstruction to use time-course data, allowing direct and indirect effects to be differentiated and feedback loops to be resolved. The thesis also introduces an approach to learn the signaling network from phenotype data in the form of images or movies, widening the scope of NEMs, which was so far limited to gene expression data. Furthermore, the thesis introduces methodologies to integrate knowledge from different existing sources as a probabilistic prior, which improved the reconstruction accuracy of the network and could make it biologically more plausible. These methods were finally integrated for the reverse engineering of more accurate and realistic networks. Conclusion: The thesis adds three dimensions to the existing scope of network reverse engineering, and of Nested Effects Models in particular: the use of time-course data, the use of phenotype data and the incorporation of prior biological knowledge from multiple sources. The approaches developed are applied to understand signaling in stem cells, cell division and breast cancer. Furthermore, the integrative approach is used to reconstruct the AMPK/EGFR pathway and to identify potential drug targets in lung cancer, which were also validated experimentally, meeting one of the desired goals of systems biology.