    Very Important Pool (VIP) genes – an application for microarray-based molecular signatures

    Background: Advances in DNA microarray technology suggest that molecular signatures derived from microarray data will eventually be used in clinical settings and personalized medicine. Deriving biomarkers is a large step beyond hypothesis generation and demands considerably more stringent accuracy in identifying informative gene subsets that differentiate phenotypes. Because microarray data typically contain few samples and replicates relative to the large number of genes, informative genes must be identified before a classifier is constructed. Improving the ability to identify such differentiating genes, however, remains a challenge in bioinformatics.
    Results: A new hybrid gene selection approach was investigated and tested on nine publicly available microarray datasets. The method identifies a Very Important Pool (VIP) of genes from the broad patterns of gene expression data. It uses a bagging (bootstrap aggregating) sampling principle, in which re-sampled arrays are used to identify the most informative genes, and the frequency with which each gene is selected across repeated re-sampling is used to identify the VIP genes. Putative informative genes are selected with two methods, the t-statistic and discriminatory analysis. With the t-statistic, informative genes are identified based on p-values. In the discriminatory analysis, disjoint principal component analyses (PCAs) are conducted for each class of samples, and genes with high discrimination power (DP) are identified. The VIP gene selection approach was compared with the p-value ranking approach. The genes identified by the VIP method but not by p-value ranking are also related to the disease under investigation. More importantly, these genes are part of the pathways derived from the genes shared by both the VIP and p-value ranking methods. Moreover, binary classifiers built from these genes are statistically equivalent to those built from the top 50 p-value ranked genes in distinguishing different types of samples.
    Conclusion: The VIP gene selection approach can identify additional subsets of informative genes that are not always selected by the p-value ranking method. These genes are likely to be additional true positives, since they are part of pathways identified by the p-value ranking method and are expected to be related to the relevant biology. These additional genes from the VIP method therefore potentially provide valuable biological insights.
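    The bagging-and-frequency selection described in the Results can be illustrated with a short Python sketch. It is a minimal illustration under stated assumptions, not the authors' implementation: it covers only the t-statistic branch (not the disjoint-PCA discrimination power), resamples arrays with replacement separately within each class, ranks genes by the absolute pooled-variance t-statistic, and treats genes selected in at least half of the resamples as VIP genes. Within one resample every gene has the same degrees of freedom, so ordering by |t| matches ordering by two-sided p-value. The function names, `top_k`, `n_bags`, and the 0.5 frequency cut-off are hypothetical.

```python
import random
from collections import Counter
from statistics import mean, variance


def pooled_t(a, b):
    """Student's two-sample t-statistic with a pooled variance estimate."""
    n1, n2 = len(a), len(b)
    sp2 = ((n1 - 1) * variance(a) + (n2 - 1) * variance(b)) / (n1 + n2 - 2)
    if sp2 == 0:
        return 0.0  # degenerate resample with no spread: treat as uninformative
    return (mean(a) - mean(b)) / (sp2 * (1 / n1 + 1 / n2)) ** 0.5


def vip_frequencies(expr, labels, top_k, n_bags, seed=0):
    """Count how often each gene ranks in the top_k across bootstrap resamples.

    expr:   dict gene -> list of expression values, one per array
    labels: list of 0/1 class labels aligned with the arrays
            (each class needs at least two arrays)
    """
    rng = random.Random(seed)
    idx0 = [i for i, y in enumerate(labels) if y == 0]
    idx1 = [i for i, y in enumerate(labels) if y == 1]
    counts = Counter()
    for _ in range(n_bags):
        # Bagging: resample arrays with replacement, separately within each class.
        s0 = [rng.choice(idx0) for _ in idx0]
        s1 = [rng.choice(idx1) for _ in idx1]
        scores = {
            g: abs(pooled_t([v[i] for i in s0], [v[i] for i in s1]))
            for g, v in expr.items()
        }
        # Same degrees of freedom for every gene in this bag, so ordering by
        # |t| is the same as ranking by two-sided p-value.
        for g in sorted(scores, key=scores.get, reverse=True)[:top_k]:
            counts[g] += 1
    return counts


def vip_genes(counts, n_bags, min_freq=0.5):
    """Genes selected in at least min_freq of the bags (hypothetical cut-off)."""
    return sorted(g for g, c in counts.items() if c / n_bags >= min_freq)


if __name__ == "__main__":
    # Toy data: genes A and B separate the classes; C and D are noise.
    labels = [0] * 5 + [1] * 5
    expr = {
        "A": [0.0, 0.1, 0.2, 0.05, 0.15, 5.0, 5.1, 4.9, 5.05, 4.95],
        "B": [3.0, 3.1, 2.9, 3.05, 2.95, 0.0, 0.1, 0.2, 0.05, 0.15],
        "C": [1, 2, 3, 4, 5, 5, 4, 3, 2, 1],
        "D": [2, 5, 1, 4, 3, 3, 1, 4, 2, 5],
    }
    counts = vip_frequencies(expr, labels, top_k=2, n_bags=50, seed=1)
    print(vip_genes(counts, n_bags=50))  # ['A', 'B']
```

    Counting selection frequency across resamples, rather than trusting a single ranking, is what lets genes that are consistently but not maximally significant enter the pool.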

    The Diversity of Cortical Interneurons

    The cortex is involved in diverse higher cognitive processes, including decision making, motor planning, sensory discrimination, and memory consolidation. Cortical interneurons are key elements of the cortical system. They stabilize networks while also adding non-linear effects to the excitatory system that make the cortical network more dynamic. To achieve this, cortical interneurons form a very heterogeneous group, which makes them hard to classify without markers. We took a BACtrap approach, performing translating ribosome affinity purification (TRAP) on bacterial artificial chromosome (BAC) transgenic mice, for the systematic discovery of markers for different cell types. First, we generated BACtrap lines for known markers of mixed interneuron populations. After immunohistochemistry (IHC) characterization of each line, we picked four lines, for the Dlx1, Nek7, Htr3a, and Cort genes, for further study. We collected mRNAs from the targeted neurons in each line and performed gene expression profiling. Based on the IHC and profiling studies, we found that each of the four lines labeled different but overlapping interneuron populations in the cortex. Second, we performed a comparative microarray analysis to find genes differentially enriched in each of the four populations and identified ~20 candidate marker genes. To examine their potential as markers, we generated BAC transgenic mice for these candidate genes and also examined their density recovery profiles (DRPs) on in situ hybridization (ISH) images from the Allen Brain Atlas (ABA). A number of candidate genes showed regular spacing of cell bodies, suggesting that they might label functionally homogeneous groups. Third, we characterized new Cre lines for two candidate marker genes, Rbp4 and Oxtr, to investigate their cell types and functional roles using the Cre/loxP system. Both the Rbp4 and Oxtr Cre populations are heterogeneous in their neurochemical profiles, but DRP analysis of Oxtr Cre neurons suggested that they may form a functionally homogeneous group. Cre-dependent AAV injection also revealed a tiling property of Oxtr Cre neurons in the somatosensory cortex. The connectivity of three Cre lines (Rbp4, Oxtr, Htr3a) was also examined using retrograde monosynaptic rabies virus tracers. Although the three lines expressed Cre in different interneuron populations, their presynaptic inputs were almost identical, with a few differences: each line had a different input preference, and we found line-specific inputs from the hippocampus and the dopaminergic nuclei. In short, we carried out systematic marker searches and generated transgenic mice. Our findings suggest that better markers for interneuron cell types exist, and we also showed that a group that is heterogeneous at the cellular level can work as a functionally homogeneous group. The new interneuronal Cre lines showed a few differences in presynaptic inputs and create new opportunities to understand the functional differences between distinct cell types.
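    A density recovery profile is, in essence, the average density of neighbouring cells in concentric annuli around each cell; a regularly spaced population shows an empty or depressed zone at short distances. The Python sketch below is a simplified version under assumptions, not the authors' pipeline: cell-body positions are assumed to be already extracted from the ISH images as 2-D coordinates, no correction is made for cells near the image border, and the function and parameter names are hypothetical.

```python
import math


def density_recovery_profile(points, bin_width, n_bins):
    """Density of neighbouring cells in concentric annuli around each cell.

    points:    list of (x, y) cell-body positions, e.g. in micrometres
    bin_width: annulus width, in the same units as the positions
    Returns one density (cells per unit area) per annulus, averaged over all
    reference cells. No correction is made for cells near the image border.
    """
    counts = [0] * n_bins
    for i, (xi, yi) in enumerate(points):
        for j, (xj, yj) in enumerate(points):
            if i == j:
                continue  # a cell is not its own neighbour
            k = int(math.hypot(xj - xi, yj - yi) // bin_width)
            if k < n_bins:
                counts[k] += 1
    profile = []
    for k, c in enumerate(counts):
        inner, outer = k * bin_width, (k + 1) * bin_width
        area = math.pi * (outer ** 2 - inner ** 2)
        profile.append(c / (len(points) * area))
    return profile


if __name__ == "__main__":
    # A perfectly regular 5 x 5 lattice with 10-unit spacing has no neighbours
    # closer than 10 units, so the first two 5-unit annuli are empty.
    lattice = [(10.0 * i, 10.0 * j) for i in range(5) for j in range(5)]
    print([round(d, 4) for d in density_recovery_profile(lattice, 5.0, 4)])
```

    For randomly placed cells the profile is roughly flat; the empty inner annuli of the lattice are the kind of signature the abstract interprets as regular spacing.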

    Pathway-Based Multi-Omics Data Integration for Breast Cancer Diagnosis and Prognosis.

    Ph.D. thesis, University of Hawaiʻi at Mānoa, 2017

    Studies In Patients With Surgically Resected Pancreatic Neuroendocrine Tumours - MicroRNA Expression And Clinical Correlation

    INTRODUCTION Pancreatic neuroendocrine tumours (PNETs) have increased in incidence over the past three decades. Treatment options currently include surgery and locoregional and systemic therapies; however, the prognosis remains poor, and biomarkers that accurately predict the clinical behavior of these tumours are lacking. Dysregulation of microRNAs (miRNAs) has recently been shown to play a role in the development of many cancers through post-transcriptional gene regulation; however, few studies have investigated miRNAs as diagnostic or prognostic markers in PNETs. METHODS Patients who underwent resection of PNETs at our institutions between 1992 and 2014 were retrospectively included in this study. RNA was extracted from formalin-fixed, paraffin-embedded PNET samples, and microarray analysis was performed and correlated with clinicopathological data. MicroRNAs with statistically significant differential expression between patients with locoregional disease only and those who developed metastases were subjected to in silico analysis using three target gene prediction databases. RESULTS Thirty-seven patients were included in the study. Patients in the distant metastasis (DM) subgroup had poorer overall survival (OS) than those in the locoregional (L) subgroup (p = 0.046). A total of 506 miRNAs were differentially expressed between the DM and L groups: 265 were downregulated and 241 were upregulated. Four of these reached statistical significance: miR-3653, which was upregulated, and miR-4417, miR-574-3p, and miR-664b-3p, which were all downregulated. Only miR-3653 was identified by all three databases as having a potential PNET-related target, ATRX. CONCLUSION Higher tumour expression of miR-3653 was seen in patients with metastatic disease than in those with only locoregional disease. Several bioinformatic tools predicted the transcriptional regulator ATRX as a possible target of miR-3653. Thus, miR-3653 could potentially be a biomarker of metastatic potential and, consequently, poorer prognosis in patients with PNETs. However, these conclusions cannot be drawn with certainty given the limitations of this study, and further work, beginning with validation, is required.

    Statistical Methods for Analyzing Population-scale Genomic and Transcriptomic Data

    The study of genetics is integral to understanding the biology behind complex traits and can be approached in a variety of ways. Technological advances in genomics have enabled unprecedented large-scale studies that have identified numerous statistical associations between diseases and genes. Recently, studies of gene expression have become an increasingly popular approach to understanding the biological pathways underlying these associations. In this dissertation, I address specific challenges related to the study of gene expression: meta-imputation of expression across multiple datasets when only summary-level imputation models are available; correcting for technical biases towards reference alleles in array-based expression assays; and identifying tissue-specific and population-specific regulatory variants and trait-associated loci in a systems genetics setting that combines whole genome sequencing, transcriptomic profiles, morphometric traits, and clinical endpoints. In Chapter 2, I develop a method that leverages multiple datasets to accurately impute tissue-specific gene expression levels. Our method, Smartly Weighted Averaging across Multiple Tissues (SWAM), does not train directly on data; rather, it performs meta-imputation by combining existing imputation models, assigning weights based on their predictive performance and their similarity to the tissue of interest. I demonstrate that, when using the same set of resources, SWAM improves imputation accuracy compared with existing approaches that impute tissue-specific expression by training directly on raw data. The major benefit of the SWAM meta-imputation framework is the flexibility to combine multiple pre-trained imputation models trained on privacy-protected raw datasets. Indeed, prediction accuracy improves substantially when multiple datasets are integrated.
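    The core idea of meta-imputation, combining pre-trained models instead of retraining on raw data, can be sketched for linear imputation models in Python. This is a hypothetical illustration, not SWAM's actual weighting scheme: each model is assumed to be a dictionary of per-variant weights, and each model's "score" is a made-up non-negative number standing in for its predictive performance and tissue similarity. The function names are hypothetical.

```python
def predict(model, dosages):
    """Linear imputation model: sum of per-variant weights x genotype dosages.

    model:   dict variant_id -> weight (a pre-trained model's summary)
    dosages: dict variant_id -> allele dosage (0-2) for one individual
    """
    return sum(w * dosages.get(v, 0.0) for v, w in model.items())


def _normalized(scores):
    total = sum(scores)
    if total <= 0:
        raise ValueError("at least one model needs a positive score")
    return [s / total for s in scores]


def combine_models(models, scores):
    """Merge several linear models into one by score-weighted averaging.

    scores: non-negative numbers, one per model; a hypothetical stand-in for
            'predictive performance and similarity to the target tissue'.
    """
    combined = {}
    for weight, model in zip(_normalized(scores), models):
        for v, w in model.items():
            combined[v] = combined.get(v, 0.0) + weight * w
    return combined


def meta_impute(models, scores, dosages):
    """Score-weighted average of the predictions of several models."""
    return sum(w * predict(m, dosages) for w, m in zip(_normalized(scores), models))


if __name__ == "__main__":
    models = [{"rs1": 1.0}, {"rs1": 0.0, "rs2": 2.0}]
    scores = [3.0, 1.0]
    dosages = {"rs1": 1, "rs2": 1}
    print(meta_impute(models, scores, dosages))              # 1.25
    print(predict(combine_models(models, scores), dosages))  # 1.25
```

    Because every source model is linear, averaging their predictions gives the same result as predicting with one merged model, so only the published model weights are needed, never the individual-level training data.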
In Chapter 3, I examine the benefits of using deep whole genome sequencing to empower and refine existing microarray-based eQTL studies. I revisit a well-known hybridization bias in microarray studies caused by genetic polymorphisms within target probe sequences, using genetic variants from whole genome sequencing to identify and characterize this bias at both the probe and probeset level. I evaluate several approaches to account for hybridization bias, including methods that remove variant-overlapping probes and a novel method that adjusts for hybridization bias at each probe. I demonstrate that accounting for variant-overlapping probes when quantifying expression levels reduces reference bias and false positives in cis-eQTL analyses. I also demonstrate that adjusting for hybridization bias with deeply sequenced genomes is ideal for avoiding reference bias, although publicly available variant catalogues such as the 1000 Genomes data provide comparable benefits. In Chapter 4, I perform a systems genetics study of Pima Native Americans enrolled in a diabetic nephropathy study. I integrate whole genome sequences, transcriptomic profiles, and morphometric traits from two micro-dissected renal compartments (glomerular and tubulointerstitial) with clinical phenotypes to identify significant associations between these molecular and complex traits. I identified thousands of eQTLs, including kidney-specific and population-specific eQTLs, and many transcriptional associations with morphometric and clinical phenotypes that are enriched for kidney-specific biological pathways. Moreover, using dimension reduction techniques, I identified genome-wide significant genetic associations with a morphometric trait (podocyte volume) and with a composite trait representing the albumin-to-creatinine ratio and glomerular surface volume. Studying this unique, richly phenotyped cohort revealed many population- and tissue-specific regulatory variants, genes, and pathways implicated in renal disease progression.
    PhD, Biostatistics, University of Michigan, Horace H. Rackham School of Graduate Studies. http://deepblue.lib.umich.edu/bitstream/2027.42/170016/1/aeyliu_1.pd
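    The simplest of the Chapter 3 corrections, removing probes that overlap known variants before expression is summarized, amounts to an interval-overlap filter. The Python sketch below assumes probe coordinates are available as 1-based inclusive (chromosome, start, end) tuples and variants as (chromosome, position) pairs, for example from deep WGS or a catalogue such as 1000 Genomes; the function name and data layout are hypothetical, and the dissertation's novel per-probe adjustment is not shown.

```python
import bisect
from collections import defaultdict


def drop_variant_overlapping_probes(probes, variants):
    """Keep only probes whose target interval contains no known variant.

    probes:   dict probe_id -> (chrom, start, end), 1-based inclusive
    variants: iterable of (chrom, pos) variant positions
    """
    by_chrom = defaultdict(list)
    for chrom, pos in variants:
        by_chrom[chrom].append(pos)
    for positions in by_chrom.values():
        positions.sort()

    kept = {}
    for probe_id, (chrom, start, end) in probes.items():
        positions = by_chrom.get(chrom, [])
        # First variant at or after `start`; the probe overlaps a variant
        # if that position is also <= `end`.
        i = bisect.bisect_left(positions, start)
        if i == len(positions) or positions[i] > end:
            kept[probe_id] = (chrom, start, end)
    return kept


if __name__ == "__main__":
    probes = {"p1": ("1", 100, 124), "p2": ("1", 200, 224), "p3": ("2", 100, 124)}
    variants = [("1", 110), ("2", 125), ("1", 300)]
    print(sorted(drop_variant_overlapping_probes(probes, variants)))  # ['p2', 'p3']
```

    Sorting the variant positions once per chromosome lets each probe be checked with a single binary search, which matters when millions of variants are tested against every probe on an array.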

    Identification of omic profiles for diagnosis and monitoring of bladder cancer

    The present thesis, entitled Identification of omic profiles for diagnosis and monitoring of bladder cancer, focuses on identifying non-invasive urinary metabolomic biomarkers for the diagnosis and monitoring of bladder cancer (BC). To this end, two analytical strategies, based on Nuclear Magnetic Resonance (1H NMR) and Ultraperformance Liquid Chromatography–Mass Spectrometry (UPLC-MS), were used to analyze urine samples. In addition, bladder tissue samples were analyzed by High-Resolution Magic Angle Spinning NMR (HRMAS NMR) to gain further insight into the metabolic pathways altered in BC and to assess their link with altered urinary metabolomic profiles. Transcriptomic analysis was also carried out in bladder tissues to identify key metabolic genes in BC. Finally, integrative studies using metabolomic and transcriptomic data were performed to study gene-metabolite networks in BC and their association with the altered urinary metabolome. Chapter four first presents a metabolic profile able to distinguish BC tissues from non-tumor tissues with a sensitivity and specificity of 100%, independently of tumor stage and grade, and describes the metabolites that make up this profile as well as the disturbed metabolic pathways linked to bladder carcinogenesis. The transcriptomic analysis performed on these same tissues indicates that metabolic genes are mainly downregulated in bladder tumors and that transcriptional repressors, histone marks, and alternative splicing may be regulating these genes. An integrative analysis of the metabolomic and transcriptomic data shows concordance between the results obtained with these two techniques, which represent different levels of molecular regulation. Finally, a 1H NMR-based urinary metabolic profile is shown that distinguishes BC urine from control urine (collected after surgery) with a sensitivity of 90.9% and a specificity of 76.9%. Because urine and tissue samples were collected from the same patients, the end of the chapter describes the connections between the perturbed metabolic pathways in tissues and urine. The following two chapters focus mainly on searching for non-invasive biomarkers of BC in urine samples for monitoring the disease, using two analytical techniques. To validate the urinary profile as a monitoring biomarker, a study was carried out with additional urine samples from patients with non-muscle-invasive BC (NMIBC), collected monthly during an active follow-up period. The urinary 1H NMR metabolic profile classified BC urine with a sensitivity and specificity of around 85%. Moreover, it detected tumor recurrences at an early stage of development, in some cases before they were visible by cystoscopy. Chapter five also describes the discriminant metabolites in this profile and their relationship with the biochemical pathways altered in BC. Finally, the sixth chapter presents the results of a clinical study carried out on a large number of urine samples collected from NMIBC patients before and after surgery, as well as during subsequent surveillance. In this case, the urine samples were analyzed by UPLC-MS and the perturbed metabolic pathways linked to BC were assessed. Some urine samples were shared with the 1H NMR study, and the data from the two studies were in general agreement. Analysis of the longitudinal trajectories of the urinary metabolic biomarker discriminating BC from control samples allowed a preliminary evaluation of its utility for detecting recurrences in NMIBC patients under surveillance. Overall, the results presented in this thesis support the hypothesis that a urinary metabolic signature, linked to tumor alterations in BC tissues, can detect and predict recurrences during the surveillance of patients with NMIBC. Moreover, the good results and the concordance between the 1H NMR and UPLC-MS urinary analyses position metabolomics as a competitive omic approach for finding biomarkers, since it offers robust and dynamic information about tumor biology.

    Data based system design and network analysis tools for chemical and biological processes

    Ph.D. (Doctor of Philosophy)

    A module based approach for identifying driver genes and expanding pathways from integrated biological networks

    Each gene or protein has its own function, which, combined with those of others, allows a group to perform more complex behaviors, e.g., carrying out a particular cellular task (a functional module) or affecting a particular disease phenotype (a disease module). One of the major challenges in systems biology is to reveal the roles of genes or proteins in functional or disease modules. In the first part of the dissertation, I present a data-driven method, Correlation Set Analysis (CSA), for comprehensively detecting active regulators in disease populations by integrating co-expression analysis with specific types of literature-derived causal relationships. Instead of measuring co-expression between regulators and their targets, CSA focuses on the coherence of a regulator's regulatees, e.g., the downstream targets of a transcription factor. Using simulated datasets, I show that the method reaches high true positive and true negative rates (>80%) even when the regulatory relationships are weak (only 20% of regulatees co-expressed). Using three separate real biological datasets, I recovered both well-known and as-yet undescribed active regulators for each disease population. In the second part of the dissertation, I develop and apply a new computational algorithm for detecting modules of functionally related genes that are likely to drive malignant transformation. The algorithm takes as input the identities and locations of a small number of known oncogenes (a seed set) on a human genome functional linkage network (FLN). It then searches for a boundary around a gene set encompassing the seed that maximizes the magnitude of the difference in linkage weights between interior-interior and interior-exterior gene pairs. Starting with small seed sets for breast and ovarian cancer, I successfully identify known and novel drivers in both cancer types.
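    The seed-and-boundary search in the second part can be illustrated with a greedy heuristic in Python. This is a hypothetical sketch, not the dissertation's algorithm: it scores a candidate module as the mean interior-interior linkage weight minus the mean interior-exterior weight (one reading of "difference in linkage weights"), treats absent links as weight 0, and grows the seed one gene at a time until the score stops improving, so it can stop at a local optimum. The function names and the edge representation are assumptions.

```python
from itertools import combinations


def module_score(module, genes, weight):
    """Mean interior-interior weight minus mean interior-exterior weight."""
    inside = list(module)
    outside = [g for g in genes if g not in module]
    if len(inside) < 2 or not outside:
        return float("-inf")  # score undefined without both kinds of pairs
    w_in = sum(weight(a, b) for a, b in combinations(inside, 2))
    w_in /= len(inside) * (len(inside) - 1) / 2
    w_out = sum(weight(a, b) for a in inside for b in outside)
    w_out /= len(inside) * len(outside)
    return w_in - w_out


def grow_module(seed, genes, edges):
    """Greedily add genes to `seed` while the boundary score improves.

    edges: dict frozenset({u, v}) -> linkage weight; missing pairs count as 0.
    """
    def weight(a, b):
        return edges.get(frozenset((a, b)), 0.0)

    module = set(seed)
    best = module_score(module, genes, weight)
    while True:
        candidates = [g for g in genes if g not in module]
        scored = [(module_score(module | {g}, genes, weight), g) for g in candidates]
        if not scored:
            break
        score, gene = max(scored)
        if score <= best:
            break  # no single addition improves the boundary: stop
        module.add(gene)
        best = score
    return module


if __name__ == "__main__":
    genes = list("ABCDEF")
    pairs = [("A", "B"), ("A", "C"), ("A", "D"), ("B", "C"), ("B", "D"),
             ("C", "D"), ("E", "F")]
    edges = {frozenset(p): 1.0 for p in pairs}
    edges[frozenset(("D", "E"))] = 0.1  # weak link between the two clusters
    print(sorted(grow_module({"A", "B"}, genes, edges)))  # ['A', 'B', 'C', 'D']
```

    Starting from the seed {A, B}, the search absorbs the densely linked genes C and D and stops at the weak D-E link, which is the kind of boundary the objective is meant to find.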
In the third part of the dissertation, I propose a module-based approach for expanding manually curated functional modules. Using the KEGG pathway database as an example, I show that the approach can suggest both validated pathway members (genes assigned to a particular pathway by other manually curated pathway databases) and novel candidate pathway genes.