10 research outputs found

    OligoRAP – an Oligo Re-Annotation Pipeline to improve annotation and estimate target specificity

    Background - High throughput gene expression studies using oligonucleotide microarrays depend on the specificity of each oligonucleotide (oligo or probe) for its target gene. However, target-specific probes can only be designed when a reference genome of the species at hand has been completely sequenced, when this genome has been completely annotated and when the genetic variation of the sampled individuals is completely known. Unfortunately there is not a single species for which such a complete data set is available. Therefore, it is important that probe annotation can be updated frequently for optimal interpretation of microarray experiments. Results - In this paper we present OligoRAP, a pipeline to automatically update the annotation of oligo libraries and estimate oligo target specificity. OligoRAP uses a reference genome assembly with Ensembl and Entrez Gene annotation supplemented with a set of unmapped transcripts derived from RefSeq and UniGene to handle assembly gaps. OligoRAP produces alignments of each oligo with the reference assembly as well as with unmapped transcripts. These alignments are re-mapped to the annotation sources, which results in a concise, as complete as possible and up-to-date annotation of the oligo library. The building blocks of this pipeline are BioMoby web services creating a highly modular and distributed system with a robust, remote programmatic interface. OligoRAP was used to update the annotation for a subset of 791 oligos from the ARK-Genomics 20 K chicken array, which were selected as starting material for the oligo annotation session of the EADGENE/SABRE Post-analysis workshop. Based on the updated annotation about one third of these oligos are problematic with regard to target specificity. In addition, the accession numbers or ids the oligos were originally designed for no longer exist in the updated annotation for almost half of the oligos.
Conclusion - As microarrays are designed on incomplete data, it is important to update probe annotation and check target specificity regularly. OligoRAP provides both and due to its design based on BioMoby web services it can easily be embedded as an oligo annotation engine in customised applications for microarray data analysis. The dramatic difference in updated annotation and target specificity for the ARK-Genomics 20 K chicken array as compared to the original data emphasises the need for regular updates

    Microarray Expression Profiles of 20.000 Genes across 23 Healthy Porcine Tissues

    BACKGROUND: Gene expression microarrays have been intensively applied to screen for genes involved in specific biological processes of interest such as diseases or responses to environmental stimuli. For mammalian species, cataloging of global gene expression profiles in large tissue collections under normal conditions has focused on the human and mouse genomes but is lacking for the pig genome. METHODOLOGY/PRINCIPAL FINDINGS: Here we present the results from a large-scale porcine study establishing microarray cDNA expression profiles of approximately 20,000 genes across 23 healthy tissues. As expected, a large portion of the genes show tissue-specific expression in agreement with mappings to gene descriptions, Gene Ontology terms and KEGG pathways. Two-way hierarchical clustering identified expected tissue clusters in accordance with tissue type and a number of cDNA clusters having similar gene expression patterns across tissues. For one of these cDNA clusters, we demonstrate that possible tissue-associated gene function can be inferred for previously uncharacterized genes based on their shared expression patterns with functionally annotated genes. We show that gene expression in common porcine tissues is similar to the expression in homologous tissues of human. CONCLUSIONS/SIGNIFICANCE: The results from this study constitute a valuable and publicly available resource of basic gene expression profiles in normal porcine tissues and will contribute to the identification and functional annotation of porcine genes

    Consolidating metabolite identifiers to enable contextual and multi-platform metabolomics data analysis

    Background - Analysis of data from high-throughput experiments depends on the availability of well-structured data that describe the assayed biomolecules. Procedures for obtaining and organizing such meta-data on genes, transcripts and proteins have been streamlined in many data analysis packages, but are still lacking for metabolites. Chemical identifiers are notoriously incoherent, encompassing a wide range of different referencing schemes with varying scope and coverage. Online chemical databases use multiple types of identifiers in parallel but lack a common primary key for reliable database consolidation. Connecting identifiers of analytes found in experimental data with the identifiers of their parent metabolites in public databases can therefore be very laborious. Results - Here we present a strategy and a software tool for integrating metabolite identifiers from local reference libraries and public databases that do not depend on a single common primary identifier. The program constructs groups of interconnected identifiers of analytes and metabolites to obtain a local metabolite-centric SQLite database. The created database can be used to map in-house identifiers and synonyms to external resources such as the KEGG database. New identifiers can be imported and directly integrated with existing data. Queries can be performed in a flexible way, both from the command line and from the statistical programming environment R, to obtain data set tailored identifier mappings. Conclusions - Efficient cross-referencing of metabolite identifiers is a key technology for metabolomics data analysis. We provide a practical and flexible solution to this task and an open-source program, the metabolite masking tool (MetMask), available at http://metmask.sourceforge.net, that implements our ideas.

    Bioconductor: open software development for computational biology and bioinformatics.

    The Bioconductor project is an initiative for the collaborative creation of extensible software for computational biology and bioinformatics. The goals of the project include: fostering collaborative development and widespread use of innovative software, reducing barriers to entry into interdisciplinary scientific research, and promoting the achievement of remote reproducibility of research results. We describe details of our aims and methods, identify current challenges, compare Bioconductor to other open bioinformatics projects, and provide working examples

    Induction of a chemoattractant transcriptional response by a Campylobacter jejuni boiled cell extract in colonocytes

    Background - Campylobacter jejuni, the commonest cause of bacterial diarrhoea worldwide, can also induce colonic inflammation. To understand how a previously identified heat stable component contributes to pro-inflammatory responses we used microarray and real-time quantitative PCR to investigate the transcriptional response to a boiled cell extract of Campylobacter jejuni NCTC 11168. Results - RNA was extracted from the human colonocyte line HCA-7 (clone 29) after incubation for 6 hours with Campylobacter jejuni boiled cell extract and was used to probe the Affymetrix Human Genome U133A array. Genes differentially affected by Campylobacter jejuni boiled cell extract were identified using the Significance Score algorithm of the Bioconductor software suite and further analyzed using the Ingenuity Pathway Analysis program. The chemokines CCL20, CXCL3, CXCL2, Interleukin 8, CXCL1 and CXCL6 comprised 6 of the 10 most highly up-regulated genes, all with Significance Scores ≥ 10. Members of the Tumor Necrosis Factor α/Nuclear Factor-κB super-family were also significantly up-regulated and involved in the most significantly regulated signalling pathways (Death receptor, Interleukin 6, Interleukin 10, Toll like receptor, Peroxisome Proliferator Activated Receptor-γ and apoptosis). Ingenuity Pathway Analysis also identified the most affected functional gene networks such as cell movement, gene expression and cell death. In contrast, down-regulated genes were predominantly concerned with structural and metabolic functions. Conclusion - A boiled cell extract of Campylobacter jejuni has components that can directly switch the phenotype of colonic epithelial cells from one of resting metabolism to a pro-inflammatory one, particularly characterized by increased expression of genes for leukocyte chemoattractant molecules.

    Silencing the host : the role of intronic microRNAs

    Thesis (S.M.)--Harvard-MIT Division of Health Sciences and Technology, 2009. This electronic version was submitted by the student author. The certified thesis is available in the Institute Archives and Special Collections. Includes bibliographical references (p. 62-68). Fifteen years ago lin-4 was reported to be the first endogenous small non-coding, but interfering RNA structure involved in developmental timing in C. elegans. First thought not, or only rarely, to occur in mammals, microRNAs are now among the major players in up-to-date genomic research. The mature molecules are ~22 nucleotides in length and, by targeting predominantly the 3' UTR of mRNAs, lead to translational repression or degradation of the target message, hence controlling important cellular mechanisms, including division, differentiation and death. This key role makes them excellent targets for cancer research. In fact they have been shown to have a major impact on cancer development in many cases. However, miRNAs are not a homogeneous class and can be subclassified into intragenic and intergenic, depending on their genomic position. Whereas intergenic miRNAs are expected to be independent transcriptional units, intragenic miRNAs are commonly believed to be regulated through their host gene. Despite the growing knowledge of how miRNAs integrate into cellular regulatory networks, our current knowledge about the specific role of intragenic miRNAs is rather limited. In this work we integrated current miRNA knowledge bases, ranging from miRNA sequence and genomic localization information to target prediction, with biochemical pathway information and publicly available expression data to investigate functional properties of intragenic miRNAs and their relationship to their host genes. To the best of our knowledge, we are the first to show in a large-scale analysis that intragenic miRNAs seem to act as negative feedback regulators on multiple levels.
We furthermore investigated the impact of this model on the potential role of intronic miRNAs in cancer pathogenesis. By Ludwig Christian Giuseppe Hinske. S.M.

    Web services for transcriptomics

    Transcriptomics is part of a family of disciplines focussing on high throughput molecular biology experiments. In the case of transcriptomics, scientists study the expression of genes resulting in transcripts. These transcripts can either perform a biological function themselves or function as messenger molecules containing a copy of the genetic code, which can be used by the ribosomes as templates to synthesise proteins. Over the past decade microarray technology has become the dominant technology for performing high throughput gene expression experiments. A microarray contains short sequences (oligos or probes), which are the reverse complement of fragments of the targets (transcripts or sequences derived thereof). When genes are expressed, their transcripts (or sequences derived thereof) can hybridise to these probes. Many thousands of copies of a probe are immobilised in a small region on a support. These regions are called spots and a typical microarray contains thousands or sometimes even more than a million spots. When the transcripts (or sequences derived thereof) are fluorescently labelled and it is known which spots are located where on the support, a fluorescent signal in a certain region represents expression of a certain gene. For interpretation of microarray data it is essential to make sure the oligos are specific for their targets. Hence for proper probe design one needs to know all transcripts that may be expressed and how well they can hybridise with candidate oligos. Therefore oligo design requires: 1. A complete reference genome assembly. 2. Complete annotation of the genome to know which parts may be transcribed. 3. Insight into the amount of natural variation in the genomes of different individuals. 4. Knowledge of how experimental conditions influence the ability of probes to hybridise with certain transcripts. Unfortunately such complete information does not exist, but many microarrays were designed based on incomplete data nevertheless.
This can lead to a variety of problems including cross-hybridisation (non-specific binding), erroneously annotated and therefore misleading probes, missing probes and orphan probes. Fortunately the amount of information on genes and their transcripts increases rapidly. Therefore, it is possible to improve the reliability of microarray data analysis by regular updates of the probe annotation using updated databases for genomes and their annotation. Several tools have been developed for this purpose, but these either used simplistic annotation strategies or did not support our species and/or microarray platforms of interest. Therefore, we developed OligoRAP (Oligo Re-Annotation Pipeline), which is described in chapter 2. OligoRAP was designed to take advantage of, amongst others, annotation provided by Ensembl, which is the largest genome annotation effort in the world. Thereby OligoRAP supports most of the major animal model organisms including farm animals like chicken and cow. In addition to support for our species and array platforms of interest, OligoRAP employs a new annotation strategy combining information from genome and transcript databases in a non-redundant way to get the most complete annotation possible. In chapter 3 we compared annotation generated with 3 oligo annotation pipelines including OligoRAP and investigated the effect on functional analysis of a microarray experiment involving chickens infected with Eimeria parasites. As an example of functional analysis we investigated if up- or downregulated genes were enriched for terms from the Gene Ontology (GO). We discovered that small differences in annotation strategy could lead to alarmingly large differences in enriched GO terms. Therefore it is important to know which annotation strategy works best, but it was not possible to assess this due to the lack of a good reference or benchmark dataset.
There are a few limited studies investigating the hybridisation potential of imperfect alignments of oligos with potential targets, but in general such data is scarce. In addition it is difficult to compare these studies due to differences in experimental setup including different hybridisation temperatures and different probe lengths. As a result we cannot determine exact thresholds for the alignments of oligos with non-targets to prevent cross-hybridisation, but from these different studies we can get an idea of the range for the thresholds that would be required for optimal target specificity. Note that in these studies experimental conditions were first optimised for an optimal signal to noise ratio for hybridisation of oligos with targets. Then these conditions were used to determine the thresholds for alignments of oligos with non-targets to prevent cross-hybridisation. Chapter 4 describes a parameter sweep using OligoRAP to explore hybridisation potential thresholds from a different perspective. Given the mouse genome, thresholds were determined for the largest number of gene-specific probes. Using those thresholds we then determined thresholds for optimal signal to noise ratios. Unfortunately the annotation-based thresholds we found did not fall within the range of experimentally determined thresholds; in fact they were not even close. Hence what was experimentally determined to be optimal for the technology was not in sync with what was determined to be optimal for the mouse genome. Further research will be required to determine whether microarray technology can be modified in such a way that it is better suited for gene expression experiments. The requirement of a priori information on possible targets and the lack of sufficient knowledge on how experimental conditions influence hybridisation potential can be considered the Achilles' heels of microarray technology.
Chapter 5 is a collection of 3 application notes describing other tools that can aid in analysis of transcriptomics data. Firstly, RShell, which is a plugin for the Taverna workbench allowing users to execute statistical computations remotely on R-servers. Secondly, MADMAX services, which provide quality control and normalisation of microarray data for Affymetrix arrays. Finally, GeneIlluminator, which is a tool to disambiguate gene symbols allowing researchers to specifically retrieve literature for their genes of interest even if the gene symbols for those genes have many synonyms and homonyms. Web services - High throughput experiments like those performed in transcriptomics usually require subsequent analysis with many different tools to make biological sense of the data. Installing all these tools on a single, local computer and making them compatible so users can build analysis pipelines can be very cumbersome. Therefore distributed analysis strategies have been explored extensively over the past decades. In a distributed system providers offer remote access to tools and data via the Internet allowing users to create pipelines from modules from all over the globe. Chapter 1 provides an overview of the evolution of web services, which represent the latest breed in technology for creating distributed systems. The major advantage of web services over older technology is that web services are programming language independent, Internet communication protocol independent and operating system independent. Therefore web services are very flexible and most of them are firewall-proof. Web services play a major role in the remaining chapters of this thesis: OligoRAP is a workflow entirely made from web services and the tools described in chapter 5 all provide remote programmatic access via web service interfaces.
Although web services can be used to build relatively complex workflows like OligoRAP, a lack of mainly de facto standards and of user-friendly clients has limited the use of web services to bioinformaticians. A semantic web where biologists can easily link web services into complex workflows does n

    Expression Profiling by DNA Microarrays : Development of Amplification Methods for the Analysis of Minimal Tumor Samples

    Recently, microarrays of synthetic long sense-oriented oligonucleotides were introduced as an alternative expression profiling platform with distinct advantages to both cDNA arrays and commercial arrays produced by in situ synthesis of multiple short oligonucleotides per gene. However, gene expression analysis using microarrays of long oligonucleotides is limited in that it requires substantial amounts of RNA. The objective of this thesis was to develop protocols that allow for the analysis of gene expression even in minimal samples. Two different approaches were taken, one that amplifies the RNA target material before hybridization and another that amplifies the signal generated on the array. Most existing target amplification protocols linearly amplify mRNA by cDNA synthesis and in vitro transcription. Since orientation of the product is antisense (aRNA), it is inapplicable for dye-labeling by reverse transcription and hybridization to sense-oriented oligonucleotide arrays. Here, a novel protocol (TAcKLE) is introduced in which a combination of two reverse and one forward transcription reactions followed by dye-incorporation using the Klenow fragment of E. coli DNA polymerase I generates fluorescent antisense cDNA. This protocol provides high fidelity and up to 10^5-fold amplification, starting from 2 ng total RNA. The generated data are highly reproducible and maintain relative gene expression levels between samples. Signal amplification is another option if only minimal amounts of sample material are available. Therefore, a method was evaluated that uses on-chip rolling circle replication of circularized oligonucleotides for the amplified detection of gene expression profiles. This principle should allow for a faster and cheaper experimental procedure, circumventing sequence-dependent amplification bias.
The preliminary results provide evidence for the method's applicability, but further experiments are required to reduce the required amount of starting material and to define a stable protocol. As the TAcKLE protocol performed particularly well, it was subsequently applied to evaluate the utility of spotted oligonucleotide microarrays compared to a widely-used and accepted commercial reference platform. There are numerous ways to perform global transcriptional profiling, among which microarray technology has certainly gained a premier position. The comparison of gene expression measurements obtained with different array-based approaches is therefore of substantial interest in order to clarify whether inter-platform differences may conceal biologically significant information. To address this concern, global gene expression was analyzed in a set of clinical head and neck squamous cell carcinoma samples, using both spotted oligonucleotide microarrays made from a large collection of 70-mer probes and commercial arrays produced by in situ synthesis of sets of multiple 25-mer oligonucleotides per gene. Expression measurements were compared for 4,425 genes represented on both platforms, which revealed strong correlations between the corresponding data sets and similar profiles of relative gene expression. In conclusion, combining the TAcKLE protocol with spotted oligonucleotide arrays is an attractive alternative for transcriptional profiling of limited source material, offering a high potential for gene expression analysis in a multitude of disease situations

    Establishment of DNA microarray transcriptome analysis for Halobacterium salinarum R1

    Based on the genome sequence of H. sal. R1, a gene-specific whole-genome DNA microarray was constructed. For this purpose, ORF-specific oligonucleotides were derived for each ORF of the genome and used for specific amplification of the gene segments by PCR. After amplification and purification of the gene segments, the products covered over 97% of the halobacterial genome. To construct the whole-genome DNA microarray, each specific gene probe was applied to the DNA array in five replicates. In this way a DNA microarray was produced that, with a total of 13545 gene-specific probes, has the densest grid of any archaeal DNA microarray to date. By parallel genome-wide gene expression analysis in H. sal. R1, aerobic and phototrophic growth were compared in three comprehensive DNA microarray experiments. The microarray experiments were carried out with the so-called "common reference" experimental design, in which a mixture of all RNA samples of an experiment serves as the reference in the hybridisations. As a further prerequisite for the later statistical data analysis, all transcriptome experiments were repeated four to five times in independent experiments. The choice of reference and the number of independent biological replicates provided the basis for evaluating the collected expression data with the R/MAANOVA package of the flexible and powerful statistical programming environment R. A powerful and flexible programming environment for data analysis was indispensable, because as the complexity of a transcriptome experiment increases, so does the number of necessary replicates and, with it, the total number of data points to be evaluated.
A time-series experiment on the shift from aerobic to phototrophic growth with six time points yields approx. 384,000 data points, for whose foreground and background values the statistical parameters had to be calculated. One such statistical parameter is the so-called p-value, which reflects the significance of a result. On the basis of these significant p-values, a list of 242 candidate genes was compiled that are regarded as differentially expressed. A share of 54.5% of these differentially expressed genes has no homologous protein or function. This fact offers the opportunity to elucidate both the existence of these ORFs and their function. In this context, deletion mutants were generated for the hypothetical ORFs OE3107F and OE3136F and characterised in more detail. It was found that the deletion mutants R1D3107 and R1D3136 show clear differences in their pigment composition compared with the wild-type strain H. sal. R1. Both deletion strains have, for example, a lower bacteriorhodopsin content. The newly established method of DNA microarray-based gene expression analysis has thus helped to identify two previously unknown candidates for the regulation of bacteriorhodopsin expression in H. sal. R1. Cell-free in vitro expression of the gene OE3136F pointed to a possible approach for closer characterisation and functional elucidation. Besides the generation and characterisation of deletion mutants, two further conceivable ways of obtaining more information from the expression data were demonstrated: an additional data analysis by PCA (principal component analysis) and the approach of mapping the collected transcriptome data onto metabolic pathways.
The results of all transcriptome experiments for H. sal. R1 agree with the results of earlier work, and the results of the expression analyses were clearly verified by independent methods such as RT-PCR, Northern blot analyses and proteome comparison. The design and production of the H. sal. R1 whole-genome microarrays and the development of a standard protocol for carrying out the experiments form the basis of all transcriptome experiments of the research group. In addition, the creation of a bioinformatic infrastructure for the statistically significant evaluation of DNA microarray hybridisation results makes it possible to build a transcriptome database which, by linking to the already existing HaloLex database, is readily available to every user for further microarray experiments addressing other questions. In conclusion, the DNA microarray-based transcriptome analysis of H. sal. R1 has helped to extend knowledge of the process of adaptation to phototrophic growth. The data collected in this work form the basis of a data collection that will later make it possible to capture new co-regulations of genes across many different experiments and thus to detect new genes and links between metabolic pathways quickly and easily. The present work can be regarded as a starting point for genome-wide functional characterisation of haloarchaeal gene expression and its regulation. This point is of decisive importance in view of the growing significance of systems biology, because models can only be built and improved on the basis of solid experimental results