    Linear model for fast background subtraction in oligonucleotide microarrays

    One important preprocessing step in the analysis of microarray data is background subtraction. In high-density oligonucleotide arrays, it is recognized as crucial to the overall performance of the analysis, from raw intensities to expression values. We propose an algorithm for background estimation based on a model whose cost function is quadratic in a set of fitting parameters, so that minimization can be performed with linear algebra. The model incorporates two effects: (1) correlated intensities between neighboring features on the chip and (2) sequence-dependent affinities for non-specific hybridization, fitted with an extended nearest-neighbor model. The algorithm was tested on 360 GeneChips from publicly available data of recent expression experiments. The algorithm is fast and accurate. Strong correlations between the fitted values obtained in different experiments, as well as between the fitted free-energy parameters and their counterparts in aqueous solution, indicate that the model captures a significant part of the underlying physical chemistry.
    Comment: 21 pages, 5 figures
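
    The point that a quadratic cost reduces fitting to linear algebra can be illustrated with a minimal sketch. It is not the published model: it assumes, for illustration only, that a feature's background log-intensity is a linear combination of its 16 nearest-neighbor dinucleotide counts plus a weighted mean log-intensity of spatially adjacent features, and it adds a small ridge term (also an assumption) so that the normal equations are always solvable. All function names are hypothetical.

```python
import random

BASES = "ACGT"
# The 16 nearest-neighbour dinucleotide stacks, e.g. "AC", "GT".
DINUCS = [a + b for a in BASES for b in BASES]


def features(probe, neighbour_mean):
    """Hypothetical feature vector: dinucleotide counts of the probe sequence,
    followed by the mean log-intensity of spatially adjacent features."""
    counts = {d: 0 for d in DINUCS}
    for i in range(len(probe) - 1):
        counts[probe[i:i + 2]] += 1
    return [counts[d] for d in DINUCS] + [neighbour_mean]


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def fit_background(rows, y, ridge=1e-6):
    """Minimise sum_i (y_i - x_i . w)^2 + ridge * |w|^2. The cost is quadratic
    in w, so its minimum solves (X^T X + ridge * I) w = X^T y."""
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) + (ridge if i == j else 0.0)
            for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
    return solve(xtx, xty)


def predict(w, row):
    """Predicted background log-intensity for one feature."""
    return sum(wi * xi for wi, xi in zip(w, row))


if __name__ == "__main__":
    # Synthetic, noise-free data generated from known weights (illustration only).
    rng = random.Random(0)
    true_w = [rng.uniform(-0.2, 0.2) for _ in DINUCS] + [0.5]
    rows, y = [], []
    for _ in range(200):
        probe = "".join(rng.choice(BASES) for _ in range(25))
        row = features(probe, rng.uniform(5.0, 9.0))
        rows.append(row)
        y.append(predict(true_w, row))
    w = fit_background(rows, y)
    print("max residual:", max(abs(predict(w, r) - t) for r, t in zip(rows, y)))
```

    Because every parameter enters the cost linearly, a single linear solve replaces iterative optimization; this is what makes such a model fast on large numbers of chips.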

    Generation of subspecies level-specific microbial diagnostic microarrays using genes amplified from subtractive suppression hybridization as microarray probes

    Generating microarray probes with specificity below the species level is an ongoing challenge worth solving, not least because high-throughput detection would be an efficient means of identifying environmentally relevant microbes. Here, we describe how suppression subtractive hybridization (SSH) can be applied to produce microarray probes that are useful for differentiating microbes at the subspecies level. SSH was first used to isolate unique genomic sequences of nine Salmonella strains, and these sequences were then validated in quadruplicate by microarray analysis. The results indicate that a large group of genes subtracted by SSH can serve together, as a single probe, to detect a microbial subspecies. Similarly, the whole microbial genome (not subjected to SSH) can be used as a species-specific probe. The detailed methods described here could be used, or adapted, to detect any cultivable bacterium from different environments.

    Positive selection of hearing loss candidate genes, based on multiple microarray platforms experiments and data mining

    2006/2007
    According to WHO estimates, hearing impairment affects 278 million people worldwide, and approximately 1 in 1,000 children is born with a significant hearing impairment. To date, approximately 100 genetic loci involved in deafness have been described, yet not all of the genes involved have been identified. With a traditional linkage approach, it is not always possible to map a locus to an interval short enough to be amenable to costly mutation analysis. So far, no more than 40 deafness genes have been identified, and they encode very heterogeneous proteins. The work presented in this thesis aims to identify a limited set of candidate genes with high potential to be involved in non-syndromic hearing loss, using a combination of biological and bioinformatics approaches. The starting point of the analysis was the GJB2 gene, which encodes the gap junction protein Connexin 26 and is responsible for more than half of non-syndromic hearing loss cases. For this reason, it has been proposed that this protein might play a wider role in the biology of the ear, beyond its channel function. To shed light on this role, I generated whole-genome expression profiles of HeLa cells transfected with the wild-type GJB2 gene and compared them with those of cells transfected with mutant forms of the gene. This experiment initially yielded a bewildering number of differentially expressed genes (4,984). I therefore devised an in silico strategy to narrow down this number, focusing on genes that were positionally linked to specific non-syndromic hereditary hearing loss conditions and were also found in human ear cDNA libraries, and were thus potentially causative of the disease. This analysis yielded 19 genes within 11 loci for which no causative gene had yet been identified. To assess their relevance to hearing loss, mouse homologs were identified for 5 of the 19 genes, and all 5 were found to be expressed in the mouse organ of Corti. These five genes were also validated by real-time RT-PCR in the transfected human cell lines used for the microarray experiments. The project will continue with mutation screening of the candidate genes in selected patient families.
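
    The in silico narrowing step can be sketched as two filters applied to the differentially expressed genes. This is a minimal sketch under stated assumptions: each gene is reduced to a single chromosomal coordinate, each hearing-loss locus to an inclusive interval, and ear cDNA library membership to a set of gene symbols. The data classes, `prioritise`, and all gene and locus names are hypothetical placeholders, not data from the thesis.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gene:
    symbol: str
    chrom: str
    pos: int  # genomic coordinate in bp (single-point simplification)


@dataclass(frozen=True)
class Locus:
    name: str
    chrom: str
    start: int
    end: int  # inclusive interval of a mapped hearing-loss locus


def prioritise(de_genes, loci, ear_library):
    """Keep differentially expressed genes that (1) occur in an ear cDNA
    library and (2) fall inside a mapped locus; group survivors by locus."""
    hits = {}
    for gene in de_genes:
        if gene.symbol not in ear_library:
            continue
        for locus in loci:
            if gene.chrom == locus.chrom and locus.start <= gene.pos <= locus.end:
                hits.setdefault(locus.name, []).append(gene.symbol)
    return hits


if __name__ == "__main__":
    # Toy placeholders only.
    de = [Gene("GENE1", "1", 150), Gene("GENE2", "1", 900),
          Gene("GENE3", "2", 50), Gene("GENE4", "2", 60)]
    loci = [Locus("LOCUS_A", "1", 100, 200), Locus("LOCUS_B", "2", 0, 100)]
    ear = {"GENE1", "GENE2", "GENE3"}
    print(prioritise(de, loci, ear))  # {'LOCUS_A': ['GENE1'], 'LOCUS_B': ['GENE3']}
```

    In the toy data, GENE2 is dropped because it lies outside every locus and GENE4 because it is absent from the ear library; grouping by locus mirrors the "N genes within M loci" summary.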

    Gene Expression: From Microarrays to Functional Genomics

    The era of large sequencing projects has opened unprecedented possibilities for investigating more complex aspects of living organisms. Among the high-throughput technologies based on genomic sequences, DNA microarrays are widely used for many purposes, including measuring the relative abundance of messenger RNAs. However, the reliability of microarrays has been seriously questioned, because robust methods for analyzing their complex output data were developed only after the technology had already spread through the community. One objective of this study was to improve the performance of microarray analysis, measured by successful validation of the results with independent techniques. To this end, emphasis was placed on selecting candidate genes of notable biological significance within specific experimental designs. In agreement with the literature, probe re-annotation and model-based normalization algorithms were found to be beneficial when analyzing Affymetrix GeneChip data. Typically, microarray analysis aims to select genes whose expression differs significantly between conditions and then to group them into functional categories, enabling biological interpretation of the results. An alternative approach examines global differences in the expression of functionally related groups of genes; here, this technique proved effective in uncovering patterns related to temporal changes during infection of human cells. The thesis also explores combining independent gene expression datasets to create a catalog of genes that are selectively expressed in healthy human tissues. Not all genes present in human cells are active everywhere: genes involved in basic cellular activities (housekeeping genes) are expressed ubiquitously, whereas others (tissue-selective genes) provide more specific functions and are expressed preferentially in certain cell types or tissues. Defining tissue-selective genes is also important because these genes can cause disease with phenotypes in the tissues where they are expressed. The hypothesis that gene expression can be used as a measure of the relatedness of tissues was also confirmed. Microarray experiments produce long lists of candidate genes that are often difficult to interpret and prioritize; the value of these results can be extended by inferring relationships among genes under given conditions. Gene transcription is continuously regulated by the coordinated binding of proteins, called transcription factors, to specific regions of each gene's promoter sequence. In this study, promoter analysis of groups of candidate genes was used to predict gene networks and to highlight modules of transcription factors that play a central role in regulating their transcription. Specific modules were found to regulate genes selectively expressed in the hippocampus, a brain region with a central role in major depressive disorder. Similarly, gene networks derived from microarray results elucidated aspects of the development of the mesencephalon, another brain region, which is involved in Parkinson's disease.
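
    The housekeeping versus tissue-selective distinction, and the idea of expression as a measure of tissue relatedness, can be illustrated with two standard scores. This sketch swaps in the tau tissue-specificity index and Pearson correlation; the abstract does not say the thesis used either, and the cut-offs in the hypothetical `classify` function are assumptions.

```python
import math


def tau(profile):
    """Tau tissue-specificity index for one gene's non-negative expression
    values across tissues: 0 for uniform (housekeeping-like) expression,
    1 when the gene is expressed in a single tissue."""
    if len(profile) < 2:
        raise ValueError("need at least two tissues")
    top = max(profile)
    if top <= 0:
        raise ValueError("gene not expressed in any tissue")
    return sum(1 - x / top for x in profile) / (len(profile) - 1)


def classify(profile, selective=0.8, ubiquitous=0.2):
    """Label a gene using hypothetical tau cut-offs."""
    t = tau(profile)
    if t >= selective:
        return "tissue-selective"
    if t <= ubiquitous:
        return "housekeeping-like"
    return "intermediate"


def pearson(x, y):
    """Pearson correlation between two tissues' expression vectors over the
    same genes; higher values suggest more closely related tissues."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


if __name__ == "__main__":
    print(classify([5.0, 5.0, 4.0, 5.0]))  # housekeeping-like
    print(classify([9.0, 0.0, 0.5, 0.0]))  # tissue-selective
```

    Tau uses only each gene's own profile, so it can be computed per gene after combining datasets, provided the values are on a comparable, non-negative scale.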

    Pigeons: a novel GUI software for analysing and parsing high density heterologous oligonucleotide microarray probe level data

    Genomic DNA-based probe selection using high-density oligonucleotide arrays has recently been applied to heterologous species (Xspecies). This approach allows researchers to study the genome and transcriptome of a non-model or underutilised crop species on current state-of-the-art microarray platforms. However, a software package with a graphical user interface (GUI) for analysing and parsing oligonucleotide probe-pair-level data is still lacking for experiments designed around this cross-species approach. A novel program called Pigeons has been developed for customised array data analysis, allowing users to import and analyse Affymetrix GeneChip® probe-level data through XSpecies. Users can determine empirical cut-offs for removing poorly performing probes, based on genomic hybridisation of the test species to the Xspecies array, and then generate a species-specific Chip Description File (CDF) for transcriptomics in the heterologous species. Alternatively, Pigeons can be used to examine an experimental design to identify potential Single-Feature Polymorphisms (SFPs) at the DNA or RNA level. Pigeons also emphasises visualization and interactive analysis of the datasets. The software and its manual (current release: version 1.2.1) are freely available from the website of the Nottingham Arabidopsis Stock Centre (NASC).
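
    The threshold-based probe selection can be sketched as follows, assuming (hypothetically) that perfect-match probe intensities from the genomic DNA hybridisation are available as a dictionary keyed by probe-set ID. A probe is retained when its gDNA signal reaches the cut-off, and a probe set is dropped when none of its probes survive; sweeping several cut-offs is one way to look for empirical boundaries. The function names and data layout are assumptions, not the Pigeons interface, and writing the resulting mask to a CDF file is not shown.

```python
def mask_probes(gdna, threshold):
    """gdna maps probe-set ID -> list of perfect-match gDNA intensities.
    Return probe-set ID -> indices of probes with signal >= threshold;
    probe sets with no surviving probe are dropped."""
    kept = {}
    for probe_set, values in gdna.items():
        idx = [i for i, v in enumerate(values) if v >= threshold]
        if idx:
            kept[probe_set] = idx
    return kept


def sweep(gdna, thresholds):
    """For each candidate cut-off, report (cut-off, probe sets kept, probes kept)."""
    rows = []
    for t in thresholds:
        kept = mask_probes(gdna, t)
        rows.append((t, len(kept), sum(len(v) for v in kept.values())))
    return rows


if __name__ == "__main__":
    # Toy intensities (illustration only).
    gdna = {"ps1": [50, 300, 700], "ps2": [20, 40, 60], "ps3": [900, 1000, 80]}
    for t, n_sets, n_probes in sweep(gdna, [0, 100, 500]):
        print(t, n_sets, n_probes)
```

    Raising the cut-off trades probe-set coverage for probes that hybridise well to the test species; with the toy data, 3, 2 and 2 probe sets survive at cut-offs 0, 100 and 500.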

    Bioinformatics framework for genotyping microarray data analysis

    Functional genomics is a flourishing science enabled by recent technological breakthroughs in high-throughput instrumentation and microarray data analysis. Genotyping microarrays determine the genotypes of DNA sequences containing single nucleotide polymorphisms (SNPs) and can help biologists probe the functions of different genes and/or construct complex gene interaction networks. The enormous volume of data from these experiments makes manual processing infeasible as a routine way to obtain accurate and reliable results. Advanced algorithms and an integrated software toolkit are therefore needed for reliable and fast data analysis. The author developed a Matlab™-based software package called TIMDA (a Toolkit for Integrated Genotyping Microarray Data Analysis) for fully automatic, accurate, and reliable genotyping microarray data analysis, together with new algorithms for image processing and genotype calling. The modular design of TIMDA makes it readily extensible and maintainable. TIMDA is open source (URL: http://timda.SF.net) and can easily be customized by users to meet their particular needs. The quality and reproducibility of its image-processing and genotype-calling results, together with its ease of customization, indicate that TIMDA is a useful package for genomics research.
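
    Genotype calling on a two-allele SNP array can be illustrated with a simple contrast rule. This is a generic sketch, not TIMDA's algorithm (the abstract does not describe it, and TIMDA itself is written in Matlab): it assumes one background-corrected intensity per allele per spot, and the thresholds and the `call_genotype` name are hypothetical.

```python
def call_genotype(a, b, hom=0.6, het=0.3, min_signal=100.0):
    """Call a biallelic SNP from allele-A and allele-B intensities.

    contrast = (a - b) / (a + b) lies in [-1, 1]: near +1 means only allele A
    hybridised, near -1 only allele B, near 0 both alleles (heterozygote).
    Weak or ambiguous spots are returned as "NoCall"."""
    total = a + b
    if total < min_signal:
        return "NoCall"
    contrast = (a - b) / total
    if contrast >= hom:
        return "AA"
    if contrast <= -hom:
        return "BB"
    if abs(contrast) <= het:
        return "AB"
    return "NoCall"


if __name__ == "__main__":
    for a, b in [(900, 100), (100, 900), (500, 480), (700, 300), (30, 20)]:
        print(a, b, call_genotype(a, b))
```

    Leaving a gap between the heterozygous and homozygous bands, and refusing to call weak spots, trades call rate for accuracy; production callers typically fit clusters per SNP instead of using fixed thresholds.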