771 research outputs found

    Methods to improve gene signal: Application to cDNA microarrays

    Microarrays are high-throughput biological assays that allow thousands of genes to be screened for their expression. The main idea behind microarrays is to compute, for each gene, a unique signal that is directly proportional to the quantity of mRNA hybridized on the chip. The large number of steps involved, and the errors associated with each step, make the generated expression signal noisy. As a result, microarray data need to be carefully pre-processed before their analysis can be assumed to lead to reliable and biologically relevant conclusions. This thesis focuses on developing methods for improving gene signal and on using this improved signal for higher-level analysis. To achieve this, first, approaches for designing microarray experiments using various optimality criteria, considering both biological and technical replicates, are described. A carefully designed experiment yields a signal with low noise, because the effect of unwanted variation is minimized and the precision of the estimates of the parameters of interest is maximized. Second, a system for improving the gene signal by using three scans at varying scanner sensitivities is developed. A novel Bayesian latent intensity model is then applied to these three sets of expression values, corresponding to the three scans, to estimate the suitably calibrated true signal of the genes. Third, a novel image segmentation approach that separates the fluorescent signal from undesired noise is developed using an additional dye, SYBR green RNA II. This technique identifies signal arising only from the hybridized DNA, so that signal from dust, scratches, dye spills and other noise sources is excluded. Fourth, an integrated statistical model is developed in which signal correction, systematic array effects, dye effects and differential expression are modelled jointly, as opposed to a sequential application of several methods of analysis.
The methods described here have been tested only on cDNA microarrays but can, with some modifications, also be applied to other high-throughput technologies. Keywords: High-throughput technology, microarray, cDNA, multiple scans, Bayesian hierarchical models, image analysis, experimental design, MCMC, WinBUGS.
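The optimality criteria mentioned in the first contribution can be illustrated on a toy two-colour design. This is a minimal sketch under stated assumptions, not the thesis's method: it assumes a simple linear model in which each array measures the log-ratio between the two samples hybridized on it, with sample A fixed as the baseline, and it computes the A-criterion (trace of (X^T X)^-1, smaller is better) and the D-criterion (determinant of X^T X, larger is better). The `loop` and `star` designs and all function names are hypothetical, and the replicate structure the thesis considers is not modelled.

```python
from fractions import Fraction

def information_matrix(design):
    """Return X^T X for a design matrix given as a list of rows."""
    p = len(design[0])
    return [[sum(row[i] * row[j] for row in design) for j in range(p)] for i in range(p)]

def inverse_and_det(m):
    """Gauss-Jordan elimination with exact fractions; returns (inverse, determinant)."""
    n = len(m)
    # Augment m with the identity matrix.
    a = [[Fraction(x) for x in row] + [Fraction(int(i == j)) for j in range(n)]
         for i, row in enumerate(m)]
    det = Fraction(1)
    for col in range(n):
        pivot = next((r for r in range(col, n) if a[r][col] != 0), None)
        if pivot is None:
            raise ValueError("X^T X is singular: the design cannot estimate all effects")
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            det = -det  # a row swap flips the sign of the determinant
        pv = a[col][col]
        det *= pv
        a[col] = [x / pv for x in a[col]]
        for r in range(n):
            if r != col and a[r][col] != 0:
                f = a[r][col]
                a[r] = [x - f * y for x, y in zip(a[r], a[col])]
    return [row[n:] for row in a], det

def optimality(design):
    """A-criterion (trace of the inverse) and D-criterion (determinant of X^T X)."""
    inv, det = inverse_and_det(information_matrix(design))
    return sum(inv[i][i] for i in range(len(inv))), det

# Columns: effects of samples B and C relative to A; each row is one array.
loop = [[1, 0], [-1, 1], [0, -1]]   # arrays A-B, B-C, C-A
star = [[1, 0], [0, 1], [1, 0]]     # arrays A-B, A-C, A-B

print(optimality(loop))
print(optimality(star))
```

With three arrays each, the loop design gives A = 4/3 and D = 3, against 3/2 and 2 for the star design, so in this toy setting the loop is preferred under both criteria.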

    A multi-view approach to cDNA micro-array analysis

    Microarray technology has emerged as a powerful tool that enables biologists to study thousands of genes simultaneously and thereby to gain a better understanding of gene interaction and regulation mechanisms. This paper is concerned with improving the processes involved in the analysis of microarray image data. The main focus is to clarify an image's feature space in an unsupervised manner. In this paper, the Image Transformation Engine (ITE), combined with different filters, is investigated. The proposed methods are applied to a set of real-world cDNA images, and the MatCNN toolbox is used during the segmentation process. Quantitative comparisons between the different filters are carried out, and the CLD filter is shown to be the best one to apply with the ITE. This work was supported in part by the Engineering and Physical Sciences Research Council (EPSRC) of the UK under Grant GR/S27658/01, the National Science Foundation of China under Innovative Grant 70621001, Chinese Academy of Sciences under Innovative Group Overseas Partnership Grant, the BHP Billiton Cooperation of Australia Grant, the International Science and Technology Cooperation Project of China under Grant 2009DFA32050 and the Alexander von Humboldt Foundation of Germany

    BayGO: Bayesian analysis of ontology term enrichment in microarray data

    BACKGROUND: The search for enriched (also known as over-represented or enhanced) ontology terms in a list of genes obtained from microarray experiments is becoming a standard procedure for system-level analysis. This procedure summarizes the information by focusing on classification schemes such as Gene Ontology, KEGG pathways and so on, instead of on individual genes. Although it is well known in statistics that association and significance are distinct concepts, only the latter approach has been used to deal with the ontology term enrichment problem. RESULTS: BayGO implements a Bayesian approach to search for enriched terms from microarray data. The R source code is freely available in three versions: Linux, which can easily be incorporated into pre-existing pipelines; Windows, to be controlled interactively; and a web tool. The software was validated using a bacterial heat shock response dataset, since this stress triggers known system-level responses. CONCLUSION: The Bayesian model accounts for the fact that not all genes from a given category are necessarily observable in microarray data, owing to low-intensity signal, quality filters, genes that were not spotted and so on. Moreover, BayGO allows one to measure the statistical association between generic ontology terms and differential expression, instead of relying only on the common significance analysis.
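The contrast between measuring association and testing significance can be made concrete with a conjugate Beta-binomial model. The sketch below is not BayGO's model: it assumes independent uniform Beta(1, 1) priors on the proportion of differentially expressed (DE) genes inside and outside an ontology term, estimates by Monte Carlo the posterior probability that the in-term proportion is larger, and reports the posterior mean difference in DE rate as a simple association measure. It ignores the unobservable genes that BayGO explicitly accounts for, and the function name and counts are hypothetical.

```python
import random

def enrichment_posterior(de_in, n_in, de_out, n_out, draws=20000, seed=0):
    """Return (P(p_in > p_out | data), posterior mean of p_in - p_out) under Beta(1, 1) priors."""
    rng = random.Random(seed)
    wins = 0
    diff_sum = 0.0
    for _ in range(draws):
        # Conjugate update: posterior is Beta(1 + DE genes, 1 + non-DE genes).
        p_in = rng.betavariate(1 + de_in, 1 + n_in - de_in)
        p_out = rng.betavariate(1 + de_out, 1 + n_out - de_out)
        wins += p_in > p_out
        diff_sum += p_in - p_out
    return wins / draws, diff_sum / draws

# Hypothetical term: 40 of 50 observed genes are DE, versus 100 of 1000 genes outside the term.
prob, assoc = enrichment_posterior(40, 50, 100, 1000)
print(f"P(enriched) = {prob:.3f}, mean difference in DE rate = {assoc:.2f}")
```

The posterior probability answers "is the term enriched?", while the mean difference (about 0.69 here, since the posterior means are 41/52 and 101/1002) quantifies how strongly the term is associated with differential expression.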

    Use of genomic DNA control features and predicted operon structure in microarray data analysis: ArrayLeaRNA – a Bayesian approach

    Background: Microarrays are widely used for the study of gene expression; however, deciding whether observed differences in expression are significant remains a challenge. Results: A computing tool (ArrayLeaRNA) has been developed for gene expression analysis. It implements a Bayesian approach based on the Gumbel distribution; it uses printed genomic DNA control features both for normalization and for estimating the parameters of the Bayesian model, and it incorporates prior knowledge from predicted operon structure. The method is compared with two other approaches: classical LOWESS normalization followed by a two-fold cut-off criterion, and the OpWise method (Price, et al. 2006. BMC Bioinformatics. 7, 19), a published Bayesian approach that also uses predicted operon structure. The three methods were compared on experimental datasets with prior knowledge of gene expression. With ArrayLeaRNA, data normalization is carried out according to the genomic DNA features, which reflect the behaviour of equally transcribed genes; the statistical significance of a difference in expression is likewise based on the variability of these equally transcribed genes. The operon information helps to classify genes with low-confidence measurements. ArrayLeaRNA is implemented in Visual Basic and is freely available as an Excel add-in (http://www.ifr.ac.uk/safety/ArrayLeaRNA/). Conclusion: We have introduced a novel Bayesian model and demonstrated that it is a robust method for analysing microarray expression profiles. ArrayLeaRNA showed a considerable improvement in data normalization, in the estimation of the experimental variability intrinsic to each hybridization, and in the establishment of a clear boundary between non-changing and differentially expressed genes.
The method is applicable to data derived from hybridizations of labelled cDNA samples as well as from hybridizations of labelled cDNA with genomic DNA, and it can be used to analyse datasets in which differentially regulated genes predominate.
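The role of the genomic DNA (gDNA) control features can be sketched in a simplified, non-Bayesian form. This is an illustration under stated assumptions, not ArrayLeaRNA's model: it assumes that the log-ratios of the gDNA control spots should centre on zero (equally transcribed material), uses their median as the normalization offset, fits a Gumbel distribution to the controls' absolute deviations by the method of moments, and scores each gene by the Gumbel upper-tail probability of its own absolute deviation. The operon prior is omitted, and all names and example values are hypothetical.

```python
import math
import statistics

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant

def fit_gumbel_moments(values):
    """Method-of-moments fit of a Gumbel (maximum) distribution; returns (mu, beta)."""
    beta = statistics.stdev(values) * math.sqrt(6) / math.pi
    if beta == 0:
        raise ValueError("control values have zero spread")
    mu = statistics.fmean(values) - EULER_GAMMA * beta
    return mu, beta

def gumbel_sf(x, mu, beta):
    """Upper-tail probability P(X > x) = 1 - exp(-exp(-(x - mu) / beta))."""
    z = (x - mu) / beta
    if z < -30:
        return 1.0  # exp(-exp(30)) underflows to 0, so the tail probability is 1
    return -math.expm1(-math.exp(-z))

def score_genes(gene_log_ratios, control_log_ratios):
    """Normalize against gDNA controls, then score each gene by its Gumbel tail probability."""
    offset = statistics.median(control_log_ratios)  # assumption: controls should centre on 0
    deviations = [abs(c - offset) for c in control_log_ratios]
    mu, beta = fit_gumbel_moments(deviations)
    return {gene: gumbel_sf(abs(r - offset), mu, beta)
            for gene, r in gene_log_ratios.items()}

controls = [0.9, 1.0, 1.1, 0.95, 1.05, 1.02, 0.98, 1.0]
genes = {"geneA": 1.0, "geneB": 3.0}
print(score_genes(genes, controls))
```

Here `geneA` sits at the control median and receives a large tail probability, whereas `geneB` deviates far beyond the spread of the controls and receives a vanishingly small one, so only `geneB` would be flagged.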

    Using Robust Rank Aggregation for Prioritising Autoimmune Targets on Protein Microarrays

    Autoimmune diseases are very common in the modern world, and more and more diseases are being linked to autoimmune processes. An autoimmune reaction is a process in which the immune system produces antibodies (autoantibodies) that attack the organism's own cells. The causes and mechanisms of autoimmune diseases are not yet understood. One way to study autoimmunity is to explore why certain cells, and particularly certain proteins, are attacked by autoantibodies. Many technologies have been developed for this purpose, one of which is the protein microarray. This technology allows the amount of autoantibodies in a patient's serum to be estimated against 9000 unique human proteins. By applying data analysis methods to these data, bioinformaticians may be able to identify the proteins targeted by autoantibodies. Knowing these proteins, biologists could conduct experiments and formulate new hypotheses about how autoimmune diseases arise and progress. Common data analysis methods focus on selecting only the proteins that differ most reliably between the healthy and diseased groups. However, they ignore the fact that in autoimmune disease the repertoire of affected proteins can differ greatly between patients, so even single cases of high protein reactivity may carry important information for understanding disease mechanisms. In this thesis, we propose applying the Robust Rank Aggregation (RRA) algorithm as an adaptive method to identify a wide repertoire of reactive proteins. We compared the expediency and effectiveness of classical analysis methods, a method recently applied by biologists, and RRA on synthetic and real data.
Experiments on synthetic data sets with known reactive proteins show that RRA outperforms these methods while also being more robust to added noise. Applying RRA to real data and conducting an enrichment analysis on the lists of reactive proteins produced by each method, we obtained comparable numbers of proteins overrepresented in classes associated with biological and immune responses.
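The Robust Rank Aggregation idea can be sketched in a few lines. Under the null hypothesis, a protein's normalized ranks across patients behave like independent Uniform(0, 1) values; RRA compares each sorted rank r_(k) with the distribution of the k-th order statistic, keeps the smallest such probability (the rho score), and here multiplies it by the number of lists as a simple Bonferroni-style bound. This is a minimal sketch, not the thesis's code: treating each patient's reactivity ranking as one list, giving proteins absent from a list the worst normalized rank 1, and the protein names are assumptions for illustration.

```python
from math import comb

def order_stat_cdf(k, n, r):
    """P(U_(k) <= r): probability that the k-th smallest of n iid Uniform(0, 1) values is at most r."""
    return sum(comb(n, l) * r**l * (1 - r) ** (n - l) for l in range(k, n + 1))

def rho_score(normalized_ranks):
    """RRA rho score with a Bonferroni-style correction, capped at 1."""
    rs = sorted(normalized_ranks)
    n = len(rs)
    rho = min(order_stat_cdf(k, n, r) for k, r in enumerate(rs, start=1))
    return min(1.0, rho * n)

def aggregate(ranked_lists, universe_size):
    """Aggregate ranked lists (most reactive first) over a universe of universe_size proteins."""
    ranks = {}
    for lst in ranked_lists:
        for pos, item in enumerate(lst, start=1):
            ranks.setdefault(item, []).append(pos / universe_size)
    scores = {}
    for item, rs in ranks.items():
        # Items missing from a list are given the worst normalized rank, 1.0.
        padded = rs + [1.0] * (len(ranked_lists) - len(rs))
        scores[item] = rho_score(padded)
    return sorted(scores.items(), key=lambda kv: kv[1])

# Hypothetical top hits for three patients out of 100 proteins.
patients = [["P1", "P2", "P3"], ["P1", "P3", "P2"], ["P2", "P1", "P9"]]
for protein, score in aggregate(patients, universe_size=100):
    print(protein, f"{score:.2e}")
```

Because the score is driven by the best-supported order statistic rather than by a mean difference between groups, a protein that is highly ranked in only a subset of patients can still obtain a small score, which matches the motivation given above.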

    SW-ARRAY: a dynamic programming solution for the identification of copy-number changes in genomic DNA using array comparative genome hybridization data

    Comparative genome hybridization (CGH) to DNA microarrays (array CGH) is a technique capable of detecting deletions and duplications in genomes at high resolution. However, array CGH studies of the human genome that used large-insert clones as probes and reported false-negative and false-positive results have raised important concerns about the suitability of this approach for clinical diagnostic applications. Here, we adapt the Smith–Waterman dynamic-programming algorithm to provide a sensitive and robust analytic approach (SW-ARRAY) for detecting copy-number changes in array CGH data. In a blind series of hybridizations to arrays consisting of the entire tiling path for the terminal 2 Mb of human chromosome 16p, the method identified all monosomies between 267 and 1567 kb with a high degree of statistical significance and accurately located the boundaries of deletions in the range 267–1052 kb. The approach is unique in offering both a nonparametric segmentation procedure and a nonparametric test of significance. It is scalable and well suited to high-resolution whole-genome array CGH studies that use array probes derived from large-insert clones as well as PCR products and oligonucleotides.
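The core of a Smith–Waterman-style adaptation is the local-maximum recurrence H_i = max(0, H_{i-1} + s_i) over per-probe scores s_i; the largest H_i marks the end of the best-scoring contiguous run of probes. The sketch below assumes that, to look for losses, each probe is scored as its negated log2 ratio minus a hypothetical threshold `t`, so probes inside a deletion score positively, and it assesses significance with a permutation test that shuffles the probe order. These scoring and testing choices are illustrative assumptions, not SW-ARRAY's exact procedure, and the ratios are made up.

```python
import random

def best_segment(scores):
    """Return (score, start, end) of the highest-scoring contiguous run, via H_i = max(0, H_{i-1} + s_i)."""
    best, best_start, best_end = 0.0, None, None
    h, start = 0.0, 0
    for i, s in enumerate(scores):
        if h + s > 0:
            if h == 0:
                start = i  # a new local segment begins here
            h += s
        else:
            h = 0.0  # the local score never drops below zero
        if h > best:
            best, best_start, best_end = h, start, i
    return best, best_start, best_end

def loss_scores(log2_ratios, t):
    """Score probes so that deletions (strongly negative log2 ratios) become positive."""
    return [-x - t for x in log2_ratios]

def permutation_p_value(scores, n_perm=2000, seed=0):
    """Nonparametric p-value: how often a shuffled probe order yields an equally high best segment."""
    rng = random.Random(seed)
    observed = best_segment(scores)[0]
    shuffled = list(scores)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if best_segment(shuffled)[0] >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)

ratios = [0.05, -0.1, 0.0, -0.95, -1.1, -1.0, -0.9, 0.1, 0.0, -0.05]
scores = loss_scores(ratios, t=0.5)
print(best_segment(scores))
print(permutation_p_value(scores))
```

The segment boundaries (probes 3 to 6 in this example) give the estimated deletion breakpoints; with genomic coordinates attached to each probe, they translate directly into a deletion size.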

    New methods to analyse microarray data that partially lack a reference signal

    BACKGROUND: Microarray-based Comparative Genomic Hybridisation (CGH) has been used to assess genetic variability between bacterial strains. Crucial for interpreting microarray data is the availability of a reference against which signal intensities can be compared, to reliably determine the presence or divergence of each DNA fragment. However, producing a good reference becomes unfeasible when microarrays are based on pan-genomes. When only a single strain is used as the reference for a multistrain array, the accessory gene pool will be only partially represented by reference DNA, even though these genes represent the genomic repertoire that can explain differences in virulence, pathogenicity or transmissibility between strains. The lack of a reference makes interpretation of the data for these genes difficult, and if the test signal is low they are often deleted from the analysis. We aimed to develop novel methods to determine the presence or divergence of genes in a Staphylococcus aureus multistrain PCR-product microarray-based CGH approach for which reference DNA was not available for some probes. RESULTS: In this study we developed 6 new methods to predict the divergence and presence of all genes spotted on a previously published multistrain Staphylococcus aureus DNA microarray, including gene spots that lack reference signals. When specificity and PPV (i.e. the criteria that reflect false positives) were considered the most important criteria for evaluating these methods, the method that defined gene presence as a signal at least twice as high as the background and higher than the reference signal (method 4) had the best test characteristics. For this method, specificity was 100% for MRSA252 (compared with the GACK method) and 82% for all spots (compared with sequence data); PPV was 100% and 76%, respectively.
CONCLUSION: A definition of gene presence based on a signal at least twice as high as the background and higher than the reference signal (method 4) had the best test characteristics, allowing 6-17% more of the genes not present in the reference strain to be analysed. This method is recommended for analysing microarray data that partially lack a reference signal.
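The decision rule of method 4 and the two evaluation criteria are simple enough to state in code. This sketch assumes that signal, background and reference are raw spot intensities on the same scale and that a gold-standard presence call (for example, from sequence data) is available for each spot; the function names and example values are hypothetical.

```python
def present_method4(signal, background, reference):
    """Method 4: present if the signal is at least twice the background and higher than the reference signal."""
    return signal >= 2 * background and signal > reference

def specificity_and_ppv(predicted, truth):
    """Specificity = TN / (TN + FP); PPV = TP / (TP + FP)."""
    tp = sum(p and t for p, t in zip(predicted, truth))
    fp = sum(p and not t for p, t in zip(predicted, truth))
    tn = sum(not p and not t for p, t in zip(predicted, truth))
    specificity = tn / (tn + fp) if tn + fp else float("nan")
    ppv = tp / (tp + fp) if tp + fp else float("nan")
    return specificity, ppv

# (signal, background, reference) per spot; the last spot has no usable reference signal.
spots = [(900, 100, 850), (150, 100, 400), (500, 200, 100), (300, 100, 0)]
truth = [True, False, True, False]
calls = [present_method4(*s) for s in spots]
print(calls, specificity_and_ppv(calls, truth))
```

The last spot shows why the background criterion matters: without a reference signal, the comparison with the reference is passed trivially, and only the twice-background threshold guards against a false positive.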

    p53FamTaG: a database resource of human p53, p63 and p73 direct target genes combining in silico prediction and microarray data

    Background: The p53 gene family consists of three genes, p53, p63 and p73, which have multifaceted, non-overlapping functions in pivotal cellular processes such as DNA synthesis and repair, growth arrest, apoptosis, genome stability, angiogenesis, development and differentiation. These genes encode sequence-specific nuclear transcription factors that recognise the same responsive element (RE) in their target genes. Their inactivation or aberrant expression may lead to tumour progression or developmental disease. The discovery of several protein isoforms with antagonistic roles, produced through the use of different promoters and alternative splicing, has increased the complexity of the transcriptional network of the p53 family members. The identification of the genes transactivated by p53 family members is therefore crucial for understanding the specific role of each gene in cell cycle regulation. We have combined a genome-wide computational search for p53 family REs with microarray analysis to identify new direct target genes. The huge amount of biological data produced has generated a critical need for bioinformatic tools able to manage and integrate such data and to facilitate their retrieval and analysis. Description: We have developed the p53FamTaG database (p53 FAMily TArget Genes), a modular relational database that contains p53 family direct target genes, selected by searching the human genome for the presence of REs, together with the expression profiles of these target genes obtained from microarray experiments. The p53FamTaG database also contains annotations from publicly available databases and links to other experimental data. The genome-wide computational search for REs was performed using PatSearch, a pattern-matching program implemented in the DNAfan tool.
These data were integrated with the microarray results we produced from the overexpression of different isoforms of p53, p63 and p73 stably transfected in isogenic cell lines, allowing a comparative study of the transcriptional activity of all the proteins in the same cellular background. The p53FamTaG database is freely available (http://www2.ba.itb.cnr.it/p53FamTaG/). Conclusion: p53FamTaG represents a unique integrated resource of human direct p53 family target genes; it is extensively annotated and provides users with an efficient query and retrieval system that displays the results of our microarray experiments and allows the export of RE sequences. The database was developed to support and integrate high-throughput in silico and experimental analyses, and it represents an important reference source for research groups working on oncogenesis, apoptosis and cell cycle regulation.
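The pattern-matching step can be illustrated with a regular expression for the commonly used p53 consensus responsive element: two copies of the decamer RRRCWWGYYY (R = A/G, W = A/T, Y = C/T) separated by a spacer of 0 to 13 bp. This is a minimal sketch, not PatSearch: it reports only exact, non-overlapping consensus matches, whereas a realistic genome-wide search would tolerate and score mismatches. Because the reverse complement of RRRCWWGYYY is the same degenerate pattern, scanning one strand suffices for this particular consensus. The function name and example sequence are hypothetical.

```python
import re

# Degenerate IUPAC codes expanded: R = [AG], W = [AT], Y = [CT].
HALF_SITE = "[AG]{3}C[AT]{2}G[CT]{3}"
P53_RE = re.compile(
    f"(?P<half1>{HALF_SITE})(?P<spacer>[ACGT]{{0,13}})(?P<half2>{HALF_SITE})"
)

def find_p53_sites(sequence):
    """Return (start, end, spacer_length) for each non-overlapping exact consensus match."""
    return [(m.start(), m.end(), len(m.group("spacer")))
            for m in P53_RE.finditer(sequence.upper())]

# Two half-sites separated by a 3 bp spacer, flanked by T runs.
seq = "TTT" + "GGGCATGCCC" + "AAA" + "AGACTTGCCT" + "TTT"
print(find_p53_sites(seq))
```

For the example sequence the scan reports one site spanning positions 3 to 26 with a 3 bp spacer.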

    CCP11 Group Meeting—Towards the Functional Analysis of Microarrays

    The CCP11 project [2] aims to foster bioinformatics in the UK through conferences, workshops and the provision of Web resources. In March 2002, CCP11 held a meeting in Manchester, UK, on the functional analysis of microarrays. This was part of Manchester Bioinformatics Week, a series of three consecutive short bioinformatics meetings held in the attractive setting of the Chancellor's Conference Centre at the University of Manchester. The other meetings in the series were a workshop on ontologies and the 12th Annual MASAMB (Mathematical and Statistical Aspects of Molecular Biology) Conference. Many delegates were able to attend more than one meeting, which led to a useful cross-fertilization of ideas across the bioinformatics community. The CCP11 meeting shared with MASAMB a strong emphasis on the statistical analysis and interpretation of data, most often image intensity data.

    Understanding pathways

    The challenge with today's microarray experiments is to infer biological conclusions from them. There are two crucial difficulties to be surmounted in this challenge: (1) the lack of a suitable biological repository that can be easily integrated into computational algorithms; and (2) the inability of contemporary algorithms for analysing microarray data to draw consistent biological results from diverse datasets of the same disease. To deal with the first difficulty, we believe a core database that unifies available biological repositories is important. Towards this end, we create a unified biological database from three popular biological repositories (KEGG, Ingenuity and Wikipathways). This database gives computer scientists the flexibility to integrate biological information easily using simple API calls or SQL queries. To deal with the second difficulty, deriving consistent biological results from the experiments, we first conceptualize the notion of “subnetworks”, which refers to a connected portion of a biological pathway. We then propose a method that identifies subnetworks that are consistently expressed by patients with the same disease phenotype. We test our technique on independent datasets for several diseases, including ALL, DMD and lung cancer. For each of these diseases, we obtain two independent microarray datasets produced by distinct labs on distinct platforms. In each case, our technique consistently produces overlapping lists of significant nontrivial subnetworks from the two independent sets of microarray data. The gene-level agreement of these significant subnetworks is between 66.67% and 91.87%. In contrast, when the same pairs of microarray datasets were analysed using GSEA and the t-test, this percentage fell to between 37% and 55.75% (GSEA) and between 2.55% and 19.23% (t-test). Furthermore, the genes selected using GSEA and the t-test do not form subnetworks of substantial size.
Thus, the subnetworks selected by our technique are more likely to provide the researcher with descriptive information about the portions of the pathway that are actually associated with the disease. Keywords: pathway analysis, microarra
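The notion of a subnetwork as a connected portion of a pathway can be sketched as follows. The selection rule is an assumption for illustration, not necessarily the authors' criterion: a gene counts as consistently expressed if it falls in the top `top_fraction` of a patient's expression profile in at least `min_patient_fraction` of the patients, and subnetworks are the connected components of the pathway graph restricted to those genes. The parameter names and example data are hypothetical, and the sketch omits the significance assessment applied to each subnetwork.

```python
from collections import defaultdict

def consistent_genes(profiles, top_fraction, min_patient_fraction):
    """Genes in the top `top_fraction` of at least `min_patient_fraction` of patient profiles."""
    counts = defaultdict(int)
    for profile in profiles:  # each profile maps gene -> expression value
        k = max(1, int(len(profile) * top_fraction))
        for gene in sorted(profile, key=profile.get, reverse=True)[:k]:
            counts[gene] += 1
    needed = min_patient_fraction * len(profiles)
    return {gene for gene, c in counts.items() if c >= needed}

def subnetworks(edges, genes, min_size=2):
    """Connected components of the pathway graph restricted to `genes`."""
    adjacency = defaultdict(set)
    for a, b in edges:
        if a in genes and b in genes:
            adjacency[a].add(b)
            adjacency[b].add(a)
    seen, components = set(), []
    for root in list(adjacency):
        if root in seen:
            continue
        seen.add(root)
        stack, component = [root], set()
        while stack:  # iterative depth-first search
            node = stack.pop()
            component.add(node)
            for neighbour in adjacency[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    stack.append(neighbour)
        if len(component) >= min_size:
            components.append(component)
    return components

profiles = [{"A": 9, "B": 8, "C": 1, "D": 0},
            {"A": 7, "B": 1, "C": 9, "D": 0},
            {"A": 9, "B": 8, "C": 2, "D": 1}]
print(consistent_genes(profiles, top_fraction=0.5, min_patient_fraction=0.6))
edges = [("A", "B"), ("B", "C"), ("D", "E"), ("C", "X")]
print(subnetworks(edges, {"A", "B", "C", "D", "E"}))
```

Restricting attention to connected components is what distinguishes this from a plain gene list: isolated high-scoring genes, like those the abstract reports for GSEA and the t-test, do not form subnetworks of any substantial size.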
