
    Hardware-accelerated analysis of non-protein-coding RNAs

    A tremendous amount of relatively high-quality genomic sequence data has become publicly available as a result of the human genome sequencing projects completed a few years ago. Despite considerable efforts, we do not yet know everything there is to know about the various parts of the genome, what all the regions code for, and how their gene products contribute to the myriad biological processes performed within cells. New high-performance methods are needed to extract knowledge from this vast amount of information. Furthermore, the traditional view that DNA codes for RNA, which in turn codes for protein (the central dogma of molecular biology), appears to be only part of the story. The discovery of many non-protein-coding gene families with housekeeping and regulatory functions brings an entirely new perspective to molecular biology. Sequence analysis of these new gene families also requires new methods, because protein-coding and non-protein-coding genes differ significantly. This work describes a new search processor that can search sequence data for complex patterns for which no efficient lookup index is known. When several chips are mounted on search cards fitted into PCs in a small cluster configuration, the system’s performance on selected applications is orders of magnitude higher than that of comparable solutions. The applications treated in this work fall into two main categories, pattern screening and data mining, and both rely on the search capacity of the cluster to achieve adequate performance. Specifically, the thesis describes an interactive system for exploring all types of genomic sequence data. Moreover, a data mining system based on genetic programming finds classifiers consisting of potentially complex patterns that are characteristic of groups of sequences.
    The screening and mining capacity has been used to develop an algorithm for identifying new non-protein-coding genes in bacteria; a system for the rational design of effective and specific short interfering RNAs for sequence-specific silencing of protein-coding genes; and an improved algorithmic step for identifying new regulatory targets of the microRNA family of non-protein-coding genes. Papers V, VI, and VII are reprinted with kind permission of Elsevier, sciencedirect.co
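The search processor is hardware, and the abstract does not describe its pattern language. As a plain software illustration of the kind of index-free scan such hardware accelerates, the sketch below slides a pattern along a sequence and reports every position within a given number of substitutions (Hamming distance). The function name and the mismatch-only pattern model are assumptions made for illustration; the processor's actual patterns are likely considerably richer.

```python
def scan_with_mismatches(sequence: str, pattern: str, max_mismatches: int) -> list[tuple[int, int]]:
    """Hypothetical helper: return (start, mismatches) for every window of
    `sequence` that differs from `pattern` by at most `max_mismatches`
    substitutions. No index is used; every window is examined."""
    hits = []
    m = len(pattern)
    for start in range(len(sequence) - m + 1):
        mismatches = 0
        for a, b in zip(sequence[start:start + m], pattern):
            if a != b:
                mismatches += 1
                if mismatches > max_mismatches:
                    break  # early exit: this window can no longer match
        if mismatches <= max_mismatches:
            hits.append((start, mismatches))
    return hits


print(scan_with_mismatches("ACGTACGAACGT", "ACGT", 1))  # [(0, 0), (4, 1), (8, 0)]
```

The cost grows with sequence length times pattern length for every query. This linear, per-query cost is what makes a brute-force scan a candidate for dedicated parallel hardware when no lookup index applies.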

    Rational Design of Micro-RNA-like Bifunctional siRNAs Targeting HIV and the HIV Coreceptor CCR5

    Small interfering RNAs (siRNAs) and micro-RNAs (miRNAs) are distinguished by their modes of action. siRNAs serve as guides for sequence-specific cleavage of complementary mRNAs, and their target sites can lie in either coding or noncoding regions of the target transcripts. miRNAs inhibit translation by partially complementary base-pairing with 3′ untranslated regions (UTRs) and are generally ineffective when they target the coding region of a transcript. In this study, we deliberately designed siRNAs that either simultaneously direct cleavage and translational suppression of HIV RNAs, or direct cleavage of the mRNA encoding the HIV coreceptor CCR5 together with translational suppression of HIV. These bifunctional siRNAs inhibit HIV infection and replication in cell culture. The design principles are widely applicable across the genome, as about 90% of genes harbor sites that make the design of bifunctional siRNAs possible.
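To make the two modes of action concrete, the sketch below checks whether a guide strand has a perfectly complementary site in one RNA (siRNA-like cleavage) and a miRNA-like seed site in a second RNA's 3′ UTR. Treating guide positions 2 to 8 as the seed is a common convention in the miRNA literature and is an assumption here; the study's actual design rules (for example, pairing requirements outside the seed) are not stated in the abstract, and all function names are hypothetical.

```python
# Watson-Crick complements for RNA bases (uppercase A, C, G, U only).
COMPLEMENT = str.maketrans("ACGU", "UGCA")


def reverse_complement(rna: str) -> str:
    """Return the 5'->3' sequence that base-pairs perfectly with `rna`."""
    return rna.translate(COMPLEMENT)[::-1]


def seed_site(guide: str) -> str:
    """Target sequence (5'->3') pairing with guide positions 2-8 (assumed seed)."""
    return reverse_complement(guide[1:8])


def is_bifunctional_candidate(guide: str, cleavage_target: str, utr_target: str) -> bool:
    """Hypothetical check: a perfect site in `cleavage_target` (cleavage) and
    a seed site in `utr_target` (miRNA-like translational suppression)."""
    perfect_site = reverse_complement(guide)
    return perfect_site in cleavage_target and seed_site(guide) in utr_target


print(seed_site("UAGCUUAUCAG"))  # AUAAGCU
```

A real design pipeline would also weigh off-target matches, thermodynamic asymmetry of the duplex, and site accessibility, none of which this sketch models.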

    Meta-analysis of breast cancer microarray studies in conjunction with conserved cis-elements suggest patterns for coordinate regulation

    Background: Gene expression measurements from breast cancer (BrCa) tumors are established clinical predictive tools for identifying tumor subtypes, patients with poor or good prognosis, and patients likely to experience disease recurrence. However, diverse breast cancer datasets, together with diagnostic clinical arrays, show little overlap in the sets of genes identified. One approach to identifying a set of consistently dysregulated candidate genes in these tumors is meta-analysis of multiple independent microarray datasets, which allows expression data to be compared across a diverse collection of breast tumor array datasets generated on either cDNA or oligonucleotide arrays.

    Results: We gathered expression data from 9 published microarray studies in the Oncomine database that examined estrogen receptor positive (ER+) and estrogen receptor negative (ER-) BrCa tumor cases. We performed a meta-analysis and identified genes that were universally up- or downregulated with respect to ER+ versus ER- tumor status. We surveyed both the proximal promoters and the 3' untranslated regions (3'UTRs) of our top-ranking genes in each expression group to test whether common sequence elements may contribute to the observed expression patterns. Using a combination of known transcription factor binding sites (TFBS), evolutionarily conserved mammalian promoter and 3'UTR motifs, and microRNA (miRNA) seed sequences, we identified numerous motifs that were disproportionately represented between the two gene classes, suggesting a common regulatory network underlying the observed gene expression patterns.

    Conclusion: Some of the genes we identified are key distinguishing transcripts previously seen in array studies, while others are newly defined. Many of the genes identified as overexpressed in ER- tumors were previously identified as expression markers for neoplastic transformation in multiple human cancers. Moreover, our motif analysis identified a collection of specific cis-acting target sites that may collectively play a role in the differential gene expression patterns observed in ER+ versus ER- breast cancer tumors. Importantly, the gene sets and associated DNA motifs provide a starting point for exploring the mechanistic basis of the observed expression patterns in breast tumors.
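The abstract does not say which statistic was used to decide that a motif is "disproportionately represented" between the ER+ and ER- gene classes. One common choice, shown here only as an assumption, is a one-sided Fisher's exact test on the number of motif-bearing genes in each class, computed from the hypergeometric upper tail. The function name and argument layout are hypothetical.

```python
from math import comb


def motif_enrichment_p(hits_a: int, size_a: int, hits_b: int, size_b: int) -> float:
    """One-sided Fisher's exact test (hypothetical helper).

    Returns P(X >= hits_a), where X is the number of motif-bearing genes
    that land in class A if the hits_a + hits_b motif-bearing genes were
    distributed at random over all size_a + size_b genes (hypergeometric null).
    """
    total = size_a + size_b
    total_hits = hits_a + hits_b
    denom = comb(total, size_a)
    upper = min(size_a, total_hits)
    return sum(
        comb(total_hits, k) * comb(total - total_hits, size_a - k)
        for k in range(hits_a, upper + 1)
    ) / denom


print(motif_enrichment_p(3, 3, 0, 3))  # 0.05
```

If SciPy is available, `scipy.stats.fisher_exact([[hits_a, size_a - hits_a], [hits_b, size_b - hits_b]], alternative="greater")` should give the same p-value. When many motifs (TFBS, conserved motifs, miRNA seeds) are tested at once, the resulting p-values would also need a multiple-testing correction.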