
    Computational Prediction of MicroRNAs Encoded in Viral and Other Genomes

    We present an overview of selected computational methods for microRNA (miRNA) prediction, with a particular focus on the detection of viral miRNAs. As the number of known miRNAs grows and the range of genomes found to encode them expands, these small regulators appear to play a more important role than previously thought. Most miRNAs have been detected by cloning and Northern blotting, but these experimental methods are time-consuming and biased towards abundant miRNAs. Computational detection methods must therefore be refined into a faster, more accurate, and more affordable alternative. We also present data from a small study investigating the problems of computational miRNA prediction. Our findings suggest that predicting miRNA precursor candidates is fairly easy, whereas excluding false positives and predicting the exact mature miRNA are hard. Finally, we discuss possible improvements to computational miRNA detection.
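The observation that precursor candidates are easy to find but hard to filter can be illustrated with a deliberately naive scanner. The sketch below, which is not the authors' method, slides a window along a sequence and counts base pairs (Watson-Crick plus G-U wobble) between two arms separated by a loop; the stem length, loop range and `min_pairs` threshold are arbitrary assumptions. Real pipelines fold candidates with a minimum-free-energy tool such as RNAfold and apply many further filters.

```python
# Canonical Watson-Crick pairs plus the G-U wobble pair found in RNA stems.
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def revcomp_rna(seq):
    """Reverse complement of an RNA sequence (A/C/G/U only)."""
    return "".join(COMPLEMENT[b] for b in reversed(seq))


def stem_pairs(seq, start, stem_len, loop_len):
    """Count paired positions between the 5' arm seq[start:start+stem_len]
    and the 3' arm that follows a loop of loop_len nucleotides."""
    end = start + 2 * stem_len + loop_len  # exclusive end of the hairpin
    return sum(
        (seq[start + i], seq[end - 1 - i]) in PAIRS for i in range(stem_len)
    )


def hairpin_candidates(seq, stem_len=20, loop_range=(4, 15), min_pairs=17):
    """Return (start, loop_len, pairs) for every window that could fold into
    a simple, bulge-free hairpin with at least min_pairs stem pairs.
    Thresholds are illustrative assumptions, not published parameters."""
    seq = seq.upper().replace("T", "U")  # accept DNA input
    hits = []
    for loop_len in range(loop_range[0], loop_range[1] + 1):
        span = 2 * stem_len + loop_len
        for start in range(len(seq) - span + 1):
            pairs = stem_pairs(seq, start, stem_len, loop_len)
            if pairs >= min_pairs:
                hits.append((start, loop_len, pairs))
    return hits
```

For example, `hairpin_candidates(arm + "GAAAUA" + revcomp_rna(arm))` with a 20-nt `arm` reports a perfect 20-pair hairpin at position 0. Because the rule ignores thermodynamics, bulges and genomic context, it will also flag many inverted repeats that are not miRNA precursors, which is exactly the false-positive problem the study describes.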

    Large-scale inference of the point mutational spectrum in human segmental duplications

    Background: Recent segmental duplications are relatively large (≥ 1 kb) genomic regions of high sequence identity (≥ 90%). They cover approximately 4–5% of the human genome and play important roles in gene evolution and genomic disease. Because any two copies of a segmental duplication originated from the same ancestral DNA sequence, the sequence differences between them reflect mutational events that have accumulated over time. Based on this fact, we have developed a computational scheme for inferring point mutational events in human segmental duplications, which we collectively term duplication-inferred mutations (DIMs). We characterized these nucleotide substitutions by comparing them with high-quality SNPs from dbSNP, both in terms of sequence context and the frequency of substitution types.
    Results: Overall, DIMs show a lower ratio of transitions to transversions than SNPs, although this ratio approaches that of SNPs when only DIMs within the most recent duplications are considered. Our findings indicate that DIMs and SNPs are in general caused by similar mutational mechanisms, with some deviations at the CpG dinucleotide. Furthermore, we find a large number of reference SNPs that coincide with computationally inferred DIMs, which illustrates how sequence variation between duplicated sequences can be misinterpreted as ordinary allelic variation.
    Conclusion: In summary, we show how DNA sequence analysis of segmental duplications can provide a genome-wide mutational spectrum that mirrors recent genome evolution. The inferred set of nucleotide substitutions is a valuable complement to SNPs for the analysis of genetic variation and point mutagenesis.
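The two summary statistics mentioned above, the transition/transversion ratio and the CpG context of a substitution, can be computed from a pair of aligned duplication copies roughly as follows. This is a minimal sketch under stated assumptions: the copies are already aligned, gap columns are skipped, and no outgroup is used, so the direction of each change is not inferred. The function names are hypothetical, and the abstract does not say how the authors' scheme polarizes or filters differences.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def substitution_type(a, b):
    """Classify a base difference as a transition or transversion."""
    pair = {a, b}
    if pair <= PURINES or pair <= PYRIMIDINES:
        return "transition"
    return "transversion"


def in_cpg(seq, i):
    """True if position i is the C or the G of a CpG dinucleotide in seq."""
    if seq[i] == "C" and i + 1 < len(seq) and seq[i + 1] == "G":
        return True
    return seq[i] == "G" and i > 0 and seq[i - 1] == "C"


def copy_differences(copy1, copy2):
    """List (pos, base1, base2, type, cpg) for each mismatch between two
    aligned duplication copies; gap columns ('-') and non-ACGT are skipped."""
    copy1, copy2 = copy1.upper(), copy2.upper()
    if len(copy1) != len(copy2):
        raise ValueError("copies must be aligned to the same length")
    diffs = []
    for i, (a, b) in enumerate(zip(copy1, copy2)):
        if a == b or a not in "ACGT" or b not in "ACGT":
            continue
        # A site counts as CpG if it is in a CpG context in either copy.
        cpg = in_cpg(copy1, i) or in_cpg(copy2, i)
        diffs.append((i, a, b, substitution_type(a, b), cpg))
    return diffs


def ts_tv_ratio(diffs):
    """Ratio of transitions to transversions (inf if there are no transversions)."""
    ts = sum(1 for d in diffs if d[3] == "transition")
    tv = len(diffs) - ts
    return ts / tv if tv else float("inf")
```

For the toy pair `ACGTACGTAC` / `ATGTACCTTC`, the sketch finds one transition and two transversions (a ratio of 0.5), two of them at CpG sites. Aggregated over many duplications, counts like these are what can be compared against the same statistics for SNPs.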

    PARALIGN: rapid and sensitive sequence similarity searches powered by parallel computing technology

    PARALIGN is a rapid and sensitive similarity search tool for identifying distantly related sequences in both nucleotide and amino acid sequence databases. Two algorithms are implemented: accelerated Smith–Waterman and ParAlign. The ParAlign algorithm is similar to Smith–Waterman in sensitivity while being as fast as BLAST for protein searches. The high speed is achieved by exploiting a form of parallel computing technology known as multimedia technology, which is available in modern processors but rarely used by other bioinformatics software. The software is also designed to run efficiently on computer clusters using the Message Passing Interface (MPI) standard. A public search service powered by a large computer cluster has been set up and is freely available at , where the major public databases can be searched. The software can also be downloaded free of charge for academic use.
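As a reference point for what the accelerated implementations compute, the sketch below gives the plain O(mn) Smith–Waterman local alignment score with affine gap penalties (Gotoh's formulation). It is not PARALIGN's vectorized code or the ParAlign heuristic; it assumes simple match/mismatch scores instead of a substitution matrix such as BLOSUM62, and a gap of length k costing `gap_open + k * gap_extend`.

```python
NEG = float("-inf")


def sw_score(query, target, match=2, mismatch=-1, gap_open=3, gap_extend=1):
    """Optimal local alignment score with affine gaps, keeping only two rows."""
    n = len(target)
    h_prev = [0] * (n + 1)    # H row i-1: best score ending at (i-1, j)
    f_prev = [NEG] * (n + 1)  # F row i-1: best score ending in a vertical gap
    best = 0
    for i in range(1, len(query) + 1):
        h_cur = [0] * (n + 1)
        f_cur = [NEG] * (n + 1)
        e = NEG  # E: best score ending in a horizontal gap in this row
        for j in range(1, n + 1):
            e = max(e - gap_extend, h_cur[j - 1] - gap_open - gap_extend)
            f_cur[j] = max(f_prev[j] - gap_extend, h_prev[j] - gap_open - gap_extend)
            s = match if query[i - 1] == target[j - 1] else mismatch
            # Local alignment: scores are floored at zero.
            h_cur[j] = max(0, h_prev[j - 1] + s, e, f_cur[j])
            best = max(best, h_cur[j])
        h_prev, f_prev = h_cur, f_cur
    return best
```

With these parameters, `sw_score("ACGTACGT", "ACGTTACGT")` returns 12: eight matches (16) minus one gap of length 1 (3 + 1). Every cell depends on its left, upper and diagonal neighbours, which is the data dependency that SIMD implementations have to work around.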

    A Two-tiered compensatory response to loss of DNA repair modulates aging and stress response pathways

    Activation of oxidative stress responses and downregulation of insulin-like signaling (ILS) are seen in segmental progeroid mice deficient in Nucleotide Excision Repair (NER). Evidence suggests that this is a survival response to persistent transcription-blocking DNA damage, although the relevant lesions have not been identified. Here we show that loss of NTH-1, the only Base Excision Repair (BER) enzyme known to initiate repair of oxidative DNA damage in C. elegans, restores normal lifespan in the short-lived NER-deficient xpa-1 mutant. Loss of NTH-1 leads to oxidative stress and to global changes in the expression profile, involving upregulation of genes that respond to endogenous stress and downregulation of ILS. A similar but more extensive transcriptomic shift is observed in the xpa-1 mutant, whereas loss of both NTH-1 and XPA-1 elicits a different profile, with downregulation of Aurora-B and Polo-like kinase 1 signaling networks as well as of DNA repair and DNA damage response genes. The restoration of normal lifespan and the absence of oxidative stress responses in nth-1;xpa-1 indicate that BER contributes to generating transcription-blocking lesions from oxidative DNA damage. Hence, our data strongly suggest that the DNA lesions relevant for aging are repair intermediates resulting from aberrant or attempted BER processing of lesions normally repaired by NER.

    Custom Design and Analysis of High-Density Oligonucleotide Bacterial Tiling Microarrays

    Only recently have custom-made high-density oligonucleotide microarrays become available at an affordable price. The aim of this thesis was to design microarrays and analysis algorithms for studies of DNA repair and DNA damage detection, and to apply these methods in real experiments. Thomassen et al. used their custom-designed whole-genome tiling microarrays to detect transcriptional changes in Escherichia coli after exposure to DNA-damaging agents. The transcriptional changes in E. coli treated with UV light or the methylating agent MNNG were shown to be larger and to involve far more genes than previously reported. To optimize data analysis for the custom-made arrays, Thomassen and coworkers designed their own normalization and analysis algorithms and showed that these were better suited than the established methods currently applied to custom tiling arrays. Among other findings, several novel stress-induced transcripts were detected, one of which is predicted to encode a UV-induced short transmembrane protein. In addition, no upregulation was observed for aidB, which has previously been described as UV-inducible. In the MNNG study, several genes were downregulated in response to DNA damage despite having upstream regulatory sequences similar to the established LexA box A and B. This indicates that the LexA regulon might also mediate gene repression, and that the box A and B sequences alone cannot account for LexA-controlled gene regulation. Thomassen et al. also designed a custom microarray for detecting oncogenic fusion genes. Cancer-specific fusion genes are often used to subgroup cancers and to define the optimal treatment, but current laboratory detection procedures are laborious. In a blinded study on six cancer cell lines, proof of principle was shown by the detection of six out of six positive controls. The design and analysis methods for this microarray are now being refined to create a diagnostic fusion gene detection tool.
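The summary reports that the authors' own normalization outperformed established methods but does not describe it. Purely as a point of reference, the sketch below shows quantile normalization, a widely used general-purpose microarray normalization; it is an assumed example of an "established method" and is not the Thomassen et al. algorithm. Each array is a list of probe intensities in the same probe order, and ties are broken by probe index.

```python
def quantile_normalize(arrays):
    """Give every array the same intensity distribution: the value at rank r
    becomes the mean of the rank-r values across all arrays."""
    n = len(arrays[0])
    if any(len(a) != n for a in arrays):
        raise ValueError("all arrays must have the same number of probes")
    sorted_cols = [sorted(a) for a in arrays]
    # Reference distribution: mean intensity at each rank.
    rank_means = [sum(col[r] for col in sorted_cols) / len(arrays) for r in range(n)]
    normalized = []
    for a in arrays:
        order = sorted(range(n), key=lambda i: a[i])  # probe indices by intensity
        out = [0.0] * n
        for rank, idx in enumerate(order):
            out[idx] = rank_means[rank]
        normalized.append(out)
    return normalized
```

For example, `quantile_normalize([[5, 2, 3], [4, 1, 6]])` returns `[[5.5, 1.5, 3.5], [3.5, 1.5, 5.5]]`. The method assumes that most probes do not change between conditions, an assumption that may not hold when a treatment alters the expression of many genes, as reported here for UV and MNNG.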

    Predicting non-coding RNA genes in Escherichia coli with boosted genetic programming

    Several methods exist for predicting non-coding RNA (ncRNA) genes in Escherichia coli (E. coli). In addition to about sixty known ncRNA genes (excluding tRNAs and rRNAs), various methods have predicted more than a thousand ncRNA genes, but only 95 of these candidates were confirmed by more than one study. Here, we introduce a new method that uses automatic discovery of sequence patterns to predict ncRNA genes. The method predicts 135 novel candidates, as well as 152 genes that overlap with predictions in the literature. We tested sixteen predictions experimentally and showed that twelve of these are actual ncRNA transcripts; six of the twelve verified candidates were novel predictions. This relatively high confirmation rate (12 of 16, or 75%) indicates that many of the untested novel predictions are also ncRNAs, and we therefore speculate that E. coli contains more ncRNA genes than previously estimated.
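The boosting idea, combining many weak sequence-pattern rules into one weighted vote, can be sketched with discrete AdaBoost. In this sketch, simple "motif occurs / does not occur" decision stumps over a hand-picked motif list stand in for the genetic-programming-evolved pattern classifiers used in the study. All function names and the toy data are hypothetical.

```python
import math


def stump_predictions(seqs, motif, polarity):
    """Weak learner: predict `polarity` if the motif occurs, else -polarity."""
    return [polarity if motif in s else -polarity for s in seqs]


def adaboost(seqs, labels, motifs, rounds=10):
    """Discrete AdaBoost over motif-presence stumps; labels are +1 (ncRNA) / -1."""
    n = len(seqs)
    weights = [1.0 / n] * n
    model = []  # list of (alpha, motif, polarity)
    for _ in range(rounds):
        best = None
        for motif in motifs:
            for polarity in (1, -1):
                preds = stump_predictions(seqs, motif, polarity)
                err = sum(w for w, p, y in zip(weights, preds, labels) if p != y)
                if best is None or err < best[0]:
                    best = (err, motif, polarity, preds)
        err, motif, polarity, preds = best
        if err >= 0.5:
            break  # no weak learner beats chance on the weighted data
        clipped = max(err, 1e-10)  # avoid log(0) for a perfect stump
        alpha = 0.5 * math.log((1 - clipped) / clipped)
        model.append((alpha, motif, polarity))
        if err == 0:
            break  # training set already separated
        # Up-weight misclassified sequences, down-weight correct ones.
        weights = [w * math.exp(-alpha * y * p) for w, y, p in zip(weights, labels, preds)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return model


def predict(model, seq):
    """Sign of the weighted vote of all selected stumps."""
    score = sum(alpha * (pol if motif in seq else -pol) for alpha, motif, pol in model)
    return 1 if score >= 0 else -1
```

On a toy set where every positive contains `GGCC` and no negative does, the first round already selects the `GGCC` stump and classifies the training data perfectly. With real genomic data, no single pattern separates ncRNAs from background, and the reweighting step is what lets later rounds focus on the sequences the earlier rules got wrong.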

    Swarm v3: towards tera-scale amplicon clustering

    Motivation: Previously we presented swarm, an open-source amplicon clustering programme that produces fine-scale molecular operational taxonomic units (OTUs) free of arbitrary global clustering thresholds. Here, we present swarm v3, which addresses the challenges of contemporary datasets that are growing towards terabyte sizes. Results: Compared with previous swarm versions, swarm v3 has modernized C++ source code, a memory footprint reduced by up to 50%, and optimized CPU usage and multithreading (more than 7 times faster with default parameters), and it has been extensively tested for robustness and logic.
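The phrase "free of arbitrary global clustering thresholds" refers to swarm's iterative growth: a cluster starts from an abundant seed and repeatedly absorbs any unassigned amplicon within a small local distance d of *any* current member, so members can end up far from the seed. The sketch below illustrates only that core idea with a plain edit distance and breadth-first growth. It omits swarm's abundance-based refinements and all of its performance engineering, and it is quadratic in the number of amplicons. The function names are hypothetical.

```python
from collections import deque


def edit_distance(a, b):
    """Levenshtein distance (substitutions, insertions, deletions cost 1)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def swarm_like(amplicons, d=1):
    """amplicons: dict mapping sequence -> abundance.
    Returns clusters as lists of sequences, seed first."""
    # Seeds are taken in decreasing abundance (ties broken alphabetically).
    order = sorted(amplicons, key=lambda s: (-amplicons[s], s))
    unassigned = set(order)
    clusters = []
    for seed in order:
        if seed not in unassigned:
            continue
        unassigned.discard(seed)
        cluster, queue = [seed], deque([seed])
        while queue:  # breadth-first growth from every new member
            current = queue.popleft()
            for s in order:
                if s in unassigned and edit_distance(current, s) <= d:
                    unassigned.discard(s)
                    cluster.append(s)
                    queue.append(s)
        clusters.append(cluster)
    return clusters
```

With `d=1`, the amplicons `ACGTACGT`, `ACGTACGA` and `ACGTACCA` form one cluster even though the last one differs from the seed at two positions, because each is linked to the previous one by a single difference.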

    A new protein superfamily includes two novel 3-methyladenine DNA glycosylases from Bacillus cereus, AlkC and AlkD

    Soil bacteria are heavily exposed to environmental methylating agents such as methyl chloride and may have special requirements for repair of alkylation damage on DNA. We have used functional complementation of an Escherichia coli tag alkA mutant to screen for 3-methyladenine DNA glycosylase genes in genomic libraries of the soil bacterium Bacillus cereus. Three genes were recovered: alkC, alkD and alkE. The amino acid sequence of AlkE is homologous to the E. coli AlkA sequence. AlkC and AlkD represent novel proteins without sequence similarity to any protein of known function. However, iterative and indirect sequence similarity searches revealed that AlkC and AlkD are distant homologues of each other within a new protein superfamily that is ubiquitous in the prokaryotic kingdom. Homologues of AlkC and AlkD were also identified in the amoebas Entamoeba histolytica and Dictyostelium discoideum, but no other eukaryotic counterparts of the superfamily were found. The alkC and alkD genes were expressed in E. coli and the proteins were purified to homogeneity. Both proteins were found to be specific for removal of N-alkylated bases, and showed no activity on oxidized or deaminated base lesions in DNA. B. cereus AlkC and AlkD thus define novel families of alkylbase DNA glycosylases within a new protein superfamily.

    Faster Smith-Waterman database searches with inter-sequence SIMD parallelisation

    Background: The Smith-Waterman algorithm for local sequence alignment is more sensitive than heuristic methods for database searching, but also more time-consuming. The fastest previously described approach to parallelisation with SIMD technology was published by Farrar in 2007. The aim of this study was to explore whether further speed could be gained by other approaches to parallelisation.
    Results: A faster approach and implementation is described and benchmarked. In the new tool SWIPE, residues from sixteen different database sequences are compared in parallel to one query residue. Using a 375-residue query sequence, a speed of 106 billion cell updates per second (GCUPS) was achieved on a dual Intel Xeon X5650 six-core processor system, which is more than six times faster than software based on Farrar's 'striped' approach. When both programs used only a single thread, SWIPE was about 2.5 times faster. For shorter queries, the speed-up was larger. SWIPE was about twice as fast as BLAST when using the BLOSUM50 score matrix, while BLAST was about twice as fast as SWIPE with the BLOSUM62 matrix. The software is designed for 64-bit Linux on processors with SSSE3. Source code is available from http://dna.uio.no/swipe/ under the GNU Affero General Public License.
    Conclusions: Efficient parallelisation using SIMD on standard hardware makes it possible to run Smith-Waterman database searches more than six times faster than before. The approach described here could significantly widen the potential applications of Smith-Waterman searches. Other applications that require optimal local alignment scores could also benefit from the improved performance.
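The inter-sequence layout can be emulated with NumPy vectors, where each vector element plays the role of one SIMD lane holding one database sequence. At every step, one query residue is compared with the current residue of all database sequences at once. This is an illustrative sketch, not SWIPE's SSSE3 code. It assumes a linear gap penalty and simple match/mismatch scores rather than the affine gaps and substitution matrices a real search tool would use, and it pads shorter sequences with a 0 sentinel. The function name is hypothetical.

```python
import numpy as np


def sw_scores_multi(query, db_seqs, match=2, mismatch=-1, gap=2):
    """Local alignment scores of `query` against several database sequences,
    computed in lock-step: each array operation handles one query residue
    against one residue from every database sequence ("lane")."""
    lanes = len(db_seqs)
    length = max(len(s) for s in db_seqs)
    # Row j holds residue j of every database sequence; 0 marks padding.
    codes = np.zeros((length, lanes), dtype=np.int64)
    for k, s in enumerate(db_seqs):
        codes[: len(s), k] = [ord(c) for c in s]
    q = [ord(c) for c in query]
    m = len(q)
    zero = np.zeros(lanes, dtype=np.int64)
    h_prev = np.zeros((m + 1, lanes), dtype=np.int64)  # H for db position j-1
    best = zero.copy()
    for j in range(length):
        residues = codes[j]            # one residue per lane
        valid = residues != 0          # lanes whose sequence has not ended
        h_cur = np.zeros((m + 1, lanes), dtype=np.int64)
        for i in range(1, m + 1):
            s = np.where(residues == q[i - 1], match, mismatch)
            diag = h_prev[i - 1] + s   # extend alignment diagonally
            left = h_prev[i] - gap     # gap in the query
            up = h_cur[i - 1] - gap    # gap in the database sequence
            h = np.maximum(np.maximum(diag, left), np.maximum(up, zero))
            h_cur[i] = np.where(valid, h, 0)  # padded cells contribute nothing
        best = np.maximum(best, h_cur.max(axis=0))
        h_prev = h_cur
    return [int(x) for x in best]
```

For example, `sw_scores_multi("ACGT", ["ACGT", "TTTT", "GGACGTGG", "AC"])` returns `[8, 2, 8, 4]`. Because the lanes are independent sequences, no lane waits on another, which is the advantage over intra-sequence (striped) layouts. For scale, 106 GCUPS with a 375-residue query corresponds to roughly 106 × 10⁹ / 375 ≈ 2.8 × 10⁸ database residues processed per second.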