
    On the Computational Power of DNA Annealing and Ligation

    In [20] it was shown that the DNA primitives Separate, Merge, and Amplify are not sufficiently powerful to invert functions defined by circuits in linear time. Boneh et al. [4] show that adding a ligation primitive, Append, provides the missing power. The question becomes: "How powerful is ligation? Are Separate, Merge, and Amplify necessary at all?" This paper informally explores the power of annealing and ligation for DNA computation. We conclude that annealing and ligation alone are theoretically capable of universal computation.

    Highly Scalable Algorithms for Robust String Barcoding

    String barcoding is a recently introduced technique for genomic-based identification of microorganisms. In this paper we describe the engineering of highly scalable algorithms for robust string barcoding. Our methods enable distinguisher selection based on whole genomic sequences of hundreds of microorganisms of up to bacterial size on a well-equipped workstation, and can be easily parallelized to extend the applicability range to thousands of bacterial-size genomes. Experimental results on both randomly generated and NCBI genomic data show that whole-genome-based selection yields a number of distinguishers nearly matching the information-theoretic lower bounds for the problem.
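To make the idea of distinguisher selection concrete, here is a minimal sketch. It assumes a plain greedy set-cover heuristic over fixed-length substrings; the abstract does not say which selection strategy the authors use, and the function name `greedy_barcode`, the parameter `k`, and the toy genomes are all hypothetical.

```python
from itertools import combinations

def greedy_barcode(genomes, k):
    """Greedily pick k-mers until every pair of genomes is told apart.

    genomes: dict mapping a genome name to its sequence (toy data here).
    Returns (chosen k-mers, pairs that no candidate could distinguish).
    Assumption: a k-mer distinguishes two genomes when it occurs in
    exactly one of them.
    """
    names = sorted(genomes)
    candidates = sorted({g[i:i + k] for g in genomes.values()
                         for i in range(len(g) - k + 1)})
    pending = set(combinations(names, 2))  # pairs not yet distinguished
    chosen = []
    while pending:
        def gain(c):
            # Number of pending pairs this candidate would separate.
            return sum((c in genomes[a]) != (c in genomes[b])
                       for a, b in pending)
        best = max(candidates, key=gain)
        if gain(best) == 0:
            break  # remaining pairs cannot be separated with these k-mers
        chosen.append(best)
        pending = {(a, b) for a, b in pending
                   if (best in genomes[a]) == (best in genomes[b])}
    return chosen, pending

toy = {"g1": "AAAC", "g2": "AACC", "g3": "ACCC"}
chosen, unresolved = greedy_barcode(toy, k=2)
```

This brute-force version rescans every genome for every candidate, so it only illustrates the objective. Scaling to whole bacterial genomes, as the paper describes, would need indexed substring lookups rather than repeated `in` tests.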

    Mass spectrometric methods and bioinformatics tools for accurate identification of MicroRNA biomarkers

    MicroRNAs (miRNAs) are a class of endogenous non-protein-coding RNAs, ~19-25 nucleotides long, that post-transcriptionally regulate protein expression by targeting messenger RNAs for cleavage or translational repression. MiRNAs have been implicated in the initiation and progression of 160+ human diseases. Unique miRNA differential expression signatures can be used as a basis for discriminating between the presence and absence of human diseases. MiRNAs are therefore a promising and emerging class of disease biomarkers and therapeutic targets; however, the accurate detection of a specific miRNA remains challenging. Recently, mass spectrometry (MS) has seen remarkable technological advancements, making it an attractive alternative to conventional molecular biology techniques for miRNA characterization. This study documents the development of various analytical techniques aimed at the characterization of miRNAs. The current literature in the field of miRNA is covered in chapter one. In chapter two, two new MS-based concepts for the detection of miRNA are introduced: (a) the miRNA is captured using a specific complementary DNA probe, eluted, and digested with a specific endonuclease; the digested miRNA fragments are measured by MS, resulting in a peak pattern that depends on the miRNA sequence, i.e. an intrinsic mass signature; and (b) a unique mass signature is created by incorporating extra nucleotide(s) at the 3' end of the miRNA, and the extended miRNA is measured by MS. The molecular mass of the extended miRNA, defined as the extended mass signature, is expected to differ from that of the other miRNAs within the same sample. These two approaches can improve the accuracy of qualitative MS identification of specific miRNAs. To better understand miRNA function, however, it is important to elucidate the nucleotide sequence of the miRNA.
Chapter three of this study introduces a novel MS-based assay for sequencing miRNA through chemical hydrolysis. By taking advantage of the mixing of a miRNA sample with an acidic MALDI matrix prior to MALDI-TOF MS measurement, a simple and relatively cost-effective approach to generating miRNA sequencing ladders was developed. Using this method, 100% sequence coverage and accuracy were achieved in the sequencing of selected miRNAs. When many samples are involved, the data generated from miRNA measurements can be complex, and manual data processing is tedious and challenging. As a result, the spectral interpretation of mass spectrometric data can quickly become the bottleneck in miRNA analysis. The success of MS as a tool for miRNA analysis will therefore depend strongly on the development of software able to properly interpret and analyze large data sets. To meet this need, chapter four of this work describes the development of MicroRNA MultiTool, a software tool for the rapid interpretation of MS data containing human miRNA. Users can directly enter data obtained from mass spectrometric measurements to obtain the identity of the miRNA, greatly reducing the time needed to process data. The development of such analytical and bioinformatics tools will give scientists the opportunity to better understand miRNA functions and will help drive breakthroughs for miRNA in clinical diagnostics and therapeutics.

    Selection of optimal oligonucleotide probes for microarrays using multiple criteria, global alignment and parameter estimation

    Oligonucleotide specificity for microarray hybridization can be predicted from sequence identity to non-targets, continuous stretch shared with non-targets, and/or binding free energy to non-targets. Most currently available programs use only one or two of these criteria, and so may choose ‘false’ specific oligonucleotides or miss ‘true’ optimal probes in a considerable proportion of cases. We have developed a software tool, called CommOligo, that uses new algorithms and all three criteria to select optimal oligonucleotide probes. A series of filters, including sequence identity, free energy, continuous stretch, GC content, self-annealing, distance to the 3′-untranslated region (3′-UTR) and melting temperature (T(m)), is used to check each possible oligonucleotide. Sequence identity is calculated from gapped global alignments. A traversal algorithm is used to generate alignments for free energy calculation. The optimal T(m) interval is determined from the probe candidates that have passed all other filters. Final probes are picked using a combination of user-configurable piece-wise linear functions and an iterative process. The thresholds for the identity, stretch and free energy filters are determined automatically from experimental data by an accessory software tool, CommOligo_PE (CommOligo Parameter Estimator). The program was used to design probes for both whole-genome and highly homologous sequence data. CommOligo and CommOligo_PE are freely available to academic users upon request.
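As an illustration of how two of the listed filters might be chained, the sketch below implements a GC-content check and a continuous-stretch check. It is not CommOligo's code: the function names and the default thresholds (`gc_range`, `max_stretch`) are hypothetical, and the stretch is computed on the same strand only, ignoring complementarity, identity, free energy and T(m).

```python
def gc_content(oligo):
    """Fraction of G and C bases in the oligonucleotide."""
    return sum(c in "GC" for c in oligo.upper()) / len(oligo)

def longest_common_stretch(a, b):
    """Length of the longest contiguous substring shared by a and b."""
    best = 0
    prev = [0] * (len(b) + 1)  # DP row for the previous character of a
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                cur[j] = prev[j - 1] + 1  # extend the run ending here
                best = max(best, cur[j])
        prev = cur
    return best

def passes_filters(oligo, non_targets, gc_range=(0.4, 0.6), max_stretch=15):
    """Apply the GC filter first, then the stretch filter (assumed thresholds)."""
    lo, hi = gc_range
    if not lo <= gc_content(oligo) <= hi:
        return False
    return all(longest_common_stretch(oligo, nt) <= max_stretch
               for nt in non_targets)

ok = passes_filters("ACGTAC", ["TTCGTATT"], max_stretch=4)
```

Running the cheap GC test before the quadratic stretch comparison reflects the general idea of a filter cascade. The abstract notes that in CommOligo the actual thresholds are estimated from experimental data by CommOligo_PE rather than fixed by hand.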

    Reconfigurable hardware-software codesign methodology for protein identification


    MULTIPLE SPACED SEEDS FOR OLIGONUCLEOTIDE DESIGN

    An oligonucleotide is a short DNA or RNA molecule designed to hybridize with a unique position in a target sequence. DNA oligonucleotides have many applications, such as gene identification, PCR (polymerase chain reaction) amplification, and DNA microarrays. One of the crucial issues in designing good oligonucleotides is minimizing the chance of cross-hybridization. Various heuristic algorithms are used to filter out unsuitable regions before checking for cross-hybridization. The most successful ones are based on seeds because of their efficiency and ability to tolerate mismatches. The quality of the seeds is essential in this process. We present a sound framework for evaluating seed quality for oligonucleotide design and show that multiple spaced seeds are expected to provide the best discrimination between oligos and non-oligos.
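The way a spaced seed tolerates mismatches can be shown with a small sketch. It assumes the usual convention that a seed is a string of `1` (position must match) and `0` (position is ignored), and it compares two sequences at the same offset only. The function names and example sequences are hypothetical and are not taken from the paper.

```python
def seed_hits(a, b, seed):
    """Offsets where a and b agree at every '1' position of the seed."""
    n = min(len(a), len(b))
    care = [k for k, c in enumerate(seed) if c == "1"]  # positions that must match
    return [i for i in range(n - len(seed) + 1)
            if all(a[i + k] == b[i + k] for k in care)]

def any_seed_hits(a, b, seeds):
    """True if at least one seed of a multiple-seed set produces a hit."""
    return any(seed_hits(a, b, s) for s in seeds)

a = "ACGTACGT"
b = "ACCTACGA"  # differs from a at positions 2 and 7
spaced = seed_hits(a, b, "1101")      # the '0' skips over a mismatch
contiguous = seed_hits(a, b, "111")   # same number of required matches
```

Both seeds require three matching positions, but the spaced seed finds a hit at offset 0 that spans the mismatch at position 2, where the contiguous seed cannot. Using several seeds at once, as in `any_seed_hits`, widens the set of similar regions that are detected.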

    Accelerating String Set Matching in FPGA Hardware for Bioinformatics Research

    Background: This paper describes techniques for accelerating the performance of the string set matching problem with particular emphasis on applications in computational proteomics. The process of matching peptide sequences against a genome translated in six reading frames is part of a proteogenomic mapping pipeline that is used as a case-study. The Aho-Corasick algorithm is adapted for execution in field programmable gate array (FPGA) devices in a manner that optimizes space and performance. In this approach, the traditional Aho-Corasick finite state machine (FSM) is split into smaller FSMs, operating in parallel, each of which matches up to 20 peptides in the input translated genome. Each of the smaller FSMs is further divided into five simpler FSMs such that each simple FSM operates on a single bit position in the input (five bits are sufficient for representing all amino acids and special symbols in protein sequences). Results: This bit-split organization of the Aho-Corasick implementation enables efficient utilization of the limited random access memory (RAM) resources available in typical FPGAs. The use of on-chip RAM as opposed to FPGA logic resources for FSM implementation also enables rapid reconfiguration of the FPGA without the place and routing delays associated with complex digital designs. Conclusion: Experimental results show storage efficiencies of over 80% for several data sets. Furthermore, the FPGA implementation executing at 100 MHz is nearly 20 times faster than an implementation of the traditional Aho-Corasick algorithm executing on a 2.67 GHz workstation.
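For reference, the traditional software form of Aho-Corasick that the abstract uses as its baseline can be sketched as follows. This is the textbook single-FSM algorithm, not the bit-split FPGA design described in the paper. The function names and the toy peptides are hypothetical.

```python
from collections import deque

def build_automaton(patterns):
    """Build goto, failure, and output tables for a set of patterns."""
    goto = [{}]   # goto[state][char] -> next state (trie edges)
    fail = [0]    # failure link of each state
    out = [[]]    # patterns recognised on entering each state
    for p in patterns:
        s = 0
        for ch in p:
            if ch not in goto[s]:
                goto.append({})
                fail.append(0)
                out.append([])
                goto[s][ch] = len(goto) - 1
            s = goto[s][ch]
        out[s].append(p)
    # Breadth-first pass: depth-1 states keep failure link 0.
    queue = deque(goto[0].values())
    while queue:
        s = queue.popleft()
        for ch, t in goto[s].items():
            queue.append(t)
            f = fail[s]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[t] = goto[f].get(ch, 0)
            out[t] = out[t] + out[fail[t]]  # inherit matches ending at the suffix
    return goto, fail, out

def search(text, automaton):
    """Return (start index, pattern) for every occurrence in text."""
    goto, fail, out = automaton
    s, hits = 0, []
    for i, ch in enumerate(text):
        while s and ch not in goto[s]:
            s = fail[s]
        s = goto[s].get(ch, 0)
        for p in out[s]:
            hits.append((i - len(p) + 1, p))
    return hits

hits = search("AMKVLQ", build_automaton(["MKV", "KVL", "VL"]))
```

The text is scanned once regardless of how many patterns are loaded, which is what makes the algorithm attractive for matching many peptides against a translated genome. The bit-split variant in the paper trades this single large state table for several smaller ones that fit in on-chip RAM.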

    Mapping and characterization of G-quadruplexes in Mycobacterium tuberculosis gene promoter regions

    Mycobacterium tuberculosis is the causative agent of tuberculosis (TB), one of the top 10 causes of death worldwide in 2015. The recent emergence of strains resistant to all current drugs makes the development of compounds with new mechanisms of action urgent. G-quadruplexes are nucleic acid secondary structures that may form in G-rich regions to epigenetically regulate cellular functions. Here we implemented a computational tool to scan for putative G-quadruplex forming sequences in the genome of Mycobacterium tuberculosis and analyse their association with transcription start sites. We found that the most stable G-quadruplexes were in the promoter regions of genes belonging to definite functional categories. Actual G-quadruplex folding of four selected sequences was assessed by biophysical and biomolecular techniques: all molecules formed stable G-quadruplexes, which were further stabilized by two G-quadruplex ligands. These compounds inhibited Mycobacterium tuberculosis growth with minimal inhibitory concentrations in the low micromolar range. These data support the formation of Mycobacterium tuberculosis G-quadruplexes in vivo and their potential regulation of gene transcription, and prompt the use of G4 ligands to develop original antitubercular agents.
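A scan for putative G-quadruplex forming sequences is often sketched with a regular expression. The example below assumes the widely used heuristic of four runs of at least three guanines separated by loops of 1 to 7 nucleotides. The abstract does not state which motif definition or scoring the authors' tool uses, and the names `G4_PATTERN`, `find_putative_g4`, and `reverse_complement` are hypothetical.

```python
import re

# Assumed motif: G{3,} N{1,7} G{3,} N{1,7} G{3,} N{1,7} G{3,}
G4_PATTERN = re.compile(
    r"G{3,}[ACGT]{1,7}G{3,}[ACGT]{1,7}G{3,}[ACGT]{1,7}G{3,}"
)

def reverse_complement(seq):
    """Reverse complement of a DNA sequence over the alphabet ACGT."""
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]

def find_putative_g4(seq):
    """Non-overlapping (start, motif) matches on the given strand."""
    return [(m.start(), m.group(0)) for m in G4_PATTERN.finditer(seq.upper())]

forward = find_putative_g4("ATGGGAGGGTGGGCGGGTA")
# C-rich stretches on this strand correspond to G-rich motifs on the other one.
reverse = find_putative_g4(reverse_complement("TACCCGCCCACCCTCCCAT"))
```

Start positions from the reverse-complement scan are relative to the reverse-complemented string and would need converting back to genome coordinates before relating hits to transcription start sites. A pattern match only flags a candidate; as the abstract notes, actual folding was assessed experimentally.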