387 research outputs found

    Evaluating the Relationship Between Running Times and DNA Sequence Sizes using a Generic-Based Filtering Program.

    Get PDF
    Generic programming depends on the decomposition of programs into simpler components which may be developed separately and combined arbitrarily, subject only to well- defined interfaces. Bioinformatics deals with the application of computational techniques to data present in the Biological sciences. A genetic sequence is a succession of letters which represents the basic structure of a hypothetical DNA molecule, with the capacity to carry information. This research article studied the relationship between the running times of a generic-based filtering program and different samples of genetic sequences in an increasing order of magnitude. A graphical result was obtained to adequately depict this relationship. It was also discovered that the complexity of the generic tree program was O (log2 N). This research article provided one of the systematic approaches of generic programming to Bioinformatics, which could be instrumental in elucidating major discoveries in Bioinformatics, as regards efficient data management and analysis

    SeqAn An efficient, generic C++ library for sequence analysis

    Get PDF
    <p>Abstract</p> <p>Background</p> <p>The use of novel algorithmic techniques is pivotal to many important problems in life science. For example the sequencing of the human genome <abbrgrp><abbr bid="B1">1</abbr></abbrgrp> would not have been possible without advanced assembly algorithms. However, owing to the high speed of technological progress and the urgent need for bioinformatics tools, there is a widening gap between state-of-the-art algorithmic techniques and the actual algorithmic components of tools that are in widespread use.</p> <p>Results</p> <p>To remedy this trend we propose the use of SeqAn, a library of efficient data types and algorithms for sequence analysis in computational biology. SeqAn comprises implementations of existing, practical state-of-the-art algorithmic components to provide a sound basis for algorithm testing and development. In this paper we describe the design and content of SeqAn and demonstrate its use by giving two examples. In the first example we show an application of SeqAn as an experimental platform by comparing different exact string matching algorithms. The second example is a simple version of the well-known MUMmer tool rewritten in SeqAn. Results indicate that our implementation is very efficient and versatile to use.</p> <p>Conclusion</p> <p>We anticipate that SeqAn greatly simplifies the rapid development of new bioinformatics tools by providing a collection of readily usable, well-designed algorithmic components which are fundamental for the field of sequence analysis. This leverages not only the implementation of new algorithms, but also enables a sound analysis and comparison of existing algorithms.</p

    Bio++: a set of C++ libraries for sequence analysis, phylogenetics, molecular evolution and population genetics

    Get PDF
    BACKGROUND: A large number of bioinformatics applications in the fields of bio-sequence analysis, molecular evolution and population genetics typically share input/ouput methods, data storage requirements and data analysis algorithms. Such common features may be conveniently bundled into re-usable libraries, which enable the rapid development of new methods and robust applications. RESULTS: We present Bio++, a set of Object Oriented libraries written in C++. Available components include classes for data storage and handling (nucleotide/amino-acid/codon sequences, trees, distance matrices, population genetics datasets), various input/output formats, basic sequence manipulation (concatenation, transcription, translation, etc.), phylogenetic analysis (maximum parsimony, markov models, distance methods, likelihood computation and maximization), population genetics/genomics (diversity statistics, neutrality tests, various multi-locus analyses) and various algorithms for numerical calculus. CONCLUSION: Implementation of methods aims at being both efficient and user-friendly. A special concern was given to the library design to enable easy extension and new methods development. We defined a general hierarchy of classes that allow the developer to implement its own algorithms while remaining compatible with the rest of the libraries. Bio++ source code is distributed free of charge under the CeCILL general public licence from its website

    Knowledge Rich Natural Language Queries over Structured Biological Databases

    Full text link
    Increasingly, keyword, natural language and NoSQL queries are being used for information retrieval from traditional as well as non-traditional databases such as web, document, image, GIS, legal, and health databases. While their popularity are undeniable for obvious reasons, their engineering is far from simple. In most part, semantics and intent preserving mapping of a well understood natural language query expressed over a structured database schema to a structured query language is still a difficult task, and research to tame the complexity is intense. In this paper, we propose a multi-level knowledge-based middleware to facilitate such mappings that separate the conceptual level from the physical level. We augment these multi-level abstractions with a concept reasoner and a query strategy engine to dynamically link arbitrary natural language querying to well defined structured queries. We demonstrate the feasibility of our approach by presenting a Datalog based prototype system, called BioSmart, that can compute responses to arbitrary natural language queries over arbitrary databases once a syntactic classification of the natural language query is made

    The C++0x "Concepts" Effort

    Full text link
    C++0x is the working title for the revision of the ISO standard of the C++ programming language that was originally planned for release in 2009 but that was delayed to 2011. The largest language extension in C++0x was "concepts", that is, a collection of features for constraining template parameters. In September of 2008, the C++ standards committee voted the concepts extension into C++0x, but then in July of 2009, the committee voted the concepts extension back out of C++0x. This article is my account of the technical challenges and debates within the "concepts" effort in the years 2003 to 2009. To provide some background, the article also describes the design space for constrained parametric polymorphism, or what is colloquially know as constrained generics. While this article is meant to be generally accessible, the writing is aimed toward readers with background in functional programming and programming language theory. This article grew out of a lecture at the Spring School on Generic and Indexed Programming at the University of Oxford, March 2010

    Applications of Natural Language Processing in Biodiversity Science

    Get PDF
    Centuries of biological knowledge are contained in the massive body of scientific literature, written for human-readability but too big for any one person to consume. Large-scale mining of information from the literature is necessary if biology is to transform into a data-driven science. A computer can handle the volume but cannot make sense of the language. This paper reviews and discusses the use of natural language processing (NLP) and machine-learning algorithms to extract information from systematic literature. NLP algorithms have been used for decades, but require special development for application in the biological realm due to the special nature of the language. Many tools exist for biological information extraction (cellular processes, taxonomic names, and morphological characters), but none have been applied life wide and most still require testing and development. Progress has been made in developing algorithms for automated annotation of taxonomic text, identification of taxonomic names in text, and extraction of morphological character information from taxonomic descriptions. This manuscript will briefly discuss the key steps in applying information extraction tools to enhance biodiversity science

    Mining Host-Pathogen Interactions

    Get PDF
    corecore