    Multiple input parsing and lexical analysis

    RNACompress: Grammar-based compression and informational complexity measurement of RNA secondary structure

    Background: With the rapid emergence of RNA databases and newly identified non-coding RNAs, an efficient compression algorithm for RNA sequence and structural information is needed for the storage and analysis of such data. Although several algorithms for compressing DNA sequences have been proposed, none of them is suitable for compressing RNA sequences together with their secondary structures. Such compression not only facilitates the maintenance of RNA data but also offers a novel way to measure the informational complexity of RNA structural data. This opens the possibility of using compression to study the relationship between the functional activities of RNA structures and their complexity, as well as various other structural properties of RNA.
    Results: RNACompress employs an efficient grammar-based model to compress RNA sequences and their secondary structures. The algorithm has two main goals: (1) to provide a robust and effective method for compressing RNA structural data; and (2) to design a suitable model for representing RNA secondary structure and for deriving the informational complexity of the structural data from its compression. Our extensive tests show that RNACompress achieves a universally better compression ratio than other sequence-specific or general-purpose text compression algorithms, such as Gencompress, winrar and gzip. Moreover, a test comparing the activities of distinct GTP-binding RNAs (aptamers) with their structural complexity shows that our defined informational complexity can be used to describe how complexity varies with activity. These results provide an objective means of comparing the functional properties of heteropolymers from an information perspective.
    Conclusion: This paper discusses a universal algorithm for compressing RNA secondary structure and evaluating its informational complexity. We have developed RNACompress as a useful tool for academic users. Extensive tests have shown that RNACompress is a universally efficient algorithm for compressing RNA sequences together with their secondary structures. RNACompress also serves as a good measure of the informational complexity of RNA secondary structure, which can be used to study the functional activities of RNA molecules.
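
The abstract does not spell out RNACompress's grammar model, so the sketch below swaps in a generic, well-known grammar-based technique, Re-Pair, purely to show the idea: repeated patterns in a combined sequence/structure string are replaced by grammar rules, and the size of the resulting grammar serves as a crude complexity proxy. The token encoding (each base paired with its dot-bracket symbol) and the helper names `interleave`, `repair`, `expand` and `grammar_size` are hypothetical; this is not the paper's model or its complexity measure.

```python
from collections import Counter


def interleave(sequence, structure):
    """Hypothetical encoding: pair each base with its dot-bracket symbol."""
    if len(sequence) != len(structure):
        raise ValueError("sequence and structure must have the same length")
    return [base + bracket for base, bracket in zip(sequence, structure)]


def repair(symbols):
    """Re-Pair-style compression: repeatedly replace the most frequent adjacent
    pair with a fresh nonterminal until no pair occurs at least twice.
    Nonterminals are named R0, R1, ... (assumed not to clash with input tokens).
    Counts include overlapping pairs, so a rule may occasionally be used once."""
    seq = list(symbols)
    rules = {}
    while True:
        counts = Counter(zip(seq, seq[1:]))
        if not counts:
            break
        pair, count = counts.most_common(1)[0]
        if count < 2:  # no repeated pair left: the grammar is final
            break
        nonterminal = f"R{len(rules)}"
        rules[nonterminal] = pair
        # Replace non-overlapping occurrences, scanning left to right.
        out, i = [], 0
        while i < len(seq):
            if i + 1 < len(seq) and (seq[i], seq[i + 1]) == pair:
                out.append(nonterminal)
                i += 2
            else:
                out.append(seq[i])
                i += 1
        seq = out
    return seq, rules


def expand(symbol, rules):
    """Recursively expand a symbol back into the original token string."""
    if symbol not in rules:
        return symbol
    left, right = rules[symbol]
    return expand(left, rules) + expand(right, rules)


def grammar_size(start, rules):
    """Crude complexity proxy: start-sequence length plus two symbols per rule."""
    return len(start) + 2 * len(rules)
```

For example, `interleave("GGGAAACCC" * 3, "(((...)))" * 3)` yields 27 tokens; because the hairpin repeats, `repair` shrinks them to a grammar whose `grammar_size` is smaller than 27, while a less repetitive input would compress less. Decompression is lossless: joining `expand` over the start sequence reproduces the input.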

    Data carving parser generation

    As our day-to-day interaction with technology continues to grow, so does the amount of data it creates. The science of digital forensics grew out of the need for specialists to recover, analyze, and interpret this data. When events or actions, whether accidental or criminally motivated, create, delete, or manipulate data, it is the role of a digital forensics analyst to acquire this data and, from the facts it reveals, draw conclusions about who or what is responsible for the event. This thesis identifies a gap in the research between data analysis and interpretation. Current research and tool development have focused on data acquisition techniques and file carving. Data acquisition is the process of recovering a forensically sound copy of the evidence, such as a bit-by-bit copy of a hard drive or an image of the contents of a computer system’s RAM. File carving is the process of searching for and extracting files from the acquired data. Few tools provide a quick and easy means of validating recovered files and extracting data from them, and the tools that do are either limited in capability or very complex, with considerable overhead and a steep learning curve. The tool created through this research fills this gap. It uses a file description language that textually describes the layout and on-disk format of a file type’s data; the language is intuitive and easy for humans to read and understand. Given a description as input, the tool builds a syntax tree that can be used to parse and extract fields of interest from any file matching that description. This allows quick analysis and interpretation of any file type, even uncommon or proprietary formats, as long as a valid description is provided.
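
The thesis's description language is not shown here, so the following is only a minimal sketch of the same workflow under stated assumptions: a hypothetical one-field-per-line syntax (`name: struct-format`) is parsed into a flat list of field nodes (a stand-in for the syntax tree; nesting and conditional fields are out of scope), which then drives extraction from raw bytes using Python's standard `struct` module. The names `Field`, `parse_description`, `extract` and `PNG_HEADER` are illustrative; the PNG signature and IHDR layout used in the example are real.

```python
import struct
from dataclasses import dataclass


@dataclass
class Field:
    name: str
    fmt: str  # struct format producing exactly one value, e.g. ">I" or "8s"


def parse_description(text):
    """Parse the hypothetical 'name: format' description into field nodes."""
    fields = []
    for line in text.strip().splitlines():
        line = line.strip()
        if not line or line.startswith("#"):  # skip blanks and comments
            continue
        name, fmt = (part.strip() for part in line.split(":", 1))
        struct.calcsize(fmt)  # raises struct.error on an invalid format
        fields.append(Field(name, fmt))
    return fields


def extract(fields, data, offset=0):
    """Walk the field list in order, unpacking each field from the bytes."""
    values = {}
    for f in fields:
        size = struct.calcsize(f.fmt)
        if offset + size > len(data):
            raise ValueError(f"truncated input while reading {f.name!r}")
        (values[f.name],) = struct.unpack_from(f.fmt, data, offset)
        offset += size
    return values


# Example description: PNG signature followed by the start of the IHDR chunk.
PNG_HEADER = """
signature:   8s
ihdr_length: >I
ihdr_type:   4s
width:       >I
height:      >I
bit_depth:   B
color_type:  B
"""

if __name__ == "__main__":
    sample = (b"\x89PNG\r\n\x1a\n" + struct.pack(">I", 13) + b"IHDR"
              + struct.pack(">IIBB", 640, 480, 8, 6))
    header = extract(parse_description(PNG_HEADER), sample)
    print(header["width"], header["height"])
```

Validation then reduces to checking extracted values, for instance comparing `header["signature"]` with the fixed PNG signature; a truncated or mismatched file surfaces as an exception or a failed comparison rather than silently wrong fields.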

    Multiple Context-Free Tree Grammars: Lexicalization and Characterization

    Multiple (simple) context-free tree grammars are investigated, where "simple" means "linear and nondeleting". Every multiple context-free tree grammar that is finitely ambiguous can be lexicalized; i.e., it can be transformed into an equivalent one (generating the same tree language) in which each rule of the grammar contains a lexical symbol. Under this transformation, the rank of the nonterminals increases by at most 1, and the multiplicity (or fan-out) of the grammar increases by at most the maximal rank of the lexical symbols; in particular, the multiplicity does not increase when all lexical symbols have rank 0. Multiple context-free tree grammars have the same tree generating power as multi-component tree adjoining grammars (provided the latter can use a root-marker). Moreover, every multi-component tree adjoining grammar that is finitely ambiguous can be lexicalized. Multiple context-free tree grammars have the same string generating power as multiple context-free (string) grammars and admit polynomial-time parsing algorithms. A tree language can be generated by a multiple context-free tree grammar if and only if it is the image of a regular tree language under a deterministic finite-copying macro tree transducer. Multiple context-free tree grammars can be used as a synchronous translation device. Comment: 78 pages, 13 figures
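
The abstract relates these tree grammars to multiple context-free (string) grammars, in which a nonterminal of multiplicity (fan-out) k derives a k-tuple of strings rather than a single string. The toy 2-MCFG below is a textbook-style illustration, not taken from the paper, and it shows neither tree generation nor lexicalization: because the recursive rule extends both components of A in lockstep, the grammar generates {aⁿbⁿcⁿdⁿ}, a language that no context-free grammar generates. The function names `derive_A` and `derive_S` are hypothetical.

```python
def derive_A(depth):
    """Yield the 2-tuples derivable for nonterminal A (fan-out 2), using at
    most `depth` applications of the recursive rule:
        A(ε, ε)
        A(a x1 b, c x2 d) <- A(x1, x2)
    """
    x1, x2 = "", ""          # base rule A(ε, ε)
    yield x1, x2
    for _ in range(depth):   # recursive rule wraps both components at once
        x1, x2 = "a" + x1 + "b", "c" + x2 + "d"
        yield x1, x2


def derive_S(depth):
    """S(x1 x2) <- A(x1, x2): S (fan-out 1) concatenates A's two components."""
    return [x1 + x2 for x1, x2 in derive_A(depth)]
```

Here `derive_S(2)` returns `["", "abcd", "aabbccdd"]`. Since this grammar has only one derivation of each length, enumeration is trivially deterministic; a general MCFG would need a proper parser, such as the polynomial-time algorithms the abstract mentions.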

    Proceedings

    Proceedings of the NODALIDA 2011 Workshop Constraint Grammar Applications. Editors: Eckhard Bick, Kristin Hagen, Kaili Müürisep, Trond Trosterud. NEALT Proceedings Series, Vol. 14 (2011), vi+69 pp. © 2011 The editors and contributors. Published by Northern European Association for Language Technology (NEALT) http://omilia.uio.no/nealt . Electronically published at Tartu University Library (Estonia) http://hdl.handle.net/10062/19231