
    Indices and Applications in High-Throughput Sequencing

    Recent advances in sequencing technology make it possible to produce billions of base pairs per day in the form of reads of length 100 bp and longer, and current developments promise the personal $1,000 genome within a few years. The analysis of these unprecedented amounts of data demands efficient data structures and algorithms. One such data structure is the substring index, which represents all substrings, or all substrings up to a certain length, contained in a given text. In this thesis we propose three substring indices, which we extend to be applicable to millions of sequences. We devise internal and external memory construction algorithms and a uniform framework for accessing the generalized suffix tree. Additionally, we propose several index-based applications, e.g. exact and approximate pattern matching and several repeat search algorithms. Second, we present the read mapping tool RazerS, which aligns millions of single or paired-end reads of arbitrary lengths to their potential genomic origin using either Hamming or edit distance. Our tool can work either losslessly or, at higher speed, with a user-defined loss rate. Given the loss rate, we present a novel approach that guarantees not to lose more reads than specified. This enables the user to adapt to the problem at hand and provides a seamless tradeoff between sensitivity and running time. We compare RazerS with other state-of-the-art read mappers and show that it has the highest sensitivity and comparable performance on various real-world datasets. Finally, we propose a general approach to frequency-based string mining, which has many applications, e.g. in contrast data mining. Our contribution is a novel and lightweight algorithm that is faster and uses less memory than the best available algorithms. We show its applicability for mining multiple databases with a variety of frequency constraints. In particular, we use the notion of entropy from information theory to generalize the emerging substring mining problem to multiple databases. To demonstrate the improvement of our algorithm, we compare it to recent approaches in real-world experiments on various string domains, e.g. natural language, DNA, and protein sequences.
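    The substring index idea at the heart of this thesis can be illustrated with a minimal suffix array in Python (a sketch only; the indices proposed in the thesis are generalized to millions of sequences and constructed far more efficiently): sorting all suffix start positions lets exact pattern matching proceed by binary search.

        # Minimal suffix-array sketch: not the thesis's construction algorithm,
        # just an illustration of index-based exact pattern matching.
        def build_suffix_array(text):
            # Naive O(n^2 log n) construction; practical indices build this in O(n).
            return sorted(range(len(text)), key=lambda i: text[i:])

        def find_occurrences(text, sa, pattern):
            # Binary search for the leftmost suffix starting with the pattern.
            lo, hi = 0, len(sa)
            while lo < hi:
                mid = (lo + hi) // 2
                if text[sa[mid]:sa[mid] + len(pattern)] < pattern:
                    lo = mid + 1
                else:
                    hi = mid
            # Collect the contiguous run of suffixes that begin with the pattern.
            hits = []
            while lo < len(sa) and text[sa[lo]:sa[lo] + len(pattern)] == pattern:
                hits.append(sa[lo])
                lo += 1
            return sorted(hits)

        genome = "ACGTACGTGACG"
        sa = build_suffix_array(genome)
        print(find_occurrences(genome, sa, "ACG"))  # [0, 4, 9]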

    Towards a table top quantum computer

    Thesis (S.M.)--Massachusetts Institute of Technology, School of Architecture and Planning, Program in Media Arts and Sciences, 1999. Includes bibliographical references (leaves 135-139). In the early 1990s, quantum computing proved to be an enticing theoretical possibility but an extremely difficult experimental challenge. Two advances have made experimental quantum computing demonstrable: quantum error correction, and bulk, thermal quantum computing using nuclear magnetic resonance (NMR). Simple algorithms have been implemented on large, commercial NMR spectrometers that are expensive and cumbersome. The goal of this project is to construct a table-top quantum computer that can match and eventually exceed the performance of commercial machines. This computer should be an inexpensive, easy-to-use machine that can be considered more a computer than its "supercomputer" counterparts. For this thesis, the goal is to develop a low-cost, table-top quantum computer that is capable of implementing the simple quantum algorithms demonstrated thus far in the community, but is also amenable to the many scaling issues of practical quantum computing. Understanding these scaling issues requires developing a theoretical understanding of the signal enhancement techniques and fundamental noise sources of this powerful but delicate system. Complementary to quantum computing, this high-performance but low-cost NMR machine will be useful for a number of medical, low-cost sensing, and tagging applications due to the unique properties of NMR: the ability to sense and manipulate the information content of materials on macroscopic and microscopic scales. Yael G. Maguire. S.M.

    LASSO – an observatorium for the dynamic selection, analysis and comparison of software

    Mining software repositories at the scale of 'big code' (i.e., big data) is a challenging activity. As well as finding a suitable software corpus and making it programmatically accessible through an index or database, researchers and practitioners have to establish an efficient analysis infrastructure and precisely define the metrics and data extraction approaches to be applied. Moreover, for analysis results to be generalisable, these tasks have to be applied at a large enough scale to have statistical significance, and if they are to be repeatable, the artefacts need to be carefully maintained and curated over time. Today, however, a lot of this work is still performed by human beings on a case-by-case basis, with the level of effort involved often having a significant negative impact on the generalisability and repeatability of studies, and thus on their overall scientific value. The general-purpose 'code mining' repositories and infrastructures that have emerged in recent years represent a significant step forward because they automate many software mining tasks at an ultra-large scale and allow researchers and practitioners to focus on defining the questions they would like to explore at an abstract level. However, they are currently limited to static analysis and data extraction techniques, and thus cannot support (i.e., help automate) any studies which involve the execution of software systems. This includes experimental validations of techniques and tools that hypothesise about the behaviour (i.e., semantics) of software, or data analysis and extraction techniques that aim to measure dynamic properties of software. In this thesis a platform called LASSO (Large-Scale Software Observatorium) is introduced that overcomes this limitation by automating the collection of dynamic (i.e., execution-based) information about software alongside static information. It features a single, ultra-large-scale corpus of executable software systems, created by amalgamating existing Open Source software repositories, and a dedicated DSL for defining abstract selection and analysis pipelines. Its key innovations are integrated capabilities for searching for and selecting software systems based on their exhibited behaviour, and an 'arena' that allows their responses to software tests to be compared in a purely data-driven way. We call the platform a 'software observatorium' since it is a place where the behaviour of large numbers of software systems can be observed, analysed and compared.
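    The arena concept of comparing software purely by observed behaviour can be sketched in a few lines of Python (a hypothetical illustration only, not LASSO's DSL or API): execute every candidate implementation on the same test inputs and cluster the candidates by their output signatures.

        # Hypothetical arena-style comparison: candidates are grouped purely by
        # the outputs (or exceptions) they produce on shared test inputs.
        from collections import defaultdict

        def arena(candidates, test_inputs):
            clusters = defaultdict(list)
            for name, func in candidates.items():
                signature = []
                for args in test_inputs:
                    try:
                        signature.append(repr(func(*args)))
                    except Exception as exc:  # a crash is observable behaviour too
                        signature.append(type(exc).__name__)
                clusters[tuple(signature)].append(name)
            return dict(clusters)

        # Three retrieved "implementations"; two behave identically on the tests.
        candidates = {
            "impl_a": lambda xs: sorted(xs),
            "impl_b": lambda xs: sorted(xs, reverse=True),
            "impl_c": lambda xs: list(sorted(xs)),
        }
        tests = [([3, 1, 2],), ([],)]
        for outputs, names in arena(candidates, tests).items():
            print(names, "->", outputs)  # impl_a and impl_c cluster together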

    Multiparty session types for dynamic verification of distributed systems

    In large-scale distributed systems, each application is realised through interactions among distributed components. To guarantee safe communication (no deadlocks and no communication mismatches), we need programming languages and tools that structure, manage, and policy-check these interactions. Multiparty session types (MPST), a typing discipline for structured interactions between communicating processes, offer a promising approach. To date, however, session type applications have been limited to static verification, which is not always feasible and is often restrictive in terms of programming API and specifying policies. This thesis investigates the design and implementation of a runtime verification framework, ensuring conformance between programs and specifications. Specifications are written in Scribble, a protocol description language formally founded on MPST. The central idea of the approach is a dynamic monitor, which takes the form of a communicating finite state machine, automatically generated from Scribble specifications, and a communication runtime stipulating a message format. We extend and apply Scribble-based runtime verification in manifold ways. First, we implement a Python library equipped with session primitives and a verification runtime. We integrate the library into a large cyber-infrastructure project for oceanography. Second, we examine multiple communication patterns, which reveal and motivate two novel extensions: asynchronous interrupts for the verification of exception handling behaviours, and time constraints for the enforcement of real-time protocols. Third, we apply the verification framework to actor programming by augmenting an actor library in Python with protocol annotations. For both implementations, measurements show that Scribble-based dynamic checking incurs minimal overhead and allows expressive specifications. Finally, we explore a static analysis of Scribble specifications to efficiently compute a safe global state from which a monitored system of interacting processes can be recovered after a failure. We provide an implementation of a verification framework for recovery in Erlang. Benchmarks show that our recovery strategy outperforms Erlang's built-in static recovery strategy on a number of use cases.
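    The dynamic monitor at the centre of this approach can be illustrated with a small finite state machine in Python (an illustrative sketch with assumed message labels, not code generated by the Scribble toolchain): each message either follows a protocol transition or is reported as a violation.

        # Sketch of an MPST-style dynamic monitor: a finite state machine
        # derived from a protocol checks each observed message label.
        class Monitor:
            def __init__(self, transitions, start):
                self.transitions = transitions  # (state, label) -> next state
                self.state = start

            def check(self, label):
                key = (self.state, label)
                if key not in self.transitions:
                    raise RuntimeError(
                        f"protocol violation: {label!r} in state {self.state!r}")
                self.state = self.transitions[key]

        # Hypothetical protocol: a request, zero or more data messages, then done.
        fsm = {
            ("start", "request"): "serving",
            ("serving", "data"): "serving",
            ("serving", "done"): "end",
        }
        m = Monitor(fsm, "start")
        for msg in ["request", "data", "data", "done"]:
            m.check(msg)  # this trace conforms to the protocol
        print("final state:", m.state)  # final state: end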

    The ciao prolog system

    Ciao is a public domain, next-generation multi-paradigm programming environment with a unique set of features: Ciao offers a complete Prolog system, supporting ISO-Prolog, but its novel modular design allows both restricting and extending the language. As a result, it allows working with fully declarative subsets of Prolog and also extending these subsets (or ISO-Prolog) both syntactically and semantically. Most importantly, these restrictions and extensions can be activated separately on each program module, so that several extensions can coexist in the same application for different modules. Ciao also supports (through such extensions) programming with functions, higher-order constructs (with predicate abstractions), constraints, and objects, as well as feature terms (records), persistence, several control rules (breadth-first search, iterative deepening, ...), concurrency (threads/engines), a good base for distributed execution (agents), and parallel execution. Libraries also support WWW programming, sockets, external interfaces (C, Java, Tcl/Tk, relational databases, etc.), and more. Ciao offers support for programming in the large with a robust module/object system, module-based separate/incremental compilation (automatic, with no need for makefiles), an assertion language for declaring (optional) program properties (including types and modes, but also determinacy, non-failure, cost, etc.), automatic static inference and static/dynamic checking of such assertions, etc. Ciao also offers support for programming in the small, producing small executables (including only those builtins used by the program), and support for writing scripts in Prolog. The Ciao programming environment includes a classical top level and a rich Emacs interface with an embeddable source-level debugger and a number of execution visualization tools. The Ciao compiler (which can be run outside the top-level shell) generates several forms of architecture-independent and stand-alone executables, which run with a speed, efficiency, and executable size that are very competitive with other commercial and academic Prolog/CLP systems. Library modules can be compiled into compact bytecode or C source files, and linked statically, dynamically, or autoloaded. The novel modular design of Ciao enables, in addition to modular program development, effective global program analysis and static debugging and optimization via source-to-source program transformation. These tasks are performed by the Ciao preprocessor (ciaopp, distributed separately). The Ciao programming environment also includes lpdoc, an automatic documentation generator for LP/CLP programs. It processes Prolog files adorned with (Ciao) assertions and machine-readable comments and generates manuals in many formats, including PostScript, PDF, texinfo, info, HTML, and man pages, as well as on-line help, ASCII README files, and entries for indices of manuals (info, WWW, ...), and it maintains WWW distribution sites.
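    As a loose analogy for the dynamic side of Ciao's assertion checking (written in Python rather than Ciao's actual assertion syntax, which is not reproduced here), optional pre- and postconditions can be declared separately from a procedure and verified at call time:

        # Loose Python analogy, not Ciao syntax: declared pre/postconditions
        # checked dynamically, in the spirit of Ciao's assertion checking.
        def assertion(pre, post):
            def wrap(func):
                def checked(*args):
                    assert pre(*args), f"precondition of {func.__name__} failed"
                    result = func(*args)
                    assert post(result), f"postcondition of {func.__name__} failed"
                    return result
                return checked
            return wrap

        @assertion(pre=lambda xs: all(isinstance(x, int) for x in xs),
                   post=lambda ys: ys == sorted(ys))
        def int_sort(xs):
            return sorted(xs)

        print(int_sort([3, 1, 2]))  # both checks pass: [1, 2, 3]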

    LIPIcs, Volume 244, ESA 2022, Complete Volume

    LIPIcs, Volume 244, ESA 2022, Complete Volume