    RMESH Algorithms for Parallel String Matching

    The string matching problem has received much attention over the years due to its importance in applications such as text/file comparison, DNA sequencing, search engines, and spelling correction. Especially with search engines handling the tremendous amount of textual information on the World Wide Web, and with research on DNA sequencing, this problem deserves special attention, and any algorithmic or hardware improvement that speeds up the process will benefit these important applications. In this paper, we present three algorithms for string matching on reconfigurable mesh (RMESH) architectures. Given a text T of length n and a pattern P of length m, the first algorithm finds the exact matches of P in T in O(1) time on a 2-dimensional RMESH of size (n-m+1) × m. The second algorithm finds the approximate matches of P in T in O(k) time on a 2D RMESH, where k is the maximum edit distance allowed between P and a matching substring of T. The third algorithm allows only the replacement operation in the calculation of the edit distance and finds approximate matches of P in T in constant time on a 3D RMESH.
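The abstract does not spell out the mesh algorithms, so the following is only a sequential Python sketch under stated assumptions, not the paper's RMESH method. It builds the comparison matrix that an (n-m+1) × m mesh could hold with one processor per cell (cell (i, j) compares T[i+j] with P[j]) and then reduces each row: an all-true row is an exact match, and a row with at most k false cells is a replacement-only (Hamming) match. For the general edit-distance case it swaps in the classical sequential Sellers dynamic program as a reference definition; k is treated as the allowed edit threshold, and all function names are hypothetical.

```python
def comparison_matrix(text: str, pattern: str) -> list[list[bool]]:
    """Row i, column j holds T[i + j] == P[j], the value one cell of a
    hypothetical (n-m+1) x m mesh would compute locally."""
    n, m = len(text), len(pattern)
    return [[text[i + j] == pattern[j] for j in range(m)] for i in range(n - m + 1)]


def exact_matches(text: str, pattern: str) -> list[int]:
    """Start positions i whose whole row is True (a row-wise AND)."""
    return [i for i, row in enumerate(comparison_matrix(text, pattern)) if all(row)]


def k_mismatch_matches(text: str, pattern: str, k: int) -> list[int]:
    """Replacement-only approximate matching: start positions i whose row
    has at most k False cells, i.e. Hamming distance <= k."""
    return [i for i, row in enumerate(comparison_matrix(text, pattern))
            if row.count(False) <= k]


def k_edit_matches(text: str, pattern: str, k: int) -> list[int]:
    """Sellers-style DP (sequential reference, not the RMESH algorithm):
    end positions j (exclusive) in T where some substring ending at j is
    within edit distance k of P."""
    m = len(pattern)
    prev = list(range(m + 1))  # column j = 0: D[i][0] = i
    ends = []
    for j, c in enumerate(text, start=1):
        cur = [0] * (m + 1)  # D[0][j] = 0: a match may start anywhere in T
        for i in range(1, m + 1):
            cost = 0 if pattern[i - 1] == c else 1
            cur[i] = min(prev[i - 1] + cost,  # match or replacement
                         prev[i] + 1,         # extra text character
                         cur[i - 1] + 1)      # missing pattern character
        if cur[m] <= k:
            ends.append(j)
        prev = cur
    return ends
```

For example, `exact_matches("abracadabra", "abra")` returns `[0, 7]`. The row reductions here are plain loops, so the sketch runs in O((n-m+1)·m) sequential time rather than the constant or O(k) time claimed for the mesh.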

    A Frame Work for Parallel String Matching- A Computational Approach with Omega Model

    Nowadays, parallel string matching attracts many researchers because of its importance in information retrieval systems. Although the problem is easy to state and many simple algorithms perform very well in practice, numerous works have been published on the subject and research remains very active. In this paper we propose an omega parallel computing model for parallel string matching. Experimental results show that, on a multiprocessor system, the omega-model implementation of the proposed parallel string matching algorithm can reduce string matching time by more than 40%.

    Ant colony optimization on runtime reconfigurable architectures

    Multiple Biological Sequence Alignment: Scoring Functions, Algorithms, and Evaluations

    Aligning multiple biological sequences such as protein sequences or DNA/RNA sequences is a fundamental task in bioinformatics and sequence analysis. These alignments may contain invaluable information that scientists need to predict the sequences' structures, determine the evolutionary relationships between them, or discover drug-like compounds that can bind to the sequences. Unfortunately, multiple sequence alignment (MSA) is NP-complete. In addition, the lack of a reliable scoring method makes it very hard to align the sequences reliably and to evaluate the alignment outcomes. In this dissertation, we have designed a new scoring method for use in multiple sequence alignment. Our scoring method encapsulates stereochemical properties of sequence residues and their substitution probabilities into a tree-structured scoring scheme. This new technique provides a reliable scoring scheme with low computational complexity. In addition to the new scoring scheme, we have designed an overlapping sequence clustering algorithm for use in our three new multiple sequence alignment algorithms. One of our alignment algorithms uses a dynamic weighted guidance tree to perform multiple sequence alignment in a progressive fashion. The use of a dynamic weighted tree allows errors in the early alignment stages to be corrected in subsequent stages. The other two algorithms use sequence knowledge bases and sequence consistency to produce biologically meaningful sequence alignments. To improve the speed of multiple sequence alignment, we have developed a parallel algorithm that can be deployed on reconfigurable computer models. Analytically, our parallel algorithm is the fastest progressive multiple sequence alignment algorithm.
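Progressive MSA repeatedly aligns sequences (or groups of sequences) pairwise in the order given by a guide tree. As a generic illustration only, the sketch below shows the classical Needleman-Wunsch global alignment of two sequences in Python. It is not the dissertation's method, and its flat match/mismatch/gap scores are hypothetical placeholders for the tree-structured scoring scheme the abstract describes.

```python
def global_alignment(a: str, b: str, match: int = 1, mismatch: int = -1,
                     gap: int = -2) -> tuple[int, str, str]:
    """Needleman-Wunsch: return (best score, aligned a, aligned b).
    The scoring parameters are illustrative placeholders."""
    n, m = len(a), len(b)
    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,  # align a[i-1] with b[j-1]
                              score[i - 1][j] + gap,      # gap in b
                              score[i][j - 1] + gap)      # gap in a

    # Trace back from the bottom-right cell to recover one optimal alignment.
    out_a: list[str] = []
    out_b: list[str] = []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            sub = match if a[i - 1] == b[j - 1] else mismatch
            if score[i][j] == score[i - 1][j - 1] + sub:
                out_a.append(a[i - 1])
                out_b.append(b[j - 1])
                i -= 1
                j -= 1
                continue
        if i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))
```

Replacing the `sub` computation with a residue-aware lookup is where a richer scoring scheme would plug in; a progressive aligner would then apply this step along the guide tree.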

    A Modular Approach to Adaptive Reactive Streaming Systems

    The latest generations of FPGA devices offer large resource counts that provide the headroom to implement large-scale, complex systems. However, designers face increasing challenges, not just because of sheer size and complexity, but also in effectively harnessing the flexibility and programmability of the FPGA. A central issue is the need to integrate modules from diverse sources to promote modular design and reuse. Further, the capability to perform dynamic partial reconfiguration (DPR) of FPGA devices means that implemented systems can be made reconfigurable, allowing components to be changed during operation. However, DPR typically requires low-level planning of the system implementation, adding to the design challenge. This dissertation presents ReShape: a high-level approach to designing systems by interconnecting modules, which gives a ‘plug and play’ look and feel to the designer, is supported by tools that carry out implementation and verification functions, and extends to support system reconfiguration during operation. The emphasis is on the inter-module connections and on abstracting the communication patterns that are typical between modules: for example, the streaming of data that is common in many FPGA-based systems, or the reading and writing of data to and from memory modules. ShapeUp is also presented as the static precursor to ReShape. In both, the details of wiring and signaling are hidden from view via metadata associated with individual modules. ReShape allows system reconfiguration at the module level by supporting type checking of replacement modules and by managing the overall system implementation via metadata associated with its FPGA floorplan. The methodology and tools have been implemented in a prototype for a broad domain-specific setting (networking systems) and have been validated on real telecommunications design projects.

    Neural Representations of Concepts and Texts for Biomedical Information Retrieval

    Information retrieval (IR) methods are an indispensable tool in the current landscape of exponentially increasing textual data, especially on the Web. A typical IR task involves fetching and ranking a set of documents (from a large corpus) by relevance to a user's query, which is often expressed as a short phrase. IR methods are the backbone of modern search engines, where additional system-level aspects including fault tolerance, scale, user interfaces, and session maintenance are also addressed. In addition to fetching documents, modern search systems may also identify snippets within the documents that are potentially most relevant to the input query. Furthermore, current systems may maintain preprocessed structured knowledge derived from textual data as so-called knowledge graphs, so that certain types of queries posed as questions can be parsed as such; the response can then be one or more named entities instead of a ranked list of documents (e.g., "What diseases are associated with EGFR mutations?"). This refined setup is often termed question answering (QA) in the IR and natural language processing (NLP) communities. In biomedicine and healthcare, specialized corpora are often at play, including research articles by scientists, clinical notes generated by healthcare professionals, consumer forums for specific conditions (e.g., cancer survivors network), and clinical trial protocols (e.g., www.clinicaltrials.gov). Biomedical IR is specialized because its query types and text variations differ from those of general Web documents. For example, scientific articles are more formal with longer sentences, whereas clinical notes tend to have less grammatical conformity and are rife with abbreviations. There is also a mismatch between the vocabulary of consumers and the jargon of domain experts and professionals.
Queries also differ and can range from simple phrases (e.g., "COVID-19 symptoms") to more complex, implicitly fielded queries (e.g., "chemotherapy regimens for stage IV lung cancer patients with ALK mutations"). Hence, developing methods for different configurations (corpus, query type, user type) needs more deliberate attention in biomedical IR. Representations of documents and queries are at the core of IR methods, and retrieval methodology involves coming up with these representations and matching queries with documents based on them. Traditional IR systems index documents by keyword (the so-called inverted index) and match query phrases against the document index. This keyword-based matching ignores the semantics of texts (synonymy at the lexeme level and entailment at the phrase, clause, and sentence levels), which has led to dimensionality reduction methods such as latent semantic indexing; these generally have scale-related concerns and do not address similarity at the sentence level. Since the resurgence of neural network methods in NLP, the IR field has also moved to incorporate advances in neural networks into current IR methods. This dissertation presents four specific methodological efforts toward improving biomedical IR. Neural methods always begin with dense embeddings for words and concepts to overcome the limitations of one-hot encoding in traditional NLP/IR. In the first effort, we present a new neural pre-training approach to jointly learn word and concept embeddings for use in downstream applications. In the second study, we present a joint neural model for two essential subtasks of information extraction (IE): named entity recognition (NER) and entity normalization (EN). Our method detects biomedical concept phrases in texts and links them to the corresponding semantic types and entity codes.
These first two studies provide essential tools to model textual representations as compositions of both surface forms (lexical units) and high-level concepts, with potential downstream use in QA. In the third effort, we present a document reranking model that can help surface documents likely to contain answers (e.g., factoids, lists) to a question in a QA task. The model is essentially a sentence-matching neural network that learns the relevance of a candidate answer sentence to the given question, parametrized with a bilinear map. In the fourth effort, we present another document reranking approach tailored to precision medicine use cases. It combines neural query-document matching and faceted text summarization. The main distinction of this effort from previous ones is its pivot from a query manipulation setup to transforming candidate documents into pseudo-queries via neural text summarization. Overall, our contributions constitute nontrivial advances in biomedical IR using neural representations of concepts and texts.
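Two techniques named in this abstract can be illustrated with small Python sketches, assuming simple lowercase whitespace tokenization and hand-picked toy vectors: a keyword inverted index with conjunctive matching (which, as the abstract notes, ignores synonymy), and a bilinear relevance score q^T W s of the kind used to parametrize the sentence-matching reranker. In the actual model the vectors and the matrix W would be learned by a neural network; here they are hypothetical inputs, and none of the function names come from the dissertation.

```python
from collections import defaultdict


def build_inverted_index(docs: dict[str, str]) -> dict[str, set[str]]:
    """Map each lowercase whitespace token to the set of document ids containing it."""
    index: dict[str, set[str]] = defaultdict(set)
    for doc_id, text in docs.items():
        for token in text.lower().split():
            index[token].add(doc_id)
    return index


def keyword_search(index: dict[str, set[str]], query: str) -> set[str]:
    """Conjunctive (AND) keyword matching: ids of documents containing every
    query token. Purely lexical, so synonyms are never matched."""
    postings = [index.get(token, set()) for token in query.lower().split()]
    return set.intersection(*postings) if postings else set()


def bilinear_score(q: list[float], W: list[list[float]], s: list[float]) -> float:
    """Relevance of candidate sentence vector s to question vector q,
    computed as the bilinear form q^T W s."""
    return sum(q[i] * W[i][j] * s[j]
               for i in range(len(q)) for j in range(len(s)))
```

For example, with documents "EGFR mutations in lung cancer" and "ALK mutations and lung cancer therapy", the query "lung mutations" retrieves both, while "carcinoma" retrieves neither despite the related meaning. With W set to the identity, the bilinear score reduces to a plain dot product; a learned W lets the model weight interactions between different embedding dimensions.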

    Traçage de systèmes embarqués hétérogènes (Tracing Heterogeneous Embedded Systems)

    Being able to analyze and understand the interactions between all the components of a heterogeneous embedded system is essential for detecting bugs, finding the causes of latencies, and optimizing resources. Vendors often ship proprietary analysis solutions directly. However, such solutions are rarely sufficient: they sometimes require the system to be paused, are not suited to more than about ten cores, or degrade performance too much. Tracing is a widespread technique that records events, each associated with a timestamp, at chosen points in an application. Tracing a system gives a deep understanding of the system as a whole, with nanosecond-level granularity. Among other things, this makes it possible to debug complete systems and diagnose performance issues, on a single machine as well as on a distributed system. Nevertheless, although tracing is widely used in conventional systems, it is still marginal on embedded systems, whose technical characteristics are often very different.
The goal of this work is to show how to overcome the difficulties introduced by heterogeneous embedded systems (specialized processors, no operating system, little available memory, exotic architectures…) and obtain a generic tracing solution for such devices. We hope to demonstrate that every heterogeneous embedded platform can be traced with the same tools and the same output format, thus generalizing tracing on these devices and easing developers' work. To do so, we show how barectf, a Python tool that generates C code providing CTF tracepoints for applications running without an operating system (bare-metal), allows the tracing of virtually any platform. The Parallella board and TI's Keystone 2 system-on-chip are our two experimental platforms. We then show how trace synchronization can be generalized to such platforms to allow the analysis of traces from heterogeneous many-core environments. Finally, we demonstrate through a case study that the proposed methods and solutions are valid and well suited to these platforms, bringing them a generic, portable, and efficient tracing solution.

    Elements of Ion Linear Accelerators, Calm in the Resonances, Other Tales

    The main part of this book, Elements of Linear Accelerators, outlines in Part 1 a framework for the design, simulation, optimization, and analysis of non-relativistic linear accelerator focusing and accelerating channels in which space charge is an important factor. Part 1 is the most important part of the book; grasping the framework is essential to fully understand and appreciate the elements within it and the myriad application details of the following Parts. The treatment concentrates on all linacs, large or small, intended for high-intensity, very-low-beam-loss, factory-type applications. The Radio-Frequency Quadrupole (RFQ) is developed in particular detail as a representative and the most complicated linac form (from dc to bunched and accelerated beam), extending to the practical design of long, high-energy linacs, including space-charge resonances and beam halo formation, and some challenges for future work. A practical method is also presented for designing Alternating-Phase-Focused (APF) linacs with long sequences and high energy gain. Full open-source software is available. The following part, Calm in the Resonances and Other Tales, contains eyewitness accounts of nearly 60 years of participation in accelerator technology. (September 2023) The LINACS codes are released at no cost and, as always, with fully open-source coding (p. 2 and Ch. 19.10). Comment: 652 pages; several hundred figures, all images (there is no data in the figures).