Approximate Matching in ACSM Dissimilarity Measure
The paper introduces a new patch-based dissimilarity measure for image comparison that employs an approximation strategy. It extends the Average Common Sub-matrix (ACSM) measure, which computes the exact dissimilarity between images. In the exact method, the dissimilarity between two images is obtained from the average area of the largest square sub-matrices common to both images, where the extracted sub-matrices are matched exactly, pixel by pixel. As an extension, the proposed measure computes an approximate match between the sub-matrices, obtained by omitting a controlled number of pixels at a given column offset inside the sub-matrices. The proposed dissimilarity measure is extensively compared with other well-known approximate methods for image comparison from the state of the art. Experiments demonstrate the superiority of the proposed approximate measure in terms of execution time with respect to the exact method, and in terms of retrieval precision with respect to the other state-of-the-art methods.
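To make the approximation concrete, the following is a minimal Python sketch of an ACSM-style comparison with approximate block matching. The images are assumed to be 2-D integer arrays; the naive search, the function names, and the normalization into [0, 1] are illustrative choices, not the paper's exact formulation.

    import numpy as np

    def approx_equal(block_a, block_b, skip_offset):
        # Compare equally sized square blocks, omitting every
        # `skip_offset`-th column; offsets below 2 mean an exact match.
        mask = np.ones(block_a.shape[1], dtype=bool)
        if skip_offset > 1:
            mask[skip_offset - 1::skip_offset] = False
        return np.array_equal(block_a[:, mask], block_b[:, mask])

    def largest_common_square(a, b, i, j, skip_offset):
        # Largest k such that the k x k block of `a` anchored at (i, j)
        # approximately matches some k x k block of `b` (naive scan).
        best, max_k = 0, min(a.shape[0] - i, a.shape[1] - j)
        for k in range(1, max_k + 1):
            block = a[i:i + k, j:j + k]
            found = any(approx_equal(block, b[r:r + k, c:c + k], skip_offset)
                        for r in range(b.shape[0] - k + 1)
                        for c in range(b.shape[1] - k + 1))
            if not found:
                break
            best = k
        return best

    def acsm_dissimilarity(a, b, skip_offset=0):
        # Average area of the largest common square sub-matrices,
        # rescaled into a dissimilarity score in [0, 1].
        areas = [largest_common_square(a, b, i, j, skip_offset) ** 2
                 for i in range(a.shape[0]) for j in range(a.shape[1])]
        return 1.0 - (sum(areas) / len(areas)) / min(a.shape) ** 2

Omitting columns shrinks the number of pixel comparisons per block and lets slightly differing regions still count as common, which is the source of both the speed-up and the robustness reported above.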
Compressed Pattern Matching
This Bachelor's thesis addresses pattern matching. Its main objective is to describe selected algorithms and data structures that are used in practice for pattern matching on both uncompressed and compressed data. An integral part of the thesis is the implementation of a selected data structure. Compression algorithms based on the Burrows-Wheeler transform are currently in wide use, and the FM-Index data structure depends on this transform. The FM-Index is implemented in the C# programming language and subjected to experiments covering the speed of pattern matching and space requirements. Search speed is compared against classical pattern-matching algorithms; space requirements are compared across input data formats and different FM-Index configurations. Finally, the results and findings from the experiments on the implemented data structure are presented.
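As a rough illustration of what the implemented structure does, here is a Python sketch of the FM-Index backward search over a Burrows-Wheeler transformed text. The rank function is a naive O(n) scan kept for brevity; a practical FM-Index, such as the C# implementation in the thesis, uses sampled occurrence tables or wavelet trees instead.

    def bwt(text):
        # Burrows-Wheeler transform via sorted rotations ('$' is the sentinel).
        text += "$"
        rotations = sorted(text[i:] + text[:i] for i in range(len(text)))
        return "".join(rot[-1] for rot in rotations)

    def fm_count(bwt_string, pattern):
        # Count pattern occurrences by FM-Index backward search.
        sorted_bwt = sorted(bwt_string)
        # C[c] = number of characters in the text strictly smaller than c.
        C = {c: sorted_bwt.index(c) for c in set(bwt_string)}
        def rank(c, i):  # occurrences of c in bwt_string[:i]
            return bwt_string[:i].count(c)
        lo, hi = 0, len(bwt_string)
        for c in reversed(pattern):  # extend the match one character at a time
            if c not in C:
                return 0
            lo, hi = C[c] + rank(c, lo), C[c] + rank(c, hi)
            if lo >= hi:
                return 0
        return hi - lo

    print(fm_count(bwt("abracadabra"), "abra"))  # -> 2

The search touches only the transformed text, never the original, which is why FM-Index-based compressed pattern matching can answer queries directly on compressed data.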
An Investigation of GeoBase Mission Data Set Design, Implementation, and Usage within Air Force Civil Engineer Electrical and Utilities Work Centers
In 2001, the Office of the Civil Engineer, Installation and Logistics, Headquarters, United States Air Force (ILE) identified Civil Engineer Squadrons as the central point of contact for all base-level mapping requirements and activities. To update mapping methods and procedures, ILE has put in place a program called GeoBase, which uses private-sector Geographic Information Systems (GIS) technology as its foundation. In its current state, GeoBase uses the concept of a Common Installation Picture (CIP) to describe the goal of a consolidated visual that integrates the many layers of mapping information. The CIP visual is formed from a collection of data elements termed Mission Data Sets (MDS). There are varieties of MDS, each of which contains data specific to a particular geospatial domain. The research uses a case study methodology to investigate how MDS are designed, implemented, and used within four USAF Civil Engineer Squadron Electrical and Utilities Work Centers. The research findings indicate that MDS design and implementation processes vary across organizations; however, fundamental similarities do exist, and an evolution and maturation of these processes is evident. As for MDS usage within the Electrical and Utilities Work Centers, it was found that usage is increasing; however, data quality is a limiting factor. Based on the research findings, recommendations are put forward for improving wing/base-level GeoBase program design, implementation, and usage.
Efficient Storage of Genomic Sequences in High Performance Computing Systems
In this dissertation, we address the challenges of genomic data storage in high performance computing systems. In particular, we focus on developing a referential compression approach for Next Generation Sequencing data stored in FASTQ format files. The amount of genomic data available for researchers to process has increased exponentially, bringing enormous challenges for its efficient storage and transmission. General-purpose compressors offer only limited performance on genomic data, hence the need for specialized compression solutions. Two trends have emerged to harness the particular properties of genomic data: non-referential and referential compression. Non-referential compressors offer higher compression ratios than general-purpose compressors, but still below what a referential compressor could theoretically achieve. However, the effectiveness of referential compression depends on selecting a good reference and on having enough computing resources available. This thesis presents one of the first referential compressors for FASTQ files. We first present a comprehensive analytical and experimental evaluation of the most relevant tools for raw genomic data compression, which led us to identify the main needs and opportunities in this field. As a consequence, we propose a novel compression workflow that aims at improving the usability of referential compressors. Subsequently, we discuss the implementation and performance evaluation of the core of the proposed workflow: a referential compressor for reads in FASTQ format that combines local read-to-reference alignments with a specialized binary-encoding strategy. The compression algorithm, named UdeACompress, achieved very competitive compression ratios compared to the best compressors in the current state of the art, while showing reasonable execution times and memory use. In particular, UdeACompress outperformed all competitors when compressing long reads, typical of the newest sequencing technologies. Finally, we study the main aspects of data-level parallelism in the Intel AVX-512 architecture in order to develop a parallel version of the UdeACompress algorithms and reduce the runtime. Through SIMD programming, we managed to significantly accelerate the main bottleneck found in UdeACompress: suffix array construction.
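The core idea of referential compression can be sketched in a few lines of Python: align each read against the reference and store only the alignment position plus the mismatching bases. This is a deliberately naive, mismatch-only illustration; UdeACompress's aligner and binary-encoding strategy are considerably more elaborate.

    def compress_read(read, reference):
        # Pick the alignment position with the fewest mismatches and keep
        # only (position, mismatch list) instead of the raw bases.
        best_pos, best_mism = 0, None
        for pos in range(len(reference) - len(read) + 1):
            mism = [(i, base) for i, base in enumerate(read)
                    if reference[pos + i] != base]
            if best_mism is None or len(mism) < len(best_mism):
                best_pos, best_mism = pos, mism
        return best_pos, best_mism

    def decompress_read(pos, mismatches, read_len, reference):
        # Rebuild the read from the reference plus its stored edits.
        bases = list(reference[pos:pos + read_len])
        for i, base in mismatches:
            bases[i] = base
        return "".join(bases)

    ref = "ACGTACGTTGCAACGT"
    read = "ACGTTGCC"
    pos, mism = compress_read(read, ref)
    assert decompress_read(pos, mism, len(read), ref) == read

When reads closely resemble the reference, the (position, edits) pair is far smaller than the read itself, which is where the gap between referential and non-referential compression ratios comes from.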
Optimal Parsing for Dictionary Text Compression
Dictionary-based compression algorithms include a parsing strategy to transform the input text into a sequence of dictionary phrases. Given a text, this process is usually not unique and, for compression purposes, it makes sense to find one of the possible parsings that minimizes the final compression ratio. This is the parsing problem. An optimal parsing is a parsing strategy, or a parsing algorithm, that solves the parsing problem taking into account all the constraints of a compression algorithm or of a class of homogeneous compression algorithms. Compression algorithm constraints are, for instance, the dictionary itself, i.e. the dynamic set of available phrases, and how much a phrase weighs on the compressed text, i.e. the number of bits of which the codeword representing that phrase is composed, also denoted as the encoding cost of a dictionary pointer.

In more than 30 years of history of dictionary-based text compression, while plenty of algorithms, variants, and extensions have appeared, and while the dictionary approach to text compression has become one of the most appreciated and widely used in almost all storage and communication processes, only a few optimal parsing algorithms have been presented. Many compression algorithms still lack optimality of their parsing or, at least, a proof of optimality. This happens because there is no general model of the parsing problem that includes all dictionary-based algorithms, and because the existing optimal parsing algorithms work under overly restrictive hypotheses.

This work focuses on the parsing problem and presents both a general model for dictionary-based text compression, called the Dictionary-Symbolwise Text Compression theory, and a general parsing algorithm that is proved to be optimal under some realistic hypotheses. This algorithm is called Dictionary-Symbolwise Flexible Parsing, and it covers almost all known cases of dictionary-based text compression algorithms, together with the large class of their variants where the text is decomposed into a sequence of symbols and dictionary phrases.

In this work we further consider the case of a free mixture of a dictionary compressor and a symbolwise compressor. Our Dictionary-Symbolwise Flexible Parsing covers this case as well. We indeed have an optimal parsing algorithm for dictionary-symbolwise compression where the dictionary is prefix-closed and the cost of encoding a dictionary pointer is variable. The symbolwise compressor is any classical one that works in linear time, as many common variable-length encoders do. Our algorithm works under the assumption that a special graph, described in what follows, is well defined. Even if this condition is not satisfied, the same method can be used to obtain almost-optimal parses. In detail, when the dictionary is LZ78-like, we show how to implement our algorithm in linear time. When the dictionary is LZ77-like, our algorithm can be implemented in O(n log n) time. Both have O(n) space complexity.
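To give a concrete flavor of the graph-based view, the following Python sketch casts parsing as a shortest path in a DAG whose nodes are text positions and whose edges are dictionary phrases or single symbols. It assumes a static dictionary and fixed edge costs; the Dictionary-Symbolwise Flexible Parsing of this work handles dynamic dictionaries and variable pointer costs on top of the same principle.

    def optimal_parse(text, dictionary, phrase_cost, symbol_cost):
        # Node i stands for the prefix text[:i]; an edge i -> j exists for
        # every dictionary phrase (or single symbol) matching text[i:j].
        n, INF = len(text), float("inf")
        cost = [0] + [INF] * n   # cheapest encoding of text[:i]
        back = [None] * (n + 1)  # edge used to reach node i
        for i in range(n):
            if cost[i] == INF:
                continue
            if cost[i] + symbol_cost < cost[i + 1]:     # symbolwise edge
                cost[i + 1], back[i + 1] = cost[i] + symbol_cost, text[i]
            for phrase in dictionary:                   # dictionary edges
                j = i + len(phrase)
                if text[i:j] == phrase and cost[i] + phrase_cost < cost[j]:
                    cost[j], back[j] = cost[i] + phrase_cost, phrase
        parse, i = [], n
        while i > 0:             # walk the cheapest path backwards
            parse.append(back[i])
            i -= len(back[i])
        return parse[::-1], cost[n]

    parse, bits = optimal_parse("ababab", {"ab", "abab"},
                                phrase_cost=12, symbol_cost=9)
    print(parse, bits)  # -> ['ab', 'abab'] 24

A greedy parser that always takes the longest available phrase can miss the cheapest path; the dynamic program above, being a single-source shortest path over a DAG, cannot.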
Although the main aim of this work is theoretical, some experimental results are introduced to underline the practical effects of parsing optimality on compression performance and to show how the compression ratio can be improved by building Dictionary-Symbolwise extensions of known algorithms. Finally, some more detailed experiments are reported in a dedicated appendix.
A comparison of exact string search algorithms for deep packet inspection
Every day, computer networks throughout the world face a constant onslaught of attacks. To combat these, network administrators are forced to employ a multitude of mitigating measures. Devices such as firewalls and Intrusion Detection Systems are prevalent today and employ extensive Deep Packet Inspection to scrutinise each piece of network traffic. Systems such as these usually require specialised hardware to meet the demand imposed by high-throughput networks. Hardware like this is extremely expensive and singular in its function. It is with this in mind that string search algorithms are introduced. These algorithms have been proven to perform well when searching through large volumes of text and may be able to perform equally well in the context of Deep Packet Inspection. String search algorithms are designed to match a single pattern to a substring of a given piece of text, which is not unlike the heuristics employed by traditional Deep Packet Inspection systems. This research compares the performance of a large number of string search algorithms during packet processing. Deep Packet Inspection places stringent restrictions on the reliability and speed of the algorithms due to increased performance pressures. A test system had to be designed in order to test the string search algorithms properly in the context of Deep Packet Inspection. The system allowed for precise and repeatable tests of each algorithm and for their subsequent comparison. Of the algorithms tested, the Horspool and Quick Search algorithms posted the best results for both speed and reliability; the Not So Naive and Rabin-Karp algorithms were slowest overall.
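For reference, the skip heuristic that makes Horspool fast in such scans can be sketched in a few lines of Python; the variable names and the example payload are illustrative.

    def horspool(pattern, text):
        # Boyer-Moore-Horspool: on each step, shift the window by the
        # distance from the rightmost pattern occurrence of the character
        # currently under the window's last position.
        m, n = len(pattern), len(text)
        if m == 0 or n < m:
            return []
        shift = {pattern[i]: m - 1 - i for i in range(m - 1)}
        matches, i = [], 0
        while i <= n - m:
            if text[i:i + m] == pattern:
                matches.append(i)
            i += shift.get(text[i + m - 1], m)  # default: jump a full window
        return matches

    payload = "...GET /index.html HTTP/1.1..."
    print(horspool("GET /", payload))  # -> [3]

The ability to jump up to a full pattern length per step, while inspecting only one character on most mismatches, is a plausible reason for its strong showing in the tests above.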
GPU-Acceleration of In-Memory Data Analytics
Hardware advances strongly influence database system design. The flattening of CPU core speeds makes many-core accelerators, such as GPUs, a vital alternative to explore for processing the ever-increasing amounts of data. GPUs have a significantly higher degree of parallelism than multi-core CPUs, but their cores are simpler. As a result, they do not face the power constraints limiting the parallelism of CPUs. Their trade-off, however, is the increased implementation complexity. This thesis adapts and redesigns data analytics operators to better exploit the GPU's special memory and threading model. Due to the increasing memory capacity, as well as the user's need for fast interaction with the data, we focus on in-memory analytics.
Our techniques span different steps of the data processing pipeline: (1) data preprocessing, (2) query compilation, and (3) algorithmic optimization of the operators. Our data preprocessing techniques adapt the data layout for numeric and string columns to maximize the achieved GPU memory bandwidth. Our query compilation techniques compute the optimal execution plan for conjunctive filters. We formulate memory divergence for string matching algorithms and suggest how to eliminate it. Finally, we parallelize decompression algorithms in our compression framework Gompresso to fit more data into the limited GPU memory. Gompresso achieves high speed-ups on GPUs over state-of-the-art multi-core CPU libraries and is suitable for any massively parallel processor.
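As a CPU-side illustration of the layout idea only (not the thesis's actual GPU code; the column names are hypothetical), the following Python sketch converts a row-major array of records into per-column contiguous buffers, the transformation that lets adjacent GPU threads issue coalesced loads.

    import numpy as np

    # Row-major "array of structs": one record per row.
    records = np.array([(1, 9.99, 3), (2, 4.50, 7), (3, 1.25, 2)],
                       dtype=[("key", "i4"), ("price", "f4"), ("qty", "i4")])

    # "Struct of arrays": one contiguous buffer per column, so thread t
    # reading element t sits right next to thread t+1 in memory and the
    # hardware can coalesce the loads into wide transactions.
    columns = {name: np.ascontiguousarray(records[name])
               for name in records.dtype.names}

    # A filter now streams one tight buffer instead of striding over records.
    mask = columns["price"] < 5.0
    print(columns["key"][mask])  # -> [2 3]

String columns need an extra step (e.g., padding or offset arrays) before they stream equally well, which is part of what the preprocessing techniques above address.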