38 research outputs found

    FingerPrint Based Duplicate Detection in Streamed Data

In computing, duplicate data detection refers to identifying duplicate copies of repeating data. Identifying duplicate items in streamed data and eliminating them before storage is a complex job. This paper proposes a novel data structure for duplicate detection using a variant of the stable Bloom filter, named the FingerPrint Stable Bloom Filter (FP-SBF). The proposed approach uses a counting Bloom filter with fingerprint bits, along with an optimization mechanism, for duplicate detection. FP-SBF uses d-left hashing, which reduces computational time and decreases both false positives and false negatives. FP-SBF can process unbounded data in a single pass, using k hash functions, and successfully differentiates between duplicate and distinct elements in O(k+1) time, independent of the size of the incoming data. The performance of FP-SBF has been compared with various Bloom filters used for stream-data duplicate detection, and it has been shown theoretically and experimentally that the proposed approach efficiently detects duplicates in streaming data with lower memory requirements.
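As a rough illustration of the ingredients named above (counting cells, fingerprint bits, and the decaying behavior of a stable Bloom filter), here is a minimal Python sketch; the cell count, fingerprint width, decay rate, and method names are our own illustrative choices, not the paper's exact FP-SBF construction:

```python
import hashlib
import random

class FingerprintSBF:
    """Simplified stable Bloom filter whose cells carry short fingerprints."""

    def __init__(self, m=1 << 20, k=4, fp_bits=8, max_count=7, decays=10):
        self.m = m                        # number of cells (illustrative)
        self.k = k                        # hash functions per item
        self.fp_mask = (1 << fp_bits) - 1
        self.max_count = max_count        # counter saturation value
        self.decays = decays              # cells decremented per insertion
        self.counts = [0] * m
        self.fps = [0] * m

    def _cells(self, item):
        for i in range(self.k):
            h = hashlib.blake2b(item.encode(), digest_size=8, salt=bytes([i])).digest()
            yield int.from_bytes(h, "big") % self.m

    def _fingerprint(self, item):
        h = hashlib.blake2b(item.encode(), digest_size=2, salt=b"fp").digest()
        return int.from_bytes(h, "big") & self.fp_mask

    def seen_and_add(self, item):
        """Return True if item looks like a duplicate, then record it."""
        fp = self._fingerprint(item)
        cells = list(self._cells(item))
        duplicate = all(self.counts[c] > 0 and self.fps[c] == fp for c in cells)
        # Age out old items: decrement a few random counters (the "stable" part).
        for _ in range(self.decays):
            j = random.randrange(self.m)
            if self.counts[j] > 0:
                self.counts[j] -= 1
        for c in cells:
            self.counts[c] = self.max_count
            self.fps[c] = fp
        return duplicate
```

A stream can then be deduplicated in one pass, e.g. `unique = [x for x in stream if not f.seen_and_add(x)]`.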

    Recency Queries with Succinct Representation

In the context of the sliding-window set membership problem, and caching policies that require knowledge of item recency, we formalize the problem of Recency on a stream. Informally, the query asks, "when was the last time I saw item x?" Existing structures, such as hash tables, can support a recency query by augmenting item occurrences with timestamps. To support recency queries on a window of W items, this might require Θ(W log W) bits. We propose a succinct data structure for Recency. By combining sliding-window dictionaries in a hierarchical structure, and careful design of the underlying hash tables, we achieve a data structure that returns a (1+ε)-approximation to the recency of every item in O(log(εW)) time, in only (1+o(1))(1+ε)(B + W log(ε^(-1))) bits. Here, B is the information-theoretic lower bound on the number of bits for a set of size W, in a universe of cardinality N.
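The Θ(W log W)-bit baseline that the abstract improves on is easy to make concrete: a sliding-window dictionary storing, for each of up to W distinct items, the timestamp of its last occurrence. A small Python sketch under our own naming, purely to fix ideas (the paper's succinct hierarchical structure is not reproduced here):

```python
from collections import OrderedDict

class RecencyWindow:
    """Exact recency over a sliding window of the last W items (baseline)."""

    def __init__(self, window):
        self.window = window
        self.t = 0
        self.last_seen = OrderedDict()   # item -> timestamp of last occurrence

    def observe(self, item):
        self.t += 1
        self.last_seen.pop(item, None)
        self.last_seen[item] = self.t    # insertion order doubles as recency order
        # Evict items that fell out of the window.
        while self.last_seen and next(iter(self.last_seen.values())) <= self.t - self.window:
            self.last_seen.popitem(last=False)

    def recency(self, item):
        """How many items ago was `item` last seen? None if outside the window."""
        ts = self.last_seen.get(item)
        return None if ts is None else self.t - ts
```

Each of up to W keys carries a Θ(log W)-bit timestamp, hence the Θ(W log W) space; the paper's contribution is to replace these exact timestamps with a hierarchy of sliding-window dictionaries, answering within a factor of 1+ε in far fewer bits.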

    On the Exploration of FPGAs and High-Level Synthesis Capabilities on Multi-Gigabit-per-Second Networks

Unpublished doctoral thesis, defended at the Universidad Autónoma de Madrid, Escuela Politécnica Superior, Departamento de Tecnología Electrónica y de las Comunicaciones. Date of defense: 24-01-2020.

Traffic on computer networks has grown exponentially in recent years. Both links and communication equipment have had to adapt in order to provide the minimum quality of service required for current needs. However, in recent years several factors have prevented commercial off-the-shelf hardware from keeping pace with this growth rate; consequently, some software tools are struggling to fulfill their tasks, especially at speeds higher than 10 Gbit/s. For this reason, Field Programmable Gate Arrays (FPGAs) have arisen as an alternative for addressing the most demanding tasks without the need to design an application-specific integrated circuit, thanks in part to their flexibility and in-field programmability. Needless to say, developing for FPGAs is well known to be complex. Therefore, in this thesis we tackle the use of FPGAs and High-Level Synthesis (HLS) languages in the context of computer networks. We focus on the use of FPGAs both in computer-network monitoring applications and in reliable data transmission at very high speed. We also intend to shed light on the use of high-level synthesis languages and to boost FPGA applicability in the context of computer networks, so as to reduce development time and design complexity.

The first part of the thesis is devoted to computer-network monitoring. We take advantage of FPGA determinism to implement active monitoring probes, which consist of sending a train of packets that is later used to obtain network parameters. In this case, determinism is key to reducing the uncertainty of the measurements. The results of our experiments show that the FPGA implementations are much more accurate and precise than their software counterparts. At the same time, the FPGA implementation is scalable in terms of network speed: 1, 10 and 100 Gbit/s. In the context of passive monitoring, we leverage the FPGA architecture to implement algorithms able to thin ciphered traffic as well as to remove duplicate packets. These two algorithms are straightforward in principle, but very useful in helping traditional network analysis tools cope with their task at higher network speeds: processing ciphered traffic brings little benefit, while processing duplicate traffic negatively impacts the performance of software tools.

The second part of the thesis is devoted to the TCP/IP stack. We explore the current limitations of reliable data transmission using standard software at very high speed. Nowadays the network is becoming an important bottleneck, particularly in data centers, and the deployment of 100 Gbit/s network links has begun in recent years. Consequently, there has been increased scrutiny of how networking functionality is deployed, and a wide range of approaches are being explored to increase the efficiency of networks and to tailor their functionality to the actual needs of the application at hand. FPGAs arise as a strong alternative for dealing with this problem. For this reason, in this thesis we develop Limago, an FPGA-based open-source implementation of a TCP/IP stack operating at 100 Gbit/s for Xilinx FPGAs. Limago not only provides unprecedented throughput, but also a latency at least fifteen times lower than that of software implementations. Limago is a key contribution to some of the hottest topics at the moment, for instance network-attached FPGAs and in-network data processing.
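The duplicate-packet removal mentioned in the first part can be prototyped in software before any FPGA design work: hash each packet's invariant bytes and drop a packet whose digest was seen within a recent window. A hedged Python sketch under our own assumptions (digest size, window size); it does not reflect the thesis' actual hardware pipeline:

```python
import hashlib
from collections import OrderedDict

class PacketDeduplicator:
    """Drop packets whose digest matches one seen in the last `window` packets."""

    def __init__(self, window=4096):
        self.window = window
        self.recent = OrderedDict()      # digest -> None, oldest first

    def is_duplicate(self, packet: bytes) -> bool:
        # Hash the packet bytes; a real deduplicator would first mask
        # mutable header fields such as TTL and checksums.
        digest = hashlib.blake2b(packet, digest_size=8).digest()
        if digest in self.recent:
            self.recent.move_to_end(digest)
            return True
        self.recent[digest] = None
        if len(self.recent) > self.window:
            self.recent.popitem(last=False)
        return False
```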

    Correlation of affiliate performance against web evaluation metrics

Affiliate advertising is changing the way that people do business online. Retailers now offer incentives to third-party publishers for advertising goods and services on their behalf in order to capture more of the market. Online advertising spending has already overtaken that of traditional advertising in all other channels in the UK and is slated to do so worldwide as well [1]. In this highly competitive industry, the livelihood of a publisher is intrinsically linked to their web site's performance. Understanding the strengths and weaknesses of a web site is fundamental to improving its quality and performance. However, the definition of performance may vary between different business sectors, or even between different sites in the same sector. In the affiliate advertising industry, the measure of performance is generally linked to the fulfilment of advertising campaign goals, which often equates to the ability to generate revenue or brand awareness for the retailer. This thesis aims to explore the correlation of web site evaluation metrics with the business performance of a company within an affiliate advertising programme. In order to explore this correlation, an automated evaluation framework was built to examine a set of web sites from an active online advertising campaign. A purpose-built web crawler examined over 4,000 sites from the advertising campaign in approximately 260 hours, gathering data to be used in the examination of URL similarity, URL relevance, search engine visibility, broken links, broken images and presence on a blacklist. The gathered data was used to calculate a score for each of the features, which were then combined to create an overall HealthScore for each publisher. The evaluated metrics focus on the categories of domain and content analysis. From the performance data available, it was possible to calculate the business performance of the 234 active publishers using the number of sales and click-throughs they achieved. When the HealthScores and performance data were compared, the HealthScore was able to predict a publisher's performance with 59% accuracy.
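The scoring step lends itself to a simple sketch: each evaluated feature yields a score in [0, 1], and the scores are combined into one HealthScore. The feature names below follow the abstract, but the weights and the weighted-average rule are hypothetical illustrations, not the thesis' actual formula:

```python
# Hypothetical weights: the thesis does not state its exact combination rule here.
WEIGHTS = {
    "url_similarity": 0.15,
    "url_relevance": 0.20,
    "search_visibility": 0.25,
    "broken_links": 0.15,
    "broken_images": 0.10,
    "not_blacklisted": 0.15,
}

def health_score(features: dict) -> float:
    """Weighted average of per-feature scores, each normalized to [0, 1]."""
    total = sum(WEIGHTS.values())
    return sum(WEIGHTS[name] * features.get(name, 0.0) for name in WEIGHTS) / total

site = {"url_similarity": 0.8, "url_relevance": 0.9, "search_visibility": 0.4,
        "broken_links": 1.0, "broken_images": 0.95, "not_blacklisted": 1.0}
print(f"HealthScore: {health_score(site):.2f}")
```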

    LIPIcs, Volume 244, ESA 2022, Complete Volume

LIPIcs, Volume 244, ESA 2022, Complete Volume.

    Shortest Route at Dynamic Location with Node Combination-Dijkstra Algorithm

Abstract— Online transportation has become a basic requirement of the general public, supporting everyday activities such as traveling to work, to school, or to tourist sights. Public transportation services compete to provide the best service so that consumers feel comfortable using what is offered; one key task is finding the shortest route when picking up a buyer or delivering to a destination. The Node Combination method can minimize memory usage and is more optimal than A* and Ant Colony for shortest-route search in the style of Dijkstra's algorithm, but it cannot store the history of nodes that have been passed. The node combination algorithm is therefore well suited to finding the shortest distance, but not the shortest route. This paper modifies the node combination algorithm to solve the problem of finding the shortest route at a dynamic location obtained from the transport fleet, by displaying the nodes with the shortest distance, and implements it in a geographic information system in the form of a map to facilitate use of the system. Keywords— Shortest Path, Dijkstra Algorithm, Node Combination, Dynamic Location
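For contrast with the node-combination variant, the textbook Dijkstra computation on a weighted graph can be written compactly; this sketch returns shortest distances only (echoing the abstract's point that distance, not route history, is the easy part), on a hypothetical example graph:

```python
import heapq

def dijkstra(graph, source):
    """Shortest distances from `source` in a weighted graph given as
    {node: [(neighbor, weight), ...]}."""
    dist = {source: 0}
    heap = [(0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue                      # stale queue entry, skip it
        for v, w in graph.get(u, []):
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist

# Hypothetical road segment: driver at A, possible pickup points B and C.
graph = {"A": [("B", 4), ("C", 1)], "C": [("B", 2)], "B": []}
print(dijkstra(graph, "A"))   # {'A': 0, 'C': 1, 'B': 3}
```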

    Particle Physics Reference Library

This second open access volume of the handbook series deals with detectors, large experimental facilities and data handling, both for accelerator- and non-accelerator-based experiments. It also covers applications in medicine and the life sciences. A joint CERN-Springer initiative, the "Particle Physics Reference Library" provides revised and updated contributions based on previously published material in the well-known Landolt-Boernstein series on particle physics, accelerators and detectors (volumes 21A, B1, B2, C), which took stock of the field approximately one decade ago. Central to this new initiative is publication under full open access.

    Imaging studies of peripheral nerve regeneration induced by porous collagen biomaterials

Thesis: Ph.D., Massachusetts Institute of Technology, Department of Mechanical Engineering, 2013. Cataloged from the PDF version of the thesis. Includes bibliographical references.

There is an urgent need to develop treatments for inducing regeneration in injured organs. Porous collagen-based scaffolds have been utilized clinically to induce regeneration in skin and peripheral nerves; however, there is still no complete explanation of the underlying mechanism. This thesis utilizes advanced microscopy to study the expression of contractile cell phenotypes during wound healing, a phenotype believed to significantly affect the final outcome.

The first part develops an efficient pipeline for processing challenging spectral fluorescence microscopy images. Images are segmented into regions of objects by refining the outcome of a pixel-wise model-selection classifier with an efficient Markov random field model. The methods of this part are utilized by the parts that follow. The second part extends the image-informatics methodology to studying signal transduction networks in cells interacting with 3D matrices. The methodology is applied in a pilot study of TGFβ signal transduction by the SMAD pathway in fibroblasts seeded in porous collagen scaffolds. Preliminary analysis suggests that the differential effect of TGFβ1 and TGFβ3 on cells could be attributed to the "non-canonical" SMAD1 and SMAD5. The third part is an ex vivo imaging study of peripheral nerve regeneration, which focuses on the formation of a capsule of contractile cells around transected rat sciatic nerves grafted with collagen scaffolds, 1 or 2 weeks post-injury. It follows a recent study that highlights an inverse relationship between the quality of the newly formed nerve tissue and the size of the contractile cell capsule 9 weeks post-injury. Results suggest that "active" biomaterials result in a significantly thinner capsule already 1 week post-injury. The fourth part describes a novel method for quantifying the surface chemistry of 3D matrices. The method is an in situ binding assay that utilizes fluorescently labeled recombinant proteins that emulate the receptor, and is applied to quantify the density of ligands for integrins α1β1 and α2β1 on the surface of porous collagen scaffolds. Results provide estimates for the density of ligands on "active" and "inactive" scaffolds and demonstrate that chemical crosslinking can affect the surface chemistry of biomaterials, and therefore the way cells sense and respond to the material.

by Dimitrios S. Tzeranis. Ph.D.
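The MRF refinement described in the first part is commonly realized with iterated conditional modes (ICM) over a Potts smoothness prior; the NumPy sketch below shows that general recipe under our own assumptions (cost convention, 4-neighborhood, parameter values), not the thesis' exact model:

```python
import numpy as np

def icm_refine(unary, beta=1.0, iters=5):
    """Refine per-pixel class scores with a Potts-model MRF via ICM.

    unary: (H, W, C) array of per-class costs from a pixel classifier
    (lower = better). beta weighs disagreement with the 4-neighborhood.
    """
    labels = unary.argmin(axis=2)
    H, W, C = unary.shape
    for _ in range(iters):
        for y in range(H):
            for x in range(W):
                costs = unary[y, x].astype(float)
                for dy, dx in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                    ny, nx = y + dy, x + dx
                    if 0 <= ny < H and 0 <= nx < W:
                        # Potts penalty: pay beta for each neighbor whose
                        # current label differs from the candidate label.
                        costs += beta * (np.arange(C) != labels[ny, nx])
                labels[y, x] = costs.argmin()
    return labels
```

Starting from the classifier's per-pixel argmin, each ICM sweep re-labels pixels to the class that best trades off classifier confidence against agreement with neighbors, which smooths speckle in the segmentation.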