1,269 research outputs found

Handling Massive N-Gram Datasets Efficiently

Author: Pibiri Giulio Ermanno
Venturini Rossano
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 25/06/2018
Field of study

This paper deals with the two fundamental problems concerning the handling of large n-gram language models: indexing, that is, compressing the n-gram strings and associated satellite data without compromising their retrieval speed; and estimation, that is, computing the probability distribution of the strings from a large textual source. Regarding the problem of indexing, we describe compressed, exact and lossless data structures that achieve, at the same time, high space reductions and no time degradation with respect to state-of-the-art solutions and related software packages. In particular, we present a compressed trie data structure in which each word following a context of fixed length k, i.e., its preceding k words, is encoded as an integer whose value is proportional to the number of words that follow such context. Since the number of words following a given context is typically very small in natural languages, we lower the space of representation to compression levels that were never achieved before. Despite the significant savings in space, our technique introduces a negligible penalty at query time. Regarding the problem of estimation, we present a novel algorithm for estimating modified Kneser-Ney language models, which have emerged as the de-facto choice for language modeling in both academia and industry, thanks to their relatively low perplexity performance. Estimating such models from large textual sources poses the challenge of devising algorithms that make a parsimonious use of the disk. The state-of-the-art algorithm uses three sorting steps in external memory: we show an improved construction that requires only one sorting step by exploiting the properties of the extracted n-gram strings. With an extensive experimental analysis performed on billions of n-grams, we show an average improvement of 4.5X on the total running time of the state-of-the-art approach.
Comment: Published in ACM Transactions on Information Systems (TOIS), February 2019, Article No: 2
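
The context-based encoding described in this abstract can be illustrated with a minimal sketch. It assumes a context length of k = 1 and a toy list of n-grams; the function names `build_remap` and `remap` are hypothetical and are not taken from the paper or its software. Each word is replaced by its rank among the distinct words observed after the same context, so the resulting integer is bounded by the number of successors of that context rather than by the vocabulary size.

```python
from collections import defaultdict

def build_remap(ngrams, k=1):
    """For each k-word context, rank the distinct words that follow it.

    Illustrative assumption: ranks follow sorted word order, which only
    serves to make the mapping deterministic.
    """
    successors = defaultdict(set)
    for gram in ngrams:
        context, word = tuple(gram[-1 - k:-1]), gram[-1]
        successors[context].add(word)
    return {
        ctx: {w: rank for rank, w in enumerate(sorted(words))}
        for ctx, words in successors.items()
    }

def remap(gram, table, k=1):
    """Return the small integer that stands for the last word of `gram`."""
    context, word = tuple(gram[-1 - k:-1]), gram[-1]
    return table[context][word]

ngrams = [("the", "cat"), ("the", "dog"), ("a", "cat")]
table = build_remap(ngrams)
codes = [remap(g, table) for g in ngrams]
```

Because few words follow any given context in natural language, the remapped integers stay small and are cheap to store with an integer compressor.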

arXiv.org e-Print Archive

Archivio della Ricerca - Università di Pisa

Archivio istituzionale della ricerca - Università degli Studi di Venezia Ca' Foscari

NEUROSURGERY ENTHUSIASTIC WOMEN SOCIETY

IP lookup with low memory requirement and fast update

Author: Berger Michael Stübert
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2003
Field of study

Crossref

Online Research Database In Technology

Efficient hardware architecture for fast IP address lookup

Author: Chan KS
Liu C
Pao D
Wu A
Yeung L
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2002
Field of study

A multigigabit IP router may receive several million packets per second from each input link. For each packet, the router needs to find the longest matching prefix in the forwarding table in order to determine the packet's next hop. In this paper, we present an efficient hardware solution for the IP address lookup problem. We model the address lookup problem as a search problem on a binary trie. The binary trie is partitioned into four levels of fixed-size 255-node subtrees. We employ a hierarchical indexing structure to facilitate direct access to subtrees in a given level. It is estimated that a forwarding table with 40K prefixes will consume 2.5 Mbytes of memory. The search is implemented using a hardware pipeline with a minimum cycle of 12.5 ns if the memory modules are implemented using SRAM. A distinguishing feature of our design is that forwarding table entries are not replicated in the data structure. Hence, table updates can be done in constant time with only a few memory accesses.
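
Longest-prefix matching on a binary trie, the model this abstract starts from, can be sketched in software as follows. This is a plain pointer-based trie, not the paper's partitioned hardware design; the names `TrieNode`, `insert` and `longest_prefix_match` are hypothetical, and prefixes are written as bit strings for readability. For reference, a complete binary subtree of depth 8 has 2^8 - 1 = 255 nodes, which is consistent with four such levels covering a 32-bit address.

```python
class TrieNode:
    """One node of a binary trie; next_hop is None when no prefix ends here."""
    __slots__ = ("children", "next_hop")

    def __init__(self):
        self.children = [None, None]
        self.next_hop = None

def insert(root, prefix_bits, next_hop):
    """Store next_hop at the node reached by following prefix_bits."""
    node = root
    for bit in prefix_bits:
        b = int(bit)
        if node.children[b] is None:
            node.children[b] = TrieNode()
        node = node.children[b]
    node.next_hop = next_hop

def longest_prefix_match(root, address_bits):
    """Walk the trie along the address, remembering the last next hop seen."""
    node, best = root, root.next_hop
    for bit in address_bits:
        node = node.children[int(bit)]
        if node is None:
            break
        if node.next_hop is not None:
            best = node.next_hop
    return best

root = TrieNode()
insert(root, "10", "A")
insert(root, "1011", "B")
hop = longest_prefix_match(root, "10110000")
```

Each prefix is stored exactly once, so an update touches only the nodes on one root-to-leaf path.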

HKU Scholars Hub

Reducing Router Forwarding Table Size Using Aggregation and Caching

Author: Liu Yaoqing
Publication venue: University of Memphis Digital Commons
Publication date: 24/07/2013
Field of study

The fast growth of global routing table size has been causing concerns that the Forwarding Information Base (FIB) will not be able to fit in existing routers' expensive line-card memory, and upgrades will lead to a higher cost for network operators and customers. FIB Aggregation, a technique that merges multiple FIB entries into one, is probably the most practical solution since it is a software solution local to a router, and does not require any changes to routing protocols or network operations. While previous work on FIB aggregation mostly focuses on reducing table size, this work focuses on algorithms that can update compressed FIBs quickly and incrementally. Quick updates are critical to routers because they have very limited time to process routing updates without impacting packet delivery performance. We have designed three algorithms: FIFA-S for the smallest table size, FIFA-T for the shortest running time, and FIFA-H for both small tables and short running time, and operators can use the one best suited to their needs. These algorithms significantly improve over existing work in terms of reducing routers' computation overhead and limiting impact on the forwarding plane while maintaining a good compression ratio. Another potential solution is to install only the most popular FIB entries into the fast memory (e.g., an FIB cache), while storing the complete FIB in slow memory. In this paper, we propose an effective FIB caching scheme that achieves a considerably higher hit ratio than previous approaches while preventing the cache-hiding problem. Our experimental results using data traffic from a regional network show that with only 20K prefixes in the cache (5.36% of the actual FIB size), the hit ratio of our scheme is higher than 99.95%. Our scheme can also efficiently handle cache misses, cache replacement and routing updates.
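
The general idea of FIB aggregation, merging several entries into one, can be illustrated with a minimal sketch. It is not FIFA-S, FIFA-T or FIFA-H, whose details the abstract does not give; it applies only the simplest rule, assumed here for illustration, that two sibling prefixes sharing a next hop can be replaced by their parent. The function name `aggregate` and the bit-string representation of prefixes are hypothetical.

```python
def aggregate(fib):
    """Merge sibling prefixes with the same next hop into their parent.

    fib maps bit-string prefixes to next hops. When both children of a
    parent are present they cover every address under it, so the parent
    can take over their shared next hop without changing any lookup.
    """
    table = dict(fib)
    changed = True
    while changed:
        changed = False
        # Visit longer prefixes first so merges can cascade upwards.
        for prefix in sorted(table, key=len, reverse=True):
            if not prefix or prefix not in table:
                continue  # default route, or already merged in this pass
            sibling = prefix[:-1] + ("1" if prefix[-1] == "0" else "0")
            if table.get(sibling) == table[prefix]:
                table[prefix[:-1]] = table[prefix]
                del table[prefix], table[sibling]
                changed = True
    return table

small = aggregate({"000": "A", "001": "A", "01": "A", "1": "B"})
```

This sketch rebuilds the table from scratch on every call; handling routing updates incrementally, which is the focus of the work above, is left out.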

University of Memphis Digital Commons

Content-Centric Networking at Internet Scale through The Integration of Name Resolution and Routing

Author: Afanasyev A.
Gritter M.
Raju J.
Wahlisch M.
Wang Y.
Xylomenos G.
Yuan H.
Publication venue
Publication date: 16/08/2016
Field of study

We introduce CCN-RAMP (Routing to Anchors Matching Prefixes), a new approach to content-centric networking. CCN-RAMP offers all the advantages of Named Data Networking (NDN) and Content-Centric Networking (CCNx) but eliminates the need to either use Pending Interest Tables (PIT) or look up large Forwarding Information Bases (FIB) listing name prefixes in order to forward Interests. CCN-RAMP uses small forwarding tables listing anonymous sources of Interests and the locations of name prefixes. Such tables are immune to Interest-flooding attacks and are smaller than the FIBs used to list IP address ranges in the Internet. We show that no forwarding loops can occur with CCN-RAMP, and that Interests flow over the same routes that NDN and CCNx would maintain using large FIBs. The results of simulation experiments comparing NDN with CCN-RAMP based on ndnSIM show that CCN-RAMP requires forwarding state that is orders of magnitude smaller than what NDN requires, and attains even better performance.

arXiv.org e-Print Archive

Crossref

eScholarship - University of California

Design and Evaluation of Packet Classification Systems, Doctoral Dissertation, December 2006

Author: Song Haoyu
Publication venue: Washington University Open Scholarship
Publication date: 01/01/2006
Field of study

Although many algorithms and architectures have been proposed, the design of efficient packet classification systems remains a challenging problem. The diversity of filter specifications, the scale of filter sets, and the throughput requirements of high speed networks all contribute to the difficulty. We need to review the algorithms from a high-level point-of-view in order to advance the study. This level of understanding can lead to significant performance improvements. In this dissertation, we evaluate several existing algorithms and present several new algorithms as well. The previous evaluation results for existing algorithms are not convincing because they have not been done in a consistent way. To resolve this issue, an objective evaluation platform needs to be developed. We implement and evaluate several representative algorithms with uniform criteria. The source code and the evaluation results are both published on a web-site to provide the research community a benchmark for impartial and thorough algorithm evaluations. We propose several new algorithms to deal with the different variations of the packet classification problem. They are: (1) the Shape Shifting Trie algorithm for longest prefix matching, used in IP lookups or as a building block for general packet classification algorithms; (2) the Fast Hash Table lookup algorithm used for exact flow match; (3) the longest prefix matching algorithm using hash tables and tries, used in IP lookups or packet classification algorithms; (4) the 2D coarse-grained tuple-space search algorithm with controlled filter expansion, used for two-dimensional packet classification or as a building block for general packet classification algorithms; (5) the Adaptive Binary Cutting algorithm used for general multi-dimensional packet classification. In addition to the algorithmic solutions, we also consider the TCAM hardware solution. In particular, we address the TCAM filter update problem for general packet classification and provide an efficient algorithm. Building upon the previous work, these algorithms significantly improve the performance of packet classification systems and set a solid foundation for further study.

Washington University St. Louis: Open Scholarship

Compressing dictionaries of strings

Author: LANDOLFI LORENZO
Publication venue: 'Pisa University Press'
Publication date: 27/04/2015
Field of study

The aim of this work is to develop a data structure capable of storing a set of strings in compressed form while supporting access to, and prefix search over, any string in the set. The notion of string will be formally defined in this work, but it is enough to think of a string as a stream of characters or a variable-length datum. We will prove that the data structure devised in our work is able to search prefixes of the stored strings very efficiently, hence giving a performant solution to one of the most discussed problems of our age. In the discussion of our data structure, particular emphasis will be given to both space and time efficiency, and a tradeoff between the two will be constantly sought. To understand how important string-based data structures are, think about modern search engines and social networks: they must continuously store and process immense streams of data, which are mainly strings, while the output of such processing must be available in a few milliseconds so as not to try the patience of the user. Space efficiency is one of the main concerns in this kind of problem. In order to satisfy real-time latency bounds, the largest possible amount of data must be stored in the highest levels of the memory hierarchy. Moreover, data compression saves money because it reduces the amount of physical memory needed to store abstract data, and this is particularly important since storage is the main source of expenditure in modern systems.
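
A compressed string dictionary with prefix search can be illustrated with a minimal sketch. The abstract does not describe the thesis's actual data structure, so the sketch swaps in front coding, a standard technique for sorted string sets, purely as an assumption; the names `front_encode`, `front_decode` and `prefix_search` are hypothetical. For simplicity the prefix search runs over the decoded sorted list rather than over the encoded form.

```python
from bisect import bisect_left

def front_encode(sorted_strings):
    """Store each string as (shared prefix length with predecessor, remaining suffix)."""
    encoded, prev = [], ""
    for s in sorted_strings:
        lcp = 0
        while lcp < min(len(s), len(prev)) and s[lcp] == prev[lcp]:
            lcp += 1
        encoded.append((lcp, s[lcp:]))
        prev = s
    return encoded

def front_decode(encoded):
    """Rebuild the original sorted strings from their front-coded form."""
    out, prev = [], ""
    for lcp, suffix in encoded:
        prev = prev[:lcp] + suffix
        out.append(prev)
    return out

def prefix_search(sorted_strings, prefix):
    """Return all strings starting with prefix, using binary search for the first."""
    start = bisect_left(sorted_strings, prefix)
    end = start
    while end < len(sorted_strings) and sorted_strings[end].startswith(prefix):
        end += 1
    return sorted_strings[start:end]

words = sorted(["car", "card", "care", "cat", "dog"])
encoded = front_encode(words)
matches = prefix_search(front_decode(encoded), "car")
```

Sorted neighbours share long prefixes, so most entries shrink to a small integer plus a short suffix; the price is that decoding an entry depends on the entries before it, which is one instance of the space/time tradeoff the abstract mentions.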

Electronic Thesis and Dissertation Archive - Università di Pisa