CLUDE: An Efficient Algorithm for LU Decomposition Over a Sequence of Evolving Graphs
In many applications, entities and their relationships are
represented by graphs. Examples include the WWW (web
pages and hyperlinks) and bibliographic networks (authors
and co-authorship). A graph can be conveniently modeled
by a matrix from which various quantitative measures are
derived. Some example measures include PageRank and
SALSA (which measure nodes’ importance), and Personalized
PageRank and Random Walk with Restart (which measure
proximities between nodes). To compute these measures,
linear systems of the form Ax = b, where A is a matrix
that captures a graph’s structure, need to be solved. To
facilitate solving the linear system, the matrix A is often decomposed
into two triangular matrices (L and U). In a dynamic world, the graph changes with time, and so does the matrix A that represents it. We consider
a sequence of evolving graphs and its associated sequence of
evolving matrices. We study how LU-decomposition should
be done over the sequence so that (1) the decomposition
is efficient and (2) the resulting LU matrices best preserve
the sparsity of the matrices A (i.e., the number of extra non-zero entries introduced in L and U is minimized). We
propose a cluster-based algorithm CLUDE for solving the
problem. Through an experimental study, we show that
CLUDE is about an order of magnitude faster than the
traditional incremental update algorithm. The number of
extra non-zero entries introduced by CLUDE is also about
an order of magnitude smaller than that of the traditional
algorithm. CLUDE is thus an efficient algorithm for LU decomposition
that produces high-quality LU matrices over an
evolving matrix sequence.
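To make the setting concrete, here is a minimal sketch, assuming SciPy and a made-up 4-node graph (illustrative only, not the paper's algorithm): factoring I - alpha*P once lets measures in the Random Walk with Restart family be computed with cheap triangular solves, one per restart vector.

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import splu

# Toy adjacency matrix for a 4-node graph (made up for illustration).
A = sp.csc_matrix(np.array([
    [0, 1, 1, 0],
    [1, 0, 1, 1],
    [0, 1, 0, 1],
    [1, 0, 1, 0],
], dtype=float))

# Column-stochastic transition matrix P (normalize each column).
P = sp.csc_matrix(A.multiply(1.0 / A.sum(axis=0)))

alpha = 0.85                        # damping factor
n = P.shape[0]
# Random Walk with Restart solves (I - alpha*P) x = (1 - alpha) * v.
M = sp.identity(n, format="csc") - alpha * P

# Factor once; the sparse LU factors are then reused for many
# restart vectors v, each requiring only a cheap triangular solve.
lu = splu(M)

v = np.zeros(n); v[0] = 1.0         # restart at node 0
x = lu.solve((1 - alpha) * v)
print(x / x.sum())                  # proximity of each node to node 0
```

On an evolving graph this factorization must be refreshed whenever A changes, which is exactly the repeated cost CLUDE targets while keeping the factors sparse.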
A Closer Look at Lightweight Graph Reordering
Graph analytics power a range of applications in areas as diverse as finance,
networking and business logistics. A common property of graphs used in the
domain of graph analytics is a power-law distribution of vertex connectivity,
wherein a small number of vertices are responsible for a high fraction of all
connections in the graph. These richly-connected (hot) vertices inherently
exhibit high reuse. However, their sparse distribution in memory leads to a
severe underutilization of on-chip cache capacity. Prior works have proposed
lightweight skew-aware vertex reordering that places hot vertices adjacent to
each other in memory, reducing the cache footprint of hot vertices. However, in
doing so, they may inadvertently destroy the inherent community structure
within the graph, which may negate the performance gains achieved from the
reduced footprint of hot vertices.
In this work, we study existing reordering techniques and demonstrate the
inherent tension between reducing the cache footprint of hot vertices and
preserving original graph structure. We quantify the potential performance loss
due to disruption in graph structure for different graph datasets. We further
show that reordering techniques that employ fine-grain reordering significantly
increase misses in the higher level caches, even when they reduce misses in the
last-level cache.
To overcome the limitations of existing reordering techniques, we propose
Degree-Based Grouping (DBG), a novel lightweight reordering technique that
employs a coarse-grain reordering to largely preserve graph structure while
reducing the cache footprint of hot vertices. Our evaluation on 40 combinations
of various graph applications and datasets shows that, compared to a baseline
with no reordering, DBG yields an average application speed-up of 16.8% vs
11.6% for the best-performing existing lightweight technique.
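To illustrate the idea, here is a toy sketch of coarse-grain degree-based grouping; the degree thresholds and degree sequence are made-up assumptions, and DBG's actual grouping details may differ.

```python
# Coarse-grain, degree-based vertex grouping in the spirit of DBG:
# vertices are bucketed by degree range, hot buckets come first, and
# the original vertex order is preserved *within* each bucket, which
# keeps much of the community structure intact.

def dbg_order(degrees, thresholds=(128, 32, 8, 2, 0)):
    """Return a new vertex order: buckets of descending degree range,
    original relative order preserved inside each bucket."""
    buckets = [[] for _ in thresholds]
    for v, d in enumerate(degrees):          # scan in original order
        for i, t in enumerate(thresholds):
            if d >= t:
                buckets[i].append(v)
                break
    return [v for bucket in buckets for v in bucket]

degrees = [1, 200, 3, 150, 40, 2, 9, 0]      # toy degree sequence
print(dbg_order(degrees))                    # hot vertices packed first
```

Because vertices keep their original relative order within each group, locality arising from community structure is largely preserved, while hot vertices end up packed together at the front of memory.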
Datacenter Traffic Control: Understanding Techniques and Trade-offs
Datacenters provide cost-effective and flexible access to scalable compute
and storage resources necessary for today's cloud computing needs. A typical
datacenter is made up of thousands of servers connected with a large network
and usually managed by one operator. To provide quality access to the variety
of applications and services hosted on datacenters and to maximize performance, it is essential to use datacenter networks effectively and efficiently.
Datacenter traffic is often a mix of several classes with different priorities
and requirements. This includes user-generated interactive traffic, traffic
with deadlines, and long-running traffic. To this end, custom transport
protocols and traffic management techniques have been developed to improve
datacenter network performance.
In this tutorial paper, we review the general architecture of datacenter
networks, various topologies proposed for them, their traffic properties,
general traffic control challenges in datacenters and general traffic control
objectives. The purpose of this paper is to bring out the important
characteristics of traffic control in datacenters and not to survey all
existing solutions (as that is virtually impossible given the massive body of existing research). We hope to provide readers with a broad view of the options and factors to consider when choosing among traffic control mechanisms. We discuss
various characteristics of datacenter traffic control including management
schemes, transmission control, traffic shaping, prioritization, load balancing,
multipathing, and traffic scheduling. Next, we point to several open challenges
as well as new and interesting networking paradigms. At the end of this paper,
we briefly review inter-datacenter networks, which connect geographically dispersed datacenters, have been receiving increasing attention recently, and pose interesting and novel research problems.
Efficient Diversification of Web Search Results
In this paper we analyze the efficiency of various search results
diversification methods. While the efficacy of diversification approaches has been
deeply investigated in the past, response time and scalability issues have been
rarely addressed. A unified framework for studying performance and feasibility
of result diversification solutions is thus proposed. First we define a new
methodology for detecting when, and how, query results need to be diversified.
To this end, we rely on the concept of "query refinement" to estimate the probability that a query is ambiguous. Then, relying on this novel ambiguity detection method, we deploy and compare, on a standard test set, three different
diversification methods: IASelect, xQuAD, and OptSelect. While the first two
are recent state-of-the-art proposals, the latter is an original algorithm
introduced in this paper. We evaluate both the efficiency and the effectiveness
of our approach against its competitors by using the standard TREC Web
diversification track testbed. Results show that OptSelect is able to run two
orders of magnitude faster than the two other state-of-the-art approaches and
to achieve comparable diversification effectiveness.
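For intuition, IASelect, xQuAD, and OptSelect all follow a greedy select-then-update pattern. The sketch below is xQuAD-flavoured; the aspect set, coverage probabilities, and the interpolation weight lam are illustrative assumptions, not values from the paper.

```python
# Greedy, xQuAD-flavoured diversification sketch. Each candidate d has
# a relevance score and, per query aspect a, a coverage score cov[a].
# Greedily pick the document that balances relevance against covering
# aspects not yet covered by earlier picks.

def diversify(candidates, aspects, k, lam=0.5):
    selected = []
    uncovered = {a: 1.0 for a in aspects}    # prob. aspect still uncovered
    while candidates and len(selected) < k:
        def score(d):
            rel = d["rel"]
            div = sum(d["cov"].get(a, 0.0) * uncovered[a] for a in aspects)
            return (1 - lam) * rel + lam * div
        best = max(candidates, key=score)
        candidates.remove(best)
        selected.append(best)
        for a in aspects:                    # update residual coverage
            uncovered[a] *= 1.0 - best["cov"].get(a, 0.0)
    return selected

docs = [
    {"id": 1, "rel": 0.9, "cov": {"jaguar car": 0.9}},
    {"id": 2, "rel": 0.8, "cov": {"jaguar car": 0.8}},
    {"id": 3, "rel": 0.5, "cov": {"jaguar animal": 0.9}},
]
# Picks doc 1, then doc 3: the second pick covers the uncovered aspect.
print([d["id"] for d in diversify(docs, ["jaguar car", "jaguar animal"], k=2)])
```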
LongLLMLingua: Accelerating and Enhancing LLMs in Long Context Scenarios via Prompt Compression
In long context scenarios, large language models (LLMs) face three main
challenges: higher computational/financial cost, longer latency, and inferior
performance. Some studies reveal that the performance of LLMs depends on both
the density and the position of the key information (question relevant) in the
input prompt. Inspired by these findings, we propose LongLLMLingua for prompt
compression towards improving LLMs' perception of the key information to
simultaneously address the three challenges. We conduct evaluations on a wide
range of long context scenarios including single-/multi-document QA, few-shot
learning, summarization, synthetic tasks, and code completion. The experimental
results show that prompts compressed with LongLLMLingua achieve higher performance at much lower cost and also reduce end-to-end latency. For example, on the NaturalQuestions benchmark, LongLLMLingua gains a performance boost
of up to 17.1% over the original prompt with ~4x fewer tokens as input to
GPT-3.5-Turbo. It yields cost savings of \$28.5 and \$27.4 per 1,000 samples on the LongBench and ZeroScrolls benchmarks, respectively.
Additionally, when compressing prompts of ~10k tokens at a compression rate of
2x-10x, LongLLMLingua can speed up the end-to-end latency by 1.4x-3.8x. Our
code is available at https://aka.ms/LLMLingua
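The linked repository ships a Python package. A minimal usage sketch follows, with the caveat that the arguments shown (such as target_token) are assumptions based on the project's documented interface and may differ across versions.

```python
from llmlingua import PromptCompressor

# Loads the default language model used to score token importance
# (model choice and defaults are assumptions; check the repo docs).
compressor = PromptCompressor()

docs = ["...long context document 1...", "...long context document 2..."]
result = compressor.compress_prompt(
    docs,
    question="Which document answers the user's question?",
    target_token=500,   # assumed name for the compression budget
)
print(result["compressed_prompt"])  # question-aware compressed prompt
```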
Database Optimization Aspects for Information Retrieval
There is a growing need for systems that can process queries, combining both structured data and text. One way to provide such functionality is to integrate information retrieval (IR) techniques in a database management system (DBMS). However, both IR and database research have been separate research fields for decades, resulting in different - even conflicting - approaches to data management.
Each DBMS has a component called a "query optimizer", which plays a crucial role in the efficiency and flexibility of the system. So, for successful integration, the IR techniques and data structures, as well as the DBMS query optimizer, should be adapted to enable mutual cooperation.
The author concentrates on top-N queries - a common class of IR queries. An IR top-N query asks for the N best documents given a set of keywords. The author proposes processing the data in batches as a compromise between IR and DBMS query processing. Experiments with this technique show that porting IR optimization techniques is (still) not a promising option due to the additional administrative overhead. Two new mathematical models are introduced to eliminate this overhead: a model that predicts selectivity, which is a crucial factor in the execution costs, and a model that predicts the quality of the top-N
- …
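To make the top-N setting concrete, here is a minimal sketch of the classic accumulator-plus-heap evaluation of a keyword top-N query; the toy postings and scores are assumptions, not the thesis's system.

```python
import heapq
from collections import defaultdict

# Toy inverted index: term -> list of (doc_id, term_score).
# Postings and scores are made up for illustration.
index = {
    "database":  [(1, 0.8), (2, 0.3), (4, 0.5)],
    "retrieval": [(1, 0.6), (3, 0.9), (4, 0.2)],
}

def top_n(query_terms, n):
    """Score documents by summing per-term contributions, then keep
    the N best with a heap (the common IR top-N evaluation pattern)."""
    acc = defaultdict(float)                 # document score accumulators
    for term in query_terms:
        for doc, score in index.get(term, []):
            acc[doc] += score
    return heapq.nlargest(n, acc.items(), key=lambda kv: kv[1])

print(top_n(["database", "retrieval"], n=2))  # [(1, 1.4), (3, 0.9)]
```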