108 research outputs found
Graffiti Networks: A Subversive, Internet-Scale File Sharing Model
The proliferation of peer-to-peer (P2P) file sharing protocols is due to
their efficient and scalable methods for data dissemination to numerous users.
But many of these networks have no provisions to provide users with long term
access to files after the initial interest has diminished, nor are they able to
guarantee protection for users from malicious clients that wish to implicate
them in incriminating activities. As such, users may turn to supplementary
measures for storing and transferring data in P2P systems. We present a new
file sharing paradigm, called a Graffiti Network, which allows peers to harness
the potentially unlimited storage of the Internet as a third-party
intermediary. Our key contributions in this paper are (1) an overview of a
distributed system based on this new threat model and (2) a measurement of its
viability through a one-year deployment study using a popular web-publishing
platform. The results of this experiment motivate a discussion about the
challenges of mitigating this type of file sharing in a hostile network
environment and how web site operators can protect their resources.
A parent-centered radial layout algorithm for interactive graph visualization and animation
We have developed (1) a graph visualization system that allows users to
explore graphs by viewing them as a succession of spanning trees selected
interactively, (2) a radial graph layout algorithm, and (3) an animation
algorithm that generates meaningful visualizations and smooth transitions
between graphs while minimizing edge crossings during transitions and in static
layouts.
Our system is similar to the radial layout system of Yee et al. (2001), but
differs primarily in that each node is positioned on a coordinate system
centered on its own parent rather than on a single coordinate system for all
nodes. Our system is thus easy to define recursively and lends itself to
parallelization. It also gives layouts several desirable properties; for
example, certain edges are guaranteed never to cross during an animation.
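The parent-centered idea described above can be illustrated with a minimal sketch. This is not the authors' algorithm: the half-circle wedge for non-root nodes, the `shrink` factor, and the function name `parent_centered_layout` are all assumptions made for illustration. The only point it shows is that each child's position is computed as an offset from its own parent, which makes the definition naturally recursive.

```python
import math

def parent_centered_layout(tree, root, radius=1.0, shrink=0.5):
    """Hypothetical sketch: tree maps node -> list of children.

    Returns a dict node -> (x, y). Each child is placed on a circle
    centered on its own parent, not on a single global origin.
    """
    pos = {root: (0.0, 0.0)}

    def place(node, heading, r):
        children = tree.get(node, [])
        if not children:
            return
        if node == root:
            # Assumption: the root spreads its children over a full circle.
            step = 2 * math.pi / len(children)
            start = 0.0
        else:
            # Assumption: other nodes use a half-circle facing away
            # from their own parent (centered on `heading`).
            step = math.pi / len(children)
            start = heading - math.pi / 2 + step / 2
        px, py = pos[node]
        for i, child in enumerate(children):
            angle = start + i * step
            pos[child] = (px + r * math.cos(angle), py + r * math.sin(angle))
            # Recurse with a smaller radius so subtrees stay compact.
            place(child, angle, r * shrink)

    place(root, 0.0, radius)
    return pos

example_tree = {"a": ["b", "c"], "b": ["d"]}
example_pos = parent_centered_layout(example_tree, "a")
```

Because every subtree depends only on its parent's position and heading, sibling subtrees could in principle be laid out independently of one another.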
We compared the layouts and transitions produced by our algorithms to those
produced by Yee et al. Results from several experiments indicate that our
system produces fewer edge crossings during transitions between graph drawings,
and that the transitions more often involve changes in local scaling rather
than structure.
These findings suggest the system has promise as an interactive graph
exploration tool in a variety of settings.
Interactive, tree-based graph visualization
We introduce an interactive graph visualization scheme that allows users to explore graphs by viewing them as a sequence of spanning trees, rather than the entire graph all at once. The user determines which spanning trees are displayed by selecting a vertex from the graph to be the root. Our main contributions are a graph drawing algorithm that generates meaningful representations of graphs using extracted spanning trees, and a graph animation algorithm for creating smooth, continuous transitions between graph drawings. We conduct experiments to measure how well our algorithms visualize graphs and compare them to another visualization scheme.
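As a rough illustration of extracting a spanning tree rooted at a user-selected vertex, the sketch below uses breadth-first search. The abstract does not say which extraction method the authors use, so BFS is an assumption here, as are the function name `bfs_spanning_tree` and the adjacency-list representation.

```python
from collections import deque

def bfs_spanning_tree(graph, root):
    """Hypothetical sketch: graph maps vertex -> iterable of neighbors.

    Returns a dict child -> parent describing a spanning tree of the
    component containing `root` (the root maps to None).
    """
    parent = {root: None}
    queue = deque([root])
    while queue:
        v = queue.popleft()
        for w in graph[v]:
            if w not in parent:  # first visit fixes the tree edge (v, w)
                parent[w] = v
                queue.append(w)
    return parent

example_graph = {1: [2, 3], 2: [1, 3], 3: [1, 2, 4], 4: [3]}
tree_from_1 = bfs_spanning_tree(example_graph, 1)
tree_from_4 = bfs_spanning_tree(example_graph, 4)
```

Selecting a different root yields a different tree over the same graph, which is the kind of change an animation between drawings would need to handle.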
On Predictive Modeling for Optimizing Transaction Execution in Parallel OLTP Systems
An emerging class of parallel database management systems (DBMS) is
designed to take advantage of the partitionable workloads of on-line
transaction processing (OLTP) applications. Transactions in these systems are
optimized to execute to completion on a single node in a shared-nothing cluster
without needing to coordinate with other nodes or use expensive concurrency
control measures. But some OLTP applications cannot be partitioned such that
all of their transactions execute within a single partition in this manner.
These distributed transactions access data not stored within their local
partitions and subsequently require more heavy-weight concurrency control
protocols. Further difficulties arise when the transaction's execution
properties, such as the number of partitions it may need to access or whether
it will abort, are not known beforehand. The DBMS could mitigate these
performance issues if it is provided with additional information about
transactions. Thus, in this paper we present a Markov model-based approach for
automatically selecting which optimizations a DBMS could use, namely (1) more
efficient concurrency control schemes, (2) intelligent scheduling, (3) reduced
undo logging, and (4) speculative execution. To evaluate our techniques, we
implemented our models and integrated them into a parallel, main-memory OLTP
DBMS to show that we can improve the performance of applications with diverse
workloads. Comment: VLDB201
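To make the Markov-model idea concrete, here is a minimal sketch that estimates transition probabilities from recorded transaction traces. The state naming (a query name joined with a partition, such as `GetAccount@p0`), the sample traces, and the function name `build_markov_model` are hypothetical; the paper's actual state definition and training procedure are not given in the abstract.

```python
from collections import Counter, defaultdict

def build_markov_model(traces):
    """Hypothetical sketch: each trace is a list of execution states.

    Returns a dict state -> {next_state: estimated probability},
    using simple frequency counts over consecutive state pairs.
    """
    counts = defaultdict(Counter)
    for trace in traces:
        for cur, nxt in zip(trace, trace[1:]):
            counts[cur][nxt] += 1
    model = {}
    for state, successors in counts.items():
        total = sum(successors.values())
        model[state] = {t: n / total for t, n in successors.items()}
    return model

# Made-up traces for illustration only.
example_traces = [
    ["begin", "GetAccount@p0", "commit"],
    ["begin", "GetAccount@p0", "UpdateBalance@p1", "commit"],
    ["begin", "GetAccount@p0", "abort"],
    ["begin", "GetAccount@p0", "commit"],
]
example_model = build_markov_model(example_traces)
```

With a model like this, a system could look up, for instance, how likely a transaction in a given state is to touch a second partition or to abort, and choose optimizations accordingly.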
Staring into the abyss: An evaluation of concurrency control with one thousand cores
Computer architectures are moving towards an era dominated by many-core machines with dozens or even hundreds of cores on a single chip. This unprecedented level of on-chip parallelism introduces a new dimension to scalability that current database management systems (DBMSs) were not designed for. In particular, as the number of cores increases, the problem of concurrency control becomes extremely challenging. With hundreds of threads running in parallel, the complexity of coordinating competing accesses to data will likely diminish the gains from increased core counts.
To better understand just how unprepared current DBMSs are for future CPU architectures, we performed an evaluation of concurrency control for on-line transaction processing (OLTP) workloads on many-core chips. We implemented seven concurrency control algorithms on a main-memory DBMS and, using computer simulations, scaled our system to 1024 cores. Our analysis shows that all algorithms fail to scale to this magnitude but for different reasons. In each case, we identify fundamental bottlenecks that are independent of the particular database implementation and argue that even state-of-the-art DBMSs suffer from these limitations. We conclude that rather than pursuing incremental solutions, many-core chips may require a completely redesigned DBMS architecture that is built from the ground up and is tightly coupled with the hardware. Intel Corporation (Science and Technology Center for Big Data
An Empirical Evaluation of Columnar Storage Formats
Columnar storage is one of the core components of a modern data analytics
system. Although many database management systems (DBMSs) have proprietary
storage formats, most provide extensive support for open-source storage formats
such as Parquet and ORC to facilitate cross-platform data sharing. But these
formats were developed over a decade ago, in the early 2010s, for the Hadoop
ecosystem. Since then, both the hardware and workload landscapes have changed
significantly.
In this paper, we revisit the most widely adopted open-source columnar
storage formats (Parquet and ORC) with a deep dive into their internals. We
designed a benchmark to stress-test the formats' performance and space
efficiency under different workload configurations. From our comprehensive
evaluation of Parquet and ORC, we identify design decisions advantageous with
modern hardware and real-world data distributions. These include using
dictionary encoding by default, favoring decoding speed over compression ratio
for integer encoding algorithms, making block compression optional, and
embedding finer-grained auxiliary data structures. Our analysis identifies
important considerations that may guide future formats to better fit modern
technology trends.
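Dictionary encoding, one of the design decisions mentioned above, can be sketched in a few lines. This is a simplified illustration, not the Parquet or ORC implementation: the function names are hypothetical, and real formats apply further encoding to the integer codes.

```python
def dictionary_encode(values):
    """Hypothetical sketch: replace each value with a small integer code.

    Returns (dictionary, codes), where dictionary[i] is the value
    assigned code i in order of first appearance.
    """
    dictionary = []
    index = {}
    codes = []
    for v in values:
        if v not in index:
            index[v] = len(dictionary)
            dictionary.append(v)
        codes.append(index[v])
    return dictionary, codes

def dictionary_decode(dictionary, codes):
    """Rebuild the original column by looking up each code."""
    return [dictionary[c] for c in codes]

column = ["US", "DE", "US", "US", "FR", "DE"]
dictionary, codes = dictionary_encode(column)
```

Decoding is a plain array lookup, which is one reason this scheme suits columns with few distinct values.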