19,668 research outputs found
Diluting the Scalability Boundaries: Exploring the Use of Disaggregated Architectures for High-Level Network Data Analysis
Traditional data centers are designed with a rigid architecture of
fit-for-purpose servers that provision resources beyond the average workload in
order to deal with occasional peaks of data. Heterogeneous data centers are
pushing towards more cost-efficient architectures with better resource
provisioning. In this paper we study the feasibility of using disaggregated
architectures for intensive data applications, in contrast to the monolithic
approach of server-oriented architectures. Particularly, we have tested a
proactive network analysis system in which the workload demands are highly
variable. In the context of the dReDBox disaggregated architecture, the results
show that the overhead caused by using remote memory resources is significant,
between 66\% and 80\%, but we have also observed that the memory usage is one
order of magnitude higher for the stress case with respect to average
workloads. Therefore, dimensioning memory for the worst case in conventional
systems will result in a notable waste of resources. Finally, we found that,
for the selected use case, parallelism is limited by memory. Therefore, using a
disaggregated architecture will allow for increased parallelism, which, at the
same time, will mitigate the overhead caused by remote memory.Comment: 8 pages, 6 figures, 2 tables, 32 references. Pre-print. The paper
will be presented during the IEEE International Conference on High
Performance Computing and Communications in Bangkok, Thailand. 18 - 20
December, 2017. To be published in the conference proceeding
The End of Slow Networks: It's Time for a Redesign
Next generation high-performance RDMA-capable networks will require a
fundamental rethinking of the design and architecture of modern distributed
DBMSs. These systems are commonly designed and optimized under the assumption
that the network is the bottleneck: the network is slow and "thin", and thus
needs to be avoided as much as possible. Yet this assumption no longer holds
true. With InfiniBand FDR 4x, the bandwidth available to transfer data across
network is in the same ballpark as the bandwidth of one memory channel, and it
increases even further with the most recent EDR standard. Moreover, with the
increasing advances of RDMA, the latency improves similarly fast. In this
paper, we first argue that the "old" distributed database design is not capable
of taking full advantage of the network. Second, we propose architectural
redesigns for OLTP, OLAP and advanced analytical frameworks to take better
advantage of the improved bandwidth, latency and RDMA capabilities. Finally,
for each of the workload categories, we show that remarkable performance
improvements can be achieved
The End of a Myth: Distributed Transactions Can Scale
The common wisdom is that distributed transactions do not scale. But what if
distributed transactions could be made scalable using the next generation of
networks and a redesign of distributed databases? There would be no need for
developers anymore to worry about co-partitioning schemes to achieve decent
performance. Application development would become easier as data placement
would no longer determine how scalable an application is. Hardware provisioning
would be simplified as the system administrator can expect a linear scale-out
when adding more machines rather than some complex sub-linear function, which
is highly application specific.
In this paper, we present the design of our novel scalable database system
NAM-DB and show that distributed transactions with the very common Snapshot
Isolation guarantee can indeed scale using the next generation of RDMA-enabled
network technology without any inherent bottlenecks. Our experiments with the
TPC-C benchmark show that our system scales linearly to over 6.5 million
new-order (14.5 million total) distributed transactions per second on 56
machines.Comment: 12 page
Preserving Both Privacy and Utility in Network Trace Anonymization
As network security monitoring grows more sophisticated, there is an
increasing need for outsourcing such tasks to third-party analysts. However,
organizations are usually reluctant to share their network traces due to
privacy concerns over sensitive information, e.g., network and system
configuration, which may potentially be exploited for attacks. In cases where
data owners are convinced to share their network traces, the data are typically
subjected to certain anonymization techniques, e.g., CryptoPAn, which replaces
real IP addresses with prefix-preserving pseudonyms. However, most such
techniques either are vulnerable to adversaries with prior knowledge about some
network flows in the traces, or require heavy data sanitization or
perturbation, both of which may result in a significant loss of data utility.
In this paper, we aim to preserve both privacy and utility through shifting the
trade-off from between privacy and utility to between privacy and computational
cost. The key idea is for the analysts to generate and analyze multiple
anonymized views of the original network traces; those views are designed to be
sufficiently indistinguishable even to adversaries armed with prior knowledge,
which preserves the privacy, whereas one of the views will yield true analysis
results privately retrieved by the data owner, which preserves the utility. We
present the general approach and instantiate it based on CryptoPAn. We formally
analyze the privacy of our solution and experimentally evaluate it using real
network traces provided by a major ISP. The results show that our approach can
significantly reduce the level of information leakage (e.g., less than 1\% of
the information leaked by CryptoPAn) with comparable utility
- …