The Case for Learned Index Structures

Authors: Abadi M., Armbrust M., Böhm M., Chang F., Goodfellow I., Grossi R., Lehman T. J., Litwin W., Magdon-Ismail M., Miller D. J., Moerkotte G., Sutskever I., You S.
Publication date: 30/04/2018

Indexes are models: a B-Tree-Index can be seen as a model to map a key to the position of a record within a sorted array, a Hash-Index as a model to map a key to a position of a record within an unsorted array, and a BitMap-Index as a model to indicate if a data record exists or not. In this exploratory research paper, we start from this premise and posit that all existing index structures can be replaced with other types of models, including deep-learning models, which we term learned indexes. The key idea is that a model can learn the sort order or structure of lookup keys and use this signal to effectively predict the position or existence of records. We theoretically analyze under which conditions learned indexes outperform traditional index structures and describe the main challenges in designing learned index structures. Our initial results show that by using neural nets we are able to outperform cache-optimized B-Trees by up to 70% in speed while saving an order of magnitude in memory over several real-world data sets. More importantly, we believe that the idea of replacing core components of a data management system with learned models has far-reaching implications for future system designs and that this work provides just a glimpse of what might be possible.
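
The "index as a model" idea in the abstract above can be illustrated with a minimal sketch. It assumes numeric keys and swaps in a single least-squares line for the neural nets the abstract mentions; the class name `LinearLearnedIndex` and its methods are hypothetical, not taken from the paper. The model predicts a position, and the recorded worst-case error bounds a small binary search around that prediction.

```python
import bisect


class LinearLearnedIndex:
    """Toy learned index: a fitted line maps key -> approximate position.

    Assumption: a single linear model stands in for the paper's models.
    """

    def __init__(self, sorted_keys):
        self.keys = list(sorted_keys)
        n = len(self.keys)
        if n == 0:
            raise ValueError("need at least one key")
        # Fit position ~ slope * key + intercept by ordinary least squares.
        mean_k = sum(self.keys) / n
        mean_p = (n - 1) / 2
        var_k = sum((k - mean_k) ** 2 for k in self.keys)
        cov = sum((k - mean_k) * (i - mean_p) for i, k in enumerate(self.keys))
        self.slope = cov / var_k if var_k else 0.0
        self.intercept = mean_p - self.slope * mean_k
        # Worst prediction error over the stored keys bounds every lookup.
        self.max_err = max(
            abs(self._predict(k) - i) for i, k in enumerate(self.keys)
        )

    def _predict(self, key):
        return int(round(self.slope * key + self.intercept))

    def lookup(self, key):
        """Return a position of key in the sorted array, or None if absent."""
        guess = self._predict(key)
        lo = max(0, guess - self.max_err)
        hi = min(len(self.keys), guess + self.max_err + 1)
        if lo >= hi:
            return None
        # Binary search only inside the error window around the prediction.
        i = bisect.bisect_left(self.keys, key, lo, hi)
        if i < hi and self.keys[i] == key:
            return i
        return None
```

The closer the key distribution is to what the model can fit, the smaller `max_err` becomes and the less searching each lookup needs; on badly fitted data the window simply grows toward a full binary search.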

arXiv.org e-Print Archive

Crossref

When private set intersection meets big data : an efficient and scalable protocol

Authors: Changyu Dong, Hewlett Packard Labs, Liqun Chen, Zikai Wen
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/01/2013

Large-scale data processing brings new challenges to the design of privacy-preserving protocols: how to meet the increasing requirements of speed and throughput of modern applications, and how to scale up smoothly when the data being protected is big. Efficiency and scalability become critical criteria for privacy-preserving protocols in the age of Big Data. In this paper, we present a new Private Set Intersection (PSI) protocol that is extremely efficient and highly scalable compared with existing protocols. The protocol is based on a novel approach that we call oblivious Bloom intersection. It has linear complexity and relies mostly on efficient symmetric key operations. It is highly scalable because most operations can be parallelized easily. The protocol has two versions: a basic protocol and an enhanced protocol. The security of the two variants is analyzed and proved in the semi-honest model and the malicious model, respectively. A prototype of the basic protocol has been built. We report the results of a performance evaluation and compare it against the two previously fastest PSI protocols. Our protocol is orders of magnitude faster than these two protocols. To compute the intersection of two million-element sets, our protocol needs only 41 seconds (80-bit security) and 339 seconds (256-bit security) on moderate hardware in parallel mode.
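
As background for the Bloom-filter building block named in the abstract above, here is a minimal sketch of a plain, non-private Bloom filter used to approximate a set intersection. It is not the oblivious Bloom intersection protocol and provides no privacy; the names `BloomFilter` and `approximate_intersection`, the default sizes, and the SHA-256-based hashing are all assumptions made for illustration.

```python
import hashlib


class BloomFilter:
    """Plain Bloom filter (illustrative only; offers no privacy)."""

    def __init__(self, num_bits=1024, num_hashes=4):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits)  # one byte per bit, for clarity

    def _positions(self, item):
        # Derive num_hashes positions by salting SHA-256 with a seed byte.
        data = str(item).encode("utf-8")
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(bytes([seed]) + data).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos] = 1

    def __contains__(self, item):
        return all(self.bits[pos] for pos in self._positions(item))


def approximate_intersection(set_a, set_b, num_bits=1024, num_hashes=4):
    """Return the elements of set_b that pass a Bloom filter built from set_a.

    The result always contains the true intersection; it may also contain
    a few false positives, depending on the filter size.
    """
    bloom = BloomFilter(num_bits, num_hashes)
    for item in set_a:
        bloom.add(item)
    return {item for item in set_b if item in bloom}
```

Each element costs a fixed number of hash evaluations to insert or query, and elements can be processed independently of one another, which is consistent with the linear complexity and easy parallelization the abstract describes.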

CiteSeerX

Crossref

University of Strathclyde Institutional Repository

Quality Assessment of Linked Datasets using Probabilistic Approximation

Authors: A Hogan, AZ Broder, BH Bloom, C Guéret, JS Vitter, P Hitzler
Publication date: 17/03/2015

With the increasing application of Linked Open Data, assessing the quality of datasets by computing quality metrics becomes an issue of crucial importance. For large and evolving datasets, an exact, deterministic computation of the quality metrics is too time-consuming or expensive. We employ probabilistic techniques such as Reservoir Sampling, Bloom Filters and Clustering Coefficient estimation for implementing a broad set of data quality metrics in an approximate but sufficiently accurate way. Our implementation is integrated in the comprehensive data quality assessment framework Luzzu. We evaluated its performance and accuracy on Linked Open Datasets of broad relevance. Comment: 15 pages, 2 figures. To appear in the ESWC 2015 proceedings.
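
Reservoir sampling, one of the techniques listed in the abstract above, can be sketched as follows. This is the textbook single-pass algorithm, not code from Luzzu; the function names and the example metric (the fraction of items satisfying a predicate) are hypothetical.

```python
import random


def reservoir_sample(stream, k, rng=None):
    """Keep a uniform random sample of k items from a stream of unknown length."""
    if rng is None:
        rng = random.Random()
    reservoir = []
    for n, item in enumerate(stream, start=1):
        if n <= k:
            # Fill the reservoir with the first k items.
            reservoir.append(item)
        else:
            # Item n replaces a random slot with probability k / n.
            j = rng.randrange(n)
            if j < k:
                reservoir[j] = item
    return reservoir


def estimate_fraction(stream, predicate, k=1000, rng=None):
    """Estimate the share of items satisfying predicate from a sample."""
    sample = reservoir_sample(stream, k, rng)
    if not sample:
        return 0.0
    return sum(1 for item in sample if predicate(item)) / len(sample)
```

Because the sample has a fixed size, memory use stays constant however large the dataset grows, at the cost of an approximate rather than exact metric value.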

arXiv.org e-Print Archive

Crossref

Fraunhofer-ePrints

Efficient and Privacy-Preserving Ride Sharing Organization for Transferable and Non-Transferable Services

Authors: Abdallah Mohamed, Alsharif Ahmad, Mahmoud Mohamed, Nabil Mahmoud, Sherif Ahmed
Publication date: 24/05/2019

Ride-sharing allows multiple persons to share their trips in one vehicle instead of using multiple vehicles. This can reduce the number of vehicles on the street, which consequently can reduce air pollution, traffic congestion and transportation cost. However, a ride-sharing organization requires passengers to report sensitive location information about their trips to a trip organizing server (TOS), which creates a serious privacy issue. In addition, existing ride-sharing schemes are non-flexible, i.e., they require a driver and a rider to have exactly the same trip to share a ride. Moreover, they are non-scalable, i.e., inefficient if applied to large geographic areas. In this paper, we propose two efficient privacy-preserving ride-sharing organization schemes for Non-transferable Ride-sharing Services (NRS) and Transferable Ride-sharing Services (TRS). In the NRS scheme, a rider can share a ride from its source to destination with only one driver whereas, in the TRS scheme, a rider can transfer between multiple drivers while en route until he reaches his destination. In both schemes, the ride-sharing area is divided into a number of small geographic areas, called cells, and each cell has a unique identifier. Each driver/rider should encrypt his trip's data and send an encrypted ride-sharing offer/request to the TOS. In the NRS scheme, Bloom filters are used to compactly represent the trip information before encryption. Then, the TOS can measure the similarity between the encrypted trip data to organize shared rides without revealing either the users' identities or the location information. In the TRS scheme, drivers report their encrypted routes, and then the TOS builds an encrypted directed graph that is passed to a modified version of Dijkstra's shortest path algorithm to search for an optimal path of rides that can achieve a set of preferences defined by the riders.
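
The TRS search described in the abstract above can be illustrated with a sketch of standard Dijkstra over a plaintext directed graph of cells. This is an assumption-laden simplification: the abstract's version is modified and runs on an encrypted graph, neither of which is reproduced here. The graph layout, the single numeric cost per edge, and the function name `shortest_ride_path` are hypothetical.

```python
import heapq


def shortest_ride_path(graph, source, destination):
    """Standard Dijkstra over a plaintext graph of cells.

    graph maps a cell to a list of (next_cell, cost, driver_id) edges,
    where each edge is a ride segment offered by one driver.
    Returns (total_cost, [(from_cell, to_cell, driver_id), ...]) or None.
    """
    dist = {source: 0}
    prev = {}
    heap = [(0, source)]
    while heap:
        d, cell = heapq.heappop(heap)
        if d > dist.get(cell, float("inf")):
            continue  # stale heap entry
        if cell == destination:
            break
        for nxt, cost, driver in graph.get(cell, []):
            nd = d + cost
            if nd < dist.get(nxt, float("inf")):
                dist[nxt] = nd
                prev[nxt] = (cell, driver)
                heapq.heappush(heap, (nd, nxt))
    if destination not in dist:
        return None
    # Walk back from the destination to recover the sequence of rides.
    legs = []
    cell = destination
    while cell != source:
        parent, driver = prev[cell]
        legs.append((parent, cell, driver))
        cell = parent
    legs.reverse()
    return dist[destination], legs
```

Each returned leg names the driver who covers it, so a change of driver between consecutive legs corresponds to a rider transfer in the TRS setting.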

arXiv.org e-Print Archive

Aquila Digital Community (University of Southern Mississippi, USM)

Development of the premixing injector in burner system

Author: Sies Mohamad Farid
Publication date: 01/06/2013

Alternative fuels are attracting attention, especially renewable ones such as biodiesel. Biodiesel fuel (BDF) has potential for external combustion. BDF is one of the hydrocarbon fuels. Palm oil biodiesel is free from sulfur and is produced by esterification and transesterification of vegetable oil with a low-molecular-weight alcohol, such as ethanol or methanol. The objectives of this research are to design an injector that mixes fuel and water-fuel emulsion with air for an open burner, and to analyze the spray formation of the mixture for fuel (DF and BDF) and water-fuel emulsion. The premix injector is used for external combustion, especially in open burner systems. The disadvantages of BDF are high toxic emissions such as NOx, CO and particulate matter (PM), and it can reduce the performance of the burner system. High toxic emissions can be addressed by a new injector concept that mixes fuel-water emulsion with air. The additional water in the combustion process can reduce NOx emissions, soot, and the flame temperature. This research focuses on the spray angle, penetration, and flame length with and without secondary air. CPO biodiesel has a longer penetration length and a larger spray area than diesel, but a smaller spray angle. The difference in the flame image between pure fuel and fuel mixed with water is the flame color: the water-fuel mixture has a brighter and shorter flame than pure fuel.

UTHM Institutional Repository