Search CORE

1,691 research outputs found

A grid-based infrastructure for distributed retrieval

Author: A. Guttman
D. Kossmann
D.C. Blair
F. Simeoni
I. Foster
J. Callan
J. Risson
J.P. Callan
J.P. Callan
L. Si
L. Si
M. Kobayashi
M. Stonebraker
P. Niblett
R.R. Larson
W. Sun
Y. Manolopoulos
Y.E. Ioannidis
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2007
Field of study

In large-scale distributed retrieval, challenges of latency, heterogeneity, and dynamicity emphasise the importance of infrastructural support in reducing the development costs of state-of-the-art solutions. We present a service-based infrastructure for distributed retrieval which blends middleware facilities and a design framework to ‘lift’ the resource sharing approach and the computational services of a European Grid platform into the domain of e-Science applications. In this paper, we give an overview of the DILIGENT Search Framework and illustrate its exploitation in the ﬁeld of Earth Science

Real-Time Data Processing With Lambda Architecture

Author: Malusare Omkar Ashok
Publication venue: SJSU ScholarWorks
Publication date: 20/05/2019
Field of study

Data has evolved immensely in recent years, in type, volume and velocity. There are several frameworks to handle the big data applications. The project focuses on the Lambda Architecture proposed by Marz and its application to obtain real-time data processing. The architecture is a solution that unites the benefits of the batch and stream processing techniques. Data can be historically processed with high precision and involved algorithms without loss of short-term information, alerts and insights. Lambda Architecture has an ability to serve a wide range of use cases and workloads that withstands hardware and human mistakes. The layered architecture enhances loose coupling and flexibility in the system. This a huge benefit that allows understanding the trade-offs and application of various tools and technologies across the layers. There has been an advancement in the approach of building the LA due to improvements in the underlying tools. The project demonstrates a simplified architecture for the LA that is maintainable

SJSU ScholarWorks

Data management in cloud environments: NoSQL and NewSQL data stores

Author: Capretz Miriam AM
Grolinger Katarina
Higashino Wilson A
Tiwari Abhinav
Publication venue: Scholarship@Western
Publication date: 01/01/2013
Field of study

: Advances in Web technology and the proliferation of mobile devices and sensors connected to the Internet have resulted in immense processing and storage requirements. Cloud computing has emerged as a paradigm that promises to meet these requirements. This work focuses on the storage aspect of cloud computing, specifically on data management in cloud environments. Traditional relational databases were designed in a different hardware and software era and are facing challenges in meeting the performance and scale requirements of Big Data. NoSQL and NewSQL data stores present themselves as alternatives that can handle huge volume of data. Because of the large number and diversity of existing NoSQL and NewSQL solutions, it is difficult to comprehend the domain and even more challenging to choose an appropriate solution for a specific task. Therefore, this paper reviews NoSQL and NewSQL solutions with the objective of: (1) providing a perspective in the field, (2) providing guidance to practitioners and researchers to choose the appropriate data store, and (3) identifying challenges and opportunities in the field. Specifically, the most prominent solutions are compared focusing on data models, querying, scaling, and security related capabilities. Features driving the ability to scale read requests and write requests, or scaling data storage are investigated, in particular partitioning, replication, consistency, and concurrency control. Furthermore, use cases and scenarios in which NoSQL and NewSQL data stores have been used are discussed and the suitability of various solutions for different sets of applications is examined. Consequently, this study has identified challenges in the field, including the immense diversity and inconsistency of terminologies, limited documentation, sparse comparison and benchmarking criteria, and nonexistence of standardized query languages

Scholarship@Western

Springer - Publisher Connector

Indexing of fictional video content for event detection and summarisation

Author: Lee Hyowon
Lehane Bart
O'Connor Noel E.
Smeaton Alan F.
Publication venue: 'Hindawi Limited'
Publication date: 01/01/2007
Field of study

This paper presents an approach to movie video indexing that utilises audiovisual analysis to detect important and meaningful temporal video segments, that we term events. We consider three event classes, corresponding to dialogues, action sequences, and montages, where the latter also includes musical sequences. These three event classes are intuitive for a viewer to understand and recognise whilst accounting for over 90% of the content of most movies. To detect events we leverage traditional filmmaking principles and map these to a set of computable low-level audiovisual features. Finite state machines (FSMs) are used to detect when temporal sequences of specific features occur. A set of heuristics, again inspired by filmmaking conventions, are then applied to the output of multiple FSMs to detect the required events. A movie search system, named MovieBrowser, built upon this approach is also described. The overall approach is evaluated against a ground truth of over twenty-three hours of movie content drawn from various genres and consistently obtains high precision and recall for all event classes. A user experiment designed to evaluate the usefulness of an event-based structure for both searching and browsing movie archives is also described and the results indicate the usefulness of the proposed approach

CiteSeerX

Springer - Publisher Connector

Directory of Open Access Journals

The NASA Astrophysics Data System: Architecture

Author: Accomazzi A.
Eichhorn G.
Grant C. S.
Kurtz M. J.
Murray S. S.
Publication venue: 'EDP Sciences'
Publication date: 04/02/2000
Field of study

The powerful discovery capabilities available in the ADS bibliographic services are possible thanks to the design of a flexible search and retrieval system based on a relational database model. Bibliographic records are stored as a corpus of structured documents containing fielded data and metadata, while discipline-specific knowledge is segregated in a set of files independent of the bibliographic data itself. The creation and management of links to both internal and external resources associated with each bibliography in the database is made possible by representing them as a set of document properties and their attributes. To improve global access to the ADS data holdings, a number of mirror sites have been created by cloning the database contents and software on a variety of hardware and software platforms. The procedures used to create and manage the database and its mirrors have been written as a set of scripts that can be run in either an interactive or unsupervised fashion. The ADS can be accessed at http://adswww.harvard.eduComment: 25 pages, 8 figures, 3 table

arXiv.org e-Print Archive

EDP Sciences OAI-PMH repository (1.2.0)

DR.SGX: Hardening SGX Enclaves against Cache Attacks with Data Location Randomization

Author: Aline Souza (4218802)
Angélica Oliveira Pontes (4218796)
Camila Sanchez (4218811)
Diego Riaño-Pachón (3430481)
Eliane Santana (4218793)
Guilherme Zanini (4218805)
Gustavo Borin (4218808)
Gustavo Goldman (271482)
Juliana Oliveira (3747553)
Renato Santos (4218790)
Roberta Dal’Mas (4218799)
Publication venue
Publication date: 28/09/2017
Field of study

Recent research has demonstrated that Intel's SGX is vulnerable to various software-based side-channel attacks. In particular, attacks that monitor CPU caches shared between the victim enclave and untrusted software enable accurate leakage of secret enclave data. Known defenses assume developer assistance, require hardware changes, impose high overhead, or prevent only some of the known attacks. In this paper we propose data location randomization as a novel defensive approach to address the threat of side-channel attacks. Our main goal is to break the link between the cache observations by the privileged adversary and the actual data accesses by the victim. We design and implement a compiler-based tool called DR.SGX that instruments enclave code such that data locations are permuted at the granularity of cache lines. We realize the permutation with the CPU's cryptographic hardware-acceleration units providing secure randomization. To prevent correlation of repeated memory accesses we continuously re-randomize all enclave data during execution. Our solution effectively protects many (but not all) enclaves from cache attacks and provides a complementary enclave hardening technique that is especially useful against unpredictable information leakage

arXiv.org e-Print Archive

FigShare