Distributed Management of Massive Data: an Efficient Fine-Grain Data Access Scheme
This paper addresses the problem of efficiently storing and accessing massive
data blocks in a large-scale distributed environment, while providing efficient
fine-grain access to data subsets. This issue is crucial in the context of
applications in the field of databases, data mining and multimedia. We propose
a data sharing service based on distributed, RAM-based storage of data, while
leveraging a DHT-based, natively parallel metadata management scheme. As
opposed to the most commonly used grid storage infrastructures that provide
mechanisms for explicit data localization and transfer, we provide a
transparent access model, where data are accessed through global identifiers.
Our proposal has been validated through a prototype implementation whose
preliminary evaluation provides promising results.
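As a rough illustration of how global identifiers can be resolved to storage nodes without a central directory, the sketch below hashes fixed-size block identifiers onto a consistent-hashing ring, a common building block of DHTs. The abstract does not describe the internals of its DHT. The ring, the fixed block size, the `"<data_id>:<block>"` key format, and the `HashRing` and `blocks_for_range` helpers are therefore hypothetical assumptions, not the authors' design.

```python
import bisect
import hashlib


def _hash(key: str) -> int:
    """Map a string to a 32-bit position on the ring (first 4 bytes of SHA-1)."""
    return int.from_bytes(hashlib.sha1(key.encode("utf-8")).digest()[:4], "big")


class HashRing:
    """Hypothetical consistent-hashing ring standing in for a DHT lookup layer."""

    def __init__(self, nodes, replicas=16):
        # Place each node at several virtual positions to spread keys more evenly.
        self._points = sorted(
            (_hash(f"{node}#{r}"), node) for node in nodes for r in range(replicas)
        )
        self._positions = [pos for pos, _ in self._points]

    def lookup(self, key: str) -> str:
        """Return the node owning `key`: the first virtual point clockwise of hash(key)."""
        i = bisect.bisect_right(self._positions, _hash(key)) % len(self._points)
        return self._points[i][1]


def blocks_for_range(data_id: str, offset: int, size: int, block_size: int):
    """Hypothetical global ids of the fixed-size blocks covering [offset, offset + size)."""
    first = offset // block_size
    last = (offset + size - 1) // block_size
    return [f"{data_id}:{b}" for b in range(first, last + 1)]


# Fine-grain read: resolve only the blocks touched by a 3 MB segment of a large blob.
ring = HashRing(["node-a", "node-b", "node-c"])
MB = 1 << 20
for key in blocks_for_range("blob-42", 10 * MB + 512, 3 * MB, MB):
    print(key, "->", ring.lookup(key))
```

With consistent hashing, adding or removing a node only moves the keys that node owns. Every other block identifier keeps its owner, which matters when the storage pool changes at large scale.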
Enabling Lock-Free Concurrent Fine-Grain Access to Massive Distributed Data: Application to Supernovae Detection
We consider the problem of efficiently managing massive data in a large-scale
distributed environment. We consider data strings of size on the order of
Terabytes, shared and accessed by concurrent clients. On each individual
access, a segment of a string, on the order of Megabytes, is read or modified.
Our goal is to provide the clients with efficient fine-grain access to the data
string as concurrently as possible, without locking the string itself. This
issue is crucial in the context of applications in the field of astronomy,
databases, data mining and multimedia. We illustrate these requirements with the
case of an application for searching supernovae. Our solution relies on
distributed, RAM-based data storage, while leveraging a DHT-based, parallel
metadata management scheme. The proposed architecture and algorithms have been
validated through a software prototype and evaluated in a cluster environment.
Comb-e-Chem: an e-science research project
The background to the Comb-e-Chem e-Science pilot project, funded under the UK e-Science Programme, is presented, and the areas being addressed within chemistry, and more specifically combinatorial chemistry, are discussed. The ways in which the ideas underlying the application of computer technology can improve the production, analysis and dissemination of chemical information and knowledge in a collaborative environment are also discussed.
Distributed top-k aggregation queries at large
Top-k query processing is a fundamental building block for efficient ranking in a large number of applications. Efficiency is a central issue, especially in distributed settings where the data is spread across different nodes in a network. This paper introduces novel optimization methods for top-k aggregation queries in such distributed environments. The optimizations can be applied to all algorithms that fall into the frameworks of the prior TPUT and KLEE methods. The optimizations address three degrees of freedom: 1) hierarchically grouping input lists into top-k operator trees and optimizing the tree structure, 2) computing data-adaptive scan depths for different input sources, and 3) data-adaptive sampling of a small subset of input sources in scenarios with hundreds or thousands of query-relevant network nodes. All optimizations are based on a statistical cost model that utilizes local synopses, e.g., in the form of histograms, efficiently computed convolutions, and estimators based on order statistics. The paper presents comprehensive experiments on three different real-life datasets, using the ns-2 network simulator for a packet-level simulation of a large Internet-style network.
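The abstract builds on the TPUT framework without restating it. For orientation, here is a minimal single-process sketch of the basic three-phase uniform-threshold (TPUT) idea for sum aggregation. It assumes non-negative scores and treats an item missing from a node as score 0, and it simulates the network with plain dictionaries. It does not implement the paper's own optimizations (operator trees, adaptive scan depths, sampling). The name `tput_topk` is hypothetical.

```python
import heapq
from collections import defaultdict


def tput_topk(nodes, k):
    """Basic three-phase uniform threshold (TPUT) top-k for sum aggregation.

    `nodes` is a list of dicts {item: non-negative score}, one per (simulated)
    network node; an item absent from a node contributes 0 to its total.
    """
    m = len(nodes)
    known = defaultdict(dict)  # known[item][j] = score of item reported by node j

    # Phase 1: each node sends its local top-k; tau1 is the k-th best partial sum.
    for j, scores in enumerate(nodes):
        for item, s in heapq.nlargest(k, scores.items(), key=lambda kv: kv[1]):
            known[item][j] = s
    partial = sorted((sum(v.values()) for v in known.values()), reverse=True)
    tau1 = partial[k - 1] if len(partial) >= k else 0
    threshold = tau1 / m

    # Phase 2: each node sends every item whose local score is >= tau1 / m.
    for j, scores in enumerate(nodes):
        for item, s in scores.items():
            if s >= threshold:
                known[item][j] = s
    lower = {item: sum(v.values()) for item, v in known.items()}
    ranked = sorted(lower.values(), reverse=True)
    tau2 = ranked[k - 1] if len(ranked) >= k else 0
    # An unreported score at node j is at most `threshold`, which bounds the total.
    upper = {item: lower[item] + threshold * (m - len(v)) for item, v in known.items()}
    candidates = [item for item in known if upper[item] >= tau2]

    # Phase 3: fetch exact scores for the surviving candidates only.
    exact = {item: sum(scores.get(item, 0) for scores in nodes) for item in candidates}
    return heapq.nlargest(k, exact.items(), key=lambda kv: kv[1])


nodes = [
    {"a": 10, "b": 9, "c": 1, "d": 2},
    {"a": 3, "b": 8, "c": 7, "e": 6},
    {"a": 9, "c": 8, "d": 1, "e": 2},
]
print(tput_topk(nodes, 2))  # [('a', 22), ('b', 17)]
```

The pruning in phase 2 is safe because lower bounds only grow, so tau2 is at least tau1. An item that no node reported has a total below m · (tau1 / m) = tau1 and cannot make the top k. Item "d" in the example is dropped this way without its scores ever being shipped.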
Challenging Ubiquitous Inverted Files
Stand-alone ranking systems based on highly optimized inverted file structures are generally considered ‘the’ solution for building search engines. Observing various developments in software and hardware, we argue, however, that IR research faces a complex engineering problem in the quest for more flexible yet efficient retrieval systems. We propose to base the development of retrieval systems on ‘the database approach’: mapping high-level declarative specifications of the retrieval process into efficient query plans. We present the Mirror DBMS as a prototype implementation of a retrieval system based on this approach.
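To make the "database approach" concrete, the sketch below stores an inverted file as an ordinary relation. It then expresses a simple tf·idf ranking as one declarative SQL query and leaves the join and aggregation plan to the query engine. It uses Python's built-in `sqlite3` module purely for illustration and is not the Mirror DBMS. The table layout, the plain tf·idf formula, and the `open_index` and `rank` helpers are assumptions.

```python
import math
import sqlite3
from collections import Counter


def open_index(docs):
    """Load {doc_id: text} into an in-memory relation postings(term, doc, tf)."""
    conn = sqlite3.connect(":memory:")
    # Register ln() ourselves; SQLite's built-in math functions are a compile-time option.
    conn.create_function("ln", 1, math.log)
    conn.execute("CREATE TABLE postings (term TEXT, doc INTEGER, tf INTEGER)")
    conn.executemany(
        "INSERT INTO postings VALUES (?, ?, ?)",
        [
            (term, doc_id, tf)
            for doc_id, text in docs.items()
            for term, tf in Counter(text.lower().split()).items()
        ],
    )
    conn.execute("CREATE INDEX postings_term ON postings (term)")
    return conn


def rank(conn, query, n_docs, k=10):
    """Rank documents by sum(tf * idf) over query terms, stated declaratively in SQL."""
    terms = sorted(set(query.lower().split()))
    marks = ",".join("?" for _ in terms)
    # One row per (term, doc), so COUNT(*) per term is its document frequency.
    sql = f"""
        SELECT p.doc, SUM(p.tf * d.idf) AS score
        FROM postings AS p
        JOIN (SELECT term, ln(? * 1.0 / COUNT(*)) AS idf
              FROM postings WHERE term IN ({marks}) GROUP BY term) AS d
          ON p.term = d.term
        GROUP BY p.doc
        ORDER BY score DESC, p.doc
        LIMIT ?
    """
    return conn.execute(sql, [n_docs, *terms, k]).fetchall()


docs = {
    1: "inverted files for search engines",
    2: "database approach to retrieval",
    3: "search engines use inverted files and database tech",
}
conn = open_index(docs)
print(rank(conn, "database retrieval", n_docs=len(docs)))
```

Changing the ranking formula here means editing a query rather than rewriting hand-tuned index code, which is the flexibility the abstract argues for. Whether the resulting plan is efficient enough then depends on the database engine.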
Big Data Model Simulation on a Graph Database for Surveillance in Wireless Multimedia Sensor Networks
Sensors are present in various forms all around the world such as mobile
phones, surveillance cameras, smart televisions, intelligent refrigerators and
blood pressure monitors. Usually, sensors are part of a larger
system of similar sensors that together form a network. One such network,
composed of millions of sensors connected to the Internet, is called the
Internet of Things (IoT). With the advances in wireless communication
technologies, multimedia sensors and their networks are expected to be major
components of the IoT. Many studies have already been done on wireless multimedia
sensor networks in diverse domains such as fire detection, city surveillance,
early warning systems, etc. All those applications position sensor nodes and
collect their data over a long time period with real-time data flow, which is
considered big data. Big data may be structured or unstructured and needs to
be stored for further processing and analysis. Analyzing multimedia big data
is a challenging task requiring high-level modeling to efficiently extract
valuable information/knowledge from data. In this study, we propose a big
database model based on a graph database model for handling data generated by
wireless multimedia sensor networks. We introduce a simulator to generate
synthetic data, and to store and query big data using a graph model as a big
database. For this purpose, we evaluate the well-known graph-based NoSQL
databases, Neo4j and OrientDB, and a relational database, MySQL. We have run a
number of query experiments on our implemented simulator to show which
database system(s) for surveillance in wireless multimedia sensor networks are
efficient and scalable.
Image mining: trends and developments
Advances in image acquisition and storage technology have led to tremendous growth in very large and detailed image databases. These images, if analyzed, can reveal useful information to human users. Image mining deals with the extraction of implicit knowledge, image data relationships, or other patterns not explicitly stored in the images. Image mining is more than just an extension of data mining to the image domain. It is an interdisciplinary endeavor that draws upon expertise in computer vision, image processing, image retrieval, data mining, machine learning, databases, and artificial intelligence. In this paper, we will examine the research issues in image mining and current developments in image mining, particularly image mining frameworks, state-of-the-art techniques and systems. We will also identify some future research directions for image mining.
Hierarchical classification for multiple, distributed web databases
The proliferation of online information resources increases the importance of effective and efficient distributed searching. Our research aims to provide an alternative hierarchical categorization and search capability based on a Bayesian network learning algorithm. Our proposed approach, which is grounded in automatic textual analysis of the subject content of online web databases, attempts to address the database selection problem by first classifying web databases into a hierarchy of topic categories. The experimental results reported demonstrate that such a classification approach not only effectively reduces the class search space, but also significantly improves classification accuracy.