1,936 research outputs found
Storage Solutions for Big Data Systems: A Qualitative Study and Comparison
Big data systems development is full of challenges in view of the variety of
application areas and domains that this technology promises to serve.
Typically, fundamental design decisions involved in big data systems design
include choosing appropriate storage and computing infrastructures. In this age
of heterogeneous systems that integrate different technologies for optimized
solution to a specific real world problem, big data system are not an exception
to any such rule. As far as the storage aspect of any big data system is
concerned, the primary facet in this regard is a storage infrastructure and
NoSQL seems to be the right technology that fulfills its requirements. However,
every big data application has variable data characteristics and thus, the
corresponding data fits into a different data model. This paper presents
feature and use case analysis and comparison of the four main data models
namely document oriented, key value, graph and wide column. Moreover, a feature
analysis of 80 NoSQL solutions has been provided, elaborating on the criteria
and points that a developer must consider while making a possible choice.
Typically, big data storage needs to communicate with the execution engine and
other processing and visualization technologies to create a comprehensive
solution. This brings forth second facet of big data storage, big data file
formats, into picture. The second half of the research paper compares the
advantages, shortcomings and possible use cases of available big data file
formats for Hadoop, which is the foundation for most big data computing
technologies. Decentralized storage and blockchain are seen as the next
generation of big data storage and its challenges and future prospects have
also been discussed
Data sharing in DHT based P2P systems
International audienceThe evolution of peer-to-peer (P2P) systems triggered the building of large scale distributed applications. The main application domain is data sharing across a very large number of highly autonomous participants. Building such data sharing systems is particularly challenging because of the "extreme" characteristics of P2P infrastructures: massive distribution, high churn rate, no global control, potentially untrusted participants... This article focuses on declarative querying support, query optimization and data privacy on a major class of P2P systems, that based on Distributed Hash Table (P2P DHT). The usual approaches and the algorithms used by classic distributed systems and databases forproviding data privacy and querying services are not well suited to P2P DHT systems. A considerable amount of work was required to adapt them for the new challenges such systems present. This paper describes the most important solutions found. It also identies important future research trends in data management in P2P DHT systems
Recommended from our members
Constructing secure service compositions with patterns
In service based applications, it is often necessary to construct compositions of services in order to provide required functionality in cases where this is not possible through the use of a single service. Whilst creating service compositions, it is necessary to ensure not only that the functionality required of the composition is achieved but also that certain security properties are preserved. In this paper, we describe an approach to constructing secure service compositions. Our approach is based on the use of composition patterns and rules that determine the security properties that should be preserved by the individual services that constitute a composition in order to ensure that security properties of the overall composition are also satisfied. Our approach extends a framework developed to support the runtime service discovery
Hybrid approach for XML access control (HyXAC)
While XML has been widely adopted for sharing and managing information over the Internet, the need for efficient XML access control naturally arise. Various access control models and mechanisms have been proposed in the research community, such as view-based approaches and preprocessing approaches. All categories of solutions have their inherent advantages and disadvantages. For instance, view based approach provides high performance in query evaluation, but suffers from the view maintenance issues. To remedy the problems, we propose a hybrid approach, namely HyXAC: Hybrid XML Access Control. HyXAC provides efficient access control and query processing by maximizing the utilization of available (but constrained) resources. HyXAC uses pre-processing approach as a baseline to process queries and define sub-views. It dynamically allocates the available resources (memory and secondary storage) to materialize sub-views to improve query performance. Dynamic and fine-grained view management is introduced to utilize cost-effectiveness analysis for optimal query performance. Fine-grained view management also allows sub-views to be shared across multiple roles to eliminate the redundancies in storage
Semantic Query Reformulation in Social PDMS
We consider social peer-to-peer data management systems (PDMS), where each
peer maintains both semantic mappings between its schema and some
acquaintances, and social links with peer friends. In this context,
reformulating a query from a peer's schema into other peer's schemas is a hard
problem, as it may generate as many rewritings as the set of mappings from that
peer to the outside and transitively on, by eventually traversing the entire
network. However, not all the obtained rewritings are relevant to a given
query. In this paper, we address this problem by inspecting semantic mappings
and social links to find only relevant rewritings. We propose a new notion of
'relevance' of a query with respect to a mapping, and, based on this notion, a
new semantic query reformulation approach for social PDMS, which achieves great
accuracy and flexibility. To find rapidly the most interesting mappings, we
combine several techniques: (i) social links are expressed as FOAF (Friend of a
Friend) links to characterize peer's friendship and compact mapping summaries
are used to obtain mapping descriptions; (ii) local semantic views are special
views that contain information about external mappings; and (iii) gossiping
techniques improve the search of relevant mappings. Our experimental
evaluation, based on a prototype on top of PeerSim and a simulated network
demonstrate that our solution yields greater recall, compared to traditional
query translation approaches proposed in the literature.Comment: 29 pages, 8 figures, query rewriting in PDM
GridIMAGE: A Novel Use of Grid Computing to Support Interactive Human and Computer-Assisted Detection Decision Support
This paper describes a Grid-aware image reviewing system (GridIMAGE) that allows practitioners to (a) select images from multiple geographically distributed digital imaging and communication in medicine (DICOM) servers, (b) send those images to a specified group of human readers and computer-assisted detection (CAD) algorithms, and (c) obtain and compare interpretations from human readers and CAD algorithms. The currently implemented system was developed using the National Cancer Institute caGrid infrastructure and is designed to support the identification of lung nodules on thoracic computed tomography. However, the infrastructure is general and can support any type of distributed review. caGrid data and analytical services are used to link DICOM image databases and CAD systems and to interact with human readers. Moreover, the service-oriented and distributed structure of the GridIMAGE framework enables a flexible system, which can be deployed in an institution (linking multiple DICOM servers and CAD algorithms) and in a Grid environment (linking the resources of collaborating research groups). GridIMAGE provides a framework that allows practitioners to obtain interpretations from one or more human readers or CAD algorithms. It also provides a mechanism to allow cooperative imaging groups to systematically perform image interpretation tasks associated with research protocols
- …