954 research outputs found

The Family of MapReduce and Large Scale Data Processing Systems

Author: Anna Liu
Ayman G. Fayoumi
King Abdulaziz
Sherif Sakr
South Wales
Publication date: 12/02/2013

In the last two decades, the continuous increase of computational power has produced an overwhelming flow of data, which has called for a paradigm shift in computing architecture and large scale data processing mechanisms. MapReduce is a simple and powerful programming model that enables easy development of scalable parallel applications to process vast amounts of data on large clusters of commodity machines. It isolates the application from the details of running a distributed program, such as data distribution, scheduling and fault tolerance. However, the original implementation of the MapReduce framework had some limitations that have been tackled by many research efforts in several follow-up works. This article provides a comprehensive survey of a family of approaches and mechanisms for large scale data processing that have been implemented based on the original idea of the MapReduce framework and are currently gaining a lot of momentum in both research and industrial communities. We also cover a set of systems that have been implemented to provide declarative programming interfaces on top of the MapReduce framework. In addition, we review several large scale data processing systems that resemble some of the ideas of the MapReduce framework for different purposes and application scenarios. Finally, we discuss some of the future research directions for implementing the next generation of MapReduce-like solutions. Comment: arXiv admin note: text overlap with arXiv:1105.4252 by other author
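As a rough illustration of the programming model this abstract describes, the sketch below runs a word count through a single-process map, group and reduce pipeline. It is not taken from the surveyed article: the names `map_reduce`, `word_count_mapper` and `word_count_reducer` are hypothetical, and a real framework would distribute the same steps across a cluster and handle scheduling and fault tolerance itself.

```python
from collections import defaultdict


def map_reduce(records, mapper, reducer):
    """Single-process sketch: map each record, group values by key, reduce each group."""
    groups = defaultdict(list)
    for record in records:
        # Map phase: each record yields zero or more (key, value) pairs.
        for key, value in mapper(record):
            # Shuffle phase (simulated in memory): collect values per key.
            groups[key].append(value)
    # Reduce phase: collapse the list of values for each key into one result.
    return {key: reducer(key, values) for key, values in groups.items()}


def word_count_mapper(line):
    # Hypothetical mapper: emit (word, 1) for every word in the line.
    for word in line.lower().split():
        yield word, 1


def word_count_reducer(word, counts):
    # Hypothetical reducer: sum the per-occurrence counts for one word.
    return sum(counts)


counts = map_reduce(
    ["big data", "big clusters"],
    word_count_mapper,
    word_count_reducer,
)
print(counts)
```

The application code is limited to the two small functions; everything about grouping and execution sits in `map_reduce`, which is the part a distributed runtime would replace.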

arXiv.org e-Print Archive

CiteSeerX

Big Data Management Challenges, Approaches, Tools and their limitations

Author: Adiba Michel
Castrejon-Castillo Juan-Carlos
Espinosa Oviedo Javier Alfonso
Vargas-Solar Genoveva
Zechinelli-Martini José-Luis
Publication venue: Chapman and Hall/CRC
Publication date: 01/02/2016

Big Data is the buzzword everyone talks about. Independently of the application domain, today there is a consensus about the V's characterizing Big Data: Volume, Variety, and Velocity. By focusing on data management issues and past experience in the area of database systems, this chapter examines the main challenges involved in the three V's of Big Data. Then it reviews the main characteristics of existing solutions for addressing each of the V's (e.g., NoSQL, parallel RDBMS, stream data management systems and complex event processing systems). Finally, it provides a classification of different functions offered by NewSQL systems and discusses their benefits and limitations for processing Big Data.

Hal - Université Grenoble Alpes

A standard methodology for the interoperability of heterogeneous information sources.

Author: Ashir Jehad Saleh
Publication venue: De Montfort University
Publication date: 01/01/2001

De Montfort University Open Research Archive

Development of a parallel database environment

Author: Tranter Mette
Publication venue: The University of Edinburgh
Publication date: 01/01/2000

Edinburgh Research Archive

ACTiCLOUD: Enabling the Next Generation of Cloud Applications

Author: Attwood A.
Elmroth E.
Flouris M.
Foutris N.
Goodacre J.
Goumas G.
Grohmann D.
Karakostas V.
Kersten M.
Kotselidis C.
Koutsourakis P.
Koziris N.
Lakew E.B.
Lee K.
Liu L.
Lujàn M.
Nikas K.
Rustad E.
Thomson J.
Tomás L.
Vesterkjaer A.
Webber J.
Zhang Y.
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 01/01/2017

Despite their proliferation as a dominant computing paradigm, cloud computing systems lack effective mechanisms to manage their vast amounts of resources efficiently. Resources are stranded and fragmented, ultimately limiting cloud systems' applicability to large classes of critical applications that pose non-moderate resource demands. Eliminating current technological barriers to actual fluidity and scalability of cloud resources is essential to strengthen cloud computing's role as a critical cornerstone for the digital economy. ACTiCLOUD proposes a novel cloud architecture that breaks the existing scale-up and share-nothing barriers and enables the holistic management of physical resources both at the local cloud site and at distributed levels. Specifically, it makes advancements in the cloud resource management stacks by extending state-of-the-art hypervisor technology beyond the physical server boundary and the localized cloud management system to provide holistic resource management within a rack, within a site, and across distributed cloud sites. On top of this, ACTiCLOUD will adapt and optimize system libraries and runtimes (e.g., JVM) as well as ACTiCLOUD-native applications, which are extremely demanding and critical classes of applications that currently face severe difficulties in matching their resource requirements to state-of-the-art cloud offerings.

Crossref

The University of Manchester - Institutional Repository

International Migration, Integration and Social Cohesion online publications

Upstream database and digital asset management in variable data printing

Author: Barzelay Nicholas
Frey Franziska
Publication venue: RIT Scholar Works
Publication date: 01/01/2008

This study outlines the upstream database and digital asset management issues for variable data printing. The goal is to clarify what work environment and processes are needed during digital asset and data preparation. A literature review was conducted and complemented with hands-on experience of establishing and using a variable data preparation and testing platform.

RIT Scholar Works

A software architecture for electro-mobility services: a milestone for sustainable remote vehicle capabilities

Author: Poaka Poaka Vladivy
Publication date: 26/10/2022

To face tough competition and changing markets and technologies in the automotive industry, automakers have to be highly innovative. In previous decades, innovations were electronics and IT-driven, which exponentially increased the complexity of the vehicle's internal network. Furthermore, the growing expectations and preferences of customers oblige these manufacturers to adapt their business models and to also propose mobility-based services. On the other hand, there is also increasing pressure from regulators to significantly reduce the environmental footprint of transportation and mobility, down to zero in the foreseeable future. This dissertation investigates an architecture for communication and data exchange within a complex and heterogeneous ecosystem. This communication takes place between various third-party entities on one side, and between these entities and the infrastructure on the other. The proposed solution considerably reduces the complexity of vehicle communication and of communication among the parties involved in the ODX life cycle. In such a heterogeneous environment, particular attention is paid to the protection of confidential and private data. Confidential data here refers to the OEM's know-how, which is enclosed in vehicle projects. The data delivered by a car during a vehicle communication session might contain private data from customers. Our solution ensures that every entity of this ecosystem has access only to the data it has the right to. We designed our solution to avoid coupling to any particular technology, so that it can be implemented on any platform and benefit from the environment best suited for each task. We also proposed a data model for vehicle projects, which improves query time during a vehicle diagnostic session. Scalability and backwards compatibility were also taken into account during the design phase of our solution.
We proposed the necessary algorithms and the workflow to perform an efficient vehicle diagnostic with considerably lower latency and substantially better time and space complexity than current solutions. To prove the practicality of our design, we presented a prototypical implementation. Then, we analyzed the results of a series of tests we performed on several vehicle models and projects. We also evaluated the prototype against quality attributes in software engineering.

Publikationsserver der Technischen Universität Clausthal