The Family of MapReduce and Large Scale Data Processing Systems
In the last two decades, the continuous increase of computational power has
produced an overwhelming flow of data which has called for a paradigm shift in
the computing architecture and large scale data processing mechanisms.
MapReduce is a simple and powerful programming model that enables easy
development of scalable parallel applications to process vast amounts of data
on large clusters of commodity machines. It isolates the application from the
details of running a distributed program such as issues on data distribution,
scheduling and fault tolerance. However, the original implementation of the
MapReduce framework had some limitations that have been tackled by many
research efforts in several follow-up works after its introduction. This article
provides a comprehensive survey of a family of approaches and mechanisms for
large scale data processing that have been implemented based on the
original idea of the MapReduce framework and are currently gaining a lot of
momentum in both research and industrial communities. We also cover a set of
systems that have been implemented to provide declarative
programming interfaces on top of the MapReduce framework. In addition, we
review several large scale data processing systems that resemble some of the
ideas of the MapReduce framework for different purposes and application
scenarios. Finally, we discuss some of the future research directions for
implementing the next generation of MapReduce-like solutions.
Comment: arXiv admin note: text overlap with arXiv:1105.4252 by other author
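As a rough illustration of the programming model this abstract describes, the following minimal sketch runs a word count through a map phase, a grouping (shuffle) phase, and a reduce phase in a single process. It is not taken from the surveyed article: the names `map_words`, `reduce_counts`, and `run_mapreduce` are hypothetical, and a real MapReduce framework would additionally handle data distribution, scheduling, and fault tolerance across a cluster.

```python
from collections import defaultdict
from typing import Callable, Iterable, Iterator


def map_words(document: str) -> Iterator[tuple[str, int]]:
    """Map phase (hypothetical example): emit (word, 1) for every word."""
    for word in document.lower().split():
        yield word, 1


def reduce_counts(word: str, counts: list[int]) -> tuple[str, int]:
    """Reduce phase (hypothetical example): sum all counts for one word."""
    return word, sum(counts)


def run_mapreduce(
    inputs: Iterable[str],
    mapper: Callable[[str], Iterable[tuple[str, int]]],
    reducer: Callable[[str, list[int]], tuple[str, int]],
) -> dict[str, int]:
    """Single-process stand-in for the framework: map, group by key, reduce."""
    groups: dict[str, list[int]] = defaultdict(list)
    for record in inputs:
        for key, value in mapper(record):
            groups[key].append(value)  # "shuffle": collect values per key
    return dict(reducer(key, values) for key, values in sorted(groups.items()))


word_counts = run_mapreduce(
    ["big data big clusters", "data on clusters"],
    map_words,
    reduce_counts,
)
print(word_counts)
```

The application code is limited to the two small functions passed in; everything inside `run_mapreduce` stands in for the part a framework would take over.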
Big Data Management Challenges, Approaches, Tools and their limitations
Big Data is the buzzword everyone talks about. Independently of the application domain, today there is a consensus about the V's characterizing Big Data: Volume, Variety, and Velocity. By focusing on Data Management issues and past experiences in the area of database systems, this chapter examines the main challenges involved in the three V's of Big Data. Then it reviews the main characteristics of existing solutions for addressing each of the V's (e.g., NoSQL, parallel RDBMS, stream data management systems and complex event processing systems). Finally, it provides a classification of different functions offered by NewSQL systems and discusses their benefits and limitations for processing Big Data.
ACTiCLOUD: Enabling the Next Generation of Cloud Applications
Despite their proliferation as a dominant computing paradigm, cloud computing systems lack effective mechanisms to manage their vast amounts of resources efficiently. Resources are stranded and fragmented, ultimately limiting cloud systems' applicability to large classes of critical applications that pose non-moderate resource demands. Eliminating current technological barriers to actual fluidity and scalability of cloud resources is essential to strengthen cloud computing's role as a critical cornerstone for the digital economy. ACTiCLOUD proposes a novel cloud architecture that breaks the existing scale-up and share-nothing barriers and enables the holistic management of physical resources both at the local cloud site and at distributed levels. Specifically, it makes advancements in the cloud resource management stacks by extending state-of-the-art hypervisor technology beyond the physical server boundary and localized cloud management system to provide holistic resource management within a rack, within a site, and across distributed cloud sites. On top of this, ACTiCLOUD will adapt and optimize system libraries and runtimes (e.g., JVM) as well as ACTiCLOUD-native applications, which are extremely demanding, and critical classes of applications that currently face severe difficulties in matching their resource requirements to state-of-the-art cloud offerings.
Upstream database and digital asset management in variable data printing
This study outlines the upstream database and digital asset management issues for variable data printing. The goal is to clarify what work environment and processes are needed during digital asset and data preparation. A literature review was conducted and complemented with hands-on experience of establishing and using a variable data preparation and testing platform.
A software architecture for electro-mobility services: a milestone for sustainable remote vehicle capabilities
To face the tough competition, changing markets and technologies in the automotive industry,
automakers have to be highly innovative. In the previous decades, innovations were
electronics- and IT-driven, which exponentially increased the complexity of the vehicle’s internal
network. Furthermore, the growing expectations and preferences of customers oblige these
manufacturers to adapt their business models and to also propose mobility-based services.
On the other hand, there is also increasing pressure from regulators to significantly reduce
the environmental footprint of transportation and mobility, down to zero in the foreseeable
future.
This dissertation investigates an architecture for communication and data exchange
within a complex and heterogeneous ecosystem. This communication takes place between
various third-party entities on one side, and between these entities and the infrastructure
on the other. The proposed solution considerably reduces the complexity of vehicle
communication and of communication among the parties involved in the ODX life cycle. In such a
heterogeneous environment, particular attention is paid to the protection of confidential
and private data. Confidential data here refers to the OEM’s know-how which is enclosed
in vehicle projects. The data delivered by a car during a vehicle communication session
might contain private data from customers. Our solution ensures that every entity of this
ecosystem has access only to the data it is entitled to. We designed our solution to be
technology-agnostic so that it can be implemented on any platform to benefit from
the environment best suited for each task. We also proposed a data model for vehicle
projects, which improves query time during a vehicle diagnostic session. Scalability and
backward compatibility were also taken into account during the design phase of our
solution.
We proposed the necessary algorithms and the workflow to perform an efficient vehicle
diagnostic with considerably lower latency and substantially better time and space
complexity than current solutions. To prove the practicality of our design, we presented a
prototypical implementation of it. Then, we analyzed the results of a series of tests
we performed on several vehicle models and projects. We also evaluated the prototype
against quality attributes in software engineering.