9,322 research outputs found
Efficient Data Streaming Analytic Designs for Parallel and Distributed Processing
Today, ubiquitously sensing technologies enable inter-connection of physical\ua0objects, as part of Internet of Things (IoT), and provide massive amounts of\ua0data streams. In such scenarios, the demand for timely analysis has resulted in\ua0a shift of data processing paradigms towards continuous, parallel, and multitier\ua0computing. However, these paradigms are followed by several challenges\ua0especially regarding analysis speed, precision, costs, and deterministic execution.\ua0This thesis studies a number of such challenges to enable efficient continuous\ua0processing of streams of data in a decentralized and timely manner.In the first part of the thesis, we investigate techniques aiming at speeding\ua0up the processing without a loss in precision. The focus is on continuous\ua0machine learning/data mining types of problems, appearing commonly in IoT\ua0applications, and in particular continuous clustering and monitoring, for which\ua0we present novel algorithms; (i) Lisco, a sequential algorithm to cluster data\ua0points collected by LiDAR (a distance sensor that creates a 3D mapping of the\ua0environment), (ii) p-Lisco, the parallel version of Lisco to enhance pipeline- and\ua0data-parallelism of the latter, (iii) pi-Lisco, the parallel and incremental version\ua0to reuse the information and prevent redundant computations, (iv) g-Lisco, a\ua0generalized version of Lisco to cluster any data with spatio-temporal locality\ua0by leveraging the implicit ordering of the data, and (v) Amble, a continuous\ua0monitoring solution in an industrial process.In the second part, we investigate techniques to reduce the analysis costs\ua0in addition to speeding up the processing while also supporting deterministic\ua0execution. The focus is on problems associated with availability and utilization\ua0of computing resources, namely reducing the volumes of data, involving\ua0concurrent computing elements, and adjusting the level of concurrency. For\ua0that, we propose three frameworks; (i) DRIVEN, a framework to continuously\ua0compress the data and enable efficient transmission of the compact data in the\ua0processing pipeline, (ii) STRATUM, a framework to continuously pre-process\ua0the data before transferring the later to upper tiers for further processing, and\ua0(iii) STRETCH, a framework to enable instantaneous elastic reconfigurations\ua0to adjust intra-node resources at runtime while ensuring determinism.The algorithms and frameworks presented in this thesis contribute to an\ua0efficient processing of data streams in an online manner while utilizing available\ua0resources. Using extensive evaluations, we show the efficiency and achievements\ua0of the proposed techniques for IoT representative applications that involve a\ua0wide spectrum of platforms, and illustrate that the performance of our work\ua0exceeds that of state-of-the-art techniques
HPC Cloud for Scientific and Business Applications: Taxonomy, Vision, and Research Challenges
High Performance Computing (HPC) clouds are becoming an alternative to
on-premise clusters for executing scientific applications and business
analytics services. Most research efforts in HPC cloud aim to understand the
cost-benefit of moving resource-intensive applications from on-premise
environments to public cloud platforms. Industry trends show hybrid
environments are the natural path to get the best of the on-premise and cloud
resources---steady (and sensitive) workloads can run on on-premise resources
and peak demand can leverage remote resources in a pay-as-you-go manner.
Nevertheless, there are plenty of questions to be answered in HPC cloud, which
range from how to extract the best performance of an unknown underlying
platform to what services are essential to make its usage easier. Moreover, the
discussion on the right pricing and contractual models to fit small and large
users is relevant for the sustainability of HPC clouds. This paper brings a
survey and taxonomy of efforts in HPC cloud and a vision on what we believe is
ahead of us, including a set of research challenges that, once tackled, can
help advance businesses and scientific discoveries. This becomes particularly
relevant due to the fast increasing wave of new HPC applications coming from
big data and artificial intelligence.Comment: 29 pages, 5 figures, Published in ACM Computing Surveys (CSUR
Split and Migrate: Resource-Driven Placement and Discovery of Microservices at the Edge
Microservices architectures combine the use of fine-grained and independently-scalable services with lightweight communication protocols, such as REST calls over HTTP. Microservices bring flexibility to the development and deployment of application back-ends in the cloud.
Applications such as collaborative editing tools require frequent interactions between the front-end running on users\u27 machines and a back-end formed of multiple microservices. User-perceived latencies depend on their connection to microservices, but also on the interaction patterns between these services and their databases. Placing services at the edge of the network, closer to the users, is necessary to reduce user-perceived latencies. It is however difficult to decide on the placement of complete stateful microservices at one specific core or edge location without trading between a latency reduction for some users and a latency increase for the others.
We present how to dynamically deploy microservices on a combination of core and edge resources to systematically reduce user-perceived latencies. Our approach enables the split of stateful microservices, and the placement of the resulting splits on appropriate core and edge sites. Koala, a decentralized and resource-driven service discovery middleware, enables REST calls to reach and use the appropriate split, with only minimal changes to a legacy microservices application. Locality awareness using network coordinates further enables to automatically migrate services split and follow the location of the users. We confirm the effectiveness of our approach with a full prototype and an application to ShareLatex, a microservices-based collaborative editing application
MultiLibOS: an OS architecture for cloud computing
Cloud computing is resulting in fundamental changes to computing infrastructure, yet these changes have not resulted in corresponding changes to operating systems. In this paper we discuss some key changes we see in the computing infrastructure and applications of IaaS systems. We argue that these changes enable and demand a very different model of operating system. We then describe the MulitLibOS architecture we are exploring and how it helps exploit the scale and elasticity of integrated systems while still allowing for legacy software run on traditional OSes
Towards data-driven additive manufacturing processes
Additive Manufacturing (AM), or 3D printing, is a potential game-changer in medical and aerospatial sectors, among others. AM enables rapid prototyping (allowing development/manufacturing of advanced components in a matter of days), weight reduction, mass customization, and on-demand manufacturing to reduce inventory costs. At present, though, AM has been showcased in many pilot studies but has not reached broad industrial application. Online monitoring and data-driven decision-making are needed to go beyond existing offline and manual approaches. We aim at advancing the state-of-the-art by introducing the STRATA framework. While providing APIs tailored to AM printing processes, STRATA leverages common processing paradigms such as stream processing and key-value stores, enabling both scalable analysis and portability. As we show with a real-world use case, STRATA can support online analysis with sub-second latency for custom data pipelines monitoring several processes in parallel
- …