182 research outputs found
Data analytics in IoT FaaS with DataFlasks
Master's dissertation in Computer Science.
The current exponential growth of data demands new strategies for processing and analyzing information.
Increased Internet usage, as well as the everyday appearance of new data sources, is
generating data volumes for Cloud applications that grow much faster than the available
Cloud computing power.
These issues, combined with the appearance of new devices with relatively low computational
power (such as smartphones), have pushed for the development of new applications able to make
use of this power as a complement to the Cloud, pushing the frontier of computing applications,
data storage and services to the edge of the network.
However, the Edge computing environment is very unstable. It requires leveraging resources
that may not be continuously connected to a network, and device failure is a certainty. The system
has to be aware of the processing capabilities of each node to achieve proper task distribution, as
there may be a high level of heterogeneity among the system's devices.
A recent approach for developing applications in the Cloud, named Function as a Service (FaaS),
proposes a way to enable data processing in these environments. FaaS services adhere to the principles
of serverless architectures, providing stateless computing containers that allow users to run
code without provisioning or managing servers.
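The "stateless computing container" model means each invocation receives all of its input in the request and returns its result, keeping no state between calls. A minimal illustrative sketch of such a handler in Python (the function name and event shape are hypothetical, not tied to any particular FaaS platform's API):

```python
import json

# Hypothetical stateless FaaS-style handler: all input arrives in the
# event, the result is returned, and nothing persists between calls.
def handle(event):
    readings = event.get("readings", [])
    if not readings:
        return {"status": "empty", "mean": None}
    return {"status": "ok", "mean": sum(readings) / len(readings)}

if __name__ == "__main__":
    # A platform would deserialize the request and invoke the handler;
    # here we simulate a single invocation locally.
    print(json.dumps(handle({"readings": [2.0, 4.0, 6.0]})))
```

Because the handler holds no state, the platform is free to start, stop or replicate its container at will, which is what makes the model attractive for unstable Edge environments.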
In this dissertation we present OpenFlasks, a new approach to the management and processing
of data in a decentralized manner across Cloud and Edge. We build upon these types of architectures
and other data storage tools, combining them in a novel way to create a flexible system
capable of balancing data storage and data analytics needs in both environments. In addition, we
propose a new approach to task execution in both Edge and Cloud environments that is able
to handle the high churn and heterogeneity of the system.
Our evaluation shows an increase of up to 18% in the percentage of successful task executions
under high-churn environments with OpenFlasks relative to other FaaS systems. In addition, it
shows improvements in load balancing and average resource usage in the system for the execution
of simple analytics at the Edge.
How can SMEs benefit from big data? Challenges and a path forward
Big data is big news, and large companies in all sectors are making significant advances in their customer relations, product selection and development and consequent profitability through using this valuable commodity. Small and medium enterprises (SMEs) have proved themselves to be slow adopters of the new technology of big data analytics and are in danger of being left behind. In Europe, SMEs are a vital part of the economy, and the challenges they encounter need to be addressed as a matter of urgency. This paper identifies barriers to SME uptake of big data analytics and recognises their complex challenge to all stakeholders, including national and international policy makers, IT, business management and data science communities.
The paper proposes a big data maturity model for SMEs as a first step towards an SME roadmap to data analytics. It considers the ‘state-of-the-art’ of IT with respect to usability and usefulness for SMEs and discusses how SMEs can overcome the barriers preventing them from adopting existing solutions. The paper then considers management perspectives and the role of maturity models in enhancing and structuring the adoption of data analytics in an organisation. The history of total quality management is reviewed to inform the core aspects of implanting a new paradigm. The paper concludes with recommendations to help SMEs develop their big data capability and enable them to continue as the engines of European industrial and business success. Copyright © 2016 John Wiley & Sons, Ltd.
Seer: Empowering Software Defined Networking with Data Analytics
Network complexity is increasing, making network control and orchestration a
challenging task. The proliferation of network information and tools for data
analytics can provide important insights into resource provisioning and
optimisation. The network knowledge incorporated in software defined networking
can facilitate knowledge-driven control, leveraging network
programmability. We present Seer: a flexible, highly configurable data
analytics platform for network intelligence based on software defined
networking and big data principles. Seer combines a computational engine with a
distributed messaging system to provide a scalable, fault-tolerant and
real-time platform for knowledge extraction. Our first prototype uses Apache
Spark for streaming analytics and the Open Network Operating System (ONOS)
controller to program a network in real time. The first application we
developed aims to predict the mobility pattern of mobile devices inside a smart
city environment.
Comment: 8 pages, 6 figures. Keywords: big data, data analytics, data mining,
knowledge-centric networking (KCN), software defined networking (SDN), Seer. 2016 15th
International Conference on Ubiquitous Computing and Communications and 2016
International Symposium on Cyberspace and Security (IUCC-CSS 2016).
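As a flavour of the kind of streaming analytics such a platform performs, here is a toy sliding-window aggregation in plain Python (illustrative only; the actual prototype uses Apache Spark for this):

```python
from collections import deque

# Toy sliding-window mean over a stream of measurements -- a stand-in
# for the windowed streaming analytics a platform like Seer would run.
def windowed_means(stream, window=3):
    buf = deque(maxlen=window)  # keeps only the last `window` values
    out = []
    for x in stream:
        buf.append(x)
        out.append(sum(buf) / len(buf))
    return out

if __name__ == "__main__":
    print(windowed_means([1, 2, 3, 4], window=2))
```

In a real deployment the same per-window aggregate would be computed continuously over an unbounded stream and fed back to the controller.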
DataFlasks: an epidemic dependable key-value substrate
Recently, tuple-stores have become pivotal structures in many information systems. Their ability to handle large datasets makes them important in an era with unprecedented amounts of data being produced and exchanged. However, these tuple-stores typically rely on structured peer-to-peer protocols which assume moderately stable environments. Such an assumption does not always hold for very large scale systems sized in the scale of thousands of machines. In this paper we present a novel approach to the design of a tuple-store. Our approach follows a stratified design based on an unstructured substrate. We focus on this substrate and how the use of epidemic protocols allows reaching high dependability and scalability.
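The epidemic dissemination underlying such a substrate can be illustrated with a toy push-gossip simulation (a sketch only, not the DataFlasks implementation): each round, every node pushes its key-value pairs to one randomly chosen peer, so an entry written at a single node reaches all nodes in roughly O(log n) rounds with high probability.

```python
import random

# Toy push-based epidemic (gossip) dissemination between node stores.
def gossip(stores, rounds, seed=0):
    rng = random.Random(seed)  # seeded for reproducibility
    n = len(stores)
    for _ in range(rounds):
        for i in range(n):
            peer = rng.choice([j for j in range(n) if j != i])
            stores[peer].update(stores[i])  # push local entries to the peer
    return stores

if __name__ == "__main__":
    stores = [{"k": "v"}] + [{} for _ in range(15)]  # one node holds the tuple
    gossip(stores, rounds=30)
    print(sum(1 for s in stores if "k" in s), "of", len(stores), "nodes hold it")
```

No structure is maintained between nodes, which is what makes the approach robust to churn: any node can leave or fail without invalidating routing state.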
DIN Spec 91345 RAMI 4.0 compliant data pipelining: An approach to support data understanding and data acquisition in smart manufacturing environments
Today, data scientists in the manufacturing domain are confronted with a set of challenges associated with data acquisition as well as data processing, including the extraction of valuable information to support both the manufacturing equipment and the manufacturing processes behind it.
One essential aspect of data acquisition is pipelining, which involves various communication standards, protocols and technologies to save and transfer heterogeneous data. These circumstances make it hard to understand, find, access and extract data from the sources depending on use cases and applications.
In order to support this data pipelining process, this thesis proposes the use of a semantic model. The selected semantic model should be able to describe smart manufacturing assets themselves as well as to provide access to their data along their life cycle.
Many research contributions in smart manufacturing have already produced reference architectures or standards for semantic-based metadata description or asset classification. This research builds upon these outcomes and introduces a novel semantic model-based data pipelining approach using the Reference Architecture Model for Industry 4.0 (RAMI 4.0) as its basis, with data pipelining in smart manufacturing as an exemplary use case to enable easy exploration, understanding, discovery, selection and extraction of data.
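To make the idea of a semantic asset description concrete, here is a hypothetical sketch in Python of the kind of information such a model carries (all names, identifiers and endpoints are invented for illustration; RAMI 4.0 itself defines far richer structures):

```python
# Hypothetical RAMI 4.0-style asset description: identification plus a
# typed data-access endpoint, so data can be discovered and extracted
# uniformly across heterogeneous equipment.
asset = {
    "id": "urn:example:asset:press-01",           # invented identifier
    "lifecycle_phase": "operation",
    "submodels": {
        "identification": {"manufacturer": "ExampleCorp", "serial": "4711"},
        "data_access": {
            "protocol": "opc-ua",                  # one of many standards
            "endpoint": "opc.tcp://press-01:4840",
        },
    },
}

def endpoints(asset):
    """Collect the declared data-access endpoints of an asset."""
    access = asset["submodels"].get("data_access", {})
    return [access["endpoint"]] if "endpoint" in access else []
```

A data pipeline could then discover and query assets through such descriptions instead of hard-coding per-device protocols and addresses.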
DATAFLASKS: epidemic store for massive scale systems
Very large scale distributed systems provide some of the most interesting research challenges while at the same time being increasingly required by today's applications. The escalation in the number of connected devices and the amount of data being produced and exchanged demands new data management systems. Although new data stores are continuously being proposed, they are not suitable for very large scale environments. The high levels of churn and constant dynamics found in very large scale systems demand robust, proactive and unstructured approaches to data management. In this paper we propose a novel data store solely based on epidemic (or gossip-based) protocols. It leverages the capacity of these protocols to provide data persistence guarantees even in highly dynamic, massive scale systems. We provide an open source prototype of the data store and a corresponding evaluation.
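A complementary epidemic mechanism often paired with push gossip is pull-based anti-entropy, where each node periodically pulls the store of a random peer and merges it, repairing entries it missed. A toy sketch (illustrative only, not the paper's protocol):

```python
import random

# Toy pull-based anti-entropy: every round each node merges the contents
# of one randomly chosen peer's store into its own, repairing gaps left
# by lost messages or temporary disconnection.
def anti_entropy(stores, rounds, seed=1):
    rng = random.Random(seed)  # seeded for reproducibility
    n = len(stores)
    for _ in range(rounds):
        for i in range(n):
            peer = rng.choice([j for j in range(n) if j != i])
            stores[i].update(stores[peer])  # pull and merge
    return stores

if __name__ == "__main__":
    stores = [{"x": 1}] + [{} for _ in range(7)]
    anti_entropy(stores, rounds=30)
    print(all("x" in s for s in stores))
```

Pull-based repair is what gives epidemic stores their persistence guarantees under churn: even entries that a node missed while offline are eventually recovered from peers.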