Performance modelling and optimization for video-analytic algorithms in a cloud-like environment using machine learning
CCTV cameras produce a large amount of video surveillance data per day, and analysing it requires significant, often scalable, computing resources. The emergence of the Hadoop distributed processing framework has had a significant impact on various data-intensive applications, as distributed processing increases the processing capability of the applications it serves. Hadoop is an open-source implementation of the MapReduce programming model. It automates the creation of tasks for each function, distributes data, parallelizes execution and handles machine failures, relieving users of the complexity of managing the underlying processing so that they can focus on building their application. In a practical deployment, the challenge of a Hadoop-based architecture is that it requires several scalable machines for effective processing, which in turn adds hardware investment cost to the infrastructure. Although a cloud infrastructure offers scalable and elastic utilization of resources, where users can scale the number of Virtual Machines (VMs) up or down as required, a user such as a CCTV system operator intending to use a public cloud would want to know what cloud resources (i.e. how many VMs) need to be deployed so that processing completes in the fastest (or within a known time constraint) and most cost-effective manner. Often such resources must also satisfy practical, procedural and legal requirements. The capability to model a distributed processing architecture in which resource requirements can be effectively and optimally predicted would thus be a useful tool. The literature offers no clear and comprehensive modelling framework that provides proactive resource allocation mechanisms to satisfy a user's target requirements, especially for a processing-intensive application such as video analytics.
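As a minimal in-process sketch of the MapReduce programming model that Hadoop implements (real Hadoop distributes map and reduce tasks across machines and handles their failures), the example below counts detections per camera; the record format and camera names are illustrative assumptions, not data from the thesis.

```python
# Minimal in-process sketch of the MapReduce model: map each record to
# (key, value) pairs, shuffle (group by key), then reduce each group.
# The toy job counts detections per camera (illustrative only).
from collections import defaultdict

def map_phase(records, mapper):
    """Apply the user's map function to every input record."""
    for record in records:
        yield from mapper(record)

def shuffle(pairs):
    """Group intermediate (key, value) pairs by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups, reducer):
    """Apply the user's reduce function to each key's value list."""
    return {key: reducer(key, values) for key, values in groups.items()}

# Toy input: (camera_id, detections_in_frame) records.
records = [("cam1", 3), ("cam2", 1), ("cam1", 2), ("cam3", 4), ("cam2", 2)]
mapper = lambda rec: [(rec[0], rec[1])]
reducer = lambda key, values: sum(values)

result = reduce_phase(shuffle(map_phase(records, mapper)), reducer)
print(result)   # {'cam1': 5, 'cam2': 3, 'cam3': 4}
```

Hadoop automates exactly these phases at cluster scale, which is why the user only writes the mapper and reducer.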
In this thesis, with the aim of closing the above research gap, the research first examines the current legal practices and requirements of implementing a video surveillance system within a distributed processing and data storage environment, since the legal validity of data gathered or processed within such a system is vital to a distributed system's applicability in this domain. Subsequently, the thesis presents a comprehensive framework for the performance modelling and optimization of resource allocation when deploying a scalable distributed video analytics application in a Hadoop-based framework running on a virtualized cluster of machines.
The proposed modelling framework investigates the use of several machine learning algorithms, such as decision trees (M5P, RepTree), Linear Regression, the Multi-Layer Perceptron (MLP) and the ensemble Bagging model, to model and predict the execution time of video analytic jobs based on infrastructure-level as well as job-level parameters. Further, in order to allocate resources under constraints and obtain optimal performance in terms of job execution time, we propose a Genetic Algorithm (GA) based optimization technique.
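As an illustrative sketch of the two steps just described (not the thesis's actual models, parameters or measured data), a regression model can be fitted to predict job execution time from cluster and job parameters, and a small genetic algorithm can then search the constrained parameter space for the cheapest configuration meeting a deadline. The synthetic cost model, deadline and VM budget below are invented assumptions.

```python
# Hypothetical sketch: predict job execution time from (num_vms, input_gb)
# with least-squares regression, then use a tiny genetic algorithm to find
# the fewest VMs that still meet a deadline. All numbers are illustrative.
import random
import numpy as np

random.seed(1)
rng = np.random.default_rng(0)

# --- 1. Fit a predictor: exec_time ~ f(num_vms, input_gb) ----------------
# Synthetic training data: time falls with more VMs, grows with input size.
num_vms   = rng.integers(2, 33, size=200)
input_gb  = rng.uniform(1, 100, size=200)
exec_time = 50 + 4.0 * input_gb / num_vms + rng.normal(0, 1, 200)

# Single feature input_gb / num_vms captures per-VM workload.
X = np.column_stack([np.ones(200), input_gb / num_vms])
coef, *_ = np.linalg.lstsq(X, exec_time, rcond=None)

def predict(vms, gb):
    return coef[0] + coef[1] * gb / vms

# --- 2. GA: minimise VM count subject to a deadline ----------------------
DEADLINE, JOB_GB, MAX_VMS = 75.0, 80.0, 32

def fitness(vms):
    # Heavy penalty for missing the deadline; otherwise fewer VMs is cheaper.
    return vms + (1000 if predict(vms, JOB_GB) > DEADLINE else 0)

pop = [random.randint(1, MAX_VMS) for _ in range(20)]
for _ in range(40):
    pop.sort(key=fitness)
    survivors = pop[:10]                     # elitist selection
    children = [min(MAX_VMS, max(1, v + random.randint(-2, 2)))
                for v in survivors]          # mutation
    pop = survivors + children

best = min(pop, key=fitness)
print(best, round(float(predict(best, JOB_GB)), 1))
```

The real framework uses richer feature sets (infrastructure- and job-level parameters) and stronger learners, but the predict-then-optimise structure is the same.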
Experimental results are provided to demonstrate the proposed framework's capability to successfully predict the job execution time of a given video analytic task based on infrastructure and input-data-related parameters, and its ability to determine the minimum job execution time given constraints on these parameters. Given the above, the thesis contributes to the state-of-the-art in distributed video analytics design, implementation, performance analysis and optimisation.
Experiences with workflows for automating data-intensive bioinformatics
High-throughput technologies, such as next-generation sequencing, have turned molecular biology into a data-intensive discipline, requiring bioinformaticians to use high-performance computing resources and carry out data management and analysis tasks at large scale. Workflow systems can be useful to simplify the construction of analysis pipelines that automate tasks, support reproducibility and provide measures for fault tolerance. However, workflow systems can incur significant development and administration overhead, so bioinformatics pipelines are often still built without them. We present our experiences with workflows and workflow systems within the bioinformatics community participating in a series of hackathons and workshops of the EU COST action SeqAhead. The participating organizations work on similar problems but have addressed them with different strategies and solutions. This fragmentation of efforts is inefficient and leads to redundant and incompatible solutions. Based on our experiences, we define a set of recommendations for future systems to enable efficient yet simple bioinformatics workflow construction and execution.
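What a workflow system buys a pipeline author, declared dependencies, automatic ordering and simple fault tolerance, can be illustrated with a deliberately minimal, hypothetical runner; the task names and pipeline shape below are invented and reflect no specific system from the hackathons.

```python
# Minimal, hypothetical sketch of a workflow runner: tasks declare their
# prerequisites, the runner orders them topologically and retries failures.
# Task names are illustrative only.
from graphlib import TopologicalSorter

def run(tasks, deps, retries=2):
    """tasks: name -> callable; deps: name -> set of prerequisite names."""
    done = []
    for name in TopologicalSorter(deps).static_order():
        for attempt in range(retries + 1):
            try:
                tasks[name]()
                done.append(name)
                break
            except Exception:
                if attempt == retries:   # simple fault tolerance: retry, then fail
                    raise
    return done

# A toy sequencing-style pipeline: QC and alignment both need the raw
# reads; variant calling needs the alignment.
log = []
tasks = {
    "fetch_reads":   lambda: log.append("fetch"),
    "qc":            lambda: log.append("qc"),
    "align":         lambda: log.append("align"),
    "call_variants": lambda: log.append("call"),
}
deps = {
    "qc": {"fetch_reads"},
    "align": {"fetch_reads"},
    "call_variants": {"align"},
}
order = run(tasks, deps)
print(order)
```

Real workflow systems add exactly the features this sketch omits, and which the text identifies as overhead: distributed execution, provenance tracking, and configuration management.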
How can SMEs benefit from big data? Challenges and a path forward
Big data is big news, and large companies in all sectors are making significant advances in their customer relations, product selection and development, and consequent profitability through using this valuable commodity. Small and medium enterprises (SMEs) have proved to be slow adopters of the new technology of big data analytics and are in danger of being left behind. In Europe, SMEs are a vital part of the economy, and the challenges they encounter need to be addressed as a matter of urgency. This paper identifies barriers to SME uptake of big data analytics and recognises the complex challenge they pose to all stakeholders, including national and international policy makers and the IT, business management and data science communities.
The paper proposes a big data maturity model for SMEs as a first step towards an SME roadmap to data analytics. It considers the 'state of the art' of IT with respect to usability and usefulness for SMEs and discusses how SMEs can overcome the barriers preventing them from adopting existing solutions. The paper then considers management perspectives and the role of maturity models in enhancing and structuring the adoption of data analytics in an organisation. The history of total quality management is reviewed to inform the core aspects of implementing a new paradigm. The paper concludes with recommendations to help SMEs develop their big data capability and enable them to continue as the engines of European industrial and business success. Copyright © 2016 John Wiley & Sons, Ltd.
VISOR: virtual machine images management service for cloud infrastructures
Cloud Computing is a relatively novel paradigm that aims to fulfil the dream of computing as a utility. It emerged to make it possible to provide computing resources (such as servers, storage and networks) as a service and on demand, making them accessible through common Internet protocols. Through cloud offers, users only need to pay for the amount of resources they need and for the time they use them. Virtualization is the cloud's key technology, acting upon virtual machine images to deliver fully functional virtual machine instances. Virtual machine images therefore play an important role in Cloud Computing, and their efficient management becomes a key concern that should be carefully addressed. To meet this requirement, most cloud offers provide their own image repository, where images are stored and from which they are retrieved in order to instantiate new virtual machines. However, the rise of Cloud Computing has brought new problems in managing large collections of images.
Existing image repositories are not able to efficiently manage, store and catalogue virtual machine images from other clouds through the same centralized repository service. This becomes especially important when considering the management of multiple heterogeneous cloud offers. In fact, despite the hype around Cloud Computing, there are still barriers to its widespread adoption, among which cloud interoperability is one of the most notable issues.
Interoperability limitations arise from the fact that current cloud offers provide proprietary interfaces and tie their services to their own requirements. Therefore, when dealing with multiple heterogeneous clouds, users face hard-to-manage integration and compatibility issues. The management and delivery of virtual machine images across different clouds is an example of such interoperability constraints.
This dissertation presents VISOR, a cloud-agnostic virtual machine image management service and repository. Our work on VISOR aims to provide a service designed not to fit a specific cloud offer but rather to overcome sharing and interoperability limitations among different clouds. With VISOR, the management of cloud interoperability can be seamlessly abstracted from the details of the underlying procedures. In this way, it aims to give users the ability to manage and expose virtual machine images across heterogeneous clouds through the same generic, centralized repository and management service. VISOR is open-source software with a community-driven development process, so it can be freely customized and further improved by anyone. Tests conducted to evaluate its performance and resource usage have shown VISOR to be a stable and high-performance service, even when compared
with other services already in production. Lastly, targeting clouds as the main audience is not a limitation for other use cases. In fact, virtualization and virtual machine images are not exclusively linked to cloud environments. Therefore, given the service's agnostic design, it is possible to adapt it to other usage scenarios as well.
Cloud Rule-based System for Analysis of IoT Data in a Big Data Context
Nowadays, enormous amounts of information are produced on a daily basis by sensors. After being analysed, this information is transformed from simple data into knowledge, which in itself can be an asset to those able to take advantage of it.
An example of this situation is the data generated by sensors installed on trains, which can be analysed for different ends, one of which is the condition-based maintenance of trains. Condition-based maintenance takes advantage of data to understand the current state of mechanical equipment, avoiding unnecessary replacements or preventing accidents caused by late maintenance.
This dissertation presents an architecture that integrates a rule-based system, running over cloud applications, which analyses all the data acquired by the trains' sensors so that, whenever a specific set of conditions is met, alerts are activated and the train operators, the mechanics in charge and all the staff involved know how to proceed.
This architecture is to be created in a cloud environment since, with the vast amount of data being generated, such highly scalable environments ensure that data-processing performance is not compromised and that all the data is analysed in a timely manner, taking full advantage of the available computational resources.
The process of creating this architecture is demonstrated step by step, and the test results are presented and analysed.
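A hedged sketch of the kind of rule-based alerting the architecture describes: each rule pairs a condition over a sensor reading with the staff to notify, and a reading that satisfies a condition raises an alert. The thresholds, sensor field names and recipient roles below are invented for illustration, not the dissertation's real rules.

```python
# Hypothetical condition-based-maintenance rule engine: rules are
# (name, condition, recipients); a reading that satisfies a condition
# triggers an alert for the listed staff. All values are illustrative.

RULES = [
    ("axle_overheat",  lambda r: r["axle_temp_c"] > 90,   ["operator", "mechanic"]),
    ("brake_wear",     lambda r: r["brake_pad_mm"] < 3.0, ["mechanic"]),
    ("vibration_high", lambda r: r["vibration_g"] > 1.5,  ["operator"]),
]

def evaluate(reading, rules=RULES):
    """Return the alerts triggered by one sensor reading."""
    return [
        {"rule": name, "notify": recipients, "reading": reading}
        for name, condition, recipients in rules
        if condition(reading)
    ]

# One train reading: only the axle temperature is out of range.
reading = {"axle_temp_c": 95.0, "brake_pad_mm": 4.2, "vibration_g": 0.4}
alerts = evaluate(reading)
print([a["rule"] for a in alerts])   # ['axle_overheat']
```

In the cloud deployment the text describes, `evaluate` would run inside a scalable stream-processing job over the incoming sensor feed rather than over a single in-memory reading.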
Monitoring the waste to energy plant using the latest AI methods and tools
Solid wastes, for instance municipal and industrial wastes, present great environmental concerns and challenges all over the world. This has led to the development of innovative waste-to-energy process technologies capable of handling different waste materials in a more sustainable and energy-efficient manner. However, as in many other complex industrial process operations, waste-to-energy plants require sophisticated process monitoring systems in order to realize very high overall plant efficiencies. Conventional data-driven statistical methods, which include principal component analysis, partial least squares, multivariable linear regression and so forth, are normally applied in process monitoring. Recently, however, the latest artificial intelligence (AI) methods, in particular deep learning algorithms, have demonstrated remarkable performance in several important areas such as machine vision, natural language processing and pattern recognition. The new AI algorithms have gained increasing attention in industrial process applications, for instance in areas such as predictive product quality control and machine health monitoring. Moreover, the availability of big-data processing tools and cloud computing technologies further supports the use of deep learning based algorithms for process monitoring.
In this work, a process monitoring scheme based on state-of-the-art artificial intelligence methods and cloud computing platforms is proposed for a waste-to-energy industrial use case. The monitoring scheme supports the use of the latest AI methods, leveraging big-data processing tools and taking advantage of available cloud computing platforms. Deep learning algorithms are able to describe non-linear, dynamic and high-dimensionality systems better than most conventional data-based process monitoring methods. Moreover, deep learning based methods are well suited to big-data analytics, unlike traditional statistical machine learning methods, which are less efficient at that scale.
Furthermore, the proposed monitoring scheme emphasizes real-time process monitoring in addition to offline data analysis. To achieve this, the monitoring scheme proposes the use of big-data analytics software frameworks and tools such as Microsoft Azure Stream Analytics, Apache Storm, Apache Spark, Hadoop and many others. The availability of open-source as well as proprietary cloud computing platforms, AI and big-data software tools all support the realization of the proposed monitoring scheme.
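To make the conventional baseline concrete, the sketch below implements PCA-based process monitoring, one of the statistical methods the text names: fit principal components on normal operating data, then flag samples whose reconstruction error (squared prediction error, SPE) exceeds a control limit. The synthetic "plant" data, number of components and percentile limit are illustrative assumptions, not the paper's use case.

```python
# Conventional PCA-based process monitoring: model normal operation with
# a few principal components; samples poorly explained by the model
# (high squared prediction error, SPE) are flagged as faults.
import numpy as np

rng = np.random.default_rng(0)

# Normal operation: 5 correlated process variables driven by 2 latent factors.
latent = rng.normal(size=(500, 2))
mixing = rng.normal(size=(2, 5))
X_train = latent @ mixing + 0.1 * rng.normal(size=(500, 5))

# Fit PCA via SVD on mean-centred data; keep 2 components.
mean = X_train.mean(axis=0)
_, _, Vt = np.linalg.svd(X_train - mean, full_matrices=False)
P = Vt[:2].T                       # loadings, shape (5, 2)

def spe(X):
    """Squared prediction error of each sample under the PCA model."""
    Xc = X - mean
    residual = Xc - Xc @ P @ P.T   # part not explained by the model
    return (residual ** 2).sum(axis=1)

# Control limit: e.g. the 99th percentile of SPE on normal data.
limit = np.percentile(spe(X_train), 99)

normal_sample = latent[:1] @ mixing + 0.1 * rng.normal(size=(1, 5))
faulty_sample = normal_sample + np.array([[0, 0, 5.0, 0, 0]])  # stuck sensor

print(float(spe(normal_sample)[0]), float(spe(faulty_sample)[0]), float(limit))
```

The deep learning alternative the paper advocates replaces the linear projection with a non-linear encoder/decoder, but the monitoring logic, reconstruction error against a control limit, is the same.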