383 research outputs found

    Performance modelling and optimization for video-analytic algorithms in a cloud-like environment using machine learning

    Get PDF
    CCTV cameras produce a large amount of video surveillance data per day, and analysing them require the use of significant computing resources that often need to be scalable. The emergence of the Hadoop distributed processing framework has had a significant impact on various data intensive applications as the distributed computed based processing enables an increase of the processing capability of applications it serves. Hadoop is an open source implementation of the MapReduce programming model. It automates the operation of creating tasks for each function, distribute data, parallelize executions and handles machine failures that reliefs users from the complexity of having to manage the underlying processing and only focus on building their application. It is noted that in a practical deployment the challenge of Hadoop based architecture is that it requires several scalable machines for effective processing, which in turn adds hardware investment cost to the infrastructure. Although using a cloud infrastructure offers scalable and elastic utilization of resources where users can scale up or scale down the number of Virtual Machines (VM) upon requirements, a user such as a CCTV system operator intending to use a public cloud would aspire to know what cloud resources (i.e. number of VMs) need to be deployed so that the processing can be done in the fastest (or within a known time constraint) and the most cost effective manner. Often such resources will also have to satisfy practical, procedural and legal requirements. The capability to model a distributed processing architecture where the resource requirements can be effectively and optimally predicted will thus be a useful tool, if available. In literature there is no clear and comprehensive modelling framework that provides proactive resource allocation mechanisms to satisfy a user's target requirements, especially for a processing intensive application such as video analytic. In this thesis, with the hope of closing the above research gap, novel research is first initiated by understanding the current legal practices and requirements of implementing video surveillance system within a distributed processing and data storage environment, since the legal validity of data gathered or processed within such a system is vital for a distributed system's applicability in such domains. Subsequently the thesis presents a comprehensive framework for the performance ii modelling and optimization of resource allocation in deploying a scalable distributed video analytic application in a Hadoop based framework, running on virtualized cluster of machines. The proposed modelling framework investigates the use of several machine learning algorithms such as, decision trees (M5P, RepTree), Linear Regression, Multi Layer Perceptron(MLP) and the Ensemble Classifier Bagging model, to model and predict the execution time of video analytic jobs, based on infrastructure level as well as job level parameters. Further in order to propose a novel framework for the allocate resources under constraints to obtain optimal performance in terms of job execution time, we propose a Genetic Algorithms (GAs) based optimization technique. Experimental results are provided to demonstrate the proposed framework's capability to successfully predict the job execution time of a given video analytic task based on infrastructure and input data related parameters and its ability determine the minimum job execution time, given constraints of these parameters. Given the above, the thesis contributes to the state-of-art in distributed video analytics, design, implementation, performance analysis and optimisation

    Experiences with workflows for automating data-intensive bioinformatics

    Get PDF
    High-throughput technologies, such as next-generation sequencing, have turned molecular biology into a data-intensive discipline, requiring bioinformaticians to use high-performance computing resources and carry out data management and analysis tasks on large scale. Workflow systems can be useful to simplify construction of analysis pipelines that automate tasks, support reproducibility and provide measures for fault-tolerance. However, workflow systems can incur significant development and administration overhead so bioinformatics pipelines are often still built without them. We present the experiences with workflows and workflow systems within the bioinformatics community participating in a series of hackathons and workshops of the EU COST action SeqAhead. The organizations are working on similar problems, but we have addressed them with different strategies and solutions. This fragmentation of efforts is inefficient and leads to redundant and incompatible solutions. Based on our experiences we define a set of recommendations for future systems to enable efficient yet simple bioinformatics workflow construction and execution.Pubblicat

    How can SMEs benefit from big data? Challenges and a path forward

    Get PDF
    Big data is big news, and large companies in all sectors are making significant advances in their customer relations, product selection and development and consequent profitability through using this valuable commodity. Small and medium enterprises (SMEs) have proved themselves to be slow adopters of the new technology of big data analytics and are in danger of being left behind. In Europe, SMEs are a vital part of the economy, and the challenges they encounter need to be addressed as a matter of urgency. This paper identifies barriers to SME uptake of big data analytics and recognises their complex challenge to all stakeholders, including national and international policy makers, IT, business management and data science communities. The paper proposes a big data maturity model for SMEs as a first step towards an SME roadmap to data analytics. It considers the ‘state-of-the-art’ of IT with respect to usability and usefulness for SMEs and discusses how SMEs can overcome the barriers preventing them from adopting existing solutions. The paper then considers management perspectives and the role of maturity models in enhancing and structuring the adoption of data analytics in an organisation. The history of total quality management is reviewed to inform the core aspects of implanting a new paradigm. The paper concludes with recommendations to help SMEs develop their big data capability and enable them to continue as the engines of European industrial and business success. Copyright © 2016 John Wiley & Sons, Ltd.Peer ReviewedPostprint (author's final draft

    VISOR: virtual machine images management service for cloud infarestructures

    Get PDF
    Cloud Computing is a relatively novel paradigm that aims to fulfill the computing as utility dream. It has appeared to bring the possibility of providing computing resources (such as servers, storage and networks) as a service and on demand, making them accessible through common Internet protocols. Through cloud offers, users only need to pay for the amount of resources they need and for the time they use them. Virtualization is the clouds key technology, acting upon virtual machine images to deliver fully functional virtual machine instances. Therefore, virtual machine images play an important role in Cloud Computing and their efficient management becomes a key concern that should be carefully addressed. To tackle this requirement, most cloud offers provide their own image repository, where images are stored and retrieved from, in order to instantiate new virtual machines. However, the rise of Cloud Computing has brought new problems in managing large collections of images. Existing image repositories are not able to efficiently manage, store and catalogue virtual machine images from other clouds through the same centralized service repository. This becomes especially important when considering the management of multiple heterogeneous cloud offers. In fact, despite the hype around Cloud Computing, there are still existing barriers to its widespread adoption. Among them, clouds interoperability is one of the most notable issues. Interoperability limitations arise from the fact that current cloud offers provide proprietary interfaces, and their services are tied to their own requirements. Therefore, when dealing with multiple heterogeneous clouds, users face hard to manage integration and compatibility issues. The management and delivery of virtual machine images across different clouds is an example of such interoperability constraints. This dissertation presents VISOR, a cloud agnostic virtual machine images management service and repository. Our work towards VISOR aims to provide a service not designed to fit in a specific cloud offer but rather to overreach sharing and interoperability limitations among different clouds. With VISOR, the management of clouds interoperability can be seamlessly abstracted from the underlying procedures details. In this way, it aims to provide users with the ability to manage and expose virtual machine images across heterogeneous clouds, throughout the same generic and centralized repository and management service. VISOR is an open source software with a community-driven development process, thus it can be freely customized and further improved by everyone. The conducted tests to evaluate its performance and resources usage rate have shown VISOR as a stable and high performance service, even when compared with other services already in production. Lastly, placing clouds as the main target audience is not a limitation for other use cases. In fact, virtualization and virtual machine images are not exclusively linked to cloud environments. Therefore and given the service agnostic design concerns, it is possible to adapt it to other usage scenarios as well.A Computação em Nuvem (”Cloud Computing”) é um paradigma relativamente novo que visa cumprir o sonho de fornecer a computação como um serviço. O mesmo surgiu para possibilitar o fornecimento de recursos de computação (servidores, armazenamento e redes) como um serviço de acordo com as necessidades dos utilizadores, tornando-os acessíveis através de protocolos de Internet comuns. Através das ofertas de ”cloud”, os utilizadores apenas pagam pela quantidade de recursos que precisam e pelo tempo que os usam. A virtualização é a tecnologia chave das ”clouds”, atuando sobre imagens de máquinas virtuais de forma a gerar máquinas virtuais totalmente funcionais. Sendo assim, as imagens de máquinas virtuais desempenham um papel fundamental no ”Cloud Computing” e a sua gestão eficiente torna-se um requisito que deve ser cuidadosamente analisado. Para fazer face a tal necessidade, a maioria das ofertas de ”cloud” fornece o seu próprio repositório de imagens, onde as mesmas são armazenadas e de onde são copiadas a fim de criar novas máquinas virtuais. Contudo, com o crescimento do ”Cloud Computing” surgiram novos problemas na gestão de grandes conjuntos de imagens. Os repositórios existentes não são capazes de gerir, armazenar e catalogar images de máquinas virtuais de forma eficiente a partir de outras ”clouds”, mantendo um único repositório e serviço centralizado. Esta necessidade torna-se especialmente importante quando se considera a gestão de múltiplas ”clouds” heterogéneas. Na verdade, apesar da promoção extrema do ”Cloud Computing”, ainda existem barreiras à sua adoção generalizada. Entre elas, a interoperabilidade entre ”clouds” é um dos constrangimentos mais notáveis. As limitações de interoperabilidade surgem do fato de as ofertas de ”cloud” atuais possuírem interfaces proprietárias, e de os seus serviços estarem vinculados às suas próprias necessidades. Os utilizadores enfrentam assim problemas de compatibilidade e integração difíceis de gerir, ao lidar com ”clouds” de diferentes fornecedores. A gestão e disponibilização de imagens de máquinas virtuais entre diferentes ”clouds” é um exemplo de tais restrições de interoperabilidade. Esta dissertação apresenta o VISOR, o qual é um repositório e serviço de gestão de imagens de máquinas virtuais genérico. O nosso trabalho em torno do VISOR visa proporcionar um serviço que não foi concebido para lidar com uma ”cloud” específica, mas sim para superar as limitações de interoperabilidade entre ”clouds”. Com o VISOR, a gestão da interoperabilidade entre ”clouds” é abstraída dos detalhes subjacentes. Desta forma pretende-se proporcionar aos utilizadores a capacidade de gerir e expor imagens entre ”clouds” heterogéneas, mantendo um repositório e serviço de gestão centralizados. O VISOR é um software de código livre com um processo de desenvolvimento aberto. O mesmo pode ser livremente personalizado e melhorado por qualquer pessoa. Os testes realizados para avaliar o seu desempenho e a taxa de utilização de recursos mostraram o VISOR como sendo um serviço estável e de alto desempenho, mesmo quando comparado com outros serviços já em utilização. Por fim, colocar as ”clouds” como principal público-alvo não representa uma limitação para outros tipos de utilização. Na verdade, as imagens de máquinas virtuais e a virtualização não estão exclusivamente ligadas a ambientes de ”cloud”. Assim sendo, e tendo em conta as preocupações tidas no desenho de um serviço genérico, também é possível adaptar o nosso serviço a outros cenários de utilização

    Cloud Rule-based System for Analysis of IoT Data in a Big Data Context

    Get PDF
    Nowadays, enormous amounts of information are produced, on a daily basis, by sensors. Information which, after being analysed, is transformed from simple data into knowledge which, in itself, can be an asset to those who can take advantage of that knowledge. An example of this situation is the data being generated by sensors installed on trains, that can be analysed to different ends, one of which, the condition-based maintenance of trains. Condition-based maintenance takes advantage of data to understand the current state of mechanical equipment, avoiding unnecessary replacements or preventing accidents consequent of late maintenance. In this dissertation, it is presented an architecture which integrates a rule-based system functioning over cloud applications that analyses all the data that’s being acquired by the trains’ sensors in a way that, whenever a specific set of conditions is met alerts are activated, so the train operators, the mechanics in charge and all their staff know how to proceed. This architecture is to be created on a cloud environment since, with this vast amount of data being generated, these highly scalable environments assure that data processing performance isn’t compromised and that all this data is analysed in a timely manner, taking advantage of all its computational components. The process of creating this architecture is demonstrated step by step and the test results are presented and analysed.Nos dias de hoje são produzidas, diariamente, enormíssimas quantidades de informação por parte de sensores, informação essa que após analisada se transforma de simples dados em conhecimento que, por si, é uma mais-valia para quem pode fazer uso desse conhecimento. Um exemplo destes casos são os dados produzidos pelos sensores instalados nos comboios, que podem ser analisados com variadas finalidades, uma delas, a manutenção baseada na condição. A manutenção baseada na condição tira proveito dos dados para compreender o estado dos equipamentos, evitando substituições desnecessárias ou prevenindo acidentes consequentes de manutenções tardias. Nesta dissertação é apresentada uma arquitetura para o funcionamento de um sistema de regras na cloud, que analise todos estes dados que estão a ser adquiridos nos sensores dos comboios de forma que, quando certas condições são cumpridas, alertas sejam ativados e os operadores dos comboios, os mecânicos responsáveis e toda a equipa envolvente saibam como atuar. Esta arquitetura quer-se criada na cloud pois, com uma quantidade enorme de dados a ser gerada, estes ambientes altamente escaláveis garantem que o desempenho no processamento dos dados não fique comprometido. Garantem também que os intervalos de tempo necessários para analisar todos estes dados sejam muito pequenos, tirando partido do poder computacional disponível em tais ambientes. O processo de criação desta arquitetura é demonstrado passo a passo e os resultados obtidos são apresentados e analisados

    Monitoring the waste to energy plant using the latest AI methods and tools

    Get PDF
    Solid wastes for instance, municipal and industrial wastes present great environmental concerns and challenges all over the world. This has led to development of innovative waste-to-energy process technologies capable of handling different waste materials in a more sustainable and energy efficient manner. However, like in many other complex industrial process operations, waste-to-energy plants would require sophisticated process monitoring systems in order to realize very high overall plant efficiencies. Conventional data-driven statistical methods which include principal component analysis, partial least squares, multivariable linear regression and so forth, are normally applied in process monitoring. But recently, latest artificial intelligence (AI) methods in particular deep learning algorithms have demostrated remarkable performances in several important areas such as machine vision, natural language processing and pattern recognition. The new AI algorithms have gained increasing attention from the process industrial applications for instance in areas such as predictive product quality control and machine health monitoring. Moreover, the availability of big-data processing tools and cloud computing technologies further support the use of deep learning based algorithms for process monitoring. In this work, a process monitoring scheme based on the state-of-the-art artificial intelligence methods and cloud computing platforms is proposed for a waste-to-energy industrial use case. The monitoring scheme supports use of latest AI methods, laveraging big-data processing tools and taking advantage of available cloud computing platforms. Deep learning algorithms are able to describe non-linear, dynamic and high demensionality systems better than most conventional data-based process monitoring methods. Moreover, deep learning based methods are best suited for big-data analytics unlike traditional statistical machine learning methods which are less efficient. Furthermore, the proposed monitoring scheme emphasizes real-time process monitoring in addition to offline data analysis. To achieve this the monitoring scheme proposes use of big-data analytics software frameworks and tools such as Microsoft Azure stream analytics, Apache storm, Apache Spark, Hadoop and many others. The availability of open source in addition to proprietary cloud computing platforms, AI and big-data software tools, all support the realization of the proposed monitoring scheme
    corecore