111 research outputs found

    Performance modelling and optimization for video-analytic algorithms in a cloud-like environment using machine learning

    CCTV cameras produce a large amount of video surveillance data per day, and analysing it requires significant computing resources that often need to be scalable. The emergence of the Hadoop distributed processing framework has had a significant impact on various data-intensive applications, as distributed processing increases the processing capability of the applications it serves. Hadoop is an open source implementation of the MapReduce programming model. It automates the creation of tasks for each function, distributes data, parallelizes execution and handles machine failures, relieving users of the complexity of managing the underlying processing so that they can focus on building their application. In a practical deployment, the challenge of a Hadoop based architecture is that it requires several scalable machines for effective processing, which in turn adds hardware investment cost to the infrastructure. Although a cloud infrastructure offers scalable and elastic utilization of resources, where users can scale the number of Virtual Machines (VMs) up or down as required, a user such as a CCTV system operator intending to use a public cloud would want to know what cloud resources (i.e. number of VMs) need to be deployed so that the processing can be done in the fastest (or within a known time constraint) and most cost-effective manner. Often such resources will also have to satisfy practical, procedural and legal requirements. The capability to model a distributed processing architecture in which the resource requirements can be effectively and optimally predicted would thus be a useful tool. The literature offers no clear and comprehensive modelling framework that provides proactive resource allocation mechanisms to satisfy a user's target requirements, especially for a processing-intensive application such as video analytics. In this thesis, with the aim of closing the above research gap, novel research is first initiated by understanding the current legal practices and requirements of implementing a video surveillance system within a distributed processing and data storage environment, since the legal validity of data gathered or processed within such a system is vital for a distributed system's applicability in such domains. Subsequently, the thesis presents a comprehensive framework for the performance modelling and optimization of resource allocation when deploying a scalable distributed video analytics application in a Hadoop based framework running on a virtualized cluster of machines. The proposed modelling framework investigates the use of several machine learning algorithms, such as decision trees (M5P, RepTree), Linear Regression, Multi-Layer Perceptron (MLP) and the Ensemble Classifier Bagging model, to model and predict the execution time of video analytic jobs based on infrastructure-level as well as job-level parameters. Further, in order to allocate resources under constraints and obtain optimal performance in terms of job execution time, we propose a Genetic Algorithm (GA) based optimization technique. Experimental results demonstrate the proposed framework's capability to predict the job execution time of a given video analytic task from infrastructure and input-data related parameters, and its ability to determine the minimum job execution time given constraints on these parameters. Given the above, the thesis contributes to the state of the art in the design, implementation, performance analysis and optimisation of distributed video analytics
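
    The modelling stage described above, learning a mapping from infrastructure- and job-level parameters to job execution time, can be illustrated with a small hypothetical sketch. The feature names and synthetic data below are assumptions for illustration only, and scikit-learn regressors stand in for the Weka M5P/RepTree implementations named in the abstract; this is not the thesis's actual pipeline.

```python
# Hypothetical sketch: regressing video-analytic job execution time on
# infrastructure-level and job-level parameters. All data are synthetic.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.tree import DecisionTreeRegressor
from sklearn.neural_network import MLPRegressor
from sklearn.ensemble import BaggingRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_absolute_error

rng = np.random.default_rng(0)
n = 500
num_vms = rng.integers(1, 17, n)      # infrastructure-level: number of VMs
vm_cores = rng.integers(1, 9, n)      # infrastructure-level: cores per VM
input_gb = rng.uniform(1, 100, n)     # job-level: size of the video input (GB)
# Toy ground truth: execution time shrinks with parallelism, grows with input.
exec_time = 3600 * input_gb / (num_vms * vm_cores) + rng.normal(0, 60, n)

X = np.column_stack([num_vms, vm_cores, input_gb])
X_tr, X_te, y_tr, y_te = train_test_split(X, exec_time, random_state=0)

models = {
    "linear regression": LinearRegression(),
    "decision tree": DecisionTreeRegressor(max_depth=6, random_state=0),
    "MLP": make_pipeline(StandardScaler(),
                         MLPRegressor(hidden_layer_sizes=(32, 32),
                                      max_iter=2000, random_state=0)),
    "bagging (trees)": BaggingRegressor(n_estimators=20, random_state=0),
}
for name, model in models.items():
    model.fit(X_tr, y_tr)
    mae = mean_absolute_error(y_te, model.predict(X_te))
    print(f"{name}: test MAE = {mae:.0f} s")
```

    A Genetic Algorithm, as proposed in the thesis for the optimization stage, would then search over parameters such as the number of VMs for the configuration whose predicted execution time is minimal under the user's constraints.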

    Building Blocks for IoT Analytics: Internet-of-Things Analytics

    Internet-of-Things (IoT) analytics is an integral element of most IoT applications, as it provides the means to extract knowledge, drive actuation services and optimize decision making. IoT analytics will be a major contributor to IoT business value in the coming years, as it will enable organizations to process and fully leverage large amounts of IoT data, which are nowadays largely underutilized. The Building Blocks of IoT Analytics is devoted to the presentation of the main technology building blocks that comprise advanced IoT analytics systems. It introduces IoT analytics as a special case of BigData analytics and accordingly presents leading-edge technologies that can be deployed in order to successfully confront the main challenges of IoT analytics applications. Special emphasis is placed on technologies for IoT streaming and semantic interoperability across diverse IoT streams. Furthermore, the role of cloud computing and BigData technologies in IoT analytics is presented, along with practical tools for implementing, deploying and operating non-trivial IoT applications. Along with the main building blocks of IoT analytics systems and applications, the book presents a series of practical applications that illustrate the use of these technologies in pragmatic settings. Technical topics discussed in the book include: Cloud Computing and BigData for IoT analytics; Searching the Internet of Things; Development Tools for IoT Analytics Applications; IoT Analytics-as-a-Service; Semantic Modelling and Reasoning for IoT Analytics; IoT analytics for Smart Buildings; IoT analytics for Smart Cities; Operationalization of IoT analytics; and Ethical aspects of IoT analytics. This book contains both research-oriented and applied articles on IoT analytics, including several articles reflecting work undertaken in recent European Commission funded projects under the FP7 and H2020 programmes. These articles present results of these projects on IoT analytics platforms and applications. Even though several articles have been contributed by different authors, they are structured in a well-thought-out order that allows the reader either to follow the evolution of the book or to focus on specific topics depending on his/her background and interest in IoT and IoT analytics technologies. The compilation of these articles in this edited volume has been largely motivated by the close collaboration of the co-authors in working groups and IoT events organized by the Internet-of-Things Research Cluster (IERC), which is currently a part of the EU's Alliance for Internet of Things Innovation (AIOTI)

    Big Data Analytics for the Cloud

    The work for this master thesis is divided into three parts. The first part focuses on the study and presentation of scalable data processing architectures that address the Big Data challenge. The second focuses on the processing of a dataset comprising measurements collected by different sensors installed on a train. The last part focuses on the setup of a server of the open source IoT platform SiteWhere, the dispatch of data to the server, the storage of the data in a NoSQL database and the processing of these data in a Spark instance. Chapter 1 provides an introduction. In Chapter 2, the architecture and capabilities of SiteWhere as a holistic solution for IoT management are presented. Chapter 3 introduces the basic notions of the terms "Big Data" and "Cloud"; it also presents different solutions for the Big Data challenge along with the scientific trends on this topic. In Chapter 4, various clustering algorithms (KMeans, Birch, Mean Shift, DBSCAN) are studied and applied to the real dataset collected from onboard train sensors. Chapter 5 introduces the notion of "time-series forecasting" and investigates the behaviour of two different types of neural networks (MLP, LSTM) with respect to forecasting. Chapter 6 presents the work that took place on the SiteWhere platform. The chapter begins with the description of the dispatch of data to the server and continues with the visualization, in Grafana, of the train data that were stored in InfluxDB, a database that SiteWhere supports. Following this, the data are retrieved from the database and processed (through KMeans clustering and forecasting with MLP) on a Spark instance, and finally a comparison between that process and the one on the local system is presented. Chapter 7 provides a summary and highlights some of the conclusions that were derived and presented in the previous chapters
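
    As a rough illustration of the Chapter 6 style of processing, the sketch below clusters sensor readings with KMeans on Spark. The CSV path and column names are hypothetical assumptions; the thesis instead retrieves the data from InfluxDB, and the MLP forecasting step is omitted here.

```python
# Hedged sketch: KMeans clustering of train sensor data with Spark MLlib.
# The file name and sensor column names are invented for the example.
from pyspark.sql import SparkSession
from pyspark.ml.feature import VectorAssembler
from pyspark.ml.clustering import KMeans

spark = SparkSession.builder.appName("train-sensor-clustering").getOrCreate()

# Assumed input: one row per timestamp with numeric sensor columns.
df = spark.read.csv("train_sensors.csv", header=True, inferSchema=True)

assembler = VectorAssembler(
    inputCols=["sensor_1", "sensor_2", "sensor_3"],  # hypothetical columns
    outputCol="features",
)
features = assembler.transform(df)

kmeans = KMeans(k=4, seed=1, featuresCol="features", predictionCol="cluster")
model = kmeans.fit(features)

clustered = model.transform(features)
clustered.groupBy("cluster").count().show()  # cluster sizes, as a quick check

spark.stop()
```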

    Data hosting infrastructure for primary biodiversity data

    © The Author(s), 2011. This article is distributed under the terms of the Creative Commons Attribution License. The definitive version was published in BMC Bioinformatics 12 Suppl. 15 (2011): S5, doi:10.1186/1471-2105-12-S15-S5. Today, an unprecedented volume of primary biodiversity data are being generated worldwide, yet significant amounts of these data have been and will continue to be lost after the conclusion of the projects tasked with collecting them. To get the most value out of these data it is imperative to seek a solution whereby these data are rescued, archived and made available to the biodiversity community. To this end, the biodiversity informatics community requires investment in processes and infrastructure to mitigate data loss and provide solutions for long-term hosting and sharing of biodiversity data. We review the current state of biodiversity data hosting and investigate the technological and sociological barriers to proper data management. We further explore the rescuing and re-hosting of legacy data, the state of existing toolsets and propose a future direction for the development of new discovery tools. We also explore the role of data standards and licensing in the context of data hosting and preservation. We provide five recommendations for the biodiversity community that will foster better data preservation and access: (1) encourage the community's use of data standards, (2) promote the public domain licensing of data, (3) establish a community of those involved in data hosting and archival, (4) establish hosting centers for biodiversity data, and (5) develop tools for data discovery. The community's adoption of standards and development of tools to enable data discovery is essential to sustainable data preservation. Furthermore, the increased adoption of open content licensing, the establishment of data hosting infrastructure and the creation of a data hosting and archiving community are all necessary steps towards the community ensuring that data archival policies become standardized

    Algorithms for advance bandwidth reservation in media production networks

    Media production generally requires many geographically distributed actors (e.g., production houses, broadcasters, advertisers) to exchange huge amounts of raw video and audio data. Traditional distribution techniques, such as dedicated point-to-point optical links, are highly inefficient in terms of installation time and cost. To improve efficiency, shared media production networks, which connect all involved actors over a large geographical area, are currently being deployed. The traffic in such networks is often predictable, as the timing and bandwidth requirements of data transfers are generally known hours or even days in advance. As such, the use of advance bandwidth reservation (AR) can greatly increase resource utilization and cost efficiency. In this paper, we propose an Integer Linear Programming formulation of the bandwidth scheduling problem that takes into account the specific characteristics of media production networks. Two novel optimization algorithms based on this model are thoroughly evaluated and compared by means of in-depth simulation results
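
    Since the abstract does not reproduce the formulation itself, the following is only a hedged toy sketch of what an advance bandwidth reservation ILP can look like: a single shared link with per-slot capacity and transfer requests whose timing and bandwidth are known in advance. The capacities, time slots and requests are invented for illustration, and the paper's actual model additionally covers network topologies and the two scheduling algorithms evaluated there.

```python
# Toy advance-reservation ILP (PuLP): admit or reject transfer requests on a
# single shared link so that per-slot capacity is never exceeded. All numbers
# below are invented for illustration.
import pulp

capacity = 10       # Gbit/s available on the link in every time slot
slots = range(24)   # one-hour slots over a day
# (name, bandwidth in Gbit/s, first slot, last slot) of each known request
requests = [("r1", 6, 0, 5), ("r2", 5, 3, 8), ("r3", 4, 4, 10), ("r4", 7, 9, 12)]

prob = pulp.LpProblem("advance_reservation", pulp.LpMaximize)
admit = {name: pulp.LpVariable(f"admit_{name}", cat="Binary")
         for name, _, _, _ in requests}

# Objective: maximize the total reserved bandwidth-time that is admitted.
prob += pulp.lpSum(admit[name] * bw * (end - start + 1)
                   for name, bw, start, end in requests)

# In every slot, the admitted requests active in that slot must fit the capacity.
for t in slots:
    active = [(name, bw) for name, bw, start, end in requests if start <= t <= end]
    if active:  # skip slots with no scheduled transfers
        prob += pulp.lpSum(admit[name] * bw for name, bw in active) <= capacity

prob.solve(pulp.PULP_CBC_CMD(msg=False))
for name, bw, start, end in requests:
    print(name, "admitted" if admit[name].value() == 1 else "rejected")
```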

    Big data in epilepsy: Clinical and research considerations. Report from the Epilepsy Big Data Task Force of the International League Against Epilepsy

    Epilepsy is a heterogeneous condition with disparate etiologies and phenotypic and genotypic characteristics. Clinical and research aspects are accordingly varied, ranging from epidemiological to molecular, spanning clinical trials and outcomes, gene and drug discovery, imaging, electroencephalography, pathology, epilepsy surgery, digital technologies, and numerous others. Epilepsy data are collected in the terabytes and petabytes, pushing the limits of current capabilities. Modern computing firepower and advances in machine and deep learning, pioneered in other diseases, open up exciting possibilities for epilepsy too. However, without carefully designed approaches to acquiring, standardizing, curating, and making available such data, there is a risk of failure. Thus, careful construction of relevant ontologies, with intimate stakeholder inputs, provides the requisite scaffolding for more ambitious big data undertakings, such as an epilepsy data commons. In this review, we assess the clinical and research epilepsy landscapes in the big data arena, current challenges, and future directions, and make the case for a systematic approach to epilepsy big data