5,410 research outputs found
Performance modelling and optimization for video-analytic algorithms in a cloud-like environment using machine learning
CCTV cameras produce a large amount of video surveillance data per day, and
analysing it requires significant computing resources that often need to be scalable. The emergence of the Hadoop distributed processing framework has had a significant impact on various data intensive applications, as distributed computing based processing increases the processing capability of the applications it serves. Hadoop is an open source implementation of the MapReduce
programming model. It automates creating tasks for each
function, distributing data, parallelizing execution and handling machine failures, which relieves users of the complexity of managing the underlying processing and lets them focus on building their application. In a practical deployment, the challenge of a Hadoop based architecture is that it requires several scalable machines for effective processing, which in turn adds hardware investment cost to the infrastructure. Although a cloud infrastructure offers scalable and elastic utilization of resources, where users can scale the number of Virtual Machines (VMs) up or down as required, a user such as a CCTV system operator intending to use a public cloud would want to know what cloud resources (i.e. number of VMs) need to be deployed
so that the processing can be done in the fastest (or within a known time
constraint) and the most cost effective manner. Often such resources will also
have to satisfy practical, procedural and legal requirements. The capability to
model a distributed processing architecture where the resource requirements can
be effectively and optimally predicted would thus be a useful tool. In the
literature there is no clear and comprehensive modelling framework that provides
proactive resource allocation mechanisms to satisfy a user's target requirements,
especially for a processing intensive application such as video analytics.
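As a rough illustration of the MapReduce programming model mentioned above, the following single-process Python sketch counts detections per camera. It is not Hadoop code and is not taken from the thesis; the record layout and the names `map_frame`, `reduce_counts` and `run_mapreduce` are assumptions made purely for illustration.

```python
from collections import defaultdict

def map_frame(record):
    """Map step: emit (camera_id, detections) for one frame record."""
    camera_id, detections = record
    yield camera_id, detections

def reduce_counts(camera_id, values):
    """Reduce step: sum the detections emitted for one camera."""
    return camera_id, sum(values)

def run_mapreduce(records, mapper, reducer):
    """Run map, shuffle (group by key) and reduce inside one process."""
    groups = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)
    return dict(reducer(key, values) for key, values in groups.items())

# Hypothetical input: (camera id, objects detected in one frame).
frames = [("cam1", 2), ("cam2", 0), ("cam1", 3), ("cam2", 1)]
totals = run_mapreduce(frames, map_frame, reduce_counts)
```

In a real Hadoop deployment the framework, not the user, would split the input, schedule the map and reduce tasks across machines and retry failed tasks; only the two small functions correspond to what the application developer writes.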
In this thesis, with the hope of closing the above research gap, novel research
is first initiated by understanding the current legal practices and requirements of
implementing a video surveillance system within a distributed processing and data
storage environment, since the legal validity of data gathered or processed within
such a system is vital for a distributed system's applicability in such domains.
Subsequently the thesis presents a comprehensive framework for the performance
modelling and optimization of resource allocation in deploying a scalable distributed
video analytic application in a Hadoop based framework, running on a virtualized
cluster of machines.
The proposed modelling framework investigates the use of several machine
learning algorithms such as decision trees (M5P, REPTree), Linear Regression,
Multi Layer Perceptron (MLP) and the Ensemble Classifier Bagging model, to
model and predict the execution time of video analytic jobs, based on infrastructure
level as well as job level parameters. Further, in order to allocate resources
under constraints so as to obtain optimal performance
in terms of job execution time, we propose a Genetic Algorithm (GA) based
optimization technique.
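The idea of searching resource configurations with a genetic algorithm over a predicted execution time can be sketched as follows. This is a minimal illustration, not the thesis's implementation: `predicted_time` is a hypothetical closed-form stand-in for a trained regression model, and the two genes (number of VMs, task slots per VM), their ranges and the VM budget are all assumed values.

```python
import random

VM_RANGE = (1, 16)     # assumed search range for the number of VMs
SLOT_RANGE = (1, 8)    # assumed search range for task slots per VM

def predicted_time(num_vms, slots_per_vm):
    """Hypothetical stand-in for a trained execution-time model (seconds)."""
    parallel_work = 3600.0 / (num_vms * slots_per_vm)  # work split across slots
    overhead = 20.0 * num_vms + 15.0 * slots_per_vm    # coordination cost
    return parallel_work + overhead

def fitness(ind, max_vms):
    """Lower is better; configurations over the VM budget are penalised."""
    num_vms, slots = ind
    penalty = 1e6 if num_vms > max_vms else 0.0
    return predicted_time(num_vms, slots) + penalty

def genetic_search(max_vms=8, pop_size=20, generations=40, seed=0):
    rng = random.Random(seed)
    pop = [(rng.randint(*VM_RANGE), rng.randint(*SLOT_RANGE))
           for _ in range(pop_size)]
    pop[0] = (1, 1)  # seed one configuration known to satisfy the budget
    for _ in range(generations):
        pop.sort(key=lambda ind: fitness(ind, max_vms))
        parents = pop[: pop_size // 2]        # truncation selection, keeps the best
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            child = (a[0], b[1])              # one-point crossover
            if rng.random() < 0.2:            # mutation: resample one gene
                if rng.random() < 0.5:
                    child = (rng.randint(*VM_RANGE), child[1])
                else:
                    child = (child[0], rng.randint(*SLOT_RANGE))
            children.append(child)
        pop = parents + children
    return min(pop, key=lambda ind: fitness(ind, max_vms))

best = genetic_search()
```

Handling the constraint with a large penalty keeps the search loop simple; a repair step or a constrained encoding would be equally valid design choices.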
Experimental results are provided to demonstrate the proposed framework's
capability to successfully predict the job execution time of a given video analytic task based on infrastructure and input data related parameters, and its ability to determine the minimum job execution time given constraints on these parameters.
Given the above, the thesis contributes to the state of the art in distributed video
analytics, design, implementation, performance analysis and optimisation.
Neural Networks for Modeling and Control of Particle Accelerators
We describe some of the challenges of particle accelerator control, highlight
recent advances in neural network techniques, discuss some promising avenues
for incorporating neural networks into particle accelerator control systems,
and describe a neural network-based control system that is being developed for
resonance control of an RF electron gun at the Fermilab Accelerator Science and
Technology (FAST) facility, including initial experimental results from a
benchmark controller. Comment: 21 p
An Evolutionary Optimization Algorithm for Automated Classical Machine Learning
Machine learning is an evolving branch of computational algorithms that allow computers to learn from experience, make predictions, and solve different problems without being explicitly programmed. However, building a useful machine learning model is a challenging process, requiring human expertise to perform various tasks and ensure that machine learning's primary objective (determining the best and most predictive model) is achieved. These tasks include pre-processing, feature selection, and model selection. Many machine learning models developed by experts are designed manually and by trial and error. In other words, even experts need time and resources to create good predictive machine learning models. The idea of automated machine learning (AutoML) is to automate a machine learning pipeline to relieve the burden of substantial development costs and manual processes. The algorithms leveraged in these systems have different hyper-parameters, and different input datasets have different features. In both cases, the final performance of the model is closely related to the selected configuration of features and hyper-parameters, which is why feature selection and hyper-parameter tuning are considered crucial tasks in AutoML. The computationally expensive nature of tuning hyper-parameters and optimally selecting features creates significant opportunities for filling research gaps in the AutoML field. This dissertation explores how to select the features and tune the hyper-parameters of conventional machine learning algorithms efficiently and automatically. To address the challenges in the AutoML area, novel algorithms for hyper-parameter tuning and feature selection are proposed. The hyper-parameter tuning algorithm aims to provide the optimal set of hyper-parameters for three conventional machine learning models (Random Forest, XGBoost and Support Vector Machine) to obtain the best performance scores.
The feature selection algorithm, in turn, looks for the optimal subset of features to achieve the highest performance. Afterward, a hybrid framework is designed for both hyper-parameter tuning and feature selection. The proposed framework can discover a near-optimal configuration of features and hyper-parameters. It includes the following components: (1) an automatic feature selection component based on artificial bee colony algorithms and machine learning training, and (2) an automatic hyper-parameter tuning component based on artificial bee colony algorithms and machine learning training for faster training and convergence of the learning models. The whole framework has been evaluated using four real-world datasets in different applications. This framework is an attempt to alleviate the challenges of hyper-parameter tuning and feature selection by using efficient algorithms. However, distributed processing, distributed learning, parallel computing, and other big data solutions are not taken into consideration in this framework.
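To make the artificial bee colony idea referred to above more concrete, here is a minimal generic sketch of the algorithm applied to two continuous hyper-parameters. It is not the dissertation's algorithm: `validation_error` is a hypothetical stand-in for a cross-validated model score, and the parameter names, bounds and colony settings are assumptions chosen for illustration.

```python
import random

def abc_minimize(objective, bounds, n_sources=10, limit=5, iterations=50, seed=0):
    """Minimal artificial bee colony search; returns (best_point, best_value)."""
    rng = random.Random(seed)
    dim = len(bounds)

    def random_point():
        return [rng.uniform(lo, hi) for lo, hi in bounds]

    sources = [random_point() for _ in range(n_sources)]
    values = [objective(s) for s in sources]
    trials = [0] * n_sources
    i = min(range(n_sources), key=values.__getitem__)
    best, best_val = list(sources[i]), values[i]

    def try_neighbour(i):
        """Perturb one coordinate of source i relative to a random peer."""
        k = rng.choice([j for j in range(n_sources) if j != i])
        d = rng.randrange(dim)
        cand = list(sources[i])
        cand[d] += rng.uniform(-1, 1) * (sources[i][d] - sources[k][d])
        lo, hi = bounds[d]
        cand[d] = min(max(cand[d], lo), hi)   # clamp to the search bounds
        val = objective(cand)
        if val < values[i]:                   # greedy replacement
            sources[i], values[i], trials[i] = cand, val, 0
        else:
            trials[i] += 1

    for _ in range(iterations):
        for i in range(n_sources):            # employed bees: one try per source
            try_neighbour(i)
        weights = [1.0 / (1.0 + v) for v in values]  # assumes objective >= 0
        for i in rng.choices(range(n_sources), weights=weights, k=n_sources):
            try_neighbour(i)                  # onlooker bees favour good sources
        for i in range(n_sources):            # scout bees replace stale sources
            if trials[i] > limit:
                sources[i] = random_point()
                values[i] = objective(sources[i])
                trials[i] = 0
        i = min(range(n_sources), key=values.__getitem__)
        if values[i] < best_val:
            best, best_val = list(sources[i]), values[i]
    return best, best_val

def validation_error(params):
    """Hypothetical stand-in for a cross-validated error; minimum at (0.1, 6)."""
    learning_rate, max_depth = params
    return (learning_rate - 0.1) ** 2 + 0.01 * (max_depth - 6.0) ** 2

bounds = [(0.01, 1.0), (1.0, 12.0)]
best_params, best_error = abc_minimize(validation_error, bounds)
```

In practice the objective would train and validate a model for each candidate, which is what makes the search expensive; integer or categorical hyper-parameters and feature subsets would need a different encoding than the continuous one assumed here.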
IoT Data Analytics in Dynamic Environments: From An Automated Machine Learning Perspective
With the wide spread of sensors and smart devices in recent years, the data
generation speed of the Internet of Things (IoT) systems has increased
dramatically. In IoT systems, massive volumes of data must be processed,
transformed, and analyzed on a frequent basis to enable various IoT services
and functionalities. Machine Learning (ML) approaches have shown their capacity
for IoT data analytics. However, applying ML models to IoT data analytics tasks
still faces many difficulties and challenges, specifically, effective model
selection, design/tuning, and updating, which have brought massive demand for
experienced data scientists. Additionally, the dynamic nature of IoT data may
introduce concept drift issues, causing model performance degradation. To
reduce human efforts, Automated Machine Learning (AutoML) has become a popular
field that aims to automatically select, construct, tune, and update machine
learning models to achieve the best performance on specified tasks. In this
paper, we conduct a review of existing methods in the model selection, tuning,
and updating procedures in the area of AutoML in order to identify and
summarize the optimal solutions for every step of applying ML algorithms to IoT
data analytics. To justify our findings and help industrial users and
researchers better implement AutoML approaches, a case study of applying AutoML
to IoT anomaly detection problems is conducted in this work. Lastly, we discuss
and classify the challenges and research directions for this domain. Comment: Published in Engineering Applications of Artificial Intelligence
(Elsevier, IF:7.8); Code/An AutoML tutorial is available at GitHub link:
https://github.com/Western-OC2-Lab/AutoML-Implementation-for-Static-and-Dynamic-Data-Analytic
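The concept drift issue mentioned in the abstract above can be illustrated with a simple window-based check on a model's error stream. This is a sketch only and is not taken from the paper or its repository: the class name `WindowDriftDetector`, the window size, the margin and the synthetic stream are all assumptions.

```python
from collections import deque

class WindowDriftDetector:
    """Flags drift when the recent error rate exceeds the reference rate by a margin."""

    def __init__(self, window=50, margin=0.2):
        self.window = window
        self.margin = margin
        self.reference = []                  # first `window` outcomes seen
        self.recent = deque(maxlen=window)   # most recent `window` outcomes

    def update(self, error):
        """Record one outcome (1 = wrong, 0 = correct); return True on drift."""
        if len(self.reference) < self.window:
            self.reference.append(error)
            return False
        self.recent.append(error)
        if len(self.recent) < self.window:
            return False
        ref_rate = sum(self.reference) / self.window
        recent_rate = sum(self.recent) / self.window
        return recent_rate - ref_rate > self.margin

# Synthetic stream: 10% errors for 100 samples, then 50% errors.
stream = [1 if i % 10 == 0 else 0 for i in range(100)] + [i % 2 for i in range(100)]
detector = WindowDriftDetector()
drift_at = next((i for i, e in enumerate(stream) if detector.update(e)), None)
```

Once drift is flagged, an AutoML pipeline of the kind the paper surveys would typically retrain or update the model on recent data; that step is omitted here.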