49 research outputs found

    The 10th Jubilee Conference of PhD Students in Computer Science


    Real-Time Localization Using Software Defined Radio

    Service providers make use of cost-effective wireless solutions to identify, localize, and possibly track users through their carried mobile devices (MDs) in order to support added services, such as geo-advertisement, security, and management. Indoor and outdoor hotspot areas play a significant role for such services. However, GPS does not work in many of these areas. To solve this problem, service providers leverage available indoor radio technologies, such as WiFi, GSM, and LTE, to identify and localize users. We focus our research on passive services provided by third parties, which are responsible for (i) data acquisition and (ii) processing, and on network-based services, where (i) and (ii) are done inside the serving network. To better understand the parameters that affect indoor localization, we investigate several factors that affect indoor signal propagation for both Bluetooth and WiFi technologies. For GSM-based passive services, we first developed a data acquisition module: a GSM receiver that can overhear GSM uplink messages transmitted by MDs while remaining invisible. A set of optimizations was made to the receiver components to support wideband capture of the GSM spectrum while operating in real time. Processing the wide GSM spectrum is made possible by a proposed distributed processing approach over an IP network. Then, to overcome the lack of information about tracked devices' radio settings, we developed two novel localization algorithms that rely on proximity-based solutions to estimate devices' locations in real environments. Given the challenging effects of the indoor environment on radio signals, such as NLOS reception and multipath propagation, we developed an original algorithm to detect and remove contaminated radio signals before they are fed to the localization algorithm. To improve the localization algorithm, we extended our work with a hybrid approach that uses both WiFi and GSM interfaces to localize users. For network-based services, we used a software implementation of an LTE base station to develop our algorithms, which characterize the indoor environment before applying the localization algorithm. Experiments were conducted without any special hardware, prior knowledge of the indoor layout, or offline calibration of the system.
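    The abstract does not spell out the proximity-based algorithms themselves; purely as an illustrative sketch (not the thesis's method), the following weighted-centroid estimator shows one common way to turn RSSI readings from anchors at known positions into a location estimate. All coordinates and RSSI values below are hypothetical.

```python
# Illustrative sketch of a proximity-based location estimate: a weighted
# centroid over anchor nodes (e.g., WiFi APs or GSM receivers), where each
# anchor's weight grows with its received signal strength (RSSI).
# Anchor coordinates and RSSI values are hypothetical.

def weighted_centroid(anchors):
    """anchors: list of (x, y, rssi_dbm) tuples; returns an (x, y) estimate."""
    # Convert dBm to linear power so stronger signals dominate the average.
    weighted = [(x, y, 10 ** (rssi / 10.0)) for x, y, rssi in anchors]
    total = sum(w for _, _, w in weighted)
    est_x = sum(x * w for x, _, w in weighted) / total
    est_y = sum(y * w for _, y, w in weighted) / total
    return est_x, est_y

if __name__ == "__main__":
    # Three anchors at known positions with overheard RSSI readings.
    readings = [(0.0, 0.0, -55.0), (10.0, 0.0, -70.0), (5.0, 8.0, -62.0)]
    print(weighted_centroid(readings))
```

    A stronger (less negative) RSSI pulls the estimate toward that anchor; the thesis's contamination-filtering step would discard NLOS/multipath-affected readings before this kind of aggregation.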

    Supercomputer Emulation For Evaluating Scheduling Algorithms

    Scheduling algorithms have a significant impact on the optimal utilization of HPC facilities, yet the vast majority of research in this area is done using simulations. Simulations neglect many factors that affect a real scheduler, such as its scheduling processing time, communication latencies, and the scheduler's intrinsic implementation complexity. As a result, despite theoretical improvements reported in several articles, practically none of the proposed algorithms have been implemented in real schedulers, and HPC facilities still use the basic first-come-first-served (FCFS) with Backfill scheduling policy. A better approach could therefore be the use of real schedulers in an emulation environment to evaluate new algorithms. This thesis investigates two related challenges in emulation: computational cost and faithfulness of the results to real scheduling environments. It finds that the sampling, shrinking, and shuffling of a trace must be done carefully to keep the classical metrics invariant, or linearly variant, in relation to the size and times of the original workload. This is accomplished by careful control of the submission period and consideration of drifts in the submission period and trace duration. This methodology can help researchers better evaluate their scheduling algorithms and help HPC administrators optimize the parameters of production schedulers. To assess the proposed methodology, we evaluated both the FCFS with Backfill and the Suspend/Resume scheduling algorithms. The results strongly suggest that Suspend/Resume leads to better utilization of a supercomputer when high priorities are given to big jobs.
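    For readers unfamiliar with the baseline policy named above, here is a minimal sketch of the core backfill decision (not the emulator or any thesis code): a waiting job may start out of order only if it cannot delay the reservation held by the job at the head of the queue. All job fields and numbers are hypothetical.

```python
# Minimal sketch of EASY-backfill-style reasoning: a waiting job may be
# backfilled onto idle nodes only if its projected end does not push back
# the reservation made for the first (head) job in the queue.
from dataclasses import dataclass

@dataclass
class Job:
    name: str
    nodes: int       # nodes requested
    walltime: float  # user-declared runtime limit

def can_backfill(job, idle_nodes, now, head_reserved_start,
                 head_nodes, free_at_reservation):
    """True if `job` fits on idle nodes now without delaying the head job."""
    if job.nodes > idle_nodes:
        return False
    finishes_before_reservation = now + job.walltime <= head_reserved_start
    # If it would run past the reservation, the head job must still have
    # enough free nodes left at its reserved start time.
    leaves_enough_nodes = free_at_reservation - job.nodes >= head_nodes
    return finishes_before_reservation or leaves_enough_nodes

# Hypothetical state: 4 idle nodes now; head job needs 8 nodes at t=100.
print(can_backfill(Job("small", 2, 30.0), idle_nodes=4, now=60.0,
                   head_reserved_start=100.0, head_nodes=8,
                   free_at_reservation=8))
```

    The factors the thesis highlights, such as the scheduler's own processing time for this decision, are exactly what trace-driven simulations tend to omit and emulation captures.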

    Performance modelling and optimization for video-analytic algorithms in a cloud-like environment using machine learning

    CCTV cameras produce a large amount of video surveillance data per day, and analysing it requires significant computing resources that often need to be scalable. The emergence of the Hadoop distributed processing framework has had a significant impact on various data-intensive applications, as distributed-computing-based processing increases the processing capability of the applications it serves. Hadoop is an open-source implementation of the MapReduce programming model. It automates the creation of tasks for each function, distributes data, parallelizes execution, and handles machine failures, relieving users of the complexity of managing the underlying processing so they can focus on building their applications. In a practical deployment, the challenge of a Hadoop-based architecture is that it requires several scalable machines for effective processing, which in turn adds hardware investment cost to the infrastructure. Although a cloud infrastructure offers scalable and elastic utilization of resources, where users can scale the number of Virtual Machines (VMs) up or down as required, a user such as a CCTV system operator intending to use a public cloud would wish to know what cloud resources (i.e., number of VMs) need to be deployed so that the processing can be done in the fastest (or within a known time constraint) and most cost-effective manner. Often such resources will also have to satisfy practical, procedural, and legal requirements. The capability to model a distributed processing architecture in which resource requirements can be effectively and optimally predicted would thus be a useful tool. The literature offers no clear and comprehensive modelling framework that provides proactive resource allocation mechanisms to satisfy a user's target requirements, especially for a processing-intensive application such as video analytics. In this thesis, with the aim of closing the above research gap, novel research is first initiated by studying the current legal practices and requirements of implementing a video surveillance system within a distributed processing and data storage environment, since the legal validity of data gathered or processed within such a system is vital for its applicability in such domains. Subsequently, the thesis presents a comprehensive framework for the performance modelling and optimization of resource allocation when deploying a scalable distributed video analytic application in a Hadoop-based framework running on a virtualized cluster of machines. The proposed modelling framework investigates the use of several machine learning algorithms, such as decision trees (M5P, RepTree), Linear Regression, Multi-Layer Perceptron (MLP), and the Ensemble Classifier Bagging model, to model and predict the execution time of video analytic jobs based on infrastructure-level as well as job-level parameters. Further, in order to allocate resources under constraints and obtain optimal performance in terms of job execution time, we propose a Genetic Algorithm (GA)-based optimization technique. Experimental results demonstrate the proposed framework's capability to successfully predict the job execution time of a given video analytic task based on infrastructure and input-data-related parameters, and its ability to determine the minimum job execution time given constraints on these parameters.
Given the above, the thesis contributes to the state of the art in distributed video analytics: design, implementation, performance analysis, and optimisation.
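    As a toy illustration of the two stages described above (none of the thesis's code or data is reproduced), the sketch below trains a decision-tree regressor, standing in for the M5P/RepTree models named in the abstract, to predict job execution time from synthetic parameters, then runs a tiny genetic search for the VM count that minimizes the predicted time under a VM budget. Every feature name, value, and range is invented.

```python
# Sketch: (1) learn execution time from (num_vms, video_minutes, frame_rate);
# (2) GA search for the num_vms minimizing predicted time under a VM budget.
# Training data and parameter ranges are synthetic placeholders.
import random
from sklearn.tree import DecisionTreeRegressor

random.seed(1)

# Synthetic training set: time shrinks with VMs, grows with input size.
X, y = [], []
for _ in range(300):
    vms = random.randint(1, 16)
    minutes = random.uniform(5, 120)
    fps = random.choice([10, 15, 25])
    X.append([vms, minutes, fps])
    y.append(minutes * fps / vms + random.gauss(0, 5))

model = DecisionTreeRegressor(max_depth=6).fit(X, y)

def fitness(vms, minutes=60, fps=25, max_vms=12):
    if vms > max_vms:                 # constraint: VM budget
        return float("inf")
    return model.predict([[vms, minutes, fps]])[0]

# Tiny GA over the single gene `vms`: selection, crossover, mutation.
pop = [random.randint(1, 16) for _ in range(10)]
for _ in range(20):
    pop.sort(key=fitness)
    parents = pop[:4]
    children = [max(1, (random.choice(parents) + random.choice(parents)) // 2
                    + random.choice([-1, 0, 1]))
                for _ in range(6)]
    pop = parents + children
print("best VM count:", min(pop, key=fitness))
```

    In the thesis's setting the fitness function would embed the learned performance model together with cost and legal/procedural constraints, rather than the single VM budget used here.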

    Model-Driven Machine Learning for Predictive Cloud Auto-scaling

    Cloud provisioning of resources requires continuous monitoring and analysis of the workload on virtual computing resources, yet cloud providers offer only rule-based and schedule-based auto-scaling services. Auto-scaling is a cloud mechanism that reacts to real-time metrics and adjusts service instances based on predefined scaling policies. The challenge of this reactive approach is coping with fluctuating load changes. For data management applications, the workload changes over time and calls for forecasting based on historical trends, integrated with the auto-scaling service. We aim to discover changes and patterns across multiple resource-usage metrics: CPU, memory, and networking. To address this problem, learning-and-inference-based prediction is adopted to predict needs prior to the provisioning action. First, we develop a novel machine-learning-based auto-scaling process that learns from multiple metrics to make cloud auto-scaling decisions. This technique is used for continuous model training and workload forecasting, and the forecasting results trigger the auto-scaling process automatically. We also build the serverless functions of this machine-learning-based process, including monitoring, machine learning, model selection, and scheduling, as microservices, orchestrating these independent services through platform- and language-agnostic APIs. We demonstrate this architectural implementation on AWS and Microsoft Azure and show the prediction results from machine learning on the fly. Results show significant cost reductions from our proposed solution compared to general threshold-based auto-scaling. However, the machine learning prediction still has to be integrated with the auto-scaling system, which increases the deployment effort of devising the additional machine learning components. We therefore present a model-driven framework that defines first-class entities to represent machine learning algorithm types, inputs, outputs, parameters, and evaluation scores, together with rules for validating these machine learning entities. The connection between the machine learning and auto-scaling systems is expressed through two levels of abstraction: a cloud-platform-independent model and a cloud-platform-specific model. We automate the model-to-model and model-to-deployment transformations, and we integrate the model-driven approach with DevOps to make models deployable and executable on a target cloud platform. We demonstrate our method with the scaling configuration and deployment of two open-source benchmark applications, the Dell DVD Store and Netflix NDBench, on three cloud platforms: AWS, Azure, and Rackspace. The evaluation shows that our model-driven, inference-based auto-scaling reduces deployment effort by approximately 27% compared to ordinary auto-scaling.
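    To make the reactive-versus-predictive distinction above concrete, here is a hedged sketch contrasting a threshold rule with a forecast-driven rule. The naive linear extrapolation merely stands in for the thesis's machine learning models, and all thresholds and traces are hypothetical.

```python
# Sketch contrasting reactive (rule-based) and predictive scaling decisions.
# The forecaster is a trivial slope extrapolation, a placeholder for the
# learned workload models; thresholds and the CPU trace are invented.

def reactive_scale(cpu_now, instances, high=0.8, low=0.3):
    """Classic rule-based auto-scaling: act only on the current metric."""
    if cpu_now > high:
        return instances + 1
    if cpu_now < low and instances > 1:
        return instances - 1
    return instances

def predictive_scale(cpu_history, instances, high=0.8, horizon=3):
    """Scale on the forecast so capacity is ready before the spike hits."""
    slope = cpu_history[-1] - cpu_history[-2]
    forecast = cpu_history[-1] + horizon * slope  # naive extrapolation
    return instances + 1 if forecast > high else instances

trace = [0.40, 0.48, 0.57, 0.66, 0.74]
print("reactive:", reactive_scale(trace[-1], instances=2))   # 2: waits for breach
print("predictive:", predictive_scale(trace, instances=2))   # 3: scales up early
```

    The reactive rule only adds an instance after utilization crosses the threshold, by which point users already feel the load; the predictive rule provisions ahead of the breach, which is the source of the cost and SLA gains the abstract reports.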

    A System for Continuous Software Processing Using Cloud Technologies

    The work is published in accordance with rector's order No. 580/od of 29.12.2020, "On placing higher-education qualification works in the NAU repository". Project supervisor: Cand. Sc. (Eng.), Associate Professor Ihor Vasyliovych Teleshko.
    In the modern economy, the use of digital tools in business decision-making plays a defining role. With the increasing complexity of high-tech platforms, the continuity of critical IT systems is becoming an important factor. This trend has also affected software development. Nowadays, the strong interest in optimization and in saving time and money for both large and small businesses is driven by the need to automate software processing and continuous, real-time integration in cloud systems. The search for and implementation of effective methods and principles of continuous software processing using cloud computing systems is therefore now under way. I believe the most promising direction for solving this problem is the use of cloud platforms and services, as the most advanced solution to the problems of ensuring uninterrupted integration and delivery.

    Machine Learning-based Orchestration Solutions for Future Slicing-Enabled Mobile Networks

    The fifth generation of mobile networks (5G) will incorporate novel technologies such as network programmability and virtualization, enabled by the Software-Defined Networking (SDN) and Network Function Virtualization (NFV) paradigms, which have recently attracted major interest from both academic and industrial stakeholders. Building on these concepts, Network Slicing has arisen as the main driver of a novel business model in which mobile operators may open, i.e., "slice", their infrastructure to new business players and offer independent, isolated, and self-contained sets of network functions and physical/virtual resources tailored to specific service requirements. While Network Slicing has the potential to increase the revenue sources of service providers, it involves a number of technical challenges that must be carefully addressed. End-to-end (E2E) network slices encompass time and spectrum resources in the radio access network (RAN), transport resources on the fronthaul/backhaul links, and computing and storage resources at core and edge data centers. Additionally, the heterogeneity of vertical service requirements (e.g., high throughput, low latency, high reliability) exacerbates the need for novel orchestration solutions able to manage end-to-end network slice resources across different domains while satisfying stringent service level agreements and specific traffic requirements. An end-to-end network slicing orchestration solution shall i) admit network slice requests such that the overall system revenues are maximized, ii) provide the required resources across different network domains to fulfil the Service Level Agreements (SLAs), and iii) dynamically adapt the resource allocation based on the real-time traffic load, end-users' mobility, and instantaneous wireless channel statistics. Certainly, a mobile network represents a fast-changing scenario characterized by complex spatio-temporal relationships connecting end-users' traffic demand with social activities and the economy. Legacy models that aim to provide dynamic resource allocation based on traditional traffic demand forecasting techniques fail to capture these important aspects. To close this gap, machine-learning-aided solutions are quickly emerging as promising technologies to sustain, in a scalable manner, the set of operations required in the network slicing context. How to implement such resource allocation schemes among slices, while making the most efficient use of the networking resources composing the mobile infrastructure, are the key problems underlying the network slicing paradigm that will be addressed in this thesis.
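    As a simplified illustration of requirement i) above only, the following greedy, knapsack-style sketch admits slice requests by revenue density under a single abstract capacity budget. Real admission control spans RAN, transport, and compute domains with per-SLA constraints; the request values here are hypothetical and this is not the thesis's algorithm.

```python
# Toy sketch of revenue-aware slice admission: greedily admit requests with
# the highest revenue per unit of capacity until the resource budget (an
# abstract aggregate of RAN/transport/compute capacity) is exhausted.
from dataclasses import dataclass

@dataclass
class SliceRequest:
    tenant: str
    capacity: float  # abstract end-to-end resource units demanded
    revenue: float   # revenue earned if the slice is admitted

def admit(requests, budget):
    admitted = []
    # Revenue-density ordering: a standard greedy heuristic for
    # knapsack-like admission problems.
    for req in sorted(requests, key=lambda r: r.revenue / r.capacity,
                      reverse=True):
        if req.capacity <= budget:
            admitted.append(req.tenant)
            budget -= req.capacity
    return admitted

reqs = [SliceRequest("eMBB-video", 40, 100),
        SliceRequest("URLLC-factory", 25, 90),
        SliceRequest("mMTC-metering", 50, 60)]
print(admit(reqs, budget=80))  # ['URLLC-factory', 'eMBB-video']
```

    Requirement iii), dynamic adaptation to traffic and mobility, is where the machine-learning-aided forecasting discussed above replaces the static capacity figure used in this toy.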

    A forensics and compliance auditing framework for critical infrastructure protection

    Contemporary societies are increasingly dependent on products and services provided by Critical Infrastructure (CI) such as power plants, energy distribution networks, transportation systems and manufacturing facilities. Due to their nature, size and complexity, such CIs are often supported by Industrial Automation and Control Systems (IACS), which are in charge of managing assets and controlling everyday operations. As these IACS become larger and more complex, encompassing a growing number of processes and interconnected monitoring and actuating devices, the attack surface of the underlying CIs increases. This situation calls for new strategies to improve Critical Infrastructure Protection (CIP) frameworks, based on evolved approaches for data analytics able to gather insights from the CI. In this paper, we propose an Intrusion and Anomaly Detection System (IADS) framework that adopts forensics and compliance auditing capabilities at its core to improve CIP. The adopted forensics techniques help to address, for instance, post-incident analysis and investigation, while the support of continuous auditing processes simplifies compliance management and service quality assessment. More specifically, after discussing the rationale for such a framework, this paper presents a formal description of the proposed components and functions and discusses how the framework can be implemented using a cloud-native approach, to address both functional and non-functional requirements. An experimental analysis of the framework's scalability is also provided.
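    The paper's formal component description is not reproduced here. As a small, self-contained illustration of one forensic building block a framework like this might rely on, the sketch below hash-chains audit records so that post-incident tampering with stored events becomes detectable; the event fields are invented and this is not the paper's implementation.

```python
# Illustrative forensic building block: a tamper-evident audit trail where
# each record's hash covers its predecessor's hash, so any later change to
# a stored event breaks the chain and is detected on verification.
import hashlib, json

def append_event(chain, event):
    prev_hash = chain[-1]["hash"] if chain else "0" * 64
    record = {"event": event, "prev": prev_hash}
    payload = json.dumps(record, sort_keys=True).encode()
    record["hash"] = hashlib.sha256(payload).hexdigest()
    chain.append(record)

def verify(chain):
    prev = "0" * 64
    for rec in chain:
        payload = json.dumps({"event": rec["event"], "prev": rec["prev"]},
                             sort_keys=True).encode()
        if rec["prev"] != prev or rec["hash"] != hashlib.sha256(payload).hexdigest():
            return False
        prev = rec["hash"]
    return True

log = []
append_event(log, {"src": "plc-07", "action": "setpoint_change", "value": 42})
append_event(log, {"src": "hmi-01", "action": "login", "user": "operator"})
print(verify(log))            # True: chain intact
log[0]["event"]["value"] = 99
print(verify(log))            # False: tampering exposed
```

    In a cloud-native deployment such chains would typically be anchored in write-once storage so that even the auditing service itself cannot silently rewrite history.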

    Process Models for Learning Patterns in FLOSS Repositories

    Evidence suggests that Free/Libre Open Source Software (FLOSS) environments provide unlimited learning opportunities. Community members engage in a number of activities, both during their interaction with their peers and while making use of these environments' repositories. To date, numerous studies document the existence of learning processes in FLOSS through surveys or questionnaires filled in by FLOSS project participants. At the same time, there is a surge in tools and techniques for extracting and analyzing data from different FLOSS data sources, which has given rise to a new field called Mining Software Repositories (MSR). In spite of these growing tools and techniques for mining FLOSS repositories, there are few, if any, existing approaches that provide empirical evidence of learning processes directly from these repositories. In this work we therefore seek to trigger such an initiative by proposing an approach based on Process Mining. With this technique, we aim to trace learning behaviors from FLOSS participants' trails of activities as recorded in FLOSS repositories. We identify the participants as Novices and Experts. A Novice is any FLOSS member who benefits from a learning experience by acquiring new skills, while the Expert is the provider of those skills. The significance of our work is mainly twofold. First, we extend the MSR field by showing the potential of mining FLOSS repositories with Process Mining techniques. Second, our work provides critical evidence that improves the understanding of learning behavior in FLOSS communities by analyzing the relevant repositories. To accomplish this, we have proposed and implemented a methodology that follows a seven-step approach: developing an appropriate terminology or ontology for learning processes in FLOSS, contextualizing learning processes through a-priori models, generating Event Logs, generating the corresponding process models, interpreting and evaluating the value of process discovery, performing conformance analysis, and verifying a number of formulated hypotheses with regard to tracing learning patterns in FLOSS communities. The implementation of this approach has resulted in the Ontology of Learning in FLOSS (OntoLiFLOSS) environments, which defines the terms needed to describe learning processes in FLOSS and provides a visual representation of these processes through Petri-net-like Workflow nets. A further novelty pertains to the mining of FLOSS repositories itself: we define and describe the preliminaries required for preprocessing FLOSS data before applying Process Mining techniques for analysis. Through a step-by-step process, we detail how the Event Logs are constructed by generating key phrases and making use of Semantic Search. Taking the FLOSS environment Openstack as our data source, we apply the proposed techniques to identify learning activities based on key-phrase catalogs and classification rules expressed through pseudo code, together with an appropriate Process Mining tool. We thus produced Event Logs based on the semantic content of messages in Openstack's Mailing archives, Internet Relay Chat (IRC) messages, Reviews, Bug reports, and Source code to retrieve the corresponding activities.
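    As a schematic stand-in for the key-phrase classification step just described (the thesis gives its actual catalogs and rules as pseudo code), the sketch below maps messages to learning activities and emits event-log rows; the catalog entries, roles, and messages are invented examples.

```python
# Schematic stand-in for Event Log construction: classify each message by a
# key-phrase catalog and emit an event-log row (case id, activity, timestamp,
# role). Catalog entries and messages are invented, not the thesis's catalogs.
CATALOG = {
    "ask_question": ["how do i", "could someone explain", "what does"],
    "give_answer":  ["you should", "try using", "the reason is"],
    "review_code":  ["looks good to me", "please fix", "needs rebase"],
}

def classify(message_text):
    text = message_text.lower()
    for activity, phrases in CATALOG.items():
        if any(p in text for p in phrases):
            return activity
    return None  # message carries no recognizable learning activity

def to_event_log(messages):
    """messages: iterable of (author, role, timestamp, text) tuples."""
    log = []
    for author, role, ts, text in messages:
        activity = classify(text)
        if activity:
            log.append({"case": author, "activity": activity,
                        "timestamp": ts, "role": role})
    return log

msgs = [("alice", "Novice", "2015-03-01T10:00", "How do I configure nova?"),
        ("bob", "Expert", "2015-03-01T10:05", "Try using the devstack script.")]
print(to_event_log(msgs))
```

    The thesis additionally applies Semantic Search rather than literal substring matching, so semantically similar phrasings map to the same activity.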
Considering these repositories in light of the three learning process phases (Initiation, Progression, and Maturation), we produced an Event Log for each participant (Novice or Expert) in every phase on the corresponding dataset. We hence produced 14 Event Logs, which helped build 14 corresponding process maps: visual representations of the flow of learning activities in FLOSS for each participant. These process maps provide critical indications of the presence of learning processes in the analyzed repositories. The results show that learning activities do occur at a significant rate during message exchange on both the Mailing archives and IRC. The slight differences between the two datasets can be highlighted in two ways. First, Experts are more involved on IRC than on the Mailing archives, with 7.22% and 0.36% Expert involvement on IRC forums and Mailing lists respectively. This can be explained by the difference in message length between the two channels: the average sent message is 3261 characters for an email compared to 60 characters for a chat message. The evidence produced by this mining experiment solidifies the findings in terms of both the existence of learning processes in FLOSS and the scale at which they occur. While the Initiation phase shows the Novice as the participant most involved at the start of the learning process, during the Progression phase the involvement of the Expert increases significantly. To trace advanced skills in the Maturation phase, we look at repositories that store data about developing and creating code, examining and reviewing it, and identifying and fixing possible bugs; we therefore consider three repositories: Source code, Bug reports, and Reviews. The results obtained in this phase largely justify the choice of these three datasets for tracking learning behavior at this stage. Both the Bug reports and the Source code demonstrate the commitment of the Novice to seek answers and interact as much as possible to strengthen the acquired skills. With a participation of 49.22% for the Novice against 46.72% for the Expert on Bug reports, and 46.19% against 42.04% on Source code, the Novice still engages significantly in learning. On the last dataset, Reviews, we notice an increase in the Expert's role: the Expert performs 40.36% of the total number of activities against 22.17% for the Novice. The last steps of our methodology steer the comparison of the defined a-priori models with final models that describe how learning processes occur according to the actual behavior in the Event Logs. Our attempts at producing process models start with depicting process maps to track the actual behavior as it occurs in Openstack repositories, before concluding with final Petri net models representative of learning processes in FLOSS as a result of conformance analysis. For every dataset in the corresponding learning phase, we produce three process maps, depicting respectively the overall learning behavior of all FLOSS community members (Novice and Expert together), the Novice, and the Expert. In total, we produced 21 process maps, empirically describing process models on real data, and 14 process models in the form of Petri nets, one for every participant on each dataset.
We make use of Artificial Immune System (AIS) algorithms to merge the 14 Event Logs, which uniquely capture the behavior of every participant on the different datasets in the three phases. We then re-analyze the resulting logs in order to produce 6 global models that together provide a comprehensive depiction of participants' learning behavior in FLOSS communities. This comparison shows that the Workflow nets introduced as our a-priori models give a rather simplistic representation of learning processes in FLOSS. Our experiments with Event Logs from Openstack repositories, from process discovery through conformance checking, demonstrate that the real learning behaviors are more complete and, most importantly, largely subsume these simplistic a-priori models. Finally, our methodology has proved effective both in providing a novel alternative for mining FLOSS repositories and in providing empirical evidence that describes how knowledge is exchanged in FLOSS environments. Moreover, our results enrich the MSR field with a reproducible, step-by-step problem-solving approach that can be customized to answer subsequent research questions on FLOSS repositories using Process Mining.
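    To give a concrete sense of what process-map discovery involves at its simplest, this sketch counts the directly-follows pairs of activities per case in an event log, the raw material of a process map; the thesis's Petri-net models and conformance analysis go far beyond this, and the log entries are invented.

```python
# Minimal stand-in for process-map discovery: count directly-follows pairs
# of activities per case, the edges of the simplest process map. Dedicated
# Process Mining tools add frequencies, filtering, and Petri-net synthesis.
from collections import Counter
from itertools import pairwise  # requires Python 3.10+

def directly_follows(event_log):
    """event_log: iterable of dicts with 'case', 'activity', 'timestamp'."""
    by_case = {}
    for e in sorted(event_log, key=lambda e: (e["case"], e["timestamp"])):
        by_case.setdefault(e["case"], []).append(e["activity"])
    edges = Counter()
    for trace in by_case.values():
        edges.update(pairwise(trace))  # consecutive activity pairs per case
    return edges

log = [{"case": "alice", "activity": "ask_question", "timestamp": 1},
       {"case": "alice", "activity": "give_answer", "timestamp": 3},
       {"case": "alice", "activity": "ask_question", "timestamp": 2}]
print(directly_follows(log))
# Counter({('ask_question', 'ask_question'): 1,
#          ('ask_question', 'give_answer'): 1})
```

    Conformance checking then asks how well such discovered behavior fits a prescribed model, which is how the thesis shows the real learning behavior subsuming the a-priori Workflow nets.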