
421 research outputs found

Advances on Time Series Analysis using Elastic Measures of Similarity

Author: Oregui I.
Publication date: 23/07/2020

A sequence is a collection of data instances arranged in a structured manner. When this arrangement is held in the time domain, sequences are instead referred to as time series. As such, each observation in a time series represents an observation drawn from an underlying process, produced at a specific time instant. However, other types of data indexing structures, such as space- or threshold-based arrangements, are possible. Data points that compose a time series are often correlated with each other. To account for this correlation in data mining tasks, time series are usually studied as a whole data object rather than as a collection of independent observations. In this context, techniques for time series analysis aim at analyzing this type of data structure by applying specific approaches developed to leverage intrinsic properties of the time series for a wide range of problems, such as classification, clustering and other similar tasks. The development of monitoring and storage devices has made time series analysis proliferate in numerous application fields, including medicine, economics, manufacturing and telecommunications, among others. Over the years, the community has devoted efforts to the development of new data-based techniques for time series analysis suited to address the problems and needs of such application fields. In the related literature, such techniques can be divided into three main groups: feature-, model- and distance-based methods. The first group (feature-based) transforms time series into a collection of features, which are then used by conventional learning algorithms to provide solutions to the task under consideration. In contrast, methods belonging to the second group (model-based) assume that each time series is drawn from a generative model, which is then harnessed to elicit knowledge from data. Finally, distance-based techniques operate directly on raw time series. To this end, these methods resort to specially defined measures of distance or similarity for comparing time series, without requiring any further processing. Among them, elastic similarity measures (e.g., dynamic time warping and edit distance) compute the closeness between two sequences by finding the best alignment between them, disregarding differences in time, and thus focusing exclusively on shape differences. This Thesis presents several contributions to the field of distance-based techniques for time series analysis, namely: i) a novel multi-dimensional elastic similarity learning method for time series classification; ii) an adaptation of elastic measures to streaming time series scenarios; and iii) the use of distance-based time series analysis to make machine learning methods for image classification robust against adversarial attacks. Throughout the Thesis, each contribution is framed within its related state of the art, explained in detail and empirically evaluated. The obtained results lead to new insights on the application of distance-based time series methods for the considered scenarios, and motivate research directions that highlight the vibrant momentum of this research area.
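As an illustration of the elastic measures mentioned in the abstract, the following is a minimal sketch of the textbook dynamic time warping (DTW) recurrence. It is not the thesis's own method; the function name `dtw_distance` and the choice of absolute difference as the point-wise cost are assumptions made for this example.

```python
def dtw_distance(a, b):
    """Textbook DTW distance between two numeric sequences.

    Assumption for illustration: the point-wise cost is |x - y|.
    """
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] holds the minimal cumulative cost of aligning a[:i] with b[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of the three admissible predecessor alignments:
            # advance in a only, advance in b only, or advance in both.
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


# The second series repeats one value, so it is a time-stretched copy of the first.
print(dtw_distance([1, 2, 3], [1, 2, 2, 3]))
# A shifted series: a lock-step comparison would sum to 3, the warped one is smaller.
print(dtw_distance([1, 2, 3], [2, 3, 4]))
```

Because the alignment may repeat points of either series, a stretched copy of a series has distance zero to the original, which is the sense in which such measures disregard differences in time and focus on shape.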

BCAM's Institutional Repository Data

Archivo Digital para la Docencia y la Investigación

Advances on Time Series Analysis using Elastic Measures of Similarity

Author: Oregui Bravo Izaskun
Publication date: 23/07/2020

135 p. A sequence is a collection of data instances arranged in a structured manner. When this arrangement is held in the time domain, sequences are instead referred to as time series. As such, each observation in a time series represents an observation drawn from an underlying process, produced at a specific time instant. However, other types of data indexing structures, such as space- or threshold-based arrangements, are possible. Data points that compose a time series are often correlated with each other. To account for this correlation in data mining tasks, time series are usually studied as a whole data object rather than as a collection of independent observations. In this context, techniques for time series analysis aim at analyzing this type of data structure by applying specific approaches developed to harness intrinsic properties of the time series for a wide range of problems, such as classification, clustering and other similar tasks. The development of monitoring and storage devices has made time series analysis proliferate in numerous application fields, including medicine, economics, manufacturing and telecommunications, among others. Over the years, the community has devoted efforts to the development of new data-based techniques for time series analysis suited to address the problems and needs of such application fields. In the related literature, such techniques can be divided into three main groups: feature-, model- and distance-based methods. The first group (feature-based) transforms time series into a collection of features, which are then used by conventional learning algorithms to provide solutions to the task under consideration. In contrast, methods belonging to the second group (model-based) assume that each time series is drawn from a generative model, which is then harnessed to elicit information from data. Finally, distance-based techniques operate directly on raw time series. To this end, these latter methods resort to specially defined measures of distance or similarity for comparing time series, without requiring any further processing. Among them, elastic similarity measures (e.g., dynamic time warping and edit distance) compute the closeness between two sequences by finding the best alignment between them, disregarding differences in time gaps and thus focusing exclusively on shape differences. This Thesis presents several contributions to the field of distance-based techniques for time series analysis, namely: i) a novel multi-dimensional elastic similarity learning method for time series classification; ii) an adaptation of elastic measures to streaming time series scenarios; and iii) the use of distance-based time series analysis to make machine learning methods for image classification robust against adversarial attacks. Throughout the Thesis, each contribution is framed within its related state of the art, explained in detail and empirically evaluated. The obtained results lead to new insights on the application of distance-based time series methods for the considered scenarios, and motivate research directions that highlight the vibrant momentum of this research area.

Archivo Digital para la Docencia y la Investigación

Adversarial Deep Learning and Security with a Hardware Perspective

Author: Clements Joseph
Publication venue: Clemson University Libraries
Publication date: 01/05/2023

Adversarial deep learning is the field of study that analyzes deep learning in the presence of adversarial entities. This entails understanding the capabilities, objectives, and attack scenarios available to the adversary in order to develop defensive mechanisms and avenues of robustness available to the benign parties. Understanding this facet of deep learning helps us improve the safety of deep learning systems against external threats from adversaries. However, of equal importance, this perspective also helps the industry understand and respond to critical failures in the technology. The expectation of future success has driven significant interest in developing this technology broadly. Adversarial deep learning stands as a balancing force to ensure these developments remain grounded in the real world and proceed along a responsible trajectory. Recently, the growth of deep learning has begun intersecting with the computer hardware domain to improve performance and efficiency for resource-constrained application domains. The works investigated in this dissertation constitute our pioneering efforts in migrating adversarial deep learning into the hardware domain alongside its parent field of research.

Clemson University: TigerPrints

Trustworthiness in Mobile Cyber Physical Systems

Publication venue: MDPI AG
Publication date: 11/01/2022

Computing and communication capabilities are increasingly embedded in diverse objects and structures in the physical environment. They will link the ‘cyberworld’ of computing and communications with the physical world. These applications are called cyber physical systems (CPS). Obviously, the increased involvement of real-world entities leads to a greater demand for trustworthy systems. Hence, we use "system trustworthiness" here, which can guarantee continuous service in the presence of internal errors or external attacks. Mobile CPS (MCPS) is a prominent subcategory of CPS in which the physical component has no permanent location. Mobile Internet devices already provide ubiquitous platforms for building novel MCPS applications. The objective of this Special Issue is to contribute to research in modern/future trustworthy MCPS, including design, modeling, simulation, dependability, and so on. It is imperative to address the issues which are critical to their mobility, report significant advances in the underlying science, and discuss the challenges of development and implementation in various applications of MCPS

Directory of Open Access Books (DOAB)

Effective and Secure Healthcare Machine Learning System with Explanations Based on High Quality Crowdsourcing Data

Author: Xue Qinghan
Publication venue: Lehigh Preserve

Affordable cloud computing technologies allow users to efficiently outsource, store, and manage their Personal Health Records (PHRs) and share them with their caregivers or physicians. With the exponential growth of stored large-scale clinical data and the growing need for personalized care, researchers are keen on developing data mining methodologies to efficiently learn hidden patterns in such data. While studies have shown that such progress can significantly improve the performance of various healthcare applications for clinical decision making and personalized medicine, the collected medical datasets are highly ambiguous and noisy. Thus, it is essential to develop a better tool for disease progression and survival rate predictions, where the dataset needs to be cleaned before it is used for predictions and useful feature selection techniques need to be employed before prediction models can be constructed. In addition, having predictions without explanations prevents medical personnel and patients from adopting such healthcare deep learning models. Thus, any prediction model must come with some explanations. Finally, despite the efficiency of machine learning systems and their outstanding prediction performance, it is still a risk to reuse pre-trained models, since most machine learning modules that are contributed and maintained by third parties lack proper checking to ensure that they are robust to various adversarial attacks. We need to design mechanisms for detecting such attacks. In this thesis, we focus on addressing all the above issues: (i) Privacy Preserving Disease Treatment & Complication Prediction System (PDTCPS): A privacy-preserving disease treatment and complication prediction scheme (PDTCPS) is proposed, which allows authorized users to conduct searches for disease diagnosis, personalized treatments, and prediction of potential complications. (ii) Incentivizing High Quality Crowdsourcing Data For Disease Prediction: A new incentive model with individual rationality and platform profitability features is developed to encourage different hospitals to share high quality data so that better prediction models can be constructed. We also explore how data cleaning and feature selection techniques affect the performance of the prediction models. (iii) Explainable Deep Learning Based Medical Diagnostic System: A deep learning based medical diagnosis system (DL-MDS) is presented, which integrates heterogeneous medical data sources to produce better disease diagnosis with explanations for authorized users who submit their personalized health related queries. (iv) Attacks on RNN based Healthcare Learning Systems and Their Detection & Defense Mechanisms: Potential attacks on Recurrent Neural Network (RNN) based ML systems are identified, and low-cost detection & defense schemes are designed to prevent such adversarial attacks. Finally, we conduct extensive experiments using both synthetic and real-world datasets to validate the feasibility and practicality of our proposed systems.

Lehigh University: Lehigh Preserve

Towards Scalable, Private and Practical Deep Learning

Author: Zawad Syed
Publication date: 01/02/2023

Deep Learning (DL) models have drastically improved the performance of Artificial Intelligence (AI) tasks such as image recognition, word prediction, and translation, among many others, on which traditional Machine Learning (ML) models fall short. However, DL models are costly to design, train, and deploy due to their computing and memory demands. Designing DL models usually requires extensive expertise and significant manual tuning efforts. Even with the latest accelerators such as Graphics Processing Units (GPUs) and Tensor Processing Units (TPUs), training DL models can take a prohibitively long time; therefore, training large DL models in a distributed manner is the norm. A massive amount of data is made available thanks to the prevalence of mobile and internet-of-things (IoT) devices. However, regulations such as HIPAA and GDPR limit the access to and transmission of personal data to protect security and privacy. Therefore, enabling DL model training in a decentralized but private fashion is urgent and critical. Deploying trained DL models in a real-world environment usually requires meeting Quality of Service (QoS) standards, which makes adaptability of DL models an important yet challenging matter. In this dissertation, we aim to address the above challenges to make a step towards scalable, private, and practical deep learning. To simplify DL model design, we propose Efficient Progressive Neural-Architecture Search (EPNAS) and FedCust to automatically design model architectures and tune hyperparameters, respectively. To provide efficient and robust distributed training while preserving privacy, we design LEASGD, TiFL, and HDFL. We further conduct a study on the security aspect of distributed learning by focusing on how data heterogeneity affects backdoor attacks and how to mitigate such threats. Finally, we use super resolution (SR) as an example application to explore model adaptability for cross-platform deployment and dynamic runtime environments. Specifically, we propose the DySR and AdaSR frameworks, which enable SR models to meet QoS by dynamically adapting to available resources instantly and seamlessly without excessive memory overheads.

University of Nevada, Reno ScholarWorks Repository

Cyber Security

Publication venue: Springer Science and Business Media LLC
Publication date: 14/02/2022

This open access book constitutes the refereed proceedings of the 17th International Annual Conference on Cyber Security, CNCERT 2021, held in Beijing, China, in July 2021. The 14 papers presented were carefully reviewed and selected from 51 submissions. The papers are organized according to the following topical sections: data security; privacy protection; anomaly detection; traffic analysis; social network security; vulnerability detection; text classification.

Directory of Open Access Books (DOAB)