658 research outputs found

    Machine Learning

    Machine learning can be defined in various ways, but it broadly refers to a scientific domain concerned with the design and development of theoretical and implementation tools for building systems that exhibit some human-like intelligent behavior. More specifically, machine learning addresses the ability of systems to improve automatically through experience.

    Online Spectral Clustering on Network Streams

    Graphs are an extremely useful representation of a wide variety of practical systems in data analysis. Recently, with the fast accumulation of stream data from various types of networks, significant research interest has arisen in spectral clustering for network streams (or evolving networks). Compared with the general spectral clustering problem, this new type of problem may impose additional requirements, such as short processing time, scalability in distributed computing environments, and temporal variation tracking. However, designing a spectral clustering method that satisfies these requirements is a non-trivial effort. There are three major challenges for the new algorithm design. The first is online clustering computation: most existing spectral methods on evolving networks are off-line methods that use standard eigensystem solvers, such as the Lanczos method, and must recompute solutions from scratch at each time point. The second is parallelization: standard eigensolvers are iterative algorithms whose number of iterations cannot be predetermined, which makes them hard to parallelize. The third is the scarcity of existing work; moreover, the existing method suffers from multiple limitations, such as computational inefficiency on large similarity changes, the lack of a sound theoretical basis, and the lack of an effective way to handle accumulated approximation errors and large data variations over time. In this thesis, we proposed a new online spectral graph clustering approach with a family of three novel spectrum approximation algorithms. Our algorithms incrementally update the eigenpairs in an online manner to improve computational performance. Our approaches outperformed the existing method in computational efficiency and scalability while retaining competitive or even better clustering accuracy.
We derived our spectrum approximation techniques, GEPT and EEPT, through formal theoretical analysis. Well-established matrix perturbation theory forms a solid theoretical foundation for our online clustering method. We equipped our clustering method with a new metric to track accumulated approximation errors and measure short-term temporal variation. The metric not only provides a balance between computational efficiency and clustering accuracy but also offers a useful tool for adapting the online algorithm to conditions of unexpected, drastic noise. In addition, we discussed our preliminary work on approximate graph mining with an evolutionary process, non-stationary Bayesian network structure learning from non-stationary time series data, and Bayesian network structure learning with text priors imposed by non-parametric hierarchical topic modeling.
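    The incremental eigenpair updates described above rest on matrix perturbation theory. As a minimal illustration of that foundation (a generic first-order sketch, not the thesis's GEPT/EEPT algorithms), the following code corrects the eigenpairs of a symmetric matrix after a small change `dA` instead of recomputing them from scratch:

```python
import numpy as np

def perturbed_eigenpairs(A, dA):
    """First-order update of the eigenpairs of symmetric A after a
    small symmetric perturbation dA (classical matrix perturbation
    theory; assumes well-separated eigenvalues)."""
    lam, V = np.linalg.eigh(A)
    M = V.T @ dA @ V                    # the perturbation in A's eigenbasis
    lam_new = lam + np.diag(M)          # lambda_i' ~ lambda_i + v_i^T dA v_i
    V_new = np.empty_like(V)
    for i in range(len(lam)):
        # v_i' ~ v_i + sum_{j != i} (v_j^T dA v_i) / (lambda_i - lambda_j) v_j
        denom = lam[i] - lam
        denom[i] = np.inf               # exclude the j = i term
        v = V[:, i] + V @ (M[:, i] / denom)
        V_new[:, i] = v / np.linalg.norm(v)
    return lam_new, V_new
```

    For small perturbations the updated eigenvalues agree with a full re-decomposition up to second-order error, which is what makes this kind of online update attractive when the network stream changes only slightly between time points.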

    Emotion Recognition from Speech with Acoustic, Non-Linear and Wavelet-based Features Extracted in Different Acoustic Conditions

    ABSTRACT: In recent years, there has been great progress in automatic speech recognition. The challenge now is not only to recognize the semantic content of speech but also the so-called "paralinguistic" aspects, including the emotions and the personality of the speaker. This research work aims at developing a methodology for automatic emotion recognition from speech signals in non-controlled noise conditions. For that purpose, different sets of acoustic, non-linear, and wavelet-based features are used to characterize emotions in different databases created for this purpose.
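    As a rough illustration of what a wavelet-based speech feature can look like (the specific wavelet family and feature sets used in this work are not detailed here, so the Haar transform and per-level energies below are illustrative assumptions):

```python
import numpy as np

def haar_dwt(x):
    """One level of the Haar discrete wavelet transform:
    pairwise averages (approximation) and differences (detail)."""
    x = np.asarray(x, dtype=float)
    approx = (x[0::2] + x[1::2]) / np.sqrt(2)
    detail = (x[0::2] - x[1::2]) / np.sqrt(2)
    return approx, detail

def wavelet_energy_features(signal, levels=3):
    """Energy of the detail coefficients at each decomposition level,
    a common compact wavelet-based descriptor of a signal frame."""
    feats = []
    for _ in range(levels):
        signal, detail = haar_dwt(signal)
        feats.append(float(np.sum(detail ** 2)))
    return feats
```

    Because the Haar transform is orthogonal, the detail energies partition the frame's total energy across scales, which is one reason such features are robust descriptors under varying acoustic conditions.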

    Vertical Federated Learning

    Vertical Federated Learning (VFL) is a federated learning setting in which multiple parties holding different features about the same set of users jointly train machine learning models without exposing their raw data or model parameters. Motivated by the rapid growth in VFL research and real-world applications, we provide a comprehensive review of the concept and algorithms of VFL, as well as current advances and challenges in various aspects, including effectiveness, efficiency, and privacy. We provide an exhaustive categorization of VFL settings and privacy-preserving protocols, and comprehensively analyze the privacy attacks and defense strategies for each protocol. We then propose a unified framework, termed VFLow, which formulates the VFL problem under communication, computation, privacy, and effectiveness constraints. Finally, we review the most recent advances in industrial applications, highlighting open challenges and future directions for VFL.
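    A toy sketch of the basic VFL setting described above (a generic illustration with simplified, non-private gradient exchange, not the VFLow framework or any specific privacy-preserving protocol): two parties hold disjoint feature columns for the same users and jointly fit a logistic model, exchanging only partial scores and gradients with respect to the logits, never raw features.

```python
import numpy as np

class Party:
    """A VFL participant: holds a private feature block and local
    weights, and reveals only its partial linear score."""
    def __init__(self, X, lr=0.1):
        self.X = X
        self.w = np.zeros(X.shape[1])
        self.lr = lr

    def partial_score(self):
        return self.X @ self.w            # raw features never leave the party

    def update(self, grad_logits):
        # local gradient step using only the broadcast logit gradient
        self.w -= self.lr * self.X.T @ grad_logits / len(grad_logits)

def train_vfl(parties, y, steps=300):
    """Coordinator: sums partial scores into logits, computes the
    logistic-loss gradient wrt the logits, and broadcasts it back."""
    for _ in range(steps):
        logits = sum(p.partial_score() for p in parties)
        prob = 1.0 / (1.0 + np.exp(-logits))
        for p in parties:
            p.update(prob - y)
    return 1.0 / (1.0 + np.exp(-sum(p.partial_score() for p in parties)))
```

    In a real deployment the exchanged scores and gradients would themselves be protected (e.g. by encryption or secure aggregation), since, as the review discusses, they can leak private information.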

    Fourteenth Biennial Status Report: March 2017 - February 2019


    Robust and Adversarial Data Mining

    In the domain of data mining and machine learning, researchers have made significant contributions to developing algorithms for clustering and classification problems. We develop algorithms under assumptions that are not met by previous works. (i) Adversarial learning is the study of machine learning techniques deployed in non-benign environments. We design an algorithm to show how a classifier should be built to be robust against sparse adversarial attacks. Our main insight is that sparse feature attacks are best defended against by classifiers that use L1 regularizers. (ii) The differing properties of L1 (Lasso) and L2 (Tikhonov, or Ridge) regularization have been studied extensively. However, a principle for choosing the suitable regularizer for a given data set is yet to be developed. We use the mathematical properties of the two regularization methods, followed by detailed experimentation, to understand their impact based on four characteristics. (iii) The identification of anomalies is an inherent component of knowledge discovery. In many cases, the features of a data set can be traced to a much smaller set of latent features. We claim that algorithms applied in a latent space are more robust; this can lead to more accurate results and potentially provides a natural medium to explain and describe outliers. (iv) We also apply data mining techniques to the healthcare industry. In many cases, health insurance companies cover unnecessary costs incurred by healthcare providers. The potentially adversarial behaviours of surgeons are addressed. We describe a specific context of private healthcare in Australia and present our social-network-based approach (applied to health insurance claims) to understand the nature of collaboration among doctors treating hospital inpatients and to explore the impact of collaboration on cost and quality of care.
(v) We further develop models that predict the behaviours of orthopaedic surgeons with regard to surgery type and the use of prosthetic devices. An important feature of these models is that they not only predict the behaviours of surgeons but also provide explanations for the predictions.
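    The insight in point (i), that sparse attacks are best countered by L1 regularization, comes down to L1's tendency to zero out coefficients that L2 merely shrinks. A minimal numpy sketch of that contrast (generic Lasso-vs-Ridge behavior on synthetic data, not the thesis's classifier):

```python
import numpy as np

def ridge(X, y, alpha):
    """Closed-form L2 (Tikhonov/Ridge) solution:
    w = (X^T X + alpha I)^{-1} X^T y."""
    n_feat = X.shape[1]
    return np.linalg.solve(X.T @ X + alpha * np.eye(n_feat), X.T @ y)

def lasso(X, y, alpha, lr=0.01, steps=5000):
    """L1 (Lasso) via proximal gradient descent (ISTA): a gradient
    step on the squared loss, then soft-thresholding, which drives
    weakly useful coefficients to exactly zero."""
    w = np.zeros(X.shape[1])
    for _ in range(steps):
        w -= lr * X.T @ (X @ w - y) / len(y)
        w = np.sign(w) * np.maximum(np.abs(w) - lr * alpha, 0.0)
    return w
```

    On data generated from a sparse ground truth, the Lasso solution keeps only the truly informative features, while Ridge leaves every coefficient nonzero; a classifier built on the L1 solution therefore exposes fewer features for a sparse attacker to manipulate.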

    Privacy-preserving machine learning system at the edge

    Data privacy in machine learning has become an urgent problem, driven by machine learning's rapid development and its large attack surface. Pre-trained deep neural networks are increasingly deployed in smartphones and other edge devices for a variety of applications, leading to potential disclosures of private information. In collaborative learning, participants keep private data locally and communicate deep neural networks updated on their local data, yet the private information encoded in the networks' gradients can still be exploited by adversaries. This dissertation aims to perform dedicated investigations into privacy leakage from neural networks and to propose privacy-preserving machine learning systems for edge devices. First, a systematization of knowledge is conducted to identify the key challenges and the existing or adaptable solutions. Then a framework is proposed to measure, based on generalization error, the amount of sensitive information memorized in each layer's weights of a neural network. Results show that, when considered individually, the last layers encode a larger amount of information from the training data than the first layers. To protect such sensitive information in weights, DarkneTZ is proposed: a framework that uses an edge device's Trusted Execution Environment (TEE) in conjunction with model partitioning to limit the attack surface against neural networks. The performance of DarkneTZ is evaluated, including CPU execution time, memory usage, and precisely measured power consumption, using two small and six large image classification models. Because of the limited memory of the edge device's TEE, model layers are partitioned into the more sensitive layers (executed inside the device's TEE) and a set of layers executed in the untrusted part of the operating system.
Results show that hiding even a single layer can provide reliable model privacy and defend against state-of-the-art membership inference attacks, with only a 3% performance overhead. This thesis then extends the investigation from neural network weights (in on-device machine learning deployment) to gradients (in collaborative learning). An information-theoretical framework is proposed, by adapting usable information theory and treating the attack outcome as a probability measure, to quantify private information leakage from network gradients. The private original information and latent information are localized in a layer-wise manner. After that, this work performs sensitivity analysis over the gradients with respect to private information to further explore the underlying cause of information leakage. Numerical evaluations are conducted on six benchmark datasets and four well-known networks, further measuring the impact of training hyper-parameters and defense mechanisms. Last but not least, to limit the privacy leakage in gradients, I propose and implement a Privacy-preserving Federated Learning (PPFL) framework for mobile systems. TEEs are utilized on clients for local training, and on servers for secure aggregation, so that model/gradient updates are hidden from adversaries. This work leverages greedy layer-wise training to train each model layer inside the trusted area until it converges. The performance evaluation of the implementation shows that PPFL significantly improves privacy by defending against data reconstruction, property inference, and membership inference attacks, while incurring small communication overhead and client-side system overhead. This thesis offers a better understanding of the sources of private information in machine learning and provides frameworks that guarantee privacy while achieving model utility and system overhead comparable to regular machine learning frameworks.
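    A highly simplified sketch of the partitioning idea behind DarkneTZ, with the enclave modeled as an opaque callable (the real system runs the sensitive layers inside a hardware TEE; the layer shapes and the `tee_forward` stub here are assumptions for illustration):

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

class PartitionedModel:
    """Runs the early layers in the untrusted OS and delegates the
    last, most privacy-sensitive layers to an opaque 'enclave'
    callable whose weights the OS never sees."""
    def __init__(self, public_weights, tee_forward):
        self.public_weights = public_weights   # visible to the OS
        self.tee_forward = tee_forward         # stands in for the enclave call

    def predict(self, x):
        for W in self.public_weights:          # untrusted execution
            x = relu(x @ W)
        return self.tee_forward(x)             # sensitive layers stay sealed
```

    The partition point trades privacy against the TEE's limited memory: pushing more layers into the enclave hides more of the information-rich weights but increases the enclave's footprint.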

    Adversarial Attacks and Defenses in Machine Learning-Powered Networks: A Contemporary Survey

    Adversarial attacks and defenses in machine learning and deep neural networks have been gaining significant attention due to the rapidly growing applications of deep learning on the Internet and in related scenarios. This survey provides a comprehensive overview of recent advances in adversarial attack and defense techniques, with a focus on deep neural network-based classification models. Specifically, we comprehensively classify recent adversarial attack methods and state-of-the-art adversarial defense techniques based on attack principles, and present them in visually appealing tables and tree diagrams. This classification is based on a rigorous evaluation of the existing works, including an analysis of their strengths and limitations. We also categorize the methods into counter-attack detection and robustness enhancement, with a specific focus on regularization-based methods for enhancing robustness. New avenues of attack are also explored, including search-based, decision-based, drop-based, and physical-world attacks, and a hierarchical classification of the latest defense methods is provided, highlighting the challenges of balancing training costs with performance, maintaining clean accuracy, overcoming the effect of gradient masking, and ensuring method transferability. Finally, the lessons learned and open challenges are summarized, with future research opportunities recommended. (46 pages, 21 figures)
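    For a concrete flavor of the gradient-based attacks such surveys cover, here is the canonical Fast Gradient Sign Method (Goodfellow et al.) applied to a one-layer logistic "network"; the toy weights and inputs are assumptions for illustration:

```python
import numpy as np

def fgsm(x, y, w, b, eps):
    """Fast Gradient Sign Method: move the input x a step of size eps
    in the sign of the gradient of the cross-entropy loss wrt x,
    for a logistic model p = sigmoid(w . x + b)."""
    p = 1.0 / (1.0 + np.exp(-(x @ w + b)))
    grad_x = (p - y) * w                  # d(cross-entropy)/dx
    return x + eps * np.sign(grad_x)
```

    Even this tiny example shows the attack's appeal: a single gradient computation and a bounded per-feature perturbation are enough to push an input across the decision boundary, which is why defenses must reckon with gradient masking and transferability.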