Machine Learning Models that Remember Too Much
Machine learning (ML) is becoming a commodity. Numerous ML frameworks and
services are available to data holders who are not ML experts but want to train
predictive models on their data. It is important that ML models trained on
sensitive inputs (e.g., personal images or documents) not leak too much
information about the training data.
We consider a malicious ML provider who supplies model-training code to the
data holder, does not observe the training, but then obtains white- or
black-box access to the resulting model. In this setting, we design and
implement practical algorithms, some of them very similar to standard ML
techniques such as regularization and data augmentation, that "memorize"
information about the training dataset in the model, yet leave the model as
accurate and predictive as a conventionally trained one. We then explain how
the adversary can extract memorized information from the model.
We evaluate our techniques on standard ML tasks for image classification
(CIFAR10), face recognition (LFW and FaceScrub), and text analysis (20
Newsgroups and IMDB). In all cases, we show how our algorithms create models
that have high predictive power yet allow accurate extraction of subsets of
their training data.
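The idea can be illustrated with a toy sketch (this is an illustration of the general technique, not the paper's exact algorithm; all names and constants here are assumptions): malicious training code adds a regularization-like penalty that pushes the signs of an over-parameterized model's weights toward secret bits, while the model still fits its task. An attacker with white-box access later reads the bits back from the parameter signs.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy regression task with more weights than samples, so the model has
# spare capacity in which to hide information.
n, d = 10, 32
X = rng.normal(size=(n, d))
y = X @ rng.normal(size=d)          # synthetic targets

# "Secret" bits (+1/-1) the malicious training code wants to embed.
secret = rng.choice([-1.0, 1.0], size=d)

w = 0.1 * rng.normal(size=d)
lam, margin, lr = 0.1, 0.1, 0.01    # assumed hyperparameters
for _ in range(5000):
    grad = 2.0 * X.T @ (X @ w - y) / n        # ordinary task gradient
    # Malicious hinge-style penalty: nudge each weight's sign toward its
    # secret bit, but only while w_i * secret_i is below the margin.
    grad += lam * np.where(w * secret < margin, -secret, 0.0)
    w -= lr * grad

task_mse = float(np.mean((X @ w - y) ** 2))       # task accuracy preserved
recovered = np.sign(w)                            # attacker reads signs back
recovery_rate = float(np.mean(recovered == secret))
```

Because the system is underdetermined, the sign constraints can mostly be satisfied in the null space of the data, so the task loss stays low while most secret bits survive in the weights.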
Automatic Detection of Online Jihadist Hate Speech
We have developed a system that automatically detects online jihadist hate
speech with over 80% accuracy, using techniques from Natural Language
Processing and Machine Learning. The system is trained on a corpus of 45,000
subversive Twitter messages collected from October 2014 to December 2016. We
present a qualitative and quantitative analysis of the jihadist rhetoric in the
corpus, examine the network of Twitter users, outline the technical procedure
used to train the system, and discuss examples of use.
Comment: 31 pages
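A classifier of this kind can be sketched in miniature (the abstract does not specify the model, so this assumes a simple multinomial Naive Bayes over bag-of-words features with Laplace smoothing; the tiny training snippets below are harmless stand-ins for the 45,000-message corpus):

```python
import math
from collections import Counter, defaultdict

def train(docs):
    """docs: list of (text, label) pairs; returns a Naive Bayes model."""
    word_counts = defaultdict(Counter)   # per-label token counts
    label_counts = Counter()             # per-label document counts
    vocab = set()
    for text, label in docs:
        tokens = text.lower().split()
        label_counts[label] += 1
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return word_counts, label_counts, vocab

def predict(model, text):
    word_counts, label_counts, vocab = model
    total_docs = sum(label_counts.values())
    scores = {}
    for label, n_docs in label_counts.items():
        logp = math.log(n_docs / total_docs)          # class prior
        total_tokens = sum(word_counts[label].values())
        for tok in text.lower().split():
            # Laplace-smoothed token likelihood
            logp += math.log((word_counts[label][tok] + 1)
                             / (total_tokens + len(vocab)))
        scores[label] = logp
    return max(scores, key=scores.get)

# Placeholder corpus; the real system used annotated Twitter messages.
corpus = [
    ("spread the propaganda message now", "subversive"),
    ("join the cause fight today", "subversive"),
    ("lovely weather for a picnic", "benign"),
    ("my cat sleeps all day", "benign"),
]
model = train(corpus)
```

A call such as `predict(model, "propaganda message")` then scores each label by prior plus smoothed token likelihoods and returns the higher-scoring one.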