Share your Model instead of your Data: Privacy Preserving Mimic Learning for Ranking
Deep neural networks have become a primary tool for solving problems in many
fields. They are also used for addressing information retrieval problems and
show strong performance in several tasks. Training these models requires large,
representative datasets and for most IR tasks, such data contains sensitive
information from users. Privacy and confidentiality concerns prevent many data
owners from sharing the data, thus today the research community can only
benefit from research on large-scale datasets in a limited manner. In this
paper, we discuss privacy preserving mimic learning, i.e., using predictions
from a privacy preserving trained model instead of labels from the original
sensitive training data as a supervision signal. We present the results of
preliminary experiments in which we apply the idea of mimic learning and
privacy preserving mimic learning for the task of document re-ranking as one of
the core IR tasks. This research is a step toward laying the ground for
enabling researchers from data-rich environments to share knowledge learned
from actual users' data, which should facilitate research collaborations. Comment: SIGIR 2017 Workshop on Neural Information Retrieval
(Neu-IR'17), August 7--11, 2017, Shinjuku, Tokyo, Japan
Deep security analysis of program code: a systematic literature review
Due to the continuous digitalization of our society, distributed and web-based applications have become omnipresent, and making them more secure has gained paramount relevance. Deep learning (DL) and its representation learning approach are increasingly being proposed for program code analysis, potentially providing a powerful means of making software systems less vulnerable. This systematic literature review (SLR) aims at a thorough analysis and comparison of 32 primary studies on DL-based vulnerability analysis of program code. We found a rich variety of proposed analysis approaches, code embeddings and network topologies. We discuss these techniques and alternatives in detail. By compiling commonalities and differences in the approaches, we identify the current state of research in this area and discuss future directions. We also provide an overview of publicly available datasets in order to foster a stronger benchmarking of approaches. This SLR provides an overview and starting point for researchers interested in deep vulnerability analysis on program code.
Towards Adversarial Malware Detection: Lessons Learned from PDF-based Attacks
Malware still constitutes a major threat in the cybersecurity landscape, also
due to the widespread use of infection vectors such as documents. These
infection vectors hide embedded malicious code from the victim users,
facilitating the use of social engineering techniques to infect their machines.
Research showed that machine-learning algorithms provide effective detection
mechanisms against such threats, but the existence of an arms race in
adversarial settings has recently challenged such systems. In this work, we
focus on malware embedded in PDF files as a representative case of such an arms
race. We start by providing a comprehensive taxonomy of the different
approaches used to generate PDF malware, and of the corresponding
learning-based detection systems. We then categorize threats specifically
targeted against learning-based PDF malware detectors, using a well-established
framework in the field of adversarial machine learning. This framework allows
us to categorize known vulnerabilities of learning-based PDF malware detectors
and to identify novel attacks that may threaten such systems, along with the
potential defense mechanisms that can mitigate the impact of such threats. We
conclude the paper by discussing how such findings highlight promising research
directions towards tackling the more general challenge of designing robust
malware detectors in adversarial settings.
Logistic Knowledge Tracing: A Constrained Framework for Learner Modeling
Adaptive learning technology solutions often use a learner model to trace
learning and make pedagogical decisions. The present research introduces a
formalized methodology for specifying learner models, Logistic Knowledge
Tracing (LKT), that consolidates many extant learner modeling methods. The
strength of LKT is the specification of a symbolic notation system for
alternative logistic regression models that is powerful enough to specify many
extant models in the literature and many new models. To demonstrate the
generality of LKT, we fit 12 models, some variants of well-known models and
some newly devised, to 6 learning technology datasets. The results indicated
that no single learner model was best in all cases, further justifying a broad
approach that considers multiple learner model features and the learning
context. The models presented here avoid student-level fixed parameters to
increase generalizability. We also introduce features to stand in for these
intercepts. We argue that to be maximally applicable, a learner model needs to
adapt to student differences, rather than needing to be pre-parameterized with
the level of each student's ability.
The SkyMapper Transient Survey
The SkyMapper 1.3 m telescope at Siding Spring Observatory has now begun
regular operations. Alongside the Southern Sky Survey, a comprehensive digital
survey of the entire southern sky, SkyMapper will carry out a search for
supernovae and other transients. The search strategy, covering a total
footprint area of ~2000 deg² with a cadence of days, is optimised for
discovery and follow-up of low-redshift type Ia supernovae to constrain cosmic
expansion and peculiar velocities. We describe the search operations and
infrastructure, including a parallelised software pipeline to discover variable
objects in difference imaging; simulations of the performance of the survey
over its lifetime; public access to discovered transients; and some first
results from the Science Verification data. Comment: 13 pages, 11 figures; submitted to PAS