Automatically combining static malware detection techniques
Malware detection techniques come in many different flavors, and cover different effectiveness and efficiency trade-offs. This paper evaluates a number of machine learning techniques to combine multiple static Android malware detection techniques using automatically constructed decision trees. We identify the best methods to construct the trees. We demonstrate that those trees classify sample apps better and faster than individual techniques alone.
Learning and Interpreting Multi-Multi-Instance Learning Networks
We introduce an extension of the multi-instance learning problem where
examples are organized as nested bags of instances (e.g., a document could be
represented as a bag of sentences, which in turn are bags of words). This
framework can be useful in various scenarios, such as text and image
classification, but also supervised learning over graphs. As a further
advantage, multi-multi instance learning enables a particular way of
interpreting predictions and the decision function. Our approach is based on a
special neural network layer, called bag-layer, whose units aggregate bags of
inputs of arbitrary size. We prove theoretically that the associated class of
functions contains all Boolean functions over sets of sets of instances and we
provide empirical evidence that functions of this kind can be actually learned
on semi-synthetic datasets. We finally present experiments on text
classification, on citation graphs, and social graph data, which show that our
model obtains competitive results with respect to accuracy when compared to
other approaches such as convolutional networks on graphs, while at the same
time it supports a general approach to interpret the learnt model, as well as
explain individual predictions. Comment: JML
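The nested-bag idea in the abstract above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract only says that bag-layer units aggregate bags of inputs of arbitrary size, so the ReLU-affine transform, the element-wise max aggregation, and every name below (`relu_affine`, `bag_layer`, `multi_multi_instance`) are assumptions chosen for illustration.

```python
from typing import List, Sequence, Tuple

Vector = List[float]
Params = Tuple[Sequence[Sequence[float]], Sequence[float]]  # (weights, bias)


def relu_affine(x: Sequence[float], weights, bias) -> Vector:
    """Apply an affine map followed by ReLU to one instance."""
    return [
        max(0.0, sum(w * xi for w, xi in zip(row, x)) + b)
        for row, b in zip(weights, bias)
    ]


def bag_layer(bag: Sequence[Sequence[float]], weights, bias) -> Vector:
    """Transform every instance, then aggregate with an element-wise max.

    Max aggregation is an assumption; it makes the output independent of
    the order and the number of instances. The bag must be non-empty.
    """
    transformed = [relu_affine(x, weights, bias) for x in bag]
    return [max(column) for column in zip(*transformed)]


def multi_multi_instance(top_bag, inner: Params, outer: Params) -> Vector:
    """Stack two bag-layers: sub-bags first, then the bag of sub-bag vectors."""
    inner_vectors = [bag_layer(sub_bag, *inner) for sub_bag in top_bag]
    return bag_layer(inner_vectors, *outer)


# Hypothetical toy input: a "document" of two "sentences" of 2-d "words".
identity: Params = ([[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0])
document = [[[1.0, -2.0], [0.5, 3.0]], [[-1.0, 0.0]]]
representation = multi_multi_instance(document, identity, identity)
```

Because each layer aggregates with a symmetric function, reordering the sentences or the words inside a sentence leaves `representation` unchanged.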
On Cognitive Preferences and the Plausibility of Rule-based Models
It is conventional wisdom in machine learning and data mining that logical
models such as rule sets are more interpretable than other models, and that
among such rule-based models, simpler models are more interpretable than more
complex ones. In this position paper, we question this latter assumption by
focusing on one particular aspect of interpretability, namely the plausibility
of models. Roughly speaking, we equate the plausibility of a model with the
likeliness that a user accepts it as an explanation for a prediction. In
particular, we argue that, all other things being equal, longer explanations
may be more convincing than shorter ones, and that the predominant bias for
shorter models, which is typically necessary for learning powerful
discriminative models, may not be suitable when it comes to user acceptance of
the learned models. To that end, we first recapitulate evidence for and against
this postulate, and then report the results of an evaluation in a
crowd-sourcing study based on about 3.000 judgments. The results do not reveal
a strong preference for simple rules, whereas we can observe a weak preference
for longer rules in some domains. We then relate these results to well-known
cognitive biases such as the conjunction fallacy, the representative heuristic,
or the recognition heuristic, and investigate their relation to rule length and
plausibility. Comment: V4: Another rewrite of section on interpretability to clarify focus
on plausibility and relation to interpretability, comprehensibility, and
justifiabilit
Metaphysics and Law
The dichotomy between questions of fact and questions of law serves as a starting point for the following discussion of the nature of legal reasoning. In the course of the dialogue the author notes similarities and dissimilarities between legal reasoning and philosophical and mathematical reasoning. In the end we are left with a clearer insight into the distinctive features of the adjudicative process.
A Customer Segmentation Mining System on the Web Platform
In this paper we introduce a knowledge discovery system developed on the World Wide Web platform. Its algorithm is based on the Fuzzy Inductive Learning Method (FILM), which can segment consumers' behavior from a set of noisy customer data. The system presents the acquired knowledge visually as a set of IF-THEN rules that can be run on top of an expert system. Moreover, the system provides advice in response to a user's request through the network and a friendly user interface. Finally, we evaluate the function of the system by training it with a transaction database provided by a local automobile dealer.
Data mining using Matlab
Data mining is a relatively new field emerging in many disciplines. It is becoming more
popular as technology advances and the need for efficient data analysis grows.
The aim of data mining is not to provide strict rules by analysing the full data
set; rather, data mining is used to predict with some certainty while analysing only a small
portion of the data. This project seeks to compare the efficiency of a decision tree
induction method with that of the neural network method.
MATLAB has inbuilt data mining toolboxes. However, the decision tree induction
method is not yet implemented. Decision tree induction has been implemented in
several forms in the past. The greatest contribution to this method has been made by
Dr John Ross Quinlan, who has brought forward this method in the form of the ID3, C4.5
and C5 algorithms. The methodologies used within ID3 and C4.5 are well documented
and therefore provide a strong platform for the implementation of this method within
a higher level language.
The objectives of this study are to fully comprehend two methods of data mining,
namely decision tree induction and neural networks. The decision tree induction
method is to be implemented within the mathematical computer language MATLAB.
The results found when analysing some suitable data will be compared with the results
from the neural network toolbox already implemented in MATLAB.
The data used to compare and contrast the two methods included voting records from
the US House of Representatives, which consist of yes, no and undecided votes on sixteen
separate issues. The voters are grouped into categories according to their political
party, which can be either Republican or Democratic. The objective of using this data
set is to predict what party a congressman is affiliated with by analysing their voting
trends.
The findings of this study reveal that the decision tree method can accurately predict
outcomes if an ideal data set is used for building the tree. The neural network method
is less accurate in some situations; however, it is more robust to unexpected data.
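The attribute-selection step at the heart of ID3, mentioned in the abstract above, can be sketched as follows. This is a hedged illustration in Python rather than the MATLAB implementation the abstract describes, which is not shown in the source. The toy records, the attribute names `issue_a` and `issue_b`, and the function names are hypothetical; only the use of entropy and information gain to choose a split follows ID3 as it is commonly documented.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(rows, labels, attribute):
    """Entropy reduction obtained by splitting the rows on one attribute."""
    total = len(labels)
    partitions = {}
    for row, label in zip(rows, labels):
        partitions.setdefault(row[attribute], []).append(label)
    remainder = sum(len(p) / total * entropy(p) for p in partitions.values())
    return entropy(labels) - remainder


def best_attribute(rows, labels, attributes):
    """ID3 splits on the attribute with the highest information gain."""
    return max(attributes, key=lambda a: information_gain(rows, labels, a))


# Made-up voting records for illustration only (not the real data set).
rows = [
    {"issue_a": "yes", "issue_b": "yes"},
    {"issue_a": "yes", "issue_b": "no"},
    {"issue_a": "no", "issue_b": "yes"},
    {"issue_a": "no", "issue_b": "no"},
]
parties = ["republican", "republican", "democrat", "democrat"]
root_split = best_attribute(rows, parties, ["issue_a", "issue_b"])
```

In this toy data `issue_a` separates the two parties perfectly while `issue_b` carries no information, so a tree builder would split on `issue_a` first and then recurse on each partition.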
Automated Certification of Authorisation Policy Resistance
Attribute-based Access Control (ABAC) extends traditional Access Control by
considering an access request as a set of attribute name-value pairs, making it
particularly useful in the context of open and distributed systems, where
security relevant information can be collected from different sources. However,
ABAC enables attribute hiding attacks, allowing an attacker to gain some access
by withholding information. In this paper, we first introduce the notion of
policy resistance to attribute hiding attacks. We then propose the tool ATRAP
(Automatic Term Rewriting for Authorisation Policies), based on the recent
formal ABAC language PTaCL, which first automatically searches for resistance
counter-examples using Maude, and then automatically searches for an Isabelle
proof of resistance. We illustrate our approach with two simple examples of
policies and propose an evaluation of ATRAP's performance. Comment: 20 pages, 4 figures, version including proofs of the paper that will
be presented at ESORICS 201
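An attribute hiding attack of the kind described in the abstract above can be illustrated with a small brute-force check. This is only a sketch under stated assumptions: it does not use PTaCL, Maude, Isabelle, or ATRAP, and the two policies, the attribute names, and the informal reading of "resistance" (withholding attributes must never turn a non-permit decision into a permit) are all hypothetical.

```python
from itertools import combinations


def naive_policy(request):
    """Hypothetical policy that relies on a negative attribute being present."""
    if ("status", "banned") in request:
        return "deny"
    if ("role", "member") in request:
        return "permit"
    return "deny"


def positive_policy(request):
    """Hypothetical policy that permits only when positive attributes are present."""
    if ("role", "member") in request and ("status", "active") in request:
        return "permit"
    return "deny"


def hiding_counterexamples(policy, request):
    """Return the strict sub-requests that gain a permit the full request lacks."""
    full_decision = policy(request)
    pairs = sorted(request)
    found = []
    for size in range(len(pairs)):  # strict subsets only
        for subset in combinations(pairs, size):
            reduced = frozenset(subset)
            if policy(reduced) == "permit" and full_decision != "permit":
                found.append(reduced)
    return found


# A request is modelled as a set of attribute name-value pairs.
request = frozenset({("role", "member"), ("status", "banned")})
attacks = hiding_counterexamples(naive_policy, request)
```

Here the attacker gains access under `naive_policy` by withholding the `("status", "banned")` pair, whereas `positive_policy` yields no counterexample for the same request. Exhaustive enumeration is exponential in the number of attributes, so it only serves to make the notion concrete.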