155 research outputs found
A Motif-based Approach for Identifying Controversy
Among the topics discussed in Social Media, some lead to controversy. A
number of recent studies have focused on the problem of identifying controversy
in social media mostly based on the analysis of textual content or rely on
global network structure. Such approaches have strong limitations due to the
difficulty of understanding natural language, and of investigating the global
network structure. In this work we show that it is possible to detect
controversy in social media by exploiting network motifs, i.e., local patterns
of user interaction. The proposed approach allows for a language-independent
and fine- grained and efficient-to-compute analysis of user discussions and
their evolution over time. The supervised model exploiting motif patterns can
achieve 85% accuracy, with an improvement of 7% compared to baseline
structural, propagation-based and temporal network features
Learning Relatedness Measures for Entity Linking
Entity Linking is the task of detecting, in text documents, relevant mentions to entities of a given knowledge base. To this end, entity-linking algorithms use several signals and features extracted from the input text or from the knowl- edge base. The most important of such features is entity relatedness. Indeed, we argue that these algorithms benefit from maximizing the relatedness among the relevant enti- ties selected for annotation, since this minimizes errors in disambiguating entity-linking.
The definition of an e↵ective relatedness function is thus a crucial point in any entity-linking algorithm. In this paper we address the problem of learning high-quality entity relatedness functions. First, we formalize the problem of learning entity relatedness as a learning-to-rank problem. We propose a methodology to create reference datasets on the basis of manually annotated data. Finally, we show that our machine-learned entity relatedness function performs better than other relatedness functions previously proposed, and, more importantly, improves the overall performance of dif- ferent state-of-the-art entity-linking algorithms
On Suggesting Entities as Web Search Queries
The Web of Data is growing in popularity and dimension,
and named entity exploitation is gaining importance in many research
fields. In this paper, we explore the use of entities that can be extracted
from a query log to enhance query recommendation. In particular, we
extend a state-of-the-art recommendation algorithm to take into account
the semantic information associated with submitted queries. Our novel
method generates highly related and diversified suggestions that we as-
sess by means of a new evaluation technique. The manually annotated
dataset used for performance comparisons has been made available to
the research community to favor the repeatability of experiments
Discovering Europeana users’ search behavior
Europeana is a strategic project funded by the European Commission with the goal of making Europe's cultural and scientific heritage accessible to the public. ASSETS is a two-year Best Practice Network co-funded by the CIP PSP Programme to improve performance, accessibility and usability of the Europeana search engine. Here we present a characterization of the Europeana logs by showing statistics on common behavioural patterns of the Europeana users
EiFFFeL: Enforcing Fairness in Forests by Flipping Leaves
Nowadays Machine Learning (ML) techniques are extensively adopted in many socially sensitive systems, thus requiring to carefully study the fairness of the decisions taken by such systems. Many approaches have been proposed to address and to make sure there is no bias against individuals or specific groups which might originally come from biased training datasets or algorithm design. In this regard, we propose a fairness enforcing approach called EiFFFeL --Enforcing Fairness in Forests by Flipping Leaves-- which exploits tree-based or leaf-based post-processing strategies to relabel leaves of selected decision trees of a given forest. Experimental results show that our approach achieves a user-defined group fairness degree without losing a significant amount of accuracy
Feature Partitioning for Robust Tree Ensembles and their Certification in Adversarial Scenarios
Machine learning algorithms, however effective, are known to be vulnerable in
adversarial scenarios where a malicious user may inject manipulated instances.
In this work we focus on evasion attacks, where a model is trained in a safe
environment and exposed to attacks at test time. The attacker aims at finding a
minimal perturbation of a test instance that changes the model outcome.
We propose a model-agnostic strategy that builds a robust ensemble by
training its basic models on feature-based partitions of the given dataset. Our
algorithm guarantees that the majority of the models in the ensemble cannot be
affected by the attacker. We experimented the proposed strategy on decision
tree ensembles, and we also propose an approximate certification method for
tree ensembles that efficiently assess the minimal accuracy of a forest on a
given dataset avoiding the costly computation of evasion attacks.
Experimental evaluation on publicly available datasets shows that proposed
strategy outperforms state-of-the-art adversarial learning algorithms against
evasion attacks
Feature partitioning for robust tree ensembles and their certification in adversarial scenarios
Machine learning algorithms, however effective, are known to be vulnerable in adversarial scenarios where a malicious user may inject manipulated instances. In this work, we focus on evasion attacks, where a model is trained in a safe environment and exposed to attacks at inference time. The attacker aims at finding a perturbation of an instance that changes the model outcome.We propose a model-agnostic strategy that builds a robust ensemble by training its basic models on feature-based partitions of the given dataset. Our algorithm guarantees that the majority of the models in the ensemble cannot be affected by the attacker. We apply the proposed strategy to decision tree ensembles, and we also propose an approximate certification method for tree ensembles that efficiently provides a lower bound of the accuracy of a forest in the presence of attacks on a given dataset avoiding the costly computation of evasion attacks.Experimental evaluation on publicly available datasets shows that the proposed feature partitioning strategy provides a significant accuracy improvement with respect to competitor algorithms and that the proposed certification method allows ones to accurately estimate the effectiveness of a classifier where the brute-force approach would be unfeasible
Verifiable Boosted Tree Ensembles
Verifiable learning advocates for training machine learning models amenable
to efficient security verification. Prior research demonstrated that specific
classes of decision tree ensembles -- called large-spread ensembles -- allow
for robustness verification in polynomial time against any norm-based attacker.
This study expands prior work on verifiable learning from basic ensemble
methods (i.e., hard majority voting) to advanced boosted tree ensembles, such
as those trained using XGBoost or LightGBM. Our formal results indicate that
robustness verification is achievable in polynomial time when considering
attackers based on the -norm, but remains NP-hard for other
norm-based attackers. Nevertheless, we present a pseudo-polynomial time
algorithm to verify robustness against attackers based on the -norm for
any , which in practice grants excellent
performance. Our experimental evaluation shows that large-spread boosted
ensembles are accurate enough for practical adoption, while being amenable to
efficient security verification.Comment: 15 pages, 3 figure
Extending the state-of-the-art of constraint-based pattern discovery, In:
Abstract The constraint-based pattern discovery paradigm was introduced with the aim of providing to the user a tool to drive the discovery process towards potentially interesting patterns, with the positive side effect of achieving a more efficient computation. In this paper we review and extend the state-of-the-art of the constraints that can be pushed in a frequent pattern computation. We introduce novel data reduction techniques which are able to exploit convertible anti-monotone constraints (e.g., constraints on average or median) as well as tougher constraints (e.g., constraints on variance or standard deviation). A thorough experimental study is performed and it confirms that our framework outperforms previous algorithms for convertible constraints, and exploit the tougher ones with the same effectiveness. Finally, we highlight that the main advantage of our approach, i.e., pushing constraints by means of data reduction in a level-wise framework, is that different properties of different constraints can be exploited all together, and the total benefit is always greater than the sum of the individual benefits. This consideration leads to the definition of a general Apriori-like algorithm which is able to exploit all possible kinds of constraints studied so far
- …