Classes of Terminating Logic Programs
Termination of logic programs depends critically on the selection rule, i.e.
the rule that determines which atom is selected in each resolution step. In
this article, we classify programs (and queries) according to the selection
rules for which they terminate. This is a survey and unified view on different
approaches in the literature. For each class, we present a sufficient, for most
classes even necessary, criterion for determining that a program is in that
class. We study six classes: a program strongly terminates if it terminates for
all selection rules; a program input terminates if it terminates for selection
rules which only select atoms that are sufficiently instantiated in their input
positions, so that these arguments do not get instantiated any further by the
unification; a program local delay terminates if it terminates for local
selection rules which only select atoms that are bounded w.r.t. an appropriate
level mapping; a program left-terminates if it terminates for the usual
left-to-right selection rule; a program exists-terminates if there exists a
selection rule for which it terminates; finally, a program has bounded
nondeterminism if it only has finitely many refutations. We propose a
semantics-preserving transformation from programs with bounded nondeterminism
into strongly terminating programs. Moreover, by unifying different formalisms
and making appropriate assumptions, we are able to establish a formal hierarchy
between the different classes.
Comment: 50 pages. The following mistake was corrected: In figure 5, the first
clause for insert was insert([],X,[X]
DEMON: a Local-First Discovery Method for Overlapping Communities
Community discovery in complex networks is an interesting problem with a
number of applications, especially in the knowledge extraction task in social
and information networks. However, many large networks often lack a particular
community organization at a global level. In these cases, traditional graph
partitioning algorithms fail to let the latent knowledge embedded in modular
structure emerge, because they impose a top-down global view of a network. We
propose here a simple local-first approach to community discovery, able to
unveil the modular organization of real complex networks. This is achieved by
democratically letting each node vote for the communities it sees surrounding
it in its limited view of the global system, i.e. its ego neighborhood, using a
label propagation algorithm; finally, the local communities are merged into a
global collection. We tested this intuition against the state-of-the-art
overlapping and non-overlapping community discovery methods, and found that our
new method clearly outperforms the others in the quality of the obtained
communities, evaluated by using the extracted communities to predict the
metadata about the nodes of several real world networks. We also show how our
method is deterministic, fully incremental, and has a limited time complexity,
so that it can be used on web-scale real networks.
Comment: 9 pages; Proceedings of the 18th ACM SIGKDD International Conference
on Knowledge Discovery and Data Mining, Beijing, China, August 12-16, 201
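The local-first procedure described in this abstract (ego neighborhood, label propagation, merge) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function names, the deterministic tie-breaking in the label propagation, the `threshold` merge rule (merge when enough of the smaller community is contained in the other), and the `min_size` filter are all assumptions made for the example.

```python
from collections import Counter

def label_propagation(adj, max_rounds=100):
    """Deterministic label propagation on {node: set(neighbors)} (assumed variant)."""
    labels = {n: n for n in adj}
    for _ in range(max_rounds):
        changed = False
        for n in sorted(adj):
            if not adj[n]:
                continue  # isolated node keeps its own label
            counts = Counter(labels[m] for m in adj[n])
            best = max(counts.values())
            candidates = sorted(l for l, c in counts.items() if c == best)
            # Keep the current label on ties, otherwise take the smallest candidate.
            new = labels[n] if labels[n] in candidates else candidates[0]
            if new != labels[n]:
                labels[n] = new
                changed = True
        if not changed:
            break
    groups = {}
    for n, l in labels.items():
        groups.setdefault(l, set()).add(n)
    return list(groups.values())

def ego_minus_ego(graph, ego):
    """Subgraph induced by the neighbors of `ego`, with `ego` itself removed."""
    nbrs = graph[ego]
    return {n: graph[n] & nbrs for n in nbrs}

def merge(communities, new, threshold):
    """Merge `new` into the first sufficiently overlapping community, else append it."""
    for i, c in enumerate(communities):
        smaller = min(len(c), len(new))
        if len(c & new) / smaller >= threshold:
            communities[i] = c | new
            return
    communities.append(set(new))

def demon_sketch(graph, threshold=0.75, min_size=3):
    """Each node 'votes' through the communities found in its ego network."""
    communities = []
    for ego in sorted(graph):
        for group in label_propagation(ego_minus_ego(graph, ego)):
            candidate = group | {ego}  # the ego belongs to every local community
            if len(candidate) >= min_size:
                merge(communities, candidate, threshold)
    return communities

# Toy graph (hypothetical): two triangles sharing node 2.
graph = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3, 4}, 3: {2, 4}, 4: {2, 3}}
found = sorted(sorted(c) for c in demon_sketch(graph))
print(found)
```

On this toy graph node 2 ends up in both communities, which is the overlapping behavior the abstract refers to.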
Data Mining for Discrimination Discovery
In the context of civil rights law, discrimination refers to unfair or unequal treatment of people based on membership in a category or a minority, without regard to individual merit. Discrimination in credit, mortgage, insurance, labor market, and education has been investigated by researchers in economics and human sciences. With the advent of automatic decision support systems, such as credit scoring systems, the ease of data collection opens several challenges to data analysts for the fight against discrimination. In this paper, we introduce the problem of discovering discrimination through data mining in a dataset of historical decision records, taken by humans or by automatic systems. We formalize the processes of direct and indirect discrimination discovery by modelling protected-by-law groups and contexts where discrimination occurs in a classification-rule-based syntax. Classification rules extracted from the dataset allow for unveiling contexts of unlawful discrimination, where the degree of burden over protected-by-law groups is formalized by an extension of the lift measure of a classification rule. In direct discrimination, the extracted rules can be directly mined in search of discriminatory contexts. In indirect discrimination, the mining process needs some background knowledge as a further input, e.g., census data, that combined with the extracted rules might allow for unveiling contexts of discriminatory decisions. A strategy adopted for combining extracted classification rules with background knowledge is called an inference model. In this paper, we propose two inference models and provide automatic procedures for their implementation. An empirical assessment of our results is provided on the German credit dataset and on the PKDD Discovery Challenge 1999 financial dataset.
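To make the "extension of the lift measure" concrete, the sketch below assumes it compares the confidence of a rule that includes the protected-group item with the confidence of the same rule without it, i.e. conf(A, B -> C) / conf(B -> C). That reading, the function names, the attribute names, and the toy records are assumptions for illustration only and are not taken from the datasets mentioned in the abstract.

```python
def confidence(records, premise, outcome):
    """Fraction of records matching `premise` that also match `outcome`."""
    covered = [r for r in records
               if all(r.get(k) == v for k, v in premise.items())]
    if not covered:
        raise ValueError("premise covers no records")
    hits = [r for r in covered
            if all(r.get(k) == v for k, v in outcome.items())]
    return len(hits) / len(covered)

def extended_lift(records, protected, context, outcome):
    """Assumed definition: conf(protected AND context -> outcome) / conf(context -> outcome)."""
    base = confidence(records, context, outcome)
    if base == 0:
        raise ValueError("outcome never occurs in this context")
    return confidence(records, {**context, **protected}, outcome) / base

# Hypothetical decision records: 8 loan applications with purpose = car.
records = (
    [{"gender": "F", "purpose": "car", "decision": "deny"}] * 3
    + [{"gender": "F", "purpose": "car", "decision": "grant"}] * 1
    + [{"gender": "M", "purpose": "car", "decision": "deny"}] * 1
    + [{"gender": "M", "purpose": "car", "decision": "grant"}] * 3
)

elift = extended_lift(
    records,
    protected={"gender": "F"},
    context={"purpose": "car"},
    outcome={"decision": "deny"},
)
print(elift)  # 0.75 / 0.5 = 1.5
```

A value above 1 says that, within the context, the protected group is denied more often than applicants in that context overall; which threshold counts as discriminatory is a separate, legally informed choice.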
RT-MongoDB: A NoSQL Database with Differentiated Performance
The advent of Cloud Computing and Big Data brought several changes and innovations in the landscape of database management systems. Nowadays, a cloud-friendly storage system is required to reliably support data that is in continuous motion and of previously unthinkable magnitude, while guaranteeing high availability and optimal performance to thousands of clients. In particular, NoSQL database services are gaining momentum as a key technology thanks to their relaxed requirements with respect to their relational counterparts, which are not designed to scale massively on distributed systems. Most research papers on performance of cloud storage systems propose solutions that aim to achieve the highest possible throughput, while neglecting the problem of controlling the response latency for specific users or queries. The latter research topic is particularly important for distributed real-time applications, where task completion is bounded by precise timing constraints. In this paper, the popular MongoDB NoSQL database software is modified by introducing a per-client/request prioritization mechanism within the request processing engine, allowing better control of the temporal interference among competing requests with different priorities. Extensive experimentation with synthetic stress workloads demonstrates that the proposed solution is able to assure differentiated per-client/request performance in a shared MongoDB instance. Namely, requests with higher priorities achieve reduced and significantly more stable response times than lower-priority ones. This constitutes a basic but fundamental brick in providing assured performance to distributed real-time applications making use of NoSQL database services.
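The abstract does not say how the prioritization is implemented inside MongoDB, so the following is only a generic illustration of per-request prioritization using a priority queue. It is not MongoDB code and not the paper's mechanism; the class name `PriorityDispatcher`, the convention that a lower number means higher priority, and the client names are hypothetical.

```python
import heapq
import itertools

class PriorityDispatcher:
    """Serves pending requests by priority, FIFO among equal priorities."""

    def __init__(self):
        self._heap = []
        self._seq = itertools.count()  # tie-breaker that preserves arrival order

    def submit(self, priority, client, request):
        # Assumed convention: lower number = higher priority.
        heapq.heappush(self._heap, (priority, next(self._seq), client, request))

    def next_request(self):
        """Return the (client, request) to serve next, or None if idle."""
        if not self._heap:
            return None
        _, _, client, request = heapq.heappop(self._heap)
        return client, request

dispatcher = PriorityDispatcher()
dispatcher.submit(2, "batch-client", "q1")
dispatcher.submit(0, "realtime-client", "q2")
dispatcher.submit(2, "batch-client", "q3")
dispatcher.submit(0, "realtime-client", "q4")

served = []
while (item := dispatcher.next_request()) is not None:
    served.append(item)
print(served)
```

The high-priority requests are served first even though they arrived later, which is the kind of reduced interference from lower-priority traffic that the abstract describes.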
PlayeRank: data-driven performance evaluation and player ranking in soccer via a machine learning approach
The problem of evaluating the performance of soccer players is attracting the
interest of many companies and the scientific community, thanks to the
availability of massive data capturing all the events generated during a match
(e.g., tackles, passes, shots, etc.). Unfortunately, there is no consolidated
and widely accepted metric for measuring performance quality in all of its
facets. In this paper, we design and implement PlayeRank, a data-driven
framework that offers a principled multi-dimensional and role-aware evaluation
of the performance of soccer players. We build our framework by deploying a
massive dataset of soccer-logs consisting of millions of match events
pertaining to four seasons of 18 prominent soccer competitions. By comparing
PlayeRank to known algorithms for performance evaluation in soccer, and by
exploiting a dataset of players' evaluations made by professional soccer
scouts, we show that PlayeRank significantly outperforms the competitors. We
also explore the ratings produced by PlayeRank and discover interesting
patterns about the nature of excellent performances and what distinguishes the
top players from the others. Finally, we explore some applications of
PlayeRank, i.e. searching players and player versatility, showing its
flexibility and efficiency, which make it worth using in the design of a
scalable platform for soccer analytics.
Local Rule-Based Explanations of Black Box Decision Systems
Recent years have witnessed the rise of accurate but obscure decision
systems which hide the logic of their internal decision processes from users.
The lack of explanations for the decisions of black box systems is a key
ethical issue, and a limitation to the adoption of machine learning components
in socially sensitive and safety-critical contexts. Therefore, we need
explanations that reveal the reasons why a predictor takes a certain decision.
In this paper we focus on the problem of black box outcome explanation, i.e.,
explaining the reasons of the decision taken on a specific instance. We propose
LORE, an agnostic method able to provide interpretable and faithful
explanations. LORE first learns a local interpretable predictor on a synthetic
neighborhood generated by a genetic algorithm. Then it derives from the logic
of the local interpretable predictor a meaningful explanation consisting of: a
decision rule, which explains the reasons of the decision; and a set of
counterfactual rules, suggesting the changes in the instance's features that
lead to a different outcome. Extensive experiments show that LORE outperforms
existing methods and baselines both in the quality of explanations and in the
accuracy in mimicking the black box.
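The last step of the pipeline described above, deriving a decision rule and counterfactual rules from the local interpretable predictor, can be sketched as follows. The sketch skips the genetic neighborhood generation and the learning step: a small hand-written decision tree stands in for the local predictor. The tree encoding, the feature names, and the choice of counterfactuals as the differently labeled paths with the fewest conditions violated by the instance are assumptions for illustration.

```python
def paths(node, conditions=()):
    """Yield (conditions, label) for every root-to-leaf path of the tree."""
    if "label" in node:
        yield list(conditions), node["label"]
        return
    f, t = node["feature"], node["threshold"]
    yield from paths(node["left"], conditions + ((f, "<=", t),))
    yield from paths(node["right"], conditions + ((f, ">", t),))

def holds(condition, x):
    """Check one split condition against instance x."""
    f, op, t = condition
    return x[f] <= t if op == "<=" else x[f] > t

def explain(tree, x):
    """Return (decision_rule, counterfactual_rules) for instance x."""
    all_paths = list(paths(tree))
    # Decision rule: the unique path whose conditions all hold for x.
    rule = next((c, l) for c, l in all_paths if all(holds(k, x) for k in c))
    others = [(c, l) for c, l in all_paths if l != rule[1]]
    if not others:
        return rule, []

    def violated(conds):
        return [k for k in conds if not holds(k, x)]

    # Counterfactuals: differently labeled paths needing the fewest changes.
    fewest = min(len(violated(c)) for c, _ in others)
    counterfactuals = [(violated(c), l) for c, l in others
                       if len(violated(c)) == fewest]
    return rule, counterfactuals

# Hypothetical local predictor: deny if income <= 900 or age <= 25.
tree = {
    "feature": "income", "threshold": 900,
    "left": {"label": "deny"},
    "right": {
        "feature": "age", "threshold": 25,
        "left": {"label": "deny"},
        "right": {"label": "grant"},
    },
}

rule, counterfactuals = explain(tree, {"age": 30, "income": 800})
print(rule)
print(counterfactuals)
```

For this instance the decision rule reads "income <= 900, therefore deny", and the single counterfactual says that raising income above 900 would change the outcome to grant.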