Normative thinking on wastewater treatment plants
This document is the report of the thesis "Normative thinking on wastewater treatment plants". This
thesis was born from the author's interest in Artificial Intelligence (A.I.). Having taken all the A.I.-related
subjects that the Barcelona School of Informatics (FIB) offers, I asked the teachers of my favorite ones
for a thesis related to A.I. Ulises Cortés and Juan Carlos Nieves offered me this interesting thesis, based
on a doctoral thesis in environmental sciences by Montse Aulinas [23]. The proposed work involved
theoretical research, a working implementation and a real-life domain to work with. I accepted without
hesitation.
Aulinas's thesis proposed a multi-agent system to manage the problems caused by industrial
wastewater discharges into rivers. She argued that using intelligent agents to manage wastewater
could significantly improve both the quality of river water and organizational efficiency. To that end
she proposed a group of agents that take the roles of the most important entities in the wastewater
discharge process, from the industries to the agencies in charge of controlling them, so that all the
parties involved are represented. For the agents to act rationally, they must be able to reason about the
laws they are subject to. That is the main issue this thesis deals with.
Based on a real-world domain, this thesis proposes a way to make those laws comprehensible to
agents. It discusses a methodology for analyzing, specifying, implementing and testing those laws in a
generic way that can be applied to any normative environment.
The goals of this thesis are:
To obtain a generic and complete specification syntax for analyzing laws and norms, and to validate that
specification by implementing real laws from the given domain; and
To develop a prototype in which the norm implementation can be tested on a plausible real-world
scenario.
Link prediction in very large directed graphs: Exploiting hierarchical properties in parallel
Link prediction is a link mining task that tries to find new edges within a given graph. Among the targets of link prediction are large directed graphs, which are frequent structures nowadays. The typical sparsity of large graphs demands high-precision predictions in order to obtain usable results. However, the size of those graphs only permits the execution of scalable algorithms. As a trade-off between those two problems, we recently proposed a link prediction algorithm for directed graphs that exploits hierarchical properties. The algorithm can be classified as a local score, which entails scalability. Unlike other local scores, our proposal assumes the existence of an underlying model for the data, which allows it to produce predictions with higher precision. We test the validity of its hierarchical assumptions on two clearly hierarchical data sets, one of them based on RDF. Then we test it on a non-hierarchical data set based on Wikipedia to demonstrate its broad applicability. Given the computational complexity of link prediction in very large graphs, we also introduce some general recommendations for making link prediction an efficiently parallelizable problem.
Peer Reviewed. Postprint (published version).
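The recommendations themselves are not spelled out in this abstract, but the reason local scores parallelize well can be sketched: a local score for a source node x only reads the neighbourhoods of x and of its successors, so source nodes can be split across workers with no communication, and the partial results merged at the end. The sketch below assumes a hypothetical toy graph stored as adjacency sets and uses a simple count of length-2 paths x → z → y as a stand-in score; it is not the hierarchical algorithm proposed in the paper. Threads keep the example portable; CPU-bound Python code would need a process pool for real speedups.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical toy directed graph: node -> set of out-neighbours.
GRAPH: dict[str, set[str]] = {
    "a": {"b", "c"},
    "b": {"d"},
    "c": {"d", "e"},
    "d": set(),
    "e": {"d"},
}


def scores_for_source(x: str) -> dict[tuple[str, str], int]:
    """Stand-in local score: count length-2 paths x -> z -> y for every y
    that x does not already point to. Only nearby neighbourhoods are read."""
    scores: dict[tuple[str, str], int] = {}
    for z in GRAPH[x]:
        for y in GRAPH[z]:
            if y != x and y not in GRAPH[x]:
                scores[(x, y)] = scores.get((x, y), 0) + 1
    return scores


def parallel_scores(workers: int = 4) -> dict[tuple[str, str], int]:
    merged: dict[tuple[str, str], int] = {}
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Each source is scored independently, so keys never collide across workers.
        for partial in pool.map(scores_for_source, sorted(GRAPH)):
            merged.update(partial)
    return merged


print(parallel_scores())
```

Because no worker writes to another worker's keys, the same pattern carries over to distributed settings where each machine holds the neighbourhoods of its own partition of source nodes.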
Link prediction in large directed graphs
The first chapter introduces an approach to machine learning (ML) where data is understood as a network of connected entities. This strategy seeks inter-entity information for knowledge discovery, in contrast with traditional intra-entity approaches based on instances and their features. We discuss the importance of this connectivist ML (which we refer to as graph mining) in the current context, where large, topology-based data sets have become available. The chapter ends by introducing the Link Prediction (LP) problem, together with its current computational and performance limitations.
The second chapter discusses early contributions to graph mining and introduces problems frequently tackled through this paradigm. The chapter then focuses on the state of the art of LP. It presents three different approaches to the problem of finding links in a relational set, and argues for the importance of the most computationally scalable one: similarity-based algorithms. It categorizes similarity-based algorithms into three types of LP scores. For the most scalable type, local similarity-based algorithms, the chapter identifies and formally describes the most competitive proposals according to the literature.
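As an illustration of what a local similarity-based score looks like, three widely used ones can be computed from adjacency sets alone: Common Neighbours, CN(x, y) = |Γ(x) ∩ Γ(y)|; Adamic-Adar, AA(x, y) = Σ 1/log|Γ(z)|; and Resource Allocation, RA(x, y) = Σ 1/|Γ(z)|, where both sums run over z ∈ Γ(x) ∩ Γ(y). Which proposals the thesis actually selects as most competitive is not stated in this summary, and the toy graph is hypothetical.

```python
import math

# Hypothetical toy undirected graph stored as adjacency sets.
ADJ: dict[str, set[str]] = {
    "a": {"b", "c", "d"},
    "b": {"a", "c"},
    "c": {"a", "b", "e"},
    "d": {"a", "e"},
    "e": {"c", "d"},
}


def common_neighbours(x: str, y: str) -> int:
    return len(ADJ[x] & ADJ[y])


def adamic_adar(x: str, y: str) -> float:
    # For x != y, a shared neighbour z is adjacent to both, so deg(z) >= 2 and log(deg(z)) > 0.
    return sum(1.0 / math.log(len(ADJ[z])) for z in ADJ[x] & ADJ[y])


def resource_allocation(x: str, y: str) -> float:
    return sum(1.0 / len(ADJ[z]) for z in ADJ[x] & ADJ[y])


for pair in [("a", "e"), ("b", "e")]:
    print(pair, common_neighbours(*pair), adamic_adar(*pair), resource_allocation(*pair))
```

All three only touch the neighbourhoods of x and y, which is what makes this family scalable; AA and RA differ from CN only in down-weighting high-degree intermediaries.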
Chapter three analyses the LP problem, partly as a classic binary classification problem. A list of graph properties, such as directionality, weights and time, is discussed in the context of LP. A formal time and space complexity analysis of similarity-based LP scores follows. The chapter ends with a study of the class imbalance found in LP problems.
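The class imbalance mentioned above follows from simple counting: in a directed graph with n nodes and no self-loops there are n(n − 1) candidate edges, of which only |E| exist. The sketch below computes the resulting ratio of non-edges to edges; the graph sizes are hypothetical, chosen only to show that with a fixed average degree the imbalance grows linearly with n.

```python
def directed_candidates(num_nodes: int) -> int:
    """Ordered pairs (x, y) with x != y: every possible directed edge."""
    return num_nodes * (num_nodes - 1)


def imbalance_ratio(num_nodes: int, num_edges: int) -> float:
    """Non-edges per existing edge among all candidate directed pairs."""
    candidates = directed_candidates(num_nodes)
    return (candidates - num_edges) / num_edges


# Hypothetical sizes, both with an average out-degree of 10.
for n, e in [(1_000, 10_000), (1_000_000, 10_000_000)]:
    print(n, e, directed_candidates(n), round(imbalance_ratio(n, e), 1))
```

The quadratic number of candidates is also why exhaustive scoring is infeasible at scale, and why local scores, which only produce values for pairs that share nearby neighbours, are attractive.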
In chapter four a novel similarity-based LP score is introduced. The chapter first elaborates on the importance of hierarchies for representing knowledge through directed graphs. Several modifications to the proposed score are also defined. The chapter additionally presents modified versions of the most competitive undirected LP scores, adapting them to directed graphs.
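The exact directed adaptations are not given in this summary. One plausible way to adapt a neighbourhood-based score is to take the out-neighbours of the source and the in-neighbours of the target, so that only intermediaries z with x → z → y count. The sketch below applies that idea to Common Neighbours and Resource Allocation; using total degree (in + out) as the RA weight is an assumption, as is the toy graph.

```python
# Hypothetical toy directed graph: node -> set of out-neighbours.
OUT: dict[str, set[str]] = {
    "a": {"b", "c"},
    "b": {"d"},
    "c": {"d"},
    "d": {"e"},
    "e": set(),
}


def in_neighbours(node: str) -> set[str]:
    # Linear scan for brevity; a real implementation would precompute a reverse index.
    return {u for u, successors in OUT.items() if node in successors}


def directed_common_neighbours(x: str, y: str) -> int:
    """Intermediaries z with x -> z and z -> y."""
    return len(OUT[x] & in_neighbours(y))


def directed_resource_allocation(x: str, y: str) -> float:
    total = 0.0
    for z in OUT[x] & in_neighbours(y):
        # Assumption: z's degree is its total degree (in + out); it is >= 2 here.
        total += 1.0 / (len(OUT[z]) + len(in_neighbours(z)))
    return total


print(directed_common_neighbours("a", "d"), directed_common_neighbours("d", "a"))
print(directed_resource_allocation("a", "d"))
```

Unlike the undirected versions, these scores are asymmetric: the pair (a, d) scores 2 while (d, a) scores 0, which is the property a directed edge prediction needs.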
The evaluation methodologies of LP are analyzed in the fifth chapter. It starts by discussing the problem of evaluating domains with a huge class imbalance, identifying the most appropriate methodologies for them. A modification of the most appropriate evaluation methodology according to the literature is presented, with the goal of focusing on relevant predictions. A discussion on the faithful estimation of predictor precision follows.
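The modified methodology itself is not described here. As a general illustration of precision-oriented evaluation, which is commonly preferred under heavy imbalance because ROC-based measures divide false positives by an enormous number of negatives, the sketch below computes precision at k and the (recall, precision) points of a ranked prediction list. The ranking is hypothetical.

```python
def precision_at_k(ranked_hits: list[bool], k: int) -> float:
    """ranked_hits[i] is True when the i-th highest-scored pair is a real edge."""
    top = ranked_hits[:k]
    return sum(top) / len(top)


def precision_recall_points(ranked_hits: list[bool], total_positives: int) -> list[tuple[float, float]]:
    """(recall, precision) after each prediction, in ranked order."""
    points = []
    hits = 0
    for i, is_edge in enumerate(ranked_hits, start=1):
        hits += is_edge
        points.append((hits / total_positives, hits / i))
    return points


# Hypothetical ranking: 3 hidden edges exist, 2 of them appear in the top 5.
ranking = [True, False, True, False, False]
print(precision_at_k(ranking, 2))
print(precision_recall_points(ranking, total_positives=3))
```

Restricting attention to the top of the ranking is one simple way to focus an evaluation on the predictions a user would actually inspect.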
Chapter six describes the graphs used for score evaluation, as well as how the data was transformed into directed graphs. The reasons for choosing these particular domains are given, with special attention to webgraphs and their well-known relation with hierarchies. The most basic properties of each resulting graph are shown.
The tests performed are presented in chapter seven. The three most competitive LP scores currently available are tested against each other, and against a proposed version of those same scores for directed graphs. Our proposed score and its modifications are tested against the scores obtaining the best results in the previous tests. The case of LP in webgraphs is considered separately, testing six different webgraphs. The chapter ends with a discussion on the limitations of this formal analysis, showing examples of the predictions obtained.
Chapter eight covers the computational aspects of the work. It starts with a discussion on the importance of memory management in determining the computational cost of LP algorithms. A proposal on how to reduce this cost through precision reduction is presented. A section on code parallelization follows, which includes two different implementations: one on a graph-specific programming model (Pregel) and one on a generic programming model (OpenMP). The chapter ends with a specification of the computational resources used for the tests.
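Assuming that "precision reduction" refers to the numeric precision used to store scores (the summary does not say), the memory saving is easy to quantify: storing each score as a 4-byte single-precision float instead of an 8-byte double halves the memory held by the score buffers. The sketch uses the standard-library `array` module with hypothetical score values and also reports the rounding error introduced.

```python
from array import array

# Hypothetical scores standing in for one partition of candidate edges.
scores = [0.1 * i for i in range(1_000)]

double_buf = array("d", scores)  # C double, typically 8 bytes per score
single_buf = array("f", scores)  # C float, typically 4 bytes per score

print(double_buf.itemsize * len(double_buf), single_buf.itemsize * len(single_buf))
# Largest rounding error introduced by storing the scores in single precision.
print(max(abs(a - b) for a, b in zip(double_buf, single_buf)))
```

The trade-off is that scores closer together than single-precision resolution may collapse into ties, which can reorder the tail of a ranking.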
The conclusions of this thesis proposal are presented in chapter nine. Chapter ten contains several future lines of work.
Hierarchical inference applied to Cyc
Hierarchical graphs are a frequent solution for capturing symbolic data, due to the importance of hierarchies for defining knowledge. In these graphs, relations among elements may contain a large portion of each element’s semantics. However, knowledge discovery based on analyzing the patterns of hierarchical relations is rarely used. We outline four inference-based algorithms that exploit semantic properties of hierarchically represented knowledge to produce new links, and test one of them on a generalization of Cyc’s knowledge base (KB). Finally, we argue why such algorithms can be useful for unsupervised learning and for the supervised analysis of a KB.
Peer Reviewed. Postprint (author’s final draft).
Bringing action language C+ to normative contexts: preliminary report
C+ is an action language for specifying and reasoning about the effects of actions and the persistence of facts over time. Based on it, we present CN+, an operationally enhanced form of C+ designed to represent complex normative systems and to integrate them easily into the semantics of the causal theory of actions. The proposed system formalizes norms using a life-cycle approach, in order to capture the whole normative meaning of a complex normative framework. We discuss this approach and illustrate it with examples.
Peer Reviewed. Postprint (author’s final draft).
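The life-cycle formalization is not detailed in this abstract. To give an intuition of what a norm life cycle tracks, the sketch below models a single norm instance as a small state machine. The states (inactive, active, fulfilled, violated) and event names are assumptions for illustration, not the CN+ formalism; in C+ itself, such states would be expressed with fluents, and events with actions governed by causal laws.

```python
from enum import Enum


class NormState(Enum):
    INACTIVE = "inactive"
    ACTIVE = "active"
    FULFILLED = "fulfilled"
    VIOLATED = "violated"


# (current state, event) -> next state; pairs not listed leave the state unchanged.
TRANSITIONS: dict[tuple[NormState, str], NormState] = {
    (NormState.INACTIVE, "activate"): NormState.ACTIVE,
    (NormState.ACTIVE, "fulfil"): NormState.FULFILLED,
    (NormState.ACTIVE, "deadline"): NormState.VIOLATED,
}


def trace(events: list[str]) -> list[NormState]:
    """Replay events on a fresh norm instance and return every state visited."""
    state = NormState.INACTIVE
    history = [state]
    for event in events:
        state = TRANSITIONS.get((state, event), state)
        history.append(state)
    return history


# Hypothetical obligation: an industry must report a discharge before a deadline.
print([s.value for s in trace(["activate", "deadline"])])
print([s.value for s in trace(["activate", "fulfil", "deadline"])])
```

Treating fulfilled and violated as absorbing states means a deadline passing after fulfilment has no effect, which is the kind of persistence C+ handles through its inertia laws.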
Focus! rating XAI methods and finding biases
Explainability has become a major topic of research in
Artificial Intelligence (AI), aimed at increasing trust in models
such as Deep Learning (DL) networks. However, trustworthy
models cannot be achieved with explainable AI (XAI) methods
unless the XAI methods themselves can be trusted.
To evaluate XAI methods one may assess interpretability,
a qualitative measure of how understandable an explanation is
to humans [1]. While this is important to guarantee proper
interaction between humans and the model, assessing interpretability
generally involves end-users in the process [2], inducing strong
biases. In fact, a qualitative evaluation alone cannot guarantee
coherence with reality (i.e., model behavior), as false explanations
can be more interpretable than accurate ones. To enable
trust in XAI methods, we also need quantitative and objective
evaluation metrics, which validate the relation between the
explanations produced by the XAI method and the behavior
of the trained model under assessment.
In this work we propose a novel evaluation score for feature
attribution methods, described in §I-A. Our input alteration
approach induces in-distribution noise into samples, that is,
alterations of the input which correspond to visual patterns
found within the original data distribution. To do so we modify
the context of the sample instead of its content, leaving the
original pixel values untouched. In practice, we create a
new sample, composed of images of different classes, which
we call a mosaic image (see examples in Figure 2). Using
mosaics as input has a major benefit: each input quadrant is
an image from the original distribution, producing blobs of
activations in each quadrant which are consequently coherent.
Only the pixels forming the borders between images, and
the few corresponding activations, may be considered out of
distribution.
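A minimal sketch of mosaic construction follows, assuming grayscale images stored as lists of rows and a 2x2 layout with two target-class and two other-class images; the number of target quadrants, the image format and the function names are assumptions, not details taken from the paper. The pixels of each tile are copied unchanged, so only the borders between tiles are new.

```python
import random

# Hypothetical image format: grayscale, a list of rows of pixel values.
Image = list[list[float]]


def make_mosaic(top_left: Image, top_right: Image, bottom_left: Image, bottom_right: Image) -> Image:
    """Tile four equally sized images into a 2x2 grid, copying pixel values unchanged."""
    top = [left + right for left, right in zip(top_left, top_right)]
    bottom = [left + right for left, right in zip(bottom_left, bottom_right)]
    return top + bottom


def sample_mosaic(images_by_class: dict[str, list[Image]], target: str, other: str,
                  rng: random.Random) -> tuple[Image, list[bool]]:
    """Place two target-class and two other-class images in random quadrants.
    Returns the mosaic and, per quadrant (TL, TR, BL, BR), whether it holds the target."""
    labels = [target, target, other, other]
    rng.shuffle(labels)
    tiles = [rng.choice(images_by_class[label]) for label in labels]
    return make_mosaic(*tiles), [label == target for label in labels]


def constant(value: float, size: int = 2) -> Image:
    return [[value] * size for _ in range(size)]


images = {"cat": [constant(1.0), constant(2.0)], "dog": [constant(8.0)]}
mosaic, target_mask = sample_mosaic(images, "cat", "dog", random.Random(0))
print(target_mask)
for row in mosaic:
    print(row)
```

Returning the quadrant mask alongside the mosaic keeps the ground truth needed later to check where an explanation lands.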
By inducing in-distribution noise, mosaic images introduce
a problem in which XAI methods may objectively err (focusing on
something they should not be focusing on). On those composed
mosaics we ask an XAI method to provide an explanation for just
one of the contained classes, and follow its response. Then,
we measure how much of the explanation generated by the
XAI method is located on the areas corresponding to the target class,
quantifying it through the Focus score. This score allows us to
compare methods in terms of explanation precision, evaluating
the capability of XAI methods to provide explanations related
to the requested class. Using mosaics has another benefit: since
the noise introduced is in-distribution, the explanation errors
identify and exemplify biases of the model. This facilitates
the elimination of biases in models and datasets, potentially
resulting in more reliable solutions. We illustrate how to do so
in §I-C.
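A minimal sketch of the Focus score as described above: the share of an explanation that falls on the quadrants of the requested class. It assumes a 2x2 mosaic, a relevance map with the same spatial layout, and that only positive relevance is counted; the exact definition is in the paper, not in this excerpt, and the function and type names are hypothetical.

```python
# Hypothetical input: a relevance (attribution) map over a 2x2 mosaic, as a list of rows.
RelevanceMap = list[list[float]]


def focus_score(relevance: RelevanceMap, target_quadrants: list[bool]) -> float:
    """Fraction of positive relevance that falls on target-class quadrants.
    Quadrant order: top-left, top-right, bottom-left, bottom-right."""
    half_h, half_w = len(relevance) // 2, len(relevance[0]) // 2
    in_target = 0.0
    total = 0.0
    for i, row in enumerate(relevance):
        for j, value in enumerate(row):
            if value <= 0:
                continue  # assumption: only positive relevance is counted
            quadrant = 2 * int(i >= half_h) + int(j >= half_w)
            total += value
            if target_quadrants[quadrant]:
                in_target += value
    return in_target / total if total > 0 else 0.0


relevance = [
    [3.0, 3.0, 1.0, 1.0],
    [3.0, 3.0, 1.0, 1.0],
    [1.0, 1.0, -5.0, 0.0],
    [1.0, 1.0, 0.0, 0.0],
]
print(focus_score(relevance, [True, False, False, True]))  # 12 / 20 = 0.6
```

A score of 1.0 means the whole explanation lies on the requested class; lower values flag relevance spent on other classes, which is where the model biases discussed above become visible.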
- …