24 research outputs found
Reading the news through its structure: new hybrid connectivity based approaches
In this thesis a solution for the problem of identifying the structure of news published
by online newspapers is presented. This problem requires new approaches and algorithms
that are capable of dealing with the massive number of online publications in existence
(and that will grow in the future). The fact that news documents present a high degree of
interconnection makes this an interesting and hard problem to solve. The identification
of the structure of the news is accomplished both by descriptive methods that expose the
dimensionality of the relations between different news, and by clustering the news into
topic groups. To achieve this analysis this integrated whole was studied using different
perspectives and approaches.
In the identification of news clusters and structure, and after a preparatory data collection
phase, where several online newspapers from different parts of the globe were
collected, two newspapers were chosen in particular: the Portuguese daily newspaper
PĂşblico and the British newspaper The Guardian.
In the first case, it was shown how information theory (namely variation of information)
combined with adaptive networks was able to identify topic clusters in the news published
by the Portuguese online newspaper PĂşblico.
In the second case, the structure of news published by the British newspaper The
Guardian is revealed through the construction of time series of news clustered by a kmeans
process. After this approach an unsupervised algorithm, that filters out irrelevant
news published online by taking into consideration the connectivity of the news labels
entered by the journalists, was developed. This novel hybrid technique is based on Qanalysis
for the construction of the filtered network followed by a clustering technique to
identify the topical clusters. Presently this work uses a modularity optimisation clustering technique but this step is general enough that other hybrid approaches can be used without
losing generality.
A novel second order swarm intelligence algorithm based on Ant Colony Systems
was developed for the travelling salesman problem that is consistently better than the
traditional benchmarks. This algorithm is used to construct Hamiltonian paths over the
news published using the eccentricity of the different documents as a measure of distance.
This approach allows for an easy navigation between published stories that is dependent
on the connectivity of the underlying structure.
The results presented in this work show the importance of taking topic detection in
large corpora as a multitude of relations and connectivities that are not in a static state.
They also influence the way of looking at multi-dimensional ensembles, by showing that
the inclusion of the high dimension connectivities gives better results to solving a particular
problem as was the case in the clustering problem of the news published online.Neste trabalho resolvemos o problema da identificação da estrutura das notĂcias publicadas
em linha por jornais e agĂŞncias noticiosas. Este problema requer novas abordagens e
algoritmos que sejam capazes de lidar com o número crescente de publicações em linha
(e que se espera continuam a crescer no futuro). Este facto, juntamente com o elevado
grau de interconexĂŁo que as notĂcias apresentam tornam este problema num problema
interessante e de difĂcil resolução. A identificação da estrutura do sistema de notĂcias foi
conseguido quer através da utilização de métodos descritivos que expõem a dimensão das
relações existentes entre as diferentes notĂcias, quer atravĂ©s de algoritmos de agrupamento
das mesmas em tópicos. Para atingir este objetivo foi necessário proceder a ao estudo deste
sistema complexo sob diferentes perspectivas e abordagens.
ApĂłs uma fase preparatĂłria do corpo de dados, onde foram recolhidos diversos jornais
publicados online optou-se por dois jornais em particular: O PĂşblico e o The Guardian.
A escolha de jornais em lĂnguas diferentes deve-se Ă vontade de encontrar estratĂ©gias de
análise que sejam independentes do conhecimento prévio que se tem sobre estes sistemas.
Numa primeira análise é empregada uma abordagem baseada em redes adaptativas
e teoria de informação (nomeadamente variação de informação) para identificar tópicos
noticiosos que sĂŁo publicados no jornal portuguĂŞs PĂşblico.
Numa segunda abordagem analisamos a estrutura das notĂcias publicadas pelo jornal
Britânico The Guardian atravĂ©s da construção de sĂ©ries temporais de notĂcias. Estas foram
seguidamente agrupadas através de um processo de k-means. Para além disso desenvolveuse
um algoritmo que permite filtrar de forma nĂŁo supervisionada notĂcias irrelevantes que
apresentam baixa conectividade Ă s restantes notĂcias atravĂ©s da utilização de Q-analysis
seguida de um processo de clustering. Presentemente este mĂ©todo utiliza otimização de modularidade, mas a tĂ©cnica Ă© suficientemente geral para que outras abordagens hĂbridas
possam ser utilizadas sem perda de generalidade do método.
Desenvolveu-se ainda um novo algoritmo baseado em sistemas de colĂłnias de formigas
para solução do problema do caixeiro viajante que consistentemente apresenta resultados
melhores que os tradicionais bancos de testes. Este algoritmo foi aplicado na construção
de caminhos Hamiltonianos das notĂcias publicadas utilizando a excentricidade obtida a
partir da conectividade do sistema estudado como medida da distância entre notĂcias. Esta
abordagem permitiu construir um sistema de navegação entre as notĂcias publicadas que Ă©
dependente da conectividade observada na estrutura de notĂcias encontrada.
Os resultados apresentados neste trabalho mostram a importância de analisar sistemas
complexos na sua multitude de relações e conectividades que não são estáticas e que
influenciam a forma como tradicionalmente se olha para sistema multi-dimensionais.
Mostra-se que a inclusão desta dimensões extra produzem melhores resultados na resolução
do problema de identificar a estrutura subjacente a este problema da publicação de notĂcias em linha
Machine Learning
Machine Learning can be defined in various ways related to a scientific domain concerned with the design and development of theoretical and implementation tools that allow building systems with some Human Like intelligent behavior. Machine learning addresses more specifically the ability to improve automatically through experience
Advances in Artificial Intelligence: Models, Optimization, and Machine Learning
The present book contains all the articles accepted and published in the Special Issue “Advances in Artificial Intelligence: Models, Optimization, and Machine Learning” of the MDPI Mathematics journal, which covers a wide range of topics connected to the theory and applications of artificial intelligence and its subfields. These topics include, among others, deep learning and classic machine learning algorithms, neural modelling, architectures and learning algorithms, biologically inspired optimization algorithms, algorithms for autonomous driving, probabilistic models and Bayesian reasoning, intelligent agents and multiagent systems. We hope that the scientific results presented in this book will serve as valuable sources of documentation and inspiration for anyone willing to pursue research in artificial intelligence, machine learning and their widespread applications
Knowledge-based incremental induction of clinical algorithms
The current approaches for the induction of medical procedural knowledge suffer from several drawbacks: the structures produced may not be explicit medical structures, they are only based on statistical measures that do not necessarily respect medical criteria which can be essential to guarantee medical correct structures, or they are not prepared to deal with the incremental arrival of new data.
In this thesis we propose a methodology to automatically induce medically correct clinical algorithms (CAs) from hospital databases. These CAs are represented according to the SDA knowledge model. The methodology considers relevant background knowledge and it is able to work in an incremental way.
The methodology has been tested in the domains of hypertension, diabetes mellitus and the comborbidity of both diseases. As a result, we propose a repository of background knowledge for these pathologies and provide the SDA diagrams obtained. Later analyses show that the results are medically correct and comprehensible when validated with health care professionals
Fuzzy Techniques for Decision Making 2018
Zadeh's fuzzy set theory incorporates the impreciseness of data and evaluations, by imputting the degrees by which each object belongs to a set. Its success fostered theories that codify the subjectivity, uncertainty, imprecision, or roughness of the evaluations. Their rationale is to produce new flexible methodologies in order to model a variety of concrete decision problems more realistically. This Special Issue garners contributions addressing novel tools, techniques and methodologies for decision making (inclusive of both individual and group, single- or multi-criteria decision making) in the context of these theories. It contains 38 research articles that contribute to a variety of setups that combine fuzziness, hesitancy, roughness, covering sets, and linguistic approaches. Their ranges vary from fundamental or technical to applied approaches
Modélisation formelle des systèmes de détection d'intrusions
L’écosystème de la cybersécurité évolue en permanence en termes du nombre, de la diversité, et de la complexité des attaques. De ce fait, les outils de détection deviennent inefficaces face à certaines attaques. On distingue généralement trois types de systèmes de détection d’intrusions : détection par anomalies, détection par signatures et détection hybride. La détection par anomalies est fondée sur la caractérisation du comportement habituel du système, typiquement de manière statistique. Elle permet de détecter des attaques connues ou inconnues, mais génère aussi un très grand nombre de faux positifs. La détection par signatures permet de détecter des attaques connues en définissant des règles qui décrivent le comportement connu d’un attaquant. Cela demande une bonne connaissance du comportement de l’attaquant. La détection hybride repose sur plusieurs méthodes de détection incluant celles sus-citées. Elle présente l’avantage d’être plus précise pendant la détection. Des outils tels que Snort et Zeek offrent des langages de bas niveau pour l’expression de règles de reconnaissance d’attaques. Le nombre d’attaques potentielles étant très grand, ces bases de règles deviennent rapidement difficiles à gérer et à maintenir. De plus, l’expression de règles avec état dit stateful est particulièrement ardue pour reconnaître une séquence d’événements. Dans cette thèse, nous proposons une approche stateful basée sur les diagrammes d’état-transition algébriques (ASTDs) afin d’identifier des attaques complexes. Les ASTDs permettent de représenter de façon graphique et modulaire une spécification, ce qui facilite la maintenance et la compréhension des règles. Nous étendons la notation ASTD avec de nouvelles fonctionnalités pour représenter des attaques complexes. Ensuite, nous spécifions plusieurs attaques avec la notation étendue et exécutons les spécifications obtenues sur des flots d’événements à l’aide d’un interpréteur pour identifier des attaques. Nous évaluons aussi les performances de l’interpréteur avec des outils industriels tels que Snort et Zeek. Puis, nous réalisons un compilateur afin de générer du code exécutable à partir d’une spécification ASTD, capable d’identifier de façon efficiente les séquences d’événements.Abstract : The cybersecurity ecosystem continuously evolves with the number, the diversity,
and the complexity of cyber attacks. Generally, we have three types of Intrusion
Detection System (IDS) : anomaly-based detection, signature-based detection, and
hybrid detection. Anomaly detection is based on the usual behavior description of
the system, typically in a static manner. It enables detecting known or unknown attacks
but also generating a large number of false positives. Signature based detection
enables detecting known attacks by defining rules that describe known attacker’s behavior.
It needs a good knowledge of attacker behavior. Hybrid detection relies on
several detection methods including the previous ones. It has the advantage of being
more precise during detection. Tools like Snort and Zeek offer low level languages to
represent rules for detecting attacks. The number of potential attacks being large,
these rule bases become quickly hard to manage and maintain. Moreover, the representation
of stateful rules to recognize a sequence of events is particularly arduous. In this thesis, we propose a stateful approach based on algebraic state-transition
diagrams (ASTDs) to identify complex attacks. ASTDs allow a graphical and modular
representation of a specification, that facilitates maintenance and understanding of
rules. We extend the ASTD notation with new features to represent complex attacks.
Next, we specify several attacks with the extended notation and run the resulting specifications
on event streams using an interpreter to identify attacks. We also evaluate
the performance of the interpreter with industrial tools such as Snort and Zeek. Then,
we build a compiler in order to generate executable code from an ASTD specification,
able to efficiently identify sequences of events