321 research outputs found
Reading the news through its structure: new hybrid connectivity based approaches
In this thesis a solution for the problem of identifying the structure of news published
by online newspapers is presented. This problem requires new approaches and algorithms
that are capable of dealing with the massive number of online publications in existence
(and that will grow in the future). The fact that news documents present a high degree of
interconnection makes this an interesting and hard problem to solve. The identification
of the structure of the news is accomplished both by descriptive methods that expose the
dimensionality of the relations between different news items, and by clustering the news into
topic groups. To achieve this analysis, this integrated whole was studied from different
perspectives and approaches.
In the identification of news clusters and structure, and after a preparatory data collection
phase, where several online newspapers from different parts of the globe were
collected, two newspapers were chosen in particular: the Portuguese daily newspaper
Público and the British newspaper The Guardian.
In the first case, it was shown how information theory (namely variation of information)
combined with adaptive networks was able to identify topic clusters in the news published
by the Portuguese online newspaper Público.
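As an illustration of the information-theoretic measure named above, the following is a minimal sketch of the variation of information between two clusterings of the same items, VI(A, B) = H(A|B) + H(B|A). It is not the thesis's implementation: the function name and the toy label lists are assumptions, and the adaptive-network part of the method is not shown.

```python
from collections import Counter
from math import log


def variation_of_information(labels_a, labels_b):
    """Return VI(A, B) = H(A|B) + H(B|A) in nats for two clusterings.

    labels_a[i] and labels_b[i] are the cluster ids given to item i by
    each clustering (hypothetical input format).
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("both clusterings must label the same items")
    n = len(labels_a)
    count_a = Counter(labels_a)
    count_b = Counter(labels_b)
    count_ab = Counter(zip(labels_a, labels_b))
    vi = 0.0
    for (a, b), n_ab in count_ab.items():
        p_ab = n_ab / n
        # log(p_ab / p_a) + log(p_ab / p_b), written with raw counts
        vi -= p_ab * (log(n_ab / count_a[a]) + log(n_ab / count_b[b]))
    return vi


# Identical partitions are at distance 0; independent ones are far apart.
same = variation_of_information([0, 0, 1, 1], [5, 5, 7, 7])
independent = variation_of_information([0, 0, 1, 1], [0, 1, 0, 1])
```

A value of zero means the two clusterings agree up to a relabelling of the clusters; larger values mean more information is lost and gained when moving from one clustering to the other.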
In the second case, the structure of news published by the British newspaper The
Guardian is revealed through the construction of time series of news clustered by a k-means
process. After this approach, an unsupervised algorithm was developed that filters out irrelevant
news published online by taking into consideration the connectivity of the news labels
entered by the journalists. This novel hybrid technique is based on Q-analysis
for the construction of the filtered network, followed by a clustering technique to
identify the topical clusters. At present this work uses a modularity optimisation clustering
technique, but this step is general enough that other hybrid approaches can be used without
losing generality.
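The connectivity idea behind Q-analysis can be sketched as follows, under stated assumptions. Each news item is treated as a simplex whose vertices are its labels; two items are q-near when they share at least q + 1 labels, and q-connected components are the transitive closure of that relation. The function name, the toy labels and the choice of q are hypothetical, and the thesis's actual filtering criterion may differ.

```python
from itertools import combinations


def q_components(labels_by_item, q):
    """Group items into q-connected components.

    Items with fewer than q + 1 labels do not exist at level q and are
    left out of the result.
    """
    eligible = {k: set(v) for k, v in labels_by_item.items()
                if len(set(v)) >= q + 1}
    parent = {k: k for k in eligible}

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in combinations(eligible, 2):
        if len(eligible[a] & eligible[b]) >= q + 1:
            parent[find(a)] = find(b)

    groups = {}
    for k in eligible:
        groups.setdefault(find(k), set()).add(k)
    return list(groups.values())


# Hypothetical labelled news items.
news = {
    "n1": {"brexit", "eu", "trade"},
    "n2": {"brexit", "eu", "economy"},
    "n3": {"eu", "economy", "banks"},
    "n4": {"football"},
}
components_q1 = q_components(news, 1)
# One possible filter: keep only items that are 1-connected to another item.
kept = {item for comp in components_q1 if len(comp) > 1 for item in comp}
```

The surviving items could then be passed to any community-detection step, such as the modularity optimisation mentioned above.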
A novel second order swarm intelligence algorithm based on Ant Colony Systems
was developed for the travelling salesman problem that is consistently better than the
traditional benchmarks. This algorithm is used to construct Hamiltonian paths over the
news published using the eccentricity of the different documents as a measure of distance.
This approach allows for an easy navigation between published stories that is dependent
on the connectivity of the underlying structure.
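For readers unfamiliar with ant-based tour construction, here is a minimal sketch of a plain (first-order) Ant System for the travelling salesman problem. It is not the second-order algorithm developed in the thesis; the parameter values are arbitrary assumptions, and Euclidean distances between four made-up points stand in for the eccentricity-based distance.

```python
import math
import random


def tour_length(tour, dist):
    """Length of the closed tour that returns to its starting city."""
    n = len(tour)
    return sum(dist[tour[i]][tour[(i + 1) % n]] for i in range(n))


def ant_system_tsp(dist, n_ants=10, n_iters=50, alpha=1.0, beta=2.0,
                   rho=0.5, seed=0):
    """Basic Ant System; dist[i][j] must be positive for i != j."""
    rng = random.Random(seed)
    n = len(dist)
    pher = [[1.0] * n for _ in range(n)]
    best_tour, best_len = None, math.inf
    for _ in range(n_iters):
        tours = []
        for _ in range(n_ants):
            tour = [rng.randrange(n)]
            unvisited = set(range(n)) - {tour[0]}
            while unvisited:
                i = tour[-1]
                cand = sorted(unvisited)
                # Pheromone strength times inverse distance (visibility).
                weights = [pher[i][j] ** alpha * (1.0 / dist[i][j]) ** beta
                           for j in cand]
                nxt = rng.choices(cand, weights=weights)[0]
                tour.append(nxt)
                unvisited.remove(nxt)
            length = tour_length(tour, dist)
            tours.append((tour, length))
            if length < best_len:
                best_tour, best_len = tour, length
        # Evaporate, then let every ant deposit pheromone on its tour.
        for i in range(n):
            for j in range(n):
                pher[i][j] *= 1.0 - rho
        for tour, length in tours:
            for k in range(n):
                a, b = tour[k], tour[(k + 1) % n]
                pher[a][b] += 1.0 / length
                pher[b][a] += 1.0 / length
    return best_tour, best_len


# Four hypothetical "documents" placed on a unit square.
points = [(0, 0), (0, 1), (1, 1), (1, 0)]
dist = [[math.dist(p, q) for q in points] for p in points]
tour, length = ant_system_tsp(dist)
```

Removing the longest edge of the returned cycle gives a Hamiltonian path, which is one way such a tour could be turned into a reading order over documents.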
The results presented in this work show the importance of taking topic detection in
large corpora as a multitude of relations and connectivities that are not in a static state.
They also influence the way of looking at multi-dimensional ensembles, by showing that
the inclusion of the high dimension connectivities gives better results to solving a particular
problem, as was the case in the clustering problem of the news published online.
In this work we solve the problem of identifying the structure of news published
online by newspapers and news agencies. This problem requires new approaches and
algorithms capable of dealing with the growing number of online publications
(which is expected to keep growing in the future). This fact, together with the high
degree of interconnection that news items present, makes this an interesting and
hard problem to solve. The identification of the structure of the news system was
achieved both through descriptive methods that expose the dimension of the
relations between the different news items, and through algorithms that cluster
them into topics. To reach this goal it was necessary to study this
complex system from different perspectives and approaches.
After a preparatory phase of building the data corpus, in which several newspapers
published online were collected, two newspapers in particular were chosen: Público and The Guardian.
The choice of newspapers in different languages stems from the wish to find analysis
strategies that are independent of prior knowledge about these systems.
In a first analysis, an approach based on adaptive networks
and information theory (namely variation of information) is used to identify news
topics published in the Portuguese newspaper Público.
In a second approach we analyse the structure of the news published by the British
newspaper The Guardian through the construction of time series of news. These were
then clustered by a k-means process. In addition, an algorithm was developed
that filters out, in an unsupervised way, irrelevant news
with low connectivity to the remaining news, using Q-analysis
followed by a clustering process. At present this method uses modularity optimisation,
but the technique is general enough for other hybrid approaches
to be used without loss of generality of the method.
A new algorithm based on ant colony systems was also developed
for solving the travelling salesman problem, which consistently presents better
results than traditional benchmarks. This algorithm was applied to the construction
of Hamiltonian paths over the published news, using the eccentricity obtained
from the connectivity of the studied system as a measure of distance between news items. This
approach made it possible to build a navigation system between published news that
depends on the connectivity observed in the news structure found.
The results presented in this work show the importance of analysing complex
systems in their multitude of relations and connectivities, which are not static and which
influence the way multi-dimensional systems are traditionally looked at.
It is shown that including these extra dimensions produces better results in solving
the problem of identifying the structure underlying online news publication.
Visual and computational analysis of structure-activity relationships in high-throughput screening data
Novel analytic methods are required to assimilate the large volumes of structural and bioassay data generated by combinatorial chemistry and high-throughput screening programmes in the pharmaceutical and agrochemical industries. This paper reviews recent work in visualisation and data mining that can be used to develop structure-activity relationships from such chemical/biological datasets
Discovering Higher-order SNP Interactions in High-dimensional Genomic Data
In this thesis, a multifactor dimensionality reduction based method on associative classification is employed to identify higher-order SNP interactions for enhancing the understanding of the genetic architecture of complex diseases. Further, this thesis explored the application of deep learning techniques by providing new clues into the interaction analysis. The performance of the deep learning method is maximized by unifying deep neural networks with a random forest for achieving reliable interactions in the presence of noise
AI Solutions for MDS: Artificial Intelligence Techniques for Misuse Detection and Localisation in Telecommunication Environments
This report considers the application of Artificial Intelligence (AI) techniques to
the problem of misuse detection and misuse localisation within telecommunications
environments. A broad survey of techniques is provided, that covers inter alia
rule based systems, model-based systems, case based reasoning, pattern matching,
clustering and feature extraction, artificial neural networks, genetic algorithms, artificial
immune systems, agent based systems, data mining and a variety of hybrid
approaches. The report then considers the central issue of event correlation, that
is at the heart of many misuse detection and localisation systems. The notion of
being able to infer misuse by the correlation of individual temporally distributed
events within a multiple data stream environment is explored, and a range of techniques is considered,
covering model based approaches, 'programmed' AI and machine learning
paradigms. It is found that, in general, correlation is best achieved via rule based approaches,
but that these suffer from a number of drawbacks, such as the difficulty of
developing and maintaining an appropriate knowledge base, and the lack of ability
to generalise from known misuses to new unseen misuses. Two distinct approaches
are evident. One attempts to encode knowledge of known misuses, typically within
rules, and use this to screen events. This approach cannot generally detect misuses
for which it has not been programmed, i.e. it is prone to issuing false negatives.
The other attempts to 'learn' the features of event patterns that constitute normal
behaviour, and, by observing patterns that do not match expected behaviour, detect
when a misuse has occurred. This approach is prone to issuing false positives,
i.e. inferring misuse from innocent patterns of behaviour that the system was not
trained to recognise. Contemporary approaches are seen to favour hybridisation,
often combining detection or localisation mechanisms for both abnormal and normal
behaviour, the former to capture known cases of misuse, the latter to capture
unknown cases. In some systems, these mechanisms even work together to update
each other to increase detection rates and lower false positive rates. It is concluded
that hybridisation offers the most promising future direction, but that a rule or state
based component is likely to remain, being the most natural approach to the correlation
of complex events. The challenge, then, is to mitigate the weaknesses of
canonical programmed systems such that learning, generalisation and adaptation
are more readily facilitated
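The hybrid arrangement described above, with rules for known misuse and a learned model of normal behaviour for unknown misuse, can be sketched roughly as follows. This is only an illustration under stated assumptions: the event fields (`user`, `action`, `count`), the single rule and the set-based profile of normal behaviour are all hypothetical and far simpler than the systems surveyed in the report.

```python
def learn_normal_profile(training_events):
    """Record every (user, action) pair seen during normal operation."""
    return {(e["user"], e["action"]) for e in training_events}


def classify(event, rules, profile):
    """Check programmed rules first, then fall back to the learned profile."""
    for name, predicate in rules.items():
        if predicate(event):
            # Known misuse; a rule-only system would miss anything unlisted.
            return "known misuse: " + name
    if (event["user"], event["action"]) not in profile:
        # Unseen behaviour; may be a false positive if training was sparse.
        return "possible misuse (anomaly)"
    return "normal"


# Hypothetical rule base and training data.
rules = {
    "call flooding": lambda e: e["action"] == "call" and e.get("count", 1) > 100,
}
profile = learn_normal_profile([
    {"user": "alice", "action": "call"},
    {"user": "alice", "action": "sms"},
    {"user": "bob", "action": "call"},
])

flood = classify({"user": "alice", "action": "call", "count": 500}, rules, profile)
unseen = classify({"user": "bob", "action": "roam"}, rules, profile)
usual = classify({"user": "bob", "action": "call"}, rules, profile)
```

The rule stage yields false negatives for misuse it was never programmed to recognise, while the profile stage yields false positives for innocent behaviour absent from training, which is the trade-off the report attributes to the two approaches.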
Probabilistic modelling of oil rig drilling operations for business decision support: a real world application of Bayesian networks and computational intelligence.
This work investigates the use of evolved Bayesian networks learning algorithms based on computational intelligence meta-heuristic algorithms. These algorithms are applied to a new domain provided by the exclusive data, available to this project from an industry partnership with ODS-Petrodata, a business intelligence company in Aberdeen, Scotland. This research proposes statistical models that serve as a foundation for building a novel operational tool for forecasting the performance of rig drilling operations. A prototype for a tool able to forecast the future performance of a drilling operation is created using the obtained data, the statistical model and the experts' domain knowledge. This work makes the following contributions: applying K2GA and Bayesian networks to a real-world industry problem; developing a well-performing and adaptive solution to forecast oil drilling rig performance; using the knowledge of industry experts to guide the creation of competitive models; creating models able to forecast oil drilling rig performance consistently with nearly 80% forecast accuracy, using either logistic regression or Bayesian network learning using genetic algorithms; introducing the node juxtaposition analysis graph, which allows the visualisation of the frequency of nodes links appearing in a set of orderings, thereby providing new insights when analysing node ordering landscapes; exploring the correlation factors between model score and model predictive accuracy, and showing that the model score does not correlate with the predictive accuracy of the model; exploring a method for feature selection using multiple algorithms and drastically reducing the modelling time by multiple factors; proposing new fixed structure Bayesian network learning algorithms for node ordering search-space exploration. 
Finally, this work proposes real-world applications for the models based on current industry needs, such as recommender systems, an oil drilling rig selection tool, a user-ready rig performance forecasting software and rig scheduling tools
Aplicaciones de la teoría de la información y la inteligencia artificial al testing de software
Unpublished thesis of the Universidad Complutense de Madrid, Facultad de Informática, Departamento de Ingeniería de Sistemas Informáticos y de Computación, defended on 4-05-2022.
Software testing is a critical field for the software industry, as it provides the main tools used to ensure the reliability of the software produced. Currently, more than 50% of the time and resources for creating a software product are devoted to testing tasks, from unit testing to system testing. Moreover, there is great interest in automating this field, as software gets bigger and the amount of required testing increases. However, software testing is not only an industry-oriented field; it is also a really interesting field with a noble goal (improving the reliability of software systems) that at the same time is full of problems to solve...
Fuzzy Rules from Ant-Inspired Computation
Centre for Intelligent Systems and their Applications
This research identifies and investigates major issues in inducing accurate and comprehensible fuzzy rules from datasets. A review of the current literature on fuzzy rulebase induction uncovers two significant issues:
A. There is a tradeoff between inducing accurate fuzzy rules and inducing comprehensible fuzzy rules; and,
B. A common strategy for the induction of fuzzy rulebases, that of iterative rule learning where the rules are generated one by one and independently of each other, may not be an optimal one.
FRANTIC, a system that provides a framework for exploring the claims above, is developed. At the core lies a mechanism for creating individual fuzzy rules. This is based on a significantly modified social insect-inspired heuristic for combinatorial optimisation -- Ant Colony Optimisation. The rule discovery mechanism is utilised in two very different strategies for the induction of a complete fuzzy rulebase:
1. The first follows the common iterative rule learning approach for the induction of crisp and fuzzy rules;
2. The second has been designed during this research explicitly for the induction of a fuzzy rulebase, and generates all rules in parallel.
Both strategies have been tested on a number of classification problems, including medical diagnosis and industrial plant fault detection, and compared against other crisp or fuzzy induction algorithms that use more well-established approaches. The results challenge statement A above, by presenting evidence to show that one criterion need not be met at the expense of the other. This research also uncovers the cost that is paid -- that of computational expenditure -- and makes concrete suggestions on how this may be resolved.
With regards to statement B, until now little or no evidence has been put forward to support or disprove the claim. The results of this research indicate that definite advantages are offered by the second simultaneous strategy, that are not offered by the iterative one. These benefits include improved accuracy over a wide range of values for several key system parameters. However, both approaches also fare well when compared to other learning algorithms. This latter fact is due to the rule discovery mechanism itself -- the adapted Ant Colony Optimisation algorithm -- which affords several additional advantages. These include a simple mechanism within the rule construction process that enables it to cope with datasets that have an imbalanced distribution between the classes, and another for controlling the amount of fit to the training data.
In addition, several system parameters have been designed to be semi-autonomous so as to avoid unnecessary user intervention, and in future work the social insect metaphor may be exploited and extended further to enable it to deal with industrial-strength data mining issues involving large volumes of data, and distributed and/or heterogeneous databases
Applied Metaheuristic Computing
For decades, Applied Metaheuristic Computing (AMC) has been a prevailing optimization technique for tackling perplexing engineering and business problems, such as scheduling, routing, ordering, bin packing, assignment, facility layout planning, among others. This is partly because the classic exact methods are constrained with prior assumptions, and partly due to the heuristics being problem-dependent and lacking generalization. AMC, on the contrary, guides the course of low-level heuristics to search beyond the local optimality, which impairs the capability of traditional computation methods. This topic series has collected quality papers proposing cutting-edge methodology and innovative applications which drive the advances of AMC
Holistic, data-driven, service and supply chain optimisation: linked optimisation.
The intensity of competition and technological advancements in the business environment has made companies collaborate and cooperate together as a means of survival. This creates a chain of companies and business components with unified business objectives. However, managing the decision-making process (like scheduling, ordering, delivering and allocating) at the various business components and maintaining a holistic objective is a huge business challenge, as these operations are complex and dynamic. This is because the overall chain of business processes is widely distributed across all the supply chain participants; therefore, no individual collaborator has a complete overview of the processes. Increasingly, such decisions are automated and are strongly supported by optimisation algorithms - manufacturing optimisation, B2B ordering, financial trading, transportation scheduling and allocation. However, most of these algorithms do not incorporate the complexity associated with interacting decision-making systems like supply chains. It is well-known that decisions made at one point in supply chains can have significant consequences that ripple through linked production and transportation systems. Recently, global shocks to supply chains (COVID-19, climate change, blockage of the Suez Canal) have demonstrated the importance of these interdependencies, and the need to create supply chains that are more resilient and have significantly reduced impact on the environment. Such interacting decision-making systems need to be considered through an optimisation process. However, the interactions between such decision-making systems are not modelled. We therefore believe that modelling such interactions is an opportunity to provide computational extensions to current optimisation paradigms. This research study aims to develop a general framework for formulating and solving holistic, data-driven optimisation problems in service and supply chains. 
This research achieved this aim and contributes to scholarship by firstly considering the complexities of supply chain problems from a linked problem perspective. This leads to developing a formalism for characterising linked optimisation problems as a model for supply chains. Secondly, the research adopts a method for creating a linked optimisation problem benchmark by linking existing classical benchmark sets. This involves using a mix of classical optimisation problems, typically relating to supply chain decision problems, to describe different modes of linkages in linked optimisation problems. Thirdly, several techniques for linking supply chain fragmented data have been proposed in the literature to identify data relationships. Therefore, this thesis explores some of these techniques and combines them in specific ways to improve the data discovery process. Lastly, many state-of-the-art algorithms have been explored in the literature and these algorithms have been used to tackle problems relating to supply chain problems. This research therefore investigates the resilient state-of-the-art optimisation algorithms presented in the literature, and then designs suitable algorithmic approaches inspired by the existing algorithms and the nature of problem linkages to address different problem linkages in supply chains. Considering research findings and future perspectives, the study demonstrates the suitability of algorithms to different linked structures involving two sub-problems, which suggests further investigations on issues like the suitability of algorithms on more complex structures, benchmark methodologies, holistic goals and evaluation, process mining, game theory and dependency analysis