
    Reading the news through its structure: new hybrid connectivity based approaches

    This thesis presents a solution to the problem of identifying the structure of news published by online newspapers. The problem requires new approaches and algorithms capable of dealing with the massive number of online publications in existence, a number that will only grow. The high degree of interconnection between news documents makes this an interesting and hard problem to solve. The structure of the news is identified both by descriptive methods that expose the dimensionality of the relations between different news items, and by clustering the news into topic groups. To carry out this analysis, the system was studied from several perspectives and with several approaches. After a preparatory data-collection phase, in which several online newspapers from different parts of the globe were gathered, two newspapers were chosen: the Portuguese daily Público and the British newspaper The Guardian. The choice of newspapers in different languages reflects the aim of finding analysis strategies that are independent of prior knowledge about these systems. In the first case, it was shown how information theory (namely variation of information) combined with adaptive networks identifies topic clusters in the news published by Público. In the second case, the structure of the news published by The Guardian is revealed through the construction of time series of news items clustered by a k-means process. An unsupervised algorithm was then developed that filters out irrelevant news published online by taking into consideration the connectivity of the news labels entered by journalists. This novel hybrid technique is based on Q-analysis for the construction of the filtered network, followed by a clustering step that identifies the topical clusters. The present work uses a modularity-optimisation clustering technique, but this step is general enough that other hybrid approaches can be substituted without loss of generality. A novel second-order swarm intelligence algorithm based on Ant Colony Systems was also developed for the travelling salesman problem and is consistently better than the traditional benchmarks. This algorithm is used to construct Hamiltonian paths over the published news, using the eccentricity of the different documents as a measure of distance, which allows easy navigation between published stories that depends on the connectivity of the underlying structure. The results presented in this work show the importance of treating topic detection in large corpora as a multitude of relations and connectivities that are not static. They also influence the way multi-dimensional ensembles are viewed, by showing that including the high-dimensional connectivities gives better results, as was the case in the clustering of news published online
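
    As a minimal illustration of the information-theoretic measure named above, the Python sketch below computes the variation of information between two partitions of the same set of news items; the formula VI(A, B) = H(A) + H(B) - 2I(A; B) is standard, but the cluster labels are invented for illustration and this is not the thesis's actual implementation.

        from collections import Counter
        from math import log

        def variation_of_information(part_a, part_b):
            # Partitions are given as equal-length lists of cluster labels
            # for the same items: VI(A, B) = H(A) + H(B) - 2 * I(A; B).
            n = len(part_a)
            pa = Counter(part_a)                   # marginal counts, partition A
            pb = Counter(part_b)                   # marginal counts, partition B
            joint = Counter(zip(part_a, part_b))   # joint membership counts
            h_a = -sum(c / n * log(c / n) for c in pa.values())
            h_b = -sum(c / n * log(c / n) for c in pb.values())
            mi = sum(c / n * log((c / n) / ((pa[a] / n) * (pb[b] / n)))
                     for (a, b), c in joint.items())
            return h_a + h_b - 2 * mi

        # Hypothetical example: two clusterings of six news articles.
        print(variation_of_information([0, 0, 1, 1, 2, 2], [0, 0, 0, 1, 1, 1]))

    A low value indicates that the two topic clusterings carry nearly the same information, which is what makes the measure useful for comparing clusterings obtained at different times or by different methods.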

    Visual and computational analysis of structure-activity relationships in high-throughput screening data

    Novel analytic methods are required to assimilate the large volumes of structural and bioassay data generated by combinatorial chemistry and high-throughput screening programmes in the pharmaceutical and agrochemical industries. This paper reviews recent work in visualisation and data mining that can be used to develop structure-activity relationships from such chemical/biological datasets
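
    A recurring primitive in such visualisation and data-mining work is a similarity measure between compound fingerprints. The minimal Python sketch below computes the Tanimoto coefficient often used to relate structure to activity; the fingerprints and compound names are invented, and a real system would derive them with a cheminformatics toolkit.

        def tanimoto(fp_a, fp_b):
            # Tanimoto coefficient of two binary fingerprints given as
            # sets of 'on' bit positions: |A & B| / |A | B|.
            union = len(fp_a | fp_b)
            return len(fp_a & fp_b) / union if union else 0.0

        # Hypothetical fingerprints for three screened compounds.
        compounds = {"cpd_1": {1, 4, 7, 9}, "cpd_2": {1, 4, 8, 9}, "cpd_3": {2, 3, 5}}
        for a in compounds:
            for b in compounds:
                if a < b:
                    print(a, b, round(tanimoto(compounds[a], compounds[b]), 2))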

    Discovering Higher-order SNP Interactions in High-dimensional Genomic Data

    In this thesis, a multifactor dimensionality reduction method based on associative classification is employed to identify higher-order SNP interactions, in order to enhance understanding of the genetic architecture of complex diseases. The thesis further explores the application of deep learning techniques, providing new clues for interaction analysis. The performance of the deep learning method is maximized by unifying deep neural networks with a random forest, so that reliable interactions are obtained in the presence of noise
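
    The core multifactor dimensionality reduction step can be sketched briefly: for a candidate pair of SNPs, each genotype combination is pooled into a high-risk or low-risk group according to its case/control ratio. The Python sketch below uses invented genotype data and a simplified labelling rule, not the thesis's actual pipeline.

        from collections import defaultdict

        def mdr_high_risk_cells(genotypes, labels, snp_i, snp_j, threshold=1.0):
            # Mark a (genotype_i, genotype_j) cell high-risk when its
            # case count exceeds `threshold` times its control count.
            cases, controls = defaultdict(int), defaultdict(int)
            for row, y in zip(genotypes, labels):
                cell = (row[snp_i], row[snp_j])
                if y == 1:
                    cases[cell] += 1
                else:
                    controls[cell] += 1
            return {cell for cell in set(cases) | set(controls)
                    if cases[cell] > threshold * controls[cell]}

        # Hypothetical data: rows are genotype vectors (0/1/2), label 1 = case.
        geno = [[0, 2, 1], [0, 2, 0], [1, 1, 1], [2, 0, 2], [2, 0, 1]]
        y = [1, 1, 0, 0, 1]
        print(mdr_high_risk_cells(geno, y, snp_i=0, snp_j=1))

    In full MDR, the pooled high-risk/low-risk attribute is then evaluated with cross-validation, and the SNP combination whose pooled attribute classifies best is reported as the candidate interaction.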

    Feature Grouping-based Feature Selection


    AI Solutions for MDS: Artificial Intelligence Techniques for Misuse Detection and Localisation in Telecommunication Environments

    This report considers the application of Artificial Intelligence (AI) techniques to the problem of misuse detection and misuse localisation within telecommunications environments. A broad survey of techniques is provided, covering inter alia rule-based systems, model-based systems, case-based reasoning, pattern matching, clustering and feature extraction, artificial neural networks, genetic algorithms, artificial immune systems, agent-based systems, data mining and a variety of hybrid approaches. The report then considers the central issue of event correlation, which is at the heart of many misuse detection and localisation systems. The notion of inferring misuse by correlating individual, temporally distributed events within a multiple-data-stream environment is explored, and a range of techniques is examined, covering model-based approaches, 'programmed' AI and machine learning paradigms. It is found that, in general, correlation is best achieved via rule-based approaches, but that these suffer from a number of drawbacks, such as the difficulty of developing and maintaining an appropriate knowledge base, and a limited ability to generalise from known misuses to new, unseen misuses. Two distinct approaches are evident. One attempts to encode knowledge of known misuses, typically within rules, and uses this to screen events; this approach cannot generally detect misuses for which it has not been programmed, i.e. it is prone to issuing false negatives. The other attempts to learn the features of event patterns that constitute normal behaviour and, by observing patterns that do not match expected behaviour, detects when a misuse has occurred; this approach is prone to issuing false positives, i.e. inferring misuse from innocent patterns of behaviour that the system was not trained to recognise. Contemporary approaches are seen to favour hybridisation, often combining detection or localisation mechanisms for both abnormal and normal behaviour, the former to capture known cases of misuse, the latter to capture unknown cases. In some systems these mechanisms even update each other to increase detection rates and lower false-positive rates. It is concluded that hybridisation offers the most promising future direction, but that a rule- or state-based component is likely to remain, being the most natural approach to the correlation of complex events. The challenge, then, is to mitigate the weaknesses of canonical programmed systems so that learning, generalisation and adaptation are more readily facilitated
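
    As a concrete, deliberately simplified illustration of the rule-based event correlation the report discusses (the event types and the rule itself are invented), the Python sketch below fires an alert when a given ordered pattern of event types occurs within a time window in a single event stream.

        from dataclasses import dataclass

        @dataclass
        class Event:
            t: float      # timestamp in seconds
            kind: str     # event type, e.g. "login_fail"

        def correlate(events, pattern, window):
            # Toy correlator: alert when the kinds in `pattern` occur in
            # order within `window` seconds of the first matching event.
            alerts = []
            for i, first in enumerate(events):
                if first.kind != pattern[0]:
                    continue
                needed, t0 = list(pattern[1:]), first.t
                for ev in events[i + 1:]:
                    if ev.t - t0 > window:
                        break
                    if needed and ev.kind == needed[0]:
                        needed.pop(0)
                if not needed:
                    alerts.append(first.t)
            return alerts

        # Hypothetical misuse rule: three failed logins then a success within 60 s.
        stream = [Event(0, "login_fail"), Event(5, "login_fail"),
                  Event(9, "login_fail"), Event(12, "login_ok")]
        print(correlate(stream, ["login_fail"] * 3 + ["login_ok"], window=60))

    The sketch also makes the report's criticism tangible: the rule only detects the misuse it encodes, so any unseen attack pattern yields a false negative unless a learning component complements it.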

    Probabilistic modelling of oil rig drilling operations for business decision support: a real world application of Bayesian networks and computational intelligence.

    This work investigates the use of evolved Bayesian network learning algorithms based on computational-intelligence meta-heuristics. These algorithms are applied to a new domain using exclusive data made available to this project through an industry partnership with ODS-Petrodata, a business intelligence company in Aberdeen, Scotland. This research proposes statistical models that serve as a foundation for a novel operational tool for forecasting the performance of rig drilling operations. A prototype tool able to forecast the future performance of a drilling operation is created using the obtained data, the statistical model and the experts' domain knowledge. This work makes the following contributions: applying K2GA and Bayesian networks to a real-world industry problem; developing a well-performing and adaptive solution to forecast oil drilling rig performance; using the knowledge of industry experts to guide the creation of competitive models; creating models able to forecast oil drilling rig performance consistently with nearly 80% forecast accuracy, using either logistic regression or Bayesian network learning with genetic algorithms; introducing the node juxtaposition analysis graph, which visualises the frequency with which node links appear in a set of orderings, providing new insights when analysing node-ordering landscapes; exploring the correlation between model score and model predictive accuracy, and showing that the two do not correlate; exploring a method for feature selection using multiple algorithms, drastically reducing modelling time; and proposing new fixed-structure Bayesian network learning algorithms for node-ordering search-space exploration. Finally, this work proposes real-world applications for the models based on current industry needs, such as recommender systems, an oil drilling rig selection tool, user-ready rig performance forecasting software and rig scheduling tools
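
    A minimal sketch of the node-ordering search performed by K2GA-style systems: a genetic algorithm evolves permutations of the network's variables, and each ordering is scored by a fitness function. The score below is a toy placeholder; the real system scores an ordering by running the K2 structure-learning algorithm on the data.

        import random

        def evolve_orderings(nodes, score, pop_size=20, generations=50, seed=0):
            # Toy GA over node orderings: tournament selection of two
            # parents plus a single swap mutation per child.
            rng = random.Random(seed)
            pop = [rng.sample(nodes, len(nodes)) for _ in range(pop_size)]
            for _ in range(generations):
                nxt = []
                for _ in range(pop_size):
                    a, b = rng.sample(pop, 2)                # tournament of two
                    child = list(max(a, b, key=score))
                    i, j = rng.sample(range(len(child)), 2)
                    child[i], child[j] = child[j], child[i]  # swap mutation
                    nxt.append(child)
                pop = nxt
            return max(pop, key=score)

        def toy_score(order):
            # Placeholder fitness preferring near-alphabetical orderings;
            # a real implementation would return the K2 network score.
            return -sum(abs(i - sorted(order).index(x)) for i, x in enumerate(order))

        print(evolve_orderings(["A", "B", "C", "D", "E"], toy_score))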

    Aplicaciones de la teoría de la información y la inteligencia artificial al testing de software [Applications of information theory and artificial intelligence to software testing]

    Unpublished doctoral thesis, Universidad Complutense de Madrid, Facultad de Informática, Departamento de Ingeniería de Sistemas Informáticos y de Computación, defended 4 May 2022. Software testing is a critical field for the software industry, as it provides the main tools used to ensure the reliability of the software that is produced. Currently, more than 50% of the time and resources required to create a software product are devoted to testing tasks, from unit testing to system testing. Moreover, there is huge interest in automating this field, as software grows larger and the amount of required testing increases. However, software testing is not only an industry-oriented field; it is also a genuinely interesting one, with a noble goal (improving the reliability of software systems) that is at the same time full of problems to solve...
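
    As one purely illustrative reading of how information theory can enter testing (an invented sketch, not the method of this thesis), the Python snippet below ranks test cases by the Shannon entropy of their branch-coverage profiles, preferring tests whose behaviour is most varied.

        from collections import Counter
        from math import log2

        def entropy(observations):
            # Shannon entropy (in bits) of a sequence of discrete observations.
            n = len(observations)
            return -sum(c / n * log2(c / n) for c in Counter(observations).values())

        # Hypothetical coverage profiles: branch ids hit by each test case.
        profiles = {"test_a": ["b1", "b1", "b2"], "test_b": ["b1", "b2", "b3"],
                    "test_c": ["b1", "b1", "b1"]}
        for name in sorted(profiles, key=lambda t: entropy(profiles[t]), reverse=True):
            print(name, round(entropy(profiles[name]), 3))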

    Fuzzy Rules from Ant-Inspired Computation

    Centre for Intelligent Systems and their Applications
    This research identifies and investigates major issues in inducing accurate and comprehensible fuzzy rules from datasets. A review of the current literature on fuzzy rulebase induction uncovers two significant issues: A. there is a tradeoff between inducing accurate fuzzy rules and inducing comprehensible fuzzy rules; and B. a common strategy for the induction of fuzzy rulebases, that of iterative rule learning, where the rules are generated one by one and independently of each other, may not be an optimal one.
    FRANTIC, a system that provides a framework for exploring the claims above, is developed. At its core lies a mechanism for creating individual fuzzy rules, based on a significantly modified social-insect-inspired heuristic for combinatorial optimisation, Ant Colony Optimisation. The rule discovery mechanism is utilised in two very different strategies for the induction of a complete fuzzy rulebase: 1. the first follows the common iterative rule learning approach for the induction of crisp and fuzzy rules; 2. the second was designed during this research explicitly for the induction of a fuzzy rulebase, and generates all rules in parallel.
    Both strategies have been tested on a number of classification problems, including medical diagnosis and industrial plant fault detection, and compared against other crisp or fuzzy induction algorithms that use more well-established approaches. The results challenge statement A above, presenting evidence that one criterion need not be met at the expense of the other. This research also uncovers the cost that is paid, namely computational expenditure, and makes concrete suggestions on how this may be reduced.
    With regard to statement B, until now little or no evidence has been put forward to support or disprove the claim. The results of this research indicate that the second, simultaneous strategy offers definite advantages that the iterative one does not, including improved accuracy over a wide range of values for several key system parameters. However, both approaches also fare well when compared to other learning algorithms. This latter fact is due to the rule discovery mechanism itself, the adapted Ant Colony Optimisation algorithm, which affords several additional advantages: a simple mechanism within the rule construction process that enables it to cope with datasets that have an imbalanced class distribution, and another for controlling the amount of fit to the training data. In addition, several system parameters have been designed to be semi-autonomous so as to avoid unnecessary user intervention, and in future work the social insect metaphor may be exploited and extended further to deal with industrial-strength data mining issues involving large volumes of data, and distributed and/or heterogeneous databases
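
    A stripped-down sketch of the pheromone-guided rule construction at the heart of such a system (the attribute terms, quality measure and update schedule below are invented; FRANTIC's actual heuristics and fuzzy semantics are richer): an ant assembles a rule antecedent term by term, with selection probabilities proportional to pheromone levels that are reinforced by good rules and evaporated over time.

        import random

        def construct_rule(terms, pheromone, rng, max_terms=2):
            # One ant builds an antecedent by sampling terms with
            # probability proportional to their pheromone level.
            antecedent, candidates = [], list(terms)
            for _ in range(max_terms):
                weights = [pheromone[t] for t in candidates]
                choice = rng.choices(candidates, weights=weights)[0]
                antecedent.append(choice)
                candidates.remove(choice)
            return antecedent

        # Hypothetical fuzzy terms and a stand-in rule-quality measure.
        terms = ["temp=high", "temp=low", "pressure=high", "flow=low"]
        pheromone = {t: 1.0 for t in terms}
        rng = random.Random(1)

        def quality(rule):
            return 1.0 if "temp=high" in rule else 0.1  # stand-in for accuracy

        for _ in range(30):                    # construct-and-update loop
            rule = construct_rule(terms, pheromone, rng)
            for t in rule:
                pheromone[t] += quality(rule)  # reinforce terms in good rules
            for t in terms:
                pheromone[t] *= 0.9            # evaporation
        print(max(pheromone, key=pheromone.get))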

    Applied Metaheuristic Computing

    For decades, Applied Metaheuristic Computing (AMC) has been a prevailing optimization technique for tackling perplexing engineering and business problems, such as scheduling, routing, ordering, bin packing, assignment and facility layout planning. This is partly because classic exact methods are constrained by prior assumptions, and partly because heuristics are problem-dependent and lack generalization. AMC, on the contrary, guides the course of low-level heuristics to search beyond the local optimality that impairs traditional computation methods. This topic series has collected quality papers proposing cutting-edge methodology and innovative applications that drive the advances of AMC
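
    The claim that metaheuristics guide low-level moves past local optima is easiest to see in the smallest possible example: simulated annealing on a one-dimensional multimodal function (the objective, cooling schedule and parameters below are invented for illustration).

        import math, random

        def simulated_annealing(f, x0, steps=5000, t0=2.0, seed=0):
            # Minimise f, accepting uphill moves with probability
            # exp(-delta / T) so the search can escape local minima.
            rng = random.Random(seed)
            x, fx = x0, f(x0)
            best, fbest = x, fx
            for k in range(steps):
                temp = t0 * (1 - k / steps) + 1e-9   # linear cooling schedule
                cand = x + rng.gauss(0, 0.5)         # low-level neighbourhood move
                fc = f(cand)
                if fc < fx or rng.random() < math.exp(-(fc - fx) / temp):
                    x, fx = cand, fc
                    if fx < fbest:
                        best, fbest = x, fx
            return best, fbest

        # Toy multimodal objective with its deepest basin near x = 3.
        def f(x):
            return (x - 3) ** 2 + 2 * math.sin(5 * x)

        print(simulated_annealing(f, x0=-5.0))

    A pure hill-climber started at x0 = -5 would tend to settle in the first local basin it reaches; the temperature-controlled acceptance rule is what lets the metaheuristic keep moving towards the global one.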

    Holistic, data-driven, service and supply chain optimisation: linked optimisation.

    The intensity of competition and technological advancement in the business environment has made companies collaborate and cooperate as a means of survival. This creates a chain of companies and business components with unified business objectives. However, managing the decision-making processes (such as scheduling, ordering, delivering and allocating) across the various business components while maintaining a holistic objective is a huge business challenge, as these operations are complex and dynamic. This is because the overall chain of business processes is widely distributed across all the supply chain participants; no individual collaborator has a complete overview of the processes. Increasingly, such decisions are automated and strongly supported by optimisation algorithms: manufacturing optimisation, B2B ordering, financial trading, transportation scheduling and allocation. However, most of these algorithms do not incorporate the complexity associated with interacting decision-making systems like supply chains. It is well known that decisions made at one point in a supply chain can have significant consequences that ripple through linked production and transportation systems. Recently, global shocks to supply chains (COVID-19, climate change, the blockage of the Suez Canal) have demonstrated the importance of these interdependencies, and the need to create supply chains that are more resilient and have a significantly reduced impact on the environment. Such interacting decision-making systems need to be considered through an optimisation process; however, the interactions between them are not modelled. We therefore believe that modelling such interactions is an opportunity to provide computational extensions to current optimisation paradigms. This research study aims to develop a general framework for formulating and solving holistic, data-driven optimisation problems in service and supply chains. The research achieved this aim and contributes to scholarship by, firstly, considering the complexities of supply chain problems from a linked-problem perspective, which leads to a formalism for characterising linked optimisation problems as a model for supply chains. Secondly, it adopts a method for creating a linked optimisation problem benchmark by linking existing classical benchmark sets, using a mix of classical optimisation problems, typically relating to supply chain decision problems, to describe different modes of linkage. Thirdly, since several techniques for linking fragmented supply chain data have been proposed in the literature to identify data relationships, the thesis explores some of these techniques and combines them in specific ways to improve the data discovery process. Lastly, it investigates resilient state-of-the-art optimisation algorithms presented in the literature, and designs suitable algorithmic approaches, inspired by existing algorithms and the nature of the problem linkages, to address different problem linkages in supply chains.
Considering the research findings and future perspectives, the study demonstrates the suitability of these algorithms for different linked structures involving two sub-problems, which suggests further investigation of issues such as the suitability of algorithms for more complex structures, benchmark methodologies, holistic goals and evaluation, process mining, game theory and dependency analysis
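
    A minimal sketch of the linked-problem idea with two sub-problems (all costs and capacities invented): an upstream lot-sizing decision fixes the volume that a downstream delivery problem must handle, so the two can only be evaluated jointly rather than optimised in isolation.

        from itertools import product

        # Sub-problem 1: choose a production quantity per product (toy lot sizing).
        products = {"p1": 4.0, "p2": 3.0}           # unit production cost
        options = {"p1": [10, 20], "p2": [10, 20]}  # feasible lot sizes

        # Sub-problem 2: delivery cost driven by total produced volume (the link).
        def delivery_cost(total_units, truck_capacity=25, cost_per_truck=50.0):
            trucks = -(-total_units // truck_capacity)   # ceiling division
            return trucks * cost_per_truck

        best = None
        for lots in product(*(options[p] for p in products)):
            production = sum(q * products[p] for p, q in zip(products, lots))
            cost = production + delivery_cost(sum(lots))  # joint, linked objective
            if best is None or cost < best[0]:
                best = (cost, dict(zip(products, lots)))
        print(best)

    The downstream cost depends on the upstream decision only through the link variable (total volume), which is exactly the kind of interdependency the formalism for linked optimisation problems is meant to characterise.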