1,022 research outputs found
Reading the news through its structure: new hybrid connectivity based approaches
In this thesis a solution for the problem of identifying the structure of news published
by online newspapers is presented. This problem requires new approaches and algorithms
that are capable of dealing with the massive number of online publications in existence
(and that will grow in the future). The fact that news documents present a high degree of
interconnection makes this an interesting and hard problem to solve. The identification
of the structure of the news is accomplished both by descriptive methods that expose the
dimensionality of the relations between different news items, and by clustering the news into
topic groups. To achieve this analysis, this integrated whole was studied from different
perspectives and with different approaches.
To identify news clusters and structure, a preparatory data collection phase first
gathered several online newspapers from different parts of the globe. Two newspapers
were then chosen in particular: the Portuguese daily newspaper
Público and the British newspaper The Guardian.
In the first case, it was shown how information theory (namely variation of information)
combined with adaptive networks was able to identify topic clusters in the news published
by the Portuguese online newspaper Público.
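As a minimal sketch of the measure named above, variation of information between two clusterings of the same items can be computed as H(A|B) + H(B|A). The function name and the list-of-labels input format are assumptions made for illustration; the thesis's adaptive-network procedure is not reproduced here.

```python
from collections import Counter
from math import log


def variation_of_information(labels_a, labels_b):
    """Variation of information (in nats) between two clusterings.

    labels_a[i] and labels_b[i] are the cluster labels that the two
    clusterings assign to item i. Hypothetical helper, for illustration.
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("both clusterings must label the same items")
    n = len(labels_a)
    count_a = Counter(labels_a)
    count_b = Counter(labels_b)
    count_ab = Counter(zip(labels_a, labels_b))

    vi = 0.0
    for (a, b), n_ab in count_ab.items():
        p_ab = n_ab / n
        p_a = count_a[a] / n
        p_b = count_b[b] / n
        # VI = -sum_ab p_ab * [log(p_ab / p_a) + log(p_ab / p_b)]
        vi -= p_ab * (log(p_ab / p_a) + log(p_ab / p_b))
    return vi
```

The value is zero when the two clusterings agree up to a relabelling and grows as they diverge, which is what makes it usable as a distance between candidate topic partitions.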
In the second case, the structure of news published by the British newspaper The
Guardian is revealed through the construction of time series of news clustered by a k-means
process. After this approach, an unsupervised algorithm was developed that filters out
irrelevant news published online by taking into consideration the connectivity of the news
labels entered by the journalists. This novel hybrid technique is based on Q-analysis
for the construction of the filtered network, followed by a clustering technique to
identify the topical clusters. Presently this work uses a modularity optimisation clustering
technique, but this step is general enough that other hybrid approaches can be used without
losing generality.
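The filtering idea can be sketched roughly as follows, assuming each article is treated as a simplex whose vertices are its journalist-entered labels, so that two articles are q-near when they share at least q + 1 labels. The function names, the dictionary input format and the rule of dropping articles with no q-near neighbour are assumptions for illustration, not the thesis's actual procedure.

```python
from itertools import combinations


def q_near_graph(article_tags, q):
    """Link two articles when they share at least q + 1 tags (a common q-face).

    article_tags maps an article identifier to its set of tags.
    """
    graph = {name: set() for name in article_tags}
    for a, b in combinations(article_tags, 2):
        if len(article_tags[a] & article_tags[b]) >= q + 1:
            graph[a].add(b)
            graph[b].add(a)
    return graph


def drop_isolated(graph):
    """Keep only the articles that have at least one q-near neighbour."""
    return {name: nbrs for name, nbrs in graph.items() if nbrs}
```

Raising q makes the filter stricter; the surviving graph could then be passed to any community detection step, such as the modularity optimisation mentioned above.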
A novel second-order swarm intelligence algorithm based on Ant Colony Systems
was developed for the travelling salesman problem; it is consistently better than the
traditional benchmarks. This algorithm is used to construct Hamiltonian paths over the
published news, using the eccentricity of the different documents as a measure of distance.
This approach allows easy navigation between published stories that depends
on the connectivity of the underlying structure.
The results presented in this work show the importance of treating topic detection in
large corpora as a multitude of relations and connectivities that are not in a static state.
They also influence the way of looking at multi-dimensional ensembles, by showing that
the inclusion of the high-dimension connectivities gives better results when solving a
particular problem, as was the case in the clustering problem of the news published online.
In this work we solve the problem of identifying the structure of news published
online by newspapers and news agencies. This problem requires new approaches and
algorithms capable of dealing with the growing number of online publications
(which is expected to keep growing in the future). This fact, together with the high
degree of interconnection that news items exhibit, makes this an interesting and
hard problem to solve. The structure of the news system was identified
both through descriptive methods that expose the dimension of the
relations between the different news items and through algorithms that cluster
them into topics. To achieve this goal it was necessary to study this
complex system from different perspectives and with different approaches.
After a preparatory phase for the data corpus, in which several newspapers
published online were collected, two newspapers in particular were chosen: Público and The Guardian.
The choice of newspapers in different languages stems from the wish to find analysis
strategies that are independent of prior knowledge about these systems.
In a first analysis, an approach based on adaptive networks
and information theory (namely variation of information) is used to identify news
topics published in the Portuguese newspaper Público.
In a second approach we analyse the structure of the news published by the British
newspaper The Guardian through the construction of time series of news. These were
then clustered using a k-means process. In addition, an algorithm was developed
that filters out, in an unsupervised manner, irrelevant news items that
have low connectivity to the remaining news, using Q-analysis
followed by a clustering process. At present this method uses modularity optimisation,
but the technique is general enough that other hybrid approaches
can be used without loss of generality of the method.
A new algorithm based on ant colony systems was also developed
for solving the travelling salesman problem; it consistently gives better
results than the traditional benchmarks. This algorithm was applied to the construction
of Hamiltonian paths over the published news, using the eccentricity obtained
from the connectivity of the studied system as a measure of the distance between news items. This
approach made it possible to build a navigation system between published news items that
depends on the connectivity observed in the news structure that was found.
The results presented in this work show the importance of analysing complex
systems in their multitude of relations and connectivities, which are not static and which
influence the way multi-dimensional systems are traditionally viewed.
It is shown that including these extra dimensions produces better results in solving
the problem of identifying the structure underlying online news publication.
Dynamic small world network topology for particle swarm optimization
Abstract: A new particle optimization algorithm with dynamic topology is proposed based on a small world network. The technique imitates the dissemination of information in a small world network by dynamically updating the neighborhood topology of particle swarm optimization (PSO). In comparison with four other classic topologies and two PSO algorithms based on small world networks, the proposed dynamic neighborhood strategy is more effective in coordinating the exploration and exploitation abilities of PSO. Simulations demonstrated that the swarm converges faster than its competitors. Meanwhile, the proposed method maintains population diversity and enhances the global search ability for a series of benchmark problems.
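The abstract does not state the topology update rule, so the following is only a sketch of one plausible reading: particles start on a ring lattice and edges are rewired at random in the Watts-Strogatz style, which could be repeated every few iterations to make the neighbourhood dynamic. The function names and the parameters `k` (ring neighbours per side) and `p` (rewiring probability) are assumptions.

```python
import random


def ring_lattice(n, k):
    """Link each particle to its k nearest ring neighbours on each side.

    Assumes n > 2 * k so that no link is counted twice.
    """
    neighbours = {i: set() for i in range(n)}
    for i in range(n):
        for step in range(1, k + 1):
            j = (i + step) % n
            neighbours[i].add(j)
            neighbours[j].add(i)
    return neighbours


def rewire(neighbours, p, rng):
    """Rewire each edge (i, j), i < j, with probability p to a random new target."""
    n = len(neighbours)
    for i in range(n):
        for j in sorted(neighbours[i]):  # snapshot, safe to mutate below
            if j > i and rng.random() < p:
                candidates = [c for c in range(n)
                              if c != i and c not in neighbours[i]]
                if not candidates:
                    continue  # i is already linked to every other particle
                new_j = rng.choice(candidates)
                neighbours[i].discard(j)
                neighbours[j].discard(i)
                neighbours[i].add(new_j)
                neighbours[new_j].add(i)
    return neighbours
```

In a PSO loop, each particle would then be attracted to the best position found within `neighbours[i]` (plus itself) rather than to a single global best.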
Distributed Adaptation Techniques for Connected Vehicles
In this PhD dissertation, we propose distributed adaptation mechanisms for connected vehicles to deal with the connectivity challenges. To understand the system behavior of the solutions for connected vehicles, we first need to characterize the operational environment. Therefore, we devised a large-scale fading model for various link types, including point-to-point vehicular communications and multi-hop connected vehicles. We explored two small-scale fading models to define the characteristics of multi-hop connected vehicles. Taking our research into multi-hop connected vehicles one step further, we propose selective information relaying to avoid message congestion due to redundant messages received by the relay vehicle. Results show that the proposed mechanism reduces messaging load by up to 75% without sacrificing environmental awareness. With the channel characteristics defined, the next research interest of this dissertation is a distributed congestion control algorithm to address the messaging overhead on the channels. We propose a combined transmit power and message rate adaptation for connected vehicles. The proposed algorithm increases environmental awareness and meets the application requirements by considering highly dynamic network characteristics. Power and rate adaptation are performed jointly so that neither affects the other negatively. Results show that the proposed algorithm can increase awareness by 20% and improve the average message rate by 18% while keeping the channel load and interference at almost the same level. As the last step of this dissertation, a distributed cooperative dynamic spectrum access technique is proposed to address the channel overhead and limited resource issues. The adaptive energy detection threshold, which is used to decide whether the channel is busy, is optimized in this work by using a computationally efficient numerical approach.
Each vehicle evaluates the available channels by voting on the information received from one-hop neighbors. An interdisciplinary approach referred to as entropy-based weighting is used for defining the neighbor credibility. Once the vehicle accesses the channel, we propose a decision mechanism for channel switching that is inspired by the optimal flower selection process employed by foraging bumblebees. Experimental results show that by using the proposed distributed cooperative spectrum sensing mechanism, the spectrum detection error converges to zero.
Improved adaptive semi-unsupervised weighted oversampling (IA-SUWO) using sparsity factor for imbalanced datasets
The imbalanced data problem is common in data mining due to the skewed nature of data, which negatively affects the classification process in machine learning. As a preprocessing step, oversampling techniques have significantly benefitted the imbalanced domain: artificial data is generated in the minority class to increase the number of samples and balance the distribution of samples across both classes. However, existing oversampling techniques suffer from overfitting and over-generalization problems, which reduce classifier performance. Although many clustering-based oversampling techniques largely overcome these problems, most of them are not able to produce the appropriate number of synthetic samples in minority clusters. This study proposes an improved Adaptive Semi-unsupervised Weighted Oversampling (IA-SUWO) technique that uses a sparsity factor to determine the sparse minority samples in each minority cluster. The technique considers the sparse minority samples that are far from the decision boundary. These samples also carry important information for learning the minority class; if they are also considered for oversampling, the imbalance ratio is reduced further and the learnability of the classifiers could be enhanced. The outcomes of the proposed approach have been compared with existing oversampling techniques such as SMOTE, Borderline-SMOTE, Safe-level SMOTE, and the standard A-SUWO technique in terms of accuracy. The comparative analysis revealed that the accuracy of the proposed oversampling approach increased on average by 5%, from 85% to 90%, relative to the existing techniques.
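For orientation, the interpolation step shared by SMOTE-style oversamplers can be sketched as below. This is plain SMOTE-like interpolation between a minority sample and one of its nearest minority neighbours, not IA-SUWO itself: the clustering, weighting and sparsity factor described in the abstract are not reproduced, and the function name and parameters are assumptions.

```python
import random
from math import dist


def smote_like_samples(minority, n_new, k=3, rng=None):
    """Generate n_new synthetic points by interpolating between minority samples.

    minority is a list of equal-length numeric tuples (the minority class).
    """
    rng = rng or random.Random()
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # k nearest minority neighbours of the base point, excluding itself
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: dist(base, p))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append(tuple(b + gap * (q - b)
                               for b, q in zip(base, neighbour)))
    return synthetic
```

Every synthetic point lies on a line segment between two existing minority samples, which is why where those samples sit (near the boundary or in sparse regions) matters so much for the techniques compared above.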
DATA-DRIVEN ANALYTICAL MODELS FOR IDENTIFICATION AND PREDICTION OF OPPORTUNITIES AND THREATS
During the lifecycle of mega engineering projects, such as energy facilities,
infrastructure projects, or data centers, the executives in charge should take into account
the potential opportunities and threats that could affect the execution of such projects.
These opportunities and threats can arise from different domains, including, for
example, geopolitical, economic, or financial ones, and can have an impact on different
entities, such as countries, cities, or companies. The goal of this research is to provide
a new approach to identify and predict opportunities and threats using large and diverse
data sets and ensemble Long Short-Term Memory (LSTM) neural network models to
inform domain-specific foresights. In addition to predicting the opportunities and
threats, this research proposes new techniques to support decision-makers in deduction
and reasoning. The proposed models and results provide structured output to
inform the executive decision-making process concerning large engineering projects
(LEPs). This research proposes new techniques that provide not only reliable time-series
predictions but also uncertainty quantification to help make more informed decisions.
The proposed ensemble framework consists of the following components: first,
processed domain knowledge is used to extract a set of entity-domain features; second,
structured learning based on Dynamic Time Warping (DTW), to learn similarity
between sequences, and Hierarchical Clustering Analysis (HCA) is used to determine
which features are relevant for a given prediction problem; and finally, an automated
decision based on the input and structured learning from the DTW-HCA is used to
build a training dataset, which is fed into a deep LSTM neural network for time-series
predictions. A set of deeper ensemble programs, such as Monte Carlo
Simulations and Time Label Assignment, is proposed to offer a controlled setting for
assessing the impact of external shocks and a temporal alert system, respectively. The developed
model can be used to inform decision makers about the set of opportunities and threats
that their entities and assets face as a result of being engaged in an LEP, accounting for
epistemic uncertainty.
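The DTW similarity step mentioned above can be illustrated with the classic dynamic-programming recurrence. This is a textbook sketch using absolute difference as the local cost; the function name is an assumption, and the abstract does not say which DTW variant or local cost the authors used.

```python
def dtw_distance(a, b):
    """Dynamic Time Warping distance between two numeric sequences."""
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] = best cumulative cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            cost[i][j] = local + min(cost[i - 1][j],      # repeat b[j-1]
                                     cost[i][j - 1],      # repeat a[i-1]
                                     cost[i - 1][j - 1])  # advance both
    return cost[n][m]
```

A matrix of pairwise DTW distances between feature time series is the kind of input a hierarchical clustering step would then group.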
Particle Swarm Optimization
Particle swarm optimization (PSO) is a population-based stochastic optimization technique influenced by the social behavior of bird flocking or fish schooling. PSO shares many similarities with evolutionary computation techniques such as Genetic Algorithms (GA). The system is initialized with a population of random solutions and searches for optima by updating generations. However, unlike GA, PSO has no evolution operators such as crossover and mutation. In PSO, the potential solutions, called particles, fly through the problem space by following the current optimum particles. This book represents the contributions of the top researchers in this field and will serve as a valuable tool for professionals in this interdisciplinary field.
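The update described above can be sketched as a minimal global-best PSO. The inertia weight `w`, the acceleration coefficients `c1` and `c2`, the search bounds and the `sphere` test function are illustrative assumptions, not values taken from the book.

```python
import random


def pso_minimise(objective, dim, n_particles=30, n_iters=200,
                 bounds=(-5.0, 5.0), w=0.7, c1=1.5, c2=1.5, rng=None):
    """Minimise objective over dim dimensions with a basic global-best PSO."""
    rng = rng or random.Random()
    lo, hi = bounds
    # Random initial positions, zero initial velocities
    pos = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_val = [objective(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]

    for _ in range(n_iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Inertia + pull towards own best + pull towards swarm best
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                pos[i][d] += vel[i][d]
            val = objective(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val


def sphere(x):
    """Simple test objective with its minimum of 0 at the origin."""
    return sum(v * v for v in x)
```

Calling `pso_minimise(sphere, dim=2)` returns the best position found and its objective value; note that there is no crossover or mutation step, only the velocity and position updates.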