The Best Trail Algorithm for Assisted Navigation of Web Sites
We present an algorithm called the Best Trail Algorithm, which helps solve
the hypertext navigation problem by automating the construction of memex-like
trails through the corpus. The algorithm performs a probabilistic best-first
expansion of a set of navigation trees to find relevant and compact trails. We
describe the implementation of the algorithm, scoring methods for trails,
filtering algorithms and a new metric called potential gain, which
measures the potential of a page for future navigation opportunities. Comment: 11 pages, 11 figures
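The best-first expansion described above can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the link graph, the relevance scores, the scoring rule (mean relevance of the pages on a trail), and the names `best_trails`, `max_len`, and `budget` are all assumptions, and the probabilistic choice of which tip to expand is replaced here by a deterministic priority queue.

```python
import heapq

def best_trails(graph, relevance, start, max_len=4, budget=100, top_k=3):
    """Deterministic best-first expansion of trails starting at `start`.

    graph:     dict page -> list of linked pages (hypothetical toy data)
    relevance: dict page -> relevance score (hypothetical)
    A trail's score is the mean relevance of its pages (an assumption).
    """
    def score(trail):
        return sum(relevance[p] for p in trail) / len(trail)

    # heapq is a min-heap, so scores are negated to pop the best trail first.
    frontier = [(-score([start]), [start])]
    expanded = []
    while frontier and len(expanded) < budget:
        neg_score, trail = heapq.heappop(frontier)
        expanded.append((-neg_score, trail))
        if len(trail) >= max_len:
            continue  # trail is long enough, do not extend it further
        for nxt in graph.get(trail[-1], []):
            if nxt not in trail:  # avoid revisiting a page within one trail
                new_trail = trail + [nxt]
                heapq.heappush(frontier, (-score(new_trail), new_trail))
    # Return the highest-scoring trails among those expanded.
    expanded.sort(key=lambda item: (-item[0], item[1]))
    return expanded[:top_k]
```

The `budget` parameter caps the number of expansions, which is what makes the search best-first rather than exhaustive on larger graphs.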
Cluster Optimization for Improved Web Usage Mining
Nowadays, the World Wide Web (WWW) has become a rich and powerful source of information. At the same time, its continuous growth has made retrieving relevant information a difficult and critical task. Web usage mining is a step-wise technique for extracting useful user access patterns from the web. Web personalization uses web usage mining techniques to acquire knowledge by analyzing users' navigational patterns. Web page personalization involves clustering web pages that have similar navigation patterns for an individual user. Because clusters grow with frequent access, optimizing or shrinking cluster size becomes a chief consideration. This paper proposes a cluster optimization method based on swarm intelligence techniques. Based on the recognized user access patterns, clustering is implemented using a neural fuzzy approach, the NEF Class algorithm, and cluster optimization is implemented using the Ant Nest Mate approach
Reading the news through its structure: new hybrid connectivity based approaches
In this thesis a solution for the problem of identifying the structure of news published
by online newspapers is presented. This problem requires new approaches and algorithms
that are capable of dealing with the massive number of online publications in existence
(and that will grow in the future). The fact that news documents present a high degree of
interconnection makes this an interesting and hard problem to solve. The identification
of the structure of the news is accomplished both by descriptive methods that expose the
dimensionality of the relations between different news, and by clustering the news into
topic groups. To achieve this, the system was studied as an integrated whole from
different perspectives and with different approaches.
In the identification of news clusters and structure, and after a preparatory data collection
phase, where several online newspapers from different parts of the globe were
collected, two newspapers were chosen in particular: the Portuguese daily newspaper
Público and the British newspaper The Guardian.
In the first case, it was shown how information theory (namely variation of information)
combined with adaptive networks was able to identify topic clusters in the news published
by the Portuguese online newspaper Público.
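Variation of information compares two clusterings of the same items: VI(A, B) = H(A) + H(B) - 2 I(A; B), which is 0 when the two partitions are identical. The sketch below computes it from two label lists. The function name and the toy labels are assumptions, and it does not show how the thesis combines the measure with adaptive networks.

```python
from collections import Counter
from math import log

def variation_of_information(labels_a, labels_b):
    """Variation of information (in nats) between two clusterings.

    labels_a, labels_b: cluster labels for the same items, in the same order.
    Uses VI = H(A|B) + H(B|A), which equals H(A) + H(B) - 2 I(A; B).
    """
    n = len(labels_a)
    if n == 0 or n != len(labels_b):
        raise ValueError("label lists must be non-empty and of equal length")
    count_a = Counter(labels_a)
    count_b = Counter(labels_b)
    count_ab = Counter(zip(labels_a, labels_b))
    vi = 0.0
    for (a, b), n_ab in count_ab.items():
        p_ab = n_ab / n
        p_a = count_a[a] / n
        p_b = count_b[b] / n
        # Each joint cell contributes -p_ab * [log(p_ab/p_a) + log(p_ab/p_b)].
        vi -= p_ab * (log(p_ab / p_a) + log(p_ab / p_b))
    return vi
```

Relabelling clusters does not change the value, so `[0, 0, 1, 1]` and `[1, 1, 0, 0]` are at distance 0.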
In the second case, the structure of news published by the British newspaper The
Guardian is revealed through the construction of time series of news clustered by a
k-means process. After this approach, an unsupervised algorithm was developed that filters
out irrelevant news published online by taking into consideration the connectivity of the
news labels entered by the journalists. This novel hybrid technique is based on Q-analysis
for the construction of the filtered network, followed by a clustering technique to
identify the topical clusters. Presently this work uses a modularity optimisation clustering
technique, but this step is general enough that other hybrid approaches can be used without
losing generality.
A novel second order swarm intelligence algorithm based on Ant Colony Systems
was developed for the travelling salesman problem that is consistently better than the
traditional benchmarks. This algorithm is used to construct Hamiltonian paths over the
news published using the eccentricity of the different documents as a measure of distance.
This approach allows for an easy navigation between published stories that is dependent
on the connectivity of the underlying structure.
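For orientation, a plain Ant System for the travelling salesman problem can be sketched as below. This is the textbook first-order scheme, not the second-order algorithm developed in the thesis; the parameter names and defaults (`n_ants`, `alpha`, `beta`, `rho`) are illustrative assumptions, and the distance matrix is assumed symmetric with positive off-diagonal entries.

```python
import math
import random

def tour_length(tour, dist):
    """Length of the closed tour that returns to its starting city."""
    n = len(tour)
    return sum(dist[tour[k]][tour[(k + 1) % n]] for k in range(n))

def ant_system_tsp(dist, n_ants=10, n_iters=50, alpha=1.0, beta=2.0,
                   rho=0.1, seed=0):
    """Basic Ant System; returns (best_tour, best_length)."""
    n = len(dist)
    if n < 2:
        raise ValueError("need at least two cities")
    rng = random.Random(seed)
    tau = [[1.0] * n for _ in range(n)]  # pheromone on each edge
    best_tour, best_len = None, math.inf
    for _iteration in range(n_iters):
        tours = []
        for _ant in range(n_ants):
            tour = [rng.randrange(n)]
            unvisited = set(range(n)) - {tour[0]}
            while unvisited:
                i = tour[-1]
                cands = sorted(unvisited)
                # Favour edges with more pheromone and shorter distance.
                weights = [tau[i][j] ** alpha * (1.0 / dist[i][j]) ** beta
                           for j in cands]
                nxt = rng.choices(cands, weights=weights)[0]
                tour.append(nxt)
                unvisited.remove(nxt)
            length = tour_length(tour, dist)
            tours.append((length, tour))
            if length < best_len:
                best_tour, best_len = tour, length
        # Evaporate pheromone, then let every ant deposit on its tour.
        for i in range(n):
            for j in range(n):
                tau[i][j] *= 1.0 - rho
        for length, tour in tours:
            for k in range(n):
                a, b = tour[k], tour[(k + 1) % n]
                tau[a][b] += 1.0 / length
                tau[b][a] += 1.0 / length
    return best_tour, best_len
```

To build an open Hamiltonian path instead of a closed tour, the return edge would be dropped from `tour_length`; the distance matrix could then hold any pairwise measure, such as the eccentricity-based distance mentioned above.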
The results presented in this work show the importance of taking topic detection in
large corpora as a multitude of relations and connectivities that are not in a static state.
They also influence the way of looking at multi-dimensional ensembles, by showing that
the inclusion of the high dimension connectivities gives better results to solving a particular
problem as was the case in the clustering problem of the news published online.
In this work we solve the problem of identifying the structure of news published
online by newspapers and news agencies. This problem requires new approaches and
algorithms capable of dealing with the growing number of online publications
(which is expected to keep growing in the future). This fact, together with the high
degree of interconnection that news items present, makes this an interesting and hard
problem to solve. The identification of the structure of the news system was
achieved both through descriptive methods that expose the dimension of the
relations between the different news items and through algorithms that cluster
them into topics. To reach this goal it was necessary to study this
complex system from different perspectives and approaches.
After a preparatory phase for the data corpus, in which several newspapers published
online were collected, two newspapers were chosen in particular: Público and The Guardian.
The choice of newspapers in different languages stems from the desire to find analysis
strategies that are independent of prior knowledge about these systems.
In a first analysis, an approach based on adaptive networks
and information theory (namely variation of information) is used to identify news
topics published in the Portuguese newspaper Público.
In a second approach we analyse the structure of the news published by the British
newspaper The Guardian through the construction of time series of news. These were
then clustered through a k-means process. In addition, an algorithm was developed
that filters out, in an unsupervised way, irrelevant news items that
have low connectivity to the remaining news, using Q-analysis
followed by a clustering process. Presently this method uses modularity optimisation,
but the technique is general enough that other hybrid approaches
can be used without loss of generality of the method.
A new algorithm based on ant colony systems was also developed
for solving the travelling salesman problem, which consistently gives results
better than the traditional benchmarks. This algorithm was applied to the construction
of Hamiltonian paths over the published news, using the eccentricity obtained
from the connectivity of the studied system as a measure of distance between news items. This
approach made it possible to build a navigation system between published news that
depends on the connectivity observed in the news structure found.
The results presented in this work show the importance of analysing complex
systems in their multitude of relations and connectivities, which are not static and which
influence the way multi-dimensional systems are traditionally viewed.
It is shown that including these extra dimensions produces better results in solving
the problem of identifying the structure underlying online news publication
Ant-inspired Interaction Networks For Decentralized Vehicular Traffic Congestion Control
Mimicking the autonomous behaviors of animals and their adaptability to changing or foreign environments led to the development of swarm intelligence techniques such as ant colony optimization (ACO) and particle swarm optimization (PSO), now widely used to tackle a variety of optimization problems. The aim of this dissertation is to develop an alternative swarm intelligence model geared toward decentralized congestion avoidance and to determine qualities of the model suitable for use in a transportation network.
A microscopic multi-agent interaction network inspired by insect foraging behaviors, especially ants, was developed and consequently adapted to prioritize the avoidance of congestion, evaluated as perceived density of other agents in the immediate environment extrapolated from the occurrence of direct interactions between agents, while foraging for food outside the base/nest. The agents eschew pheromone trails or other forms of stigmergic communication in favor of these direct interactions whose rate is the primary motivator for the agents' decision making process.
The decision making process at the core of the multi-agent interaction network is consequently transferred to transportation networks utilizing vehicular ad-hoc networks (VANETs) for communication between vehicles. Direct interactions are replaced by dedicated short range communications for wireless access in vehicular environments (DSRC/WAVE) messages used for a variety of applications like left turn assist, intersection collision avoidance, or cooperative adaptive cruise control. Each vehicle correlates the traffic on the wireless network with congestion in the transportation network and consequently decides whether to reroute and, if so, what alternate route to take in a decentralized, non-deterministic manner. The algorithm has been shown to increase throughput and decrease mean travel times significantly while not requiring access to centralized infrastructure or up-to-date traffic information
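One way to picture the per-vehicle decision is a sliding-window estimate of the received message rate that drives a randomized reroute choice. The sketch below is purely illustrative and is not taken from the dissertation: the class `RerouteDecider`, the window length, the `saturation_rate` parameter, and the linear mapping from message rate to reroute probability are all assumptions.

```python
import random
from collections import deque

class RerouteDecider:
    """Hypothetical per-vehicle decision rule driven by message rate."""

    def __init__(self, window_s=10.0, saturation_rate=10.0, seed=None):
        self.window_s = window_s                # sliding window length, seconds
        self.saturation_rate = saturation_rate  # msgs/s at which rerouting is certain
        self.arrivals = deque()                 # timestamps of received messages
        self.rng = random.Random(seed)

    def record_message(self, t):
        """Register one received message at time t (seconds)."""
        self.arrivals.append(t)

    def message_rate(self, now):
        """Messages per second observed within the last window."""
        while self.arrivals and self.arrivals[0] < now - self.window_s:
            self.arrivals.popleft()
        return len(self.arrivals) / self.window_s

    def reroute_probability(self, now):
        """Map the message rate to a probability in [0, 1] (assumed linear)."""
        return min(1.0, self.message_rate(now) / self.saturation_rate)

    def should_reroute(self, now):
        """Randomized, local decision: no central coordinator is consulted."""
        return self.rng.random() < self.reroute_probability(now)
```

Because each vehicle draws its own random number, vehicles that see the same traffic do not all divert at once, which is the kind of non-deterministic behaviour the abstract describes.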
Modelling human network behaviour using simulation and optimization tools: the need for hybridization
The inclusion of stakeholder behaviour in Operations Research / Industrial Engineering (OR/IE) models has gained much attention in recent years. Behavioural and cognitive traits of people and groups have been integrated in simulation models (mainly through agent-based approaches) as well as in optimization algorithms. However, the influence of relations between different actors in human networks in particular is a broad and interdisciplinary topic that has not yet been fully investigated. This paper analyses, from an OR/IE point of view, the existing literature on behaviour-related factors in human networks. This review covers different application fields, including supply chain management, public policies in emergency situations, and Internet-based human networks. The review reveals that the methodological approach of choice (either simulation or optimization) is highly dependent on the application area. However, an integrated approach combining simulation and optimization is rarely used. Thus, the paper proposes the hybridization of simulation with optimization as one of the best strategies to incorporate human behaviour in human networks and the resulting uncertainty, randomness, and dynamism in related OR/IE models. Peer Reviewed
Mobile Ad-Hoc Networks
Being infrastructure-less and without central administration control, wireless ad-hoc networking is playing an increasingly important role in extending the coverage of traditional wireless infrastructure (cellular networks, wireless LAN, etc.). This book includes state-of-the-art techniques and solutions for wireless ad-hoc networks. It focuses on the following topics in ad-hoc networks: vehicular ad-hoc networks, security and caching, TCP in ad-hoc networks, and emerging applications. It aims to provide network engineers and researchers with design guidelines for large-scale wireless ad-hoc networks
Mining Aircraft Telemetry Data With Evolutionary Algorithms
The Ganged Phased Array Radar - Risk Mitigation System (GPAR-RMS) was a
mobile ground-based sense-and-avoid system for Unmanned Aircraft System (UAS)
operations developed by the University of North Dakota. GPAR-RMS detected proximate
aircraft with various sensor systems, including a 2D radar and an Automatic Dependent
Surveillance - Broadcast (ADS-B) receiver. Information about those aircraft was then
displayed to UAS operators via visualization software developed by the University of
North Dakota. The Risk Mitigation (RM) subsystem for GPAR-RMS was designed to
estimate the current risk of midair collision, between the Unmanned Aircraft (UA) and a
General Aviation (GA) aircraft flying under Visual Flight Rules (VFR) in the surrounding
airspace, for UAS operations in Class E airspace (i.e. below 18,000 feet MSL). However,
accurate probabilistic models for the behavior of pilots of GA aircraft flying under VFR
in Class E airspace were needed before the RM subsystem could be implemented.
In this dissertation the author presents the results of data mining an aircraft
telemetry data set from a consecutive nine month period in 2011. This aircraft telemetry
data set consisted of Flight Data Monitoring (FDM) data obtained from Garmin G1000
devices onboard every Cessna 172 in the University of North Dakota's training fleet.
Data from aircraft which were potentially within the controlled airspace surrounding
controlled airports were excluded. Also, GA aircraft in the FDM data flying in Class E
airspace were assumed to be flying under VFR, which is usually a valid assumption.
Complex subpaths were discovered from the aircraft telemetry data set using a novel
application of an ant colony algorithm. Then, probabilistic models were data mined from
those subpaths using extensions of the Genetic K-Means (GKA) and Expectation-
Maximization (EM) algorithms.
The results obtained from the subpath discovery and data mining suggest a pilot
flying a GA aircraft near to an uncontrolled airport will perform different maneuvers than
a pilot flying a GA aircraft far from an uncontrolled airport, irrespective of the altitude of
the GA aircraft. However, since only aircraft telemetry data from the University of North
Dakota's training fleet were data mined, these results are not likely to be applicable to GA
aircraft operating in a non-training environment