Comparative evaluation of query expansion methods for enhanced search on microblog data: DCU ADAPT @ SMERP 2017 workshop data challenge
The rapid growth in the availability of social media content
posted during emergency situations is creating significant interest in research into how this information can be exploited to assist emergency
relief operations and to help with emergency preparedness and in early
warning systems. We describe the DCU ADAPT Centre participation
in the microblog search data challenge at the SMERP 2017 workshop.
This task aimed to promote development of information retrieval (IR)
methods for practical challenges that need to be addressed during an
emergency event, along with comparative evaluation of the methodologies developed for this task. The task is based on a large dataset of microblogs posted during the earthquake in Italy in August 2016, together
with a set of query topics provided by the task organisers. For our participation in this task we explored use of three different IR techniques:
standard IR query expansion based on an external resource, query expansion based on WordNet and use of query expansio
Utilisation of metadata fields and query expansion in cross-lingual search of user-generated Internet video
Recent years have seen significant efforts in the area of Cross Language Information Retrieval (CLIR) for text retrieval. This work initially focused on formally published content, but more recently research has begun to concentrate on CLIR for informal social media content. However, despite the current expansion in online multimedia archives, there has been little work on CLIR for this content. While there has been some limited work on Cross-Language Video Retrieval (CLVR) for professional videos, such as documentaries or TV news broadcasts, there has, to date, been no significant investigation of CLVR for the rapidly growing archives of informal user-generated content (UGC). Key differences between such UGC and professionally produced content are the nature and structure of the textual UGC metadata associated with it, as well as the form and quality of the content itself. In this setting, retrieval effectiveness may suffer not only from translation errors common to all CLIR tasks, but also from recognition errors associated with the automatic speech recognition (ASR) systems used to transcribe the spoken content of the video, and from the informality and inconsistency of the associated user-created metadata for each video. This work proposes and evaluates techniques to improve CLIR effectiveness for such noisy UGC content. Our experimental investigation shows that different sources of evidence, e.g. the content from different fields of the structured metadata, significantly affect CLIR effectiveness. Results from our experiments also show that each metadata field has a varying robustness to query expansion (QE) and hence can have a negative impact on CLIR effectiveness. Our work proposes a novel adaptive QE technique that predicts the most reliable source for expansion and shows how this technique can be effective for improving CLIR effectiveness for UGC content.
A systematic comparison of spatial search strategies for open government datasets
Dissertation submitted in partial fulfilment of the requirements for the Degree of Master of Science in Geospatial Technologies.
Datasets produced or collected by governments are being made publicly available
for re-use. Open government data portals help realize such reuse by providing lists
of datasets and links to access those datasets. This ensures that users can search,
inspect and use the data easily.
With the rapidly increasing size of datasets in open government data portals,
as is the case with the web, finding relevant datasets with a query of a few
keywords is a challenge. Furthermore, those data portals consist not only of textual
information but also of georeferenced data that needs to be searched properly. Currently,
most popular open government data portals, such as data.gov.uk and data.gov.ie, lack
support for simultaneous thematic and spatial search. Moreover, the use of query
expansion has not yet been studied for open government datasets.
In this study we assessed the performance of different spatial search strategies and query
expansions, and their impact on user relevance judgments. To evaluate those
strategies we harvested machine-readable spatial datasets and their metadata from
three English-language open government data portals, performed metadata enhancement,
developed a prototype and performed theoretical and user evaluation.
According to the results of the evaluations, the keyword-based search strategy returned
a limited number of results but achieved the highest relevance rating. On the other hand,
aggregated spatial and thematic search increased the number of results over the baseline
keyword-based strategy, at the cost of a 1 second increase in response time and a lower
relevance rating. Moreover, for three of the four query terms, strategies based on WordNet
synonym query expansion had more highly rated first seven results than all other strategies
except the keyword-based baseline strategy.
Regarding the use of Hausdorff distance and area of overlap, since documents
were returned as results only if they overlapped with the query, the number of results
returned was the same for both spatial similarity measures. However, strategies using Hausdorff
distance had a higher relevance rating and average mean than area-of-overlap-based
strategies in three of the four queries.
In conclusion, while the spatial search strategies assessed in this study can be
used to improve existing keyword-based OGD search approaches, we recommend
that OGD developers also consider using WordNet synonym-based query expansion
and Hausdorff distance as a way of improving relevant spatial data discovery in open
government datasets using few keywords and a tolerable response time.
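The two spatial similarity measures compared in the abstract above can be illustrated with a minimal sketch. It assumes that the query and each dataset are reduced to axis-aligned bounding boxes given as (min_x, min_y, max_x, max_y), and it computes the Hausdorff distance over the box corner points only, which is an approximation chosen for brevity. The function names are hypothetical and are not taken from the dissertation's prototype.

```python
from math import hypot


def box_corners(box):
    """Return the four corner points of a (min_x, min_y, max_x, max_y) box."""
    min_x, min_y, max_x, max_y = box
    return [(min_x, min_y), (min_x, max_y), (max_x, min_y), (max_x, max_y)]


def directed_hausdorff(a, b):
    """Largest distance from any point in a to its nearest point in b."""
    return max(min(hypot(ax - bx, ay - by) for bx, by in b) for ax, ay in a)


def hausdorff(a, b):
    """Symmetric Hausdorff distance between two finite point sets."""
    return max(directed_hausdorff(a, b), directed_hausdorff(b, a))


def overlap_area(box1, box2):
    """Area of the intersection of two boxes; 0.0 if they do not overlap."""
    width = min(box1[2], box2[2]) - max(box1[0], box2[0])
    height = min(box1[3], box2[3]) - max(box1[1], box2[1])
    return max(width, 0.0) * max(height, 0.0)


# Illustrative extents only (assumed values, not data from the study).
query_box = (0, 0, 2, 2)
dataset_box = (1, 1, 3, 3)

distance = hausdorff(box_corners(query_box), box_corners(dataset_box))
shared = overlap_area(query_box, dataset_box)
```

A smaller Hausdorff distance and a larger overlap area both indicate a closer spatial match, so a ranking built on the first measure sorts ascending and one built on the second sorts descending.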
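Contribution to improving information retrieval using semantic methods: application to the Arabic language [Contribution à l’amélioration de la recherche d’information par utilisation des méthodes sémantiques: application à la langue arabe]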
An information retrieval system is a set of programs and modules that interfaces with the user to accept and interpret a query, search the index, and return a ranking of the selected documents to that user. The greatest challenge for such a system, however, is coping with the large volume of multimodal and multilingual information available through document collections or the web in order to find the items that best match users' needs. In this work we present two contributions. In the first, we propose a new approach to query reformulation in the context of Arabic information retrieval. The principle is to represent the query as a weighted semantic tree, whose nodes represent concepts (synsets) linked by semantic relations, in order to better identify the user's information need. The tree is built using pseudo-relevance feedback combined with the Arabic WordNet semantic resource. Experimental results show a good improvement in the performance of the information retrieval system. In the second contribution, we propose a new approach to building a test collection for Arabic information retrieval. The approach combines the pooling strategy, applied using search engines, with the Naïve Bayes machine learning classification algorithm. For the experiments we created a new test collection comprising a document collection of 632 documents and 165 queries with their relevance judgments across several topics. The experiments also showed the effectiveness of the Bayesian classifier for recovering document relevance judgments; moreover, it performed well after semantic enrichment of the document collection using the word2vec model.
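The pooling strategy mentioned in the abstract above can be sketched as follows. This is a generic illustration of pooling, assuming each search engine returns a ranked list of document identifiers for a query; the function name, engine names and identifiers are hypothetical, and the Naïve Bayes classification step described in the thesis is not shown.

```python
def build_pool(runs, depth):
    """Union of the top `depth` documents from every engine's ranking.

    runs maps an engine name to its ranked list of document identifiers.
    Only documents in the pool are then passed on for relevance judgment.
    """
    pool = set()
    for ranking in runs.values():
        pool.update(ranking[:depth])
    return sorted(pool)


# Hypothetical rankings from two engines for one query.
runs = {
    "engine_a": ["d1", "d2", "d3"],
    "engine_b": ["d2", "d4", "d5"],
}
pool = build_pool(runs, depth=2)
```

Restricting judgments to the pool keeps the assessment effort manageable, at the cost of treating documents outside the pool as unjudged.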
An Intelligent Multi-Agent System Approach to Automating Safety Features for On-Line Real Time Communications: Agent Mediated Information Exchange
Child safety online is a growing problem. Governmental attempts to highlight and combat this issue have not been as successful as hoped, and there are still highly publicised cases of children, young people and vulnerable adults coming to harm as a result of unsafe online practices. This thesis presents the research, design and development of a prototype system called SafeChat, which aims to provide a safer environment for children interacting in online environments.
In order to combat such a complex problem, it is necessary to integrate various artificial intelligence technologies and autonomous systems. The SafeChat prototype system discussed within this research has been implemented in Java Agent Development Environment (JADE) and utilises Protégé ontology development, reasoning and natural language processing techniques. To evaluate system performance, comprehensive testing to measure its effectiveness in detecting potential risk to the user (e.g. a child) is under continuing development. Initial results of system testing are encouraging and demonstrate its effectiveness in identifying different levels of threat during online conversation.
The potential impact of this work is considerable when SafeChat is used as a plug-in to popular communications software such as Facebook Messenger, Skype and WhatsApp. SafeChat provides a safer environment for children to communicate, identifying potential and actual threats whilst maintaining the privacy of their discourse. The SafeChat system could be easily adapted to provide autonomous solutions in other areas of online threat, such as cyberbullying and radicalisation.