Durham - a word sense disambiguation system
Ever since the 1950s, when machine translation first began to be developed, word sense disambiguation (WSD) has been regarded as a difficult problem. More recently, it has become clear that any NLP task sensitive to lexical semantics can potentially benefit from WSD, although to what extent is largely unknown. The thesis presents a novel approach to large-scale WSD. In particular, a novel knowledge source named contextual information is presented. This knowledge source uses a sub-symbolic training mechanism to learn information from the context of a sentence that can aid disambiguation. The system also takes advantage of frequency information, and the two knowledge sources are combined. The system is trained and tested on SEMCOR. A novel disambiguation algorithm is also developed, which must cope with the large number of possible sense combinations in a sentence. The algorithm aims to strike an appropriate balance between accuracy and efficiency by directing the search at the word level. The performance achieved on SEMCOR is reported, and an analysis of the various components of the system is carried out. The results achieved on this test data are pleasing, but difficult to compare with most other work in the field. For this reason the system took part in the SENSEVAL evaluation, which provided an excellent opportunity to compare WSD systems extensively. SENSEVAL is a small-scale WSD evaluation using the HECTOR lexicon; despite this, few adaptations to the system were required. The performance of the system on the SENSEVAL task is reported and has also been presented in [Hawkins, 2000].
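As an illustration of the frequency knowledge source the abstract mentions, a most-frequent-sense baseline can be sketched as follows. This is a minimal sketch with toy data, not the thesis's actual system or SEMCOR; all names and sense labels here are hypothetical.

```python
from collections import Counter

def mfs_baseline(tagged_corpus):
    """Most-frequent-sense baseline: for each word, pick the sense
    it is most often annotated with in the training data."""
    counts = {}
    for word, sense in tagged_corpus:
        counts.setdefault(word, Counter())[sense] += 1
    return {w: c.most_common(1)[0][0] for w, c in counts.items()}

# Toy sense-tagged data (hypothetical, not SEMCOR)
corpus = [("bank", "finance"), ("bank", "finance"), ("bank", "river"),
          ("bass", "fish"), ("bass", "music"), ("bass", "music")]
model = mfs_baseline(corpus)
```

Such a frequency prior is typically combined with contextual evidence rather than used alone, since it always assigns the same sense to a word regardless of the sentence.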
Proceedings of the Workshop Semantic Content Acquisition and Representation (SCAR) 2007
This is the proceedings of the Workshop on Semantic Content Acquisition and Representation, held in conjunction with NODALIDA 2007 on 24 May 2007 in Tartu, Estonia.
Semantic Similarity in a Taxonomy: An Information-Based Measure and its Application to Problems of Ambiguity in Natural Language
This article presents a measure of semantic similarity in an IS-A taxonomy based on the notion of shared information content. Experimental evaluation against a benchmark set of human similarity judgments demonstrates that the measure performs better than the traditional edge-counting approach. The article presents algorithms that take advantage of taxonomic similarity in resolving syntactic and semantic ambiguity, along with experimental results demonstrating their effectiveness.
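The information-content idea can be sketched compactly: the similarity of two concepts is the information content, -log p(c), of their most informative shared subsumer in the IS-A hierarchy. The taxonomy and probabilities below are made-up toy values, not the WordNet and corpus figures the article actually uses.

```python
import math

# Toy IS-A taxonomy with invented concept probabilities.
parents = {"dog": "canine", "wolf": "canine", "canine": "animal",
           "cat": "feline", "feline": "animal", "animal": None}
prob = {"dog": 0.05, "wolf": 0.01, "canine": 0.06,
        "cat": 0.04, "feline": 0.04, "animal": 0.2}

def ancestors(concept):
    """All concepts on the path from `concept` up to the root."""
    out = set()
    while concept is not None:
        out.add(concept)
        concept = parents[concept]
    return out

def resnik_sim(c1, c2):
    """Similarity = information content (-log p) of the most
    informative subsumer shared by the two concepts."""
    common = ancestors(c1) & ancestors(c2)
    return max(-math.log(prob[c]) for c in common)
```

Here "dog" and "wolf" share the fairly specific subsumer "canine", so they come out more similar than "dog" and "cat", whose closest shared subsumer is the very general "animal"; plain edge counting would score both pairs identically.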
POS Tagging Using Relaxation Labelling
Relaxation labelling is an optimization technique used in many fields to solve constraint satisfaction problems. The algorithm finds a combination of values for a set of variables that satisfies, to the maximum possible degree, a set of given constraints. This paper describes experiments in applying it to POS tagging and the results obtained. It also considers the possibility of applying it to word sense disambiguation.
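A minimal sketch of the idea as applied to tagging: each word holds a probability distribution over tags, and the distributions are iteratively rescaled by the support each tag receives from neighbouring words under a set of compatibility constraints. This is an illustrative toy, not the paper's implementation; the tags and compatibility values are invented.

```python
def relax(probs, compat, iters=20):
    """Relaxation labelling over a word sequence: each word's tag
    distribution is updated according to the support its tags receive
    from the neighbouring words' current distributions."""
    for _ in range(iters):
        new = []
        for i, dist in enumerate(probs):
            support = {}
            for tag in dist:
                s = 0.0
                for j in (i - 1, i + 1):          # neighbouring words
                    if 0 <= j < len(probs):
                        s += sum(p * compat.get((tag, t2), 0.0)
                                 for t2, p in probs[j].items())
                support[tag] = dist[tag] * (1.0 + s)
            z = sum(support.values())
            new.append({t: v / z for t, v in support.items()})
        probs = new
    return probs

# Toy sentence "the can": "can" is ambiguous between NOUN and VERB,
# and the constraint DET-precedes-NOUN pulls it towards NOUN.
probs = [{"DET": 1.0}, {"NOUN": 0.5, "VERB": 0.5}]
compat = {("NOUN", "DET"): 1.0, ("DET", "NOUN"): 1.0,
          ("VERB", "DET"): 0.1, ("DET", "VERB"): 0.1}
result = relax(probs, compat)
```

After a few iterations the mass on NOUN dominates, which is the sense in which the algorithm satisfies the constraints "to the maximum possible degree" rather than enforcing them absolutely.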
Automatic generation of labelled data for word sense disambiguation
Master of Science thesis.
Integration of search theories and evidential analysis to Web-wide Discovery of information for decision support
The main contribution of this research is that it addresses the issues associated with traditional information gathering and presents a novel semantic approach to Web-based discovery of previously unknown intelligence for effective decision making. It provides a comprehensive theoretical background to the proposed solution, together with a demonstration of the method's effectiveness through experimental results, showing how the quality of collected information can be significantly enhanced by previously unknown information derived from the available known facts.
The quality of decisions made in business and government relates directly to the quality of the information used to formulate the decision. This information may be retrieved from an organisation's knowledge base (Intranet) or from the World Wide Web. The purpose of this thesis is to investigate the specifics of information gathering from these sources. It studies a number of search techniques that rely on statistical and semantic analysis of unstructured information, and identifies the benefits and limitations of these techniques. It concludes that enterprise search technologies can efficiently manipulate Intranet-held information, but require complex processing of large amounts of textual information, which is neither feasible nor scalable when applied to the Web.
Based upon the search methods investigations, this thesis introduces a new semantic Web-based search method that automates the correlation of topic-related content for discovery of hitherto unknown information from disparate and widely diverse Web-sources. This method is in contrast to traditional search methods that are constrained to specific or narrowly defined topics. It addresses the three key aspects of the information: semantic closeness to search topic, information completeness, and quality. The method is based on algorithms from Natural Language Processing combined with techniques adapted from grounded theory and Dempster-Shafer theory to significantly enhance the discovery of topic related Web-sourced intelligence.
This thesis also describes the development of the new search solution, showing the integration of the mathematical methods used as well as the development of the working model. Real-world experiments demonstrate the effectiveness of the model, with supporting performance analysis showing that the quality of the extracted content is significantly enhanced compared to traditional Web-search approaches.
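The Dempster-Shafer component mentioned above can be illustrated with Dempster's rule of combination, which fuses two independent bodies of evidence and renormalises away the conflicting mass. This is a generic sketch of the rule, not the thesis's model; the sources and mass values below are hypothetical.

```python
def combine(m1, m2):
    """Dempster's rule: fuse two basic probability assignments
    (dicts mapping frozenset hypotheses to mass), renormalising
    by the non-conflicting mass."""
    out = {}
    conflict = 0.0
    for a, pa in m1.items():
        for b, pb in m2.items():
            inter = a & b
            if inter:
                out[inter] = out.get(inter, 0.0) + pa * pb
            else:
                conflict += pa * pb
    return {k: v / (1.0 - conflict) for k, v in out.items()}

# Two hypothetical independent sources of evidence about whether a
# Web document is relevant to the search topic.
RELEVANT = frozenset({"relevant"})
EITHER = frozenset({"relevant", "irrelevant"})   # "don't know" mass
m1 = {RELEVANT: 0.6, EITHER: 0.4}
m2 = {RELEVANT: 0.7, EITHER: 0.3}
fused = combine(m1, m2)
```

Because both sources lean towards relevance, the combined belief in "relevant" (0.88) exceeds either source's individual mass, which is the behaviour that makes the rule useful for corroborating evidence gathered from disparate Web sources.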
Exploring the adaptive structure of the mental lexicon
The mental lexicon is a complex structure organised in terms of phonology, semantics and syntax, among other levels. In this thesis I propose that this structure can be explained in terms of the pressures acting on it: every aspect of the organisation of the lexicon is an adaptation ultimately related to the function of language as a tool for human communication, or to the fact that language has to be learned by subsequent generations of people. A collection of methods, most of which are applied to a Spanish speech corpus, reveal structure at different levels of the lexicon.
• The patterns of intra-word distribution of phonological information may be a consequence of pressures for optimal representation of the lexicon in the brain, and of the pressure to facilitate speech segmentation.
• An analysis of perceived phonological similarity between words shows that the sharing of different aspects of phonological similarity is related to different functions. Phonological similarity perception sometimes relates to morphology (the stressed final vowel determines verb tense and person) and at other times shows processing biases (similarity in the word-initial and word-final segments is more readily perceived than in word-internal segments).
• Another similarity analysis focuses on co-occurrence in speech to create a representation of the lexicon where the position of a word is determined by the words that tend to occur in its close vicinity. Variations of this context-based lexical space naturally categorise words syntactically and semantically.
• A higher level of lexicon structure is revealed by examining the relationships between the phonological and the co-occurrence similarity spaces. A study in Spanish supports the universality of the small but significant correlation between these two spaces found in English by Shillcock, Kirby, McDonald and Brew (2001). This systematicity across levels of representation adds an extra layer of structure that may help lexical acquisition and recognition. I apply it in a new paradigm to determine the function of parameters of phonological similarity based on their relationships with the syntactic-semantic level, and find that while some aspects of a language's phonology maintain systematicity, others work against it, perhaps responding to the opposing pressure for word identification.
This thesis is an exploratory approach to the study of the mental lexicon structure that uses existing and new methodology to deepen our understanding of the relationships between language use and language structure.
Image summarisation: human action description from static images
Master's dissertation, Natural Language Processing and Language Industries, Faculdade de Ciências Humanas e Sociais, Universidade do Algarve, 2014. The subject of this master's thesis is Image Summarisation, and more specifically the automatic description of human actions from static images. The work is organised into three main phases: data collection, system implementation, and system evaluation. The dataset consists of 1287 images depicting human activities belonging to four semantic categories: "walking a dog", "riding a bike", "riding a horse" and "playing the guitar". The images were manually annotated with an approach based on the idea of crowd-sourcing, and the annotation of each image takes the form of one or two simple sentences.
The system is composed of two parts: a Content-based Image Retrieval part and a Natural Language Processing part. Given a query image, the first part retrieves a set of images perceived as visually similar, and the second part processes the annotations accompanying each of these images in order to extract common information, using a graph-merging technique over the dependency graphs of the annotated sentences. An optimal path consisting of a subject-verb-complement relation is extracted and transformed into a proper sentence by applying a set of surface processing rules.
The evaluation of the system was carried out in three different ways. Firstly, the Content-based Image Retrieval sub-system was evaluated in terms of precision and recall and compared to a random-classification baseline. To evaluate the Natural Language Processing sub-system, the Image Summarisation task was treated as a machine translation task and therefore evaluated in terms of BLEU score: given images belonging to the same semantic category as a query image, the system output was compared against the corresponding reference summaries provided during the annotation phase. Finally, the whole system was evaluated qualitatively by means of a questionnaire.
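The BLEU evaluation described above can be sketched at sentence level as the geometric mean of modified n-gram precisions, multiplied by a brevity penalty. This is a simplified, unsmoothed illustration of the metric, not the thesis's evaluation code, and the example sentences are invented.

```python
import math
from collections import Counter

def bleu(candidate, reference, max_n=2):
    """Sentence-level BLEU sketch (single reference, no smoothing):
    geometric mean of modified n-gram precisions up to max_n,
    times a brevity penalty for short candidates."""
    precisions = []
    for n in range(1, max_n + 1):
        cand = Counter(tuple(candidate[i:i + n])
                       for i in range(len(candidate) - n + 1))
        ref = Counter(tuple(reference[i:i + n])
                      for i in range(len(reference) - n + 1))
        # Clip each candidate n-gram count by its count in the reference.
        overlap = sum(min(c, ref[g]) for g, c in cand.items())
        precisions.append(overlap / max(sum(cand.values()), 1))
    if min(precisions) == 0:
        return 0.0
    bp = 1.0 if len(candidate) > len(reference) else \
         math.exp(1 - len(reference) / len(candidate))
    return bp * math.exp(sum(math.log(p) for p in precisions) / max_n)
```

A perfect match scores 1.0, while a candidate that is a correct but truncated prefix of the reference is penalised only through the brevity penalty, which is why BLEU is usually reported over a whole test set rather than single sentences.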
The evaluation leads to the conclusion that, even though the system does not always capture the correct human action or the subjects and objects involved in it, it produces summaries that are understandable and linguistically adequate.