5 research outputs found

    Identifying and Extracting Named Entities from Wikipedia Database Using Entity Infoboxes

    An approach to named entity classification based on Wikipedia article infoboxes is described in this paper. It identifies the three fundamental named entity types: Person, Location, and Organization. An entity is classified by matching the attributes extracted from its article's infobox against core entity attributes built from Wikipedia infobox templates. Experimental results showed that the classifier achieves high accuracy and F-measure scores of 97%. Based on this approach, a database of around 1.6 million entities of these three types was created from the 2014-02-03 Wikipedia dump. Experiments on the CoNLL-2003 shared-task named entity recognition (NER) dataset showed the system's outstanding performance in comparison to three different state-of-the-art systems.
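    The classification step described above — matching an article's infobox attributes against per-type core attributes — can be sketched as follows. This is a minimal illustration, not the paper's actual method: the core attribute sets and the overlap-based decision rule are assumptions.

    ```python
    # Hypothetical core attribute sets per entity type; the paper derives these
    # from Wikipedia Infobox Templates, the sets below are illustrative only.
    CORE_ATTRIBUTES = {
        "Person": {"birth_date", "birth_place", "occupation", "nationality"},
        "Location": {"coordinates", "population", "area", "country"},
        "Organization": {"founded", "headquarters", "industry", "num_employees"},
    }

    def classify_entity(infobox_attributes):
        """Return the entity type whose core attributes best overlap the infobox."""
        scores = {
            etype: len(core & set(infobox_attributes))
            for etype, core in CORE_ATTRIBUTES.items()
        }
        best_type, best_score = max(scores.items(), key=lambda kv: kv[1])
        return best_type if best_score > 0 else None

    print(classify_entity({"birth_date", "occupation", "spouse"}))  # -> Person
    ```

    A real implementation would also handle attribute-name normalisation and ties between types; the sketch only shows the core matching idea.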

    Effectively Grouping Named Entities From Click-Through Data Into Clusters Of Generated Keywords

    Many studies show that named entities are closely related to users' search behaviours, which has brought increasing interest in studying named entities in search logs. This paper addresses the problem of forming fine-grained semantic clusters of named entities within a broad domain such as “company”, and of generating keywords for each cluster that help users interpret the semantic information embedded in the cluster. Using contexts, URLs, and session IDs as features of named entities, the three-phase approach proposed in this paper first disambiguates named entities according to these features. It then weights the features with a novel measure, calculates the semantic similarity between named entities in the weighted feature space, and clusters the entities accordingly. Finally, keywords for the clusters are generated using a text-oriented graph-ranking algorithm. Each phase of the proposed approach solves problems not addressed in existing work, and experimental results obtained from real click-through data demonstrate the effectiveness of the approach.
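    The second phase — computing semantic similarity between entities in a weighted feature space — is commonly done with cosine similarity over sparse feature vectors. The sketch below assumes that representation; the feature names, weights, and example entities are invented for illustration, and the paper's "novel measurement" for weighting is not reproduced here.

    ```python
    import math

    def cosine_similarity(vec_a, vec_b):
        """Cosine similarity between two sparse feature vectors (dict: feature -> weight)."""
        shared = set(vec_a) & set(vec_b)
        dot = sum(vec_a[f] * vec_b[f] for f in shared)
        norm_a = math.sqrt(sum(v * v for v in vec_a.values()))
        norm_b = math.sqrt(sum(v * v for v in vec_b.values()))
        if norm_a == 0 or norm_b == 0:
            return 0.0
        return dot / (norm_a * norm_b)

    # Hypothetical weighted features (context terms, URLs, session IDs) per entity.
    microsoft = {"ctx:software": 2.0, "url:msdn.com": 1.5, "session:42": 0.5}
    google = {"ctx:software": 1.8, "url:google.com": 1.5, "session:42": 0.5}
    shell_oil = {"ctx:petrol": 2.5, "url:shell.com": 1.0}

    # Entities sharing contexts and sessions score higher, so a threshold- or
    # graph-based clustering step can group them into the same semantic cluster.
    print(cosine_similarity(microsoft, google) > cosine_similarity(microsoft, shell_oil))  # True
    ```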

    Semantic Dissimilarity Metric Based on Wikipedia

    Despite the vast amount of information now available, it is not always easy to obtain the knowledge one seeks, owing to the difficulty of cataloguing information. Current “knowledge discovery” systems focus on searching for identical words, an approach with several limitations, among them a lack of interpretive capability. Understanding the semantic meaning of a set of expressions is a human trait that is difficult to replicate in computational systems. The main objective of this work is the creation of a system for computing semantic similarity between abstract classes, built on top of a knowledge ontology. To achieve this goal, we first identified and analysed the need for a machine to simulate, or improve upon, human judgement in semantic interpretation. After defining and framing the problem within its field, we built a system capable of computing a similarity measure between entities, taking into account the importance of performance in this type of system.

    Automatic text summarisation using linguistic knowledge-based semantics

    Text summarisation is the task of reducing a text document to a short substitute summary. Since the field's inception, almost all summarisation research to date has involved identifying and extracting the most important document or cluster segments, an approach called extraction. This typically involves scoring each document sentence with a composite scoring function of surface-level and semantic features. Enabling machines to analyse text features and understand their meaning potentially requires both semantic analysis of the text and equipping computers with external semantic knowledge. This thesis addresses extractive text summarisation by proposing a number of semantic and knowledge-based approaches. The work combines the high-quality semantic information in WordNet, the crowdsourced encyclopaedic knowledge in Wikipedia, and the manually crafted categorial variations in CatVar to improve summary quality. These improvements are achieved through sentence-level morphological analysis and the incorporation of Wikipedia-based named-entity semantic relatedness within heuristic algorithms. The study also investigates how sentence-level semantic analysis based on semantic role labelling (SRL), leveraged with background world knowledge, influences sentence textual similarity and text summarisation. The proposed sentence-similarity and summarisation methods were evaluated on standard publicly available datasets, including the Microsoft Research Paraphrase Corpus (MSRPC), the TREC-9 Question Variants, and the Document Understanding Conference corpora (DUC 2002, DUC 2005, DUC 2006). The project uses Recall-Oriented Understudy for Gisting Evaluation (ROUGE) for the quantitative assessment of the proposed summarisers' performance. Results showed the systems' effectiveness compared with related state-of-the-art summarisation methods and baselines. Of the proposed summarisers, the SRL Wikipedia-based system demonstrated the best performance.
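    The ROUGE evaluation mentioned above scores a candidate summary by its n-gram overlap with a reference summary. A simplified ROUGE-N recall can be sketched as follows; this is a bare illustration of the metric's core idea, not the official ROUGE toolkit (which additionally handles stemming, stopword options, and multiple references).

    ```python
    from collections import Counter

    def rouge_n_recall(candidate, reference, n=1):
        """Simplified ROUGE-N recall: clipped n-gram overlap / reference n-gram count."""
        def ngrams(tokens, n):
            return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
        cand = ngrams(candidate.lower().split(), n)
        ref = ngrams(reference.lower().split(), n)
        if not ref:
            return 0.0
        overlap = sum(min(cand[g], ref[g]) for g in ref)
        return overlap / sum(ref.values())

    # 5 of the 6 reference unigrams appear in the candidate ("lay" does not).
    print(rouge_n_recall("the cat sat on the mat", "the cat lay on the mat"))  # -> 0.833...
    ```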

    Priority Area Research “Japanese Corpus”: Proceedings of the FY2009 Open Workshop (Research Results Presentation Meeting)

    Priority Area Research “Japanese Corpus” FY2009 Open Workshop, National Institute for Japanese Language and Linguistics, 15-16 March 2010; overall summary of the Priority Area Research “Japanese Corpus” project