16,930 research outputs found
A Bootstrapping architecture for time expression recognition in unlabelled corpora via syntactic-semantic patterns
In this paper we describe a semi-supervised approach to the extraction of time expression mentions in large unlabelled corpora based on bootstrapping.
Bootstrapping techniques rely on a relatively small amount of initial human-supplied examples (termed “seeds”) of the type of entity or concept to be learned, in order to capture an initial set of patterns or rules from the unlabelled text that extract the supplied data. In turn, the learned patterns are employed to find new potential examples, and the process is repeated to grow the set of patterns and (optionally) the set of examples. In order to prevent the learned pattern set from producing spurious results, it becomes essential
to implement a ranking and selection procedure to filter out “bad” patterns and, depending on the case, new candidate examples. Therefore, the type of patterns employed (knowledge representation) as well as the ranking and selection procedure are paramount to the quality of the results. We present a complete bootstrapping algorithm for recognition of time expressions, with a special emphasis on the type of patterns used (a combination of semantic and morpho- syntantic elements) and the ranking and selection criteria. Bootstrap-
ping techniques have been previously employed with limited success for several NLP problems, both of recognition and classification, but their application to time expression recognition is, to the best of our knowledge, novel. As of this writing, the described architecture is in the final stages of implementation, with experimention and evalution being already underway.Postprint (published version
Cross-lingual Word Clusters for Direct Transfer of Linguistic Structure
It has been established that incorporating word cluster features derived from large unlabeled corpora can significantly improve prediction of linguistic structure. While previous work has focused primarily on English, we extend these results to other languages along two dimensions. First, we show that these results hold true for a number of languages across families. Second, and more interestingly, we provide an algorithm for inducing cross-lingual clusters and we show that features derived from these clusters significantly improve the accuracy of cross-lingual structure prediction. Specifically, we show that by augmenting direct-transfer systems with cross-lingual cluster features, the relative error of delexicalized dependency parsers, trained on English treebanks and transferred to foreign languages, can be reduced by up to 13%. When applying the same method to direct transfer of named-entity recognizers, we observe relative improvements of up to 26%
Exploring Metaphorical Senses and Word Representations for Identifying Metonyms
A metonym is a word with a figurative meaning, similar to a metaphor. Because
metonyms are closely related to metaphors, we apply features that are used
successfully for metaphor recognition to the task of detecting metonyms. On the
ACL SemEval 2007 Task 8 data with gold standard metonym annotations, our system
achieved 86.45% accuracy on the location metonyms. Our code can be found on
GitHub.Comment: 9 pages, 8 pages conten
A Hybrid Approach to Domain-Specific Entity Linking
The current state-of-the-art Entity Linking (EL) systems are geared towards
corpora that are as heterogeneous as the Web, and therefore perform
sub-optimally on domain-specific corpora. A key open problem is how to
construct effective EL systems for specific domains, as knowledge of the local
context should in principle increase, rather than decrease, effectiveness. In
this paper we propose the hybrid use of simple specialist linkers in
combination with an existing generalist system to address this problem. Our
main findings are the following. First, we construct a new reusable benchmark
for EL on a corpus of domain-specific conversations. Second, we test the
performance of a range of approaches under the same conditions, and show that
specialist linkers obtain high precision in isolation, and high recall when
combined with generalist linkers. Hence, we can effectively exploit local
context and get the best of both worlds.Comment: SEM'1
- …