3 research outputs found
Recommended from our members
Evaluation of the DEFINDER System for Fully Automatic Glossary Construction
In this paper we present a quantitative and qualitative evaluation of DEFINDER, a rule-based system that mines consumer-oriented full text articles in order to extract definitions and the terms they define. The quantitative evaluation shows that in terms of precision and recall as measured against human performance, DEFINDER obtained 87% and 75% respectively, thereby revealing the incompleteness of existing resources and the ability of DEFINDER to address these gaps. Our basis for comparison is definitions from on-line dictionaries, including the UMLS Metathesaurus. Qualitative evaluation shows that the definitions extracted by our system are ranked higher in terms of user-centered criteria of usability and readability than are definitions from on-line specialized dictionaries. The output of DEFINDER can be used to enhance these dictionaries. DEFINDER output is being incorporated in a system to clarify technical terms for non-specialist users in understandable non-technical language
Head to head: Semantic similarity of multi-word terms
Terms are linguistic signifiers of domain–specific concepts. Semantic similarity between terms refers to the corresponding distance in the conceptual space. In this study, we use lexico–syntactic information to define a vector space representation in which cosine similarity closely approximates semantic similarity between the corresponding terms. Given a multi–word term, each word is weighed in terms of its defining properties. In this context, the head noun is given the highest weight. Other words are weighed depending on their relations to the head noun. We formalized the problem as that of determining a topological ordering of a direct acyclic graph, which is based on constituency and dependency relations within a noun phrase. To counteract the errors associated with automatically inferred constituency and dependency relations, we implemented a heuristic approach to approximating the topological ordering. Different weights are assigned to different words based on their positions. Clustering experiments performed on such a vector space representation showed considerable improvement over the conventional bag–of–word representation. Specifically, it more consistently reflected semantic similarity between the terms. This was established by analyzing the differences between automatically generated dendrograms and manually constructed taxonomies. In conclusion, our method can be used to semi–automate taxonomy construction