
    The contribution of corpus linguistics to lexicography and the future of Tibetan dictionaries

    The first alphabetized dictionary of Tibetan appeared in 1829 (cf. Bray 2008), and the intervening 184 years have witnessed the publication of scores of other Tibetan dictionaries (cf. Simon 1964). Hundreds of Tibetan dictionaries are now available; these include bilingual dictionaries, both to and from such languages as English, French, German, Latin, Japanese, etc., and specialized dictionaries focusing on medicine, plants, dialects, archaic terms, neologisms, etc. (cf. Walter 2006, McGrath 2008). However, if one classifies Tibetan dictionaries by the methods of their compilation, the accomplishments of Tibetan lexicography are less impressive. Methodologies of dictionary compilation divide heuristically into three types. First, some dictionaries lack explicit methodology; these works assemble words in an ad hoc manner and illustrate them with invented examples. Second, there are dictionaries that are compiled over very long periods of time on the basis of collections of slips recording attestations of words as used in context. Third, more recent dictionaries are compiled on the basis of electronic text corpora, which are processed computationally to aid in the precision, consistency and speed of dictionary compilation. These methods may be called respectively the 'informal method', the 'traditional method', and the 'modern method'. The overwhelming majority of Tibetan dictionaries were compiled with the informal method. Only five Tibetan dictionaries use the traditional methodology. No Tibetan dictionary yet compiled makes use of the modern method.
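
    One basic computational step behind the 'modern method' is a keyword-in-context (KWIC) concordance: it collects every attestation of a word together with its surrounding context, which is roughly the electronic counterpart of the slip collections used in the traditional method. The sketch below is a minimal illustration under stated assumptions, not the paper's procedure: the function name `kwic` is hypothetical, and it uses whitespace tokenization on an English toy sentence. Real Tibetan text would first need syllable and word segmentation, which is not shown.

```python
from typing import List, Tuple


def kwic(tokens: List[str], keyword: str, window: int = 3) -> List[Tuple[str, str, str]]:
    """Return (left context, keyword, right context) for every occurrence of keyword.

    Hypothetical helper: `window` is the number of tokens kept on each side.
    """
    hits = []
    for i, tok in enumerate(tokens):
        if tok == keyword:
            left = " ".join(tokens[max(0, i - window):i])
            right = " ".join(tokens[i + 1:i + 1 + window])
            hits.append((left, tok, right))
    return hits


# Toy corpus, whitespace-tokenized (an assumption; Tibetan needs real segmentation).
corpus = "the monk read the text and the text was old".split()
for left, kw, right in kwic(corpus, "text", window=2):
    # Right-align the left context so the keyword column lines up, as in a concordance.
    print(f"{left:>15} [{kw}] {right}")
```

    Aligning every hit on the keyword lets a lexicographer scan all uses of a word at once. That is where the claimed gains in consistency and speed over hand-collected slips would come from.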

    Verbose Labels for Semantic Roles

    We introduce a new task that takes the output of semantic role labeling and associates each of the argument slots for a predicate with a verbose description, such as buyer or thing_bought for the semantic role labels 'Arg0' and 'Arg1' of a predicate like "buy". Ambiguous verb senses and syntactic alternations make this a challenging task. We adapt the frame information for each verb in PropBank to create our training data. We propose various baseline methods and more informed models that can identify such verbose labels with 95.2% accuracy if the semantic roles have already been correctly identified. We extend our work to text visualization to illustrate the importance of verbose labeling. As a proof of concept, we built an interactive browser for human history articles from Wikipedia, called LensingWikipedia.
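
    The simplest baseline one could imagine for this task is a lookup keyed by predicate sense and role label. It works only once the verb sense is already known, and that is exactly where the abstract locates the difficulty: ambiguous senses and syntactic alternations. The minimal sketch below is not one of the paper's models. The sense identifier `buy.01` follows PropBank's roleset naming convention. The `FRAMES` table and `verbose_label` function are hypothetical, and only the two `buy` descriptions taken from the abstract are filled in.

```python
from typing import Dict, Tuple

# Hypothetical toy frame table keyed by (predicate sense, SRL label).
# Only the "buy" entries come from the abstract; a real system would load
# every roleset from the PropBank frame files.
FRAMES: Dict[Tuple[str, str], str] = {
    ("buy.01", "Arg0"): "buyer",
    ("buy.01", "Arg1"): "thing_bought",
}


def verbose_label(sense: str, role: str) -> str:
    """Map an SRL label to its verbose description, falling back to the raw label."""
    return FRAMES.get((sense, role), role)


# Toy SRL output: (predicate sense, role label, argument span).
srl_output = [
    ("buy.01", "Arg0", "The museum"),
    ("buy.01", "Arg1", "a painting"),
    ("buy.01", "ArgM-TMP", "last year"),
]
for sense, role, span in srl_output:
    print(f"{span!r}: {role} -> {verbose_label(sense, role)}")
```

    Falling back to the raw label keeps modifier roles such as `ArgM-TMP` visible instead of silently dropping them. A more informed model would have to choose the sense itself rather than receive it as input.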

    Knowledge Base Population and Visualization Using an Ontology based on Semantic Roles

    This paper extracts facts using “micro-reading” of text, in contrast to approaches that extract common-sense knowledge using “macro-reading” methods. Our goal is to extract detailed facts about events from natural language using a predicate-centered view of events (who did what to whom, when and how). We exploit semantic role labels in order to create a novel predicate-centric ontology for entities in our knowledge base. This allows users to find uncommon facts easily. To this end, we tightly couple our knowledge base and ontology to an information visualization system that can be used to explore and navigate events extracted from a large natural language text collection. We use our methodology to create a web-based visual browser of history events in Wikipedia.
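
    The next example is a minimal sketch of what a predicate-centered view of events might look like in code. Each event stores a predicate plus the text filling its roles (who, whom, when, ...). The store is indexed by predicate, so a user can ask, for example, for every "defeat" event with a given participant. The abstract does not describe the actual ontology or knowledge-base schema, so the `Event` and `EventIndex` classes, the role names, and the toy data are all hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Event:
    """One extracted event: a predicate plus the text filling each of its roles."""
    predicate: str
    roles: Dict[str, str] = field(default_factory=dict)


class EventIndex:
    """Hypothetical predicate-centric store: events are grouped by predicate."""

    def __init__(self) -> None:
        self.by_predicate: Dict[str, List[Event]] = defaultdict(list)

    def add(self, event: Event) -> None:
        self.by_predicate[event.predicate].append(event)

    def query(self, predicate: str, **constraints: str) -> List[Event]:
        """Events of `predicate` whose role fillers equal every given constraint."""
        return [
            e for e in self.by_predicate.get(predicate, [])
            if all(e.roles.get(role) == value for role, value in constraints.items())
        ]


# Toy data standing in for semantic-role-labeled sentences.
idx = EventIndex()
idx.add(Event("defeat", {"who": "Kingdom A", "whom": "Kingdom B", "when": "year X"}))
idx.add(Event("defeat", {"who": "Kingdom C", "whom": "Kingdom A"}))
idx.add(Event("sign", {"who": "Kingdom B", "what": "a treaty"}))

hits = idx.query("defeat", whom="Kingdom A")
print([e.roles["who"] for e in hits])  # ['Kingdom C']
```

    Keying the store on predicates rather than on entities mirrors the "who did what to whom" framing. A visualization front end could then facet these events by role filler or by time.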

    LensingWikipedia: Parsing Text for the Interactive Visualization of Human History

    Extracting information from text is challenging. Most current practices treat text as a bag of words or word clusters, ignoring valuable linguistic information. Leveraging this linguistic information, we propose a novel approach to visualizing textual information. The novelty lies in using state-of-the-art Natural Language Processing (NLP) tools to automatically annotate text, which provides a basis for new and powerful interactive visualizations. Using NLP tools, we built a web-based interactive visual browser for human history articles from Wikipedia.

    Corpus-based vocabulary lists for language learners for nine languages

    We present the KELLY project and its work on developing monolingual and bilingual word lists for language learning, using corpus methods, for nine languages and thirty-six language pairs. We describe the method and discuss the many challenges encountered. We have loaded the data into an online database to make it accessible for anyone to explore and we present our own first explorations of it. The focus of the paper is thus twofold, covering pedagogical and methodological aspects of the lists’ construction, and linguistic aspects of the by-product of the project, the KELLY database. © The Author(s) 2013. This article is published with open access at Springerlink.co
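
    The most basic corpus step behind a learner vocabulary list is ranking word forms by how often they occur. The minimal sketch below is not the KELLY pipeline: the abstract does not say which corpora, frequency measure, or lemmatization were used. It assumes whitespace-tokenized toy text and raw counts, and `frequency_list` is a hypothetical helper. A list meant for learners would likely also need lemmatization and filtering, which are omitted here.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def frequency_list(tokens: Iterable[str], top_n: int) -> List[Tuple[str, int]]:
    """Rank lowercased tokens by raw corpus frequency.

    Counter.most_common orders equal counts by first occurrence, so ties
    keep the order in which words were first seen in the corpus.
    """
    counts = Counter(t.lower() for t in tokens)
    return counts.most_common(top_n)


# Toy corpus (an assumption; a real list would be built from a large corpus).
text = "The cat saw the dog and the dog ran"
for rank, (word, freq) in enumerate(frequency_list(text.split(), top_n=3), start=1):
    print(rank, word, freq)
```

    Raw frequency is the simplest ranking. Bilingual lists like the thirty-six language pairs described above would additionally require linking entries across languages, which this sketch does not attempt.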