1,626 research outputs found

    Linguistics and LIS: A Research Agenda

    Linguistics and Library and Information Science (LIS) are both interdisciplinary fields that draw from areas such as languages, psychology, sociology, cognitive science, computer science, anthropology, education, and management. The theories and methods of linguistic research can have significant explanatory power for LIS. This article presents a research agenda for LIS that proposes the use of linguistic analysis methods, including discourse analysis, typology, and genre theory.

    A general framework for the annotation of causality based on FrameNet

    We present here a general set of semantic frames to annotate causal expressions, with a rich lexicon in French and an annotated corpus of about 4000 instances of causal lexical items with their corresponding semantic frames. The aim of our project is both to have the largest possible coverage of causal phenomena in French, across all parts of speech, and to link it to a general semantic framework such as FrameNet (FN), to benefit in particular from the relations between other semantic frames, e.g., temporal ones or intentional ones, and from the underlying upper lexical ontology that enables some forms of reasoning. This is part of the larger ASFALDA French FrameNet project, which focuses on a few different notional domains that are interesting in their own right (Djemaa et al., 2016), including cognitive positions and communication frames. In the process of building the French lexicon and preparing the annotation of the corpus, we had to remodel some of the frames proposed in FN based on English data, with hopefully more precise frame definitions to facilitate human annotation. This includes semantic clarifications of frames and frame elements, redundancy elimination, and added coverage. The result is arguably a significant improvement of the treatment of causality in FN itself.

    Semantic frames of tašxis (“identification”) in Persian: A corpus-based study

    One of the lexical conceptual relations in language is polysemy, by which Finch (2000) and Saeed (2009) mean that a word or lexeme has more than one meaning. In polysemy, a word receives multiple interpretations that are closely related to each other. According to Richards and Schmidt (1985), semantic units composed of a sequence of events or affairs relevant to specific situations evoke their own semantic frames. In fact, a frame is a representation of the context, including the sentence, in which linguistic items are presented (Matthews, 1997). The concept of the frame was first proposed by Fillmore (1977; 1982; 1985) in the 1970s. The present research was carried out in two phases, with the goal of comparing the semantic frames of the word tašxis (Identification) and determining the relationships among them. In the first phase, the sentences containing the word were looked up in the Persian Corpus of Bijankhan. Then, the sentences including tašxis (Identification) were separated from the sentences comprising different inflectional forms of the verb tašxis dādan (to identify). Afterwards, each sentence was converted into its equivalent noun/adjective phrase. In the second phase, the English equivalents of Identification in each phrase were obtained from three different Persian-to-English dictionaries in order to extract the semantic frames for them. After the frames were extracted, each English counterpart, called a Lexical Unit in FrameNet, was compared together with its semantic frame to the other frames, and ultimately the following conclusions were drawn: the contexts where Identification is used fall into five categories, namely linguistics, medical science, law, security checking, and politics. Because some words are used in the same way in two categories, four semantic frames are evoked across the five contexts, all of which share the concept of the capability of making a distinction and that of making a decision.

    Effective weakly supervised semantic frame induction using expression sharing in hierarchical hidden Markov models

    We present a framework for the induction of semantic frames from utterances in the context of an adaptive command-and-control interface. The system is trained on an individual user's utterances and the corresponding semantic frames representing controls. During training, no prior information on the alignment between utterance segments and frame slots and values is available. In addition, semantic frames in the training data can contain information that is not expressed in the utterances. To tackle this weakly supervised classification task, we propose a framework based on Hidden Markov Models (HMMs). Structural modifications, resulting in a hierarchical HMM, and an extension called expression sharing are introduced to minimize the amount of training time and effort required for the user. The dataset used for the present study is PATCOR, which contains commands uttered in the context of a vocally guided card game, Patience. Experiments were carried out on orthographic and phonetic transcriptions of commands, segmented on different levels of n-gram granularity. The experimental results show positive effects of all the studied system extensions, with some effect differences between the different input representations. Moreover, evaluation experiments on held-out data with the optimal system configuration show that the extended system is able to achieve high accuracies with relatively small amounts of training data.

    SLIS Student Research Journal, Vol.7, Iss.1

    Evaluating automatically acquired f-structures against PropBank

    An automatic method for annotating the Penn-II Treebank (Marcus et al., 1994) with high-level Lexical Functional Grammar (Kaplan and Bresnan, 1982; Bresnan, 2001; Dalrymple, 2001) f-structure representations is presented by Burke et al. (2004b). The annotation algorithm is the basis for the automatic acquisition of wide-coverage and robust probabilistic approximations of LFG grammars (Cahill et al., 2004) and for the induction of subcategorisation frames (O’Donovan et al., 2004; O’Donovan et al., 2005). Annotation quality is, therefore, extremely important and to date has been measured against the DCU 105 and the PARC 700 Dependency Bank (King et al., 2003). The annotation algorithm achieves f-scores of 96.73% for complete f-structures and 94.28% for preds-only f-structures against the DCU 105 and 87.07% against the PARC 700 using the feature set of Kaplan et al. (2004). Burke et al. (2004a) provide a detailed analysis of these results. This paper presents an evaluation of the annotation algorithm against PropBank (Kingsbury and Palmer, 2002). PropBank identifies the semantic arguments of each predicate in the Penn-II Treebank and annotates their semantic roles. As PropBank was developed independently of any grammar formalism, it provides a platform for making more meaningful comparisons between parsing technologies than was previously possible. PropBank also allows a much larger-scale evaluation than the smaller DCU 105 and PARC 700 gold standards. In order to perform the evaluation, we first automatically converted the PropBank annotations into a dependency format. Second, we developed conversion software to produce PropBank-style semantic annotations in dependency format from the f-structures automatically acquired by the annotation algorithm from Penn-II. The evaluation was performed using the evaluation software of Crouch et al. (2002) and Riezler et al. (2002). Using the Penn-II Wall Street Journal Section 24 as the development set, we currently achieve an f-score of 76.58% against PropBank for the Section 23 test set.
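
    The abstract above reports f-scores computed over annotations in a dependency format. As a rough illustration only, the following Python sketch computes precision, recall, and f-score over sets of dependency triples. The triple layout `(role, predicate, argument)`, the function name `f_score`, and the toy data are assumptions made for this example; they are not taken from the paper or from the evaluation software it cites.

```python
from typing import Iterable, Tuple

# Hypothetical triple layout: (role, predicate, argument).
Triple = Tuple[str, str, str]


def f_score(gold: Iterable[Triple], predicted: Iterable[Triple]) -> Tuple[float, float, float]:
    """Return (precision, recall, f-score) for predicted triples against gold triples."""
    gold_set, pred_set = set(gold), set(predicted)
    matched = len(gold_set & pred_set)  # triples present in both sets
    precision = matched / len(pred_set) if pred_set else 0.0
    recall = matched / len(gold_set) if gold_set else 0.0
    # Harmonic mean of precision and recall; defined as 0 when both are 0.
    total = precision + recall
    f = 2 * precision * recall / total if total else 0.0
    return precision, recall, f


# Toy data, invented for illustration.
gold = [
    ("ARG0", "bought", "John"),
    ("ARG1", "bought", "shares"),
    ("ARGM-TMP", "bought", "yesterday"),
]
predicted = [
    ("ARG0", "bought", "John"),
    ("ARG1", "bought", "shares"),
    ("ARGM-LOC", "bought", "yesterday"),  # wrong role label
]

precision, recall, f = f_score(gold, predicted)
print(f"precision={precision:.4f} recall={recall:.4f} f-score={f:.4f}")
```

    Treating both sides as sets means that a triple counts as correct only when role, predicate, and argument all match exactly, which is one simple matching policy among several possible ones.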