
    Assessing the contribution of shallow and deep knowledge sources for word sense disambiguation

    Corpus-based techniques have proved to be very beneficial in the development of efficient and accurate approaches to word sense disambiguation (WSD), despite the fact that they generally represent relatively shallow knowledge. It has always been thought, however, that WSD could also benefit from deeper knowledge sources. We describe a novel approach to WSD that uses inductive logic programming to learn theories from first-order logic representations, allowing corpus-based evidence to be combined with any kind of background knowledge. This approach has been shown to be effective over several disambiguation tasks using a combination of deep and shallow knowledge sources. It is important to understand the contribution of the various knowledge sources used in such a system. This paper investigates the contribution of nine knowledge sources to the performance of the disambiguation models produced for the SemEval-2007 English lexical sample task. The outcome of this analysis will assist future work on WSD in concentrating on the most useful knowledge sources.
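
    To make the idea of combining knowledge sources concrete, here is a minimal Python sketch in which shallow (collocational) and deeper (domain-label) evidence are merged into one feature set for a toy sense classifier. The vote-counting classifier and all data are illustrative stand-ins; the paper itself learns theories with inductive logic programming over first-order representations.

        # Toy WSD: merge evidence from a shallow and a deeper knowledge source.
        from collections import Counter, defaultdict

        def extract_features(context_words, domain_tags):
            """Merge a shallow source (collocations) with a deeper one (domain tags)."""
            feats = set()
            feats.update("colloc=" + w for w in context_words)
            feats.update("domain=" + t for t in domain_tags)
            return feats

        class NaiveSenseClassifier:
            """Vote-counting stand-in for the paper's ILP-learned theories."""
            def __init__(self):
                self.counts = defaultdict(Counter)  # feature -> sense counts

            def train(self, examples):
                for feats, sense in examples:
                    for f in feats:
                        self.counts[f][sense] += 1

            def predict(self, feats):
                votes = Counter()
                for f in feats:
                    votes.update(self.counts[f])
                return votes.most_common(1)[0][0] if votes else None

        train_data = [
            (extract_features(["interest", "rate"], ["FINANCE"]), "bank/finance"),
            (extract_features(["river", "water"], ["GEOGRAPHY"]), "bank/river"),
        ]
        clf = NaiveSenseClassifier()
        clf.train(train_data)
        print(clf.predict(extract_features(["rate", "loan"], ["FINANCE"])))  # bank/finance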

    Ontologies and Information Extraction

    This report argues that, even in the simplest cases, IE is an ontology-driven process. It is not a mere text-filtering method based on simple pattern matching and keywords, because the extracted pieces of text are interpreted with respect to a predefined partial domain model. The report shows that, depending on the nature and depth of the interpretation required to extract the information, more or less knowledge must be involved. The discussion is mainly illustrated with examples from biology, a domain with critical needs for content-based exploration of the scientific literature and one that has become a major application domain for IE.
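
    The report's central claim can be illustrated in a few lines of Python: a pattern matcher only becomes information extraction once its fillers are interpreted against a domain model. The tiny ontology, pattern, and sentence below are invented for illustration.

        # A pattern match is accepted only if its fillers fit the domain model.
        import re

        ONTOLOGY = {
            "Protein": {"TrkA", "NGF"},
            "Process": {"apoptosis", "phosphorylation"},
        }

        # Template: an "activates" relation requires a Protein agent and a
        # Protein or Process theme.
        PATTERN = re.compile(r"(\w+) activates (\w+)")

        def is_a(entity, cls):
            return entity in ONTOLOGY.get(cls, set())

        def extract(sentence):
            for agent, theme in PATTERN.findall(sentence):
                # Interpretation step: keep a match only if it fits the model.
                if is_a(agent, "Protein") and (is_a(theme, "Protein") or is_a(theme, "Process")):
                    yield {"relation": "activates", "agent": agent, "theme": theme}

        print(list(extract("NGF activates TrkA and sunlight activates mood")))
        # Only the NGF/TrkA match survives the ontological check.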

    What's in a Message?

    In this paper we present the first step in a larger series of experiments on the induction of predicate/argument structures. The structures we are inducing are very similar to the conceptual structures used in Frame Semantics (such as FrameNet). These structures are called messages, and they were previously used in the context of a multi-document summarization system for evolving events. The series of experiments we propose consists of two stages. In the first stage we extract a representative vocabulary of words. This vocabulary is then used in the second stage, in which we apply various clustering approaches to it in order to identify clusters of predicates and arguments, or frames and semantic roles, to use the jargon of Frame Semantics. This paper presents the first stage in detail and evaluates it.
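
    As a rough illustration of the second stage, the sketch below clusters candidate predicates by the argument heads they co-occur with, using cosine similarity and a greedy threshold. The vectors and the clustering scheme are illustrative simplifications of the clustering approaches the paper evaluates.

        # Group predicates whose argument-head distributions are similar.
        from collections import Counter
        import math

        def cosine(a, b):
            num = sum(a[k] * b[k] for k in a if k in b)
            den = math.sqrt(sum(v * v for v in a.values())) * \
                  math.sqrt(sum(v * v for v in b.values()))
            return num / den if den else 0.0

        # predicate -> counts of argument heads seen with it (stage-1 output)
        vectors = {
            "attack": Counter({"troops": 4, "city": 3}),
            "bomb":   Counter({"troops": 2, "city": 5}),
            "elect":  Counter({"president": 6, "party": 2}),
        }

        def cluster(vecs, threshold=0.5):
            clusters = []
            for name, vec in vecs.items():
                for c in clusters:
                    # Greedy: join the first cluster whose seed is close enough.
                    if cosine(vec, vecs[c[0]]) >= threshold:
                        c.append(name)
                        break
                else:
                    clusters.append([name])
            return clusters

        print(cluster(vectors))  # [['attack', 'bomb'], ['elect']]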

    Otistik ve Zihinsel Engelli Çocuklar İçin Doğal Dil İşleme Tabanlı Bir Yardım Aracı: Bir Başlangıç Çalışması (A Natural Language Processing Based Assistive Tool for Autistic and Mentally Disabled Children: A Preliminary Study)

    It is both a legal and a conscientious responsibility of society to enable children with disabilities to have access to and receive education and training as easily and effectively as possible. Assistive technology offers great opportunities for disabled students to participate in educational activities fully and adequately. This paper presents a software tool developed to assist the education and training of autistic and mentally retarded children. The tool is intended to help the disabled child establish the bridge between expressions and the concepts they refer to via relevant images. Taking into consideration the fact that enabling the user to interact with the system using natural language expressions is much more effective than constraining the communication to a limited set of isolated keywords, the tool has been equipped with a Natural Language Processing (NLP) module. This module functions as the backbone of the tool. It analyzes natural language expressions into semantic frames using a unification-based grammar. Input expressions are mapped onto relevant images via the mediation of semantic frames. As these semantic frames represent the content of images, rather than their formal aspects, the system is able to operate on a flexible basis.
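
    The frame-mediated mapping can be sketched as follows: expressions are analyzed into feature structures, and images are retrieved by frame content rather than surface form. The toy lexicon, frames, and image index below are invented; the actual tool derives frames with a unification-based grammar.

        # Retrieve images by semantic frame content, not surface wording.
        IMAGE_INDEX = [
            ({"action": "eat", "agent": "child", "theme": "apple"}, "eating_apple.png"),
            ({"action": "sleep", "agent": "child"}, "sleeping_child.png"),
        ]

        def analyze(expression):
            """Stand-in for the grammar: map a known expression to a frame."""
            lexicon = {
                "the child is eating an apple": {"action": "eat", "agent": "child",
                                                 "theme": "apple"},
                "the kid eats an apple": {"action": "eat", "agent": "child",
                                          "theme": "apple"},
            }
            return lexicon.get(expression.lower())

        def subsumes(query, frame):
            """Frame matching: every feature of the query must unify."""
            return all(frame.get(k) == v for k, v in query.items())

        def find_images(expression):
            frame = analyze(expression)
            if frame is None:
                return []
            return [img for f, img in IMAGE_INDEX if subsumes(frame, f)]

        # Two different surface forms reach the same image via the same frame.
        print(find_images("The kid eats an apple"))  # ['eating_apple.png']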

    D6.1: Technologies and Tools for Lexical Acquisition

    This report describes the technologies and tools to be used for Lexical Acquisition in PANACEA. It includes descriptions of existing technologies and tools which can be built on and improved within PANACEA, as well as of new technologies and tools to be developed and integrated into the PANACEA platform. The report also specifies the Lexical Resources to be produced. Four main areas of lexical acquisition are covered: Subcategorization Frames (SCFs), Selectional Preferences (SPs), Lexical-semantic Classes (LCs) for both nouns and verbs, and Multi-Word Expressions (MWEs).
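
    As a hint of what the simplest of these acquisition tasks involves, the sketch below acquires subcategorization frames by counting the syntactic frames each verb is observed with and filtering out low-frequency, often noisy, ones. The observations and the relative-frequency filter are toy stand-ins, not PANACEA's pipeline.

        # SCF acquisition in its simplest form: count and filter verb frames.
        from collections import Counter, defaultdict

        # (verb, observed frame) pairs as they might come from a parser
        observations = [
            ("give", "NP NP"), ("give", "NP NP"),
            ("give", "NP PP_to"), ("give", "NP PP_to"),
            ("sleep", "INTRANS"), ("sleep", "INTRANS"),
            ("sleep", "INTRANS"), ("sleep", "NP"),  # parser noise
        ]

        def acquire_scfs(pairs, min_rel_freq=0.3):
            by_verb = defaultdict(Counter)
            for verb, frame in pairs:
                by_verb[verb][frame] += 1
            lexicon = {}
            for verb, frames in by_verb.items():
                total = sum(frames.values())
                # Drop frames seen too rarely to trust.
                lexicon[verb] = {f for f, n in frames.items()
                                 if n / total >= min_rel_freq}
            return lexicon

        print(acquire_scfs(observations))
        # {'give': {'NP NP', 'NP PP_to'}, 'sleep': {'INTRANS'}} (set order may vary)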

    A Survey on Semantic Processing Techniques

    Semantic processing is a fundamental research domain in computational linguistics. In the era of powerful pre-trained language models and large language models, the advancement of research in this domain appears to be decelerating. However, the study of semantics is multi-dimensional in linguistics, and the depth and breadth of computational semantic processing research can be greatly improved with new technologies. In this survey, we analyze five semantic processing tasks: word sense disambiguation, anaphora resolution, named entity recognition, concept extraction, and subjectivity detection. We review relevant theoretical research in these fields, advanced methods, and downstream applications. We connect the surveyed tasks with downstream applications because this may inspire future scholars to fuse these low-level semantic processing tasks with high-level natural language processing tasks. The review of theoretical research may also inspire new tasks and technologies in the semantic processing domain. Finally, we compare the different semantic processing techniques and summarize their technical trends, application trends, and future directions. Comment: Published in Information Fusion, Volume 101, 2024, 101988, ISSN 1566-2535. The equal-contribution mark is missing from the published version due to publication policies; please contact Prof. Erik Cambria for details.

    The Generation of Compound Nominals to Represent the Essence of Text: The COMMIX System

    This thesis concerns the COMMIX system, which automatically extracts information on what a text is about, and generates that information in the highly compacted form of compound nominal expressions. The expressions generated are complex and may include novel terms which do not themselves appear in the input text. From the practical point of view, the work is driven by the need for better representations of content: for representations which are shorter and more concise than would appear in an abstract, yet more informative and representative of the actual aboutness than commonly occurs in indexing expressions and key terms. This additional layer of representation is referred to in this work as pertaining to the essence of a particular text. From a theoretical standpoint, the thesis shows how the compound nominal as a construct can be successfully employed in these highly informative representations. It involves an exploration of the claim that there is sufficient semantic information contained within the standard dictionary glosses for individual words to enable the construction of useful and highly representative novel compound nominal expressions, without recourse to standard syntactic and statistical methods. It shows how a shallow semantic approach to content identification that is based on lexical overlap can produce some very encouraging results. The methodology employed, and described herein, is domain-independent, and does not require the specification of templates with which the input text must comply. In these two respects, the methodology developed in this work avoids two of the most common problems associated with information extraction. As regards the evaluation of this type of work, the thesis introduces and utilises the notion of percentage attainment value, which is used in conjunction with subjects' opinions about the degree to which the aboutness terms succeed in indicating the subject matter of the texts for which they were generated.
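
    The gloss-overlap idea at the heart of the thesis can be sketched briefly: candidate compound nominals are scored by how much their parts' dictionary glosses overlap with the content words of the input text. The glosses, candidates, and scoring below are invented for illustration and are not the COMMIX lexicon or algorithm.

        # Score candidate compounds by dictionary-gloss overlap with the text.
        GLOSSES = {
            "weather":  {"state", "atmosphere", "rain", "temperature"},
            "forecast": {"prediction", "future", "statement"},
            "report":   {"account", "statement", "event"},
        }

        def overlap_score(word, text_words):
            return len(GLOSSES.get(word, set()) & text_words)

        def score_compound(modifier, head, text_words):
            # A compound is promising if both parts' glosses touch the text.
            return overlap_score(modifier, text_words) + overlap_score(head, text_words)

        text = {"rain", "temperature", "prediction", "tomorrow"}
        candidates = [("weather", "forecast"), ("weather", "report")]
        best = max(candidates, key=lambda c: score_compound(*c, text))
        print(" ".join(best))  # weather forecast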

    Using Analogy to Acquire Commonsense Knowledge from Human Contributors

    The goal of the work reported here is to capture the commonsense knowledge of non-expert human contributors. Achieving this goal will enable more intelligent human-computer interfaces and pave the way for computers to reason about our world. In the domain of natural language processing, it will provide the world knowledge much needed for semantic processing of natural language. To acquire knowledge from contributors not trained in knowledge engineering, I take the following four steps: (i) develop a knowledge representation (KR) model for simple assertions in natural language, (ii) introduce cumulative analogy, a class of nearest-neighbor based analogical reasoning algorithms over this representation, (iii) argue that cumulative analogy is well suited for knowledge acquisition (KA) based on a theoretical analysis of the effectiveness of KA with this approach, and (iv) test the KR model and the effectiveness of the cumulative analogy algorithms empirically. To investigate the effectiveness of cumulative analogy for KA empirically, Learner, an open source system for KA by cumulative analogy, has been implemented, deployed, and evaluated. (The site "1001 Questions" is available at http://teach-computers.org/learner.html.) Learner acquires assertion-level knowledge by constructing shallow semantic analogies between a KA topic and its nearest neighbors and posing these analogies as natural language questions to human contributors. Suppose, for example, that based on the knowledge about "newspapers" already present in the knowledge base, Learner judges "newspaper" to be similar to "book" and "magazine." Further suppose that the assertions "books contain information" and "magazines contain information" are also already in the knowledge base. Then Learner will use cumulative analogy from the similar topics to ask humans whether "newspapers contain information." Because similarity between topics is computed based on what is already known about them, Learner exhibits bootstrapping behavior: the quality of its questions improves as it gathers more knowledge. By summing evidence for and against posing any given question, Learner also exhibits noise tolerance, limiting the effect of incorrect similarities. The KA power of shallow semantic analogy from nearest neighbors is one of the main findings of this thesis. I perform an analysis of commonsense knowledge collected by another research effort that did not rely on analogical reasoning and demonstrate that there is indeed a sufficient amount of correlation in the knowledge base to motivate using cumulative analogy from nearest neighbors as a KA method. Empirically, the percentages of questions answered affirmatively, negatively, and judged to be nonsensical in the cumulative analogy case compare favorably with the baseline, no-similarity case, which relies on random objects rather than nearest neighbors. Of the questions generated by cumulative analogy, contributors answered 45% affirmatively, 28% negatively, and marked 13% as nonsensical; in the control, no-similarity case, 8% of questions were answered affirmatively, 60% negatively, and 26% were marked as nonsensical.
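
    The cumulative-analogy loop can be sketched compactly using the abstract's own newspaper/book/magazine example: similarity is computed from shared assertions, and evidence from the nearest neighbors is summed before a question is posed. The knowledge base and thresholds below are toy simplifications of Learner's algorithms.

        # Cumulative analogy: ask about properties the nearest neighbors share.
        from collections import Counter

        # topic -> set of (property, polarity) assertions already acquired
        KB = {
            "book":      {("contains information", True), ("has pages", True)},
            "magazine":  {("contains information", True), ("has pages", True)},
            "newspaper": {("has pages", True)},
        }

        def similarity(a, b):
            """Similarity is grounded in what is already known: shared assertions."""
            return len(KB[a] & KB[b])

        def propose_questions(topic, k=2):
            neighbors = sorted((t for t in KB if t != topic),
                               key=lambda t: similarity(topic, t), reverse=True)[:k]
            known = {prop for prop, _ in KB[topic]}
            evidence = Counter()
            for n in neighbors:
                for prop, holds in KB[n]:
                    if prop not in known:
                        evidence[prop] += 1 if holds else -1  # sum for and against
            # Only ask when the net evidence is positive (noise tolerance).
            return [f"{topic} {prop}?" for prop, e in evidence.items() if e > 0]

        print(propose_questions("newspaper"))  # ['newspaper contains information?']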