283 research outputs found

    Data-Driven and Ontological Analysis of FrameNet for Natural Language Reasoning

    This paper focuses on improving the conceptual structure of FrameNet so that the resource can be applied to knowledge-intensive NLP tasks requiring reasoning, such as question answering and information extraction. Ontological analysis supported by data-driven methods is used to axiomatize, enrich, and clean up frame relations. The impact of the resulting axiomatization on recognizing textual entailment is investigated.
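To make the role of the axiomatized frame relations concrete, here is a minimal sketch (not taken from the paper) in which FrameNet's "Inherits from" links are read as logical implications between frames, so that a text evoking a specific frame entails a hypothesis evoking one of its ancestors; the frame names and edges below are illustrative assumptions only:

```python
# Toy sketch: treat an "Inherits from" relation as the axiom frame_A(x) -> frame_B(x),
# so a text evoking a more specific frame entails a hypothesis evoking an ancestor frame.
INHERITS_FROM = {
    "Commerce_buy": "Getting",   # illustrative fragment of a frame hierarchy
    "Getting": "Transfer",       # hypothetical edge, for the example only
}

def entails(text_frame, hypothesis_frame):
    """True if the hypothesis frame is reachable from the text frame
    by following the axiomatized inheritance links."""
    frame = text_frame
    while frame is not None:
        if frame == hypothesis_frame:
            return True
        frame = INHERITS_FROM.get(frame)
    return False

print(entails("Commerce_buy", "Getting"))  # True under the toy axioms
```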

    Sharing Semantic Resources

    The Semantic Web is an extension of the current Web in which information, so far created for human consumption, becomes machine readable, "enabling computers and people to work in cooperation". To turn this vision into reality, several challenges remain open, the most important of which is sharing meaning formally represented with ontologies or, more generally, with semantic resources. This long-term goal of the Semantic Web converges in many ways with activities in the field of Human Language Technology, and in particular with the development of Natural Language Processing applications, where there is a great need for multilingual lexical resources. For instance, one of the most important lexical resources, WordNet, is also commonly regarded and used as an ontology. Another important phenomenon today is the explosion of social collaboration, and Wikipedia, the largest encyclopedia in the world, is the object of research as an up-to-date, comprehensive semantic resource. The main topic of this thesis is the collaborative management and exploitation of semantic resources, reusing already available resources such as Wikipedia and WordNet. This work presents a general environment able to turn the vision of shared and distributed semantic resources into reality and describes a distributed three-layer architecture that enables rapid prototyping of cooperative applications for developing semantic resources.

    Natural Language Processing for Requirements Formalization: How to Derive New Approaches?

    It is a long-standing desire of industry and research to automate the software development and testing process as much as possible. In this process, requirements engineering (RE) plays a fundamental role for all other steps that build on it. Model-based design and testing methods have been developed to handle the growing complexity and variability of software systems. However, major effort is still required to create specification models from a large set of functional requirements provided in natural language. Numerous approaches based on natural language processing (NLP) have been proposed in the literature to generate requirements models using mainly syntactic properties. Recent advances in NLP show that semantic quantities can also be identified and used to provide better assistance in the requirements formalization process. In this work, we present and discuss principal ideas and state-of-the-art methodologies from the field of NLP in order to guide the readers on how to create a set of rules and methods for the semi-automated formalization of requirements according to their specific use case and needs. We discuss two different approaches in detail and highlight the iterative development of rule sets. The requirements models are represented in a human- and machine-readable format in the form of pseudocode. The presented methods are demonstrated on two industrial use cases from the automotive and railway domains. The results show that using current pre-trained NLP models requires less effort to create the rule set and that the approach can be easily adapted to specific use cases and domains. In addition, findings and shortcomings of this research area are highlighted and an outlook on possible future developments is given.
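As an illustration of the kind of rule such an iteratively developed rule set might contain, the following sketch (not the paper's actual rules) matches one common requirement template and emits a pseudocode form of the requirement; the sentence template and the example requirement are assumptions for the illustration:

```python
import re

# Minimal sketch of one hand-written formalization rule:
# map "If <condition>, the system shall <action>." onto an if-block in pseudocode.
RULE = re.compile(r"^If (?P<cond>.+?),\s*the system shall (?P<action>.+?)\.?$",
                  re.IGNORECASE)

def formalize(requirement):
    match = RULE.match(requirement.strip())
    if not match:
        return "# no rule matched: " + requirement
    return f"if ({match['cond']}):\n    {match['action']}"

print(formalize("If the speed exceeds 120 km/h, the system shall issue a warning."))
# if (the speed exceeds 120 km/h):
#     issue a warning
```

In practice, each iteration of rule development would add or refine such patterns based on the requirements the current rule set fails to match.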

    The Semantic Shadow: Combining User Interaction with Context Information for Semantic Web-Site Annotation

    This thesis develops the concept of the Semantic Shadow (SemS), a model for managing content-related and structural annotations on web page elements and their values. The model supports a contextual weighting of the annotated information, allowing annotation values to be specified in relation to the evaluation context. A procedure is presented which allows this context-dependent meta-information on web page elements to be managed and processed through a dedicated programming interface. Two distinct implementations of the model have been developed: one based on Java objects, the other using the Resource Description Framework (RDF) as the modelling backend. The RDF-based storage allows the annotations of the Semantic Shadow to be integrated with other information of the Semantic Web. To demonstrate the application of the Semantic Shadow concept, a procedure to optimize web-based user interfaces based on the structural semantics has been developed: assuming a mobile client, a requested web page is dynamically adapted by a proxy prototype, where the context-awareness of the adaptation can be modelled directly alongside the structural annotations. To overcome the drawback of missing annotations for existing web pages, this thesis introduces a concept to derive context-dependent meta-information on web pages from their usage: from observing the users' interaction with a web page, certain context-dependent structural information about the concerned web page elements can be derived and stored in the annotation model of the Semantic Shadow concept.
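A minimal sketch of how such context-weighted annotations might look, purely as an interpretation of the model rather than the thesis' Java or RDF API; the element identifiers, context keys, and scoring rule are assumptions for the example:

```python
from dataclasses import dataclass, field

@dataclass
class Annotation:
    value: str
    context: dict          # e.g. {"device": "mobile"}
    weight: float = 1.0

@dataclass
class SemanticShadow:
    annotations: dict = field(default_factory=dict)   # element id -> list of Annotation

    def annotate(self, element, value, context, weight=1.0):
        self.annotations.setdefault(element, []).append(Annotation(value, context, weight))

    def lookup(self, element, eval_context):
        """Return the annotation value whose context best matches the evaluation context."""
        def score(a):
            overlap = sum(1 for k, v in a.context.items() if eval_context.get(k) == v)
            return a.weight * overlap
        candidates = self.annotations.get(element, [])
        return max(candidates, key=score).value if candidates else None

shadow = SemanticShadow()
shadow.annotate("#nav", "secondary-content", {"device": "mobile"}, weight=0.9)
shadow.annotate("#nav", "primary-content", {"device": "desktop"}, weight=0.8)
print(shadow.lookup("#nav", {"device": "mobile"}))   # secondary-content
```

In the thesis' RDF-backed implementation the same information would be persisted as triples, which is what enables integration with other Semantic Web data.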

    Design considerations for a hierarchical semantic compositional framework for medical natural language understanding

    Medical natural language processing (NLP) systems are a key enabling technology for transforming Big Data from clinical report repositories into information used to support disease models and validate intervention methods. However, current medical NLP systems fall considerably short when faced with the task of logically interpreting clinical text. In this paper, we describe a framework inspired by mechanisms of human cognition in an attempt to jump the NLP performance curve. The design centers on a hierarchical semantic compositional model (HSCM), which provides an internal substrate for guiding the interpretation process. The paper describes insights from four key cognitive aspects: semantic memory, semantic composition, semantic activation, and hierarchical predictive coding. We discuss the design of a generative semantic model and an associated semantic parser used to transform a free-text sentence into a logical representation of its meaning. The paper discusses supportive and antagonistic arguments for the key features of the architecture as a long-term foundational framework.
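As a toy illustration of the targeted transformation from free text to a logical meaning representation (not the HSCM or its parser), the sketch below composes a predicate-argument structure for a simple clinical sentence from a hand-listed lexicon; the trigger words, frame names, and codes are invented for the example:

```python
# Toy compositional step: a trigger word plus its argument yields a predicate-argument structure.
LEXICON = {
    "denies": {"predicate": "Deny", "roles": ("experiencer", "finding")},
    "chest pain": {"type": "Finding", "code": "chest_pain"},   # hypothetical concept code
}

def parse(sentence):
    tokens = sentence.lower().rstrip(".").split()
    subject = tokens[0]
    trigger = LEXICON.get(tokens[1], {})
    argument = " ".join(tokens[2:])
    return {
        "predicate": trigger.get("predicate", "Unknown"),
        "arguments": {
            "experiencer": subject,
            "finding": LEXICON.get(argument, {}).get("code", argument),
        },
    }

print(parse("Patient denies chest pain."))
# {'predicate': 'Deny', 'arguments': {'experiencer': 'patient', 'finding': 'chest_pain'}}
```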

    Improved Coreference Resolution Using Cognitive Insights

    Coreference resolution is the task of extracting referential expressions, or mentions, in text and clustering these by the entity or concept they refer to. The sustained research interest in the task reflects the richness of reference expression usage in natural language and the difficulty of encoding insights from linguistic and cognitive theories effectively. In this thesis, we design and implement LIMERIC, a state-of-the-art coreference resolution engine. LIMERIC naturally incorporates both non-local decoding and entity-level modelling to achieve the highly competitive benchmark performance of 64.22% and 59.99% on the CoNLL-2012 benchmark with a simple model and a baseline feature set. Beyond strong performance, a key contribution of this work is a reconceptualisation of the coreference task. We draw an analogy between shift-reduce parsing and coreference resolution to develop an algorithm which naturally mimics cognitive models of human discourse processing. In our feature development work, we leverage insights from cognitive theories to improve our modelling. Each contribution achieves statistically significant improvements, which sum to gains of 1.65% and 1.66% on the CoNLL-2012 benchmark, yielding performance values of 65.76% and 61.27%. For each novel feature we propose, we contribute an accompanying analysis so as to better understand how cognitive theories apply to real language data. LIMERIC is at once a platform for exploring cognitive insights into coreference and a viable alternative to current systems. We are excited by the promise of incorporating our and further cognitive insights into more complex frameworks, since this has the potential to improve both the performance of computational models and our understanding of the mechanisms underpinning human reference resolution.
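The shift-reduce analogy can be sketched as follows; this is a toy illustration of the decoding idea rather than LIMERIC itself, and the compatibility function is a stand-in for the learned model:

```python
# Mentions are processed left to right; each one is either merged into the best-scoring
# existing entity cluster (REDUCE) or starts a new entity (SHIFT).
def resolve(mentions, compatible):
    """`compatible(mention, cluster)` returns a score; <= 0 means start a new entity."""
    entities = []                      # list of clusters (lists of mentions)
    for mention in mentions:
        scores = [(compatible(mention, cluster), cluster) for cluster in entities]
        best = max(scores, key=lambda s: s[0], default=(0, None))
        if best[0] > 0:
            best[1].append(mention)    # REDUCE: merge into an existing entity
        else:
            entities.append([mention]) # SHIFT: introduce a new entity
    return entities

# Deliberately simplistic compatibility: exact mention match, or a weak preference
# for attaching pronouns to some existing cluster.
def head_match(mention, cluster):
    if mention in ("he", "she", "it", "they"):
        return 0.5
    return 1.0 if mention in cluster else 0.0

print(resolve(["Obama", "the president", "he", "Congress"], head_match))
# [['Obama', 'he'], ['the president'], ['Congress']] with these toy scores
```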

    Using natural language processing for question answering in closed and open domains

    With regard to the growth in the amount of social, environmental, and biomedical information available digitally, there is a growing need for Question Answering (QA) systems that can empower users to master this new wealth of information. Despite recent progress in QA, the quality of interpretation and extraction of the desired answer is not yet adequate. We believe that striving for higher accuracy in QA systems is an ongoing research problem: it is better to have no answer than a wrong answer. There are, however, diverse queries which state-of-the-art QA systems cannot interpret and answer properly. Interpreting a question in a way that preserves its syntactic-semantic structure is considered one of the most important challenges in this area. In this work we focus on the problems of semantic-based QA systems and analyze the effectiveness of NLP techniques, query mapping, and answer inferencing in both closed (first scenario) and open (second scenario) domains. For this purpose, the architecture of a Semantic-based closed and open domain Question Answering System (hereafter "ScoQAS") over ontology resources is presented with two different prototypes: an ontology-based closed domain and an open domain over Linked Open Data (LOD) resources. ScoQAS is based on NLP techniques combining semantic-based structure-feature patterns for question classification and creating a question syntactic-semantic information structure (QSiS). The QSiS provides the basis for building constraints to formulate the related terms on syntactic-semantic aspects and for generating a question graph (QGraph), which facilitates inference towards a precise answer in the closed domain. In addition, our approach provides a convenient method to map the formulated information into a SPARQL query template to crawl the LOD resources in the open domain. The main contributions of this dissertation are as follows: 1. Developing the ScoQAS architecture, integrating common and specific components compatible with closed and open domain ontologies. 2. Analyzing the user's question and building a question syntactic-semantic information structure (QSiS), constituted by several processes of the methodology: question classification, Expected Answer Type (EAT) determination, and constraint generation. 3. Presenting an empirical semantic-based structure-feature pattern for question classification and generalizing heuristic constraints to formulate the relations between the features in the recognized pattern in syntactic and semantic terms. 4. Developing a syntactic-semantic QGraph for representing the core components of the question. 5. Presenting an empirical graph-based answer inference method for the closed domain. In a nutshell, a semantic-based QA system is presented together with experimental results over the closed and open domains. The efficiency of ScoQAS is evaluated using measures such as precision, recall, and F-measure on LOD challenges in the open domain. In the closed-domain scenario we focus on quantitative evaluation, analyzing accuracy over an enterprise ontology. Due to the lack of predefined benchmarks in the first scenario, we define measures that demonstrate the actual complexity of the problem and the actual efficiency of the solutions.
The results of the analysis corroborate the performance and effectiveness of our approach in achieving reasonable accuracy.
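For the open-domain scenario, the mapping of a classified question onto a SPARQL template can be sketched as below; the question class, the template, and the choice of DBpedia as the LOD endpoint are assumptions for illustration, not the actual ScoQAS templates:

```python
from SPARQLWrapper import SPARQLWrapper, JSON

# Hypothetical "author_of" question class (EAT: Person), e.g. "Who wrote <work>?",
# filled into a SPARQL template and run against an example LOD endpoint.
TEMPLATES = {
    "author_of": """
        PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
        PREFIX dbo:  <http://dbpedia.org/ontology/>
        SELECT ?author WHERE {{
            ?work rdfs:label "{label}"@en ;
                  dbo:author ?author .
        }} LIMIT 1
    """,
}

def answer(question_class, label):
    endpoint = SPARQLWrapper("https://dbpedia.org/sparql")
    endpoint.setQuery(TEMPLATES[question_class].format(label=label))
    endpoint.setReturnFormat(JSON)
    results = endpoint.query().convert()
    return [row["author"]["value"] for row in results["results"]["bindings"]]

print(answer("author_of", "The Old Man and the Sea"))
```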

    Knowledge extraction from unstructured data and classification through distributed ontologies

    The World Wide Web has changed the way humans use and share any kind of information. The Web removed several barriers to accessing published information and has become an enormous space where users can easily navigate through heterogeneous resources (such as linked documents) and can easily edit, modify, or produce them. Documents implicitly enclose information and relationships that are accessible only to human beings. Indeed, the Web of documents evolved towards a space of data silos, linked to each other only through untyped references (such as hypertext references) that only humans were able to understand. A growing desire to programmatically access pieces of data implicitly enclosed in documents has characterized the recent efforts of the Web research community. Direct access means structured data, enabling computing machinery to easily exploit the linking of different data sources. It has become crucial for the Web community to provide a technology stack for easing data integration at large scale, first structuring the data using standard ontologies and afterwards linking it to external data. Ontologies became the best practice for defining axioms and relationships among classes, and the Resource Description Framework (RDF) became the basic data model chosen to represent ontology instances (i.e., an instance is a value of an axiom, class, or attribute). Data becomes the new oil; in particular, extracting information from semi-structured textual documents on the Web is key to realizing the Linked Data vision. In the literature these problems have been addressed with several proposals and standards, which mainly focus on technologies to access the data and on formats to represent the semantics of the data and their relationships. With the increasing volume of interconnected and serialized RDF data, RDF repositories may suffer from data overloading and may become a single point of failure for the overall Linked Data vision. One of the goals of this dissertation is to propose a thorough approach to manage large-scale RDF repositories and to distribute them in a redundant and reliable peer-to-peer RDF architecture. The architecture consists of a logic to distribute and mine the knowledge and of a set of physical peer nodes organized in a ring topology based on a Distributed Hash Table (DHT). Each node shares the same logic and provides an entry point that enables clients to query the knowledge base using atomic, disjunctive and conjunctive SPARQL queries. The consistency of the results is increased using a data redundancy algorithm that replicates each RDF triple on multiple nodes so that, in the case of peer failure, other peers can retrieve the data needed to resolve the queries. Additionally, a distributed load balancing algorithm is used to maintain a uniform distribution of the data among the participating peers by dynamically changing the key space assigned to each node in the DHT. Recently, the process of data structuring has gained more and more attention when applied to the large volume of textual information spread on the Web, such as legacy data, newspapers, scientific papers or (micro-)blog posts. This process mainly consists of three steps: (i) the extraction from the text of atomic pieces of information, called named entities; (ii) the classification of these pieces of information through ontologies; (iii) their disambiguation through Uniform Resource Identifiers (URIs) identifying real world objects.
As a step towards interconnecting the Web to real world objects via named entities, different techniques have been proposed. The second objective of this work is to propose a comparison of these approaches in order to highlight strengths and weaknesses in different scenarios such as scientific papers, news articles, or user generated content. We created the Named Entity Recognition and Disambiguation (NERD) web framework, publicly accessible on the Web (through a REST API and a web User Interface), which unifies several named entity extraction technologies. Moreover, we proposed the NERD ontology, a reference ontology for comparing the results of these technologies. Recently, the NERD ontology has been included in the NIF (Natural language processing Interchange Format) specification, part of the Creating Knowledge out of Interlinked Data (LOD2) project. Summarizing, this dissertation defines a framework for the extraction of knowledge from unstructured data and its classification via distributed ontologies. A detailed study of the Semantic Web and knowledge extraction fields is proposed to define the issues under investigation in this work. Then, it proposes an architecture to tackle the single point of failure issue introduced by the RDF repositories spread across the Web. Although the use of ontologies enables a Web where data is structured and comprehensible by computing machinery, human users may also take advantage of it, especially for the annotation task. Hence, this work describes an annotation tool for web editing and audio and video annotation, with a web front-end User Interface powered by a distributed ontology. Furthermore, this dissertation details a thorough comparison of the state of the art of named entity technologies. The NERD framework is presented as a technology encompassing existing solutions in the named entity extraction field, and the NERD ontology is presented as a reference ontology in the field. Finally, this work highlights three use cases with the purpose of reducing the number of data silos spread across the Web: a Linked Data approach to augment the automatic classification task in a Systematic Literature Review, an application to lift educational data stored in Sharable Content Object Reference Model (SCORM) data silos to the Web of Data, and a scientific conference venue enhancer built on top of several live data collectors. Significant research efforts have been devoted to combining the efficiency of a reliable data structure with the importance of data extraction techniques. This dissertation opens different research doors which mainly join two research communities: the Semantic Web and the Natural Language Processing community. The Web provides a considerable amount of data on which NLP techniques may shed light. The use of the URI as a unique identifier may provide one milestone for the materialization of entities lifted from raw text to real world objects.
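The triple placement and replication scheme of the peer-to-peer architecture can be sketched roughly as follows; this is an interpretation of the description above (hash ring, successor placement, fixed replication factor), not the dissertation's implementation:

```python
import hashlib
from bisect import bisect_right

# Each RDF triple is hashed onto a ring; it is stored on the responsible node plus the
# next REPLICAS - 1 successors, so queries can still be answered after a peer failure.
REPLICAS = 3

def ring_position(value):
    return int(hashlib.sha1(value.encode()).hexdigest(), 16)

def responsible_nodes(triple, nodes):
    """Return the peers that should store the triple (successor plus replicas)."""
    key = ring_position(" ".join(triple))
    ring = sorted(nodes, key=ring_position)
    positions = [ring_position(n) for n in ring]
    start = bisect_right(positions, key) % len(ring)
    return [ring[(start + i) % len(ring)] for i in range(min(REPLICAS, len(ring)))]

nodes = ["peer-a", "peer-b", "peer-c", "peer-d"]
triple = ("dbpedia:Turin", "rdf:type", "dbo:City")
print(responsible_nodes(triple, nodes))
```

Load balancing would then correspond to moving node positions on this ring so that each peer owns a comparable share of the key space.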

    Ontology Alignment using Biologically-inspired Optimisation Algorithms

    It is investigated how biologically-inspired optimisation methods can be used to compute alignments between ontologies. Independent of particular similarity metrics, the developed techniques demonstrate anytime behaviour and high scalability. Due to the inherent parallelisability of these population-based algorithms, it is possible to exploit dynamically scalable cloud infrastructures, a step towards the provisioning of Alignment-as-a-Service solutions for future semantic applications.
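A minimal sketch of the population-based idea, assuming a simple evolutionary loop and a label-similarity fitness in place of the thesis' metrics and algorithms; the concept lists and parameters are invented for the example:

```python
import random
from difflib import SequenceMatcher

# A population of candidate alignments between two small concept lists is evolved;
# fitness is a pluggable similarity metric (here: plain label similarity).
SOURCE = ["Person", "Article", "Organization"]
TARGET = ["Human", "Paper", "Organisation", "Event"]

def fitness(alignment):   # alignment[i] is the index of the TARGET concept mapped to SOURCE[i]
    return sum(SequenceMatcher(None, s.lower(), TARGET[t].lower()).ratio()
               for s, t in zip(SOURCE, alignment))

def mutate(alignment):
    child = list(alignment)
    child[random.randrange(len(child))] = random.randrange(len(TARGET))
    return child

def evolve(generations=200, population_size=20):
    population = [[random.randrange(len(TARGET)) for _ in SOURCE]
                  for _ in range(population_size)]
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        parents = population[: population_size // 2]          # keep the fittest half
        population = parents + [mutate(random.choice(parents)) for _ in parents]
    return max(population, key=fitness)

best = evolve()
print([(s, TARGET[t]) for s, t in zip(SOURCE, best)])
```

Because the best-so-far alignment is available after every generation, the loop naturally exhibits the anytime behaviour mentioned above, and independent fitness evaluations are what make the approach easy to parallelise.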