
    EDBL: a General Lexical Basis for the Automatic Processing of Basque

    EDBL (Euskararen Datu-Base Lexikala) is a general-purpose lexical database used in Basque text-processing tasks. It is a large repository of lexical knowledge (currently around 80,000 entries) that serves as the basis and support for a number of different NLP tasks, providing lexical information for several language tools: morphological analysis, spell checking and correction, lemmatization and tagging, syntactic analysis, and so on. It has been designed to be neutral with respect to different linguistic formalisms, and flexible and open enough to accept new types of information. A browser-based user interface makes it easy for the lexicographer to consult the database, correct and update entries, add new ones, and so on.

    The paper presents the conceptual schema and the main features of the database, along with some problems encountered in its design and implementation in a commercial DBMS. Given the diversity of the lexical entities and the complex relationships among them, three total specializations have been defined under the main class of the hierarchy that represents the conceptual schema. The first divides all the entries in EDBL into Basque standard and non-standard entries. The second divides the units in the database into dictionary entries (classified into the different parts of speech) and other entries (mainly non-independent morphemes and irregularly inflected forms). Finally, a third total specialization distinguishes single-word entries from multiword lexical units; this permits us to describe the morphotactics of single-word entries, and the constitution and surface-realization schemas of multiword lexical units.

    A hierarchy of typed feature structures (FSs) has been designed to map the entities and relationships of the database conceptual schema. The FSs are coded in TEI-conformant SGML, and Feature Structure Declarations (FSDs) have been made for all the types in the hierarchy. Feature structures are used as a delivery format to export the lexical information from the database; the information coded in this way is subsequently used as input by the different language analysis tools.
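As a rough illustration of that delivery format, the sketch below serializes a flat, invented lexical entry as a TEI-style feature structure (`<fs>`/`<f>`/`<symbol>` are standard TEI feature-structure elements, but the entry fields and type name here are hypothetical, not EDBL's actual schema):

```python
import xml.etree.ElementTree as ET

def fs_to_tei(fs_type, features):
    """Serialize a flat feature structure as TEI <fs>/<f>/<symbol> markup."""
    fs = ET.Element("fs", type=fs_type)
    for name, value in features.items():
        f = ET.SubElement(fs, "f", name=name)
        ET.SubElement(f, "symbol", value=str(value))  # symbolic feature value
    return ET.tostring(fs, encoding="unicode")

# Hypothetical entry: the Basque noun "etxe" ("house")
entry = {"lemma": "etxe", "pos": "noun", "standard": "yes"}
xml = fs_to_tei("dictionary-entry", entry)
```

A real export would of course cover the full type hierarchy and nested structures rather than flat symbol-valued features.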

    Semantic analysis in the automation of ER modelling through natural language processing


    Simplifying syntactic and semantic parsing of NL-based queries in advanced application domains

    The paper presents a high-level query language (MDDQL) for databases, which relies on an ontology-driven automaton. This is simulated by the human-computer interaction mode of the query construction process, which is driven by an inference engine operating on a frames-based ontology description. Given that the query construction process implicitly builds high-level query trees before the query is submitted to a semantic middleware for transformation and execution, syntactic and semantic parsing of a query with conventional techniques, i.e., after completion of its formulation, becomes unnecessary. To this extent, only meaningful queries can be constructed, at a low typing, learning, and parsing effort, and regardless of the preferred natural (sub)language. From a linguistic point of view, it turns out that the query construction mechanism can easily be adapted to work with families of natural languages that follow a different word order, such as Subject-Object-Verb as opposed to the Subject-Verb-Object order that underlies most European languages. The query construction mechanism has proved practical in advanced application domains, such as medical applications, whose specialized terminology is hardly understood by naive users and the public.
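A minimal sketch of the ontology-driven idea (frame names and structure invented here, not MDDQL's actual ontology): the interface only offers terms the ontology licenses after the current concept, so the query tree is valid by construction and needs no after-the-fact parsing.

```python
# Hypothetical frames-based ontology: each frame lists its properties and
# the concepts it may be related to in a query.
ONTOLOGY = {
    "Patient": {"properties": ["name", "age"], "related": ["Diagnosis"]},
    "Diagnosis": {"properties": ["code", "date"], "related": []},
}

def next_terms(concept):
    """Terms the interface may legally offer after `concept` is chosen."""
    frame = ONTOLOGY[concept]
    return frame["properties"] + frame["related"]

def build_query_tree(selections):
    """Fold user selections into a query tree, validating every step."""
    root, *rest = selections
    tree = {"concept": root, "children": []}
    node = tree
    for term in rest:
        assert term in next_terms(node["concept"]), f"invalid term: {term}"
        child = {"concept": term, "children": []}
        node["children"].append(child)
        if term in ONTOLOGY:          # descend only into related concepts
            node = child
    return tree
```

Because an invalid selection is rejected the moment it is attempted, the completed tree can be handed straight to the middleware.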

    Crime Prevention on Social Networks Featuring Location Based Services

    In the age of austerity, crime is on the increase. The large online presence of the populace supplies criminals with large amounts of data capable of turning an individual into a victim. Public awareness of the dangers of social networks is low, and online crime analysis is in its infancy. This paper presents a novel system for the prevention of crime on social networks. The system identifies risks within users' geo-location information, status updates, and online profiles. It analyses location-based information and uses Information Extraction templates and Natural Language Processing to identify threats. The system can successfully identify threats on a graded scale and provide feedback and advice to the user. The work highlights the importance of closely monitoring a digital footprint.
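One way such Information Extraction templates and graded scoring might fit together is sketched below; the patterns, weights, and grade thresholds are invented for illustration, not taken from the paper.

```python
import re

# Hypothetical IE templates: each regex flags a risky disclosure and
# carries a weight toward the overall risk score.
TEMPLATES = [
    (re.compile(r"\b(on holiday|away until|out of town)\b", re.I), 3),  # absence
    (re.compile(r"\blive[sd]? (at|on) [A-Z][\w ]+\b"), 2),              # home location
    (re.compile(r"\b(home alone|by myself tonight)\b", re.I), 2),       # vulnerability
]

def risk_score(post):
    """Sum the weights of templates that fire, then grade the total."""
    score = sum(weight for rx, weight in TEMPLATES if rx.search(post))
    grade = "high" if score >= 4 else "medium" if score >= 2 else "low"
    return score, grade
```

A deployed system would combine many more templates with geo-location checks and NLP features, but the graded-scale feedback loop is the same shape.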

    Semi-Automated Development of Conceptual Models from Natural Language Text

    The process of converting natural language specifications into conceptual models requires detailed analysis of natural language text, and designers frequently make mistakes when undertaking this transformation manually. Although many approaches have been used to help designers translate natural language text into conceptual models, each approach has its limitations. One of the main limitations is the lack of a domain-independent ontology that can be used as a repository for entities and relationships, thus guiding the transition from natural language processing to a conceptual model. Such an ontology is not currently available because it would be very difficult and time consuming to produce. In this thesis, a semi-automated system for mapping natural language text into conceptual models is proposed. The model, which is called SACMES, combines a linguistic approach with an ontological approach and human intervention to achieve the task. The model learns from the natural language specifications that it processes, and stores the information that is learnt in a conceptual model ontology and a user history knowledge database. It then uses the stored information to improve performance and reduce the need for human intervention. The evaluation conducted on SACMES demonstrates that (1) designers' creation of conceptual models improves when using the system compared with not using any system, and (2) the performance of the system improves as more natural language requirements are processed, thus decreasing the need for human intervention. However, these advantages may be improved further through development of the learning and retrieval techniques used by the system.
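The linguistic side of such a pipeline can be caricatured in a few lines: a subject-verb-object sentence becomes two entities and a relationship of an ER-style model. The pattern below is deliberately naive and invented for illustration; it is not SACMES's actual analysis.

```python
import re

# Toy pattern: "<article> <noun> <verb-ending-in-s> <article> <noun>",
# e.g. "A customer places an order".
PATTERN = re.compile(r"(?:a|an|the)\s+(\w+)\s+(\w+?s)\s+(?:a|an|the)\s+(\w+)", re.I)

def extract(sentence):
    """Map a simple S-V-O sentence to entities and a relationship."""
    m = PATTERN.search(sentence)
    if not m:
        return None  # a real system would fall back to deeper NLP or ask the user
    subj, verb, obj = m.groups()
    return {"entities": [subj.capitalize(), obj.capitalize()],
            "relationship": verb.lower()}
```

It is exactly the failures of such shallow patterns (passives, compound nouns, anaphora) that motivate the ontology and the human-in-the-loop steps described above.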

    Using Patterns for Keyword Search in RDF Graphs

    An increasing number of RDF datasets are available on the Web. Querying RDF data requires knowledge of a query language such as SPARQL, as well as some information describing the content of these datasets. The goal of our work is to facilitate the querying of RDF datasets, and we present an approach that enables users to search RDF data using keywords. We introduce the notion of pattern to integrate external knowledge into the search process, which increases the quality of the results.
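A toy version of the idea (dataset and pattern invented here; the paper's patterns are richer): keywords are matched against literal labels in the graph, and a pattern supplies the structure that connects the matched elements.

```python
# A tiny RDF graph as subject-predicate-object triples; "ex:" marks URIs,
# everything else is a literal label.
TRIPLES = [
    ("ex:Inception", "ex:directedBy", "ex:Nolan"),
    ("ex:Nolan", "ex:name", "Christopher Nolan"),
    ("ex:Inception", "ex:title", "Inception"),
]

def keyword_match(keyword):
    """Return graph nodes whose literal labels contain the keyword."""
    kw = keyword.lower()
    return {s for s, p, o in TRIPLES
            if not o.startswith("ex:") and kw in o.lower()}

def apply_pattern(predicate, subjects, objects):
    """Keep pattern instances connecting one match from each keyword."""
    return [(s, p, o) for s, p, o in TRIPLES
            if p == predicate and s in subjects and o in objects]
```

Here the pattern `?film ex:directedBy ?director` plays the role of the external knowledge that turns two isolated keyword hits into an answerable structured query.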

    A Feasibility Study of Automated Support for Similarity Analysis of Natural Language Requirements in Market-Driven Development

    In market-driven software development there is a strong need for support to handle congestion in the requirements engineering process, which may occur as the demand for short time-to-market is combined with a rapid arrival of new requirements from many different sources. Automated analysis of the continuous flow of incoming requirements provides an opportunity to increase the efficiency of the requirements engineering process. This paper presents empirical evaluations of the benefit of automated similarity analysis of textual requirements, where existing information retrieval techniques are used to statistically measure requirements similarity. The results show that automated analysis of similarity among textual requirements is a promising technique that may provide effective support in identifying relationships between requirements.
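The statistical core of such similarity analysis can be sketched with plain term-frequency vectors and cosine similarity, a common information retrieval choice (the paper's exact measure and preprocessing may differ):

```python
import math
from collections import Counter

def cosine(a, b):
    """Cosine similarity between two texts as bag-of-words vectors."""
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[t] * vb[t] for t in va)
    na = math.sqrt(sum(c * c for c in va.values()))
    nb = math.sqrt(sum(c * c for c in vb.values()))
    return dot / (na * nb) if na and nb else 0.0

def similar_pairs(requirements, threshold=0.6):
    """Flag requirement pairs whose similarity exceeds the threshold."""
    return [(i, j) for i in range(len(requirements))
            for j in range(i + 1, len(requirements))
            if cosine(requirements[i], requirements[j]) >= threshold]
```

Flagged pairs would then be shown to an analyst as candidate duplicates or related requirements rather than merged automatically.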

    Content warehouses

    Nowadays, content management systems are an established technology. Based on the experiences from several application scenarios we discuss the points of contact between content management systems and other disciplines of information systems engineering like data warehouses, data mining, and data integration. We derive a system architecture called "content warehouse" that integrates these technologies and defines a more general and more sophisticated view on content management. As an example, a system for the collection, maintenance, and evaluation of biological content like survey data or multimedia resources is shown as a case study.

    OWL Reasoners still useable in 2023

    In a systematic literature and software review, over 100 OWL reasoners/systems were analyzed to see whether they would still be usable in 2023. This has never been done at this scale before. OWL reasoners still play an important role in knowledge organisation and management, but the last comprehensive surveys/studies are more than 8 years old. The result of this work is a comprehensive list of 95 standalone OWL reasoners and systems using an OWL reasoner. For each item, information on project pages, source code repositories, and related documentation was gathered. The raw research data is provided in a GitHub repository for anyone to use.

    Semantic Tagging for the Urdu Language:Annotated Corpus and Multi-Target Classification Methods

    Extracting and analysing meaning-related information from natural language data has attracted the attention of researchers in various fields, such as natural language processing, corpus linguistics, information retrieval, and data science. An important aspect of such automatic information extraction and analysis is the annotation of language data using semantic tagging tools. Different semantic tagging tools have been designed to carry out various levels of semantic analysis, for instance, named entity recognition and disambiguation, sentiment analysis, word sense disambiguation, content analysis, and semantic role labelling. Common to all of these tasks, in the supervised setting, is the requirement for a manually semantically annotated corpus, which acts as a knowledge base from which to train and test potential word and phrase-level sense annotations. Many benchmark corpora have been developed for various semantic tagging tasks, but most are for English and other European languages. There is a dearth of semantically annotated corpora for the Urdu language, which is widely spoken and used around the world. To fill this gap, this study presents a large benchmark corpus and methods for the semantic tagging task for the Urdu language. The proposed corpus contains 8,000 tokens in the following domains or genres: news, social media, Wikipedia, and historical text (2,000 tokens per domain). The corpus has been manually annotated with 21 major semantic fields and 232 sub-fields of the USAS (UCREL Semantic Analysis System) semantic taxonomy, which provides a comprehensive set of semantic fields for coarse-grained annotation. Each word in our proposed corpus has been annotated with at least one and up to nine semantic field tags to provide a detailed semantic analysis of the language data, which allowed us to treat the problem of semantic tagging as a supervised multi-target classification task.
    To demonstrate how our proposed corpus can be used for the development and evaluation of Urdu semantic tagging methods, we extracted local, topical, and semantic features from the proposed corpus and applied seven different supervised multi-target classifiers to them. Results show an accuracy of 94% on our proposed corpus, which is free and publicly available to download.
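To make the multi-target framing concrete, here is a unigram baseline (on an invented mini-corpus, not the Urdu data, and far simpler than the paper's seven classifiers): each token is assigned the set of semantic field tags it most often received in training, with USAS's Z99 "unmatched" field as the fallback.

```python
from collections import Counter, defaultdict

def train(tagged_tokens):
    """Learn each token's most frequent tag set from (token, tags) pairs."""
    counts = defaultdict(Counter)
    for token, tags in tagged_tokens:
        counts[token][tuple(sorted(tags))] += 1
    return {tok: set(c.most_common(1)[0][0]) for tok, c in counts.items()}

def tag(model, tokens, default=frozenset({"Z99"})):
    """Assign each token its learnt tag set, or Z99 (unmatched) if unseen."""
    return [model.get(t, set(default)) for t in tokens]
```

Because a token maps to a whole *set* of tags at once, even this baseline is a multi-target classifier; the paper's feature-based classifiers refine the same output shape.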