
    An Approach to Developing an Information System for Extracting Data from the Web

    Today, the Internet contains a huge number of information sources that we use constantly in our daily lives. Information that is similar in meaning is often presented in different forms on different resources (for example, electronic libraries, online stores, and news sites). In this paper, we analyze the extraction of user-required information from certain types of web sources. The data extraction problem was analyzed, and in reviewing the main approaches to data extraction, the strengths and weaknesses of each were identified. The main aspects of web knowledge extraction were formulated, and approaches and information technologies for solving syntactic analysis problems in existing information systems were examined. Based on this analysis, the task of developing models and software components for extracting data from certain types of web resources was formulated. A conceptual model of data extraction was developed that treats the web space as an external data source. A requirements specification for the software component was created, which allows work on the project to continue with a clear understanding of the requirements and implementation constraints. During software modeling, class, activity, sequence, and deployment diagrams were developed, which will later be used to create the finished application. For further development, a programming platform and types of testing (load and unit) were defined. The results obtained indicate that the proposed design, to be implemented as a prototype software system, can perform the task of extracting data from different sources on the basis of a single semantic template.
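    The closing claim, extracting data from different sources via a single semantic template, lends itself to a short illustration. The sketch below is a minimal reading of that idea, not the paper's design: the `SemanticTemplate` class, the field names, the per-source selector maps, and the use of BeautifulSoup are all assumptions introduced here.

```python
# Hypothetical sketch of template-driven extraction: one semantic
# template (shared field names) is bound to per-source selector maps,
# so structurally different pages yield records of the same shape.
from dataclasses import dataclass

from bs4 import BeautifulSoup  # third-party: pip install beautifulsoup4

@dataclass
class SemanticTemplate:
    fields: list[str]  # e.g. ["title", "author"]

def extract(html: str, template: SemanticTemplate,
            selectors: dict[str, str]) -> dict[str, str | None]:
    """Fill the shared template using one source's CSS selectors."""
    soup = BeautifulSoup(html, "html.parser")
    record: dict[str, str | None] = {}
    for field in template.fields:
        sel = selectors.get(field)
        node = soup.select_one(sel) if sel else None
        record[field] = node.get_text(strip=True) if node else None
    return record

# Two sources with different markup, one semantic template.
book = SemanticTemplate(fields=["title", "author"])
shop_selectors = {"title": "h1.product-name", "author": "span.by-line"}
library_selectors = {"title": "td.title", "author": "td.creator"}

html = "<h1 class='product-name'>Dune</h1><span class='by-line'>Herbert</span>"
print(extract(html, book, shop_selectors))
# {'title': 'Dune', 'author': 'Herbert'}
```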

    Intelligent Query Answering with Contextual Knowledge for Relational Databases

    We propose a keyword-based query interface for knowledge bases, including relational and deductive databases, based on contextual background knowledge such as suitable join conditions or synonyms. Join conditions can be extracted from existing referential integrity (foreign key) constraints of the database schema; if the schema contains no foreign key constraints, they can also be learned from previous database queries. Given a textual representation of a query to a relational database (a word list), one may parse the list into a structured term. The intelligent and cooperative part of our approach is to hypothesize the semantics of the word list and to find suitable links between the concepts mentioned in the query using contextual knowledge, more precisely join conditions between the database tables. We use a knowledge-based parser based on an extension of Definite Clause Grammars (DCGs) that is interweaved with calls to the database schema to annotate the tokens as table names, table attributes, attribute values, or relationships linking tables. Our tool DdQl yields the possible queries in a domain-specific rule language that extends Datalog, from which the user can choose one.
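    The join-inference step described above can be sketched briefly. The schema encoding, the `link_tables` helper, and the breadth-first search over foreign-key edges below are illustrative assumptions, not DdQl's actual algorithm or rule language.

```python
# Hypothetical sketch: infer join conditions between two tables
# mentioned in a keyword query by walking foreign-key edges,
# roughly in the spirit of the approach described above.
from collections import deque

# Schema: foreign keys as (table, column) -> (table, column).
FOREIGN_KEYS = {
    ("order", "customer_id"): ("customer", "id"),
    ("order_item", "order_id"): ("order", "id"),
}

def link_tables(src: str, dst: str) -> list[str]:
    """Breadth-first search for a join path from src to dst."""
    # Build an undirected adjacency map of table-to-table edges.
    edges: dict[str, list[tuple[str, str]]] = {}
    for (t1, c1), (t2, c2) in FOREIGN_KEYS.items():
        cond = f"{t1}.{c1} = {t2}.{c2}"
        edges.setdefault(t1, []).append((t2, cond))
        edges.setdefault(t2, []).append((t1, cond))
    queue, seen = deque([(src, [])]), {src}
    while queue:
        table, conds = queue.popleft()
        if table == dst:
            return conds
        for nxt, cond in edges.get(table, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, conds + [cond]))
    return []  # no join path found

print(link_tables("customer", "order_item"))
# ['order.customer_id = customer.id', 'order_item.order_id = order.id']
```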

    Lexical and Grammar Resource Engineering for Runyankore & Rukiga: A Symbolic Approach

    Current research in computational linguistics and natural language processing (NLP) requires the existence of language resources. Whereas these resources are available for a few well-resourced languages, many languages have been neglected. Among the neglected and/or under-resourced languages are Runyankore and Rukiga (henceforth referred to as Ry/Rk). Recently, the NLP community has started to acknowledge that resources for under-resourced languages should also be given priority, one reason being that, as far as language typology is concerned, the few well-resourced languages do not represent the structural diversity of the remaining languages. The central focus of this thesis is enabling the computational analysis and generation of utterances in Ry/Rk, two closely related languages spoken by about 3.4 and 2.4 million people respectively. They belong to the Nyoro-Ganda (JE10) language zone of the Great Lakes, Narrow Bantu of the Niger-Congo language family. The computational processing of these languages is achieved by formalising their grammars using Grammatical Framework (GF) and its Resource Grammar Library (RGL). In addition to the grammars, a general-purpose computational lexicon for the two languages is developed. Although we utilise the lexicon to greatly increase the lexical coverage of the grammars, it can also be used for other NLP tasks. In this thesis a symbolic, rule-based approach is taken, because the lack of adequate language resources makes data-driven NLP approaches unsuitable for these languages.

    Definite Clause Grammars with Parse Trees: Extension for Prolog

    Definite Clause Grammars (DCGs) are a convenient way to specify possibly non-context-free grammars for natural and formal languages. They can be used to progressively build a parse tree as grammar rules are applied, by providing an extra argument in the DCG rule's head; in the simplest case, this is a structure containing the name of the nonterminal used. This extension of a DCG has been proposed for natural language processing in the past and can be performed automatically in Prolog using term expansion. We extend this approach with a meta-nonterminal that specifies optional nonterminals and sequences of nonterminals, as these structures are common in grammars for formal, domain-specific languages. We specify a term expansion that represents these sequences as lists while preserving the grammar's ability to be used both for parsing and serialising, i.e. to create a parse tree from given source code and vice versa. We show that this mechanism can be used to lift grammars specified in extended Backus-Naur form (EBNF) to generate parse trees. As a case study, we present a parser for the Prolog programming language itself, based solely on the grammars given in the ISO Prolog standard, which produces the corresponding parse trees.
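    Since Prolog's term expansion is hard to show compactly outside Prolog, the sketch below mimics the core idea in Python: each rule returns its parse tree alongside the remaining tokens (the analogue of the DCG's extra head argument), and a `sequence` combinator plays the role of the meta-nonterminal, collecting item trees in a list. The toy grammar and all names are assumptions for illustration, not the paper's mechanism.

```python
# Hypothetical Python analogue of a DCG that threads a parse tree
# through its rules; the paper does this in Prolog via an extra
# head argument and term expansion, mimicked here with return values.
from typing import Callable

Tokens = list[str]
# A rule consumes tokens and returns (tree, remaining tokens) or None.
Rule = Callable[[Tokens], "tuple[object, Tokens] | None"]

def terminal(tok: str) -> Rule:
    def parse(ts: Tokens):
        return (tok, ts[1:]) if ts and ts[0] == tok else None
    return parse

def digit(ts: Tokens):
    if ts and ts[0].isdigit():
        return ("digit", ts[0]), ts[1:]
    return None

def sequence(item: Rule, sep: Rule) -> Rule:
    """Analogue of the sequence meta-nonterminal: parse
    item (sep item)* and collect the item trees in a list."""
    def parse(ts: Tokens):
        res = item(ts)
        if res is None:
            return None
        tree, rest = res
        trees = [tree]
        while (s := sep(rest)) and (nxt := item(s[1])):
            trees.append(nxt[0])
            rest = nxt[1]
        return ("seq", trees), rest
    return parse

digits = sequence(digit, terminal(","))
print(digits(["1", ",", "2", ",", "3"]))
# (('seq', [('digit', '1'), ('digit', '2'), ('digit', '3')]), [])
```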

    Natural language software registry (second edition)
