
    A New Open Information Extraction System Using Sentence Difficulty Estimation

    The World Wide Web contains a considerable amount of information expressed in natural language. While unstructured text is often difficult for machines to understand, Open Information Extraction (OIE) is a relation-independent extraction paradigm designed to extract assertions directly from massive and heterogeneous corpora. Low computational cost is a key requirement for Open Relation Extraction (ORE) systems. A large number of ORE methods have been proposed recently, covering a wide range of NLP tools, from "shallow" (e.g., part-of-speech tagging) to "deep" (e.g., semantic role labeling). There is a trade-off between the depth of the NLP tools used and the efficiency (computational cost) of ORE systems. This paper describes a novel approach called Sentence Difficulty Estimator for Open Information Extraction (SDE-OIE), which automatically estimates the difficulty of relation extraction by means of difficulty classifiers. These classifiers route each input sentence to an appropriate OIE extractor in order to decrease the overall computational cost. Our evaluations show that intelligently selecting the proper depth of ORE system significantly improves the effectiveness and scalability of SDE-OIE. It avoids wasting resources and achieves almost the same performance as its constituent deep extractor in a more reasonable time.

    Towards information profiling: data lake content metadata management

    There is currently a burst of Big Data (BD) processed and stored in huge raw data repositories, commonly called Data Lakes (DL). Such data require new techniques for data integration and schema alignment in order to make the data usable by its consumers and to discover the relationships linking its content. This can be provided by metadata services that discover and describe the content. However, there is currently no systematic approach to this kind of metadata discovery and management. Thus, we propose a framework for profiling the informational content stored in the DL, which we call information profiling. The profiles are stored as metadata to support data analysis. We formally define a metadata management process that identifies the key activities required to handle this effectively. We demonstrate the alternative techniques and the performance of our process using a prototype implementation handling a real-life case study from the OpenML DL, which showcases the value and feasibility of our approach. Peer reviewed. Postprint (author's final draft).

    StoryDroid: Automated Generation of Storyboard for Android Apps

    Mobile apps are now ubiquitous. Before developing a new app, the development team usually makes painstaking efforts to review many existing apps with similar purposes. The review process is crucial in that it reduces market risks and provides inspiration for app development. However, manual exploration of hundreds of existing apps by different roles (e.g., product manager, UI/UX designer, developer) in a development team can be ineffective. For example, it is difficult to explore all the functionalities of an app completely in a short period of time. Inspired by the concept of the storyboard in movie production, we propose a system, StoryDroid, that automatically generates the storyboard for Android apps and helps different roles review apps efficiently. Specifically, StoryDroid extracts the activity transition graph and leverages static analysis techniques to render UI pages, then visualizes the storyboard with the rendered pages. The mapping relations between UI pages and the corresponding implementation code (e.g., layout code, activity code, and method hierarchy) are also provided to users. Our comprehensive experiments show that StoryDroid is effective and indeed useful in assisting app development. The outputs of StoryDroid enable several potential applications, such as the recommendation of UI design and layout code.

    Applying semantic web technologies to knowledge sharing in aerospace engineering

    This paper details an integrated methodology to optimise Knowledge reuse and sharing, illustrated with a use case in the aeronautics domain. It uses Ontologies as a central modelling strategy for the Capture of Knowledge from legacy documents via automated means, or directly in systems interfacing with Knowledge workers, via user-defined, web-based forms. The domain ontologies used for Knowledge Capture also guide the retrieval of the Knowledge extracted from the data using a Semantic Search System that provides support for multiple modalities during search. This approach has been applied and evaluated successfully within the aerospace domain, and is currently being extended for use in other domains on an increasingly large scale.

    Development of a food antioxidant complex of plant origin

    An antioxidant complex has been developed for use in oils, fats, and food products that require enrichment with biologically active substances of plant origin. Rational conditions were investigated for obtaining water-ethanol extracts from plant raw materials: oak bark, eucalyptus leaves, and green tea leaves. The resulting antioxidant can prevent the oxidation of fat-containing products while preserving their high nutritional value. Antioxidant substances of natural origin make it possible to create compositionally balanced products with an extended shelf life while preserving the original natural composition and structure of their components. The developed antioxidant is also an additional source of substances that help the body fight free radicals formed as a result of physical and mental stress. Antioxidants of plant origin include antioxidant vitamins (tocopherols and ascorbic acid), plant phenols and thiol antioxidants (glutathione, lipoic acid), and trace elements. These components take part in inhibiting oxidation. Such antioxidants also include selenium, zinc, folates, and other substances. To plan the experimental studies of the antioxidant activity of the substances isolated from each type of plant raw material, a first-order full factorial experiment was used. A synergistic effect of the antioxidant substances was found when extracts of oak bark, eucalyptus leaves, and green tea leaves were used together. The developed antioxidant increases the induction period of the model substance (sunflower oil) by a factor of 2.7, whereas the best increase in the induction period achieved with antioxidants from each plant species used separately was 1.9. Thus, the developed antioxidant can help preserve the quality and safety of fat-containing food products. The use of this antioxidant can be proposed for the food of people who need additional antioxidants and biologically active substances in their diet. In particular, this is important for athletes.

    Toward Entity-Aware Search

    As the Web has evolved into a data-rich repository, current search engines, with their standard "page view," are becoming increasingly inadequate for a wide range of query tasks. While we often search for various data "entities" (e.g., phone number, paper PDF, date), today's engines only take us indirectly to pages. In this Ph.D. study, we focus on a novel type of Web search that is aware of data entities inside pages, a significant departure from traditional document retrieval. We study the various essential aspects of supporting entity-aware Web search. To begin with, we tackle the core challenge of ranking entities by distilling its underlying conceptual model, the Impression Model, and developing a probabilistic ranking framework, EntityRank, that seamlessly integrates both local and global information in ranking. We also report a prototype system built to show the initial promise of the proposal. Then, we aim at distilling and abstracting the essential computation requirements of entity search. From the dual views of reasoning (entity as input and entity as output), we propose a dual-inversion framework, with two indexing and partition schemes, for efficient and scalable query processing. Further, to recognize more entity instances, we study the problem of entity synonym discovery through mining query log data. The results obtained so far show clear promise for entity-aware search in its usefulness, effectiveness, efficiency, and scalability.