
    WHISK: Web Hosted Information into Summarized Knowledge

    Today’s online content increases at an alarming rate that exceeds users’ ability to consume it. Modern search techniques let users enter keyword queries to find content they wish to see. However, such techniques break down when users browse the internet freely without knowing exactly what they want. Users may have to invest an unnecessarily long time reading content just to decide whether it interests them. Automatic text summarization helps relieve this problem by creating synopses that significantly reduce the text while preserving its key points. Steffen Lyngbaek created the SPORK summarization pipeline to address content overload in Reddit comment threads. Lyngbaek adapted the Opinosis graph model for extractive summarization and combined it with agglomerative hierarchical clustering and the Smith-Waterman algorithm to perform multi-document summarization on Reddit comments. This thesis presents WHISK, a pipeline for general multi-document text summarization based on SPORK. A generic data model in WHISK allows new drivers for different platforms to be created for the pipeline. In addition to the existing Opinosis graph model adapted in SPORK, WHISK introduces two simplified graph models for the pipeline. The simplified models remove unnecessary restrictions inherited from the Opinosis graph’s abstractive-summarization origins. Performance measurements and a study with Digital Democracy compare the two new graph models against the Opinosis graph model. Additionally, the study evaluates WHISK’s ability to generate pull quotes from political discussions as summaries.
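    The Smith-Waterman step mentioned above finds overlapping phrases between sentences via local alignment. A minimal sketch over word tokens (the scoring parameters here are illustrative, not SPORK's or WHISK's actual values):

```python
# Minimal Smith-Waterman local alignment over word tokens, of the kind used
# in summarization pipelines to detect shared phrases between sentences.
# Scoring parameters (match/mismatch/gap) are illustrative assumptions.

def smith_waterman(a, b, match=2, mismatch=-1, gap=-1):
    """Return the best local alignment score between token lists a and b."""
    rows, cols = len(a) + 1, len(b) + 1
    H = [[0] * cols for _ in range(rows)]
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            score = match if a[i - 1] == b[j - 1] else mismatch
            H[i][j] = max(
                0,                        # local alignment may restart anywhere
                H[i - 1][j - 1] + score,  # match or mismatch
                H[i - 1][j] + gap,        # gap in b
                H[i][j - 1] + gap,        # gap in a
            )
            best = max(best, H[i][j])
    return best

s1 = "the senate passed the bill on tuesday".split()
s2 = "lawmakers say the senate passed the bill".split()
print(smith_waterman(s1, s2))  # 10: five shared tokens at match score 2
```

A high score flags two comment sentences as expressing overlapping content, which the pipeline can then collapse into a single summary candidate.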

    Answering Object Queries over Knowledge Bases with Expressive Underlying Description Logics

    Many information sources can be viewed as collections of objects and descriptions about objects. The relationship between objects is often characterized by a set of constraints that semantically encode background knowledge of some domain. The most straightforward and fundamental way to access information in these repositories is to search for objects that satisfy certain selection criteria. This work considers a description logics (DL) based representation of such information sources and object queries, which allows for automated reasoning over the constraints accompanying objects. Formally, a knowledge base K=(T, A) captures constraints in the terminology (a TBox) T, and objects with their descriptions in the assertions (an ABox) A, using some DL dialect L. In such a setting, object descriptions are L-concepts and object identifiers correspond to individual names occurring in K. Correspondingly, object queries are the well-known problem of instance retrieval in the underlying DL knowledge base K, which returns the identifiers of qualifying objects. This work generalizes instance retrieval over knowledge bases to provide users with answers in which both identifiers and descriptions of qualifying objects are given. The proposed query paradigm, called assertion retrieval, is favoured over instance retrieval since it provides more informative answers to users. A more compelling reason is related to performance: assertion retrieval enables a transfer of basic relational database techniques, such as caching and query rewriting, in the context of an assertion retrieval algebra. The main contributions of this work are two-fold: one concerns optimizing the fundamental reasoning task that underlies assertion retrieval, namely, instance checking, and the other establishes a query compilation framework based on the assertion retrieval algebra.
    The former is necessary because an assertion retrieval query can entail a large volume of instance checking requests of the form K |= a:C, where "a" is an individual name and "C" is an L-concept. This work thus proposes a novel absorption technique, ABox absorption, to improve instance checking. ABox absorption handles knowledge bases that have an expressive underlying dialect L, for instance, one that requires disjunctive knowledge. It works particularly well when knowledge bases contain a large number of concrete domain concepts for object descriptions. This work further presents a query compilation framework based on the assertion retrieval algebra to make assertion retrieval more practical. In the framework, a suite of rewriting rules is provided to generate a variety of query plans, with a focus on plans that avoid reasoning w.r.t. the background knowledge bases when sufficient cached results of earlier requests exist. ABox absorption and the query compilation framework have been implemented in a prototypical system, dubbed CARE Assertion Retrieval Engine (CARE). CARE also defines a simple yet effective cost model to search for the best plan generated by query rewriting. Empirical studies of CARE have shown that the proposed techniques in this work make assertion retrieval a practical application over a variety of domains.
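    The caching idea above can be sketched in a few lines: memoize each instance check K |= a:C so that repeated requests never re-invoke the reasoner. The class, its interface, and the stub reasoner below are hypothetical illustrations, not CARE's actual API:

```python
# Sketch of caching instance-checking results, in the spirit of reusing
# earlier requests to avoid reasoner calls. The reasoner here is a toy
# stand-in; a real system would call a DL reasoner to decide K |= a:C.

class AssertionRetrieval:
    def __init__(self, reasoner):
        self._reasoner = reasoner   # callable: (individual, concept) -> bool
        self._cache = {}            # (individual, concept) -> cached answer
        self.reasoner_calls = 0     # how often we actually had to reason

    def instance_check(self, individual, concept):
        key = (individual, concept)
        if key not in self._cache:  # only consult the reasoner on a cache miss
            self.reasoner_calls += 1
            self._cache[key] = self._reasoner(individual, concept)
        return self._cache[key]

    def retrieve(self, individuals, concept):
        """Instance retrieval: return all a with K |= a:C."""
        return [a for a in individuals if self.instance_check(a, concept)]

# Toy "knowledge base": individuals whose name starts with 'a' satisfy C.
engine = AssertionRetrieval(lambda a, c: a.startswith("a"))
print(engine.retrieve(["alice", "bob", "anna"], "C"))  # ['alice', 'anna']
print(engine.reasoner_calls)                           # 3
engine.retrieve(["alice", "bob"], "C")                 # served from cache
print(engine.reasoner_calls)                           # still 3
```

Query rewriting in the assertion retrieval algebra generalizes this: a plan is preferred when its instance checks can be answered from cached results of earlier, possibly different, queries.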

    Explaining and Predicting Abnormal Expenses at Large Scale using Knowledge Graph based Reasoning

    Global business travel spend topped a record-breaking 1.2 trillion USD in 2015 and will reach 1.6 trillion USD by 2020, according to the Global Business Travel Association, the world's premier business travel and meetings trade organization. Existing expense systems are designed for reporting expenses, their type and amount, over pre-defined views such as time period, service, or employee group. However, such systems do not aim at systematically detecting abnormal expenses, and, more importantly, at explaining their causes. Deriving any actionable insight for optimising spending and saving from their analysis is therefore time-consuming, cumbersome, and often impossible. Towards this challenge we present AIFS, a system designed for expense business owners and auditors. Our system manipulates and combines semantic web and machine learning technologies for (i) identifying, (ii) explaining, and (iii) predicting abnormal expense claims by employees of large organisations. Our prototype of semantics-aware employee expense analytics and reasoning, experimented with 191,346 unique Accenture employees in 2015, has demonstrated scalability and accuracy for the tasks of explaining and predicting abnormal expenses.

    Joint Learning of Word and Label Embeddings for Sequence Labelling in Spoken Language Understanding

    We propose an architecture to jointly learn word and label embeddings for slot filling in spoken language understanding. The proposed approach encodes labels using a combination of word embeddings and straightforward word-label association from the training data. Compared to the state-of-the-art methods, our approach does not require label embeddings as part of the input and therefore lends itself nicely to a wide range of model architectures. In addition, our architecture computes contextual distances between words and labels to avoid adding contextual windows, thus reducing memory footprint. We validate the approach on established spoken dialogue datasets and show that it can achieve state-of-the-art performance with far fewer trainable parameters. Comment: Accepted for publication at ASRU 201
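    The core idea of building label embeddings from word embeddings plus word-label association, then scoring labels by distance, can be illustrated with a toy sketch. This is a simplified illustration under assumed data, not the paper's actual model: the vocabulary, association counts, and slot labels below are invented, and the embeddings are random.

```python
# Illustrative sketch (not the paper's exact architecture): build each slot
# label's embedding as an association-weighted average of the embeddings of
# words it co-occurs with in (hypothetical) training data, then label a word
# by cosine similarity. All names and counts here are invented examples.

import numpy as np

rng = np.random.default_rng(0)
vocab = ["book", "flight", "boston", "denver", "tomorrow"]
emb = {w: rng.normal(size=8) for w in vocab}   # toy word embeddings

# Invented word-label co-occurrence counts standing in for training data.
assoc = {
    "B-fromloc": {"boston": 5, "denver": 1},
    "B-toloc":   {"denver": 4, "boston": 1},
    "B-date":    {"tomorrow": 6},
}

def label_embedding(label):
    """Association-weighted average of embeddings of co-occurring words."""
    counts = assoc[label]
    total = sum(counts.values())
    return sum((n / total) * emb[w] for w, n in counts.items())

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def best_label(word):
    """Score every label against the word embedding; keep the closest."""
    return max(assoc, key=lambda lab: cosine(emb[word], label_embedding(lab)))

print(best_label("tomorrow"))  # B-date: that label's embedding is dominated
                               # by "tomorrow" itself in this toy data
```

Because labels live in the same space as words, no separate label-embedding input is needed, which is what lets the approach plug into many model architectures.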

    Preparation and imaging of intravascular high-frequency transducer

    Intravascular ultrasound (IVUS) imaging is by far the most favorable imaging modality for coronary artery evaluation. IVUS transducer design and fabrication, a key technology for intravascular ultrasound imaging, has a significant impact on the performance of the imaging results. Herein, a 35-MHz side-looking IVUS transducer probe was developed. With a small aperture of 0.40 mm × 0.40 mm, the transducer exhibited a very wide -6 dB bandwidth of 85% and a very low insertion loss of -12 dB. Further, in vitro IVUS imaging of a porcine coronary artery was performed to clearly display the vessel wall structure, while the corresponding color-coded graph was constructed successfully to distinguish necrotic core from fibrous plaque via image processing. The results demonstrated that the optimized transducer design performs favorably.
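    The quoted 85% figure is a fractional bandwidth, defined as the -6 dB band width divided by the center frequency. A quick check of what that implies for the reported 35-MHz probe, assuming the center frequency is the arithmetic mean of the band edges (an assumption; the thesis of the band-edge convention is not stated here):

```python
# Fractional bandwidth relation for an ultrasound transducer:
#   BW_frac = (f_high - f_low) / f_center
# Values below are the reported ones (35 MHz center, 85% at -6 dB);
# taking f_center as the arithmetic mean of the band edges is an assumption.

f_center = 35e6          # Hz, reported center frequency
bw_frac = 0.85           # reported -6 dB fractional bandwidth
bw_abs = bw_frac * f_center
f_low = f_center - bw_abs / 2
f_high = f_center + bw_abs / 2
print(f"absolute -6 dB bandwidth: {bw_abs / 1e6:.2f} MHz")      # 29.75 MHz
print(f"band edges: {f_low / 1e6:.3f}-{f_high / 1e6:.3f} MHz")  # 20.125-49.875
```

A wide fractional bandwidth like this shortens the emitted pulse, which is what gives the probe its fine axial resolution for resolving vessel wall layers.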

    The impact on the soil microbial community and enzyme activity of two earthworm species during the bioremediation of pentachlorophenol-contaminated soils

    The ecological effect of earthworms on the fate of soil pentachlorophenol (PCP) differs with species. This study addressed the roles and mechanisms by which two earthworm species (the epigeic Eisenia fetida and the endogeic Amynthas robustus E. Perrier) affect the soil microbial community and enzyme activity during the bioremediation of PCP-contaminated soils. A. robustus removed more soil PCP than did E. fetida. A. robustus improved nitrogen utilisation efficiency and soil oxidation more than did E. fetida, whereas the latter promoted the organic matter cycle in the soil. Both earthworm species significantly increased the amount of cultivable bacteria and actinomyces in soils, enhancing the utilisation rate of the carbon source (i.e. carbohydrates, carboxylic acids, and amino acids) and improving the richness and evenness of the soil microbial community. Additionally, earthworm treatment optimized the soil microbial community and increased the abundance of the PCP-4-monooxygenase gene. Phylogenic classification revealed stimulation of indigenous PCP bacterial degraders, assigned to the families Flavobacteriaceae, Pseudomonadaceae, and Sphingobacteriaceae, by both earthworms. A. robustus and E. fetida specifically promoted Comamonadaceae and Moraxellaceae PCP degraders, respectively.

    Personalizing Actions in Context for Risk Management using Semantic Web Technologies

    The process of managing risks of client contracts is manual and resource-consuming, particularly so for Fortune 500 companies. As an example, Accenture assesses the risk of eighty thousand contracts every year. For each contract, different types of data are consolidated from many sources and used to compute its risk tier. For high-risk-tier contracts, a Quality Assurance Director (QAD) is assigned to mitigate or even prevent the risk. The QAD gathers and selects the recommended actions during regular portfolio review meetings to enable leadership to take the appropriate actions. In this paper, we propose to automatically personalize and contextualize actions to improve their efficacy. Our approach integrates enterprise and external data into a knowledge graph and interprets actions based on QADs' profiles through semantic reasoning over this knowledge graph. User studies showed that QADs could efficiently select actions that mitigate risk better than those from the existing approach.