    MinoanER: Schema-Agnostic, Non-Iterative, Massively Parallel Resolution of Web Entities

    Entity Resolution (ER) aims to identify different descriptions in various Knowledge Bases (KBs) that refer to the same entity. ER is challenged by the Variety, Volume and Veracity of entity descriptions published in the Web of Data. To address these challenges, we propose the MinoanER framework, which simultaneously achieves full automation, support for highly heterogeneous entities, and massive parallelization of the ER process. MinoanER leverages a token-based similarity of entities to define a new metric that derives the similarity of neighboring entities from the most important relations, which are identified purely from statistics. A composite blocking method is employed to capture different sources of matching evidence from the content, neighbors, or names of entities. The search space of candidate pairs for comparison is compactly abstracted by a novel disjunctive blocking graph and processed by a non-iterative, massively parallel matching algorithm consisting of four generic, schema-agnostic matching rules that are robust to their internal configuration. We demonstrate that the effectiveness of MinoanER is comparable to that of existing ER tools on real KBs with low Variety, and that it significantly outperforms them when matching KBs with high Variety. Comment: Presented at EDBT 2019
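
    Schema-agnostic token blocking, the general idea behind content-based blocking of this kind, can be sketched as follows. This is a minimal illustration, not MinoanER's composite blocking or its similarity metric: it assumes each entity is a hypothetical dict mapping attribute names to values, ignores the attribute names entirely, and pairs two entities from different KBs whenever they share at least one token. Jaccard overlap of token sets stands in for a token-based similarity.

```python
import re
from collections import defaultdict
from itertools import product


def tokens(entity):
    """Lowercase alphanumeric tokens from all attribute values; attribute names are ignored."""
    toks = set()
    for value in entity.values():
        toks.update(re.findall(r"[a-z0-9]+", str(value).lower()))
    return toks


def token_blocking(kb1, kb2):
    """Candidate pairs (id1, id2) whose descriptions share at least one token."""
    # Each token is a block key holding (entities from kb1, entities from kb2).
    blocks = defaultdict(lambda: (set(), set()))
    for eid, ent in kb1.items():
        for t in tokens(ent):
            blocks[t][0].add(eid)
    for eid, ent in kb2.items():
        for t in tokens(ent):
            blocks[t][1].add(eid)
    candidates = set()
    for left, right in blocks.values():
        candidates.update(product(left, right))  # only cross-KB pairs are compared
    return candidates


def jaccard(a, b):
    """Token-set overlap used here as a simple stand-in similarity."""
    return len(a & b) / len(a | b) if a or b else 0.0


# Hypothetical KBs with different schemas ("name" vs. "label" vs. "title").
kb1 = {"a1": {"name": "Knossos Palace", "location": "Crete"}}
kb2 = {"b1": {"label": "Palace of Knossos"}, "b2": {"title": "Athens"}}
pairs = token_blocking(kb1, kb2)  # {("a1", "b1")}
```

    Because blocking keys come from values rather than attributes, the two KBs need no schema alignment; the price is many redundant comparisons on large KBs, which is what graph-based pruning of candidate pairs is meant to reduce.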

    Optimal joint path computation and rate allocation for real-time traffic

    Computing network paths under worst-case delay constraints has been studied extensively over the past two decades. Assuming Weighted Fair Queueing scheduling at the nodes, this translates to computing paths and reserving rates at each link. The problem is NP-hard in general, even for a single path; hence, the polynomial-time heuristics proposed so far either assume equal rates at each node or compute the path heuristically and then allocate the rates optimally on that path. In this paper we show that these heuristics, although they find optimal solutions quite often, can fail to find feasible paths even at very low loads, and that this can be avoided by solving the problem, i.e., path computation and rate allocation, jointly and to optimality. This is possible by modeling the problem as a mixed-integer second-order cone program and solving it optimally in fractions of a second for relatively large networks on commodity hardware; this approach can also easily be turned into a heuristic, trading a negligible increase in blocking probability for a one-order-of-magnitude reduction in computation time. Extensive simulations show that these methods are feasible in today's ISP networks and significantly outperform existing schemes in terms of blocking probability.
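
    For contrast with the joint approach, the equal-rate heuristic on a fixed path can be sketched as below. It assumes each WFQ hop behaves as a latency-rate server and the flow is token-bucket constrained (burst sigma, rate rho, maximum packet size L), so the end-to-end bound is D = sigma/min_k r_k + sum_k (L/r_k + L_max/C_k + pi_k), where C_k is the capacity of link k, pi_k its propagation delay and L_max the largest packet on it. These modeling choices and the `Link` type are illustrative assumptions, not the paper's mixed-integer formulation.

```python
from dataclasses import dataclass


@dataclass
class Link:
    # Hypothetical link description used only for this sketch.
    capacity: float    # C_k, bits/s
    residual: float    # bandwidth still available for reservation, bits/s
    prop_delay: float  # pi_k, seconds


def delay_bound(sigma, L, L_max, links, rates):
    """Worst-case end-to-end delay, assuming each WFQ hop acts as a latency-rate server."""
    latency = sum(L / r + L_max / link.capacity + link.prop_delay
                  for link, r in zip(links, rates))
    return sigma / min(rates) + latency


def equal_rate(sigma, rho, L, L_max, links, deadline):
    """Smallest common rate meeting the deadline on a fixed path, or None if the request is blocked."""
    fixed = sum(L_max / link.capacity + link.prop_delay for link in links)
    slack = deadline - fixed
    if slack <= 0:
        return None  # deadline unreachable at any rate
    # With equal rates r: D = (sigma + K*L)/r + fixed <= deadline.
    r = max(rho, (sigma + len(links) * L) / slack)
    if any(r > link.residual for link in links):
        return None  # not enough residual bandwidth on some link
    return r


# Example: two identical 1 Mb/s hops and a 70 ms deadline.
path = [Link(1e6, 6e5, 0.005), Link(1e6, 6e5, 0.005)]
r = equal_rate(sigma=10_000, rho=100_000, L=5_000, L_max=10_000, links=path, deadline=0.07)
```

    With equal rates the bound collapses to (sigma + K*L)/r plus rate-independent terms, so the minimum feasible rate has a closed form. The request is blocked as soon as that rate exceeds the residual capacity of any single link, even if other links have plenty to spare; this rigidity is what a joint path-and-rate optimization can avoid.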

    Spatial variability of soil structure and its impact on transport processes and some associated land qualities

    This thesis examines how the spatial variability of soil affects the spatial variability of simulated land qualities. The procedures used to determine this impact are described in chapters 2 and 3, whose subchapters correspond to seven manuscripts that have either appeared in or been submitted to peer-reviewed journals. Chapter 2 focuses on methods to inventory the spatial variability of soil characteristics related to soil structure. A method was developed to construct confidence intervals for point-count results when the point observations on a soil thin section are spatially dependent. It was concluded that confidence intervals obtained with the traditional method, which assumes all observations are independent, are much narrower than those that take the spatial dependency structure into account. Two other papers in chapter 2 describe a method to translate soil profile descriptions into soil physical input data for computer models that simulate solute flow. They introduce the concept of functional layers: a functional layer is a combination of soil layers showing comparable soil physical behaviour with respect to water flow. The functional-layer approach was tested and accepted for examples of disturbed and thinly stratified soils by calculating the functional properties of the layers under defined hydrological conditions. Once functional layers are established, mapping their thickness, starting depth and type provides spatial information about soil physical characteristics. Another paper in chapter 2 optimizes the number of observations needed in this mapping procedure by applying geostatistical methods and a sequential sampling test. Chapter 3 investigates how the variability of soil structure affects the variability of crop yields and nitrate leaching. One paper describes a field-scale empirical study in which the variability of barley grain yield is correlated with the variability of soil characteristics and of simulated transpiration deficits. Simulation model inputs were obtained with the functional-layer approach described in chapter 2. Regression functions based only on simulated transpiration deficits explained 43% of the variance in yields, suggesting that variability of transpiration may be an important cause of yield variability. This hypothesis was tested in a subsequent paper, in which remote-sensing estimates of the leaf area index were used to estimate potential transpiration with high spatial accuracy. Incorporating spatial and time series of the leaf area index into a crop growth model yielded a prediction of yield variability that explained 39% of the measured variability. Variability of plant-available water, expressed by the actual transpiration, is an important cause of yield variability. Two papers in chapter 3 describe how a combined solute-flow and crop-growth model was used to evaluate the spatially varying effect of fertilization scenarios. The spatial interpolation method Disjunctive Kriging was used to translate the spatial variability of simulated nitrate leaching into maps of the probability that a threshold leaching concentration is exceeded. The study also investigated whether the number of simulations could be reduced using Disjunctive CoKriging and available spatial information. It was concluded that different soil units within one agricultural field showed different leaching and crop-yield responses to identical fertilizer treatments, and that yield variability will increase as fertilizer levels approach the level for maximal production.
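
    The point-count conclusion can be illustrated numerically. The sketch below is not the thesis's method: it assumes, hypothetically, that neighbouring point observations follow an AR(1)-type correlation with lag-one coefficient rho, and plugs the standard effective-sample-size approximation n_eff = n(1 - rho)/(1 + rho) into a normal-approximation interval for a proportion. With rho = 0 it reduces to the traditional interval that treats every point as independent.

```python
from statistics import NormalDist


def proportion_ci(hits, n, rho=0.0, level=0.95):
    """Normal-approximation CI for a point-count proportion.

    rho is an assumed lag-one correlation between neighbouring points;
    rho = 0 gives the traditional independent-observation interval.
    """
    if not -1.0 < rho < 1.0:
        raise ValueError("rho must lie strictly between -1 and 1")
    p = hits / n
    n_eff = n * (1 - rho) / (1 + rho)  # AR(1) effective sample size approximation
    z = NormalDist().inv_cdf(0.5 + level / 2)
    half = z * (p * (1 - p) / n_eff) ** 0.5
    return max(0.0, p - half), min(1.0, p + half)


# Hypothetical thin section: 120 of 400 points hit pore space.
naive = proportion_ci(120, 400)               # roughly (0.255, 0.345)
dependent = proportion_ci(120, 400, rho=0.5)  # roughly (0.222, 0.378)
```

    Positive correlation shrinks n_eff and so widens the interval; treating dependent points as independent therefore overstates the precision of the point count.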

    Multi-source heterogeneous intelligence fusion

    Named Entity Resolution in Personal Knowledge Graphs

    Entity Resolution (ER) is the problem of determining when two entities refer to the same underlying entity. The problem has been studied for over 50 years and has recently taken on new importance in an era of large, heterogeneous 'knowledge graphs' published on the Web and used widely in domains as wide-ranging as social media, e-commerce and search. This chapter discusses the specific problem of named ER in the context of personal knowledge graphs (PKGs). We begin with a formal definition of the problem and the components necessary for high-quality and efficient ER. We also discuss some challenges that are expected to arise for Web-scale data. Next, we provide a brief literature review, with a special focus on how existing techniques could apply to PKGs. We conclude the chapter by covering some applications, as well as promising directions for future research. Comment: To appear as a book chapter of the same name in an upcoming (Oct. 2023) book, 'Personal Knowledge Graphs (PKGs): Methodology, tools and applications', edited by Tiwari et al.

    POLIS: a probabilistic summarisation logic for structured documents

    PhD thesis. As structured documents formatted in markup languages such as SGML, RDF, or XML become more widely available, retrieval systems increasingly focus on retrieving document elements rather than entire documents. Additionally, abstraction layers in the form of formalised retrieval logics have allowed developers to include search facilities in numerous applications without needing detailed knowledge of retrieval models. Although automatic document summarisation has been recognised as a useful tool for reducing the workload of information system users, very few such abstraction layers have been developed for it. This thesis describes the development of an abstraction logic for summarisation, called POLIS, which provides users (such as developers or knowledge engineers) with high-level access to summarisation facilities. Furthermore, POLIS allows users to exploit the hierarchical information provided by structured documents. POLIS is developed step by step. We start by defining a series of probabilistic summarisation models, which assign weights to document elements at a user-selected level; these are the summarisation models accessible through POLIS. The formal definition of POLIS is then given in three steps: first, a syntax through which users and knowledge engineers interact with the logic; second, the logic's semantics; and finally, details of an implementation. The final chapters of this dissertation evaluate POLIS in two stages. First, we evaluate the performance of the summarisation models by applying POLIS to two test collections, the DUC AQUAINT corpus and the INEX IEEE corpus. We then present application scenarios that show how POLIS can be used in specific IR tasks.
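
    To make "weights to document elements at a user-selected level" concrete, here is a toy sketch. It is not one of POLIS's summarisation models: it assumes a hypothetical `Element` tree, scores each element at the chosen depth by the mean document-level probability of its terms, and normalises those scores into a probability distribution over the elements.

```python
import re
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Element:
    # Hypothetical document-tree node; not a POLIS data structure.
    tag: str
    text: str = ""
    children: list = field(default_factory=list)


def words(el):
    """Lowercase word tokens of an element and all its descendants."""
    out = re.findall(r"[a-z]+", el.text.lower())
    for child in el.children:
        out.extend(words(child))
    return out


def at_level(el, level):
    """Elements exactly `level` steps below `el` (level 0 is `el` itself)."""
    if level == 0:
        return [el]
    return [d for child in el.children for d in at_level(child, level - 1)]


def element_weights(doc, level):
    """Distribution over elements at `level`, keyed by (position, tag)."""
    tf = Counter(words(doc))
    total = sum(tf.values())
    scores = {}
    for i, el in enumerate(at_level(doc, level)):
        toks = words(el)
        # Mean probability P(t | doc) of the element's terms.
        scores[(i, el.tag)] = sum(tf[t] / total for t in toks) / len(toks) if toks else 0.0
    norm = sum(scores.values()) or 1.0
    return {key: s / norm for key, s in scores.items()}


doc = Element("article", children=[
    Element("sec", "Retrieval of structured documents and element retrieval"),
    Element("sec", "Weather today"),
])
weights = element_weights(doc, level=1)
```

    Here the first section wins because it repeats the document's most frequent term; a real summarisation model would use better term statistics, but the shape of the output, a distribution over elements at one level of the hierarchy, is the point of the sketch.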