11 research outputs found

    Leveraging Semantic Annotations to Link Wikipedia and News Archives

    No full text
    The incomprehensible amount of information available online has made it difficult to retrospect on past events. We propose a novel linking problem to connect excerpts from Wikipedia summarizing events to online news articles elaborating on them. To address the linking problem, we cast it into an information retrieval task by treating a given excerpt as a user query with the goal of retrieving a ranked list of relevant news articles. We find that Wikipedia excerpts often come with additional semantics, in their textual descriptions, representing the time, geolocations, and named entities involved in the event. Our retrieval model leverages text and semantic annotations as different dimensions of an event by estimating independent query models to rank documents. In our experiments on two datasets, we compare methods that consider different combinations of dimensions and find that the approach that leverages all dimensions suits our problem best.

    Maximum Cardinality Popular Matchings in Strict Two-sided Preference Lists

    No full text
    We consider the problem of computing a maximum cardinality popular matching in a bipartite graph $G = (\mathcal{A} \cup \mathcal{B}, E)$ where each vertex $u \in \mathcal{A} \cup \mathcal{B}$ ranks its neighbors in a strict order of preference. This is the same as an instance of the stable marriage problem with incomplete lists. A matching $M^*$ is said to be popular if there is no matching $M$ such that more vertices are better off in $M$ than in $M^*$. Popular matchings have been extensively studied in the case of one-sided preference lists, i.e., only vertices of $\mathcal{A}$ have preferences over their neighbors while vertices in $\mathcal{B}$ have no preferences; polynomial time algorithms have been shown here to determine if a given instance admits a popular matching or not and if so, to compute one with maximum cardinality. It has very recently been shown that for two-sided preference lists, the problem of determining if a given instance admits a popular matching or not is NP-complete. However, this hardness result assumes that preference lists have ties. When preference lists are strict, it is easy to show that popular matchings always exist since stable matchings always exist and they are popular. But the complexity of computing a maximum cardinality popular matching was unknown. In this paper we show an $O(mn)$ algorithm for this problem, where $n = |\mathcal{A}| + |\mathcal{B}|$ and $m = |E|$.
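
    The popularity criterion in this abstract can be illustrated with a small sketch. It is not the paper's $O(mn)$ algorithm; it only compares two given matchings by counting which vertices are better off. The names `prefers`, `more_popular`, `as_matching`, and the toy instance are hypothetical, and the sketch assumes the usual convention that a vertex prefers any listed partner to being unmatched.

```python
def as_matching(pairs):
    """Turn a list of (a, b) edges into a symmetric partner lookup."""
    m = {}
    for a, b in pairs:
        m[a], m[b] = b, a
    return m


def prefers(prefs, v, m_new, m_old):
    """True if vertex v is strictly better off in m_new than in m_old."""
    new, old = m_new.get(v), m_old.get(v)
    if new == old:
        return False
    if old is None:   # assumption: being matched beats being unmatched
        return True
    if new is None:
        return False
    # Strict preference lists: an earlier position means more preferred.
    return prefs[v].index(new) < prefs[v].index(old)


def more_popular(prefs, m_new, m_old):
    """True if more vertices are better off in m_new than in m_old."""
    up = sum(prefers(prefs, v, m_new, m_old) for v in prefs)
    down = sum(prefers(prefs, v, m_old, m_new) for v in prefs)
    return up > down


# Toy instance with incomplete, strict lists (hypothetical data).
prefs = {
    "a1": ["b1", "b2"],
    "a2": ["b1"],
    "b1": ["a1", "a2"],
    "b2": ["a1"],
}
stable = as_matching([("a1", "b1")])                # size 1
larger = as_matching([("a1", "b2"), ("a2", "b1")])  # size 2

print(more_popular(prefs, larger, stable))
print(more_popular(prefs, stable, larger))
```

    In this toy instance two vertices gain and two lose when moving between the matchings, so neither one beats the other in a head-to-head vote. This shows how a popular matching can be larger than a stable one.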

    Diversifying Search Results Using Time

    No full text
    Getting an overview of a historic entity or event can be difficult in search results, especially if important dates concerning the entity or event are not known beforehand. For such information needs, users would benefit if returned results covered diverse dates, thus giving an overview of what has happened throughout history. Diversifying search results based on important dates can be a building block for applications, for instance, in digital humanities. Historians would thus be able to quickly explore longitudinal document collections by querying for entities or events without knowing associated important dates a priori. In this work, we describe an approach to diversify search results using temporal expressions (e.g., in the 1990s) from their contents. Our approach first identifies time intervals of interest to the given keyword query based on pseudo-relevant documents. It then re-ranks query results so as to maximize the coverage of identified time intervals. We present a novel and objective evaluation for our proposed approach. We test the effectiveness of our methods on the New York Times Annotated corpus and the Living Knowledge corpus, collectively consisting of around 6 million documents. Using history-oriented queries and encyclopedic resources, we show that our method is indeed able to present search results diversified along time.
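
    The re-ranking step described above (maximize the coverage of identified time intervals) might be approximated by a generic greedy max-coverage heuristic, sketched below. This is an illustration under stated assumptions, not the authors' method: the function name `rerank_by_time_coverage`, the document ids, and the interval labels are all hypothetical, and each document is assumed to come with a precomputed set of intervals of interest that it mentions.

```python
def rerank_by_time_coverage(ranked_docs, doc_intervals, k):
    """Greedily pick k documents so that each pick covers as many
    not-yet-covered time intervals as possible.

    ranked_docs   -- document ids in their original relevance order
    doc_intervals -- dict: document id -> set of interval labels it mentions
    """
    remaining = list(ranked_docs)
    covered, result = set(), []
    while remaining and len(result) < k:
        # max() returns the first maximal element, so ties fall back
        # to the original relevance order.
        best = max(remaining,
                   key=lambda d: len(doc_intervals.get(d, set()) - covered))
        result.append(best)
        remaining.remove(best)
        covered |= doc_intervals.get(best, set())
    return result


# Hypothetical query results and the intervals each one mentions.
ranked = ["d1", "d2", "d3", "d4"]
intervals = {
    "d1": {"1990s"},
    "d2": {"1990s"},
    "d3": {"1960s", "1970s"},
    "d4": {"1940s"},
}
top3 = rerank_by_time_coverage(ranked, intervals, k=3)
print(top3)  # ['d3', 'd1', 'd4']
```

    The greedy choice trades original rank for coverage: `d2` drops out of the top three because it repeats an interval already covered by `d1`.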

    Finding Images of Rare and Ambiguous Entities

    No full text

    Symmetry Detection in Large Scale City Scans

    No full text
    In this report we present a novel method for detecting partial symmetries in very large point clouds of 3D city scans. Unlike previous work, which was limited to data sets of a few hundred megabytes maximum, our method scales to very large scenes. We map the detection problem to a nearest-neighbor search in a low-dimensional feature space, followed by a cascade of tests for geometric clustering of potential matches. Our algorithm robustly handles noisy real-world scanner data, obtaining a recognition performance comparable to state-of-the-art methods. In practice, it scales linearly with the scene size and achieves a high absolute throughput, processing half a terabyte of raw scanner data overnight on a dual-socket commodity PC.

    New Results for Non-preemptive Speed Scaling

    No full text
    We consider the speed scaling problem introduced in the seminal paper of Yao et al. In this problem, a number of jobs, each with its own processing volume, release time, and deadline, need to be executed on a speed-scalable processor. The power consumption of this processor is $P(s) = s^\alpha$, where $s$ is the processing speed, and $\alpha > 1$ is a constant. The total energy consumption is power integrated over time, and the goal is to process all jobs while minimizing the energy consumption. The preemptive version of the problem, along with its many variants, has been extensively studied over the years. However, little is known about the non-preemptive version of the problem, except that it is strongly NP-hard and allows a constant factor approximation. Up until now, the (general) complexity of this problem is unknown. In the present paper, we study an important special case of the problem, where the job intervals form a laminar family, and present a quasipolynomial-time approximation scheme for it, thereby showing that (at least) this special case is not APX-hard, unless $NP \subseteq DTIME(2^{\mathrm{poly}(\log n)})$. The second contribution of this work is a polynomial-time algorithm for the special case of equal-volume jobs, where previously only a $2^\alpha$ approximation was known. In addition, we show that two other special cases of this problem allow fully polynomial-time approximation schemes (FPTASs).
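
    As a small worked illustration of the energy model in this abstract, consider a single job. The symbols $w$ (processing volume), $r$ (release time), and $d$ (deadline) are notation assumed here and do not appear in the abstract. Running the job at constant speed $s$ takes $w/s$ time units, and since $P(s) = s^\alpha$ is convex for $\alpha > 1$, the cheapest way to run a job alone in its window is at the lowest feasible constant speed.

```latex
\begin{align*}
E &= \int_{0}^{w/s} P(s)\,dt = \frac{w}{s}\, s^{\alpha} = w\, s^{\alpha - 1},
\\
s_{\min} &= \frac{w}{d - r}
\quad\Longrightarrow\quad
E_{\min} = w \left(\frac{w}{d - r}\right)^{\alpha - 1}
         = \frac{w^{\alpha}}{(d - r)^{\alpha - 1}} .
\end{align*}
```

    For instance, with $\alpha = 3$, $w = 2$, and $d - r = 4$, the job runs at speed $1/2$ and consumes $2 \cdot (1/2)^2 = 1/2$ units of energy, whereas running at speed $1$ would consume $2$.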

    Real-time Text Queries with Tunable Term Pair Indexes

    No full text
    Term proximity scoring is an established means in information retrieval for improving result quality of full-text queries. Integrating such proximity scores into efficient query processing, however, has not been equally well studied. Existing methods make use of precomputed lists of documents where tuples of terms, usually pairs, occur together, usually incurring a huge index size compared to term-only indexes. This paper introduces a joint framework for trading off index size and result quality, and provides optimization techniques for tuning precomputed indexes towards either maximal result quality or maximal query processing performance, given an upper bound for the index size. The framework also allows lists for pairs to be selectively materialized based on a query log to further reduce index size. Extensive experiments with two large text collections demonstrate runtime improvements of several orders of magnitude over existing text-based processing techniques with reasonable index sizes.

    MDL4BMF: Minimum Description Length for Boolean Matrix Factorization

    No full text
    Matrix factorizations, where a given data matrix is approximated by a product of two or more factor matrices, are powerful data mining tools. Among other tasks, matrix factorizations are often used to separate global structure from noise. This, however, requires solving the ‘model order selection problem’ of determining where fine-grained structure stops and noise starts, i.e., what the proper size of the factor matrices is. Boolean matrix factorization (BMF), where data, factors, and matrix product are Boolean, has received increased attention from the data mining community in recent years. The technique has desirable properties, such as high interpretability and natural sparsity. However, so far no method for selecting the correct model order for BMF has been available. In this paper we propose to use the Minimum Description Length (MDL) principle for this task. Besides solving the problem, this well-founded approach has numerous benefits, e.g., it is automatic, does not require a likelihood function, is fast, and, as experiments show, is highly accurate. We formulate the description length function for BMF in general, making it applicable for any BMF algorithm. We discuss how to construct an appropriate encoding: starting from a simple and intuitive approach, we arrive at a highly efficient data-to-model based encoding for BMF. We extend an existing algorithm for BMF to use MDL to identify the best Boolean matrix factorization, analyze the complexity of the problem, and perform an extensive experimental evaluation to study its behavior.
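
    To make the Boolean setting concrete, the sketch below computes a Boolean matrix product (logical OR of ANDs instead of a sum of products) and the number of cells where a factorization disagrees with the data. It does not implement the MDL-based model order selection from the abstract; the function names and the example matrices are hypothetical.

```python
def boolean_product(B, C):
    """Boolean matrix product: entry (i, j) is OR over k of B[i][k] AND C[k][j]."""
    inner = len(C)
    return [
        [int(any(B[i][k] and C[k][j] for k in range(inner)))
         for j in range(len(C[0]))]
        for i in range(len(B))
    ]


def reconstruction_error(A, B, C):
    """Number of cells where the Boolean product of B and C differs from A."""
    P = boolean_product(B, C)
    return sum(a != p for row_a, row_p in zip(A, P) for a, p in zip(row_a, row_p))


# Hypothetical 3x3 data matrix with an exact Boolean factorization of size 2.
A = [[1, 1, 0],
     [1, 1, 1],
     [0, 1, 1]]
B = [[1, 0],
     [1, 1],
     [0, 1]]
C = [[1, 1, 0],
     [0, 1, 1]]

print(boolean_product(B, C))
print(reconstruction_error(A, B, C))
```

    Under ordinary arithmetic the middle row of the product would be `[1, 2, 1]`; the Boolean product caps it at `[1, 1, 1]`, which is why overlapping factors stay interpretable as sets.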

    Videoscapes: Exploring Unstructured Video Collections

    No full text

    Fast Tracking of Hand and Finger Articulations Using a Single Depth Camera

    No full text