
    Building and Maintaining Halls of Fame over a Database

    Halls of Fame are fascinating constructs. They represent the elite of an often very large number of entities: persons, companies, products, countries, etc. Beyond their practical use as static rankings, changes to them are particularly interesting: as input to decision-making processes, to news media or novel narrative-science applications, or simply for direct consumption by users. In this work, we aim to detect, in an automated way, events that can be characterized by changes to a Hall of Fame ranking. We describe how the schema and data of a database can be used to generate Halls of Fame. In this database scenario, a Hall of Fame refers to distinguished tuples: entities whose characteristics set them apart from the majority. We define every Hall of Fame as one specific instance of an SQL query, such that a change in its result is considered a noteworthy event. Identified changes (i.e., events) are ranked using lexicographic tradeoffs over event and query properties and are presented to users or fed into higher-level applications. We have implemented a full-fledged prototype system that uses either database triggers or a Java-based middleware for event identification. We report on an experimental evaluation using a real-world dataset of basketball statistics.
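
    A minimal sketch of the idea, assuming a hypothetical basketball schema (a players table with a career_points column); the paper's prototype detects such events via database triggers or a Java-based middleware rather than by polling as done here:

```python
# Hall-of-Fame change detection by re-evaluating an SQL query and
# diffing successive results. Schema and data are hypothetical.
import sqlite3

HOF_QUERY = "SELECT name FROM players ORDER BY career_points DESC LIMIT 3"

def snapshot(conn):
    """Evaluate the Hall-of-Fame query and return its current result."""
    return [row[0] for row in conn.execute(HOF_QUERY)]

def detect_event(before, after):
    """Any change to the query result counts as a noteworthy event."""
    if before == after:
        return None
    return {"entered": set(after) - set(before),
            "left": set(before) - set(after)}

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE players (name TEXT, career_points INTEGER)")
conn.executemany("INSERT INTO players VALUES (?, ?)",
                 [("Abdul-Jabbar", 38387), ("Malone", 36928),
                  ("Bryant", 33643), ("James", 33000)])
before = snapshot(conn)
conn.execute("UPDATE players SET career_points = 33700 WHERE name = 'James'")
after = snapshot(conn)
print(detect_event(before, after))  # {'entered': {'James'}, 'left': {'Bryant'}}
```

    Ranking the detected events by lexicographic tradeoffs over event and query properties, as the abstract describes, would happen downstream of this diff step.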

    Leveraging Semantic Annotations to Link Wikipedia and News Archives

    The overwhelming amount of information available online makes it difficult to look back on past events. We propose a novel linking problem: connecting excerpts from Wikipedia that summarize events to online news articles elaborating on them. To address the linking problem, we cast it as an information retrieval task by treating a given excerpt as a user query, with the goal of retrieving a ranked list of relevant news articles. We find that Wikipedia excerpts often carry additional semantics in their textual descriptions, representing the time, geolocations, and named entities involved in the event. Our retrieval model leverages text and semantic annotations as different dimensions of an event by estimating independent query models to rank documents. In experiments on two datasets, we compare methods that consider different combinations of dimensions and find that the approach leveraging all dimensions suits our problem best.
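
    A hedged sketch of the ranking idea: each dimension (text, time, geolocation, named entities) contributes an independent score for a document, and documents are ranked by an interpolation of these scores. The weights and scores below are illustrative assumptions, not the paper's estimated query models:

```python
# Rank news articles for a Wikipedia event excerpt by interpolating
# per-dimension scores. Weights and scores here are made up.
DIMENSIONS = ("text", "time", "geo", "entity")
WEIGHTS = {"text": 0.4, "time": 0.2, "geo": 0.2, "entity": 0.2}  # assumed

def score(dim_scores):
    """dim_scores maps a dimension to that dimension's retrieval score,
    e.g. a likelihood under the dimension-specific query model."""
    return sum(WEIGHTS[d] * dim_scores.get(d, 0.0) for d in DIMENSIONS)

def rank(candidates):
    """candidates: {doc_id: {dimension: score}} -> doc ids, best first."""
    return sorted(candidates, key=lambda d: score(candidates[d]), reverse=True)

docs = {
    "news-1": {"text": 0.7, "time": 0.9, "geo": 0.1, "entity": 0.5},
    "news-2": {"text": 0.8, "time": 0.2, "geo": 0.6, "entity": 0.4},
}
print(rank(docs))  # ['news-1', 'news-2']
```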

    Diversifying Search Results Using Time

    Getting an overview of a historic entity or event from search results can be difficult, especially if important dates concerning the entity or event are not known beforehand. For such information needs, users would benefit if the returned results covered diverse dates, thus giving an overview of what has happened throughout history. Diversifying search results based on important dates can be a building block for applications, for instance, in the digital humanities. Historians could thus quickly explore longitudinal document collections by querying for entities or events without knowing the associated important dates a priori. In this work, we describe an approach to diversify search results using temporal expressions (e.g., "in the 1990s") found in their contents. Our approach first identifies time intervals of interest to the given keyword query based on pseudo-relevant documents. It then re-ranks the query results so as to maximize the coverage of the identified time intervals. We present a novel and objective evaluation for our proposed approach and test the effectiveness of our methods on the New York Times Annotated Corpus and the Living Knowledge corpus, which together consist of around 6 million documents. Using history-oriented queries and encyclopedic resources, we show that our method is indeed able to present search results diversified along time.
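
    A hedged sketch of the re-ranking step, assuming the time intervals of interest have already been identified from pseudo-relevant documents; the greedy trade-off between relevance and interval coverage below is a common diversification pattern and is not claimed to be the paper's exact objective:

```python
# Greedily re-rank results so the identified time intervals get covered.
def rerank(results, intervals, k=10, lam=0.5):
    """results: list of (doc_id, relevance, doc_intervals), where
    doc_intervals holds the intervals the document's temporal
    expressions fall into; lam balances relevance vs. coverage."""
    covered, ranked, pool = set(), [], list(results)
    while pool and len(ranked) < k:
        def gain(item):
            _, rel, ivs = item
            return lam * rel + (1 - lam) * len((ivs & intervals) - covered)
        best = max(pool, key=gain)
        pool.remove(best)
        covered |= best[2] & intervals
        ranked.append(best[0])
    return ranked

intervals = {"1990s", "2001", "2008-2010"}  # e.g. mined from feedback docs
results = [("d1", 0.9, {"1990s"}),
           ("d2", 0.8, {"1990s", "2001"}),
           ("d3", 0.6, {"2008-2010"})]
print(rerank(results, intervals, k=3))  # ['d2', 'd3', 'd1']
```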

    New Results for Non-preemptive Speed Scaling

    We consider the speed scaling problem introduced in the seminal paper of Yao et al. In this problem, a number of jobs, each with its own processing volume, release time, and deadline, needs to be executed on a speed-scalable processor. The power consumption of this processor is P(s) = s^α, where s is the processing speed and α > 1 is a constant. The total energy consumption is power integrated over time, and the goal is to process all jobs while minimizing the energy consumption. The preemptive version of the problem, along with its many variants, has been studied extensively over the years. However, little is known about the non-preemptive version, except that it is strongly NP-hard and admits a constant-factor approximation. Up until now, the (general) complexity of this problem has remained unknown. In the present paper, we study an important special case of the problem, in which the job intervals form a laminar family, and present a quasipolynomial-time approximation scheme for it, thereby showing that (at least) this special case is not APX-hard unless NP ⊆ DTIME(2^{poly(log n)}). The second contribution of this work is a polynomial-time algorithm for the special case of equal-volume jobs, for which previously only a 2^α-approximation was known. In addition, we show that two other special cases of this problem admit fully polynomial-time approximation schemes (FPTASs).
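
    For intuition on the objective (a standard observation in the speed-scaling literature, not stated in the abstract itself): since P(s) = s^α is convex for α > 1, each job is best run at a constant speed within its execution window. Processing a volume w over a window of length t thus means running at speed s = w/t, which costs

```latex
E \,=\, \int_0^{t} \left(\frac{w}{t}\right)^{\alpha} \mathrm{d}\tau
  \,=\, t \cdot \left(\frac{w}{t}\right)^{\alpha}
  \,=\, \frac{w^{\alpha}}{t^{\alpha-1}} .
```

    So stretching a job over a longer window saves energy, and the non-preemptive difficulty lies in choosing non-overlapping execution windows that respect all release times and deadlines.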

    Learning Tuple Probabilities in Probabilistic Databases

    Learning the parameters of complex probabilistic-relational models from labeled training data is a standard technique in machine learning that has been studied intensively in the subfield of Statistical Relational Learning (SRL), but it remains an under-investigated topic in the context of Probabilistic Databases (PDBs). In this paper, we focus on learning the probability values of base tuples in a PDB from query answers, the latter of which are represented as labeled lineage formulas. Specifically, we consider labels in the form of pairs, each consisting of a Boolean lineage formula and a marginal probability that comes attached to the corresponding query answer. The resulting learning problem can be viewed as the inverse of confidence computation in PDBs: given a set of labeled query answers, learn the probability values of the base tuples such that the marginal probabilities of the query answers again yield the assigned probability labels. We analyze the learning problem from a theoretical perspective, devise two optimization-based objectives, and provide an efficient algorithm (based on stochastic gradient descent) for solving these objectives. Finally, we conclude with an experimental evaluation on three real-world datasets and one synthetic dataset, comparing against various techniques from SRL, reasoning in information extraction, and optimization.
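
    A minimal sketch of the inverse problem under a strong simplifying assumption: lineage formulas are conjunctions of independent tuples, so a query answer's marginal is just the product of its tuples' probabilities. The paper handles general Boolean lineage formulas and devises two objectives; the squared-error objective and parameterization below are illustrative:

```python
# Learn base-tuple probabilities from labeled (lineage, marginal) pairs
# by stochastic gradient descent on a squared-error objective.
import math, random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

# labeled query answers: (tuples in the conjunctive lineage, marginal label)
labels = [(("t1", "t2"), 0.42), (("t2", "t3"), 0.30), (("t1",), 0.70)]

w = {t: 0.0 for t in ("t1", "t2", "t3")}  # logits keep p = sigmoid(w) in (0,1)
eta = 0.5
for _ in range(5000):
    tuples, y = random.choice(labels)
    p = {t: sigmoid(w[t]) for t in tuples}
    marginal = math.prod(p.values())
    err = marginal - y
    for t in tuples:
        # d marginal / d w_t = marginal * (1 - p_t), via the chain rule
        w[t] -= eta * 2.0 * err * marginal * (1.0 - p[t])

print({t: round(sigmoid(v), 2) for t, v in w.items()})
# converges toward p(t1) = 0.7, p(t2) = 0.6, p(t3) = 0.5
```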

    Fast Tracking of Hand and Finger Articulations Using a Single Depth Camera


    Videoscapes: Exploring Unstructured Video Collections
