22,500 research outputs found

    Faster Query Answering in Probabilistic Databases using Read-Once Functions

    Full text link
    A boolean expression is in read-once form if each of its variables appears exactly once. When the variables denote independent events in a probability space, the probability of the event denoted by the whole expression in read-once form can be computed in polynomial time (whereas the general problem for arbitrary expressions is #P-complete). Known approaches to checking read-once property seem to require putting these expressions in disjunctive normal form. In this paper, we tell a better story for a large subclass of boolean event expressions: those that are generated by conjunctive queries without self-joins and on tuple-independent probabilistic databases. We first show that given a tuple-independent representation and the provenance graph of an SPJ query plan without self-joins, we can, without using the DNF of a result event expression, efficiently compute its co-occurrence graph. From this, the read-once form can already, if it exists, be computed efficiently using existing techniques. Our second and key contribution is a complete, efficient, and simple to implement algorithm for computing the read-once forms (whenever they exist) directly, using a new concept, that of co-table graph, which can be significantly smaller than the co-occurrence graph.Comment: Accepted in ICDT 201

    Investigating Rumor Propagation with TwitterTrails

    Get PDF
    Social media have become part of modern news reporting, used by journalists to spread information and find sources, or as a news source by individuals. The quest for prominence and recognition on social media sites like Twitter can sometimes eclipse accuracy and lead to the spread of false information. As a way to study and react to this trend, we introduce {\sc TwitterTrails}, an interactive, web-based tool ({\tt twittertrails.com}) that allows users to investigate the origin and propagation characteristics of a rumor and its refutation, if any, on Twitter. Visualizations of burst activity, propagation timeline, retweet and co-retweeted networks help its users trace the spread of a story. Within minutes {\sc TwitterTrails} will collect relevant tweets and automatically answer several important questions regarding a rumor: its originator, burst characteristics, propagators and main actors according to the audience. In addition, it will compute and report the rumor's level of visibility and, as an example of the power of crowdsourcing, the audience's skepticism towards it which correlates with the rumor's credibility. We envision {\sc TwitterTrails} as valuable tool for individual use, but we especially for amateur and professional journalists investigating recent and breaking stories. Further, its expanding collection of investigated rumors can be used to answer questions regarding the amount and success of misinformation on Twitter.Comment: 10 pages, 8 figures, under revie

    Grouping business news stories based on salience of named entities

    Get PDF
    In news aggregation systems focused on broad news domains, certain stories may appear in multiple articles. Depending on the relative importance of the story, the number of versions can reach dozens or hundreds within a day. The text in these versions may be nearly identical or quite different. Linking multiple versions of a story into a single group brings several important benefits to the end-user—reducing the cognitive load on the reader, as well as signaling the relative importance of the story. We present a grouping algorithm, and explore several vector-based representations of input documents: from a baseline using keywords, to a method using salience—a measure of importance of named entities in the text. We demonstrate that features beyond keywords yield substantial improvements, verified on a manually-annotated corpus of business news stories.Peer reviewe

    What May Visualization Processes Optimize?

    Full text link
    In this paper, we present an abstract model of visualization and inference processes and describe an information-theoretic measure for optimizing such processes. In order to obtain such an abstraction, we first examined six classes of workflows in data analysis and visualization, and identified four levels of typical visualization components, namely disseminative, observational, analytical and model-developmental visualization. We noticed a common phenomenon at different levels of visualization, that is, the transformation of data spaces (referred to as alphabets) usually corresponds to the reduction of maximal entropy along a workflow. Based on this observation, we establish an information-theoretic measure of cost-benefit ratio that may be used as a cost function for optimizing a data visualization process. To demonstrate the validity of this measure, we examined a number of successful visualization processes in the literature, and showed that the information-theoretic measure can mathematically explain the advantages of such processes over possible alternatives.Comment: 10 page

    Market shaping as an answer to ambiguities. The case of credit derivatives.

    Get PDF
    Building on Smith (1989), we describe the social processes surrounding a new financial OTC derivatives market, the market for credit derivatives. We show that in contradiction to more traditional derivatives, credit derivatives generate ambiguities of a cognitive and political nature. By conducting an in-depth longitudinal qualitative study from 1996 to 2004, we document the efforts made by the promoters of the market to alleviate these ambiguities and show how the size of resources needed results in the leadership of the most powerful. We thus provide a socially based explanation for the concentration and lack of transparency of the market. Our research exemplifies the contradictions between the rhetorical justification of financial innovations provided by financial theory and the empirical realities of a modern derivative market. It suggests that the actual structure of the market might best be understood by paying attention to the way different cognitive and political communities react to these contradictions. [ABSTRACT FROM AUTHOR]Credit derivatives; Social processes; Derivative securities; Over-the-counter markets; Construction sociale d'un marché financier;
    • …
    corecore