22,500 research outputs found
Faster Query Answering in Probabilistic Databases using Read-Once Functions
A boolean expression is in read-once form if each of its variables appears
exactly once. When the variables denote independent events in a probability
space, the probability of the event denoted by the whole expression in
read-once form can be computed in polynomial time (whereas the general problem
for arbitrary expressions is #P-complete). Known approaches to checking the
read-once property seem to require putting these expressions in disjunctive
normal form. In this paper, we tell a better story for a large subclass of
boolean event expressions: those that are generated by conjunctive queries
without self-joins and on tuple-independent probabilistic databases. We first
show that given a tuple-independent representation and the provenance graph of
an SPJ query plan without self-joins, we can, without using the DNF of a result
event expression, efficiently compute its co-occurrence graph. From this, the
read-once form can already, if it exists, be computed efficiently using
existing techniques. Our second and key contribution is a complete, efficient,
and simple to implement algorithm for computing the read-once forms (whenever
they exist) directly, using a new concept, that of co-table graph, which can be
significantly smaller than the co-occurrence graph.
Comment: Accepted in ICDT 201
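The polynomial-time claim for read-once forms can be illustrated with a small sketch. It is not the paper's algorithm (which finds the read-once form); it only evaluates the probability of an expression that is already read-once, assuming independent variables. The class names `Var`, `And`, `Or` and the function `probability` are hypothetical names chosen for this illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Var:
    name: str
    prob: float  # probability that this independent event is true


@dataclass(frozen=True)
class And:
    children: tuple


@dataclass(frozen=True)
class Or:
    children: tuple


def probability(expr):
    """Probability of a read-once expression over independent variables.

    Because every variable appears exactly once, sibling sub-expressions
    share no variables and are therefore independent, so one bottom-up
    pass over the expression tree suffices.
    """
    if isinstance(expr, Var):
        return expr.prob
    if isinstance(expr, And):
        result = 1.0
        for child in expr.children:
            result *= probability(child)  # all children must hold
        return result
    if isinstance(expr, Or):
        none_true = 1.0
        for child in expr.children:
            none_true *= 1.0 - probability(child)  # every child fails
        return 1.0 - none_true
    raise TypeError(f"unsupported node: {expr!r}")


# (x AND y) OR z, each variable true with probability 0.5
example = Or((And((Var("x", 0.5), Var("y", 0.5))), Var("z", 0.5)))
print(probability(example))
```

The same recursion gives a wrong answer if a variable is repeated, since the sub-expressions are then no longer independent; that is why the read-once property matters.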
Investigating Rumor Propagation with TwitterTrails
Social media have become part of modern news reporting, used by journalists
to spread information and find sources, or as a news source by individuals. The
quest for prominence and recognition on social media sites like Twitter can
sometimes eclipse accuracy and lead to the spread of false information. As a
way to study and react to this trend, we introduce TwitterTrails, an
interactive, web-based tool (twittertrails.com) that allows users to
investigate the origin and propagation characteristics of a rumor and its
refutation, if any, on Twitter. Visualizations of burst activity, propagation
timeline, retweet and co-retweeted networks help its users trace the spread of
a story. Within minutes TwitterTrails will collect relevant tweets and
automatically answer several important questions regarding a rumor: its
originator, burst characteristics, propagators and main actors according to the
audience. In addition, it will compute and report the rumor's level of
visibility and, as an example of the power of crowdsourcing, the audience's
skepticism towards it, which correlates with the rumor's credibility. We
envision TwitterTrails as a valuable tool for individual use, but
especially for amateur and professional journalists investigating recent and
breaking stories. Further, its expanding collection of investigated rumors can
be used to answer questions regarding the amount and success of misinformation
on Twitter.
Comment: 10 pages, 8 figures, under revie
Grouping business news stories based on salience of named entities
In news aggregation systems focused on broad news domains, certain stories may appear in multiple articles. Depending on the relative importance of the story, the number of versions can reach dozens or hundreds within a day. The text in these versions may be nearly identical or quite different. Linking multiple versions of a story into a single group brings several important benefits to the end-user: it reduces the cognitive load on the reader and signals the relative importance of the story. We present a grouping algorithm, and explore several vector-based representations of input documents: from a baseline using keywords, to a method using salience, a measure of importance of named entities in the text. We demonstrate that features beyond keywords yield substantial improvements, verified on a manually-annotated corpus of business news stories.
Peer reviewe
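The abstract does not describe the grouping algorithm itself, so the following is only a minimal sketch of one common approach: greedy grouping of documents by cosine similarity over sparse feature vectors. The function names, the similarity threshold, and the example weights (standing in for keyword or entity-salience scores) are all assumptions made for illustration.

```python
import math


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def group_stories(vectors, threshold=0.5):
    """Assign each document to the first group whose seed document is
    similar enough; otherwise start a new group. Returns lists of indices."""
    groups = []
    for index, vector in enumerate(vectors):
        for group in groups:
            seed = vectors[group[0]]  # first member represents the group
            if cosine(vector, seed) >= threshold:
                group.append(index)
                break
        else:
            groups.append([index])
    return groups


# Hypothetical feature weights, e.g. salience scores of named entities.
docs = [
    {"acme": 0.9, "merger": 0.6},
    {"acme": 0.8, "merger": 0.5, "shares": 0.2},
    {"globex": 0.9, "earnings": 0.7},
]
print(group_stories(docs))
```

Swapping the keyword weights for salience-based weights changes only the input vectors, which is what makes it straightforward to compare representations under a fixed grouping procedure.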
What May Visualization Processes Optimize?
In this paper, we present an abstract model of visualization and inference
processes and describe an information-theoretic measure for optimizing such
processes. In order to obtain such an abstraction, we first examined six
classes of workflows in data analysis and visualization, and identified four
levels of typical visualization components, namely disseminative,
observational, analytical and model-developmental visualization. We noticed a
common phenomenon at different levels of visualization, that is, the
transformation of data spaces (referred to as alphabets) usually corresponds to
the reduction of maximal entropy along a workflow. Based on this observation,
we establish an information-theoretic measure of cost-benefit ratio that may be
used as a cost function for optimizing a data visualization process. To
demonstrate the validity of this measure, we examined a number of successful
visualization processes in the literature, and showed that the
information-theoretic measure can mathematically explain the advantages of such
processes over possible alternatives.
Comment: 10 page
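The observation about maximal entropy can be made concrete with a small sketch. It does not reproduce the paper's cost-benefit measure, which the abstract does not spell out; it only uses the standard fact that an alphabet with n letters has maximal entropy log2(n) bits. The stage names and alphabet sizes are hypothetical.

```python
import math


def max_entropy_bits(alphabet_size):
    """Maximal Shannon entropy (in bits) of an alphabet with the given
    number of letters, reached when all letters are equally likely."""
    if alphabet_size < 1:
        raise ValueError("alphabet must contain at least one letter")
    return math.log2(alphabet_size)


# Hypothetical workflow: each stage maps data to a smaller alphabet.
workflow = [
    ("raw data", 1024),
    ("aggregated", 64),
    ("visual encoding", 16),
    ("decision", 2),
]

entropies = [max_entropy_bits(size) for _, size in workflow]
for (stage, _), bits in zip(workflow, entropies):
    print(f"{stage}: {bits} bits")
```

In this toy workflow the maximal entropy falls at every stage, which is the pattern the abstract describes as typical along a visualization workflow.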
Market shaping as an answer to ambiguities. The case of credit derivatives.
Building on Smith (1989), we describe the social processes surrounding a new financial OTC derivatives market, the market for credit derivatives. We show that, in contrast to more traditional derivatives, credit derivatives generate ambiguities of a cognitive and political nature. By conducting an in-depth longitudinal qualitative study from 1996 to 2004, we document the efforts made by the promoters of the market to alleviate these ambiguities and show how the size of the resources needed results in the leadership of the most powerful. We thus provide a socially based explanation for the concentration and lack of transparency of the market. Our research exemplifies the contradictions between the rhetorical justification of financial innovations provided by financial theory and the empirical realities of a modern derivative market. It suggests that the actual structure of the market might best be understood by paying attention to the way different cognitive and political communities react to these contradictions.
Credit derivatives; Social processes; Derivative securities; Over-the-counter markets; Social construction of a financial market;