
    SODA: Generating SQL for Business Users

    The purpose of data warehouses is to enable business analysts to make better decisions. Over the years the technology has matured and data warehouses have become extremely successful. As a consequence, more and more data has been added to the data warehouses and their schemas have become increasingly complex. These systems still work well for generating pre-canned reports. However, with their current complexity, they tend to be a poor match for non-tech-savvy business analysts who need answers to ad-hoc queries that were not anticipated. This paper describes the design, implementation, and experience of the SODA system (Search over DAta Warehouse). SODA bridges the gap between the business needs of analysts and the technical complexity of current data warehouses. SODA enables a Google-like search experience for data warehouses by taking keyword queries of business users and automatically generating executable SQL. The key idea is to use a graph pattern matching algorithm that uses the metadata model of the data warehouse. Our results with real data from a global player in the financial services industry show that SODA produces queries with high precision and recall, and makes it much easier for business users to interactively explore highly complex data warehouses. Comment: VLDB201
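The idea of turning keywords into SQL by way of a metadata graph can be illustrated with a toy sketch. It is not SODA's algorithm: the abstract does not specify the graph pattern matching, so a plain breadth-first search for join paths is swapped in here. The term-to-table mapping, the table names, and the join conditions are all hypothetical.

```python
from collections import deque

# Hypothetical metadata: business terms mapped to tables, plus foreign-key joins.
TERM_TO_TABLE = {"customers": "customer", "orders": "orders", "products": "product"}
JOINS = {
    ("customer", "orders"): "customer.id = orders.customer_id",
    ("orders", "product"): "orders.product_id = product.id",
}


def neighbours(table):
    """Yield (adjacent table, join condition) pairs; joins are undirected."""
    for (left, right), condition in JOINS.items():
        if left == table:
            yield right, condition
        elif right == table:
            yield left, condition


def join_path(start, goal):
    """Breadth-first search for the shortest chain of joins from start to goal."""
    parents = {start: None}
    queue = deque([start])
    while queue:
        table = queue.popleft()
        if table == goal:
            break
        for nxt, condition in neighbours(table):
            if nxt not in parents:
                parents[nxt] = (table, condition)
                queue.append(nxt)
    if goal not in parents:
        raise ValueError(f"no join path from {start} to {goal}")
    steps = []
    node = goal
    while parents[node] is not None:
        previous, condition = parents[node]
        steps.append((node, condition))
        node = previous
    return list(reversed(steps))


def keywords_to_sql(keywords):
    """Map keywords to tables and connect them with the joins found in the metadata."""
    tables = [TERM_TO_TABLE[k.lower()] for k in keywords if k.lower() in TERM_TO_TABLE]
    if not tables:
        raise ValueError("no keyword matched the metadata")
    sql = f"SELECT * FROM {tables[0]}"
    joined = {tables[0]}
    for target in tables[1:]:
        for table, condition in join_path(tables[0], target):
            if table not in joined:
                sql += f" JOIN {table} ON {condition}"
                joined.add(table)
    return sql


print(keywords_to_sql(["customers", "products"]))
```

In this sketch the intermediate table `orders` is pulled in automatically because it lies on the join path between the two matched tables, which is the kind of schema knowledge a business user would otherwise have to supply by hand.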

    On the Role of Social Identity and Cohesion in Characterizing Online Social Communities

    Two prevailing theories for explaining social group or community structure are cohesion and identity. The social cohesion approach posits that social groups arise out of an aggregation of individuals that have mutual interpersonal attraction as they share common characteristics. These characteristics can range from common interests to kinship ties and from social values to ethnic backgrounds. In contrast, the social identity approach posits that an individual is likely to join a group based on an intrinsic self-evaluation at a cognitive or perceptual level. In other words, group members typically share an awareness of a common category membership. In this work we seek to understand the role of these two contrasting theories in explaining the behavior and stability of social communities in Twitter. A specific focal point of our work is to understand the role of these theories in disparate contexts ranging from disaster response to socio-political activism. We extract social identity and social cohesion features-of-interest for large-scale datasets of five real-world events and examine the effectiveness of such features in capturing behavioral characteristics and the stability of groups. We also propose a novel measure of social group sustainability based on the divergence in group discussion. Our main findings are: 1) Sharing of social identities (especially physical location) among group members has a positive impact on group sustainability, 2) Structural cohesion (represented by high group density and low average shortest path length) is a strong indicator of group sustainability, and 3) Event characteristics play a role in shaping group sustainability, as social groups in transient events behave differently from groups in events that last longer.
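The two structural cohesion indicators named in the abstract, group density and average shortest path length, have standard graph-theoretic definitions. The following is a minimal sketch of both for an undirected, unweighted graph stored as an adjacency dictionary. The graph representation and the example groups are assumptions for illustration, not the authors' data or code.

```python
from collections import deque


def density(adj):
    """Fraction of possible undirected edges that are present."""
    n = len(adj)
    if n < 2:
        return 0.0
    edge_count = sum(len(neighbours) for neighbours in adj.values()) / 2
    return 2 * edge_count / (n * (n - 1))


def average_shortest_path_length(adj):
    """Mean hop distance over all ordered pairs of mutually reachable nodes."""
    total = 0
    pairs = 0
    for source in adj:
        # Breadth-first search gives hop distances from this source.
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        for target, d in dist.items():
            if target != source:
                total += d
                pairs += 1
    return total / pairs if pairs else float("inf")


# Hypothetical groups: a fully connected triangle and a three-node chain.
triangle = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b"}}
chain = {"a": {"b"}, "b": {"a", "c"}, "c": {"b"}}

print(density(triangle), average_shortest_path_length(triangle))
print(density(chain), average_shortest_path_length(chain))
```

The triangle is the more cohesive of the two groups under both measures: it has the higher density and the lower average shortest path length.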

    Term-Specific Eigenvector-Centrality in Multi-Relation Networks

    Fuzzy matching and ranking are two information retrieval techniques widely used in web search. Their application to structured data, however, remains an open problem. This article investigates how eigenvector-centrality can be used for approximate matching in multi-relation graphs, that is, graphs where connections of many different types may exist. Based on an extension of the PageRank matrix, eigenvectors representing the distribution of a term after propagating term weights between related data items are computed. The result is an index which takes the document structure into account and can be used with standard document retrieval techniques. As the scheme takes the shape of an index transformation, all necessary calculations are performed during index time.
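Propagating a term's weight between related data items can be sketched as a personalized PageRank iteration in which each relation type carries its own weight. This is an illustration under stated assumptions, not the article's matrix extension: the function name, the relation weights, the damping factor, and the example items are all hypothetical.

```python
def propagate_term_weights(edges, relation_weights, initial, damping=0.85, iterations=50):
    """Spread one term's weight over a multi-relation graph by power iteration.

    edges: list of (source, target, relation) triples.
    relation_weights: relation name -> non-negative weight (assumed values).
    initial: item -> weight of the term in that item before propagation.
    """
    nodes = set(initial)
    for source, target, _ in edges:
        nodes.update((source, target))

    outgoing = {node: [] for node in nodes}
    for source, target, relation in edges:
        outgoing[source].append((target, relation_weights[relation]))

    total_initial = sum(initial.values())
    base = {node: initial.get(node, 0.0) / total_initial for node in nodes}
    score = dict(base)

    for _ in range(iterations):
        nxt = {node: (1 - damping) * base[node] for node in nodes}
        for source, targets in outgoing.items():
            total_weight = sum(weight for _, weight in targets)
            if total_weight == 0:
                # Dangling item: hand its mass back to the base distribution.
                for node in nodes:
                    nxt[node] += damping * score[source] * base[node]
                continue
            for target, weight in targets:
                nxt[target] += damping * score[source] * weight / total_weight
        score = nxt
    return score


# Hypothetical data: the term occurs only in "paper1".
edges = [("paper1", "author1", "written_by"), ("paper1", "venue1", "published_in")]
weights = {"written_by": 1.0, "published_in": 0.5}
scores = propagate_term_weights(edges, weights, {"paper1": 1.0})
print(scores)
```

After propagation, items that never contained the term still receive a share of its weight, and the share depends on the relation type that connects them to the original item. Because the scores depend only on the graph and the term occurrences, they can be computed when the index is built.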

    RDF Querying

    Reactive Web systems, Web services, and Web-based publish/subscribe systems communicate events as XML messages, and in many cases require composite event detection: it is not sufficient to react to single event messages, but events have to be considered in relation to other events that are received over time. Emphasizing language design and formal semantics, we describe the rule-based query language XChangeEQ for detecting composite events. XChangeEQ is designed to completely cover and integrate the four complementary querying dimensions: event data, event composition, temporal relationships, and event accumulation. Semantics are provided as model and fixpoint theories; while this is an established approach for rule languages, it has not been applied to event queries before.