255 research outputs found

    Distributive Join Strategy Based on Tuple Inversion

    In this paper, we propose a new direction for distributive join operations. We assume that there will be a scalable distributed computer system in which many computers (processors) are connected through a communication network that can be in a LAN or as part of the Internet with sufficient bandwidth. A relational database is then distributed across this network of processors. However, in our approach, the distribution of the database is very fine-grained and is based on the Distributed Hash Table (DHT) concept. A tuple of a table is assigned to a specific processor by using a fair hash function applied to its key value. For each joinable attribute, an inverted file list is further generated and distributed again based on the DHT. This pre-distribution is done when the tuple enters the system and therefore does not require any distribution of data tuples on the fly when the join is executed. When a join operation request is broadcast, each processor performs a local join and the results are sent back to a query processor which, in turn, merges the join results and returns them to the user. Note that the distribution of the DHT of the inverted file lists can be either pre-processed or distributed on the fly. If the lists are pre-processed and distributed, they have to be maintained. We evaluate our approach by comparing it empirically to two other approaches: the naive join method and the fully distributed join method. The results show a significantly higher performance of our method for a wide range of possible parameter
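The pre-distribution and broadcast join described in this abstract can be sketched as follows. This is a minimal single-process illustration, not the authors' implementation: the node count, the helper names (`node_for`, `build_inverted_lists`, `local_join`, `distributed_join`, `make_nodes`) and the choice to store whole tuples in the inverted lists (rather than tuple keys) are all assumptions made for the example.

```python
from collections import defaultdict
from hashlib import sha256

NUM_NODES = 4  # hypothetical cluster size


def node_for(value):
    """Map a value to a node id with a stable hash (stand-in for the DHT)."""
    digest = sha256(repr(value).encode()).digest()
    return int.from_bytes(digest[:8], "big") % NUM_NODES


def make_nodes():
    """Each simulated node stores: table name -> join value -> list of tuples."""
    return [defaultdict(lambda: defaultdict(list)) for _ in range(NUM_NODES)]


def build_inverted_lists(table_name, rows, join_attr, nodes):
    """Pre-distribute an inverted list keyed by the join attribute's value.

    Simplification: the list holds the tuples themselves, not tuple keys.
    """
    for row in rows:
        value = row[join_attr]
        nodes[node_for(value)][table_name][value].append(row)


def local_join(node, left, right):
    """Join the two inverted lists that happen to live on one node."""
    out = []
    for value, left_rows in node[left].items():
        for l_row in left_rows:
            for r_row in node[right].get(value, []):
                out.append({**l_row, **r_row})
    return out


def distributed_join(nodes, left, right):
    """Simulate the broadcast: every node joins locally, results are merged."""
    merged = []
    for node in nodes:
        merged.extend(local_join(node, left, right))
    return merged
```

Because equal join values always hash to the same node, matching tuples from both tables are already co-located when the join request arrives, so no tuples move at query time.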

    Toward Entity-Aware Search

    As the Web has evolved into a data-rich repository, current search engines, with their standard "page view," are becoming increasingly inadequate for a wide range of query tasks. While we often search for various data "entities" (e.g., phone number, paper PDF, date), today's engines only take us indirectly to pages. In my Ph.D. study, we focus on a novel type of Web search that is aware of data entities inside pages, a significant departure from traditional document retrieval. We study the various essential aspects of supporting entity-aware Web search. To begin with, we tackle the core challenge of ranking entities by distilling its underlying conceptual model, the Impression Model, and developing a probabilistic ranking framework, EntityRank, that seamlessly integrates both local and global information in ranking. We also report a prototype system built to show the initial promise of the proposal. Then, we aim at distilling and abstracting the essential computation requirements of entity search. From the dual views of reasoning (entity as input and entity as output), we propose a dual-inversion framework, with two indexing and partition schemes, towards efficient and scalable query processing. Further, to recognize more entity instances, we study the problem of entity synonym discovery through mining query log data. The results obtained so far show clear promise for entity-aware search in its usefulness, effectiveness, efficiency and scalability

    Metric and latticial medians

    This paper presents the linked notions of metric and latticial medians and explains the role of the median procedure in consensus problems, in particular in the aggregation of linear orders. First, we consider the medians of a v-tuple of arbitrary or particular binary relations. Then we study in depth the difficult (in fact NP-hard) problem of finding the median orders of a profile of linear orders. More generally, we consider the medians of v-tuples of elements of a semilattice (or of a lattice) and we describe the median semilattices, i.e. the semilattices where medians are easily computable.
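To make the notion of a median order concrete, here is a small brute-force sketch. It assumes the distance between two linear orders is the number of pairs they rank differently (the Kendall tau distance), which for linear orders is proportional to the symmetric-difference distance between the relations; the function names are hypothetical. Enumerating all permutations is only feasible for a handful of items, consistent with the hardness of the problem noted in the abstract.

```python
from itertools import combinations, permutations


def kendall_tau(order_a, order_b):
    """Count the pairs of items that the two linear orders rank differently."""
    pos_a = {x: i for i, x in enumerate(order_a)}
    pos_b = {x: i for i, x in enumerate(order_b)}
    return sum(
        1
        for x, y in combinations(order_a, 2)
        if (pos_a[x] < pos_a[y]) != (pos_b[x] < pos_b[y])
    )


def median_orders(profile):
    """Return all linear orders minimising the total distance to the profile.

    Brute force over every permutation, so only usable for very small inputs.
    """
    items = profile[0]
    best, best_cost = [], None
    for candidate in permutations(items):
        cost = sum(kendall_tau(candidate, order) for order in profile)
        if best_cost is None or cost < best_cost:
            best, best_cost = [candidate], cost
        elif cost == best_cost:
            best.append(candidate)
    return best, best_cost
```

For example, `median_orders([("a", "b", "c"), ("a", "b", "c"), ("c", "b", "a")])` compares all six orders of three items against the three-voter profile.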

    Lattices and discrete methods in cooperative games and decisions

    This thesis aims to present game theory, in particular cooperative game theory, together with decision theory, framing both formally in terms of discrete mathematics. These are two fields in which inquiry ideally originates from applied questions, yet in which more typically theoretical problems have arisen, and continue to arise, that have interested the mathematics and computer science communities. Although the initial contributions were often formulated in a continuous setting using tools typical of measure theory, today the choice of discrete models and methods appears the most suitable. The general idea is therefore to view, from the outset, the whole body of models and results to be presented through the lens of lattice theory. This gives a sharper global picture and makes it easy to interweave the discussion, treating game theory and decision theory jointly. Thus, after introducing the necessary tools, we consider models and problems with the precise aim of first analysing historical and solid results, then moving on to more recent and more complex situations in which the results obtained may give rise to doubts. Finally, some open questions and associated directions for research are presented.

    Cacti and filtered distributive laws

    Motivated by the second author's construction of a classifying space for the group of pure symmetric automorphisms of a free product, we introduce and study a family of topological operads, the operads of based cacti, defined for every pointed topological space (Y, ∙). These operads also admit linear versions, which are defined for every augmented graded cocommutative coalgebra C. We show that the homology of the topological operad of based Y-cacti is the linear operad of based H_*(Y)-cacti. In addition, we show that for every coalgebra C the operad of based C-cacti is Koszul. To prove the latter result, we use the criterion of Koszulness for operads due to the first author, utilising the notion of a filtered distributive law between two quadratic operads. We also present a new proof of that criterion which works over the ground field of arbitrary characteristic. Comment: 30 page

    Efficient Incremental Data Analysis

    Many data-intensive applications require real-time analytics over streaming data. In a growing number of domains -- sensor network monitoring, social web applications, clickstream analysis, high-frequency algorithmic trading, and fraud detection to name a few -- applications continuously monitor stream events to promptly react to certain data conditions. These applications demand responsive analytics even when faced with high volume and velocity of incoming changes, large numbers of users, and complex processing requirements. Developing a suitable online analytics engine that meets these requirements is challenging. In this thesis, we study techniques for efficient online processing of complex analytical queries, ranging from standard database queries to complex machine learning and digital signal processing workflows. First, we focus on the problem of efficient incremental computation for database queries. We have developed a system, called DBToaster, that compiles declarative queries into high-performance stream processing engines that keep query results (views) fresh at very high update rates. At the heart of our system is a recursive query compilation algorithm that materializes a set of supporting higher-order delta views to achieve a substantially lower view maintenance cost. We study the trade-offs between single-tuple and batch incremental processing in local execution, and we present a novel approach for compiling view maintenance code into data-parallel programs optimized for distributed execution. DBToaster supports millions of complete view refreshes per second for a broad range of queries and outperforms commercial database and stream engines by orders of magnitude. We also study the incremental computation for queries written as iterative linear algebra, which can capture many machine learning and scientific calculations. We have developed a framework, called LINVIEW, for capturing deltas of linear algebra programs and understanding their computational cost. Linear algebra operations tend to cause an avalanche effect where even very local changes to the input matrices spread out and infect all of the intermediate results and the final view, causing incremental view maintenance to lose its performance benefit over re-evaluation. We develop techniques based on matrix factorizations to contain such epidemics of change and make incremental view maintenance of linear algebra practical and usually substantially cheaper than re-evaluation. We show, both analytically and experimentally, the usefulness of these techniques when applied to standard analytics tasks. Our last research question concerns the integration of general-purpose query processors and domain-specific operations to enable deep data exploration in both online and offline analysis. We advocate a deep integration of signal processing operations and general-purpose query processors. We demonstrate that in-situ processing of tempo-relational and signal data through a unified query language empowers users to express end-to-end workflows more succinctly inside one system while at the same time offering orders of magnitude better performance than existing popular data management systems
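The idea of keeping a view fresh with supporting delta views can be illustrated with a toy example. The sketch below is not DBToaster output or its API; it is a hand-written illustration, under the assumption of a single view that counts the result of an equi-join of two relations R and S on an attribute b. The class and method names are hypothetical. The two per-key counters play the role of auxiliary views: each update to R or S adjusts the join count by a lookup instead of recomputing the join.

```python
from collections import Counter


class JoinCountView:
    """Maintains COUNT(*) of R JOIN S ON b under single-tuple updates."""

    def __init__(self):
        self.count = 0            # the maintained view
        self.r_by_b = Counter()   # auxiliary view: number of R tuples per b
        self.s_by_b = Counter()   # auxiliary view: number of S tuples per b

    def update_r(self, b, mult=1):
        """Insert (mult=1) or delete (mult=-1) an R tuple with join key b."""
        # The change in the join count is the number of matching S tuples.
        self.count += mult * self.s_by_b[b]
        self.r_by_b[b] += mult

    def update_s(self, b, mult=1):
        """Insert (mult=1) or delete (mult=-1) an S tuple with join key b."""
        self.count += mult * self.r_by_b[b]
        self.s_by_b[b] += mult
```

Each update costs a constant number of dictionary operations, whereas re-evaluating the join would touch both relations in full.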

    LINVIEW: Incremental View Maintenance for Complex Analytical Queries

    Many analytics tasks and machine learning problems can be naturally expressed by iterative linear algebra programs. In this paper, we study the incremental view maintenance problem for such complex analytical queries. We develop a framework, called LINVIEW, for capturing deltas of linear algebra programs and understanding their computational cost. Linear algebra operations tend to cause an avalanche effect where even very local changes to the input matrices spread out and infect all of the intermediate results and the final view, causing incremental view maintenance to lose its performance benefit over re-evaluation. We develop techniques based on matrix factorizations to contain such epidemics of change. As a consequence, our techniques make incremental view maintenance of linear algebra practical and usually substantially cheaper than re-evaluation. We show, both analytically and experimentally, the usefulness of these techniques when applied to standard analytics tasks. Our evaluation demonstrates the efficiency of LINVIEW in generating parallel incremental programs that outperform re-evaluation techniques by more than an order of magnitude. Comment: 14 pages, SIGMO
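One way to see why a factored delta can contain the avalanche effect is the following worked case. It assumes, purely for illustration, that the input change is a rank-1 update with column vectors u and v of length n and that A and B are n-by-n; the abstract does not state this setting explicitly.

```latex
\begin{align*}
A' &= A + u v^{\top}, \\
\Delta(AB) &= A'B - AB = u\,(v^{\top} B), \\
\Delta(A^{2}) &= A'^{2} - A^{2}
  = u\,(v^{\top} A) + (A u)\,v^{\top} + (v^{\top} u)\,u v^{\top}.
\end{align*}
```

Although each delta is in general a dense n-by-n matrix, it is a sum of at most two outer products whose factors need only matrix-vector products, costing O(n^2) operations, whereas recomputing the product with the standard dense algorithm costs O(n^3).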

    Constructive Canonicity of Inductive Inequalities

    We prove the canonicity of inductive inequalities in a constructive meta-theory, for classes of logics algebraically captured by varieties of normal and regular lattice expansions. This result encompasses Ghilardi-Meloni's and Suzuki's constructive canonicity results for Sahlqvist formulas and inequalities, and is based on an application of the tools of unified correspondence theory. Specifically, we provide an alternative interpretation of the language of the algorithm ALBA for lattice expansions: nominal and conominal variables are respectively interpreted as closed and open elements of canonical extensions of normal/regular lattice expansions, rather than as completely join-irreducible and meet-irreducible elements of perfect normal/regular lattice expansions. We show the correctness of ALBA with respect to this interpretation. From this fact, the constructive canonicity of the inequalities on which ALBA succeeds follows by an adaptation of the standard argument. The claimed result then follows as a consequence of the success of ALBA on inductive inequalities