82,493 research outputs found

    Partial Answers for Unavailable Data Sources

    Projet RODIN. Many heterogeneous database system products and prototypes exist today; they will soon be deployed in a wide variety of environments. All existing systems suffer from an Achilles' heel: if some sources are unavailable when accessed, these systems either silently ignore them or generate an error, i.e. they ungraciously fail. This behavior is improper in environments where there is a non-negligible probability that data sources cannot be accessed (e.g., the Internet). In this paper, we propose a novel approach to this issue where, in the presence of unavailable data sources, the answer to a query is a partial answer. A partial answer is itself a query that results from the partial evaluation of the original query; it is composed of the data that have been obtained and processed during the evaluation and of a representation of the unfinished work to be done. Partial answers can be resubmitted to the system in order to obtain the final answer to the original query, or another partial answer. Additionally, the application program can extract information from a partial answer through the use of a secondary query, called a parachute query. In this paper we give a taxonomy of partial answers and parachute queries. We present algorithms for the evaluation of queries in the presence of unavailable data sources, and we describe an implementation.
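
The idea of a partial answer that can be resubmitted might be sketched as follows. This is only an illustration under simplifying assumptions, not the paper's algorithm: the query is treated as a plain union over named sources, and `PartialAnswer`, `evaluate`, and `parachute` are hypothetical names introduced here.

```python
from dataclasses import dataclass, field


@dataclass
class PartialAnswer:
    """Hypothetical container: rows obtained so far plus the unfinished work."""
    rows: list = field(default_factory=list)
    pending: list = field(default_factory=list)  # names of sources not yet read

    @property
    def complete(self):
        return not self.pending


def evaluate(query, sources):
    """Evaluate what can be evaluated; keep unavailable sources as pending work.

    `sources` maps a source name to a zero-argument callable that returns rows
    or raises ConnectionError when the source is unavailable (an assumption).
    """
    rows, pending = list(query.rows), []
    for name in query.pending:
        try:
            rows.extend(sources[name]())
        except ConnectionError:
            pending.append(name)  # remember the work instead of failing
    return PartialAnswer(rows, pending)


def parachute(answer, predicate):
    """Secondary query over the data already held by a partial answer."""
    return [row for row in answer.rows if predicate(row)]
```

Because the result of `evaluate` has the same shape as its input, a partial answer can be passed back in later, once the missing sources are reachable again.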

    Performance Guarantees for Distributed Reachability Queries

    In the real world a graph is often fragmented and distributed across different sites. This highlights the need for evaluating queries on distributed graphs. This paper proposes distributed evaluation algorithms for three classes of queries: reachability for determining whether one node can reach another, bounded reachability for deciding whether there exists a path of a bounded length between a pair of nodes, and regular reachability for checking whether there exists a path connecting two nodes such that the node labels on the path form a string in a given regular expression. We develop these algorithms based on partial evaluation, to explore parallel computation. When evaluating a query Q on a distributed graph G, we show that these algorithms possess the following performance guarantees, no matter how G is fragmented and distributed: (1) each site is visited only once; (2) the total network traffic is determined by the size of Q and the fragmentation of G, independent of the size of G; and (3) the response time is decided by the largest fragment of G rather than the entire G. In addition, we show that these algorithms can be readily implemented in the MapReduce framework. Using synthetic and real-life data, we experimentally verify that these algorithms are scalable on large graphs, regardless of how the graphs are distributed. Comment: VLDB201
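
As a rough illustration of plain reachability via partial evaluation, the sketch below lets each fragment report, using local edges only, which exit nodes (or the target) its entry nodes can reach; a coordinator then searches the small graph assembled from those reports. The fragment layout (a set of owned nodes plus the edges leaving them) and the function names are assumptions made for this example, not the paper's data structures.

```python
from collections import deque


def local_eval(edges, in_nodes, out_nodes, target):
    """Per-fragment step: for each entry node, which exit nodes or the
    target are reachable using only this fragment's edges."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, []).append(v)
    result = {}
    for start in in_nodes:
        seen, queue = {start}, deque([start])
        while queue:
            u = queue.popleft()
            for v in adj.get(u, []):
                if v not in seen:
                    seen.add(v)
                    queue.append(v)
        result[start] = {n for n in seen if n in out_nodes or n == target}
    return result


def reachable(fragments, source, target):
    """Coordinator: each fragment is visited once, then the partial results
    are combined. `fragments` is a list of (owned_nodes, edges) pairs."""
    if source == target:
        return True
    # Exit nodes of a fragment: edge targets owned by some other fragment.
    outs = [{v for _, v in edges if v not in owned} for owned, edges in fragments]
    boundary = set().union(*outs) | {source}
    dep = {}
    for (owned, edges), out in zip(fragments, outs):
        dep.update(local_eval(edges, boundary & owned, out, target))
    # Search the dependency graph over boundary nodes only.
    seen, queue = {source}, deque([source])
    while queue:
        u = queue.popleft()
        if u == target:
            return True
        for v in dep.get(u, ()):
            if v not in seen:
                seen.add(v)
                queue.append(v)
    return False
```

In this sketch the data handed to the coordinator mentions only boundary nodes, the source, and the target, which mirrors the abstract's point that traffic depends on the fragmentation rather than on the size of G.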

    Scaling Heterogeneous Databases and the Design of Disco

    Access to large numbers of data sources introduces new problems for users of heterogeneous distributed databases. End users and application programmers must deal with unavailable data sources. Database administrators must deal with incorporating new sources into the model. Database implementors must deal with the translation of queries between query languages and schemas. The Distributed Information Search COmponent (Disco) addresses these problems. Query processing semantics are developed to process queries over data sources which do not return answers. Data modeling techniques manage connections to data sources. The component interface to data sources flexibly handles different query languages and translates queries. This paper describes (a) the distributed mediator architecture of Disco, (b) its query processing semantics, (c) the data model and its modeling of data source connections, and (d) the interface to underlying data sources.

    Statistical modelling of international migration flows

    The paper deals with uncertainty in estimating international migration flows for an interlinked system of countries. The related problems are discussed using the example of a dedicated model, 'IMEM' (Integrated Model of European Migration). The IMEM is a hierarchical Bayesian model, which allows for combining data from different countries with meta-data on definitions and collection methods, as well as with relevant expert information. The model is applied to 31 EU and EFTA countries for the period 2002–2008. The expert opinion comes from a two-round Delphi survey carried out amongst 11 European experts on issues related to migration statistics. The adopted Bayesian approach allows for a coherent quantification of uncertainty stemming from different sources (data discrepancies, model parameters, and expert judgement). The outcomes produced by the model (whole posterior distributions of estimated flows) can then be used for assessing the true magnitude of flows at the European level, taking into account the relative costs of overestimating or underestimating migration flows. In this context, problems related to the application of statistical decision analysis to multidimensional problems are briefly discussed.

    Monitoring regional economies in Australia: why and how

    Economic activity is inherently variable and monitoring it is a major challenge, especially in regional economies where resources are fewer and activity is more variable. Using a recent study of the Riverland region, the authors set out the information available, its limitations and means by which it may be extended. It is argued that monitoring must respond to specific needs and extend to information beyond the scope of the merely economic. It is not simply a matter of tracking commonly used economic variables but of understanding specific economic challenges and using that understanding to target economic and social information. Keywords: Monitoring, regions, socio-economic, Community/Rural/Urban Development

    Measuring and Managing Answer Quality for Online Data-Intensive Services

    Online data-intensive services parallelize query execution across distributed software components. Interactive response time is a priority, so online query executions return answers without waiting for slow running components to finish. However, data from these slow components could lead to better answers. We propose Ubora, an approach to measure the effect of slow running components on the quality of answers. Ubora randomly samples online queries and executes them twice. The first execution elides data from slow components and provides fast online answers; the second execution waits for all components to complete. Ubora uses memoization to speed up mature executions by replaying network messages exchanged between components. Our systems-level implementation works for a wide range of platforms, including Hadoop/Yarn, Apache Lucene, the EasyRec Recommendation Engine, and the OpenEphyra question answering system. Ubora computes answer quality much faster than competing approaches that do not use memoization. With Ubora, we show that answer quality can and should be used to guide online admission control. Our adaptive controller processed 37% more queries than a competing controller guided by the rate of timeouts. Comment: Technical Report

    Inmate Legal Information Requests Analysis: Empirical Data to Inform Library Purchases in Correctional Institutions

    The introduction of legal content to Google Scholar made United States case law and law journal articles accessible to an unprecedented extent. With case law freely available and accurate bibliographic information for articles, could Google Scholar be accurate and complete enough for correctional institutions to forgo purchasing either print publications or fee-based services for these materials? This article empirically assesses whether Google Scholar can reliably answer the questions of inmates in a correctional facility, the Baltimore City Detention Center. As a comparison, the same questions are tested in Westlaw Correctional, a subscription database marketed to correctional institutions.

    Do School Lunch Subsidies Change the Dietary Patterns of Children from Low-Income Households?

    This article examines the effects of school lunch subsidies provided through the means-tested component of the National School Lunch Program on the dietary patterns of children aged 10 to 13 years in the USA. Analyzing data on 5,140 public school children in 5th grade during spring 2004, we find significant increases in the number of servings of fruit, green salad, carrots, other vegetables, and 100 percent fruit juice consumed in one week for subsidized children relative to unsubsidized children. The effects on fruit and other vegetable consumption are stronger among the children receiving a full subsidy, as opposed to only a partial subsidy, and indicate that the size of the subsidy is an important policy lever underlying the program's effectiveness. Overall, the findings provide the strongest empirical evidence to date that the means-tested school lunch subsidies increase children's consumption over a time period longer than one school day. Keywords: National School Lunch Program, Dietary Patterns, Means-Tested Subsidies