178 research outputs found

    BlinkDB: queries with bounded errors and bounded response times on very large data

    In this paper, we present BlinkDB, a massively parallel, approximate query engine for running interactive SQL queries on large volumes of data. BlinkDB allows users to trade off query accuracy for response time, enabling interactive queries over massive data by running queries on data samples and presenting results annotated with meaningful error bars. To achieve this, BlinkDB uses two key ideas: (1) an adaptive optimization framework that builds and maintains a set of multi-dimensional stratified samples from original data over time, and (2) a dynamic sample selection strategy that selects an appropriately sized sample based on a query's accuracy or response time requirements. We evaluate BlinkDB against the well-known TPC-H benchmarks and a real-world analytic workload derived from Conviva Inc., a company that manages video distribution over the Internet. Our experiments on a 100-node cluster show that BlinkDB can answer queries on up to 17 TB of data in less than 2 seconds (over 200x faster than Hive), within an error of 2-10%.
    Funding: National Science Foundation (U.S.) (CISE Expeditions Award CCF-1139158); United States. Defense Advanced Research Projects Agency (XData Award FA8750-12-2-0331).
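The abstract above describes picking an appropriately sized sample to meet an accuracy requirement and reporting the answer with error bars. The following is only a minimal sketch of that general idea, not BlinkDB's implementation: it assumes uniform random samples rather than stratified ones, a closed-form normal-approximation confidence interval for a mean, and hypothetical helper names (`approx_mean`, `pick_sample`).

```python
import math
import random
import statistics


def approx_mean(sample, z=1.96):
    """Estimate a population mean from a uniform random sample.

    Returns (estimate, half_width), where half_width is the normal-approximation
    confidence-interval half-width (z = 1.96 gives roughly 95%).
    """
    n = len(sample)
    mean = statistics.fmean(sample)
    stderr = statistics.stdev(sample) / math.sqrt(n)
    return mean, z * stderr


def pick_sample(samples, target_rel_error, z=1.96):
    """Try pre-built samples from smallest to largest and return
    (rows_used, estimate, half_width) for the first one whose relative
    error bound meets the target. Falls back to the largest sample.
    """
    ordered = sorted(samples, key=len)
    for sample in ordered:
        est, half = approx_mean(sample, z)
        if est != 0 and half / abs(est) <= target_rel_error:
            return len(sample), est, half
    est, half = approx_mean(ordered[-1], z)
    return len(ordered[-1]), est, half


# Synthetic data standing in for a large table column (an assumption for the demo).
rng = random.Random(0)
population = [rng.gauss(100.0, 20.0) for _ in range(200_000)]
samples = [rng.sample(population, n) for n in (100, 1_000, 10_000)]

size, est, half = pick_sample(samples, target_rel_error=0.02)
print(f"used {size} rows: mean ~ {est:.2f} +/- {half:.2f}")
```

A tighter `target_rel_error` pushes the selection toward the larger samples, which is the accuracy-for-latency trade the abstract refers to.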

    Blink and it's done: Interactive queries on very large data

    In this demonstration, we present BlinkDB, a massively parallel, sampling-based approximate query processing framework for running interactive queries on large volumes of data. The key observation in BlinkDB is that one can make reasonable decisions in the absence of perfect answers. BlinkDB extends the Hive/HDFS stack and can handle the same set of SPJA (selection, projection, join and aggregate) queries as supported by these systems. BlinkDB provides real-time answers along with statistical error guarantees, and can scale to petabytes of data and thousands of machines in a fault-tolerant manner. Our experiments using the TPC-H benchmark and on an anonymized real-world video content distribution workload from Conviva Inc. show that BlinkDB can execute a wide range of queries up to 150x faster than Hive on MapReduce and 10-150x faster than Shark (Hive on Spark) over tens of terabytes of data stored across 100 machines, all with an error of 2-10%.
    Funding: National Science Foundation (U.S.) (CISE Expeditions Award CCF-1139158); QUALCOMM Inc.; Amazon.com (Firm); Google (Firm); SAP Corporation; Blue Goji; Cisco Systems, Inc.; Cloudera, Inc.; Ericsson, Inc.; General Electric Company; Hewlett-Packard Company; Intel Corporation; MarkLogic Corporation; Microsoft Corporation; NetApp; Oracle Corporation; Splunk Inc.; VMware, Inc.; United States. Defense Advanced Research Projects Agency (Contract FA8650-11-C-7136)

    Sentinel lymph node biopsy in squamous cell carcinoma of the head and neck: 10 years of experience

    Sentinel node (SN) biopsy in head and neck cancer is still considered investigational, and agreement on the extent of surgical sampling has not yet been reached. From May 1999 to December 2009, 209 consecutive patients entered a prospective study: 61.7% had a primary tumour of the oral cavity and 23.9% of the oropharynx. SN was not found in 26 patients. Based on these data and definitive histopathological analysis, we proposed six hypothetical scenarios to estimate the percentage of neck recurrences following different treatments. Among patients with identified SN, 54 cases were pN+: 47 in the SN and 7 in a different node. Considering the six hypothetical scenarios ("only SN removal", "SN level dissection", "neck dissection from the tumour site to SN level", "selective neck dissection of three levels (SND)", "dissection from level I to IV" and "comprehensive I-V dissection"), neck recurrences could be expected in 6.5%, 3.8%, 2.18%, 2.73%, 1.09% and 1.09% of cases, respectively. SN biopsy can be considered a useful tool to personalize the surgical approach to an N0 carcinoma. The minimum treatment of the neck is probably dissection of the levels between the primary tumour and the level containing the SN(s). Outside the framework of a clinical study, the best treatment can still be considered SND.

    Knowing when you're wrong: Building fast and reliable approximate query processing systems

    Modern data analytics applications typically process massive amounts of data on clusters of tens, hundreds, or thousands of machines to support near-real-time decisions. The quantity of data and limitations of disk and memory bandwidth often make it infeasible to deliver answers at interactive speeds. However, it has been widely observed that many applications can tolerate some degree of inaccuracy. This is especially true for exploratory queries on data, where users are satisfied with "close-enough" answers if they can come quickly. A popular technique for speeding up queries at the cost of accuracy is to execute each query on a sample of data, rather than the whole dataset. To ensure that the returned result is not too inaccurate, past work on approximate query processing has used statistical techniques to estimate "error bars" on returned results. However, existing work in the sampling-based approximate query processing (S-AQP) community has not validated whether these techniques actually generate accurate error bars for real query workloads. In fact, we find that error bar estimation often fails on real-world production workloads. Fortunately, it is possible to quickly and accurately diagnose the failure of error estimation for a query. In this paper, we show that it is possible to implement a query approximation pipeline that produces approximate answers and reliable error bars at interactive speeds.
    Funding: National Science Foundation (U.S.) (CISE Expeditions Award CCF-1139158); Lawrence Berkeley National Laboratory (Award 7076018); United States. Defense Advanced Research Projects Agency (XData Award FA8750-12-2-0331); Amazon.com (Firm); Google (Firm); SAP Corporation; Thomas and Stacey Siebel Foundation; Apple Computer, Inc.; Cisco Systems, Inc.; Cloudera, Inc.; EMC Corporation; Ericsson, Inc.; Facebook (Firm)
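To make the notion of checking error bars concrete, here is a hedged sketch, not the paper's algorithm. It assumes a bootstrap estimate of standard error and a simple, hypothetical diagnostic (`diagnose`) that compares the bootstrap's predicted spread on small chunks of the sample with the spread actually observed across disjoint chunks; the chunk count, tolerance and function names are all illustrative choices.

```python
import random
import statistics


def bootstrap_stderr(sample, estimator, resamples, rng):
    """Estimate the standard error of estimator(sample) by resampling
    the sample with replacement and measuring the spread of the results."""
    n = len(sample)
    estimates = [estimator(rng.choices(sample, k=n)) for _ in range(resamples)]
    return statistics.stdev(estimates)


def diagnose(sample, estimator, rng, parts=50, tolerance=0.5):
    """Hypothetical sanity check for an error-bar procedure.

    Splits the sample into disjoint chunks, measures how much the estimator
    really varies from chunk to chunk, and compares that with what the
    bootstrap predicts from within a few individual chunks.
    Returns (looks_reliable, predicted_stderr, observed_stderr).
    """
    shuffled = list(sample)
    rng.shuffle(shuffled)
    size = len(shuffled) // parts
    chunks = [shuffled[i * size:(i + 1) * size] for i in range(parts)]
    observed = statistics.stdev([estimator(c) for c in chunks])
    predicted = statistics.fmean(
        [bootstrap_stderr(c, estimator, 100, rng) for c in chunks[:5]]
    )
    rel_gap = abs(predicted - observed) / observed
    return rel_gap <= tolerance, predicted, observed


# Synthetic sample standing in for query input rows (an assumption for the demo).
rng = random.Random(42)
sample = [rng.gauss(50.0, 10.0) for _ in range(10_000)]

ok, predicted, observed = diagnose(sample, statistics.fmean, rng)
print(f"reliable={ok} predicted={predicted:.3f} observed={observed:.3f}")
```

For a well-behaved aggregate such as a mean over roughly normal data, the predicted and observed spreads should agree closely; a large gap would be a signal to distrust the error bars for that query.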

    Sunflower yield in response to boron application and nitrogen topdressing.

    Abstract: An experiment was carried out in the 2013 off-season (safrinha) to evaluate yield and plant height as a function of boron (B) doses and nitrogen fertilization in the sunflower crop. Sowing was performed on February 25, 2013 using the sunflower hybrid BRS 323, in succession to soybean, in an area with a history of no-till farming. Six B doses were applied at planting (0, 1, 2, 4, 8 and 16 kg ha⁻¹) as boric acid (17% B), with and without 50 kg ha⁻¹ of N as urea (45% N) topdressed 20 days after seedling emergence, with 4 replications. No effect of B dose was observed on the variables analyzed. However, sunflower yield and plant height were strongly influenced by the nitrogen topdressing.