
    Performance assessment of urban precinct design: a scoping study

    Executive Summary: Significant advances have been made over the past decade in developing scientifically and industry-accepted tools for assessing the performance of buildings in terms of energy, carbon, water, indoor environmental quality and related criteria. Realising resilient, sustainable, low-carbon urban development in the 21st century, however, will require several radical transitions in design performance beyond the scale of individual buildings. One of these involves the creation and application of leading-edge tools, not yet widely available to built environment professions and practitioners, capable of assessing performance across all stages of development at a precinct scale (neighbourhood, community and district) in greenfield, brownfield or greyfield settings. A core aspect is the development of a new way of modelling precincts, referred to as Precinct Information Modelling (PIM), that provides transparent sharing and linking of precinct object information across the development life cycle, together with consistent, accurate and reliable access to reference data, including data associated with the urban context of the precinct. Neighbourhoods are the ‘building blocks’ of our cities and represent the scale at which urban design needs to make its contribution to city performance: as productive, liveable, environmentally sustainable and socially inclusive places (COAG 2009). Neighbourhood design constitutes a major area for innovation within the urban design protocol established by the federal government (Department of Infrastructure and Transport 2011, see Figure 1), yet the ability to efficiently and effectively assess urban design performance at a neighbourhood level is in its infancy. This study was undertaken by Swinburne University of Technology, the University of New South Wales, CSIRO and buildingSMART Australasia on behalf of the CRC for Low Carbon Living.

    SkyDOT (Sky Database for Objects in the Time Domain): A Virtual Observatory for Variability Studies at LANL

    The mining of Virtual Observatories (VOs) is becoming a powerful new method for discovery in astronomy. Here we report on the development of SkyDOT (Sky Database for Objects in the Time Domain), a new Virtual Observatory dedicated to the study of sky variability. The site will confederate a number of massive variability surveys and enable exploration of the time domain in astronomy. We discuss the architecture of the database and the functionality of the user interface. An important aspect of SkyDOT is that it is continuously updated in near real time, so that users can access new observations promptly. The site will also utilize high-level machine learning tools that allow sophisticated mining of the archive. Another key feature is the real-time data stream provided by RAPTOR (RAPid Telescopes for Optical Response), a new sky monitoring experiment under construction at Los Alamos National Laboratory (LANL). Comment: to appear in SPIE proceedings vol. 4846, 11 pages, 5 figures.
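
    The kind of access pattern such an archive has to serve can be illustrated with a small sketch. The schema below is purely hypothetical (it is not SkyDOT's actual design): a single observation table from which the light curve of one object, gathered from several confederated surveys, is pulled with one query.

```python
# Minimal sketch of a time-domain query against a variability archive.
# The table, column names, and values are hypothetical, for illustration only.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE observation (
        object_id INTEGER,
        survey    TEXT,     -- contributing survey, e.g. 'RAPTOR'
        mjd       REAL,     -- observation epoch (Modified Julian Date)
        mag       REAL,     -- measured magnitude
        mag_err   REAL
    );
    INSERT INTO observation VALUES
        (42, 'RAPTOR', 52500.1, 14.2, 0.02),
        (42, 'RAPTOR', 52500.2, 14.5, 0.02),
        (42, 'OTHER',  52501.3, 14.1, 0.03);
""")

# Light curve of object 42, ordered in time; new rows become visible as soon
# as a near-real-time ingest commits them.
light_curve = conn.execute(
    "SELECT mjd, mag, mag_err, survey FROM observation "
    "WHERE object_id = ? ORDER BY mjd", (42,)
).fetchall()
for row in light_curve:
    print(row)
```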

    Data Mining the SDSS SkyServer Database

    An earlier paper (Szalay et al., "Designing and Mining MultiTerabyte Astronomy Archives: The Sloan Digital Sky Survey," ACM SIGMOD 2000) described the Sloan Digital Sky Survey's (SDSS) data management needs by defining twenty database queries and twelve data visualization tasks that a good data management system should support. We built a database and interfaces to support both the query load and a website for ad hoc access. This paper reports on the database design, describes the data loading pipeline, and reports on the query implementation and performance. The queries typically translate to a single SQL statement. Most queries run in less than 20 seconds, allowing scientists to explore the database interactively. This paper is an in-depth tour of those queries. Readers should first have studied the companion overview paper, Szalay et al., "The SDSS SkyServer, Public Access to the Sloan Digital Sky Server Data," ACM SIGMOD 2002. Comment: 40 pages. Original source is at http://research.microsoft.com/~gray/Papers/MSR_TR_O2_01_20_queries.do
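
    As an illustration of the claim that each astronomy question typically maps to a single SQL statement, here is a hedged, self-contained sketch against a toy photometric table. The table, columns and cuts are invented for illustration and are not one of the paper's twenty queries.

```python
# Toy example of the query pattern described above: one astronomy question
# ("find bright galaxies in a colour range") expressed as one SQL statement.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE photo_obj (
        obj_id INTEGER PRIMARY KEY,
        type   INTEGER,   -- 3 = galaxy in this toy encoding
        r      REAL,      -- r-band magnitude
        g      REAL       -- g-band magnitude
    )
""")
conn.executemany("INSERT INTO photo_obj VALUES (?, ?, ?, ?)",
                 [(1, 3, 17.2, 18.0), (2, 3, 21.5, 22.3), (3, 6, 15.0, 15.4)])

# Bright galaxies (r < 20) with blue colour (g - r < 1.0), in one statement.
rows = conn.execute("""
    SELECT obj_id, r, g - r AS colour
    FROM photo_obj
    WHERE type = 3 AND r < 20.0 AND g - r < 1.0
""").fetchall()
print(rows)
```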

    The impact of spatial data redundancy on SOLAP query performance

    Geographic Data Warehouses (GDW) are one of the main technologies used in decision-making processes and spatial analysis, and the literature proposes several conceptual and logical data models for GDW. However, little effort has been focused on studying how spatial data redundancy affects SOLAP (Spatial On-Line Analytical Processing) query performance over GDW. In this paper, we investigate this issue. First, we compare redundant and non-redundant GDW schemas and conclude that redundancy is associated with high performance losses. We then analyze indexing, aiming to improve SOLAP query performance on a redundant GDW. Comparisons of the SB-index approach, the star-join aided by an R-tree and the star-join aided by GiST indicate that the SB-index reduces query processing time by 25% to 99% for SOLAP queries defined over the spatial predicates of intersection, enclosure and containment and applied to roll-up and drill-down operations. We also investigate the impact of increasing data volume on performance. The increase did not impair the performance of the SB-index, which still greatly reduced query processing time. Performance tests also show that the SB-index is far more compact than the star-join, requiring at most 0.20% of its volume. Moreover, we propose a specific enhancement of the SB-index to deal with spatial data redundancy. This enhancement improved performance by 80% to 91% for redundant GDW schemas. Funding: FAPESP, CNPq, Coordenacao de Aperfeicoamento de Pessoal de Nivel Superior (CAPES), INEP, FINEP.
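
    To make the flavour of such queries concrete, the following is a minimal sketch of a SOLAP-style roll-up restricted by a spatial intersection predicate. It does not reproduce the paper's benchmark, schemas or the SB-index; city footprints are reduced to axis-aligned bounding boxes so the predicate can be evaluated without a spatial library, and all names and values are made up.

```python
# SOLAP-style roll-up sketch: aggregate a measure per state, but only over
# cities whose (simplified) footprint intersects a query window.
from collections import defaultdict

def intersects(a, b):
    """True if two boxes (xmin, ymin, xmax, ymax) overlap."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]

# Spatial dimension: city -> (state, footprint box). All values are made up.
city_dim = {
    "Springfield": ("IL", (0, 0, 2, 2)),
    "Shelbyville": ("IL", (5, 5, 7, 7)),
    "Ogdenville":  ("MO", (1, 1, 3, 3)),
}
# Fact table: (city, sales amount).
facts = [("Springfield", 120.0), ("Shelbyville", 80.0), ("Ogdenville", 95.0)]

query_window = (0, 0, 4, 4)  # spatial predicate: intersects this box

# Roll-up: total sales per state, restricted by the spatial predicate.
totals = defaultdict(float)
for city, amount in facts:
    state, footprint = city_dim[city]
    if intersects(footprint, query_window):
        totals[state] += amount

print(dict(totals))  # {'IL': 120.0, 'MO': 95.0}
```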

    Diamond Dicing

    In OLAP, analysts often select an interesting sample of the data. For example, an analyst might focus on products bringing revenues of at least 100 000 dollars, or on shops having sales greater than 400 000 dollars. However, current systems do not allow both of these thresholds to be applied simultaneously, selecting the products and shops that satisfy both at once. For such purposes, we introduce the diamond cube operator, filling a gap among existing data warehouse operations. Because of the interaction between dimensions, the computation of diamond cubes is challenging. We compare and test various algorithms on large data sets of more than 100 million facts. We find that while it is possible to implement diamonds in SQL, it is inefficient. Indeed, our custom implementation can be a hundred times faster than popular database engines (including a row-store and a column-store). Comment: 29 pages.
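
    The interaction between dimensions can be illustrated with a small sketch of the iterative pruning that diamond dicing implies (a reading of the abstract, not the authors' algorithm): keep only the attribute values whose total contribution meets a per-dimension threshold, and repeat until nothing changes, since deleting facts along one dimension can push values of another dimension below their threshold.

```python
# Toy diamond-dicing sketch: prune facts until every remaining product and
# shop meets its revenue threshold simultaneously. Data and thresholds are
# invented for illustration.
from collections import defaultdict

# Facts: (product, shop, revenue). Thresholds: product >= 150, shop >= 150.
facts = [("p1", "s1", 100), ("p1", "s2", 100), ("p2", "s1", 60),
         ("p2", "s2", 40), ("p3", "s3", 500)]
thresholds = {"product": 150, "shop": 150}

def diamond(facts, thresholds):
    changed = True
    while changed:
        # Re-aggregate revenue per product and per shop over surviving facts.
        sums = {"product": defaultdict(float), "shop": defaultdict(float)}
        for p, s, rev in facts:
            sums["product"][p] += rev
            sums["shop"][s] += rev
        # Keep only facts whose product AND shop both meet their thresholds.
        kept = [
            (p, s, rev) for p, s, rev in facts
            if sums["product"][p] >= thresholds["product"]
            and sums["shop"][s] >= thresholds["shop"]
        ]
        changed = len(kept) != len(facts)
        facts = kept
    return facts

print(diamond(facts, thresholds))  # [('p3', 's3', 500)]
```

    In this toy data, shop s2 and product p2 fall below their thresholds in the first pass; removing their facts then drops product p1 below its threshold, so it is eliminated in the second pass. That cascading effect is what makes the computation non-trivial.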

    Comparative Analysis of Decision Tree Algorithms for Data Warehouse Fragmentation

    One of the main problems faced by Data Warehouse designers is fragmentation. Several studies have proposed data mining-based horizontal fragmentation methods; however, no existing horizontal fragmentation technique uses a decision tree. This paper presents an analysis of different decision tree algorithms in order to select the best one for implementing the fragmentation method. The analysis was performed with version 3.9.4 of Weka, considering four evaluation metrics (Precision, ROC Area, Recall and F-measure) on different data sets selected from the Star Schema Benchmark. The results showed that the two best algorithms were J48 and Random Forest in most cases; nevertheless, J48 was selected because it is more efficient in building the model.
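
    A rough scikit-learn analogue of this comparison methodology is sketched below. It is only an approximation: scikit-learn has no J48/C4.5 implementation, so a CART-style DecisionTreeClassifier stands in for it, and a synthetic data set replaces the Star Schema Benchmark workloads.

```python
# Compare a single decision tree and a random forest on the four metrics
# named in the abstract, using cross-validation on synthetic data.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_validate
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
metrics = ["precision", "recall", "f1", "roc_auc"]

for name, model in [("decision tree", DecisionTreeClassifier(random_state=0)),
                    ("random forest", RandomForestClassifier(random_state=0))]:
    scores = cross_validate(model, X, y, cv=10, scoring=metrics)
    summary = {m: scores[f"test_{m}"].mean() for m in metrics}
    print(name, summary)
```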

    Emergent relational schemas for RDF


    Evaluating partitioning and bucketing strategies for Hive-based Big Data Warehousing systems

    Hive has long been one of the industry-leading systems for Data Warehousing in Big Data contexts, organizing data into databases, tables, partitions and buckets stored on top of a distributed file system such as HDFS. Several studies have examined how to optimize the performance of storage systems for Big Data Warehousing, but few explore the impact of data organization strategies on query performance when Hive is used as the storage technology for implementing Big Data Warehousing systems. Therefore, this paper evaluates the impact of data partitioning and bucketing in Hive-based systems, testing different data organization strategies and verifying their efficiency in terms of query performance. The results demonstrate the advantages of implementing Big Data Warehouses based on denormalized models and the potential benefit of adequate partitioning strategies. Defining partitions aligned with the attributes that are frequently used in the conditions/filters of the queries can significantly increase the efficiency of the system in terms of response time; in the most intensive workload benchmarked in this paper, overall decreases of about 40% in processing time were verified. The same does not hold for bucketing strategies, which show potential benefits only in very specific scenarios, suggesting a more restricted use of this functionality, namely bucketing two tables by their join attribute. This work is supported by COMPETE: POCI-01-0145-FEDER-007043 and FCT - Fundação para a CiĂȘncia e Tecnologia within the Project Scope UID/CEC/00319/2013, and by European Structural and Investment Funds in the FEDER component, through the Operational Competitiveness and Internationalization Programme (COMPETE 2020) [Project no. 002814; Funding Reference: POCI-01-0247-FEDER-002814].
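
    A minimal sketch of the two data organization strategies, expressed with PySpark's DataFrame writer rather than raw HiveQL, is shown below. Table, column and path names are hypothetical; the point is simply where a partitioning column and a bucketing (join) column are declared when the table is written.

```python
# Write one denormalized sales table twice: once partitioned by a filter
# attribute, once bucketed by a join key. Names are illustrative only.
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("hive-partition-bucket-sketch")
         .enableHiveSupport()
         .getOrCreate())

sales = spark.read.parquet("/data/sales")  # hypothetical denormalized table

# Partition by an attribute that queries frequently filter on, so a query
# only reads the directories of the partitions it asks for.
(sales.write
      .partitionBy("sale_year")
      .mode("overwrite")
      .saveAsTable("dw.sales_partitioned"))

# Bucket by a join key; per the results above, this tends to pay off only in
# the narrower scenario where both join sides are bucketed on that key.
(sales.write
      .bucketBy(32, "customer_id")
      .sortBy("customer_id")
      .mode("overwrite")
      .saveAsTable("dw.sales_bucketed"))
```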
    • 

    corecore