39,602 research outputs found

    The OTree: multidimensional indexing with efficient data sampling for HPC

    Spatial big data is considered an essential trend in future scientific and business applications. Indeed, research instruments, medical devices, and social networks generate hundreds of petabytes of spatial data per year. However, many authors have pointed out that the lack of specialized frameworks for multidimensional Big Data is limiting possible applications and precluding many scientific breakthroughs. Paramount in achieving High-Performance Data Analytics is to optimize and reduce the I/O operations required to analyze large data sets. To do so, we need to organize and index the data according to its multidimensional attributes. At the same time, to enable fast and interactive exploratory analysis, it is vital to generate approximate representations of large datasets efficiently. In this paper, we propose the Outlook Tree (or OTree), a novel Multidimensional Indexing with efficient data Sampling (MIS) algorithm. The OTree enables exploratory analysis of large multidimensional datasets with arbitrary precision, a vital missing feature in current distributed data management solutions. Our algorithm reduces the indexing overhead and achieves high performance even for write-intensive HPC applications. Indeed, we use the OTree to store the scientific results of a study on the efficiency of drug inhalers. Then we compare the OTree implementation on Apache Cassandra, named Qbeast, with PostgreSQL and plain storage. Lastly, we demonstrate that our proposal delivers better performance and scalability. Peer Reviewed. Postprint (author's final draft)

    Towards a Scalable Dynamic Spatial Database System

    With the rise of GPS-enabled smartphones and other similar mobile devices, massive amounts of location data are available. However, no scalable solutions for soft real-time spatial queries on large sets of moving objects have yet emerged. In this paper, we explore and measure the limits of current algorithms and implementations in different application scenarios. Finally, we propose a novel distributed architecture to solve the scalability issues. Comment: (2012

    Fixed Effect Estimation of Large T Panel Data Models

    This article reviews recent advances in fixed effect estimation of panel data models for long panels, where the number of time periods is relatively large. We focus on semiparametric models with unobserved individual and time effects, where the distribution of the outcome variable conditional on covariates and unobserved effects is specified parametrically, while the distribution of the unobserved effects is left unrestricted. Compared to existing reviews on long panels (Arellano and Hahn 2007; a section in Arellano and Bonhomme 2011) we discuss models with both individual and time effects, split-panel Jackknife bias corrections, unbalanced panels, distribution and quantile effects, and other extensions. Understanding and correcting the incidental parameter bias caused by the estimation of many fixed effects is our main focus, and the unifying theme is that the order of this bias is given by the simple formula p/n for all models discussed, with p the number of estimated parameters and n the total sample size. Comment: 40 pages, 1 table
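
    As a rough illustration of the p/n rule stated in the abstract, the sketch below works through the arithmetic for a balanced panel. The function names are hypothetical, and the parameter count is an assumption: it counts one fixed effect per unit and, optionally, one per time period, and ignores the handful of common coefficients.

```python
from fractions import Fraction


def bias_order(num_params: int, sample_size: int) -> Fraction:
    """Return p/n, the order of the incidental parameter bias per the abstract."""
    if num_params < 0 or sample_size <= 0:
        raise ValueError("need num_params >= 0 and sample_size > 0")
    return Fraction(num_params, sample_size)


def panel_bias_order(n_units: int, n_periods: int, time_effects: bool = True) -> Fraction:
    """p/n for a balanced panel (assumption: only fixed effects are counted in p)."""
    n = n_units * n_periods                           # total sample size
    p = n_units + (n_periods if time_effects else 0)  # estimated fixed effects
    return bias_order(p, n)


# Individual effects only: p = N, n = N*T, so p/n = 1/T.
print(panel_bias_order(100, 20, time_effects=False))  # 1/20

# Individual and time effects: p = N + T, so p/n = 1/T + 1/N.
print(panel_bias_order(100, 20))  # 3/50
```

    Under these assumptions, the bias order shrinks as the panel grows in both dimensions, which matches the abstract's emphasis on long panels.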

    The Role of Geographic Proximity And Industrial Structure In Metropolitan Area Business Cycles

    Measurement and prediction of aggregate economic fluctuations at the region, state, and metropolitan area level is a major challenge. As data quality and analytical techniques have improved, the analysis of coincident economic cycle indicators (CEI) has progressed from national to regional to state levels. This paper continues the trend of geographic disaggregation by constructing and analyzing CEI at the MSA level. The theoretical advantage of MSA-level indexes is that they reflect labor market areas. Given the lack of quarterly economic time series at the MSA level, we construct a new variable, the EPI (export price index). The EPI is an index number constructed to measure changes in the prices of goods produced by major industries located in metropolitan areas. Using non-agricultural employment and the EPI as MSA-specific variables, we are able to estimate Stock/Watson-type single-factor models. We find that, for larger states with multiple MSAs, there is substantial variation in the amplitude and timing of cycles across MSAs. Further tests group MSAs within states by applying cluster analysis to the state series for the MSAs within a state. The groupings are interesting for two reasons. First, they confirm the differences observed within states. Second, and perhaps most important, the groupings of cyclically similar MSAs are not always based on geographic proximity, as might be expected. It appears that industrial similarity of the MSA economies is also important for cyclical performance.