The OTree: multidimensional indexing with efficient data sampling for HPC

Author: Becerra Fontal Yolanda
Calmet Hadrien
Cugnasco Cesare
Eguzkitza Ane Beatriz
Houzeaux Guillaume
Labarta Mancho Jesús José
Santamaria Mateu Pol
Sirvent Pardell Raül
Torres Viñals Jordi
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 01/01/2019

Spatial big data is considered an essential trend in future scientific and business applications. Indeed, research instruments, medical devices, and social networks generate hundreds of petabytes of spatial data per year. However, many authors have pointed out that the lack of specialized frameworks for multidimensional big data is limiting possible applications and precluding many scientific breakthroughs. Reducing and optimizing the I/O operations required to analyze large datasets is paramount to achieving High-Performance Data Analytics. To do so, we need to organize and index the data according to its multidimensional attributes. At the same time, to enable fast and interactive exploratory analysis, it is vital to generate approximate representations of large datasets efficiently. In this paper, we propose the Outlook Tree (or OTree), a novel Multidimensional Indexing with efficient data Sampling (MIS) algorithm. The OTree enables exploratory analysis of large multidimensional datasets with arbitrary precision, a vital feature missing from current distributed data management solutions. Our algorithm reduces the indexing overhead and achieves high performance even for write-intensive HPC applications. For instance, we use the OTree to store the scientific results of a study on the efficiency of drug inhalers. We then compare the OTree implementation on Apache Cassandra, named Qbeast, with PostgreSQL and plain storage. Lastly, we demonstrate that our proposal delivers better performance and scalability. Peer Reviewed. Postprint (author's final draft)
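
The abstract does not spell out the OTree algorithm, so the following is only a hypothetical illustration of the general idea of pairing a multidimensional index with sampling, not the authors' method. Assumptions: 2-D points in the unit square, a quadtree with a small fixed node capacity, and a uniform random weight per point. Each node keeps its lightest points and pushes heavier ones down, so every descendant's weight is at least the node's largest kept weight. A query for fraction f then returns every point with weight <= f (a Bernoulli sample with probability f) while reading only the upper levels of the tree. The names Node, build and sample are invented for this sketch.

```python
import random
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

Item = Tuple[float, float, float]        # (random weight, x, y)
Box = Tuple[float, float, float, float]  # half-open [x0, x1) x [y0, y1)


@dataclass
class Node:
    x0: float
    y0: float
    x1: float
    y1: float
    capacity: int = 4
    items: List[Item] = field(default_factory=list)
    children: Optional[List["Node"]] = None

    def _mid(self) -> Tuple[float, float]:
        return (self.x0 + self.x1) / 2, (self.y0 + self.y1) / 2

    def _split(self) -> None:
        mx, my = self._mid()
        # Child index = (x >= mx) + 2 * (y >= my)
        self.children = [
            Node(self.x0, self.y0, mx, my, self.capacity),
            Node(mx, self.y0, self.x1, my, self.capacity),
            Node(self.x0, my, mx, self.y1, self.capacity),
            Node(mx, my, self.x1, self.y1, self.capacity),
        ]

    def insert(self, item: Item) -> None:
        self.items.append(item)
        if len(self.items) <= self.capacity:
            return
        # Over capacity: keep the lightest items here and push the heaviest down,
        # so every descendant's weight is >= this node's largest kept weight.
        self.items.sort()
        heavy = self.items.pop()
        if self.children is None:
            self._split()
        mx, my = self._mid()
        self.children[(heavy[1] >= mx) + 2 * (heavy[2] >= my)].insert(heavy)

    def sample(self, f: float, box: Box) -> List[Tuple[float, float]]:
        """Points inside box whose weight is <= f (each kept with probability f)."""
        qx0, qy0, qx1, qy1 = box
        if self.x1 <= qx0 or qx1 <= self.x0 or self.y1 <= qy0 or qy1 <= self.y0:
            return []  # node region does not overlap the query box
        out = [(x, y) for w, x, y in self.items
               if w <= f and qx0 <= x < qx1 and qy0 <= y < qy1]
        # Descendants only hold weights >= max(self.items): prune once that exceeds f.
        if self.children and max(w for w, _, _ in self.items) <= f:
            for child in self.children:
                out.extend(child.sample(f, box))
        return out


def build(points, capacity=4, seed=0):
    """Hypothetical helper: index unit-square points with uniform random weights."""
    rng = random.Random(seed)
    root = Node(0.0, 0.0, 1.0, 1.0, capacity)
    items = [(rng.random(), x, y) for x, y in points]
    for item in items:
        root.insert(item)
    return root, items
```

For example, `root.sample(0.1, (0.0, 0.0, 0.5, 0.5))` returns roughly 10% of the points in the lower-left quadrant, and raising f refines the answer toward the full result, which is one way to read "arbitrary precision" in the abstract.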

Towards a Scalable Dynamic Spatial Database System

Author: Diaconu Raluca
Keller Joaquín
Valero Mathieu
Publication date: 16/11/2012

With the rise of GPS-enabled smartphones and other similar mobile devices, massive amounts of location data are available. However, no scalable solutions for soft real-time spatial queries on large sets of moving objects have yet emerged. In this paper, we explore and measure the limits of current algorithms and implementations in different application scenarios. Finally, we propose a novel distributed architecture to solve the scalability issues. Comment: (2012)

arXiv.org e-Print Archive

INRIA a CCSD electronic archive server

Hal-Diderot
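
The abstract does not describe the proposed architecture, so the following is only a generic illustration of the workload it targets: frequent position updates plus range queries over moving objects. It uses a uniform grid index, a common baseline for this problem, chosen here as an assumption and not taken from the paper. The class GridIndex and its methods are hypothetical names.

```python
import math
from collections import defaultdict
from typing import Dict, List, Set, Tuple

Cell = Tuple[int, int]


class GridIndex:
    """Uniform-grid index for moving point objects (hypothetical sketch)."""

    def __init__(self, cell_size: float) -> None:
        self.cell_size = cell_size
        self.cells: Dict[Cell, Set[int]] = defaultdict(set)  # cell -> object ids
        self.pos: Dict[int, Tuple[float, float]] = {}        # object id -> last position

    def _cell(self, x: float, y: float) -> Cell:
        return (math.floor(x / self.cell_size), math.floor(y / self.cell_size))

    def update(self, oid: int, x: float, y: float) -> None:
        """Insert or move an object: drop it from its old cell, add it to the new one."""
        if oid in self.pos:
            old = self._cell(*self.pos[oid])
            self.cells[old].discard(oid)
            if not self.cells[old]:
                del self.cells[old]  # keep empty cells from accumulating
        self.pos[oid] = (x, y)
        self.cells[self._cell(x, y)].add(oid)

    def range_query(self, x0: float, y0: float, x1: float, y1: float) -> List[int]:
        """Sorted ids of objects inside the closed rectangle [x0, x1] x [y0, y1]."""
        cx0, cy0 = self._cell(x0, y0)
        cx1, cy1 = self._cell(x1, y1)
        hits = []
        for cx in range(cx0, cx1 + 1):
            for cy in range(cy0, cy1 + 1):
                for oid in self.cells.get((cx, cy), ()):
                    px, py = self.pos[oid]
                    if x0 <= px <= x1 and y0 <= py <= y1:
                        hits.append(oid)
        return sorted(hits)
```

Updates touch at most two cells regardless of the number of objects, which suits update-heavy workloads. Scaling out would require partitioning cells across servers (for instance by hashing the cell key); how the paper actually distributes the work is not stated in the abstract.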

Fixed Effect Estimation of Large T Panel Data Models

Author: Andersen E
Arellano M
Baltagi B
Bonhomme S
Candelaria LE
Chen M
Chernozhukov V
Chudik A
Cox DR
Dzemski A
Evdokimov K
Galvao AF
Hahn J
Iván Fernández-Val
Martin Weidner
Matzkin RL
Pakel C
Pakes A
Palmgren J
Quenouille MH
Rasch G
Shi X
Stammann A
Sun Y
Torgovitsky A
Tukey JW
Wooldridge JM
Publication date: 01/01/2018

This article reviews recent advances in fixed effect estimation of panel data models for long panels, where the number of time periods is relatively large. We focus on semiparametric models with unobserved individual and time effects, where the distribution of the outcome variable conditional on covariates and unobserved effects is specified parametrically, while the distribution of the unobserved effects is left unrestricted. Compared to existing reviews on long panels (Arellano and Hahn 2007; a section in Arellano and Bonhomme 2011), we discuss models with both individual and time effects, split-panel jackknife bias corrections, unbalanced panels, distribution and quantile effects, and other extensions. Understanding and correcting the incidental parameter bias caused by the estimation of many fixed effects is our main focus, and the unifying theme is that the order of this bias is given by the simple formula p/n for all models discussed, with p the number of estimated parameters and n the total sample size. Comment: 40 pages, 1 table

arXiv.org e-Print Archive

Boston University Institutional Repository (OpenBU)
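
The p/n rule and the split-panel jackknife can be illustrated with the classic Neyman-Scott model, chosen here as a simple example and not taken from the abstract: x_it = alpha_i + e_it with e_it ~ N(0, sigma^2), for N units and T periods. The fixed-effect MLE of sigma^2 divides the within-unit sum of squares by NT instead of N(T-1), so its expectation is sigma^2 (T-1)/T, a bias of -sigma^2/T. Here p = N + 1 and n = NT, so p/n is about 1/T, matching the rule; with both individual and time effects the same arithmetic gives p/n of about (N+T)/(NT) = 1/T + 1/N. The split-panel jackknife returns twice the full-panel estimate minus the average of the two half-panel estimates, which cancels the 1/T term. The function names below are hypothetical, and the sketch assumes a balanced panel with an even T.

```python
import random
from statistics import fmean
from typing import List

Panel = List[List[float]]  # panel[i][t]: unit i, period t


def sigma2_fe(panel: Panel) -> float:
    """Fixed-effect MLE of sigma^2: alpha_i is estimated by each unit's own mean."""
    n = sum(len(row) for row in panel)
    ss = 0.0
    for row in panel:
        m = fmean(row)
        ss += sum((v - m) ** 2 for v in row)
    return ss / n


def split_panel_jackknife(panel: Panel) -> float:
    """2 * full-panel estimate minus the mean of the two half-panel estimates."""
    T = len(panel[0])
    if T % 2:
        raise ValueError("this sketch assumes an even number of periods")
    h = T // 2
    first = [row[:h] for row in panel]
    second = [row[h:] for row in panel]
    return 2 * sigma2_fe(panel) - (sigma2_fe(first) + sigma2_fe(second)) / 2


def simulate(N: int, T: int, sigma: float = 1.0, seed: int = 0) -> Panel:
    """Draw a balanced panel from the Neyman-Scott model with true variance sigma^2."""
    rng = random.Random(seed)
    panel = []
    for _ in range(N):
        alpha = rng.gauss(0.0, 3.0)  # unit-specific fixed effect
        panel.append([alpha + rng.gauss(0.0, sigma) for _ in range(T)])
    return panel
```

With `panel = simulate(N=2000, T=4)`, `sigma2_fe(panel)` lands near 0.75 (the (T-1)/T shrinkage), while `split_panel_jackknife(panel)` lands near the true value 1.0; for even T the correction is exactly unbiased in this model.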

The Role of Geographic Proximity And Industrial Structure In Metropolitan Area Business Cycles

Author: Hollar Michael
Pennington-Cross Anthony
Yezer Anthony
Publication venue: e-Publications@Marquette
Publication date: 01/01/2010

Measurement and prediction of aggregate economic fluctuations at the region, state, and metropolitan area level are a major challenge. As data quality and analytical techniques have improved, the analysis of coincident economic cycle indicators (CEI) has progressed from the national to the regional and state levels. This paper continues the trend of geographic disaggregation by constructing and analyzing CEI at the metropolitan statistical area (MSA) level. The theoretical advantage of MSA-level indexes is that they reflect labor market areas. Given the lack of quarterly economic time series at the MSA level, we construct a new variable, the EPI (export price index). The EPI is an index number constructed to measure changes in the prices of goods produced by major industries located in metropolitan areas. Using non-agricultural employment and the EPI as MSA-specific variables, we estimate Stock/Watson-type single-factor models. We find that, for larger states with multiple MSAs, there is substantial variation in the amplitude and timing of cycles across MSAs. Further tests group MSAs within states by applying cluster analysis to the series for the MSAs within each state. The groupings are interesting for two reasons. First, they confirm the differences observed within states. Second, and perhaps most important, the groupings of cyclically similar MSAs are not always based on geographic proximity, as might be expected. It appears that industrial similarity of the MSA economies is also important for cyclical performance.

CiteSeerX
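
The abstract says the EPI tracks price changes for goods produced by an MSA's major industries but gives no formula. The following is a minimal sketch under stated assumptions: a fixed-weight (Laspeyres-style) index with base period = 100, in which industry price series are weighted by hypothetical local industry shares (for example, employment shares). Both the weighting scheme and the function name are illustrative and may differ from the authors' construction.

```python
from typing import Dict, List


def export_price_index(shares: Dict[str, float],
                       prices: Dict[str, List[float]],
                       base: int = 0) -> List[float]:
    """Fixed-weight price index (base period = 100) for one metropolitan area.

    shares: hypothetical local weight per industry (e.g. employment share)
    prices: price series per industry, one value per period
    """
    total = sum(shares.values())
    weights = {k: s / total for k, s in shares.items()}  # normalize to sum to 1
    periods = len(next(iter(prices.values())))
    return [
        100.0 * sum(w * prices[k][t] / prices[k][base] for k, w in weights.items())
        for t in range(periods)
    ]
```

For an MSA with shares `{"autos": 0.6, "chemicals": 0.4}` and prices `{"autos": [10, 11, 12], "chemicals": [50, 50, 45]}`, the index is approximately `[100.0, 106.0, 108.0]`: two MSAs facing the same national prices get different EPIs when their industry mixes differ, which is what lets the variable carry MSA-specific information.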