    Recursive tree traversal dependence analysis

    While there has been much work done on analyzing and transforming regular programs that operate over linear arrays and dense matrices, comparatively little has been done to try to carry these optimizations over to programs that operate over heap-based data structures using pointers. Previous work has shown that point blocking, a technique similar to loop tiling in regular programs, can help increase the temporal locality of repeated tree traversals. Point blocking, however, has only been shown to work on tree traversals where each traversal is fully independent and would allow parallelization, greatly limiting the types of applications that this transformation could be applied to. The purpose of this study is to develop a new framework for analyzing recursive methods that perform traversals over trees, called tree dependence analysis. This analysis translates dependence analysis techniques for regular programs to the irregular space, identifying the structure of dependences within a recursive method that traverses trees. In this study, a dependence test that exploits the dependence structure of such programs is developed, and is shown to be able to prove the legality of several locality- and parallelism-enhancing transformations, including point blocking. In addition, the analysis is extended with a novel path-dependent, conditional analysis to refine the dependence test and prove the legality of transformations for a wider range of algorithms. These analyses are then used to show that several common algorithms that manipulate trees recursively are amenable to several locality- and parallelism-enhancing transformations. This work shows that classical dependence analysis techniques, which have largely been confined to nested loops over array data structures, can be extended and translated to work for complex, recursive programs that operate over pointer-based data structures.
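    As a concrete illustration of the transformation whose legality tree dependence analysis is meant to establish, the sketch below contrasts a naive repeated traversal with a point-blocked one. This is not code from the paper; the node and point layouts and the visit() body are assumptions made purely for illustration.

        /* Minimal sketch of point blocking for repeated tree traversals.
         * Node/point layout and visit() are illustrative assumptions. */
        #include <stddef.h>

        typedef struct node { struct node *left, *right; double value; } node;
        typedef struct { double query; double result; } point;

        /* Whatever per-node work one traversal performs for one point. */
        static void visit(node *n, point *p) {
            p->result += n->value * p->query;   /* placeholder computation */
        }

        /* Baseline: one full traversal per point.  The whole tree is streamed
         * through the cache once for every point, so temporal locality is poor. */
        static void traverse(node *n, point *p) {
            if (!n) return;
            visit(n, p);
            traverse(n->left, p);
            traverse(n->right, p);
        }

        static void traverse_all(node *root, point *pts, size_t npts) {
            for (size_t i = 0; i < npts; i++)
                traverse(root, &pts[i]);
        }

        /* Point blocking: carry a block of points down the tree together, so each
         * node is reused for many points while it is still cache-resident.  This
         * is legal only when the per-point traversals do not depend on one
         * another, which is exactly what the dependence test is used to prove. */
        static void traverse_blocked(node *n, point *pts, size_t npts) {
            if (!n) return;
            for (size_t i = 0; i < npts; i++)
                visit(n, &pts[i]);
            traverse_blocked(n->left, pts, npts);
            traverse_blocked(n->right, pts, npts);
        }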

    Bayesian model averaging over tree-based dependence structures for multivariate extremes

    Describing the complex dependence structure of extreme phenomena is particularly challenging. To tackle this issue we develop a novel statistical algorithm that describes extremal dependence taking advantage of the inherent hierarchical dependence structure of the max-stable nested logistic distribution and that identifies possible clusters of extreme variables using reversible jump Markov chain Monte Carlo techniques. Parsimonious representations are achieved when clusters of extreme variables are found to be completely independent. Moreover, we significantly decrease the computational complexity of full likelihood inference by deriving a recursive formula for the nested logistic model likelihood. The algorithm's performance is verified through extensive simulation experiments which also compare different likelihood procedures. The new methodology is used to investigate the dependence relationships between extreme concentration of multiple pollutants in California and how these pollutants are related to extreme weather conditions. Overall, we show that our approach allows for the representation of complex extremal dependence structures and has valid applications in multivariate data analysis, such as air pollution monitoring, where it can guide policymaking.
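    For background on the model the algorithm builds on (a standard statement, not equations taken from the paper): with unit Fréchet margins, the ordinary (non-nested) multivariate logistic max-stable distribution has joint distribution function

        G(z_1, \dots, z_d) = \exp\!\left\{ -\Big( \sum_{i=1}^{d} z_i^{-1/\alpha} \Big)^{\alpha} \right\}, \qquad 0 < \alpha \le 1,

    where \alpha \to 0 gives complete dependence and \alpha = 1 gives independence. The nested logistic model applies this construction hierarchically: the variables are partitioned into clusters C_1, \dots, C_K, each with its own within-cluster parameter, for instance (in one common parametrization, whose notation need not match the paper's)

        G(z) = \exp\!\left\{ -\Big[ \sum_{k=1}^{K} \Big( \sum_{i \in C_k} z_i^{-1/(\alpha\theta_k)} \Big)^{\theta_k} \Big]^{\alpha} \right\}, \qquad \theta_k \in (0, 1],

    so dependence within a cluster (governed by \alpha\theta_k) is at least as strong as dependence across clusters (governed by \alpha), and \theta_k = 1 collapses a cluster back into the outer logistic.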

    Measuring association with recursive rank binning

    Pairwise measures of dependence are a common tool to map data in the early stages of analysis, with several modern examples based on maximized partitions of the pairwise sample space. Following a short survey of modern measures of dependence, we introduce a new measure which recursively splits the ranks of a pair of variables to partition the sample space and computes the χ² statistic on the resulting bins. Splitting logic is detailed for splits maximizing a score function and for randomly selected splits. Simulations indicate that random splitting produces a statistic conservatively approximated by the χ² distribution without a loss of power to detect numerous different data patterns compared to maximized binning. Though it seems to add no power to detect dependence, maximized recursive binning is shown to produce a natural visualization of the data and the measure. Applying maximized recursive rank binning to S&P 500 constituent data suggests the automatic detection of tail dependence.
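    To make the measure explicit: if the recursive rank splitting produces bins B_1, \dots, B_m of the unit square of ranks, the statistic compares observed counts with those expected under independence (this is the standard binned chi-squared form; the precise splitting and scoring rules are those of the paper):

        \chi^2 = \sum_{b=1}^{m} \frac{(O_b - E_b)^2}{E_b}, \qquad E_b = n \cdot \operatorname{area}(B_b),

    where n is the sample size, O_b is the number of rank pairs falling in bin B_b, and the area is taken in the unit rank square, so E_b is the count expected if the two variables were independent.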

    XML content warehousing: Improving sociological studies of mailing lists and web data

    In this paper, we present the guidelines for an XML-based approach for the sociological study of Web data such as the analysis of mailing lists or databases available online. The use of an XML warehouse is a flexible solution for storing and processing this kind of data. We propose an implemented solution and show possible applications with our case study of profiles of experts involved in W3C standard-setting activity. We illustrate the sociological use of semi-structured databases by presenting our XML Schema for mailing-list warehousing. An XML Schema allows many adjunctions or crossings of data sources, without modifying existing data sets, while allowing possible structural evolution. We also show that the existence of hidden data implies increased complexity for traditional SQL users. XML content warehousing allows both exhaustive warehousing and recursive queries through contents, with far less dependence on the initial storage. We finally present the possibility of exporting the data stored in the warehouse to commonly-used advanced software devoted to sociological analysis.
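    The kind of "recursive query through contents" described above can be pictured with a short sketch that walks an XML mailing-list archive and counts messages while printing authors. It is not from the paper; the file name and element names (message, author) are hypothetical, and the example assumes libxml2.

        /* Hedged sketch: recursively walk an XML mailing-list warehouse, count
         * <message> elements, and print each <author>.  Element names and file
         * layout are assumptions, not the paper's schema.
         * Build with e.g.:  cc walk.c $(xml2-config --cflags --libs)           */
        #include <stdio.h>
        #include <libxml/parser.h>
        #include <libxml/tree.h>

        static long messages = 0;

        static void walk(xmlNodePtr node) {
            for (xmlNodePtr cur = node; cur; cur = cur->next) {
                if (cur->type == XML_ELEMENT_NODE) {
                    if (xmlStrcmp(cur->name, (const xmlChar *)"message") == 0)
                        messages++;
                    if (xmlStrcmp(cur->name, (const xmlChar *)"author") == 0) {
                        xmlChar *who = xmlNodeGetContent(cur);
                        printf("author: %s\n", (const char *)who);
                        xmlFree(who);
                    }
                }
                walk(cur->children);   /* recurse through nested contents */
            }
        }

        int main(void) {
            xmlDocPtr doc = xmlReadFile("mailing_list.xml", NULL, 0);
            if (!doc) return 1;
            walk(xmlDocGetRootElement(doc));
            printf("messages: %ld\n", messages);
            xmlFreeDoc(doc);
            xmlCleanupParser();
            return 0;
        }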

    US Disposable Personal Income and Housing Price Index: A Fractional Integration Analysis

    This paper examines the relationship between US disposable personal income (DPI) and house price index (HPI) during the last twenty years, applying fractional integration and long-range dependence techniques to monthly data from January 1991 to July 2010. The empirical findings indicate that the stochastic properties of the two series are such that cointegration cannot hold between them, as mean reversion occurs in the case of DPI but not of HPI. Also, recursive analysis shows that the estimated fractional parameter is relatively stable over time for DPI whilst it increases throughout the sample for HPI. Interestingly, the estimates tend to converge toward the unit root case after 2008, once the bubble had burst. The implications for explaining the recent financial crisis and choosing appropriate policy actions are discussed. Keywords: Personal Disposable Income, House Price Index, Fractional Integration.
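    The framework behind these findings can be summarized with the standard I(d) model (a generic statement, not equations lifted from the paper): a series x_t is integrated of order d, written x_t \sim I(d), if

        (1 - L)^d x_t = u_t, \qquad u_t \sim I(0),

    where L is the lag operator and d may take non-integer values, the fractional difference being defined through the binomial expansion

        (1 - L)^d = \sum_{j=0}^{\infty} \binom{d}{j} (-L)^j .

    Mean reversion holds when d < 1 (shocks eventually die out), while d = 1 corresponds to a unit root; an estimate of d below one for DPI but not for HPI is what rules out cointegration between the two, and the recursive estimates drifting toward d = 1 after 2008 refer to this parameter.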

    Parallelizing irregular C codes assisted by interprocedural shape analysis

    In the new multicore architecture arena, the problem of improving the performance of a code lies more on the software side than on the hardware one. However, optimizing irregular codes based on dynamic data structures for such architectures is not easy, either by hand or compiler assisted. Regarding this last approach, shape analysis is a static technique that achieves abstraction of dynamic memory and can help to disambiguate, quite accurately, memory references in programs that create and traverse recursive data structures. This kind of analysis has promising applicability for accurate data dependence tests in loops or recursive functions that traverse dynamic data structures. However, support for interprocedural programs in shape analysis is still a challenge, especially in the presence of recursive functions. In this work we present a novel fully context-sensitive interprocedural shape analysis algorithm that supports recursion and can be used to uncover parallelism. Our approach is based on three key ideas: i) intraprocedural support based on “Coexistent Links Sets” to precisely describe the memory configurations during the abstract interpretation of the C code; ii) interprocedural support based on “Recursive Flow Links” to trace the state of pointers in previous calls; and iii) annotations of the read/written heap locations during the program analysis. We present preliminary experiments that reveal that our technique compares favorably with related work, and obtains precise memory abstractions in a variety of recursive programs that create and manipulate dynamic data structures. We have also implemented a data dependence test over our interprocedural shape analysis. With this test we have obtained promising results, automatically detecting parallelism in three C codes, which have been successfully parallelized.
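    A tiny example of the kind of C code such a shape-analysis-based dependence test targets (illustrative only; the data structure, field names, and update are assumptions, not an example taken from the paper):

        /* Recursive traversal over a dynamically allocated tree.  If the shape
         * analysis proves that t->left and t->right reach disjoint heap regions
         * (a tree with no sharing), the two recursive calls carry no dependence
         * and can safely run in parallel, e.g. as tasks. */
        typedef struct tnode {
            struct tnode *left, *right;
            double val;
        } tnode;

        void scale(tnode *t, double f) {
            if (!t) return;
            t->val *= f;          /* writes only the node reachable through t */
            scale(t->left, f);
            scale(t->right, f);
        }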

    A MULTIVARIATE I(2) COINTEGRATION ANALYSIS OF GERMAN HYPERINFLATION

    This paper re-examines the Cagan model of German hyperinflation during the 1920s under the twin hypotheses that the system contains variables that are I(2) and that a linear trend is required in the cointegrating relations. Using the I(2) cointegration analysis developed by Johansen (1992, 1995, 1997) and extended by Paruolo (1996) and Rahbek et al. (1999), we find that the linear trend hypothesis is rejected for the sample. However, we provide conclusive evidence that money supply and the price level have a common I(2) component. Then, the validity of Cagan’s model is tested via a transformation of the I(2) model to an I(1) model between real money balances and money growth or inflation. This transformation is not imposed on the data but is shown to satisfy the statistical property of polynomial cointegration. Evidence is obtained in favor of cointegration between the two sets of variables, which is, however, weakened by the sample dependence of the trace test revealed by the recursive stability tests for cointegrated VAR models. Keywords: I(2) analysis, hyperinflation, cointegration, identification, temporal stability.
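    For reference, the Cagan money-demand relation being tested can be written in its standard textbook form (the notation need not match the paper's):

        m_t - p_t = c - \alpha \, \mathbb{E}_t(p_{t+1} - p_t) + u_t, \qquad \alpha > 0,

    where m_t and p_t are the logs of the money supply and the price level, so real money balances fall as expected inflation rises. If m_t and p_t share a common I(2) component, real balances m_t - p_t and inflation \Delta p_t can each be I(1), and the Cagan relation then amounts to cointegration between them; this mixing of levels and differences is the polynomial cointegration property referred to in the abstract.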

    Generalized Points-to Graphs: A New Abstraction of Memory in the Presence of Pointers

    Flow- and context-sensitive points-to analysis is difficult to scale; for top-down approaches, the problem centers on repeated analysis of the same procedure; for bottom-up approaches, the abstractions used to represent procedure summaries have not scaled while preserving precision. We propose a novel abstraction called the Generalized Points-to Graph (GPG) which views points-to relations as memory updates and generalizes them using counts of indirection levels, leaving the unknown pointees implicit. This allows us to construct GPGs as compact representations of bottom-up procedure summaries in terms of memory updates and control flow between them. Their compactness is ensured by the following optimizations: strength reduction reduces the indirection levels, redundancy elimination removes redundant memory updates and minimizes control flow (without over-approximating data dependence between memory updates), and call inlining enhances the opportunities of these optimizations. We devise novel operations and data flow analyses for these optimizations. Our quest for scalability of points-to analysis leads to the following insight: the real killer of scalability in program analysis is not the amount of data but the amount of control flow that it may be subjected to in search of precision. The effectiveness of GPGs lies in the fact that they discard as much control flow as possible without losing precision (i.e., by preserving data dependence without over-approximation). This is the reason why the GPGs are very small even for main procedures that contain the effect of the entire program. This allows our implementation to scale to 158kLoC for C programs.
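    One way to picture "points-to relations as memory updates with counts of indirection levels" (illustrative only; the comments paraphrase the idea and do not reproduce the paper's GPG notation):

        int   z;
        int  *y;
        int **x;

        void callee(void) {
            x  = &y;   /* writes the location reached by dereferencing &x once
                          (x itself); the value stored is the address of y      */
            *x = &z;   /* writes the location reached by dereferencing &x twice;
                          inside a bottom-up summary that location is unknown
                          (it depends on the caller), so it is left implicit and
                          only the indirection count is recorded                */
        }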