
    AutoParallel: A Python module for automatic parallelization and distributed execution of affine loop nests

    Recent improvements in programming languages, programming models, and frameworks have focused on abstracting users away from many programming issues. Among other features, recent programming frameworks include simpler syntax, automatic memory management and garbage collection, simpler code reuse through library packages, and easily configurable tools for deployment. For instance, Python has risen to the top of the list of programming languages thanks to the simplicity of its syntax, while still achieving good performance even though it is an interpreted language. Moreover, the community has helped develop a large number of libraries and modules, tuning them to obtain great performance. However, there is still room for improvement in shielding users from dealing directly with distributed and parallel computing issues. This paper proposes and evaluates AutoParallel, a Python module that automatically finds an appropriate task-based parallelization of affine loop nests and executes them in parallel on a distributed computing infrastructure. The parallelization can also build data blocks to increase task granularity and thus achieve good execution performance. Moreover, AutoParallel is based on sequential programming and requires only a small annotation in the form of a Python decorator, so that anyone with little programming experience can scale up an application to hundreds of cores.
    Comment: Accepted to the 8th Workshop on Python for High-Performance and Scientific Computing (PyHPC 2018).
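    As a minimal sketch of the programming model described above (the decorator below is a hypothetical no-op stand-in, not AutoParallel's documented API), the single annotation on an otherwise sequential affine loop nest might look like this:

```python
# Hypothetical stand-in: the real decorator is provided by the AutoParallel
# module itself; this no-op version only illustrates the annotation style.
def parallel(**options):
    def wrap(func):
        return func  # the real module would analyze and parallelize here
    return wrap

@parallel()  # the single annotation added to otherwise sequential code
def matmul(a, b, c, n):
    # Affine loop nest: loop bounds and array subscripts are affine
    # functions of the loop indices, which is what makes automatic
    # dependence analysis and task-based parallelization tractable.
    for i in range(n):
        for j in range(n):
            for k in range(n):
                c[i][j] += a[i][k] * b[k][j]
```

    The decorator is the only change to the user's code; the module is then free to split the iteration space into tasks, optionally blocking the data to coarsen task granularity, and to schedule those tasks on distributed resources.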

    Matchmaker, Matchmaker, Make Me a Match: Migration of Populations via Marriages in the Past

    The study of human mobility is both of fundamental importance and of great potential value. For example, it can be leveraged to facilitate efficient city planning and improve prevention strategies when faced with epidemics. The newfound wealth of rich data sources (including banknote flows, mobile phone records, and transportation data) has led to an explosion of attempts to characterize modern human mobility. Unfortunately, the dearth of comparable historical data makes it much more difficult to study human mobility patterns from the past. In this paper, we present an analysis of long-term human migration, which is important for processes such as urbanization and the spread of ideas. We demonstrate that the data record from Korean family books (called "jokbo") can be used to estimate migration patterns via marriages from the past 750 years. We apply two generative models of long-term human mobility to quantify the relevance of geographical information to human marriage records in the data, and we find that the wide variety in the geographical distributions of the clans poses interesting challenges for the direct application of these models. Using the different geographical distributions of clans, we quantify the "ergodicity" of clans in terms of how widely and uniformly they have spread across Korea, and we compare these results to those obtained using surname data from the Czech Republic. To examine population flow in more detail, we also construct and examine a population-flow network between regions. Based on the correlation between ergodicity and migration in Korea, we identify two different types of migration patterns: diffusive and convective. We expect the analysis of diffusive versus convective effects in population flows to be widely applicable to the study of mobility and migration patterns across different cultures.
    Comment: 24 pages, 23 figures, 5 tables
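    The abstract does not spell out how ergodicity is computed; as a rough, hypothetical illustration of the idea of scoring how widely and uniformly a clan has spread (an assumption for illustration, not necessarily the paper's measure), one could use a normalized entropy over regions:

```python
import math

def spread_entropy(region_counts):
    """Normalized Shannon entropy of a clan's distribution over regions:
    0.0 if the clan is concentrated in one region, 1.0 if it is spread
    uniformly over all regions. One plausible 'ergodicity'-style score;
    assumed for illustration, not taken from the paper."""
    total = sum(region_counts.values())
    probs = [c / total for c in region_counts.values() if c > 0]
    if len(probs) <= 1:
        return 0.0  # fully concentrated
    entropy = -sum(p * math.log(p) for p in probs)
    return entropy / math.log(len(region_counts))

# A clan present in one region scores 0; an evenly spread clan scores 1.
print(spread_entropy({"Seoul": 100, "Busan": 0, "Jeju": 0}))   # 0.0
print(spread_entropy({"Seoul": 50, "Busan": 50, "Jeju": 50}))  # 1.0
```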

    Project Quality of Offshore Virtual Teams Engaged in Software Requirements Analysis: An Exploratory Comparative Study

    Offshore software development companies in countries such as India use a global delivery model in which the initial requirements-analysis phase of software projects is executed at client locations to leverage frequent and deep interaction between user and developer teams. Subsequent phases such as design, coding, and testing are completed at offshore locations. Emerging trends indicate an increasing interest in offshoring even the requirements-analysis phase using computer-mediated communication. We conducted an exploratory research study involving students from Management Development Institute (MDI), India and Marquette University (MU), USA to determine the quality of such offshored requirements-analysis projects. Our findings suggest that the project quality of teams engaged in a pure offshore mode is comparable to that of teams engaged in a collocated mode. However, the effect of controls such as user project monitoring on the quality of offshored projects needs to be studied further.

    Truss Decomposition in Massive Networks

    The k-truss is a type of cohesive subgraph proposed recently for the study of networks. While the problem of computing most cohesive subgraphs is NP-hard, there exists a polynomial-time algorithm for computing the k-truss. Compared with the k-core, which is also efficient to compute, the k-truss represents the "core" of a k-core: it keeps the k-core's key information while filtering out its less important parts. However, existing algorithms for computing the k-truss are inefficient for handling today's massive networks. We first improve the existing in-memory algorithm for computing the k-truss in networks of moderate size. Then, we propose two I/O-efficient algorithms to handle massive networks that cannot fit in main memory. Our experiments on real datasets verify the efficiency of our algorithms and the value of the k-truss.
    Comment: VLDB201
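    To make the definition concrete: an edge's support is the number of triangles that contain it, and the k-truss keeps exactly the edges whose support is at least k - 2 within the remaining subgraph. A naive in-memory peeling sketch (for illustration only; not the improved or I/O-efficient algorithms proposed in the paper) follows:

```python
from collections import defaultdict

def k_truss(edges, k):
    """Return the edge set of the k-truss: the maximal subgraph in which
    every edge is contained in at least (k - 2) triangles."""
    adj = defaultdict(set)
    for u, v in edges:
        if u != v:
            adj[u].add(v)
            adj[v].add(u)

    def support(u, v):
        return len(adj[u] & adj[v])  # triangles through edge (u, v)

    # Repeatedly peel edges whose support drops below k - 2.
    changed = True
    while changed:
        changed = False
        for u in list(adj):
            for v in list(adj[u]):
                if u < v and support(u, v) < k - 2:
                    adj[u].discard(v)
                    adj[v].discard(u)
                    changed = True
    return {(u, v) for u in adj for v in adj[u] if u < v}

# The 4-clique {0, 1, 2, 3} survives as the 4-truss; edge (3, 4) is
# peeled because it lies in no triangle.
print(k_truss([(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3), (3, 4)], 4))
```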

    Generic Machine Learning Inference on Heterogeneous Treatment Effects in Randomized Experiments

    We propose strategies to estimate and make inference on key features of heterogeneous effects in randomized experiments. These key features include best linear predictors of the effects using machine learning proxies, average effects sorted by impact groups, and average characteristics of the most and least impacted units. The approach is valid in high-dimensional settings, where the effects are proxied by machine learning methods. We post-process these proxies into estimates of the key features. Our approach is generic: it can be used in conjunction with penalized methods, deep and shallow neural networks, canonical and new random forests, boosted trees, and ensemble methods. It does not rely on strong assumptions; in particular, we do not require conditions for consistency of the machine learning methods. Estimation and inference rely on repeated data splitting to avoid overfitting and achieve validity. For inference, we take medians of p-values and medians of confidence intervals resulting from many different data splits, and then adjust their nominal level to guarantee uniform validity. This variational inference method is shown to be uniformly valid and quantifies the uncertainty coming from both parameter estimation and data splitting. We illustrate the use of the approach with two randomized experiments in development economics on the effects of microcredit and of nudges to stimulate immunization demand.
    Comment: 53 pages, 6 figures, 15 tables
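    As a minimal sketch of the split-and-aggregate idea (a simplified stand-in, assumed for illustration: the residualized mean contrast and the exact level adjustment below are not the paper's full procedure), one could write:

```python
import numpy as np
from scipy import stats
from sklearn.ensemble import RandomForestRegressor

def median_aggregated_effect(X, D, Y, n_splits=50, alpha=0.05, seed=0):
    """On each split, fit an ML proxy for the outcome on the auxiliary
    half, estimate a residualized treatment contrast on the main half,
    then aggregate across splits with medians at an adjusted level."""
    rng = np.random.default_rng(seed)
    n = len(Y)
    pvals, lows, highs = [], [], []
    for _ in range(n_splits):
        idx = rng.permutation(n)
        aux, main = idx[: n // 2], idx[n // 2:]
        proxy = RandomForestRegressor(random_state=0).fit(X[aux], Y[aux])
        resid = Y[main] - proxy.predict(X[main])  # strip baseline variation
        t, c = resid[D[main] == 1], resid[D[main] == 0]
        est = t.mean() - c.mean()
        se = np.sqrt(t.var(ddof=1) / len(t) + c.var(ddof=1) / len(c))
        pvals.append(2 * stats.norm.sf(abs(est / se)))
        z = stats.norm.ppf(1 - alpha / 4)         # CI at level 1 - alpha/2
        lows.append(est - z * se)
        highs.append(est + z * se)
    # Medians over splits; doubling the median p-value (and widening the
    # per-split CIs) accounts for the randomness of the data splitting.
    p_adj = min(1.0, 2 * float(np.median(pvals)))
    return p_adj, (float(np.median(lows)), float(np.median(highs)))
```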