
    Structured Parallel Programming with Trees

    High-level abstractions for parallel programming are still immature. Computations on complicated data structures such as pointer structures are regarded as irregular algorithms. General graph structures, which irregular algorithms typically deal with, are difficult to divide and conquer. Because the divide-and-conquer paradigm is essential for load balancing in parallel algorithms and a key to parallel programming, general graphs remain genuinely difficult. Trees, however, lead to divide-and-conquer computations by definition and are sufficiently general and powerful as a programming tool. We therefore deal with abstractions of tree-based computations. Our study started from Matsuzaki’s work on tree skeletons. We have improved the usability of tree skeletons by enriching their implementation. Specifically, we have dealt with two issues. First, we implemented loose coupling between skeletons and data structures and developed a flexible tree skeleton library. Second, we implemented a parallelizer that transforms sequential recursive functions in C into parallel programs that use tree skeletons implicitly. This parallelizer hides the complicated API of tree skeletons and lets programmers use them with no burden. The practicality of tree skeletons, however, has not improved accordingly. On the basis of observations from the practice of tree skeletons, we deal with two application domains: program analysis and neighborhood computation. In the domain of program analysis, compilers treat input programs as control-flow graphs (CFGs) and perform analysis on them; since CFGs are general graphs, program analysis is difficult to divide and conquer. To resolve this problem, we have developed divide-and-conquer methods for program analysis in a syntax-directed manner on the basis of Rosen’s high-level approach. Specifically, we have dealt with data-flow analysis based on Tarjan’s formalization and value-graph construction based on a functional formalization. In the domain of neighborhood computations, a primary issue is locality: a naive parallel neighborhood computation without locality enhancement causes many cache misses. The divide-and-conquer paradigm is known to be useful for locality enhancement as well. We have therefore applied algebraic formalizations and a tree-segmenting technique derived from tree skeletons to the locality enhancement of neighborhood computations.
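    The divide-and-conquer structure that makes trees amenable to skeletons can be shown with a minimal sketch. The C++ fragment below is a toy tree-reduce skeleton over a binary tree; the Node type, the depth cutoff, and the interface are illustrative assumptions, not the API of Matsuzaki's library or of the library developed here.

```cpp
#include <future>
#include <memory>

struct Node {
    int value;
    std::unique_ptr<Node> left, right;
};

// Reduce a binary tree with an associative combiner. Each subtree is an
// independent subproblem, which is what makes trees, unlike general
// graphs, amenable to divide and conquer.
template <typename F>
int tree_reduce(const Node* n, F combine, int depth = 0) {
    if (!n) return 0;  // identity element; assumes e.g. addition
    if (depth > 3) {   // sequential cutoff: stop spawning tasks
        return combine(n->value,
                       combine(tree_reduce(n->left.get(), combine, depth),
                               tree_reduce(n->right.get(), combine, depth)));
    }
    // Spawn the left subtree as an asynchronous task, do the right inline.
    auto left = std::async(std::launch::async, [&] {
        return tree_reduce(n->left.get(), combine, depth + 1);
    });
    int right = tree_reduce(n->right.get(), combine, depth + 1);
    return combine(n->value, combine(left.get(), right));
}
```

    A call such as tree_reduce(root.get(), [](int a, int b){ return a + b; }) sums the tree; every subtree can be evaluated independently, which is exactly the property general graphs lack.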

    Algorithmic skeleton framework for the orchestration of GPU computations

    Dissertation submitted for the degree of Master in Computer Engineering (Engenharia Informática). The Graphics Processing Unit (GPU) is gaining popularity as a co-processor to the Central Processing Unit (CPU), due to its ability to surpass the latter’s performance in certain application fields. Nonetheless, harnessing the GPU’s capabilities is a non-trivial exercise that requires good knowledge of parallel programming. Thus, providing ways to extract such computational power has become an emerging research topic. In this context, there have been several proposals in the field of GPGPU (General-purpose Computation on Graphics Processing Units) development. However, most of these still offer a low-level abstraction of the GPU computing model, forcing the developer to adapt application computations to the SPMD model and to orchestrate the low-level details of the execution. On the other hand, the higher-level approaches have limitations that prevent the full exploitation of GPUs when the purpose goes beyond the simple offloading of a kernel. To this end, our proposal builds on the recent trend of applying the notion of algorithmic patterns (skeletons) to GPU computing. We propose Marrow, a high-level algorithmic skeleton framework that expands the set of skeletons currently available in this field. Marrow’s skeletons orchestrate the execution of OpenCL computations and introduce optimizations that overlap communication and computation, thus conjoining programming simplicity with performance gains in many application scenarios. Additionally, these skeletons can be combined (nested) to create more complex applications. We evaluated the proposed constructs by comparing them against comparable skeleton libraries for GPGPU, as well as against hand-tuned OpenCL programs. The results are favourable, indicating that Marrow’s skeletons are both flexible and efficient in the context of GPU computing.
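    As a rough illustration of the skeleton nesting that Marrow describes, consider the following C++ sketch; the Map and Pipeline classes and their interface are hypothetical stand-ins, and the element-wise loop stands in for an OpenCL kernel launch, so this is not Marrow's actual API.

```cpp
#include <functional>
#include <utility>
#include <vector>

struct Skeleton {  // common interface, so skeletons can be nested
    virtual std::vector<float> run(std::vector<float> in) = 0;
    virtual ~Skeleton() = default;
};

struct Map : Skeleton {  // applies f to every element
    std::function<float(float)> f;
    explicit Map(std::function<float(float)> g) : f(std::move(g)) {}
    std::vector<float> run(std::vector<float> in) override {
        for (auto& x : in) x = f(x);  // stands in for an OpenCL kernel launch
        return in;
    }
};

struct Pipeline : Skeleton {  // nests two skeletons: stage two consumes stage one
    Skeleton* first;
    Skeleton* second;
    Pipeline(Skeleton* a, Skeleton* b) : first(a), second(b) {}
    std::vector<float> run(std::vector<float> in) override {
        return second->run(first->run(std::move(in)));
    }
};
```

    A real implementation would additionally split the input and overlap device transfers of one chunk with computation on another, which is the communication/computation overlap the abstract refers to.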

    Systematic Development of Correct Bulk Synchronous Parallel Programs


    The primordial non-Gaussianity of local type (f_NL) in the WMAP 5-year data: the length distribution of CMB skeleton

    We present skeleton studies of non-Gaussianity in the CMB temperature anisotropy observed in the WMAP5 data. The local skeleton is traced on the 2D sphere by cubic spline interpolation, which leads to a more accurate estimation of the intersection positions between the skeleton and the secondary pixels than conventional linear interpolation. We demonstrate that the skeleton-based estimator of non-Gaussianity of the local type (f_NL), the departure of the length distribution from the corresponding Gaussian expectation, yields an unbiased and sufficiently converged f_NL likelihood. We analyse the skeleton statistics in the WMAP5 combined V- and W-band data outside the Galactic base-mask determined from the KQ75 sky coverage. The results are consistent with Gaussian simulations of the best-fitting cosmological model, but deviate from the previous results determined using the WMAP1 data. We show that it is unlikely that the improved skeleton tracing method, the omission of Q-band data, the modification of the foreground-template fitting method, or the absence of 6 extended regions in the new mask contributes to such a deviation. However, applying the Kp0 base-mask in data processing does improve the consistency with the WMAP1 results. The f_NL likelihoods of the data are estimated at 9 different smoothing levels. Unexpectedly, the best-fit values show a positive correlation with the smoothing scale. Further investigation argues against a point-source or goodness-of-fit explanation, but finds that about 30% of either Gaussian or f_NL samples with better goodness-of-fit than the WMAP5 data show a similar correlation. We present the estimate f_NL = 47.3 ± 34.9 (1σ error) determined from the first four smoothing angles and f_NL = 76.8 ± 43.1 for the combination of all nine. The former result may be overestimated at the 0.21σ level because of point sources. Comment: 17 pages, 14 figures, 5 tables, accepted for publication in MNRAS.
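    The estimator's logic, comparing the observed skeleton-length statistic against simulations over a grid of f_NL values and picking the maximum-likelihood value, can be sketched as follows; the numbers and the single-statistic layout are hypothetical placeholders, not the paper's actual pipeline.

```cpp
#include <cstddef>
#include <cstdio>
#include <vector>

int main() {
    // Hypothetical grid of f_NL values and the mean skeleton-length
    // statistic measured in simulations for each of them.
    std::vector<double> fnl_grid = {-100, -50, 0, 50, 100};
    std::vector<double> sim_mean = {0.96, 0.98, 1.00, 1.02, 1.04};
    double sim_sigma = 0.02;  // scatter estimated from Gaussian simulations
    double observed  = 1.01;  // the statistic measured in the data

    double best_fnl = 0.0, best_chi2 = 1e300;
    for (std::size_t i = 0; i < fnl_grid.size(); ++i) {
        double d = (observed - sim_mean[i]) / sim_sigma;
        double chi2 = d * d;  // -2 ln L up to a constant, for Gaussian errors
        if (chi2 < best_chi2) { best_chi2 = chi2; best_fnl = fnl_grid[i]; }
    }
    std::printf("best-fit f_NL = %g (chi2 = %g)\n", best_fnl, best_chi2);
}
```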

    PiCo: A Domain-Specific Language for Data Analytics Pipelines

    In the world of Big Data analytics, there is a series of tools aiming at simplifying the programming of applications to be executed on clusters. Although each tool claims to provide better programming, data, and execution models, for which only informal (and often confusing) semantics is generally provided, all share a common underlying model, namely the Dataflow model. Using this model as a starting point, it is possible to categorize and analyze almost all aspects of Big Data analytics tools from a high-level perspective. This analysis can be considered a first step toward a formal model to be exploited in the design of a (new) framework for Big Data analytics. By putting clear separations between all levels of abstraction (i.e., from the runtime to the user API), it is easier for a programmer or software designer to avoid mixing low-level with high-level aspects, as is often seen in state-of-the-art Big Data analytics frameworks. From the user-level perspective, we think that a clearer and simpler semantics is preferable, together with a strong separation of concerns. For this reason, we use the Dataflow model as a starting point to build a programming environment with a simplified programming model implemented as a Domain-Specific Language, sitting on top of a stack of layers that builds a prototypical framework for Big Data analytics. The contribution of this thesis is twofold: first, we show that the proposed model is (at least) as general as existing batch and streaming frameworks (e.g., Spark, Flink, Storm, Google Dataflow), thus making it easier to understand high-level data-processing applications written in such frameworks. As a result of this analysis, we provide a layered model that can represent tools and applications following the Dataflow paradigm, and we show how the analyzed tools fit in each level. Second, we propose a programming environment based on this layered model in the form of a Domain-Specific Language (DSL) for processing data collections, called PiCo (Pipeline Composition). The main entity of this programming model is the Pipeline, basically a DAG composition of processing elements. The model is intended to give the user a unique interface for both stream and batch processing, hiding data management completely and focusing only on operations, which are represented by Pipeline stages. Our DSL is built on top of the FastFlow library, exploiting both shared-memory and distributed parallelism, and is implemented in C++11/14 with the aim of porting C++ into the Big Data world.
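    The Pipeline-as-composition idea can be illustrated with a small C++ sketch; the Pipe type and the operator| composition are hypothetical illustrations of DAG composition of stages, not PiCo's actual API.

```cpp
#include <cstdio>
#include <functional>

template <typename In, typename Out>
struct Pipe {
    std::function<Out(In)> stage;  // one processing element of the DAG
};

// Composing two pipes yields a new pipe; stages describe operations only,
// while data management stays hidden behind the composition.
template <typename A, typename B, typename C>
Pipe<A, C> operator|(Pipe<A, B> p, Pipe<B, C> q) {
    return { [p, q](A x) { return q.stage(p.stage(x)); } };
}

int main() {
    Pipe<int, int>    square { [](int x) { return x * x; } };
    Pipe<int, double> halve  { [](int x) { return x / 2.0; } };
    auto pipeline = square | halve;           // DAG composition of two stages
    std::printf("%f\n", pipeline.stage(6));   // 6 -> 36 -> 18.0
}
```

    The point of such a design is that a stage says only what operation it performs; where the data lives, and whether it arrives as a batch or a stream, is decided below the composition.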

    Reliable massively parallel symbolic computing : fault tolerance for a distributed Haskell

    As the number of cores in manycore systems grows exponentially, the number of failures is also predicted to grow exponentially. Hence massively parallel computations must be able to tolerate faults. Moreover, new approaches to language design and system architecture are needed to address the resilience of massively parallel heterogeneous architectures. Symbolic computation has underpinned key advances in Mathematics and Computer Science, for example in number theory, cryptography, and coding theory. Computer algebra software systems facilitate symbolic mathematics. Developing these at scale has its own distinctive set of challenges, as symbolic algorithms tend to employ complex irregular data and control structures. SymGridParII is a middleware for parallel symbolic computing on massively parallel High Performance Computing platforms. A key element of SymGridParII is a domain-specific language (DSL) called Haskell Distributed Parallel Haskell (HdpH). It is explicitly designed for scalable distributed-memory parallelism, and employs work stealing to load-balance dynamically generated irregular task sizes. To investigate providing scalable fault-tolerant symbolic computation, we design, implement and evaluate a reliable version of HdpH, named HdpH-RS. Its reliable scheduler detects and handles faults, using task replication as the key recovery strategy. The scheduler supports load balancing with a fault-tolerant work stealing protocol. The reliable scheduler is invoked with two fault tolerance primitives for implicit and explicit work placement, and with 10 fault-tolerant parallel skeletons that encapsulate common parallel programming patterns. The user is oblivious to many failures; they are instead handled by the scheduler. An operational semantics describes small-step reductions on states. A simple abstract machine for scheduling transitions and task evaluation is presented. It defines the semantics of supervised futures and the transition rules for recovering tasks in the presence of failure. The transition rules are demonstrated with a fault-free execution and three executions that recover from faults. The fault-tolerant work stealing protocol has been abstracted into a Promela model. The SPIN model checker is used to exhaustively search the states of this automaton to validate a key resiliency property of the protocol: an initially empty supervised future on the supervisor node will eventually be full in the presence of all possible combinations of failures. The performance of HdpH-RS is measured using five benchmarks. Supervised scheduling achieves a speedup of 757 with explicit task placement and 340 with lazy work stealing when executing Summatory Liouville on up to 1400 cores of an HPC architecture. Moreover, supervision overheads are consistently low when scaling up to 1400 cores. Low recovery overheads are observed in the presence of frequent failure when lazy, on-demand work stealing is used. A Chaos Monkey mechanism has been developed for stress-testing resiliency with random failure combinations. All unit tests pass in the presence of random failure, terminating with the expected results.
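    The supervised-future-with-replication idea can be sketched conceptually as follows. HdpH-RS is a Haskell DSL, so this C++ rendering is purely illustrative: a supervisor re-submits (replicates) a task until some attempt completes, mirroring the resiliency property that an initially empty supervised future eventually becomes full.

```cpp
#include <cstdio>
#include <functional>
#include <future>
#include <stdexcept>

// Run `task`, re-submitting (replicating) it on failure until some
// attempt succeeds; the future returned by std::async plays the role
// of the supervised future, becoming "full" when an attempt completes.
template <typename T>
T supervised(std::function<T()> task, int max_attempts) {
    for (int attempt = 1; attempt <= max_attempts; ++attempt) {
        auto fut = std::async(std::launch::async, task);  // place the task
        try {
            return fut.get();  // success: the supervised future is full
        } catch (const std::exception& e) {
            std::printf("attempt %d failed (%s); replicating task\n",
                        attempt, e.what());
        }
    }
    throw std::runtime_error("task failed on every replica");
}
```

    A call such as supervised<int>(someTask, 3) tolerates up to two failed attempts before giving up; in HdpH-RS the analogous decision is made by the scheduler, invisibly to the user.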

    Exploring cavity dynamics in biomolecular systems

    Background: The internal cavities of proteins are dynamic structures, and their dynamics may be associated with conformational changes that are required for the functioning of the protein. In order to study the dynamics of these internal protein cavities, appropriate tools are required that allow rapid identification of the cavities as well as assessment of their time-dependent structures. Results: In this paper, we present such a tool and give results that illustrate its applicability to the analysis of molecular dynamics trajectories. Our algorithm consists of a pre-processing step, where the structure of the cavity is computed from the Voronoi diagram of the van der Waals spheres based on coordinate sets from the molecular dynamics trajectory, followed by an interactive stage, where the user can compute, select and visualize the dynamic cavities. Importantly, the tool allows the user to analyze the time-dependent changes of the components of the cavity structure. An overview of the cavity dynamics is derived by rendering the dynamic cavities in a single image that shows the cavity surface colored according to its time-dependent dynamics. Conclusion: The Voronoi-based approach used here enables the user to perform accurate computations of the geometry of the internal cavities in biomolecules. For the first time, it is possible to compute dynamic molecular paths that have a user-defined minimum constriction size. To illustrate the usefulness of the tool for understanding protein dynamics, we probe the dynamic structure of internal cavities in the bacteriorhodopsin proton pump.
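    A minimal sketch of the minimum-constriction idea: if each edge of the cavity's Voronoi skeleton carries a clearance radius (the smallest empty-sphere radius along it), a path for a probe of a given radius exists only through edges with sufficient clearance. The graph layout and names below are hypothetical, not the tool's actual data structures.

```cpp
#include <queue>
#include <vector>

// One edge of the Voronoi skeleton; `clearance` is the minimum
// empty-sphere radius along the edge (the constriction size).
struct Edge { int to; double clearance; };

// BFS over the skeleton, traversing only edges wide enough for the
// probe; returns whether a path of the requested constriction exists.
bool path_exists(const std::vector<std::vector<Edge>>& graph,
                 int src, int dst, double probe_radius) {
    std::vector<bool> seen(graph.size(), false);
    std::queue<int> q;
    q.push(src);
    seen[src] = true;
    while (!q.empty()) {
        int v = q.front(); q.pop();
        if (v == dst) return true;
        for (const Edge& e : graph[v])
            if (!seen[e.to] && e.clearance >= probe_radius) {  // constriction filter
                seen[e.to] = true;
                q.push(e.to);
            }
    }
    return false;
}
```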

    HIGH QUALITY HUMAN 3D BODY MODELING, TRACKING AND APPLICATION

    Geometric reconstruction of dynamic objects is a fundamental task of computer vision and graphics, and modeling the human body with high fidelity is considered a core of this problem. Traditional human shape and motion capture techniques require an array of surrounding cameras or require subjects to wear reflective markers, resulting in limited working space and portability. In this dissertation, a complete process is designed, from geometrically modeling a detailed 3D human full body and capturing its shape dynamics over time using a flexible setup, to guiding clothes/person re-targeting with such data-driven models. The mechanical movement of the human body can be considered an articulated motion, which readily drives skin animation but makes the reverse process of finding parameters from images without manual intervention difficult. We present a novel parametric model, GMM-BlendSCAPE, which jointly takes both the linear skinning model and the prior art of BlendSCAPE (Blend Shape Completion and Animation for PEople) into consideration, and we develop a Gaussian Mixture Model (GMM) to infer both body shape and pose from incomplete observations. We show the increased accuracy of joint and skin surface estimation using our model compared to skeleton-based motion tracking. To model the detailed body, we start by capturing high-quality partial 3D scans using a single-view commercial depth camera. Based on GMM-BlendSCAPE, we can then reconstruct multiple complete static models with large pose differences via our novel non-rigid registration algorithm. With vertex correspondences established, these models can be further converted into a personalized drivable template and used for robust pose tracking in a similar GMM framework. Moreover, we design a general-purpose real-time non-rigid deformation algorithm to accelerate this registration. Last but not least, we demonstrate a novel virtual clothes try-on application based on our personalized model, utilizing both image and depth cues to synthesize and re-target clothes for single-view videos of different people.
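    The "linear skinning model" ingredient of GMM-BlendSCAPE refers to standard linear blend skinning, where a vertex is deformed as a weight-blended combination of bone transforms, v' = Σ_j w_j (T_j v). The sketch below reduces the transforms to 2D rotations plus translations to keep it short; all names are illustrative, not the dissertation's code.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

struct Vec2 { double x, y; };
struct Bone { double angle, tx, ty; };  // one rigid 2D transform T_j

Vec2 apply(const Bone& b, Vec2 v) {     // computes T_j * v
    double c = std::cos(b.angle), s = std::sin(b.angle);
    return { c * v.x - s * v.y + b.tx, s * v.x + c * v.y + b.ty };
}

// Linear blend skinning: v' = sum_j w_j (T_j v). Each vertex is a
// weight-blended combination of the bone transforms applied to its
// rest position; the weights for one vertex sum to 1.
Vec2 skin(Vec2 rest, const std::vector<Bone>& bones,
          const std::vector<double>& weights) {
    Vec2 out{0.0, 0.0};
    for (std::size_t j = 0; j < bones.size(); ++j) {
        Vec2 t = apply(bones[j], rest);
        out.x += weights[j] * t.x;
        out.y += weights[j] * t.y;
    }
    return out;
}
```

    Skinning in this form is easy to evaluate forward, which is why the hard part, inverting it to recover shape and pose from incomplete observations, is delegated to the GMM inference the abstract describes.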

    Reconfigurable cable driven parallel mechanism

    Driven by fast industrial growth and the need to reduce manufacturing costs, improve product quality and accuracy, and ensure the safety of workers, industry has come to rely on robotic mechanisms. Recently, cable-driven parallel mechanisms (CDPMs) have attracted much attention due to their many advantages over conventional parallel mechanisms, such as a significantly larger workspace and greater dynamic capacity. In addition, a CDPM has a lower mass than other parallel mechanisms because its cables are of negligible mass compared to rigid links. In many applications, humans must interact with machines and robots to achieve tasks precisely and accurately. A new domain of scientific research has therefore been introduced, namely human-robot interaction, where operators share the same workspace with robots and machines such as cable-driven mechanisms. One of the main requirements of this interaction is that robots respond to human actions in an accurate, harmless way. In addition, the trajectory of the end effector now comes from the operator, and it is essential that the initial trajectory be kept unchanged to perform tasks such as assembly, operating, or pick-and-place, while preventing the cables from interfering with each other or colliding with the operator. Accordingly, many issues have been raised, such as control, vibrations, and stability due to the contact between human and robot. One of the most important issues is to guarantee a collision-free space (avoiding collisions between the cables and the operator, and collisions among the cables themselves). The aim of this research project is to model, design, analyze, and implement a reconfigurable six-degree-of-freedom parallel mechanism driven by eight cables. The main contributions of this work are as follows. First, we develop a nonlinear model and solve the forward and inverse kinematics of a fully constrained CDPM whose attachment points move vertically on the rails (conventional cable-driven mechanisms have fixed attachment points on the rails) while the cable lengths are controlled. Second, the new idea of reconfiguration is used to avoid interference between cables, and between cables and operator limbs, in real time by moving one cable's attachment point on the frame to increase the shortest distance between them while keeping the trajectory of the end effector unchanged. Third, the proposed approach is tested by creating a simulated trajectory with intended cable-cable and cable-human interference, then detecting and avoiding the collisions using the proposed real-time reconfiguration while maintaining the initial end-effector trajectory. Fourth, we study the effect of relocating the attachment points on the constant-orientation wrench-feasible workspace of the CDPM.
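    The inverse kinematics in the first contribution follows the standard CDPM geometry: cable i runs from a frame attachment point a_i (which reconfiguration moves along a rail) to a platform attachment point p + R b_i, so its length is l_i = ||a_i - (p + R b_i)||. The sketch below is a minimal illustration of that formula, not the thesis's actual nonlinear model.

```cpp
#include <array>
#include <cmath>

using Vec3 = std::array<double, 3>;
using Mat3 = std::array<Vec3, 3>;

Vec3 rotate(const Mat3& R, const Vec3& b) {  // computes R * b
    Vec3 r{};
    for (int i = 0; i < 3; ++i)
        for (int j = 0; j < 3; ++j) r[i] += R[i][j] * b[j];
    return r;
}

// l_i = || a_i - (p + R b_i) ||, where a_i is the attachment point on
// the frame (moved by reconfiguration), p the platform position, R its
// orientation, and b_i the attachment point in the platform frame.
double cable_length(const Vec3& a, const Vec3& p, const Mat3& R, const Vec3& b) {
    Vec3 Rb = rotate(R, b);
    double d2 = 0.0;
    for (int k = 0; k < 3; ++k) {
        double diff = a[k] - (p[k] + Rb[k]);
        d2 += diff * diff;
    }
    return std::sqrt(d2);
}
```

    Evaluating this for all eight cables gives the inverse kinematics; moving an attachment point a_i changes the cable's line of action, which is what the reconfiguration exploits to increase cable-cable and cable-operator distances without altering p and R along the trajectory.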