720 research outputs found

    A Massively Parallel 2D Rectangle Placement Method

    Get PDF
    Layout design is a frequently occurring process that oftencombines human and computer reasoning. Because of the combinatorialnature of the problem, solving even a small size input involves searchinga prohibitively large state space. An algorithm PEMS (Pseudo-exhaustiveEdge Minimizing Search) is proposed for approximating a 2D rectanglepacking variant of the problem. The proposed method is inspiredby MERA (Minimum Enclosing of Rectangle Area) [1] and MEGA(Minimum Enclosing Under Gravitational Attraction) [2], yet produceshigher quality solutions, in terms of final space utilization. To addressthe performance cost, a CUDA based acceleration algorithm is developedwith significant speedup

    Melting of hexagonal skyrmion states in chiral magnets

    Get PDF
    Skyrmions are spiral structures observed in thin films of certain magnetic materials (Uchida et al 2006 Science 311 359–61). Of the phases allowed by the crystalline symmetries of these materials (Yi et al 2009 Phys. Rev. B 80 054416), only the hexagonally packed phases (SCh) have been observed. Here the melting of the SCh phase is investigated using Monte Carlo simulations. In addition to the usual measure of skyrmion density, chiral charge, a morphological measure is considered. In doing so it is shown that the low-temperature reduction in chiral charge is associated with a change in skyrmion profiles rather than skyrmion destruction. At higher temperatures, the loss of six-fold symmetry is associated with the appearance of elongated skyrmions that disrupt the hexagonal packing

    FaST-GShare: Enabling Efficient Spatio-Temporal GPU Sharing in Serverless Computing for Deep Learning Inference

    Full text link
    Serverless computing (FaaS) has been extensively utilized for deep learning (DL) inference due to the ease of deployment and pay-per-use benefits. However, existing FaaS platforms utilize GPUs in a coarse manner for DL inferences, without taking into account spatio-temporal resource multiplexing and isolation, which results in severe GPU under-utilization, high usage expenses, and SLO (Service Level Objectives) violation. There is an imperative need to enable an efficient and SLO-aware GPU-sharing mechanism in serverless computing to facilitate cost-effective DL inferences. In this paper, we propose \textbf{FaST-GShare}, an efficient \textit{\textbf{Fa}aS-oriented \textbf{S}patio-\textbf{T}emporal \textbf{G}PU \textbf{Sharing}} architecture for deep learning inferences. In the architecture, we introduce the FaST-Manager to limit and isolate spatio-temporal resources for GPU multiplexing. In order to realize function performance, the automatic and flexible FaST-Profiler is proposed to profile function throughput under various resource allocations. Based on the profiling data and the isolation mechanism, we introduce the FaST-Scheduler with heuristic auto-scaling and efficient resource allocation to guarantee function SLOs. Meanwhile, FaST-Scheduler schedules function with efficient GPU node selection to maximize GPU usage. Furthermore, model sharing is exploited to mitigate memory contention. Our prototype implementation on the OpenFaaS platform and experiments on MLPerf-based benchmark prove that FaST-GShare can ensure resource isolation and function SLOs. Compared to the time sharing mechanism, FaST-GShare can improve throughput by 3.15x, GPU utilization by 1.34x, and SM (Streaming Multiprocessor) occupancy by 3.13x on average.Comment: The paper has been accepted by ACM ICPP 202

    Robust object-based algorithms for direct shadow simulation

    Get PDF
    En informatique graphique, les algorithmes de générations d'ombres évaluent la quantité de lumière directement perçue par une environnement virtuel. Calculer précisément des ombres est cependant coûteux en temps de calcul. Dans cette dissertation, nous présentons un nouveau système basé objet robuste, qui permet de calculer des ombres réalistes sur des scènes dynamiques et ce en temps interactif. Nos contributions incluent notamment le développement de nouveaux algorithmes de génération d'ombres douces ainsi que leur mise en oeuvre efficace sur processeur graphique. Nous commençons par formaliser la problématique du calcul d'ombres directes. Tout d'abord, nous définissons ce que sont les ombres directes dans le contexte général du transport de la lumière. Nous étudions ensuite les techniques interactives qui génèrent des ombres directes. Suite à cette étude nous montrons que mêmes les algorithmes dit physiquement réalistes se reposent sur des approximations. Nous mettons également en avant, que malgré leur contraintes géométriques, les algorithmes d'ombres basées objet sont un bon point de départ pour résoudre notre problématique de génération efficace et robuste d'ombres directes. Basé sur cette observation, nous étudions alors le système basé objet existant et mettons en avant ses problèmes de robustesse. Nous proposons une nouvelle technique qui améliore la qualité des ombres générées par ce système en lui ajoutant une étape de mélange de pénombres. Malgré des propriétés et des résultats convaincants, les limitations théoriques et de mise en oeuvre limite la qualité générale et les performances de cet algorithme. Nous présentons ensuite un nouvel algorithme d'ombres basées objet. Cet algorithme combine l'efficacité de l'approche basée objet temps réel avec la précision de sa généralisation au rendu hors ligne. Notre algorithme repose sur l'évaluation locale du nombre d'objets entre deux points : la complexité de profondeur. Nous décrivons comment nous utilisons cet algorithme pour échantillonner la complexité de profondeur entre les surfaces visibles d'une scène et une source lumineuse. Nous générons ensuite des ombres à partir de cette information soit en modulant l'éclairage direct soit en intégrant numériquement l'équation d'illumination directe. Nous proposons ensuite une extension de notre algorithme afin qu'il puisse prendre en compte les ombres projetées par des objets semi-opaque. Finalement, nous présentons une mise en oeuvre efficace de notre système qui démontre que des ombres basées objet peuvent être générées de façon efficace et ce même sur une scène dynamique. En rendu temps réel, il est commun de représenter des objets très détaillés encombinant peu de triangles avec des textures qui représentent l'opacité binaire de l'objet. Les techniques de génération d'ombres basées objet ne traitent pas de tels triangles dit "perforés". De par leur nature, elles manipulent uniquement les géométries explicitement représentées par des primitives géométriques. Nous présentons une nouvel algorithme basé objet qui lève cette limitation. Nous soulignons que notre méthode peut être efficacement combinée avec les systèmes existants afin de proposer un système unifié basé objet qui génère des ombres à la fois pour des maillages classiques et des géométries perforées. La mise en oeuvre proposée montre finalement qu'une telle combinaison fournit une solution élégante, efficace et robuste à la problématique générale de l'éclairage direct et ce aussi bien pour des applications temps réel que des applications sensibles à la la précision du résultat.Direct shadow algorithms generate shadows by simulating the direct lighting interaction in a virtual environment. The main challenge with the accurate direct shadow problematic is its computational cost. In this dissertation, we develop a new robust object-based shadow framework that provides realistic shadows at interactive frame rate on dynamic scenes. Our contributions include new robust object-based soft shadow algorithms and efficient interactive implementations. We start, by formalizing the direct shadow problematic. Following the light transport problematic, we first formalize what are robust direct shadows. We then study existing interactive direct shadow techniques and outline that the real time direct shadow simulation remains an open problem. We show that even the so called physically plausible soft shadow algorithms still rely on approximations. Nevertheless we exhibit that, despite their geometric constraints, object-based approaches seems well suited when targeting accurate solutions. Starting from the previous analyze, we investigate the existing object-based shadow framework and discuss about its robustness issues. We propose a new technique that drastically improve the resulting shadow quality by improving this framework with a penumbra blending stage. We present a practical implementation of this approach. From the obtained results, we outline that, despite desirable properties, the inherent theoretical and implementation limitations reduce the overall quality and performances of the proposed algorithm. We then present a new object-based soft shadow algorithm. It merges the efficiency of the real time object-based shadows with the accuracy of its offline generalization. The proposed algorithm lies onto a new local evaluation of the number of occluders between twotwo points (\ie{} the depth complexity). We describe how we use this algorithm to sample the depth complexity between any visible receiver and the light source. From this information, we compute shadows by either modulate the direct lighting or numerically solve the direct illumination with an accuracy depending on the light sampling strategy. We then propose an extension of our algorithm in order to handle shadows cast by semi opaque occluders. We finally present an efficient implementation of this framework that demonstrates that object-based shadows can be efficiently used on complex dynamic environments. In real time rendering, it is common to represent highly detailed objects with few triangles and transmittance textures that encode their binary opacity. Object-based techniques do not handle such perforated triangles. Due to their nature, they can only evaluate the shadows cast by models whose their shape is explicitly defined by geometric primitives. We describe a new robust object-based algorithm that addresses this main limitation. We outline that this method can be efficiently combine with object-based frameworks in order to evaluate approximative shadows or simulate the direct illumination for both common meshes and perforated triangles. The proposed implementation shows that such combination provides a very strong and efficient direct lighting framework, well suited to many domains ranging from quality sensitive to performance critical applications

    Hydrodynamics of Suspensions of Passive and Active Rigid Particles: A Rigid Multiblob Approach

    Get PDF
    We develop a rigid multiblob method for numerically solving the mobility problem for suspensions of passive and active rigid particles of complex shape in Stokes flow in unconfined, partially confined, and fully confined geometries. As in a number of existing methods, we discretize rigid bodies using a collection of minimally-resolved spherical blobs constrained to move as a rigid body, to arrive at a potentially large linear system of equations for the unknown Lagrange multipliers and rigid-body motions. Here we develop a block-diagonal preconditioner for this linear system and show that a standard Krylov solver converges in a modest number of iterations that is essentially independent of the number of particles. For unbounded suspensions and suspensions sedimented against a single no-slip boundary, we rely on existing analytical expressions for the Rotne-Prager tensor combined with a fast multipole method or a direct summation on a Graphical Processing Unit to obtain an simple yet efficient and scalable implementation. For fully confined domains, such as periodic suspensions or suspensions confined in slit and square channels, we extend a recently-developed rigid-body immersed boundary method to suspensions of freely-moving passive or active rigid particles at zero Reynolds number. We demonstrate that the iterative solver for the coupled fluid and rigid body equations converges in a bounded number of iterations regardless of the system size. We optimize a number of parameters in the iterative solvers and apply our method to a variety of benchmark problems to carefully assess the accuracy of the rigid multiblob approach as a function of the resolution. We also model the dynamics of colloidal particles studied in recent experiments, such as passive boomerangs in a slit channel, as well as a pair of non-Brownian active nanorods sedimented against a wall.Comment: Under revision in CAMCOS, Nov 201

    SaLoBa: Maximizing Data Locality and Workload Balance for Fast Sequence Alignment on GPUs

    Full text link
    Sequence alignment forms an important backbone in many sequencing applications. A commonly used strategy for sequence alignment is an approximate string matching with a two-dimensional dynamic programming approach. Although some prior work has been conducted on GPU acceleration of a sequence alignment, we identify several shortcomings that limit exploiting the full computational capability of modern GPUs. This paper presents SaLoBa, a GPU-accelerated sequence alignment library focused on seed extension. Based on the analysis of previous work with real-world sequencing data, we propose techniques to exploit the data locality and improve workload balancing. The experimental results reveal that SaLoBa significantly improves the seed extension kernel compared to state-of-the-art GPU-based methods.Comment: Published at IPDPS'2
    corecore