A Massively Parallel 2D Rectangle Placement Method
Layout design is a frequently occurring process that often combines human and computer reasoning. Because of the combinatorial nature of the problem, solving even a small-size input involves searching a prohibitively large state space. An algorithm, PEMS (Pseudo-exhaustive Edge Minimizing Search), is proposed for approximating a 2D rectangle packing variant of the problem. The proposed method is inspired by MERA (Minimum Enclosing of Rectangle Area) [1] and MEGA (Minimum Enclosing Under Gravitational Attraction) [2], yet produces higher quality solutions in terms of final space utilization. To address the performance cost, a CUDA-based acceleration algorithm is developed with significant speedup.
Melting of hexagonal skyrmion states in chiral magnets
Skyrmions are spiral structures observed in thin films of certain magnetic materials (Uchida et al 2006 Science 311 359–61). Of the phases allowed by the crystalline symmetries of these materials (Yi et al 2009 Phys. Rev. B 80 054416), only the hexagonally packed phases (SCh) have been observed. Here the melting of the SCh phase is investigated using Monte Carlo simulations. In addition to the usual measure of skyrmion density, chiral charge, a morphological measure is considered. In doing so it is shown that the low-temperature reduction in chiral charge is associated with a change in skyrmion profiles rather than skyrmion destruction. At higher temperatures, the loss of six-fold symmetry is associated with the appearance of elongated skyrmions that disrupt the hexagonal packing.
FaST-GShare: Enabling Efficient Spatio-Temporal GPU Sharing in Serverless Computing for Deep Learning Inference
Serverless computing (FaaS) has been extensively utilized for deep learning
(DL) inference due to the ease of deployment and pay-per-use benefits. However,
existing FaaS platforms utilize GPUs in a coarse manner for DL inferences,
without taking into account spatio-temporal resource multiplexing and
isolation, which results in severe GPU under-utilization, high usage expenses,
and SLO (Service Level Objectives) violation. There is an imperative need to
enable an efficient and SLO-aware GPU-sharing mechanism in serverless computing
to facilitate cost-effective DL inferences. In this paper, we propose
FaST-GShare, an efficient FaaS-oriented
Spatio-Temporal GPU Sharing architecture
for deep learning inferences. In the architecture, we introduce the
FaST-Manager to limit and isolate spatio-temporal resources for GPU
multiplexing. To characterize function performance, the automatic and
flexible FaST-Profiler is proposed to profile function throughput under various
resource allocations. Based on the profiling data and the isolation mechanism,
we introduce the FaST-Scheduler with heuristic auto-scaling and efficient
resource allocation to guarantee function SLOs. Meanwhile, FaST-Scheduler
schedules functions with efficient GPU node selection to maximize GPU usage.
Furthermore, model sharing is exploited to mitigate memory contention. Our
prototype implementation on the OpenFaaS platform and experiments on an
MLPerf-based benchmark show that FaST-GShare can ensure resource isolation and
function SLOs. Compared to the time-sharing mechanism, FaST-GShare can improve
throughput by 3.15x, GPU utilization by 1.34x, and SM (Streaming
Multiprocessor) occupancy by 3.13x on average.
Comment: The paper has been accepted by ACM ICPP 202
Robust object-based algorithms for direct shadow simulation
Direct shadow algorithms generate shadows by simulating direct lighting interactions in a virtual environment. The main challenge in computing accurate direct shadows is computational cost. In this dissertation, we develop a new robust object-based shadow framework that provides realistic shadows at interactive frame rates on dynamic scenes. Our contributions include new robust object-based soft shadow algorithms and their efficient interactive implementation on graphics processors. We start by formalizing the direct shadow problem. Within the general context of light transport, we first define what robust direct shadows are. We then study existing interactive direct shadow techniques and show that real-time direct shadow simulation remains an open problem: even the so-called physically plausible soft shadow algorithms still rely on approximations. Nevertheless, we show that, despite their geometric constraints, object-based approaches are well suited to accurate solutions. Building on this analysis, we investigate the existing object-based shadow framework and discuss its robustness issues. We propose a new technique that drastically improves the resulting shadow quality by adding a penumbra blending stage to this framework, and we present a practical implementation of this approach. The results show that, despite desirable properties, inherent theoretical and implementation limitations reduce the overall quality and performance of the proposed algorithm. We then present a new object-based soft shadow algorithm. It combines the efficiency of real-time object-based shadows with the accuracy of their offline generalization. The proposed algorithm relies on a new local evaluation of the number of occluders between two points (i.e., the depth complexity). We describe how we use this algorithm to sample the depth complexity between any visible receiver and the light source. From this information, we compute shadows either by modulating the direct lighting or by numerically solving the direct illumination equation, with an accuracy that depends on the light sampling strategy. We then propose an extension of our algorithm to handle shadows cast by semi-opaque occluders. Finally, we present an efficient implementation of this framework, which demonstrates that object-based shadows can be used efficiently on complex dynamic environments. In real-time rendering, it is common to represent highly detailed objects with few triangles and transmittance textures that encode their binary opacity. Object-based techniques do not handle such perforated triangles. By nature, they can only evaluate the shadows cast by models whose shape is explicitly defined by geometric primitives. We describe a new robust object-based algorithm that removes this limitation. We show that this method can be efficiently combined with existing object-based frameworks to evaluate approximate shadows or simulate direct illumination for both common meshes and perforated triangles. The proposed implementation shows that such a combination provides a robust and efficient direct lighting framework, well suited to domains ranging from quality-sensitive to performance-critical applications.
Hydrodynamics of Suspensions of Passive and Active Rigid Particles: A Rigid Multiblob Approach
We develop a rigid multiblob method for numerically solving the mobility
problem for suspensions of passive and active rigid particles of complex shape
in Stokes flow in unconfined, partially confined, and fully confined
geometries. As in a number of existing methods, we discretize rigid bodies
using a collection of minimally-resolved spherical blobs constrained to move as
a rigid body, to arrive at a potentially large linear system of equations for
the unknown Lagrange multipliers and rigid-body motions. Here we develop a
block-diagonal preconditioner for this linear system and show that a standard
Krylov solver converges in a modest number of iterations that is essentially
independent of the number of particles. For unbounded suspensions and
suspensions sedimented against a single no-slip boundary, we rely on existing
analytical expressions for the Rotne-Prager tensor combined with a fast
multipole method or a direct summation on a Graphics Processing Unit to obtain
a simple yet efficient and scalable implementation. For fully confined
domains, such as periodic suspensions or suspensions confined in slit and
square channels, we extend a recently-developed rigid-body immersed boundary
method to suspensions of freely-moving passive or active rigid particles at
zero Reynolds number. We demonstrate that the iterative solver for the coupled
fluid and rigid body equations converges in a bounded number of iterations
regardless of the system size. We optimize a number of parameters in the
iterative solvers and apply our method to a variety of benchmark problems to
carefully assess the accuracy of the rigid multiblob approach as a function of
the resolution. We also model the dynamics of colloidal particles studied in
recent experiments, such as passive boomerangs in a slit channel, as well as a
pair of non-Brownian active nanorods sedimented against a wall.
Comment: Under revision in CAMCOS, Nov 201
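The direct-summation step mentioned in this abstract for unbounded suspensions can be illustrated with a minimal CPU sketch. It assumes equal-radius blobs and the standard Rotne-Prager-Yamakawa form of the pairwise mobility, including the usual overlap correction for separations below two radii. The function names `rpy_block` and `blob_velocities`, and the default values of `a` and `eta`, are hypothetical and not taken from the paper, whose implementation runs on a GPU.

```python
import math

def rpy_block(r_vec, a, eta):
    """3x3 Rotne-Prager-Yamakawa mobility block for two blobs of radius a.

    r_vec is the separation vector; eta is the fluid viscosity.
    """
    r = math.sqrt(sum(c * c for c in r_vec))
    if r == 0.0:
        # Self-mobility: Stokes drag on an isolated sphere.
        c_id = 1.0 / (6.0 * math.pi * eta * a)
        c_rr = 0.0
        r_hat = (0.0, 0.0, 0.0)
    else:
        r_hat = tuple(c / r for c in r_vec)
        if r >= 2.0 * a:
            # Non-overlapping blobs: classical Rotne-Prager far-field form.
            pre = 1.0 / (8.0 * math.pi * eta * r)
            c_id = pre * (1.0 + 2.0 * a * a / (3.0 * r * r))
            c_rr = pre * (1.0 - 2.0 * a * a / (r * r))
        else:
            # Overlapping blobs: regularized form, continuous at r = 2a.
            pre = 1.0 / (6.0 * math.pi * eta * a)
            c_id = pre * (1.0 - 9.0 * r / (32.0 * a))
            c_rr = pre * (3.0 * r / (32.0 * a))
    return [[c_id * (i == j) + c_rr * r_hat[i] * r_hat[j] for j in range(3)]
            for i in range(3)]

def blob_velocities(positions, forces, a=1.0, eta=1.0):
    """Direct O(N^2) summation of velocities = mobility * forces."""
    n = len(positions)
    vel = [[0.0, 0.0, 0.0] for _ in range(n)]
    for i in range(n):
        for j in range(n):
            r_vec = tuple(positions[i][k] - positions[j][k] for k in range(3))
            m = rpy_block(r_vec, a, eta)
            for p in range(3):
                vel[i][p] += sum(m[p][q] * forces[j][q] for q in range(3))
    return vel
```

Each pair interaction is independent of the others, which is why this double loop maps naturally onto one GPU thread per blob or per pair.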
SaLoBa: Maximizing Data Locality and Workload Balance for Fast Sequence Alignment on GPUs
Sequence alignment forms an important backbone in many sequencing
applications. A commonly used strategy for sequence alignment is approximate
string matching with a two-dimensional dynamic programming approach. Although
some prior work has been conducted on GPU acceleration of sequence alignment,
we identify several shortcomings that limit exploiting the full computational
capability of modern GPUs. This paper presents SaLoBa, a GPU-accelerated
sequence alignment library focused on seed extension. Based on the analysis of
previous work with real-world sequencing data, we propose techniques to exploit
the data locality and improve workload balancing. The experimental results
reveal that SaLoBa significantly improves the seed extension kernel compared to
state-of-the-art GPU-based methods.
Comment: Published at IPDPS'2
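The two-dimensional dynamic programming approach mentioned in this abstract can be illustrated with a minimal Smith-Waterman local alignment score. This is a generic sketch, not SaLoBa's kernel: the function name `smith_waterman_score`, the linear gap penalty, and the scoring values are assumptions chosen for brevity.

```python
def smith_waterman_score(query, target, match=2, mismatch=-1, gap=-1):
    """Best local alignment score between two strings (linear gap penalty).

    Only two rows of the DP table are kept, since each cell depends solely
    on its left, upper, and upper-left neighbours.
    """
    prev = [0] * (len(target) + 1)
    best = 0
    for i in range(1, len(query) + 1):
        curr = [0] * (len(target) + 1)
        for j in range(1, len(target) + 1):
            sub = match if query[i - 1] == target[j - 1] else mismatch
            curr[j] = max(
                0,                  # start a new local alignment
                prev[j - 1] + sub,  # match or mismatch
                prev[j] + gap,      # gap in target
                curr[j - 1] + gap,  # gap in query
            )
            best = max(best, curr[j])
        prev = curr
    return best
```

Cells on the same anti-diagonal do not depend on one another, which is the property GPU implementations typically exploit. Uneven sequence lengths give the threads unequal amounts of work, hence the workload-balancing concern raised in the abstract.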