Approximate Computing Survey, Part I: Terminology and Software & Hardware Approximation Techniques
The rapid growth of demanding applications in domains applying multimedia
processing and machine learning has marked a new era for edge and cloud
computing. These applications involve massive data and compute-intensive tasks,
and thus, typical computing paradigms in embedded systems and data centers are
stressed to meet the worldwide demand for high performance. Concurrently, over
the last 15 years the semiconductor field has established power as a
first-class design concern. As a result, the computing systems community is
forced to find alternative design approaches that facilitate
high-performance and/or power-efficient computing. Among the examined
solutions, Approximate Computing has attracted an ever-increasing interest,
with research works applying approximations across the entire traditional
computing stack, i.e., at the software, hardware, and architectural levels. Over
the last decade, a plethora of approximation techniques has emerged in software
(programs, frameworks, compilers, runtimes, languages), hardware (circuits,
accelerators), and architectures (processors, memories). The current article is
Part I of our comprehensive survey on Approximate Computing: it reviews the
motivation, terminology, and principles of the field, and it classifies and
presents the technical details of state-of-the-art software and hardware
approximation techniques.
Comment: Under review at ACM Computing Surveys
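To make one of the surveyed software-level techniques concrete, here is a minimal, self-contained sketch of loop perforation, in which a configurable fraction of loop iterations is skipped to trade accuracy for execution time; the function names and the skip factor are illustrative assumptions, not code from the survey.

    # Minimal sketch of loop perforation: visit only every skip_factor-th element,
    # trading result accuracy for execution time. Names and rates are illustrative.
    def mean_exact(values):
        """Exact mean over all elements."""
        return sum(values) / len(values)

    def mean_perforated(values, skip_factor=4):
        """Approximate mean that visits only every skip_factor-th element."""
        sampled = values[::skip_factor]  # perforated loop: 1/skip_factor of the work
        return sum(sampled) / len(sampled)

    if __name__ == "__main__":
        data = [float(i % 97) for i in range(1_000_000)]
        print("exact:     ", mean_exact(data))
        print("perforated:", mean_perforated(data))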
It's about Time: Analytical Time Periodization
This paper presents a novel approach to the problem of time periodization, which involves dividing the time span of a complex dynamic phenomenon into periods that enclose different relatively stable states or development trends. The challenge lies in finding a division of time that takes into account the diverse behaviours of the multiple components of the phenomenon while being simple and easy to interpret. Despite the importance of this problem, it has not received sufficient attention in the fields of visual analytics and data science. We use a real-world example from aviation and an additional usage scenario on analysing mobility trends during the COVID-19 pandemic to develop and test an analytical workflow that combines computational and interactive visual techniques. We highlight the differences between the two cases and show how they affect the use of different techniques. Through our investigation of possible variations in the time periodization problem, we discuss the potential of our approach to be used in various applications. Our contributions include defining and investigating a previously neglected problem type, developing a practical and reproducible approach to solving problems of this type, and uncovering potential for formalization and development of computational methods.
NetClone: Fast, Scalable, and Dynamic Request Cloning for Microsecond-Scale RPCs
Spawning duplicate requests, called cloning, is a powerful technique to
reduce tail latency by masking service-time variability. However, traditional
client-based cloning is static and harmful to performance under high load,
while a recent coordinator-based approach is slow and not scalable. Both
approaches are insufficient to serve modern microsecond-scale Remote Procedure
Calls (RPCs). To this end, we present NetClone, a request cloning system that
makes cloning decisions dynamically within nanoseconds at scale. Rather than
the client or the coordinator, NetClone performs request cloning in the network
switch by leveraging the capability of programmable switch ASICs. Specifically,
NetClone replicates requests based on server states and blocks redundant
responses using request fingerprints in the switch data plane. To realize the
idea while satisfying the strict hardware constraints, we address several
technical challenges when designing a custom switch data plane. NetClone can be
integrated with emerging in-network request schedulers like RackSched. We
implement a NetClone prototype with an Intel Tofino switch and a cluster of
commodity servers. Our experimental results show that NetClone can improve the
tail latency of microsecond-scale RPCs for synthetic and real-world application
workloads and is robust to various system conditions.
Comment: 13 pages, ACM SIGCOMM 2023
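As a rough illustration of the cloning idea described above (not NetClone's actual switch data-plane program, which targets a programmable ASIC), the following Python sketch clones a request to a backup server only when the primary appears loaded and suppresses redundant responses by request fingerprint; the load estimates and the threshold are assumptions.

    # Illustrative sketch of dynamic request cloning with response deduplication.
    # NetClone performs this logic in the switch data plane; here it is modelled
    # in Python with hypothetical per-server load state and a hypothetical threshold.
    LOAD_THRESHOLD = 0.7                      # assumed load level above which we clone
    server_load = {"s1": 0.9, "s2": 0.2}      # hypothetical per-server load estimates
    seen_fingerprints = set()                 # fingerprints of requests already answered

    def dispatch(request_id, payload, primary, backup):
        """Send to the primary; clone to the backup only if the primary looks loaded."""
        targets = [primary]
        if server_load[primary] > LOAD_THRESHOLD:
            targets.append(backup)            # dynamic cloning decision
        return [(t, request_id, payload) for t in targets]

    def accept_response(request_id, response):
        """Forward only the first response per request; drop redundant clones."""
        if request_id in seen_fingerprints:
            return None                       # redundant response blocked
        seen_fingerprints.add(request_id)
        return response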
RPDP: An Efficient Data Placement based on Residual Performance for P2P Storage Systems
Storage systems using Peer-to-Peer (P2P) architecture are an alternative to
traditional client-server systems. They offer better scalability and fault
tolerance while at the same time eliminating the single point of failure. The
nature of P2P storage systems (which consist of heterogeneous nodes), however,
introduces data placement challenges that create implementation trade-offs
(e.g., between performance and scalability). The existing Kademlia-based DHT
data placement method stores data at the closest node, where the distance is
measured by a bit-wise XOR operation between the data key and a node ID. This
approach is highly scalable because it requires no global knowledge for placing
or retrieving data. It does not, however, consider the heterogeneous
performance of the nodes, which can result in imbalanced resource usage that
affects the overall latency of the system. Other works implement
criteria-based selection that addresses the heterogeneity of nodes, but they
often cause subsequent data retrieval to require global knowledge of where the
data is stored. This paper
introduces Residual Performance-based Data Placement (RPDP), a novel data
placement method based on dynamic temporal residual performance of data nodes.
RPDP places data on the most appropriate nodes based on their throughput and
latency, with the aim of achieving lower overall latency by balancing data
distribution with respect to the individual performance of nodes. RPDP relies
on a Kademlia-based DHT with a modified data structure that allows data to be
retrieved subsequently without the need for global knowledge. The experimental
results indicate that RPDP reduces the overall latency of the baseline
Kademlia-based P2P storage system (by 4.87%) and also reduces the variance of
latency among the nodes, with minimal impact on data retrieval complexity.
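For context, the Kademlia-style XOR placement that RPDP builds on can be sketched as follows; the node table, the SHA-1 key, and the blended residual-performance score are illustrative assumptions rather than RPDP's actual algorithm.

    # Sketch of Kademlia-style placement: store a data item on the node whose ID
    # has the smallest XOR distance to the item's key. The residual-performance
    # re-ranking is shown only as a hypothetical weighted score.
    import hashlib

    def key160(data: bytes) -> int:
        """160-bit key of a data item (SHA-1, as in Kademlia)."""
        return int.from_bytes(hashlib.sha1(data).digest(), "big")

    def xor_closest(node_ids, key):
        """Plain Kademlia placement: the node with minimal XOR distance to the key."""
        return min(node_ids, key=lambda n: n ^ key)

    def residual_performance_pick(nodes, key, alpha=0.5):
        """Hypothetical RPDP-like pick: blend XOR closeness with a performance score
        (higher throughput and lower latency are better)."""
        def score(node_id):
            perf = nodes[node_id]["throughput"] / (1.0 + nodes[node_id]["latency"])
            return alpha * (node_id ^ key).bit_length() - (1 - alpha) * perf
        return min(nodes, key=score)

    nodes = {0x1A: {"throughput": 80.0, "latency": 3.0},
             0x2B: {"throughput": 20.0, "latency": 9.0}}
    k = key160(b"example-object")
    print(xor_closest(list(nodes), k), residual_performance_pick(nodes, k))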
Variability-aware Neo4j for Analyzing a Graphical Model of a Software Product Line
A software product line (SPL) eases the development of families of related products by managing and integrating a collection of mandatory and optional features (units of functionality). Individual products can be derived from the product line by selecting among the optional features. Companies that successfully employ SPLs report dramatic improvements in rapid product development, software quality, labour needs, support for mass customization, and time to market.
In a product line of reasonable size, it is impractical to verify every product because the number of possible feature combinations is exponential in the number of features. As a result, developers might verify a small fraction of products and limit the choices offered to consumers, thereby foregoing one of the greatest promises of product lines — mass customization.
To improve the efficiency of analyzing SPLs, (1) we analyze a model of an SPL rather than its code and (2) we analyze the SPL model itself rather than models of its products. We extract a model comprising facts (e.g., functions, variables, assignments) from an SPL’s source-code artifacts. The facts from different software components are linked together into a lightweight model of the code, called a factbase. The resulting factbase is a typed graphical model that can be analyzed using the Neo4j graph database.
In this thesis, we lift the Neo4j query engine to reason over a factbase of an entire SPL. By lifting the Neo4j query engine, we enable any analysis that can be expressed in the query language to be applicable to an SPL model. The lifted analyses return variability-aware results, in which each result is annotated with a feature expression denoting the products to which the result applies.
We evaluated lifted Neo4j on five real-world open-source SPLs, with respect to ten commonly used analyses of interest. The first evaluation compares the performance of a post-processing approach versus an on-the-fly approach for computing the feature expressions that annotate the variability-aware results of lifted Neo4j. In general, the on-the-fly approach has a smaller runtime than the post-processing approach. The second evaluation assesses the overhead of analyzing a model of an SPL versus a model of a single product, which ranges from 1.88% to 456%. In the third evaluation, we compare the outputs and performance of lifted Neo4j to a related work that employs the variability-aware V-Soufflé Datalog engine. We found that lifted Neo4j is usually more efficient than V-Soufflé when returning the same results (i.e., the end points of path results). When lifted Neo4j returns complete path results, it is generally slower than V-Soufflé, although lifted Neo4j can outperform V-Soufflé on analyses that return short fixed-length paths.
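As an illustration of querying such a factbase, the sketch below issues a Cypher query through the standard neo4j Python driver and reads a feature expression from each result; the node label, relationship type, property name, and connection details are hypothetical, not the thesis's schema or its lifted query engine.

    # Sketch of querying a code factbase stored in Neo4j. The label (:Function),
    # relationship type [:CALLS], and 'presence_condition' property are hypothetical
    # stand-ins for an SPL factbase schema.
    from neo4j import GraphDatabase

    QUERY = """
    MATCH (caller:Function)-[c:CALLS]->(callee:Function {name: $target})
    RETURN caller.name AS caller, c.presence_condition AS feature_expr
    """

    def callers_of(uri, user, password, target):
        driver = GraphDatabase.driver(uri, auth=(user, password))
        with driver.session() as session:
            for record in session.run(QUERY, target=target):
                # Each result carries a feature expression (the products it applies to).
                print(record["caller"], "applies under", record["feature_expr"])
        driver.close()

    if __name__ == "__main__":
        callers_of("bolt://localhost:7687", "neo4j", "password", "malloc")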
DataComp: In search of the next generation of multimodal datasets
Multimodal datasets are a critical component in recent breakthroughs such as
Stable Diffusion and GPT-4, yet their design does not receive the same research
attention as model architectures or training algorithms. To address this
shortcoming in the ML ecosystem, we introduce DataComp, a testbed for dataset
experiments centered around a new candidate pool of 12.8 billion image-text
pairs from Common Crawl. Participants in our benchmark design new filtering
techniques or curate new data sources and then evaluate their new dataset by
running our standardized CLIP training code and testing the resulting model on
38 downstream test sets. Our benchmark consists of multiple compute scales
spanning four orders of magnitude, which enables the study of scaling trends
and makes the benchmark accessible to researchers with varying resources. Our
baseline experiments show that the DataComp workflow leads to better training
sets. In particular, our best baseline, DataComp-1B, enables training a CLIP
ViT-L/14 from scratch to 79.2% zero-shot accuracy on ImageNet, outperforming
OpenAI's CLIP ViT-L/14 by 3.7 percentage points while using the same training
procedure and compute. We release DataComp and all accompanying code at
www.datacomp.ai.
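A minimal sketch of the kind of filtering baseline DataComp participants might evaluate, assuming image and text embeddings have already been computed: keep only the pairs whose embedding cosine similarity exceeds a threshold. The threshold and data layout are illustrative assumptions, not DataComp's released baseline code.

    # Sketch of a CLIP-score filtering baseline: keep image-text pairs whose
    # embedding cosine similarity exceeds a threshold. Embeddings are assumed to
    # be precomputed; the threshold value is illustrative.
    import numpy as np

    def cosine(a, b):
        return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

    def clip_score_filter(pairs, threshold=0.28):
        """pairs: list of (sample_id, image_embedding, text_embedding)."""
        return [sid for sid, img, txt in pairs if cosine(img, txt) >= threshold]

    rng = np.random.default_rng(0)
    pairs = [(i, rng.normal(size=512), rng.normal(size=512)) for i in range(5)]
    print(clip_score_filter(pairs, threshold=0.0))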
Intelligent computing : the latest advances, challenges and future
Computing is a critical driving force in the development of human civilization. In recent years, we have witnessed the emergence of intelligent computing, a new computing paradigm that is reshaping traditional computing and promoting the digital revolution in the era of big data, artificial intelligence, and the Internet of Things, with new computing theories, architectures, methods, systems, and applications. Intelligent computing has greatly broadened the scope of computing, extending it from traditional computing on data to increasingly diverse computing paradigms such as perceptual intelligence, cognitive intelligence, autonomous intelligence, and human-computer fusion intelligence. Intelligence and computing have long followed different paths of evolution and development but have become increasingly intertwined in recent years: intelligent computing is not only intelligence-oriented but also intelligence-driven. Such cross-fertilization has prompted the emergence and rapid advancement of intelligent computing.
Open Heterogeneous Quorum Systems
In contrast to proof-of-work replication, Byzantine replicated systems
maintain consistency with higher throughput, modest energy consumption, and
deterministic liveness guarantees. If complemented with open membership and
heterogeneous trust, they have the potential to serve as a global financial
infrastructure. This paper presents a general model of heterogeneous quorum
systems, where each participant can declare its own quorums, and captures the
consistency, availability, and inclusion properties of these systems. In order
to support open membership, it then presents reconfiguration protocols for
heterogeneous quorum systems: the joining and leaving of a process, and the
adding and removing of a quorum. It presents trade-offs among the properties
that reconfigurations can preserve and, accordingly, presents the corresponding
protocols and proves their correctness. It further presents a graph
characterization of heterogeneous quorum systems and its application to
reconfiguration optimization.
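As a toy illustration of the consistency property in a quorum system where each participant declares its own quorums, the sketch below checks that every pair of declared quorums intersects; the representation as Python sets is an illustrative assumption, not the paper's formal model.

    # Sketch: each process declares its own quorums (sets of process IDs).
    # Pairwise quorum intersection is a basic consistency requirement; the layout
    # below is illustrative only.
    from itertools import combinations

    declared_quorums = {
        "p1": [{"p1", "p2", "p3"}, {"p1", "p4"}],
        "p2": [{"p2", "p3", "p4"}],
        "p3": [{"p1", "p3", "p4"}],
    }

    def all_quorums(system):
        return [q for qs in system.values() for q in qs]

    def quorums_intersect(system):
        """True iff every pair of declared quorums shares at least one process."""
        return all(q1 & q2 for q1, q2 in combinations(all_quorums(system), 2))

    print(quorums_intersect(declared_quorums))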
Efficient finite element methods for solving high-frequency time-harmonic acoustic wave problems in heterogeneous media
This thesis focuses on the efficient numerical solution of frequency-domain wave propagation problems using finite element methods. In the first part of the manuscript, the development of domain decomposition methods is addressed, with the aim of overcoming the limitations of state-of-the-art direct and iterative solvers. To this end, a non-overlapping substructured domain decomposition method with high-order absorbing boundary conditions used as transmission conditions (HABC DDM) is first extended to deal with cross-points, where more than two subdomains meet. The handling of cross-points is a well-known issue for non-overlapping HABC DDMs. Our methodology proposes an efficient solution for lattice-type domain partitions, where the domains meet at right angles. The method is based on the introduction of suitable relations and additional transmission variables at the cross-points, and its effectiveness is demonstrated on several test cases. A similar non-overlapping substructured DDM is then proposed with Perfectly Matched Layers used as transmission conditions instead of HABCs (PML DDM). The proposed approach naturally handles cross-points in two-dimensional checkerboard domain partitions through Lagrange multipliers used for the weak coupling between subproblems defined on rectangular subdomains and the surrounding PMLs. Two discretizations for the Lagrange multipliers and several stabilization strategies are proposed and compared. The performance of the HABC and PML DDMs is then compared on test cases of increasing complexity, from two-dimensional wave scattering in homogeneous media to three-dimensional wave propagation in highly heterogeneous media. While the theoretical developments are carried out for the scalar Helmholtz equation for acoustic wave propagation, the extension to elastic wave problems is also considered, highlighting the potential for further generalizations to other physical contexts. The second part of the manuscript is devoted to the presentation of the computational tools developed during the thesis, which were used to produce all the numerical results: GmshFEM, a new C++ finite element library based on the application programming interface of the open-source finite element mesh generator Gmsh; and GmshDDM, a distributed domain decomposition library based on GmshFEM.
FBCHS: Fuzzy Based Cluster Head Selection Protocol to Enhance Network Lifetime of WSN
With the enormous evolution of microelectronics, Wireless Sensor Networks (WSNs) have come to play a vital role in every aspect of daily life. Technological advancement has led to new ways of thinking and of developing infrastructure for sensing, monitoring, and computational tasks. A sensor network comprises multiple sensor nodes for the monitoring, tracking, and surveillance of remote objects in the network area. Battery replacement and recharging are almost impossible; therefore, the aim is to develop an efficient routing protocol for the sensor network. The Fuzzy Based Cluster Head Selection (FBCHS) protocol is proposed, which partitions the network into several regions based on node energy levels. The proposed protocol uses an artificial intelligence technique to select the Cluster Head (CH) based on maximum node Residual Energy (RE) and minimum distance. The transmission of data to the Base Station (BS) is accomplished via static clustering and a hybrid routing technique. The simulation results of the FBCHS protocol are compared to those of the SEP protocol and show an improvement in the stability period and in the overall performance of the network.
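To illustrate the selection criterion, the sketch below scores candidate nodes with a simple fuzzy-style combination of residual energy and distance to the base station; the membership functions, weights, and example network are illustrative assumptions, not the FBCHS protocol's actual rules.

    # Sketch of a fuzzy-style cluster-head choice: prefer nodes with high residual
    # energy and a small distance to the base station. Membership functions and the
    # example network are illustrative assumptions.
    import math

    def membership_high_energy(residual, capacity):
        return max(0.0, min(1.0, residual / capacity))

    def membership_close(distance, max_distance):
        return max(0.0, min(1.0, 1.0 - distance / max_distance))

    def select_cluster_head(nodes, base_station=(0.0, 0.0), capacity=2.0, max_distance=100.0):
        """nodes: {node_id: (x, y, residual_energy)}. Returns the best-scoring node."""
        def score(nid):
            x, y, energy = nodes[nid]
            d = math.dist((x, y), base_station)
            # Fuzzy AND modelled as the minimum of the two membership values.
            return min(membership_high_energy(energy, capacity),
                       membership_close(d, max_distance))
        return max(nodes, key=score)

    nodes = {"n1": (10.0, 5.0, 1.8), "n2": (60.0, 40.0, 1.9), "n3": (15.0, 10.0, 0.4)}
    print(select_cluster_head(nodes))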