2,029 research outputs found
Modeling Solder Ball Array Interconnects for Power Module Optimization
PowerSynth is a software platform that can co-optimize power modules utilizing a 2D topology and wire bond interconnects. The novel 3D architectures being proposed at the University of Arkansas utilize solder ball interconnects instead of wire bonds. Therefore, they currently cannot be optimized using PowerSynth. This paper examines methods to accurately model the parasitic inductance of solder balls and ball grid arrays so they may be implemented into software for optimization. Proposed mathematical models are validated against ANSYS Electromagnetics Suite simulations. A comparison of the simulated data shows that mathematical models are well suited for implementation into optimization software platforms. Experimental measurements proved to be inconclusive and necessitate future work
A review of conducting polymers in electrical contact applications
A review of recent developments in fretting studies in electrical contacts is presented, focusing on developments in conducting polymer surfaces. Fretting is known to be a major cause of contact deterioration and failure, commonly exhibited as the contact resistance increases from a few milliohms, in the case of new metallic contacts, to in excess of several ohms for exposed contacts. Two technologies are discussed: first, extrinsically conducting polymers (ECPs), where highly conductive interconnects are formed using metallized particles embedded within a high-temperature polymer compound; and second, intrinsically conducting polymers (ICPs). These latter surfaces are new developments which are beginning to show potential for the application discussed. This paper presents the work on ICPs using poly(3,4-ethylenedioxythiophene)/poly(4-styrenesulfonate) (PEDOT/PSS) and its blends from secondary doping of dimethylformamide (DMF) PEDOT/PSS. Two different processing techniques, namely drop coating and spin coating, have been employed to develop test samples, and their functionality was assessed by two independent studies of temperature and fretting motion. The review leads to a number of recommendations for further studies into the application of conducting polymers for contacts with micro-movement.
Energy challenges for ICT
The energy consumption from the expanding use of information and communications technology (ICT) is unsustainable with present drivers, and it will impact heavily on future climate change. However, ICT devices have the potential to contribute significantly to the reduction of CO2 emissions and enhance resource efficiency in other sectors, e.g., transportation (through intelligent transportation, advanced driver assistance systems, and self-driving vehicles), heating (through smart building control), and manufacturing (through digital automation based on smart autonomous sensors). To address the energy sustainability of ICT and capture the full potential of ICT in resource efficiency, a multidisciplinary ICT-energy community needs to be brought together covering devices, microarchitectures, ultra large-scale integration (ULSI), high-performance computing (HPC), energy harvesting, energy storage, system design, embedded systems, efficient electronics, static analysis, and computation. In this chapter, we introduce challenges and opportunities in this emerging field and a common framework to strive towards energy-sustainable ICT.
Comparative Analysis of Prior Knowledge-Based Machine Learning Metamodels for Modeling Hybrid Copper–Graphene On-Chip Interconnects
In this article, machine learning (ML) metamodels have been developed in order to predict the per-unit-length (p.u.l.) parameters of hybrid copper–graphene on-chip interconnects based on their structural geometry and layout. ML metamodels within the context of this article include artificial neural networks, support vector machines (SVMs), and least-square SVMs. The salient feature of all these ML metamodels is that they exploit the prior knowledge of the p.u.l. parameters of the interconnects obtained from cheap empirical models to reduce the number of expensive full-wave electromagnetic (EM) simulations required to extract the training data. Thus, the proposed ML metamodels are referred to as prior knowledge-based machine learning (PKBML) metamodels. The PKBML metamodels offer the same accuracy as conventional ML metamodels trained exclusively on full-wave EM solver data, but at far smaller training time cost. In this article, a detailed comparative analysis of the proposed PKBML metamodels has been performed using multiple numerical examples.
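One common way to exploit prior knowledge from a cheap model is to train the metamodel only on the difference between the expensive and the cheap predictions. The abstract does not say which prior-knowledge scheme the article uses, so the sketch below is an illustration of the general idea, not the authors' method. It swaps in a simple least-squares line for the ANN/SVM learners, and both `empirical_pul_capacitance` and `full_wave_capacitance` are hypothetical stand-ins with made-up coefficients in arbitrary units.

```python
def empirical_pul_capacitance(width_um):
    """Hypothetical cheap closed-form model (placeholder coefficients)."""
    return 0.20 * width_um + 0.05


def full_wave_capacitance(width_um):
    """Synthetic stand-in for an expensive full-wave EM simulation."""
    return empirical_pul_capacitance(width_um) + 0.03 * width_um + 0.01


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def train_difference_metamodel(train_widths):
    """Learn only the residual between the expensive and the cheap model."""
    residuals = [
        full_wave_capacitance(w) - empirical_pul_capacitance(w)
        for w in train_widths
    ]
    slope, intercept = fit_line(train_widths, residuals)

    def predict(width_um):
        # Cheap prior plus the learned correction term.
        return empirical_pul_capacitance(width_um) + slope * width_um + intercept

    return predict


# Three "expensive" samples are enough here because the residual is simple.
predict = train_difference_metamodel([1.0, 2.0, 3.0])
print(predict(2.5), full_wave_capacitance(2.5))
```

Because the learner only has to capture the correction term, it typically needs fewer expensive samples than a model trained on the full response, which is the saving the abstract describes.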
Scalable Distributed DNN Training using TensorFlow and CUDA-Aware MPI: Characterization, Designs, and Performance Evaluation
TensorFlow has been the most widely adopted Machine/Deep Learning framework.
However, little exists in the literature that provides a thorough understanding
of the capabilities which TensorFlow offers for the distributed training of
large ML/DL models that need computation and communication at scale. Most
commonly used distributed training approaches for TF can be categorized as
follows: 1) Google Remote Procedure Call (gRPC), 2) gRPC+X: X=(InfiniBand
Verbs, Message Passing Interface, and GPUDirect RDMA), and 3) No-gRPC: Baidu
Allreduce with MPI, Horovod with MPI, and Horovod with NVIDIA NCCL. In this
paper, we provide an in-depth performance characterization and analysis of
these distributed training approaches on various GPU clusters including the Piz
Daint system (#6 on Top500). We perform experiments to gain novel insights along
the following vectors: 1) Application-level scalability of DNN training, 2)
Effect of Batch Size on scaling efficiency, 3) Impact of the MPI library used
for no-gRPC approaches, and 4) Type and size of DNN architectures. Based on
these experiments, we present two key insights: 1) Overall, No-gRPC designs
achieve better performance compared to gRPC-based approaches for most
configurations, and 2) The performance of No-gRPC is heavily influenced by the
gradient aggregation using Allreduce. Finally, we propose a truly CUDA-Aware
MPI Allreduce design that exploits CUDA kernels and pointer caching to perform
large reductions efficiently. Our proposed designs offer 5-17X better
performance than NCCL2 for small and medium messages, and reduce latency by
29% for large messages. The proposed optimizations help Horovod-MPI to achieve
approximately 90% scaling efficiency for ResNet-50 training on 64 GPUs.
Further, Horovod-MPI achieves 1.8X and 3.2X higher throughput than the native
gRPC method for ResNet-50 and MobileNet, respectively, on the Piz Daint
cluster.
Comment: 10 pages, 9 figures, submitted to IEEE IPDPS 2019 for peer review
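To make the role of Allreduce in gradient aggregation concrete, here is a minimal single-process simulation of a ring allreduce (reduce-scatter followed by allgather), together with a helper for scaling efficiency. This is a textbook sketch, not the CUDA-aware MPI design proposed in the paper; the function names `ring_allreduce` and `scaling_efficiency` and all numbers in the usage lines are assumptions for illustration.

```python
def ring_allreduce(worker_grads):
    """Simulate a sum-allreduce over p workers arranged in a ring.

    worker_grads: list of p equal-length lists (one gradient per worker).
    Returns a list of p lists, each holding the element-wise sum.
    """
    p = len(worker_grads)
    n = len(worker_grads[0])
    data = [list(g) for g in worker_grads]
    bounds = [(i * n) // p for i in range(p + 1)]

    def chunk(i):
        i %= p
        return range(bounds[i], bounds[i + 1])

    # Reduce-scatter: after p - 1 steps, worker r owns the full sum of
    # chunk (r + 1) % p.
    for step in range(p - 1):
        sends = [((r - step) % p, r) for r in range(p)]
        payloads = [[data[r][j] for j in chunk(c)] for c, r in sends]
        for (c, r), payload in zip(sends, payloads):
            dst = (r + 1) % p
            for j, v in zip(chunk(c), payload):
                data[dst][j] += v

    # Allgather: circulate the finished chunks so every worker has them all.
    for step in range(p - 1):
        sends = [((r + 1 - step) % p, r) for r in range(p)]
        payloads = [[data[r][j] for j in chunk(c)] for c, r in sends]
        for (c, r), payload in zip(sends, payloads):
            dst = (r + 1) % p
            for j, v in zip(chunk(c), payload):
                data[dst][j] = v

    return data


def scaling_efficiency(single_gpu_throughput, multi_gpu_throughput, num_gpus):
    """Measured throughput divided by ideal linear-scaling throughput."""
    return multi_gpu_throughput / (num_gpus * single_gpu_throughput)


grads = [[1, 2, 3, 4], [10, 20, 30, 40], [100, 200, 300, 400]]
summed = ring_allreduce(grads)
averaged = [[v / len(grads) for v in g] for g in summed]
print(summed[0], averaged[0])

# Hypothetical throughputs (images/s), chosen only to illustrate the formula.
print(scaling_efficiency(100.0, 5760.0, 64))
```

Each worker sends and receives only one chunk per step, so the per-worker traffic stays roughly constant as workers are added. This is one reason allreduce-based aggregation is a common alternative to a central parameter server.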
Using high resolution displays for high resolution cardiac data
The ability to perform fast, accurate, high resolution visualization is fundamental
to improving our understanding of anatomical data. As the volumes of data
increase from improvements in scanning technology, the methods applied to rendering
and visualization must evolve. In this paper we address the interactive display of
data from high resolution MRI scanning of a rabbit heart and subsequent histological
imaging. We describe a visualization environment involving a tiled LCD panel
display wall and associated software which provide an interactive and intuitive user
interface.
The oView software is an OpenGL application which is written for the VRJuggler
environment. This environment abstracts displays and devices away from the
application itself, aiding portability between different systems, from desktop PCs to
multi-tiled display walls. Portability between display walls has been demonstrated
through its use on walls at both Leeds and Oxford Universities. We discuss important
factors to be considered for interactive 2D display of large 3D datasets,
including the use of intuitive input devices and level-of-detail aspects.
Nasics: A 'Fabric-Centric' Approach Towards Integrated Nanosystems
This dissertation addresses the fundamental problem of how to build computing systems for the nanoscale. With CMOS reaching fundamental limits, emerging nanomaterials such as semiconductor nanowires, carbon nanotubes, graphene etc. have been proposed as promising alternatives. However, nanoelectronics research has largely focused on a 'device-first' mindset without adequately addressing system-level capabilities, challenges for integration and scalable assembly.
In this dissertation, we propose to develop an integrated nano-fabric (broadly defined as nanostructures/devices in conjunction with paradigms for assembly, inter-connection and circuit styles), as opposed to approaches that focus on MOSFET replacement devices as the ultimate goal. In the 'fabric-centric' mindset, design choices at individual levels are made compatible with the fabric as a whole and minimize challenges for nanomanufacturing while achieving system-level benefits vs. scaled CMOS.
We present semiconductor nanowire based nano-fabrics incorporating these fabric-centric principles called NASICs and N3ASICs and discuss how we have taken them from initial design to experimental prototype. Manufacturing challenges are mitigated through careful design choices at multiple levels of abstraction. Regular fabrics with limited customization mitigate overlay alignment requirements. Cross-nanowire FET devices and interconnect are assembled together as part of the uniform regular fabric without the need for arbitrary fine-grain interconnection at the nanoscale, routing or device sizing. Unconventional circuit styles are devised that are compatible with regular fabric layouts and eliminate the requirement for using complementary devices.
Core fabric concepts are introduced and validated. Detailed analyses on device-circuit co-design and optimization, cascading, noise and parameter variation are presented. Benchmarking of nanowire processor designs vs. equivalent scaled 16nm CMOS shows up to 22X area, 30X power benefits at comparable performance, and with overlay precision that is achievable with present-day technology. Building on the extensive manufacturing-friendly fabric framework, we present recent experimental efforts and key milestones that have been attained towards realizing a proof-of-concept prototype at dimensions of 30nm and below
System-Level Design and Virtual Prototyping of a Telecommunication Application on a NUMA Platform
The use of model-driven approaches for embedded system design has become a common practice. Among these model-driven approaches, only a few include the generation of a full-system simulation comprising operating system, code generation for tasks and hardware simulation models. Even less common is the extension to massively parallel, NoC-based designs, such as those required for high-performance streaming applications where dozens of tasks are replicated onto identical general-purpose processor cores of a Multi-processor System-on-chip (MP-SoC). We present the extension of a system-level tool to handle clustered Network-on-Chip (NoC) with virtual prototyping platforms. On the one hand, the automatic generation of the virtual prototype becomes more complex as topcell, address mapping and linker script have to be adapted. On the other hand, the exploration of the design space is particularly important for this class of applications, as performance may be strongly impacted by Non Uniform Memory Access (NUMA).