1,124 research outputs found
Recommended from our members
Data-dependent cycle-accurate power modeling of RTL-level IPs using machine learning
In a chip design project, early design planning has a strong impact on the schedule and the cost of design. Power estimation is part of early design planning, and it greatly affects design decisions. Power modeling performed at a high level of abstraction is fast but inaccurate due to lack of circuit switching activity information. By contrast, power modeling performed at a low level of abstraction is more accurate as the synthesized circuit synthesis is known, but this simulation is typically slow. This report explores a power modeling approach performed at register transfer level (RTL). It exploits machine learning models in order to have a fast yet relatively accurate cycle-by-cycle power estimation. The approach is data-dependent, where cycle-specific models are trained based on the switching activity of signals obtained from RTL simulation and cycle-by-cycle power values obtained from a reference gate-level simulation of an existing RTL design. Therefore, if any changes are applied to the RTL design, re-training of models is required. The approach aims at obtaining fast yet accurate power predictions for new invocations of a given trained model using signal activity information collected during simulation of the unmodified RTL. At a low level, the complete visibility of signals in a design unintuitively might cause overtraining the model leading to inaccurate estimation. The suggested model employs automatic feature selection in each cycle. Based on the invocations used to train the cycle-by-cycle models, only signals that may switch during a given cycle will be selected as the features for their respective cycle-specific model. The method was tested on an 8-by-8 DCT design and the power estimates were within 6.5% of those from a commercial power analysis tool. This report also simulates and compares the approach of cycle-specific models to the approach of a single global model for all cycles and show that the cycle-specific approach is twice as accurate.Electrical and Computer Engineerin
Parallel processing for digital picture comparison
In picture processing an important problem is to identify two digital pictures of the same scene taken under different lighting conditions. This kind of problem can be found in remote sensing, satellite signal processing and the related areas. The identification can be done by transforming the gray levels so that the gray level histograms of the two pictures are closely matched. The transformation problem can be solved by using the packing method. Researchers propose a VLSI architecture consisting of m x n processing elements with extensive parallel and pipelining computation capabilities to speed up the transformation with the time complexity 0(max(m,n)), where m and n are the numbers of the gray levels of the input picture and the reference picture respectively. If using uniprocessor and a dynamic programming algorithm, the time complexity will be 0(m(3)xn). The algorithm partition problem, as an important issue in VLSI design, is discussed. Verification of the proposed architecture is also given
SAGA: A project to automate the management of software production systems
The project to automate the management of software production systems is described. The SAGA system is a software environment that is designed to support most of the software development activities that occur in a software lifecycle. The system can be configured to support specific software development applications using given programming languages, tools, and methodologies. Meta-tools are provided to ease configuration. Several major components of the SAGA system are completed to prototype form. The construction methods are described
ToPoliNano: Nanoarchitectures Design Made Real
Many facts about emerging nanotechnologies are yet to be assessed. There are still major concerns, for instance, about maximum achievable device density, or about which architecture is best fit for a specific application. Growing complexity requires taking into account many aspects of technology, application and architecture at the same time. Researchers face problems that are not new per se, but are now subject to very different constraints, that need to be captured by design tools. Among the emerging nanotechnologies, two-dimensional nanowire based arrays represent promising nanostructures, especially for massively parallel computing architectures. Few attempts have been done, aimed at giving the possibility to explore architectural solutions, deriving information from extensive and reliable nanoarray characterization. Moreover, in the nanotechnology arena there is still not a clear winner, so it is important to be able to target different technologies, not to miss the next big thing. We present a tool, ToPoliNano, that enables such a multi-technological characterization in terms of logic behavior, power and timing performance, area and layout constraints, on the basis of specific technological and topological descriptions. This tool can aid the design process, beside providing a comprehensive simulation framework for DC and timing simulations, and detailed power analysis. Design and simulation results will be shown for nanoarray-based circuits. ToPoliNano is the first real design tool that tackles the top down design of a circuit based on emerging technologie
The Xpress Transfer Protocol (XTP): A tutorial (short version)
The Xpress Transfer Protocol (XTP) is a reliable, light weight transfer layer protocol. Current transport layer protocols such as DoD's Transmission Control Protocol (TCP) and ISO's Transport Protocol (TP) were not designed for the next generation of high speed, interconnected reliable networks such as fiber distributed data interface (FDDI) and the gigabit/second wide area networks. Unlike all previous transport layer protocols, XTP is being designed to be implemented in hardware as a VLSI chip set. By streamlining the protocol, combining the transport and network layers, and utilizing the increased speed and parallelization possible with a VLSI implementation, XTP will be able to provide the end-to-end data transmission rates demanded in the high speed networks without compromising reliability and functionality. This tutorial briefly describes the operation of the XTP protocol and in particular, its error, flow and rate control; inter-networking addressing mechanisms; and multicast support features, as defined in the XTP Protocol Definition Revision 3.4
Estimación estadística de consumo en FPGAs
Tesis doctoral inédita. Universidad Autónoma de Madrid, Escuela Politécnica Superior, junio de 200
Efficient Parallel and Distributed Algorithms for GIS Polygon Overlay Processing
Polygon clipping is one of the complex operations in computational geometry. It is used in Geographic Information Systems (GIS), Computer Graphics, and VLSI CAD. For two polygons with n and m vertices, the number of intersections can be O(nm). In this dissertation, we present the first output-sensitive CREW PRAM algorithm, which can perform polygon clipping in O(log n) time using O(n + k + k\u27) processors, where n is the number of vertices, k is the number of intersections, and k\u27 is the additional temporary vertices introduced due to the partitioning of polygons. The current best algorithm by Karinthi, Srinivas, and Almasi does not handle self-intersecting polygons, is not output-sensitive and must employ O(n^2) processors to achieve O(log n) time. The second parallel algorithm is an output-sensitive PRAM algorithm based on Greiner-Hormann algorithm with O(log n) time complexity using O(n + k) processors. This is cost-optimal when compared to the time complexity of the best-known sequential plane-sweep based algorithm for polygon clipping. For self-intersecting polygons, the time complexity is O(((n + k) log n log log n)/p) using p
In addition to these parallel algorithms, the other main contributions in this dissertation are 1) multi-core and many-core implementation for clipping a pair of polygons and 2) MPI-GIS and Hadoop Topology Suite for distributed polygon overlay using a cluster of nodes. Nvidia GPU and CUDA are used for the many-core implementation. The MPI based system achieves 44X speedup while processing about 600K polygons in two real-world GIS shapefiles 1) USA Detailed Water Bodies and 2) USA Block Group Boundaries) within 20 seconds on a 32-node (8 cores each) IBM iDataPlex cluster interconnected by InfiniBand technology
High level behavioural modelling of boundary scan architecture.
This project involves the development of a software tool
which enables the integration of the IEEE 1149.1/JTAG
Boundary Scan Test Architecture automatically into an ASIC
(Application Specific Integrated Circuit) design. The tool requires the original design (the ASIC) to be described in VHDL-IEEE 1076 Hardware Description Language. The tool consists of the two major elements: i) A parsing and insertion algorithm developed and implemented in 'C';
ii) A high level model of the Boundary Scan Test
Architecture implemented in 'VHDL'. The parsing and insertion algorithm is developed to deal with identifying the design Input/Output (I/O) terminals, their types and the order they appear in the ASIC design. It then attaches suitable Boundary Scan Cells to each I/O, except power and ground and inserts the high level models of the full Boundary Scan Architecture into the ASIC without altering the design core structure
Cause Clue Clauses: Error Localization using Maximum Satisfiability
Much effort is spent everyday by programmers in trying to reduce long,
failing execution traces to the cause of the error. We present a new algorithm
for error cause localization based on a reduction to the maximal satisfiability
problem (MAX-SAT), which asks what is the maximum number of clauses of a
Boolean formula that can be simultaneously satisfied by an assignment. At an
intuitive level, our algorithm takes as input a program and a failing test, and
comprises the following three steps. First, using symbolic execution, we encode
a trace of a program as a Boolean trace formula which is satisfiable iff the
trace is feasible. Second, for a failing program execution (e.g., one that
violates an assertion or a post-condition), we construct an unsatisfiable
formula by taking the trace formula and additionally asserting that the input
is the failing test and that the assertion condition does hold at the end.
Third, using MAX-SAT, we find a maximal set of clauses in this formula that can
be satisfied together, and output the complement set as a potential cause of
the error. We have implemented our algorithm in a tool called bug-assist for C
programs. We demonstrate the surprising effectiveness of the tool on a set of
benchmark examples with injected faults, and show that in most cases,
bug-assist can quickly and precisely isolate the exact few lines of code whose
change eliminates the error. We also demonstrate how our algorithm can be
modified to automatically suggest fixes for common classes of errors such as
off-by-one.Comment: The pre-alpha version of the tool can be downloaded from
http://bugassist.mpi-sws.or
Technology-generic tool for interconnect reliability projections in 3D integrated circuits
Supervised by Donald E. Troxel and Carl V. Thompson.Also issued as Thesis (S.M.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2001.Includes bibliographical references (p. 107-112).by Syed Mohiul Alam
- …