12 research outputs found

    Timing optimization during the physical synthesis of cell-based VLSI circuits

    Doctoral thesis - Universidade Federal de Santa Catarina, Centro Tecnológico, Programa de Pós-Graduação em Engenharia de Automação e Sistemas, Florianópolis, 2016. Abstract: The evolution of CMOS technology made it possible to assemble integrated circuits with billions of transistors on a single silicon chip, giving rise to the term Very-Large-Scale Integration (VLSI). The target clock frequency affects the performance of a VLSI circuit and induces timing constraints that must be properly handled by synthesis tools. During the physical synthesis of VLSI circuits, several optimization techniques are used to iteratively reduce the number of timing violations until the target clock frequency is met. The dramatic increase of interconnect delay under technology scaling represents one of the major challenges for the timing closure of modern VLSI circuits. In this scenario, effective interconnect synthesis techniques play a major role. That is why this thesis targets two timing optimization problems for effective interconnect synthesis: Incremental Timing-Driven Placement (ITDP) and Incremental Timing-Driven Layer Assignment (ITLA). For solving the ITDP problem, this thesis proposes a new Lagrangian Relaxation formulation that minimizes timing violations for both setup and hold timing constraints. This work also proposes a net-based technique that uses Lagrange multipliers as net weights, which are dynamically updated using an accurate timing analyzer. The net-based technique makes use of a novel discrete search to relocate cells, employing the Euclidean distance to define a proper neighborhood. For solving the ITLA problem, this thesis proposes a network flow approach that simultaneously handles critical and non-critical segments, and exploits a few flow conservation conditions to extract timing information for each net segment individually, thereby enabling the use of an external timing engine. The experimental validation using benchmark suites derived from industrial circuits demonstrates the effectiveness of the proposed techniques when compared with state-of-the-art works.
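
    As a rough illustration of the net-weighting and neighbourhood-search ideas above, the sketch below (not the thesis's implementation) pairs a subgradient-style update of Lagrange-multiplier net weights with a discrete search over a Euclidean neighbourhood; the netlist model, the update rule, and the weighted half-perimeter wirelength cost are simplifying assumptions standing in for the accurate timing engine and the full formulation.

# Hedged sketch: Lagrange multipliers as net weights plus a discrete
# Euclidean-neighbourhood search, with simplified data structures.
import math
from dataclasses import dataclass

@dataclass
class Net:
    pins: list            # indices of the cells connected by this net
    slack: float          # worst slack reported by an external timing engine
    weight: float = 1.0   # Lagrange multiplier used as the net weight

def update_weights(nets, step=0.5):
    """Subgradient-style update: nets with negative slack grow heavier."""
    for net in nets:
        net.weight = max(0.0, net.weight + step * max(0.0, -net.slack))

def weighted_hpwl(cells, nets):
    """Weighted half-perimeter wirelength, a common placement cost proxy."""
    total = 0.0
    for net in nets:
        xs = [cells[i][0] for i in net.pins]
        ys = [cells[i][1] for i in net.pins]
        total += net.weight * ((max(xs) - min(xs)) + (max(ys) - min(ys)))
    return total

def relocate(cells, nets, cell, radius=2.0, grid=1.0):
    """Discrete search: try grid positions within a Euclidean radius of the
    cell and keep the one minimising the weighted wirelength."""
    x0, y0 = cells[cell]
    best, best_cost = (x0, y0), weighted_hpwl(cells, nets)
    steps = int(radius / grid)
    for dx in range(-steps, steps + 1):
        for dy in range(-steps, steps + 1):
            if math.hypot(dx * grid, dy * grid) > radius:
                continue            # outside the Euclidean neighbourhood
            cells[cell] = (x0 + dx * grid, y0 + dy * grid)
            cost = weighted_hpwl(cells, nets)
            if cost < best_cost:
                best, best_cost = cells[cell], cost
    cells[cell] = best
    return best_cost

# Toy example: three cells, two nets, one net with a setup violation.
cells = {0: (0.0, 0.0), 1: (5.0, 0.0), 2: (2.0, 4.0)}
nets = [Net(pins=[0, 1], slack=-0.3), Net(pins=[1, 2], slack=0.1)]
update_weights(nets)
relocate(cells, nets, cell=1)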

    Improving Programming Support for Hardware Accelerators Through Automata Processing Abstractions

    The adoption of hardware accelerators, such as Field-Programmable Gate Arrays, into general-purpose computation pipelines continues to rise, driven by recent trends in data collection and analysis as well as by pressure from challenging physical design constraints in hardware. The architectural designs of many of these accelerators stand in stark contrast to the traditional von Neumann model of CPUs. Consequently, existing programming languages, maintenance tools, and techniques are not directly applicable to these devices, meaning that additional architectural knowledge is required for effective programming and configuration. Current programming models and techniques are akin to assembly-level programming on a CPU, placing a significant burden on developers tasked with using these architectures. Because programming is currently performed at such low levels of abstraction, the software development process is tedious and challenging and hinders the adoption of hardware accelerators. This dissertation explores the thesis that theoretical finite automata provide a suitable abstraction for bridging the gap between high-level programming models and maintenance tools familiar to developers and the low-level hardware representations that enable high-performance execution on hardware accelerators. We adopt a principled hardware/software co-design methodology to develop a programming model providing the key properties that we observe are necessary for success, namely performance and scalability, ease of use, expressive power, and legacy support. First, we develop a framework that allows developers to port existing, legacy code to run on hardware accelerators by leveraging automata learning algorithms in a novel composition with software verification, string solvers, and high-performance automata architectures. Next, we design a domain-specific programming language to aid programmers writing pattern-searching algorithms and develop compilation algorithms to produce finite automata, which support efficient execution on a wide variety of processing architectures. Then, we develop an interactive debugger for our new language, which allows developers to accurately identify the locations of bugs in software while maintaining support for high-throughput data processing. Finally, we develop two new automata-derived accelerator architectures to support additional applications, including the detection of security attacks and the parsing of recursive and tree-structured data. Using empirical studies, logical reasoning, and statistical analyses, we demonstrate that our prototype artifacts scale to real-world applications, maintain manageable overheads, and support developers' use of hardware accelerators. Collectively, the research efforts detailed in this dissertation help ease the adoption and use of hardware accelerators for data analysis applications, while supporting high-performance computation. PhD dissertation, Computer Science & Engineering, University of Michigan, Horace H. Rackham School of Graduate Studies. https://deepblue.lib.umich.edu/bitstream/2027.42/155224/1/angstadt_1.pd
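
    The automata abstraction at the core of the dissertation can be pictured with a small, hedged sketch: a fixed pattern compiled to a deterministic finite automaton and simulated over an input stream. The construction and the driver loop are illustrative only, not the dissertation's language, compiler, or accelerator interfaces.

# Hedged sketch: compile a pattern to a KMP-style DFA and run it over text.
def build_dfa(pattern):
    """State k means the last k input characters match pattern[:k]."""
    alphabet = set(pattern)
    dfa = [dict() for _ in range(len(pattern) + 1)]
    for state in range(len(pattern) + 1):
        for ch in alphabet:
            if state < len(pattern) and ch == pattern[state]:
                dfa[state][ch] = state + 1
            else:
                # Longest prefix of the pattern that is a suffix of the text seen so far.
                seen = pattern[:state] + ch
                k = min(len(seen), len(pattern))
                while k > 0 and seen[-k:] != pattern[:k]:
                    k -= 1
                dfa[state][ch] = k
    return dfa

def contains(dfa, accept, text):
    """Run the DFA; characters outside the alphabet fall back to state 0."""
    state = 0
    for ch in text:
        state = dfa[state].get(ch, 0)
        if state == accept:
            return True
    return False

dfa = build_dfa("abc")
print(contains(dfa, 3, "xxabcy"))   # True
print(contains(dfa, 3, "ababab"))   # False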

    High-level synthesis of triple modular redundant FPGA circuits with energy efficient error recovery mechanisms

    There is a growing interest in deploying commercial SRAM-based Field Programmable Gate Array (FPGA) circuits in space due to their low cost, reconfigurability, high logic capacity and rich I/O interfaces. However, their configuration memory (CM) is vulnerable to ionising radiation, which raises the need for effective fault-tolerant design techniques. This thesis provides the following contributions to mitigate the negative effects of soft errors in SRAM FPGA circuits. Triple Modular Redundancy (TMR) with periodic CM scrubbing or module-based CM error recovery (MER) are popular techniques for mitigating soft errors in FPGA circuits. However, this thesis shows that MER does not recover CM soft errors in logic instantiated outside the reconfigurable regions of TMR modules. To address this limitation, a hybrid error recovery mechanism, namely FMER, is proposed. FMER uses selective periodic scrubbing and MER to recover CM soft errors outside and inside the reconfigurable regions of TMR modules, respectively. Experimental results indicate that TMR circuits with FMER achieve higher dependability with less energy consumption than those using periodic scrubbing or MER alone. An essential component of MER and FMER is the reconfiguration control network (RCN), which transfers the minority reports of TMR components, i.e., which, if any, TMR module needs recovery, to the FPGA's reconfiguration controller (RC). Although several reliable RCs have been proposed, a study of reliable RCNs has not been previously reported. This thesis fills this research gap by proposing a technique that transfers the circuit's minority reports to the RC via the configuration layer of the FPGA. This reduces the resource utilisation of the RCN and therefore its failure rate. Results show that the proposed RCN achieves higher reliability than alternative RCN architectures reported in the literature. The last contribution of this thesis is a high-level synthesis (HLS) tool, namely TLegUp, developed within the LegUp HLS framework. TLegUp triplicates Xilinx 7-series FPGA circuits during HLS rather than during the register-transfer-level pre- or post-synthesis flow stages, as existing computer-aided design tools do. Results show that TLegUp can generate non-partitioned TMR circuits with 500x lower soft-error sensitivity than non-triplicated, functionally equivalent baseline circuits, while utilising 3-4x more resources and running at 11% lower frequency.
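
    A minimal sketch of the voting idea underlying MER and FMER, under simplifying assumptions: a majority voter over three redundant outputs yields a minority report naming the disagreeing module, which would then be repaired by module-based reconfiguration (inside reconfigurable regions) or by scrubbing (outside them). The data types and recovery hook are illustrative, not TLegUp or FMER code.

# Hedged sketch of TMR voting and a minority report for selective recovery.
def vote(a, b, c):
    """Return (majority value, index of the dissenting module or None)."""
    if a == b == c:
        return a, None
    if a == b:
        return a, 2
    if a == c:
        return a, 1
    if b == c:
        return b, 0
    # No majority at all: flag module 0 arbitrarily so something gets repaired.
    return a, 0

def recover(minority):
    """Stand-in for the recovery path taken on a minority report."""
    if minority is None:
        return "no action"
    return f"reconfigure module {minority}"

value, dissenting = vote(0xA5, 0xA5, 0x25)   # module 2 has a flipped bit
print(hex(value), recover(dissenting))        # 0xa5 reconfigure module 2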

    Real-Time Trigger and online Data Reduction based on Machine Learning Methods for Particle Detector Technology

    Modern particle accelerator experiments generate immense volumes of data at runtime. Storing the entire data volume produced quickly exceeds the available budget for the readout infrastructure. This problem is usually addressed by a combination of trigger and data reduction mechanisms, both placed as close to the detectors as possible so that the desired reduction of the outgoing data rates happens as early as possible. The methods traditionally used in such systems, however, struggle to achieve an efficient reduction in modern experiments, partly because of the complex distributions of the background events that occur. During the design of the detector readout, this situation is aggravated by the fact that the properties of the accelerator and detector during high-luminosity operation are not known in advance. For this reason, a robust and flexible algorithmic alternative is needed, which methods from machine learning can provide. Since such trigger and data reduction systems must operate under demanding conditions such as a tight latency budget, a large number of data transmission links, and general real-time requirements, FPGAs are often used as the technological basis for their implementation. Within this work, several FPGA-based approaches were developed and implemented that address the prevailing problems of the Belle II experiment. These approaches are presented and discussed throughout this thesis.
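
    As a hedged illustration of the kind of FPGA-friendly inference such a trigger could perform, the sketch below evaluates a small fixed-point linear classifier over per-event features and returns a keep/reject decision; the feature names, the weights and the Q8.8 format are assumptions for the example, not the Belle II design.

# Hedged sketch: an integer-only trigger decision, easy to map onto FPGA logic.
FRAC_BITS = 8                       # Q8.8 fixed point: value = raw / 256
SCALE = 1 << FRAC_BITS

def to_fixed(x):
    return int(round(x * SCALE))

# Hypothetical features: [hit multiplicity, total energy, max cluster energy]
WEIGHTS = [to_fixed(w) for w in (0.12, 0.85, -0.30)]
BIAS = to_fixed(-1.5)
THRESHOLD = 0                       # keep the event if the score is positive

def trigger_accept(features_fixed):
    """Integer dot product plus bias; one multiply-accumulate per feature."""
    acc = BIAS << FRAC_BITS         # align the bias with the product scale (Q16.16)
    for w, x in zip(WEIGHTS, features_fixed):
        acc += w * x                # Q8.8 * Q8.8 -> Q16.16
    return acc > THRESHOLD

event = [to_fixed(v) for v in (14.0, 2.3, 0.9)]
print(trigger_accept(event))        # True: keep the event, False: reject it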

    Estudio, diseño e integración de un sistema basado en FPGA para el cálculo del tiempo de vuelo aplicado a equipos PET

    Nuclear Medicine has undergone significant advances in recent years thanks to improvements in materials, electronics, algorithms, and processing techniques, which have considerably extended its range of application. One of the techniques that has progressed most in this field is Positron Emission Tomography (PET), a non-invasive method that is especially useful for the evaluation of cancer-related anomalies. The system is based on data acquisition and processing from which images of the spatial and temporal distribution of the metabolic processes occurring inside the body are obtained. A PET scanner consists of a set of detectors, usually arranged in a ring, each of which provides information about the events produced inside it. One of the reasons PET systems have evolved so significantly is the development of techniques to determine the Time of Flight (TOF) of the photons generated by the annihilation of positrons with their antiparticle, the electron. Determining the TOF allows the location of the generated events to be established more precisely and therefore eases the image reconstruction task whose output is ultimately used by the medical team for diagnosis and/or treatment. This Thesis starts from the hypothesis of developing a system based on Field Programmable Gate Arrays (FPGAs) that integrates a Time-to-Digital Converter (TDC) for precise time measurement, capable of computing the time difference between gamma particles for later application in PET systems. First, the context in which such a system is needed is described and the starting premise is formulated. Next, the basic principles of PET and the state of the art of similar systems are presented. The principles of computing the TOF with FPGAs are then laid out and the adopted scheme is justified, detailing each of its parts. After the implementation, the first time-measurement results are presented, with resolutions below 100 ps for multiple channels, and the system is characterized against temperature variations. The system is then tested with a breast PET prototype whose detectors are based on Position Sensitive PhotoMultiplier Tubes (PSPMTs), performing TOF measurements for several scenarios. After this first test, two detector modules based on Silicon PhotoMultipliers (SiPMs) are implemented; among other advantages over PSPMTs, SiPMs are immune to high magnetic fields, which is essential if the PET scanner is to operate in combination with Magnetic Resonance (MR) imaging, as is the case here. Each of the two detector modules consists of a single pixel, and its conditioning electronics are designed taking into account the parameters with the greatest influence on time resolution. The system is then tested on an array of 144 SiPMs, optimizing several parameters that directly affect system performance and hence the achieved time resolution (down to 700 ps). Finally, having demonstrated the capabilities of the system, an optimization process is carried out: the TDC resolution is improved to below 40 ps, and a coincidence algorithm is developed that identifies pairs of detectors that have registered an event within a given time window. The Thesis closes with conclusions and future work, followed by the references; a list of publications and conference contributions is also provided.
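
    The coincidence step can be sketched as follows, under simplifying assumptions (event format, window width, and a software search in place of the FPGA logic): hits from different detectors whose time stamps fall within the coincidence window are paired, and their time difference is the TOF estimate.

# Hedged sketch of a coincidence search over time-stamped detector hits.
WINDOW_PS = 4000   # example coincidence window in picoseconds

def find_coincidences(events, window=WINDOW_PS):
    """events: list of (timestamp_ps, detector_id). Returns one tuple
    (det_a, det_b, tof_ps) per pair of hits on different detectors that are
    closer in time than the window."""
    events = sorted(events)                     # order by timestamp
    pairs = []
    for i, (t_i, det_i) in enumerate(events):
        for t_j, det_j in events[i + 1:]:
            if t_j - t_i > window:
                break                           # later hits are even further away
            if det_i != det_j:
                pairs.append((det_i, det_j, t_j - t_i))
    return pairs

hits = [(1_000_000, 3), (1_000_350, 11), (1_950_000, 3), (5_000_000, 7)]
print(find_coincidences(hits))   # [(3, 11, 350)] -> 350 ps between detectors 3 and 11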

    Online learning on the programmable dataplane

    This thesis makes the case for managing computer networks with data-driven methods (automated statistical inference and control based on measurement data and runtime observations) and argues for their tight integration with programmable dataplane hardware to make management decisions faster and from more precise data. Optimisation, defence, and measurement of networked infrastructure are each challenging tasks in their own right, which are currently dominated by the use of hand-crafted heuristic methods. These become harder to reason about and deploy as networks scale in rates and number of forwarding elements, and their design requires expert knowledge and care around unexpected protocol interactions. This makes tailored, per-deployment or per-workload solutions infeasible to develop. Recent advances in machine learning offer capable function approximation and closed-loop control which suit many of these tasks. New, programmable dataplane hardware enables more agility in the network: runtime reprogrammability, precise traffic measurement, and low-latency on-path processing. The synthesis of these two developments allows complex decisions to be made on previously unusable state, and made quicker by offloading inference to the network. To justify this argument, I advance the state of the art in data-driven defence of networks, novel dataplane-friendly online reinforcement learning algorithms, and in-network data reduction to allow classification of switch-scale data. Each requires co-design aware of the network, and of the failure modes of systems and carried traffic. To make online learning possible in the dataplane, I use fixed-point arithmetic and modify classical (non-neural) approaches to take advantage of the SmartNIC compute model and make use of rich device-local state. I show that data-driven solutions still require great care to design correctly, but with the right domain expertise they can improve on pathological cases in DDoS defence, such as protecting legitimate UDP traffic. In-network aggregation into histograms is shown to enable accurate classification from fine temporal effects, and allows hosts to scale such classification to far larger flow counts and traffic volumes. Moving reinforcement learning to the dataplane is shown to offer substantial benefits in state-action latency and online learning throughput versus host machines, allowing policies to react faster to fine-grained network events. The dataplane environment is key in making reactive online learning feasible; to port further algorithms and learnt functions, I collate and analyse the strengths of current and future hardware designs, as well as individual algorithms.
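
    A minimal sketch of the fixed-point, classical online learning the thesis argues for, with assumed state/action encodings and Q16.16 scaling rather than any particular SmartNIC target: a tabular Q-learning update implemented entirely in integer arithmetic.

# Hedged sketch: integer-only tabular Q-learning, the kind of non-neural
# online learner that fits a dataplane compute model.
FRAC = 16
ONE = 1 << FRAC                      # 1.0 in Q16.16
ALPHA = ONE // 8                     # learning rate 0.125
GAMMA = (ONE * 9) // 10              # discount factor 0.9

N_STATES, N_ACTIONS = 16, 4
q_table = [[0] * N_ACTIONS for _ in range(N_STATES)]   # Q values in Q16.16

def fx_mul(a, b):
    """Multiply two Q16.16 numbers, keeping the Q16.16 scale."""
    return (a * b) >> FRAC

def q_update(state, action, reward_fx, next_state):
    """Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a)), all fixed point."""
    best_next = max(q_table[next_state])
    td_error = reward_fx + fx_mul(GAMMA, best_next) - q_table[state][action]
    q_table[state][action] += fx_mul(ALPHA, td_error)

def greedy_action(state):
    row = q_table[state]
    return row.index(max(row))

# One hypothetical step: in state 3 we took action 1 and observed reward +0.5.
q_update(state=3, action=1, reward_fx=ONE // 2, next_state=5)
print(greedy_action(3), q_table[3][1])   # chosen action and its Q16.16 value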

    LUX-ZEPLIN (LZ) Technical Design Report

    In this Technical Design Report (TDR) we describe the LZ detector to be built at the Sanford Underground Research Facility (SURF). The LZ dark matter experiment is designed to achieve sensitivity to a WIMP-nucleon spin-independent cross section of 3 × 10⁻⁴⁸ cm².

    Fuelling the zero-emissions road freight of the future: routing of mobile fuellers

    The future of zero-emissions road freight is closely tied to the sufficient availability of new, clean fuel options such as electricity and hydrogen. In goods distribution using Electric Commercial Vehicles (ECVs) and Hydrogen Fuel Cell Vehicles (HFCVs), a major challenge in the transition period is their limited autonomy combined with scarce and unevenly distributed refuelling stations. One viable solution to facilitate and speed up the adoption of ECVs/HFCVs by logistics operators is to bring the fuel to the point where it is needed (instead of diverting delivery vehicles to refuelling stations) using "Mobile Fuellers (MFs)". These are mobile battery-swapping/recharging vans or mobile hydrogen fuellers that can travel to a running ECV/HFCV at an agreed rendezvous time and place to provide the fuel it requires to complete its delivery route. In this presentation, new vehicle routing models are presented for a third-party company that provides MF services. In the proposed problem variant, the MF provider receives the routing plans of multiple customer companies and has to design routes for a fleet of capacitated MFs that must synchronise with the running vehicles to deliver the required amount of fuel on the fly. The presentation discusses and compares several mathematical models based on different business models and collaborative logistics scenarios.
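
    The synchronisation constraint at the heart of the problem can be pictured with a simple feasibility check; the coordinates, speeds, and Euclidean travel-time model below are assumptions for illustration, not the presented formulations.

# Hedged sketch: can a mobile fueller meet a running vehicle at a rendezvous?
import math

def travel_time(a, b, speed):
    return math.dist(a, b) / speed

def can_serve(mf_pos, mf_free_at, mf_speed, mf_fuel_left,
              rendezvous_pos, vehicle_arrival, fuel_needed):
    """True if the mobile fueller reaches the rendezvous no later than the
    customer vehicle and still carries enough fuel for the request."""
    arrival = mf_free_at + travel_time(mf_pos, rendezvous_pos, mf_speed)
    return arrival <= vehicle_arrival and fuel_needed <= mf_fuel_left

# Example: the MF is free at t=10 at (0, 0); the ECV reaches (30, 40) at t=75
# and needs 20 kWh; the MF drives at 1 distance unit per time unit, 35 kWh left.
print(can_serve((0, 0), 10, 1.0, 35, (30, 40), 75, 20))   # True (arrives at t=60)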

    Conference on Intelligent Robotics in Field, Factory, Service, and Space (CIRFFSS 1994), volume 1

    The AIAA/NASA Conference on Intelligent Robotics in Field, Factory, Service, and Space (CIRFFSS '94) was originally proposed because of the strong belief that America's problems of global economic competitiveness and job creation and preservation can partly be solved by the use of intelligent robotics, which are also required for human space exploration missions. Individual sessions addressed nuclear industry, agile manufacturing, security/building monitoring, on-orbit applications, vision and sensing technologies, situated control and low-level control, robotic systems architecture, environmental restoration and waste management, robotic remanufacturing, and healthcare applications

    Text Similarity Between Concepts Extracted from Source Code and Documentation

    Context: Constant evolution in software systems often results in their documentation losing sync with the content of the source code. The traceability research field has long aimed to recover links between code and documentation when the two fall out of sync. Objective: The aim of this paper is to compare the concepts contained within the source code of a system with those extracted from its documentation, in order to detect how similar these two sets are. If they are vastly different, the gap might indicate considerable ageing of the documentation and a need to update it. Methods: In this paper we reduce the source code of 50 software systems to a set of key terms, each set containing the concepts of one of the sampled systems. At the same time, we reduce the documentation of each system to another set of key terms. We then use four different set-comparison approaches to measure how similar the sets are. Results: Using the well-known Jaccard index as the benchmark for the comparisons, we found that the cosine distance has excellent comparative power, depending on the pre-training of the machine learning model. In particular, the SpaCy and FastText embeddings offer similarity scores of up to 80% and 90%, respectively. Conclusion: For most of the sampled systems, the source code and the documentation tend to contain very similar concepts. Given the accuracy of one pre-trained model (e.g., FastText), it also becomes evident that a few systems show a measurable drift between the concepts contained in the documentation and in the source code.
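
    A small sketch of the two comparison measures, kept self-contained by substituting plain term-count vectors for the pre-trained SpaCy/FastText embeddings used in the paper: the Jaccard index over the raw key-term sets and a cosine similarity over count vectors.

# Hedged sketch: Jaccard index over term sets, cosine similarity over counts.
import math
from collections import Counter

def jaccard(a, b):
    return len(a & b) / len(a | b) if a | b else 1.0

def cosine(a, b):
    common = set(a) & set(b)
    dot = sum(a[t] * b[t] for t in common)
    norm = math.sqrt(sum(v * v for v in a.values())) * \
           math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

# Hypothetical key terms extracted from one system's code and documentation.
code_terms = {"parser", "token", "ast", "symbol", "scope"}
doc_terms = {"parser", "token", "grammar", "scope", "error"}

print(round(jaccard(code_terms, doc_terms), 2))                    # 0.43
print(round(cosine(Counter(code_terms), Counter(doc_terms)), 2))   # 0.6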