635 research outputs found

    From FPGA to ASIC: A RISC-V processor experience

    Get PDF
    This work document a correct design flow using these tools in the Lagarto RISC- V Processor and the RTL design considerations that must be taken into account, to move from a design for FPGA to design for ASIC

    Interconnect design for the edge computing system-on-chip

    Get PDF
    Nowadays the majority of system-on-chips are designed by placing various IP blocks such as CPUs, memories and accelerators on the same chip. With the advantage of silicon manufacturing technologies, it has become possible to place hundreds of CPU cores and other design blocks on the same chip. A communication system that transfers data between chip components largely affects overall chip performance, computational speed and response time for external events. Firstly, this thesis studies the main on-chip interconnect design paradigms. According to the presented research, various architectures may be chosen for an interconnect design depending on the required complexity and number of subsystems. The shared and hybrid bus interconnects are one of the oldest means of on-chip communication. They are efficient for small systems with no more than ten IP blocks. The crossbars or bus matrix interconnects can help to build on-chip communication systems which can efficiently interconnect dozens of system-on-chip modules. The networks-on-chip can provide a communication solution for large scale chip designs with hundreds of IP blocks. The second part of this thesis focuses on the novel Ballast chip implementation and its interconnect design. The Ballast is a heterogeneous multiprocessor chip designed for edge computing and general-purpose computing applications. In this thesis Ballast interconnect was designed from scratch by using a cascaded crossbar approach by connecting three open-sourced AXI protocol bus matrices. The designed interconnect allows to efficiently connect 6 bus masters with 9 slaves and provides up to 9,6 GB/s bandwidth for the most productive CPU subsystem

    Integrated Circuit Design in US High-Energy Physics

    Full text link
    This whitepaper summarizes the status, plans, and challenges in the area of integrated circuit design in the United States for future High Energy Physics (HEP) experiments. It has been submitted to CPAD (Coordinating Panel for Advanced Detectors) and the HEP Community Summer Study 2013(Snowmass on the Mississippi) held in Minnesota July 29 to August 6, 2013. A workshop titled: US Workshop on IC Design for High Energy Physics, HEPIC2013 was held May 30 to June 1, 2013 at Lawrence Berkeley National Laboratory (LBNL). A draft of the whitepaper was distributed to the attendees before the workshop, the content was discussed at the meeting, and this document is the resulting final product. The scope of the whitepaper includes the following topics: Needs for IC technologies to enable future experiments in the three HEP frontiers Energy, Cosmic and Intensity Frontiers; Challenges in the different technology and circuit design areas and the related R&D needs; Motivation for using different fabrication technologies; Outlook of future technologies including 2.5D and 3D; Survey of ICs used in current experiments and ICs targeted for approved or proposed experiments; IC design at US institutes and recommendations for collaboration in the future

    Real-Time Trigger and online Data Reduction based on Machine Learning Methods for Particle Detector Technology

    Get PDF
    Moderne Teilchenbeschleuniger-Experimente generieren während zur Laufzeit immense Datenmengen. Die gesamte erzeugte Datenmenge abzuspeichern, überschreitet hierbei schnell das verfügbare Budget für die Infrastruktur zur Datenauslese. Dieses Problem wird üblicherweise durch eine Kombination von Trigger- und Datenreduktionsmechanismen adressiert. Beide Mechanismen werden dabei so nahe wie möglich an den Detektoren platziert um die gewünschte Reduktion der ausgehenden Datenraten so frühzeitig wie möglich zu ermöglichen. In solchen Systeme traditionell genutzte Verfahren haben währenddessen ihre Mühe damit eine effiziente Reduktion in modernen Experimenten zu erzielen. Die Gründe dafür liegen zum Teil in den komplexen Verteilungen der auftretenden Untergrund Ereignissen. Diese Situation wird bei der Entwicklung der Detektorauslese durch die vorab unbekannten Eigenschaften des Beschleunigers und Detektors während des Betriebs unter hoher Luminosität verstärkt. Aus diesem Grund wird eine robuste und flexible algorithmische Alternative benötigt, welche von Verfahren aus dem maschinellen Lernen bereitgestellt werden kann. Da solche Trigger- und Datenreduktion-Systeme unter erschwerten Bedingungen wie engem Latenz-Budget, einer großen Anzahl zu nutzender Verbindungen zur Datenübertragung und allgemeinen Echtzeitanforderungen betrieben werden müssen, werden oft FPGAs als technologische Basis für die Umsetzung genutzt. Innerhalb dieser Arbeit wurden mehrere Ansätze auf Basis von FPGAs entwickelt und umgesetzt, welche die vorherrschenden Problemstellungen für das Belle II Experiment adressieren. Diese Ansätze werden über diese Arbeit hinweg vorgestellt und diskutiert werden

    Network-on-Chip

    Get PDF
    Addresses the Challenges Associated with System-on-Chip Integration Network-on-Chip: The Next Generation of System-on-Chip Integration examines the current issues restricting chip-on-chip communication efficiency, and explores Network-on-chip (NoC), a promising alternative that equips designers with the capability to produce a scalable, reusable, and high-performance communication backbone by allowing for the integration of a large number of cores on a single system-on-chip (SoC). This book provides a basic overview of topics associated with NoC-based design: communication infrastructure design, communication methodology, evaluation framework, and mapping of applications onto NoC. It details the design and evaluation of different proposed NoC structures, low-power techniques, signal integrity and reliability issues, application mapping, testing, and future trends. Utilizing examples of chips that have been implemented in industry and academia, this text presents the full architectural design of components verified through implementation in industrial CAD tools. It describes NoC research and developments, incorporates theoretical proofs strengthening the analysis procedures, and includes algorithms used in NoC design and synthesis. In addition, it considers other upcoming NoC issues, such as low-power NoC design, signal integrity issues, NoC testing, reconfiguration, synthesis, and 3-D NoC design. This text comprises 12 chapters and covers: The evolution of NoC from SoC—its research and developmental challenges NoC protocols, elaborating flow control, available network topologies, routing mechanisms, fault tolerance, quality-of-service support, and the design of network interfaces The router design strategies followed in NoCs The evaluation mechanism of NoC architectures The application mapping strategies followed in NoCs Low-power design techniques specifically followed in NoCs The signal integrity and reliability issues of NoC The details of NoC testing strategies reported so far The problem of synthesizing application-specific NoCs Reconfigurable NoC design issues Direction of future research and development in the field of NoC Network-on-Chip: The Next Generation of System-on-Chip Integration covers the basic topics, technology, and future trends relevant to NoC-based design, and can be used by engineers, students, and researchers and other industry professionals interested in computer architecture, embedded systems, and parallel/distributed systems

    Integrated Circuits and Systems for Smart Sensory Applications

    Get PDF
    Connected intelligent sensing reshapes our society by empowering people with increasing new ways of mutual interactions. As integration technologies keep their scaling roadmap, the horizon of sensory applications is rapidly widening, thanks to myriad light-weight low-power or, in same cases even self-powered, smart devices with high-connectivity capabilities. CMOS integrated circuits technology is the best candidate to supply the required smartness and to pioneer these emerging sensory systems. As a result, new challenges are arising around the design of these integrated circuits and systems for sensory applications in terms of low-power edge computing, power management strategies, low-range wireless communications, integration with sensing devices. In this Special Issue recent advances in application-specific integrated circuits (ASIC) and systems for smart sensory applications in the following five emerging topics: (I) dedicated short-range communications transceivers; (II) digital smart sensors, (III) implantable neural interfaces, (IV) Power Management Strategies in wireless sensor nodes and (V) neuromorphic hardware

    Verkkoliikenteen hajauttaminen rinnakkaisprosessoitavaksi ohjelmoitavan piirin avulla

    Get PDF
    The expanding diversity and amount of traffic in the Internet requires increasingly higher performing devices for protecting our networks against malicious activities. The computational load of these devices may be divided over multiple processing nodes operating in parallel to reduce the computation load of a single node. However, this requires a dedicated controller that can distribute the traffic to and from the nodes at wire-speed. This thesis concentrates on the system topologies and on the implementation aspects of the controller. A field-programmable gate array (FPGA) device, based on a reconfigurable logic array, is used for implementation because of its integrated circuit like performance and high-grain programmability. Two hardware implementations were developed; a straightforward design for 1-gigabit Ethernet, and a modular, highly parameterizable design for 10-gigabit Ethernet. The designs were verified by simulations and synthesizable testbenches. The designs were synthesized on different FPGA devices while varying parameters to analyze the achieved performance. High-end FPGA devices, such as Altera Stratix family, met the target processing speed of 10-gigabit Ethernet. The measurements show that the controller's latency is comparable to a typical switch. The results confirm that reconfigurable hardware is the proper platform for low-level network processing where the performance is prioritized over other features. The designed architecture is versatile and adaptable to applications expecting similar characteristics.Internetin edelleen lisääntyvä ja monipuolistuva liikenne vaatii entistä tehokkaampia laitteita suojaamaan tietoliikenneverkkoja tunkeutumisia vastaan. Tietoliikennelaitteiden kuormaa voidaan jakaa rinnakkaisille yksiköille, jolloin yksittäisen laitteen kuorma pienenee. Tämä kuitenkin vaatii erityisen kontrolloijan, joka kykenee hajauttamaan liikennettä yksiköille linjanopeudella. Tämä tutkimus keskittyy em. kontrolloijan järjestelmätopologioiden tutkimiseen sekä kontrolloijan toteuttamiseen ohjelmoitavalla piirillä, kuten kenttäohjelmoitava järjestelmäpiiri (eng. field programmable gate-array, FPGA). Kontrolloijasta tehtiin yksinkertainen toteutus 1-gigabitin Ethernet-verkkoihin sekä modulaarinen ja parametrisoitu toteutus 10-gigabitin Ethernet-verkkoihin. Toteutukset verifioitiin simuloimalla sekä käyttämällä syntetisoituvia testirakenteita. Toteutukset syntetisoitiin eri FPGA-piireille vaihtelemalla samalla myös toteutuksen parametrejä. Tehokkaimmat FPGA-piirit, kuten Altera Stratix -piirit, saavuttivat 10-gigabitin prosessointivaatimukset. Mittaustulokset osoittavat, että kontrollerin vasteaika ei poikkea tavallisesta verkkokytkimestä. Työn tulokset vahvistavat käsitystä, että ohjelmoitavat piirit soveltuvat hyvin verkkoliikenteen matalantason prosessointiin, missä vaaditaan ensisijaisesti suorituskykyä. Suunniteltu arkkitehtuuri on monipuolinen ja soveltuu joustavuutensa ansiosta muihin samantyyppiseen sovelluksiin

    Dynamic reconfiguration frameworks for high-performance reliable real-time reconfigurable computing

    Get PDF
    The sheer hardware-based computational performance and programming flexibility offered by reconfigurable hardware like Field-Programmable Gate Arrays (FPGAs) make them attractive for computing in applications that require high performance, availability, reliability, real-time processing, and high efficiency. Fueled by fabrication process scaling, modern reconfigurable devices come with ever greater quantities of on-chip resources, allowing a more complex variety of applications to be developed. Thus, the trend is that technology giants like Microsoft, Amazon, and Baidu now embrace reconfigurable computing devices likes FPGAs to meet their critical computing needs. In addition, the capability to autonomously reprogramme these devices in the field is being exploited for reliability in application domains like aerospace, defence, military, and nuclear power stations. In such applications, real-time computing is important and is often a necessity for reliability. As such, applications and algorithms resident on these devices must be implemented with sufficient considerations for real-time processing and reliability. Often, to manage a reconfigurable hardware device as a computing platform for a multiplicity of homogenous and heterogeneous tasks, reconfigurable operating systems (ROSes) have been proposed to give a software look to hardware-based computation. The key requirements of a ROS include partitioning, task scheduling and allocation, task configuration or loading, and inter-task communication and synchronization. Existing ROSes have met these requirements to varied extents. However, they are limited in reliability, especially regarding the flexibility of placing the hardware circuits of tasks on device’s chip area, the problem arising more from the partitioning approaches used. Indeed, this problem is deeply rooted in the static nature of the on-chip inter-communication among tasks, hampering the flexibility of runtime task relocation for reliability. This thesis proposes the enabling frameworks for reliable, available, real-time, efficient, secure, and high-performance reconfigurable computing by providing techniques and mechanisms for reliable runtime reconfiguration, and dynamic inter-circuit communication and synchronization for circuits on reconfigurable hardware. This work provides task configuration infrastructures for reliable reconfigurable computing. Key features, especially reliability-enabling functionalities, which have been given little or no attention in state-of-the-art are implemented. These features include internal register read and write for device diagnosis; configuration operation abort mechanism, and tightly integrated selective-area scanning, which aims to optimize access to the device’s reconfiguration port for both task loading and error mitigation. In addition, this thesis proposes a novel reliability-aware inter-task communication framework that exploits the availability of dedicated clocking infrastructures in a typical FPGA to provide inter-task communication and synchronization. The clock buffers and networks of an FPGA use dedicated routing resources, which are distinct from the general routing resources. As such, deploying these dedicated resources for communication sidesteps the restriction of static routes and allows a better relocation of circuits for reliability purposes. For evaluation, a case study that uses a NASA/JPL spectrometer data processing application is employed to demonstrate the improved reliability brought about by the implemented configuration controller and the reliability-aware dynamic communication infrastructure. It is observed that up to 74% time saving can be achieved for selective-area error mitigation when compared to state-of-the-art vendor implementations. Moreover, an improvement in overall system reliability is observed when the proposed dynamic communication scheme is deployed in the data processing application. Finally, one area of reconfigurable computing that has received insufficient attention is security. Meanwhile, considering the nature of applications which now turn to reconfigurable computing for accelerating compute-intensive processes, a high premium is now placed on security, not only of the device but also of the applications, from loading to runtime execution. To address security concerns, a novel secure and efficient task configuration technique for task relocation is also investigated, providing configuration time savings of up to 32% or 83%, depending on the device; and resource usage savings in excess of 90% compared to state-of-the-art

    Readout Electronics for the Upgraded ITS Detector in the ALICE Experiment

    Get PDF
    ALICE is undergoing upgrades during the Long Shutdown (LS) 2 of the LHC to improve its performance and capabilities, and to prepare the experiment for the increases in luminosity provided by the LHC in Run 3 and Run 4. One of the most extensive upgrades of the experiment (and the topic of this thesis) is the replacement of the Inner Tracking System (ITS) in its entirety with a new and upgraded system. The new ITS consists exclusively of pixel sensors organized in seven cylindrical layers, and offers significantly improved tracking capabilities at higher interaction rates. And in contrast to the previous system, which would only trigger on a subset of the available events that were deemed “interesting”, the upgraded ITS will capture all events; either in a triggered mode using minimum-bias triggers, or in a “trigger-less” continuous mode where event data is continuously read out. The key component of the upgrade is a novel pixel sensor chip, the ALPIDE, which was developed at CERN specifically for the ALICE ITS upgrade. The seven layers of the ITS is assembled from sub-assemblies of sensor chips referred to as staves, and the entire detector consists of 24 120 chips in total. The staves come in three different configurations; they range from 9 chips per stave for the innermost layers, and up to 196 chips per stave in the outer layers. The number of control and data links, as well as the bit-rate of the data links, differs widely between the staves as well. Data readout from the high-speed copper links of the detector requires dedicated readout electronics in the vicinity of the detector. The core component of this system is the FPGA-based Readout Unit (RU). It facilitates the readout of the data links and transfer data to the experiment’s server farms via optical links; provides control, configuration and monitoring of the sensor chips using the same optical links, as well as over CAN-bus for redundancy; distributes trigger signals to the sensor, either by forwarding the minimum-bias triggers of the experiment, or by local generation of trigger pulses for the continuous mode. And the field-programmable devices of the RU allows for future updates and changes of functionality, which can be performed remotely via several redundant paths to the RUs. This is an important feature, since the RUs are not easily accessible when they are installed in the cavern of the experiment and will be exposed to radiation when the LHC is in operation. Radiation tolerance has been an important concern during the development of the FPGA designs, as well as the RU hardware itself, since radiation-induced errors in the RUs are expected during operation. Techniques such as Triple Modular Redundancy (TMR) were used in the FPGA designs to mitigate these effects. One example is the radiation tolerant CAN controller design which is introduced in this thesis. A different challenge, which is also addressed in this thesis, is the monitoring of internal status and quantities such as temperature and voltage in the ALPIDE chips. This is performed over the ALPIDE’s control bus, but must be carefully coordinated as the control bus is also used for triggers. The detector and readout electronics are designed to operate under a wide set of conditions. Considering events from Pb–Pb collisions, which may have thousands of pixel hits in the detector, a typical pp event has comparatively few pixel hits, but the collision rate is significantly higher for pp runs than it is for Pb–Pb runs. And the detector can be used with two triggering modes, where the continuous trigger mode has additional parameters for trigger period. A simulation model of the ALPIDE and ITS, presented in this thesis, was developed to simulate the readout performance and efficiency of the detector under a wide set of circumstances. The simulated results show that the detector should perform with a high efficiency at the collision rates that are planned for Run 3. Initial plans for a dedicated hardware, to handle and coordinate busy status for the detector, was deemed superfluous and the plans were canceled based on these results. Collision rates higher than those planned for Run 3 were also simulated to yield parameters for optimal performance at those rates. For the RU, which was designed to interface to three widely different stave designs, the simulations quantified the amount of data the readout electronics will have to handle depending on the detector layer and operating conditions. Furthermore, the simulation model was adapted for simulations of two other ALPIDE-based detector projects; the Proton CT (pCT) project at University of Bergen (UiB), a Digital Tracking Calorimeter (DTC) used for dose planning of particle therapy in cancer treatment; and the planned Forward Calorimeter (FoCal) for ALICE, where there will be two layers of pixel sensors among the 18 layers of Si-W calorimeter pads in the electromagnetic part of the detector (FoCal-E). Since the size of a calorimeter pad is relatively large, around 1 cm², the fine grained pixels of the ALPIDE (29.24 µm × 26.88 µm) will help distinguish between multiple showers and improve the overall spatial resolution of the detector. The simulations helped prove the feasibility of the ALPIDE for this detector, from a readout perspective, and FoCal was later approved by the LHCC committee at CERN.Doktorgradsavhandlin
    corecore