The upgrades of the Large Hadron Collider (LHC) at CERN and of its experiments in 2019/20 and 2024/26 will increase the instantaneous luminosity to $L = 2 \times 10^{34}\,\mathrm{cm^{-2}\,s^{-1}}$ and $L = (5\text{-}7) \times 10^{34}\,\mathrm{cm^{-2}\,s^{-1}}$, respectively. For the High Luminosity LHC (HL-LHC) phase, the expected mean number of interactions per bunch crossing will be 55 at $L = 2 \times 10^{34}\,\mathrm{cm^{-2}\,s^{-1}}$ and 140 at $L = 5 \times 10^{34}\,\mathrm{cm^{-2}\,s^{-1}}$. This increase drastically impacts the ATLAS trigger system and trigger rates. For the ATLAS Muon Spectrometer, a replacement of the innermost endcap stations, the so-called "Small Wheels", which operate in a magnetic field, is therefore planned for 2019/20 in order to maintain a low pT threshold for single muons and excellent tracking capability in the HL-LHC regime. The New Small Wheels will feature two new detector technologies, resistive Micromegas and small-strip Thin Gap Chambers, comprising a system of 2.4 million readout channels. Both detector technologies will provide trigger and tracking primitives fully compliant with the post-2026 HL-LHC operation. To allow for some safety margin, the design studies assume a maximum instantaneous luminosity of $L = 7 \times 10^{34}\,\mathrm{cm^{-2}\,s^{-1}}$, 200 pile-up events, and trigger rates of 1 MHz at Level-0 and 400 kHz at Level-1. A radiation dose of 1700 Gy is expected at the innermost radius. The on-detector electronics will be implemented on some 8000 boards, using four different custom ASICs. The large number of readout channels, the high output data rate, the harsh radiation and magnetic environment, the small available space, the poor access and the need for low power consumption all impose great challenges on the system design. The overall design and first results from the integration of the electronics in a vertical slice test will be presented.
The data from the ROC are received by a Gigabit Transceiver Link (GBT) [12] and sent via fiber to FELIX, a high-availability interface to a general purpose network. All hits in the VMM ASIC are stored until a Level-0 trigger (Phase 2, 1 MHz rate, up to 10 µs fixed latency) or a Level-1 trigger (Phase 1, 100 kHz or Phase 2, "low latency" scenario, up to 800 MHz) arrives; then only hits in the bunch crossings (BC) "near" the triggering BC are selected for transfer to the ROC ASIC.
The ROC will aggregate data from up to eight VMMs and either send it unfiltered (after Phase 1 of the ATLAS detector upgrade) or filter it based on a Level-1 trigger (after Phase 2). For Phase 2 and the originally planned two-level trigger, the ROC then buffers the Level-0 accepted (L0A) events until a Level-1 Accept (L1A) trigger (400 kHz rate, up to 60 µs variable latency) further reduces the number of events to transmit. For single-level trigger readout all events are passed on, but the buffering in the ROC still results in further derandomization of the event data.
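As a rough illustration of the buffering these latencies imply, the sketch below converts the Level-0 latency into bunch crossings and estimates how many L0A events wait in the ROC for a Level-1 decision. It assumes the nominal 25 ns LHC bunch spacing and applies Little's law (mean occupancy = rate × waiting time) with the maximum quoted latency, so the second number is an upper bound on the mean, not a buffer specification. The function names are illustrative only.

```python
# Nominal LHC bunch-crossing period (25 ns, i.e. a 40 MHz BC clock).
BC_PERIOD_NS = 25.0


def latency_in_bcs(latency_us: float) -> int:
    """Number of bunch crossings spanned by a trigger latency given in microseconds."""
    return round(latency_us * 1000.0 / BC_PERIOD_NS)


def mean_events_in_flight(rate_mhz: float, latency_us: float) -> float:
    """Little's law: mean number of events waiting = arrival rate x waiting time.

    MHz multiplied by microseconds is dimensionless, so no unit conversion is needed.
    """
    return rate_mhz * latency_us


# Level-0: up to 10 us fixed latency -> depth of the VMM hit storage in BCs.
l0_depth_bcs = latency_in_bcs(10.0)

# Two-level readout: 1 MHz L0A rate, Level-1 latency of at most 60 us.
l1_pending = mean_events_in_flight(1.0, 60.0)

print(l0_depth_bcs, l1_pending)
```

Under these assumptions the VMM must retain hits for 400 bunch crossings, and on average at most 60 L0A events are pending in the ROC at any time.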
For the trigger path of the MM detectors the ADDC will receive the ART data from 64 VMMs (eight MM front-end boards). The address of the first-arriving hit is selected in real time and the selected data will be sent to the trigger processors. The NSW sTGC trigger system makes use of coincidences of detector pads to identify regions where a muon candidate was detected. To reduce the amount of data sent to the off-detector trigger processors, only sTGC strip information from the regions selected by the pad trigger logic is transmitted off-detector. The TDS and Router electronics handle the serialization and subsequent routing of active trigger data signals from the front-end of the sTGC detectors to the track-finding processors. The L1DDC placed on the rim will be used to provide high quality clocks, configuration or additional data to the pad trigger and router boards.
The trigger signals from both the sTGC and the MM chambers will be processed by the NSW Trigger Processor cards located in USA15. The trigger algorithms will be implemented in conventional FPGAs housed in Advanced Telecommunications Computing Architecture (ATCA) crates. The two chamber technologies will share the same hardware and, as much as possible, the same firmware. The output of the trigger processor will be sent to the new sector logic boards, to be combined with the Big Wheel trigger and define the final Level-1 trigger signal.
Board description

Front End boards
Due to the different characteristics of the two detector technologies (MM and sTGC), three different types of Front-End (FE) boards will be fabricated: one for the MM, called MMFE8 (the "8" denotes the number of VMMs), and two for the sTGC, the strip FE (sFE) and pad FE (pFE) boards. In total 4096 MMFE8, 768 sFE and 768 pFE boards will be fabricated.
The VMM is a radiation tolerant ASIC fabricated in the 130 nm Global Foundries 8RF-DM process (formerly IBM 8RF-DM). The block diagram of a channel and the common blocks is shown in figure 2. The VMM is composed of 64 front-end channels, each providing a low-noise charge amplifier (CA) with adaptive feedback, a shaper with baseline stabilizer, a discriminator with trimmer, a peak detector, a time detector, some logic and a dedicated digital output for Time-over-Threshold (ToT) or Time-to-Peak (TtP) measurements. Shared among channels are the bias circuits, a temperature sensor, a test pulse generator, two 10-bit DACs for adjusting the threshold and test pulse amplitudes, a mixed-signal multiplexer, the control logic and the ART, which consists of dedicated digital outputs (flag and address) for the first above-threshold event. The peak detector measures the peak amplitude and stores it in an analogue memory. The time detector measures the peak timing using a time-to-amplitude converter (TAC). The TAC value is stored in an analogue memory and the ramp duration is adjustable. The peak and time detectors are followed by a set of three low-power ADCs (a 6-bit, a 10-bit, and an 8-bit).
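The board and channel counts quoted so far can be cross-checked with a few lines of arithmetic. The sketch below uses only numbers given in the text (4096 MMFE8 boards, eight VMMs per MMFE8, 64 channels per VMM, 2.4 million channels in total); the number of VMMs on the sFE and pFE boards is not stated here, so the sTGC share is inferred as a remainder and should be read as an estimate.

```python
# Numbers taken from the text.
MMFE8_BOARDS = 4096
VMMS_PER_MMFE8 = 8
CHANNELS_PER_VMM = 64
TOTAL_CHANNELS_QUOTED = 2_400_000  # "2.4 million readout channels" (rounded figure)

# Micromegas readout channels implied by the board count.
mm_channels = MMFE8_BOARDS * VMMS_PER_MMFE8 * CHANNELS_PER_VMM

# Remainder attributed to the sTGC, assuming the quoted total is exact.
stgc_channels_estimate = TOTAL_CHANNELS_QUOTED - mm_channels

print(f"MM channels:            {mm_channels:,}")
print(f"sTGC channels (approx): {stgc_channels_estimate:,}")
```

The Micromegas alone account for about 2.1 million channels, leaving roughly 0.3 million for the sTGC if the rounded total is taken at face value.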
The data from up to eight VMMs are passed to the ROC ASIC that merges hits from the VMMs, reformats the data, adds headers and interfaces with the L1DDC. The data transfer from the VMMs to the ROC is via two serial lines (even bits on one, odd bits on the other) running at 160 MHz with Double Data Rate (DDR) giving a total bandwidth of 640 Mbps. The Readout Controller supplies the clock for this transfer. 8b/10b encoding is used with at least one comma character, K28.5, transmitted continuously between event data. The 8b/10b encoding reduces the effective bandwidth to 512 Mbps.
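The bandwidth figures above follow directly from the link parameters. A minimal sketch of the arithmetic, with illustrative function names:

```python
def raw_bandwidth_mbps(clock_mhz: float, lines: int, ddr: bool = True) -> float:
    """Raw line rate: with DDR, each line carries two bits per clock cycle."""
    bits_per_clock = 2 if ddr else 1
    return clock_mhz * bits_per_clock * lines


def payload_bandwidth_mbps(raw_mbps: float) -> float:
    """8b/10b sends 10 line bits for every 8 payload bits."""
    return raw_mbps * 8 / 10


raw = raw_bandwidth_mbps(clock_mhz=160, lines=2)  # two serial lines at 160 MHz DDR
payload = payload_bandwidth_mbps(raw)

print(raw, payload)
```

This reproduces the 640 Mbps raw rate and the 512 Mbps effective rate quoted in the text. The comma characters sent between events reduce the usable payload rate further by an amount that depends on the event size.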
The L0A is sent by the ROC to the VMMs to select the data to be sent to the ROC, while the L1A is used by the ROC to select data for readout to FELIX. For single-level trigger readout (rate up to 1 MHz) both L0A and L1A are sent at the same bunch crossing. For a two-level trigger, the L0A rate can be up to 1 MHz but the L1A rate will be lower. For the data transmission to the Gigabit Transceiver (GBTX) ASIC of the L1DDC, differential lines are used, each running at 80, 160 or 320 Mbps using one lane, or 640 Mbps using two lanes. To reach data rates greater than 640 Mbps from one Readout Controller, as needed at the inner radii of both detectors, several differential pairs must run in parallel. The ROC therefore has four independent sub-Readout Controllers (SROCs), each with a dedicated double-lane differential pair and configurable data rate, as shown in figure 3. For flexibility, the ROC has a crossbar that allows routing up to eight VMMs to up to four different SROCs.
The time-over-threshold from each channel is sent to the pad TDS (pTDS) ASIC. This ASIC has 104 inputs, divided into 13 groups of eight pads each, and there is a configurable local BC clock for each group. The timing is adjustable (in steps of 3.125 ns within a range of 25 ns) since different pads can have very different trace lengths. When a pad fires, the pTDS uses the leading edge of the pulse to sample the BC clock of the group the pad belongs to, and thereby obtains the BCID for that pad. The sampled BCIDs are kept in ring buffers. At the end of BC k, the TDS checks the ring buffer of each channel for a hit with a BCID equal to k. A channel is marked as "yes" if there is a match in its ring buffer; otherwise a "no" is recorded. All data except the header bits are scrambled to maintain DC balance, serialized and transmitted at 4.8 Gbps.
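The pad TDS matching step described above can be modelled in a few lines. This is a behavioural sketch, not the ASIC logic: the ring-buffer depth, the use of an unbounded integer BCID, and the convention that a group's delay shifts its BC clock later in time are all assumptions made for illustration. Only the 104 inputs, the 13 groups of eight pads and the 3.125 ns step come from the text.

```python
from collections import deque

BC_NS = 25.0
DELAY_STEP_NS = 3.125   # from the text: eight steps span one 25 ns bunch crossing
N_GROUPS = 13
PADS_PER_GROUP = 8      # 13 x 8 = 104 inputs
RING_DEPTH = 4          # hypothetical depth; not given in the text


class PadTdsModel:
    """Toy model of BCID sampling and per-BC matching in the pad TDS."""

    def __init__(self, group_delay_steps):
        assert len(group_delay_steps) == N_GROUPS
        assert all(0 <= s < 8 for s in group_delay_steps)
        # Per-group shift of the local BC clock, in nanoseconds.
        self.delay_ns = [s * DELAY_STEP_NS for s in group_delay_steps]
        # One ring buffer of sampled BCIDs per pad channel.
        self.rings = [deque(maxlen=RING_DEPTH)
                      for _ in range(N_GROUPS * PADS_PER_GROUP)]

    def hit(self, pad: int, leading_edge_ns: float) -> int:
        """Sample the group's delayed BC counter at the pulse leading edge."""
        group = pad // PADS_PER_GROUP
        bcid = int((leading_edge_ns - self.delay_ns[group]) // BC_NS)
        self.rings[pad].append(bcid)
        return bcid

    def end_of_bc(self, k: int):
        """Return one yes/no flag per pad: does its ring buffer hold BCID k?"""
        return [k in ring for ring in self.rings]


# Group 1 is delayed by 4 steps (12.5 ns); all other groups are undelayed.
delays = [0] * N_GROUPS
delays[1] = 4
tds = PadTdsModel(delays)

tds.hit(pad=0, leading_edge_ns=30.0)   # group 0: sampled in BC 1
tds.hit(pad=8, leading_edge_ns=30.0)   # group 1: still in BC 0 of its delayed clock
flags_bc1 = tds.end_of_bc(1)
print(flags_bc1[0], flags_bc1[8])
```

The example shows why the per-group delay matters: two pulses arriving at the same absolute time are assigned to different bunch crossings when their groups use different clock phases.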
The 6-bit charge information from 128 VMM channels is sent to the strip TDS (sTDS) ASIC. The design of the sTDS can be divided into three major parts: VMM interface, Preprocessor, and Serialization. The VMM interface reads in the VMM output data and stores it in ring buffers awaiting track road information from the pad trigger logic. The Preprocessor performs the BCID and pad trigger matching and selects the strips within the pad road for transmission to the router. The Serialization part prepares the trigger data and serializes it for transmission to the on-rim router.
To configure and monitor the VMM, TDS and ROC ASICs, the GBT-Slow Control Adapter (GBT-SCA) ASIC will be used [13]. The GBT-SCA implements a point-to-multipoint connection between one GBT optical link and several front-end ASICs. The GBT-SCA connects to a dedicated electrical port on the GBTX ASIC that provides 80 Mbps of bidirectional data traffic. If needed, more than one GBT-SCA ASIC can be connected to a GBTX, thus increasing the control and monitoring capabilities of the system. The GBT-SCA features several I/O ports to interface with the embedded front-end ASICs: 16 I2C buses, one JTAG controller port, four 8-bit wide parallel ports, a memory bus controller and an ADC to monitor up to eight external analogue signals, as shown in figure 4. All these ports are accessible from the counting room electronics via the GBT optical link system. Special design techniques are being employed to protect the operation of the GBT-SCA against radiation-induced Single-Event Upsets to a level that is compatible with Phase 2 running.
The power distribution of the FE and all the custom-made electronic boards will be done through FEAST [14], a custom DC-DC converter produced at CERN. FEAST is a single-phase synchronous buck converter developed to provide an efficient solution for the distribution of power in high energy physics experiments. It has been designed to function reliably in a harsh radiation and magnetic field environment. The converter is capable of continuous operation up to more than 200 Mrad (Si) total ionizing dose and an integrated particle fluence of $(5\text{-}8) \times 10^{14}\,\mathrm{n/cm^{2}}$ (1 MeV equivalent). FEAST has been designed for operation in a strong magnetic field in excess of 40,000 Gauss (4 T) and has been optimized for air-core inductors of 400 to 500 nH. It provides point-of-load (POL) regulation from a 5 to 12 V supply rail with a 4 A load, and its protection features include Over-Current, Over-Temperature and Under-Voltage protection to improve system-level security in the event of fault conditions.
L1DDC
The L1DDC will be the interface between multiple FE boards and the FELIX network interface. This is achieved using the high speed serializer/deserializer GBTX ASIC [15] developed at CERN. The GBTX is a radiation tolerant ASIC fabricated using the IBM/GlobalFoundries 130 nm CMOS technology. The GBTX is capable of multiplexing a number of serial links (E-Links) onto a single fiber. One E-Link consists of three differential pairs (six wires): the clock (Clk+ and Clk-), the data in (Din+ and Din-) and the data out (Dout+ and Dout-). The GBTX can support up to 40 E-Links divided into five groups called banks, as shown in figure 5. Each bank can support up to eight E-Links at 80 Mbps, four E-Links at 160 Mbps or two E-Links at 320 Mbps. The GBTX has an extra Slow Control (SC) E-Link with a fixed rate of 80 Mbps for slow control information. The GBTX has a Clock and Data Recovery (CDR) circuit which receives high speed serial data from a custom-made optical transceiver, the Versatile Transceiver (VTRx) [16]. The VTRx contains two radiation tolerant ASICs, the GigaBit Laser Driver (GBLD) [17] and the GigaBit TransImpedance Amplifier (GBTIA). The CDR circuit recovers and generates an appropriate high speed clock to correctly sample the incoming data stream. The serial data are then deserialized and decoded, with appropriate error corrections, and finally DeSCRambled (DSCR). In the transmitter part the data are SCRambled (SCR), to obtain DC balance, and then encoded with a Forward Error Correction (FEC) code before being serialized and sent to the optical transceiver. The internal registers of the GBTX can be modified via the optical link itself or via an I2C slave interface. The L1DDC also provides the clock and Bunch Crossing Reset (BCR) signals to the ADDC board. In total 512 L1DDC boards will be fabricated for the MM and 544 (512 for the DAQ and 32 for the trigger path) for the sTGC detectors.
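The E-Link options per bank all correspond to the same aggregate rate, which the following sketch makes explicit. It assumes that every E-Link within a bank runs at the same rate and that banks are configured independently; the constant and function names are illustrative and are not taken from any GBTX software.

```python
N_BANKS = 5
BANK_CAPACITY_MBPS = 640            # 8 x 80 = 4 x 160 = 2 x 320 Mbps
ALLOWED_RATES_MBPS = (80, 160, 320)


def elinks_per_bank(rate_mbps: int) -> int:
    """Maximum number of E-Links a single bank supports at the given rate."""
    if rate_mbps not in ALLOWED_RATES_MBPS:
        raise ValueError(f"unsupported E-Link rate: {rate_mbps} Mbps")
    return BANK_CAPACITY_MBPS // rate_mbps


def total_elinks(bank_rates_mbps) -> int:
    """Total E-Links for a per-bank rate configuration (one rate per bank)."""
    if len(bank_rates_mbps) != N_BANKS:
        raise ValueError(f"expected {N_BANKS} bank rates")
    return sum(elinks_per_bank(rate) for rate in bank_rates_mbps)


all_slow = total_elinks([80] * N_BANKS)         # every bank at 80 Mbps
mixed = total_elinks([320, 320, 160, 80, 80])   # faster links for some banks
aggregate_mbps = N_BANKS * BANK_CAPACITY_MBPS   # excludes the slow-control E-Link

print(all_slow, mixed, aggregate_mbps)
```

With all banks at 80 Mbps this gives the 40 E-Links quoted in the text. Raising the rate of some banks trades the number of links for per-link bandwidth while the aggregate of 3200 Mbps stays fixed.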
ADDC
The ADDC will receive the ART data (the address of the strip of the first hit) from 64 VMMs (eight MMFE8s). A priority-based hit selection is performed in real time, and the selected data will be sent to the trigger processors. The basic plan for the ADDC is to use two custom ART ASICs (see figure 6 for a schematic diagram) to deserialize and align the 64 channels of ART data (32 channels per ART ASIC) and to carry out the hit selection. Once the hit selection is finished, the selected data from each ART ASIC will be sent to its corresponding GBTX. One Versatile Twin-Transmitter (VTTx) module with two fibers will transmit the data from the two GBTX chips to the MM trigger processor. A GBT-SCA ASIC will also be used for the configuration and monitoring of the ART and GBTX ASICs. In total 512 ADDC boards will be used in the NSW. For each input, the ART ASIC has a programmable delay. The purpose of these delays is to skew the input signals to avoid setup or hold violations with respect to the local clock phase. Next, the signals are deserialized and forwarded to the programmable deadtime gate, which temporarily masks VMM channels after a hit. Finally, a priority-based hit selection (of up to eight hits from 32 inputs) is performed and the data are formatted for transmission via a GBTX ASIC, as shown in figure 6.
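The deadtime gate and priority selection in the ART ASIC can be sketched behaviourally as follows. The text does not specify the priority order or the deadtime length, so this sketch assumes a fixed priority by input index (lowest first, overridable) and a deadtime counted in bunch crossings; the class and parameter names are hypothetical.

```python
N_INPUTS = 32   # ART inputs per ART ASIC (from the text)
MAX_HITS = 8    # hits selected per bunch crossing (from the text)


class ArtSelector:
    """Toy model: programmable deadtime gate followed by priority hit selection."""

    def __init__(self, deadtime_bcs: int = 2, priority=None):
        self.deadtime_bcs = deadtime_bcs
        # Assumed priority order: lowest input index first unless overridden.
        self.priority = list(priority) if priority is not None else list(range(N_INPUTS))
        # First bunch crossing at which each input is live again.
        self.live_from = [0] * N_INPUTS

    def process_bc(self, bc: int, arts: dict) -> list:
        """arts maps input index -> strip address for the hits seen in this BC."""
        selected = []
        for inp in self.priority:
            if inp not in arts:
                continue
            if bc < self.live_from[inp]:
                continue  # input is masked by the deadtime gate
            # A hit that passes the gate starts the deadtime, selected or not.
            self.live_from[inp] = bc + 1 + self.deadtime_bcs
            if len(selected) < MAX_HITS:
                selected.append((inp, arts[inp]))
        return selected


selector = ArtSelector(deadtime_bcs=2)
first = selector.process_bc(0, {i: i for i in range(10)})  # ten inputs fire in BC 0
second = selector.process_bc(1, {0: 5, 20: 3})             # input 0 is still masked
third = selector.process_bc(3, {0: 5})                     # input 0 is live again
print(len(first), second, third)
```

In this model, ten simultaneous hits are truncated to the eight highest-priority inputs, and a repeated hit on a recently fired input is suppressed until the deadtime has elapsed.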
PAD trigger
From the pTDS, pad signals are sent to the pad trigger logic, which looks for hit coincidences in a tower of logical pads within one bunch crossing. A three-out-of-four majority logic is used for both sTGC quadruplets to select a muon candidate. The geometrical coordinates of the candidate together with its corresponding BCID are sent to the front-end sTGC sTDS ASICs, which then send only the selected strip data to the routers. Each wheel is equipped with 16 pad trigger logic boards on its rim, one per sector, for a total of 32 boards. The rim is a few meters away from the interaction point, where the radiation and magnetic fields are weaker, which makes it possible to implement the pad logic in commercial FPGAs.
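The coincidence requirement can be written as a small predicate. The sketch assumes that a candidate requires the three-out-of-four condition in each of the two quadruplets of a tower within the same bunch crossing; this is one reading of the sentence above, and the function names are illustrative.

```python
def quadruplet_fires(layer_hits) -> bool:
    """Three-out-of-four majority over the four pad layers of one quadruplet."""
    if len(layer_hits) != 4:
        raise ValueError("a quadruplet has exactly four layers")
    return sum(bool(hit) for hit in layer_hits) >= 3


def tower_has_candidate(quad_a, quad_b) -> bool:
    """Assumed rule: both quadruplets of the tower must satisfy the majority."""
    return quadruplet_fires(quad_a) and quadruplet_fires(quad_b)


# One missing layer per quadruplet is tolerated ...
ok = tower_has_candidate([1, 1, 0, 1], [1, 1, 1, 1])
# ... but two missing layers in the same quadruplet are not.
rejected = tower_has_candidate([1, 0, 0, 1], [1, 1, 1, 1])
print(ok, rejected)
```

Tolerating one missing layer per quadruplet keeps the trigger efficient in the presence of single-layer inefficiencies, while the double coincidence suppresses uncorrelated background hits.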
Router
If a candidate track is identified, the pad trigger logic selects the band of strips in each layer (band-ID) associated with the triggered pads, and distributes the IDs of the bands to the sTDS. The sTDS then transmits the strip charges, BCID, band-ID, and φ-ID (120 bits in total) over twinax fast serial copper wires to the signal packet Router on the periphery of the wheel. The Routers remove NULL data and forward valid strip data to a limited number of optoelectronic outputs. The data are sent by fiber to the track-finding processors in USA15, where centroids and track segments are calculated and sent to the sector logic to be combined with candidate tracks from the Big Wheel. Router boards will be placed on the rim inside commercial boxes along with the pad trigger and RIM-L1DDC boards and will also use commercial FPGAs. Eight router boards will serve one NSW sector, resulting in 256 router boards.
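The Router's role of dropping NULL data and concentrating valid packets onto fewer outputs can be illustrated with a short sketch. The text does not say how packets are arbitrated when more inputs are valid than outputs are available, so the first-come-by-input-order rule below is purely an assumption; NULL is represented by `None` and a packet by a 120-bit integer. The board count at the end uses the 16 sectors per wheel and two wheels mentioned for the pad trigger boards.

```python
from typing import List, Optional, Sequence

PACKET_BITS = 120          # strip charges, BCID, band-ID and phi-ID (from the text)
ROUTERS_PER_SECTOR = 8
SECTORS_PER_WHEEL = 16
N_WHEELS = 2


def route(packets: Sequence[Optional[int]], n_outputs: int) -> List[int]:
    """Drop NULL packets and forward valid ones to at most n_outputs outputs.

    Assumption: when more packets are valid than outputs exist, inputs with the
    lowest index win; the real arbitration scheme is not described in the text.
    """
    valid = [p for p in packets if p is not None]
    for p in valid:
        if not 0 <= p < (1 << PACKET_BITS):
            raise ValueError("packet does not fit in 120 bits")
    return valid[:n_outputs]


forwarded = route([None, 5, None, 7, 9], n_outputs=2)
total_routers = ROUTERS_PER_SECTOR * SECTORS_PER_WHEEL * N_WHEELS
print(forwarded, total_routers)
```

The count of 8 × 16 × 2 reproduces the 256 router boards quoted above.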
FELIX
FELIX will be used as a gateway between dedicated links connecting to detectors and trigger electronics, links providing timing and trigger information, and a commodity network (Ethernet or Infiniband). Software running on server PCs connected to a commodity network, built with "Commercial Off The Shelf" (COTS) switches, will interact via FELIX with the on-detector and trigger electronics for configuration, control, monitoring and calibration. Event data to be read out will be passed via FELIX to server PCs where software will build event fragments, which will in turn be passed on to the Read Out System (ROS), a subsystem of the ATLAS DAQ system that receives and buffers data from all sub-detectors and trigger systems.
Trigger Processor
The main goal of the NSW trigger is to provide additional information to the muon Level-1 trigger in order to dramatically reduce fake triggers arising from particles that are not high-pT muons originating at the interaction point (IP). Low energy particles, mainly protons, produced in the material located between the Small Wheel and the end-cap middle station, the Big Wheel (BW), are a major source of fake triggers. These particles can cross the end-cap trigger chambers at an angle similar to that of real high-pT muons. The NSW trigger signal is based on track segments produced online by the sTGC and MM chambers comprising the NSW detectors. These candidate track segments are input to the new sector logic, which uses the information to corroborate trigger candidates from the Big Wheel TGC chambers. The sector logic sends Level-1 trigger candidates to the ATLAS Muon Central Trigger system. The trigger system for the endcap muon detectors in ATLAS currently relies on the use of the middle layer and the reconstruction of projective segments pointing to the IP by the TGCs.
Vertical Slice integration
During August 2016 two weeks were allocated for testing the existing NSW electronics. The integration took place in building 188 at CERN in the Vertical Slice laboratory. At that time the first prototypes of the ADDC, L1DDC, MMFE8, sFE, pFE, FELIX and the trigger processor algorithm were available. The pad trigger, Router and RIM-L1DDC boards had not yet been fabricated and were therefore not tested. During these two weeks the following tests were carried out:
• Testing of the MMFE8 FPGA firmware (the ROC was not available). The communication was done via 1 Gbps Ethernet using the UDP/IP protocol. The FPGA was able to configure and read out the eight VMMs with the MMFE8 mounted on a small MM chamber, using an Fe-55 source to produce real events. The firmware was tested with the internal pulser (emulating the trigger) and with external triggers produced by an external module.
• The serial communication protocol for the E-Links between the MMFE8 and the L1DDC was also verified. The two boards carry custom ASICs and a commercial FPGA that use different technical standards for their inputs and outputs (IO). The GBTX differential output signals use the SLVS standard [18] ($V_{cm}$ = 0.2 V and a swing of 400 mV), while its inputs accept the SLVS or LVDS ($V_{cm}$ = 1.2 V) standard. The MMFE8, on the other side, uses a non-standard LVDS ($V_{cm}$ = 0.6 V) for both inputs and outputs. The compatibility of these standards at a rate of 80 Mbps was verified.
• Trigger data were sent successfully from the MMFE8 to the ADDC through serial streams and finally via a fiber and the GBTX protocol to the trigger processor (algorithm running on a VC707 Xilinx evaluation board).
• A link was successfully established between an L1DDC and FELIX and the communication protocol was verified. Moreover, a loopback was achieved by sending data from FELIX (firmware running on a VC709 Xilinx evaluation board mounted inside a regular PC) through the L1DDC to the MMFE8 and then back to FELIX.
• Software testing. The corresponding software was able to handle the incoming packets over Ethernet, analyse and process the data, and configure the eight VMMs of the MMFE8 boards.
• The sTGC FE boards were using only raw Ethernet, so only the firmware was tested. These tests mainly covered the configuration and readout of multiple VMMs, without using the official software that supports the UDP/IP protocol.
