55 research outputs found
A New Design of Ultra-Flattened Near-zero Dispersion PCF Using Selectively Liquid Infiltration
This paper reports new results on chromatic dispersion in Photonic Crystal
Fibers (PCFs), obtained by designing an index-guiding triangular-lattice
structure in which only the first air-hole ring is selectively infiltrated
with index-matching liquid. The proposed structure can provide both
ultra-low and ultra-flattened dispersion over a wide wavelength range. The
dependence of the dispersion parameter of the PCF on the infiltrating liquid
index, hole-to-hole distance and air-hole diameter is investigated in detail.
The results show that the design yields a dispersion of 0 ± 0.15 ps/(nm·km) in
the communication wavelength band. We also propose designs that use
practical infiltrating liquids to achieve near-zero ultra-flat dispersion of
D = 0 ± 0.48 ps/(nm·km) over a bandwidth of 276-492 nm in the wavelength
range of 1.26 μm to 1.80 μm.
Comment: 6 pages, 13 figures, 1 table
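For reference, a dispersion parameter quoted in ps/(nm·km) is conventionally obtained from the wavelength dependence of the effective index of the guided mode. The symbol n_eff and the definition below are the standard textbook convention and are an assumption here; the abstract does not state which expression the authors used.

```latex
% Chromatic dispersion parameter of a guided mode (standard definition).
% \lambda: free-space wavelength, c: speed of light in vacuum,
% n_{\mathrm{eff}}: effective index of the mode (assumed symbol).
D(\lambda) = -\frac{\lambda}{c}\,
             \frac{d^{2}\,\mathrm{Re}\!\left[n_{\mathrm{eff}}(\lambda)\right]}{d\lambda^{2}}
```

Under this convention, "ultra-flattened near-zero dispersion" means that the second derivative of the effective index stays close to zero across the whole band of interest.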
Model-Architecture Co-design of Deep Neural Networks for Embedded Systems
In deep learning, a convolutional neural network (ConvNet or CNN) is a powerful tool for building interesting embedded applications that use data to make predictions. An application running on an embedded system typically has limited access to memory resources, processing power, and storage. Implementing deep convolutional neural network-based inference on resource-constrained devices can be very challenging, as these environments cannot usually make use of the massive computing power and storage that are present in cloud server environments. Furthermore, the constantly evolving nature of modern deep network architectures aggravates the problem by making it necessary to balance flexibility against specialisation, so that a design remains able to adapt. However, much of the baseline architecture of a deep convolutional neural network has stayed the same. With careful optimisation of the most common and widely occurring layer architectures, it is typically possible to accelerate these emerging workloads for resource-constrained embedded systems.
This thesis makes four contributions. I first developed a lossy three-stage low-rank approximation scheme that can reduce the computational complexity of a pre-trained model by 3-5x and up to 8-9x for individual convolutional layers. This scheme requires restructuring of the convolutional layers and generally suits the scenario where both the training data and trained model are available.
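As a rough illustration of how restructuring a convolutional layer into low-rank stages can reduce computation, the sketch below counts multiply-accumulate operations (MACs) per output position for a dense convolution and for one generic three-stage factorisation (1x1 channel reduction, k x k convolution on the reduced channels, 1x1 channel expansion). The factorisation shape, the function names and the ranks are assumptions chosen for illustration; the abstract does not specify the thesis's actual scheme.

```python
def conv_macs(k: int, c_in: int, c_out: int) -> int:
    """MACs per output position for a dense k x k convolution."""
    return k * k * c_in * c_out


def three_stage_macs(k: int, c_in: int, c_out: int, r_in: int, r_out: int) -> int:
    """MACs per output position for a hypothetical three-stage replacement:
    1x1 (c_in -> r_in), k x k (r_in -> r_out), 1x1 (r_out -> c_out)."""
    reduce_stage = c_in * r_in
    spatial_stage = k * k * r_in * r_out
    expand_stage = r_out * c_out
    return reduce_stage + spatial_stage + expand_stage


if __name__ == "__main__":
    # Illustrative layer sizes and ranks (assumed, not taken from the thesis).
    dense = conv_macs(k=3, c_in=256, c_out=256)
    factored = three_stage_macs(k=3, c_in=256, c_out=256, r_in=64, r_out=64)
    print(f"dense: {dense} MACs, factored: {factored} MACs, "
          f"ratio: {dense / factored:.2f}x")
```

The ratio depends entirely on the chosen ranks, and the approximation is lossy, which is why such schemes usually rely on fine-tuning with the training data.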
In many scenarios, the training data is not available for fine-tuning to recover any loss in prediction accuracy when structural changes are made to a model as a post-processing step. Besides the lack of training data, there are other situations where the architecture of a model cannot be changed after training. My second contribution handles this scenario by using a low-level optimisation scheme that, unlike the low-rank approximation scheme, requires no changes to the model architecture. This novel scheme uses a modified version of the Cook-Toom algorithm to reduce the computational intensity of commonly occurring dense and spatial convolutional layers and speed up inference by 2-4x.
My third contribution is an efficient implementation of the Cook-Toom class of algorithms on Arm's ubiquitous low-power Cortex processors. Unlike direct convolution, computing convolutions using the modified Cook-Toom algorithm requires a different data processing pipeline, as it involves pre- and post-transformations of the intermediate activations. I introduced a multi-channel multi-region (MCMR) scheme to enable an efficient implementation of the fast Cook-Toom algorithm. I demonstrate that, by effectively using SIMD instructions and the MCMR scheme, an average per-layer speedup of 2-3x and a peak of 4x is achievable.
My final contribution is the Cook-Toom accelerator, a custom hardware architecture for modern convolutional neural networks. This accelerator architecture is designed from the ground up to address some of the limitations of a resource-constrained SIMD processor. I also illustrate how emerging layer types can be mapped efficiently to the same flexible architecture without any modification.
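The pre- and post-transformation pipeline mentioned above can be sketched with the widely published F(2x2, 3x3) minimal-filtering matrices (often called Winograd convolution, a member of the Cook-Toom family). This is a plain-Python illustration of the data flow only; it is not the thesis's modified algorithm or its MCMR scheme, and all function names are hypothetical.

```python
# Standard F(2x2, 3x3) transform matrices.
BT = [[1, 0, -1, 0], [0, 1, 1, 0], [0, -1, 1, 0], [0, 1, 0, -1]]     # input transform
G = [[1, 0, 0], [0.5, 0.5, 0.5], [0.5, -0.5, 0.5], [0, 0, 1]]        # filter transform
AT = [[1, 1, 1, 0], [0, 1, -1, -1]]                                  # output transform


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(m):
    return [list(row) for row in zip(*m)]


def fast_tile(tile, kernel):
    """Compute a 2x2 output tile from a 4x4 input tile and a 3x3 kernel."""
    u = matmul(matmul(G, kernel), transpose(G))    # pre-transform the filter (4x4)
    v = matmul(matmul(BT, tile), transpose(BT))    # pre-transform the input (4x4)
    m = [[u[i][j] * v[i][j] for j in range(4)] for i in range(4)]  # 16 multiplies
    return matmul(matmul(AT, m), transpose(AT))    # post-transform to 2x2


def direct_tile(tile, kernel):
    """Reference: direct sliding-window computation (36 multiplies)."""
    return [[sum(tile[i + p][j + q] * kernel[p][q]
                 for p in range(3) for q in range(3))
             for j in range(2)] for i in range(2)]


if __name__ == "__main__":
    tile = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
    kernel = [[1, 0, -1], [2, 0, -2], [1, 0, -1]]
    print(fast_tile(tile, kernel))
    print(direct_tile(tile, kernel))
```

The filter transform can be computed once offline, so at inference time only the input and output transforms add work around the element-wise product.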
Quarc: a high-efficiency network on-chip architecture
The novel Quarc NoC architecture, inspired by the Spidergon scheme, is introduced as a NoC architecture that is highly efficient in performing collective communication operations, including broadcast and multicast. The efficiency of the Quarc architecture is achieved by balancing the traffic, which results from the modifications applied to the topology and the routing elements of the Spidergon NoC. This paper provides an ASIC implementation of both architectures using UMC's 0.13 μm CMOS technology and presents an analysis and comparison of the cost and performance of the Quarc and Spidergon NoCs.
Near-elliptic core triangular-lattice and square-lattice PCFs: a comparison of birefringence, cut-off and GVD characteristics towards fiber device application
In this work, a detailed numerical analysis of near-elliptic core
index-guiding triangular-lattice and square-lattice photonic crystal fibers
(PCFs) is reported for birefringence, single-mode operation, cut-off behavior,
group velocity dispersion and effective area. For the same relative values
of d/P, triangular-lattice PCFs show higher birefringence whereas
square-lattice PCFs show a wider range of single-mode operation. The
square-lattice PCF was found to be endlessly single-mode for higher
air-filling fraction (d/P). Shorter lengths of triangular-lattice PCF are
required for dispersion compensation, whereas square-lattice PCFs with a
closer relative dispersion slope (RDS) can better compensate broadband
dispersion. Square-lattice PCFs show a red-shifted zero-dispersion wavelength
(ZDW), making them preferable for mid-IR supercontinuum generation (SCG) with
highly non-linear chalcogenide material. Square-lattice PCFs show a
higher dispersion slope that leads to compression of the broadband, thus
accumulating more power in the pulse. On the other hand, a triangular-lattice
PCF with a flat dispersion profile can generate broader SCG. A square-lattice
PCF with low Group Velocity Dispersion (GVD) in the anomalous dispersion
regime corresponds to a higher dispersion length and a higher degree of
solitonic interaction. The effective area of the square-lattice PCF is always
greater than that of its triangular-lattice counterpart, making it better
suited for high-power applications. A shorter length of symmetric-core PCF is
required for dispersion compensation, while broadband dispersion compensation
is better performed with asymmetric-core PCF. Mid-infrared SCG is better
performed with asymmetric-core PCF, which gives a compressed, high-power
pulse, while a wider range of SCG can be obtained with symmetric-core PCF.
Thus, this study will be useful for designing fibers for custom applications
built around these characteristics.
Comment: 10 pages, 17 figures
On the Reduction of Computational Complexity of Deep Convolutional Neural Networks.
Deep convolutional neural networks (ConvNets), which are at the heart of many emerging applications, achieve remarkable performance in audio and visual recognition tasks. Unfortunately, achieving this accuracy often implies significant computational costs, limiting deployability. In modern ConvNets it is typical for the convolution layers to consume the vast majority of computational resources during inference. This has made the acceleration of these layers an important research area in academia and industry. In this paper, we examine the effects of co-optimizing the internal structures of the convolutional layers and the underlying implementation of the fundamental convolution operation. We demonstrate that a combination of these methods can have a big impact on the overall speedup of a ConvNet, achieving a ten-fold increase over baseline. We also introduce a new class of fast one-dimensional (1D) convolutions for ConvNets using the Toom-Cook algorithm. We show that our proposed scheme is mathematically well-grounded, robust, and does not require any time-consuming retraining, while still achieving speedups solely from convolutional layers with no loss in baseline accuracy.
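To make the idea of a fast 1D convolution concrete, here is a minimal sketch of the textbook F(2,3) Toom-Cook (Winograd) minimal-filtering step, which produces two outputs of a 3-tap filter with four multiplications instead of six. It illustrates the general technique only and is not the paper's proposed class of algorithms; the function names are hypothetical.

```python
from typing import List, Sequence


def direct_f23(d: Sequence[float], g: Sequence[float]) -> List[float]:
    """Two outputs of a 3-tap filter over 4 inputs: 6 multiplications."""
    return [d[0] * g[0] + d[1] * g[1] + d[2] * g[2],
            d[1] * g[0] + d[2] * g[1] + d[3] * g[2]]


def fast_f23(d: Sequence[float], g: Sequence[float]) -> List[float]:
    """Same two outputs using 4 multiplications (F(2,3) minimal filtering)."""
    # Filter-side terms depend only on g and can be precomputed once per filter.
    g_sum = (g[0] + g[1] + g[2]) / 2
    g_alt = (g[0] - g[1] + g[2]) / 2
    # Four element-wise products on transformed inputs.
    m1 = (d[0] - d[2]) * g[0]
    m2 = (d[1] + d[2]) * g_sum
    m3 = (d[2] - d[1]) * g_alt
    m4 = (d[1] - d[3]) * g[2]
    # Output transform: additions and subtractions only.
    return [m1 + m2 + m3, m2 - m3 - m4]


if __name__ == "__main__":
    d = [1.0, 2.0, 3.0, 4.0]
    g = [0.5, 1.0, -2.0]
    print(direct_f23(d, g), fast_f23(d, g))
```

Because only the arithmetic is reorganised and the filter weights are unchanged, no retraining is needed; in floating point the two results can differ by rounding error.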
On the effects of quantisation on model uncertainty in Bayesian neural networks
Bayesian neural networks (BNNs) are making significant progress in many research areas where decision-making needs to be accompanied by uncertainty estimation. Being able to quantify uncertainty while making decisions is essential for understanding when the model is over- or under-confident, and hence BNNs are attracting interest in safety-critical applications such as autonomous driving, healthcare, and robotics. Nevertheless, BNNs have not been as widely used in industrial practice, mainly because of their increased memory and compute costs. In this work, we investigate quantisation of BNNs by compressing 32-bit floating-point weights and activations to their integer counterparts, an approach that has already been successful in reducing the compute demand of standard pointwise neural networks. We study three types of quantised BNNs, evaluate them under a wide range of settings, and empirically demonstrate that a uniform quantisation scheme applied to BNNs does not substantially decrease the quality of their uncertainty estimation.
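For readers unfamiliar with uniform quantisation, the sketch below maps floating-point values to 8-bit integers with a single symmetric scale and then maps them back. The symmetric per-tensor scheme, the 8-bit width and the function names are assumptions for illustration; the abstract does not state which bit-widths or quantiser variants the authors evaluated, and nothing here is specific to BNNs.

```python
from typing import List, Tuple


def quantise_int8(values: List[float]) -> Tuple[List[int], float]:
    """Symmetric uniform quantisation to the integer range [-127, 127]."""
    max_abs = max((abs(v) for v in values), default=0.0)
    scale = max_abs / 127 if max_abs > 0 else 1.0  # step size between levels
    q = [max(-127, min(127, round(v / scale))) for v in values]
    return q, scale


def dequantise(q: List[int], scale: float) -> List[float]:
    """Map integers back to approximate floating-point values."""
    return [v * scale for v in q]


if __name__ == "__main__":
    weights = [0.5, -1.27, 0.003, 1.0]   # illustrative values only
    q, scale = quantise_int8(weights)
    restored = dequantise(q, scale)
    worst = max(abs(a - b) for a, b in zip(weights, restored))
    print(q, scale, worst)
```

With a uniform grid, the rounding error of each value is bounded by half the step size, which is the kind of perturbation whose effect on uncertainty estimates the paper studies.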
- …