1,975 research outputs found

    Formal verification of AI software

    Get PDF
    The application of formal verification techniques to Artificial Intelligence (AI) software, particularly expert systems, is investigated. Constraint satisfaction and model inversion are identified as two formal specification paradigms for different classes of expert systems. A formal definition of consistency is developed, and the notion of approximate semantics is introduced. Examples are given of how these ideas can be applied in both declarative and imperative forms.
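    The abstract names constraint satisfaction as one formal specification paradigm for expert systems. A minimal Python sketch of that idea is given below: a toy rule base is checked for consistency against a constraint-style specification. The rules, constraints, and inputs are invented for illustration and are not taken from the paper.

```python
# Hypothetical sketch: specifying an expert system's behaviour as constraints
# and checking a rule base for consistency against that specification.
# All rules, constraints and test inputs here are invented for illustration.

def expert_system(inputs):
    """Toy rule-based system: maps sensor readings to an action."""
    if inputs["temperature"] > 100:
        return "shutdown"
    if inputs["pressure"] > 80:
        return "vent"
    return "nominal"

# Formal specification as constraints: (predicate on inputs, allowed outputs).
constraints = [
    (lambda i: i["temperature"] > 100, {"shutdown"}),
    (lambda i: i["pressure"] > 80 and i["temperature"] <= 100, {"vent", "shutdown"}),
]

def consistent(system, test_inputs):
    """A system is consistent if every output satisfies all applicable constraints."""
    for inputs in test_inputs:
        out = system(inputs)
        for pred, allowed in constraints:
            if pred(inputs) and out not in allowed:
                return False
    return True

samples = [{"temperature": 120, "pressure": 50},
           {"temperature": 90, "pressure": 85},
           {"temperature": 20, "pressure": 10}]
print(consistent(expert_system, samples))  # True for this toy rule base
```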

    Hyperdrive: A Multi-Chip Systolically Scalable Binary-Weight CNN Inference Engine

    Get PDF
    Deep neural networks have achieved impressive results in computer vision and machine learning. Unfortunately, state-of-the-art networks are extremely compute and memory intensive, which makes them unsuitable for mW-devices such as IoT end-nodes. Aggressive quantization of these networks dramatically reduces the computation and memory footprint. Binary-weight neural networks (BWNs) follow this trend, pushing weight quantization to the limit. Hardware accelerators for BWNs presented up to now have focused on core efficiency, disregarding the I/O bandwidth and system-level efficiency that are crucial for deployment of accelerators in ultra-low power devices. We present Hyperdrive: a BWN accelerator that dramatically reduces I/O bandwidth by exploiting a novel binary-weight streaming approach. It supports arbitrarily sized convolutional neural network architectures and input resolutions by exploiting the natural scalability of the compute units at both chip level and system level: Hyperdrive chips are arranged systolically in a 2D mesh and process the entire feature map together in parallel. Hyperdrive achieves 4.3 TOp/s/W system-level efficiency (i.e., including I/Os)---3.1x higher than state-of-the-art BWN accelerators, even though its core uses resource-intensive FP16 arithmetic for increased robustness.
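    Binary-weight networks constrain weights to {-1, +1}, so the multiplications in a convolution reduce to additions and subtractions of activations. The rough NumPy sketch below illustrates only that arithmetic property; the shapes and data are arbitrary assumptions and it does not reflect Hyperdrive's datapath or streaming scheme.

```python
import numpy as np

# Minimal sketch of a binary-weight convolution: weights are +1/-1, so each
# multiply-accumulate degenerates into adding or subtracting an activation.
# Shapes and data are illustrative only; this is not Hyperdrive's architecture.

def binary_weight_conv2d(x, w_sign):
    """x: (H, W) input feature map, w_sign: (k, k) weights in {-1, +1}."""
    k = w_sign.shape[0]
    H, W = x.shape
    out = np.zeros((H - k + 1, W - k + 1), dtype=x.dtype)
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            patch = x[i:i + k, j:j + k]
            # No multiplications needed: add where weight is +1, subtract where -1.
            out[i, j] = patch[w_sign > 0].sum() - patch[w_sign < 0].sum()
    return out

x = np.random.rand(8, 8).astype(np.float32)
w = np.sign(np.random.randn(3, 3)).astype(np.float32)
w[w == 0] = 1.0  # keep weights strictly in {-1, +1}
print(binary_weight_conv2d(x, w).shape)  # (6, 6)
```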

    fpgaConvNet: A framework for mapping convolutional neural networks on FPGAs

    No full text
    Convolutional Neural Networks (ConvNets) are a powerful Deep Learning model, providing state-of-the-art accuracy to many emerging classification problems. However, ConvNet classification is a computationally heavy task, suffering from rapid complexity scaling. This paper presents fpgaConvNet, a novel domain-specific modelling framework together with an automated design methodology for the mapping of ConvNets onto reconfigurable FPGA-based platforms. By interpreting ConvNet classification as a streaming application, the proposed framework employs the Synchronous Dataflow (SDF) model of computation as its basis and proposes a set of transformations on the SDF graph that explore the performance-resource design space, while taking into account platform-specific resource constraints. A comparison with existing ConvNet FPGA works shows that the proposed fully-automated methodology yields hardware designs that improve the performance density by up to 1.62× and reach up to 90.75% of the raw performance of architectures that are hand-tuned for particular ConvNets.
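    The general flavour of this kind of design-space exploration, treating the network as a streaming pipeline and picking per-stage parallelism under a resource budget, can be illustrated with the toy Python sketch below. The cost model, resource numbers and layer sizes are invented placeholders and have nothing to do with fpgaConvNet's actual SDF transformations or performance model.

```python
# Toy illustration of design-space exploration for a streaming pipeline:
# choose per-layer parallelism (unroll) factors that maximise throughput
# subject to a resource budget. The cost model below is a made-up placeholder,
# not fpgaConvNet's actual performance/resource model.
from itertools import product

layers = [            # (name, operations per input frame) -- invented numbers
    ("conv1", 20e6),
    ("conv2", 60e6),
    ("fc", 5e6),
]
DSP_BUDGET = 256       # assumed resource budget
CLOCK_HZ = 150e6       # assumed operating frequency
DSP_PER_UNIT = 4       # assumed cost of one parallel compute unit

def evaluate(unroll):
    """Throughput (frames/s) of a streaming pipeline = rate of its slowest stage."""
    dsp = sum(u * DSP_PER_UNIT for u in unroll)
    if dsp > DSP_BUDGET:
        return None                      # design point violates the budget
    cycles_per_frame = max(ops / u for (_, ops), u in zip(layers, unroll))
    return CLOCK_HZ / cycles_per_frame

best = max(
    (cfg for cfg in product([1, 2, 4, 8, 16], repeat=len(layers))),
    key=lambda cfg: evaluate(cfg) or 0.0,
)
print(best, f"{evaluate(best):.1f} fps")
```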

    An Architecture Description Language for Embedded Hardware Platforms

    Get PDF
    Embedded software development relies on various tools - compilers, simulators, execution time estimators - that encapsulate a more-or-less detailed knowledge of the target hardware platform. These tools can be costly to develop and maintain: significant benefits could be expected if they were automatically generated from models expressed in a dedicated modeling language. In contrast with Hardware Description Languages (HDLs), which focus on the internal structure and behavior of an electronic board or chip, Hardware Architecture Description Languages consider hardware as a platform for software execution. Such a platform is described in terms of a low-level programming interface (processor instruction set), resources (processing elements, memory and peripheral devices) and elementary services (arithmetic and logic operations, bus transactions). This paper gives an overview of HARMLESS (Hardware ARchitecture Modeling Language for Embedded Software Simulation), a new domain-specific language for modeling embedded hardware platforms. HARMLESS and its associated tools follow the Model-Driven Engineering philosophy: metamodeling and model transformations have been successfully applied to the automatic generation of processor simulators.
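    The idea of deriving a simulator from a hardware description can be pictured with a small Python sketch: a data-driven instruction-set table is turned into an interpreter. The three-instruction mini-ISA below is invented for illustration and bears no relation to HARMLESS's actual syntax, metamodel or generation flow.

```python
# Toy illustration of generating a simulator from an architecture description.
# The three-instruction ISA below is invented; HARMLESS's real language and
# model transformations are far richer than this sketch.

isa = {
    # opcode: (mnemonic, semantics as a function of (registers, operand_a, operand_b))
    0x01: ("LOAD_IMM", lambda regs, a, b: regs.__setitem__(a, b)),
    0x02: ("ADD",      lambda regs, a, b: regs.__setitem__(a, regs[a] + regs[b])),
    0x03: ("HALT",     None),
}

def make_simulator(isa_table, num_regs=4):
    """'Generate' an interpreter specialised to the given ISA description."""
    def run(program):
        regs = [0] * num_regs
        pc = 0
        while pc < len(program):
            opcode, a, b = program[pc]
            mnemonic, semantics = isa_table[opcode]
            if semantics is None:          # HALT
                break
            semantics(regs, a, b)
            pc += 1
        return regs
    return run

sim = make_simulator(isa)
# r0 = 5; r1 = 7; r0 = r0 + r1; halt
program = [(0x01, 0, 5), (0x01, 1, 7), (0x02, 0, 1), (0x03, 0, 0)]
print(sim(program))  # [12, 7, 0, 0]
```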

    A decision support tool for the order promising process with product homogeneity requirements in hybrid Make-To-Stock and Make-To-Order environments. Application to a ceramic tile company

    Full text link
    Order promising in manufacturing systems that produce non-uniform units of the same finished good becomes a more complex process when customer orders need to be served with homogeneous units. To facilitate this task, we propose a mathematical model-based decision tool to support the order promising process according to product homogeneity requirements in hybrid Make-To-Stock (MTS) and Make-To-Order (MTO) contexts. In these manufacturing environments, the comparison of Available-To-Promise (ATP) and/or Capable-To-Promise (CTP) quantities with the homogeneous quantities ordered by customers is necessary during order commitment. To properly deal with customers' product uniformity requirements, different ATP consumption rules are implemented by defining a novel objective function. CTP modelling in these systems also entails addressing new aspects, such as estimating future homogeneous quantities in lots additional to the master plan, meeting minimum lot sizes and saving on setups when programming new lots. By including CTP in the order promising model, a closer integration with the master production schedule is achieved. The resulting mathematical model was applied to a ceramic tile company in different supply scenarios and execution modes, and at several availability levels (ATP and ATP&CTP). The results validate model performance and provide insights into the impact of ATP consumption rules on the profits made from committed customer orders in different scenarios for the specific ceramic tile company. This work was supported by the Spanish Ministry of Economy and Competitiveness with Grant DPI2011-23597 and the Universitat Politècnica de València with Grant Ref. PAID-06-11/1840. Alemany Díaz, M. D. M.; Ortiz Bas, Á.; Fuertes-Miquel, V. S. (2018). A decision support tool for the order promising process with product homogeneity requirements in hybrid Make-To-Stock and Make-To-Order environments. Application to a ceramic tile company. Computers & Industrial Engineering, 122, 219-234. https://doi.org/10.1016/j.cie.2018.05.040
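    The core of ATP consumption under homogeneity requirements is that an order must be filled from lots that are uniform with each other. The small Python sketch below illustrates one such consumption rule (largest homogeneous group first) as a toy greedy stand-in; the lot data, homogeneity key and rule are assumptions for illustration, not the paper's mathematical programming model.

```python
# Illustrative sketch of an ATP consumption rule under homogeneity requirements:
# an order must be served entirely from lots sharing the same homogeneity key
# (e.g. the same tone/caliber in ceramic tiles). This greedy rule is a toy
# stand-in for the paper's optimisation model.
from collections import defaultdict

# Available-To-Promise lots: (lot_id, homogeneity_key, quantity). Invented data.
atp_lots = [
    ("L1", "toneA", 120),
    ("L2", "toneA", 60),
    ("L3", "toneB", 200),
    ("L4", "toneB", 40),
]

def promise(order_qty):
    """Return the homogeneity group and lots consumed for the order, or None
    if no single homogeneous group can cover it."""
    groups = defaultdict(list)
    for lot_id, key, qty in atp_lots:
        groups[key].append((lot_id, qty))
    # Prefer the homogeneity group with the largest total ATP that covers the order.
    for key, lots in sorted(groups.items(),
                            key=lambda kv: -sum(q for _, q in kv[1])):
        if sum(q for _, q in lots) >= order_qty:
            consumed, remaining = [], order_qty
            for lot_id, qty in sorted(lots, key=lambda lq: -lq[1]):
                take = min(qty, remaining)
                consumed.append((lot_id, take))
                remaining -= take
                if remaining == 0:
                    return key, consumed
    return None

print(promise(150))  # ('toneB', [('L3', 150)])
print(promise(300))  # None: no single homogeneous group can cover 300 units
```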

    Spatial and temporal data parallelization of the H.261 video coding algorithm

    Get PDF
    In this paper, the parallelization of the H.261 video coding algorithm on the IBM SP2 multiprocessor system is described. The effect of parallelizing computations and communications in the spatial, temporal, and combined spatial-temporal domains is considered through the study of frame rate, speedup, and implementation efficiency, which are modeled and measured with respect to the number of nodes (n) and the parallel methods used. Four parallel algorithms were developed: the first two exploit the spatial parallelism in each frame, and the last two exploit both the temporal and spatial parallelism over a sequence of frames. The two spatial algorithms differ in that one utilizes a single communication master, while the other distributes communications across three masters. On the other hand, the spatial-temporal algorithms use a pipeline structure to exploit the temporal parallelism together with either a single master or multiple masters. The best median speedup (frame rate) achieved was close to 15 [15 frames per second (fps)] for 352 × 240 video on 24 nodes, and 13 (37 fps) for QCIF video, by the spatial algorithm with distributed communications. For n ≤ 10, efficiency reached up to 70%. The spatial-temporal algorithms achieved average speedup performance, but are the most scalable for large n.
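    The spatial algorithms split each frame into regions that worker nodes process independently while a master gathers the results. The minimal multiprocessing sketch below illustrates only that master/worker frame partitioning; the "encode" step is a placeholder, and nothing here reproduces the actual H.261 coder or the SP2 message-passing implementation.

```python
# Minimal sketch of spatial data parallelism for video coding: a master splits
# each frame into horizontal strips and workers process them independently.
# The per-strip "encode" step is a placeholder; the real H.261 stages (DCT,
# quantisation, motion estimation) and the SP2 communication layer are omitted.
import numpy as np
from multiprocessing import Pool

def encode_strip(strip):
    """Placeholder per-strip work standing in for H.261 macroblock coding."""
    return float(np.abs(np.fft.fft2(strip)).sum())  # arbitrary stand-in cost

def encode_frame_parallel(frame, num_workers=4):
    strips = np.array_split(frame, num_workers, axis=0)   # spatial partition
    with Pool(num_workers) as pool:
        results = pool.map(encode_strip, strips)           # workers in parallel
    return results                                         # master gathers results

if __name__ == "__main__":
    frame = np.random.rand(240, 352)   # one 352x240 luminance frame
    print(encode_frame_parallel(frame))
```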

    Highly parallel HEVC decoding for heterogeneous systems with CPU and GPU

    Get PDF
    The High Efficiency Video Coding (HEVC) standard provides higher compression efficiency than other video coding standards, but at the cost of an increased computational load, which makes it hard to achieve real-time encoding/decoding for ultra-high-resolution and high-quality video sequences. Graphics Processing Units (GPUs) are known to provide massive processing capability for highly parallel and regular computing kernels, but not all HEVC decoding procedures are suited for GPU execution. Furthermore, if HEVC decoding is accelerated by GPUs, energy efficiency becomes another concern for heterogeneous CPU+GPU decoding. In this paper, a highly parallel HEVC decoder for heterogeneous CPU+GPU systems is proposed. It exploits the available parallelism in HEVC decoding on the CPU, on the GPU, and between the CPU and GPU devices simultaneously. On top of that, different workload balancing schemes can be selected according to the devoted CPU and GPU computing resources. Furthermore, an energy-optimized solution is proposed by tuning GPU clock rates. Results show that the proposed decoder achieves better performance than the state-of-the-art CPU decoder, and that the best-performing workload balancing scheme depends on the available CPU and GPU computing resources. In particular, with an NVIDIA Titan X Maxwell GPU and an Intel Xeon E5-2699v3 CPU, the proposed decoder delivers 167 frames per second (fps) for Ultra HD 4K videos when four CPU cores are used. Compared to the state-of-the-art CPU decoder using four CPU cores, the proposed decoder gains a speedup factor of . When decoding performance is bounded by the CPU, a system-wise energy reduction of up to 36% is achieved by using fixed (and lower) GPU clocks, compared to the default dynamic clock settings on the GPU.
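    One simple way to picture a CPU+GPU workload balancing scheme is to split each frame's work in proportion to the measured throughput of each device. The Python sketch below illustrates only that proportional-split idea; the throughput numbers, the CTU-row split and the feedback update are invented, and no actual HEVC decoding or GPU code is involved.

```python
# Toy sketch of throughput-proportional workload balancing between a CPU and a
# GPU: each device gets a share of the frame's coding-tree-unit (CTU) rows
# proportional to its measured decoding rate. Numbers are invented; this is not
# the paper's balancing scheme and contains no real HEVC or GPU code.

def split_rows(total_rows, cpu_rate, gpu_rate):
    """Return (cpu_rows, gpu_rows) proportional to the devices' throughputs."""
    gpu_rows = round(total_rows * gpu_rate / (cpu_rate + gpu_rate))
    return total_rows - gpu_rows, gpu_rows

def rebalance(rate, rows_done, time_taken):
    """Update a device's measured rate from the last frame (simple feedback)."""
    return 0.5 * rate + 0.5 * (rows_done / time_taken)

# A 2160p frame with 64x64 CTUs has 34 CTU rows; assumed rates in rows/ms.
cpu_rate, gpu_rate = 1.2, 3.8
cpu_rows, gpu_rows = split_rows(34, cpu_rate, gpu_rate)
print(cpu_rows, gpu_rows)  # 8 CTU rows on the CPU, 26 on the GPU

# After decoding a frame, feed the observed timings back into the rates.
cpu_rate = rebalance(cpu_rate, rows_done=cpu_rows, time_taken=7.1)
gpu_rate = rebalance(gpu_rate, rows_done=gpu_rows, time_taken=6.8)
print(round(cpu_rate, 2), round(gpu_rate, 2))
```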