1,301 research outputs found

    Division and Inversion Over Finite Fields

    Get PDF

    Stochastic dividers for low latency neural networks

    Get PDF
    Due to the low complexity in arithmetic unit design, stochastic computing (SC) has attracted considerable interest to implement Artificial Neural Networks (ANNs) for resources-limited applications, because ANNs must usually perform a large number of arithmetic operations. To attain a high computation accuracy in an SC-based ANN, extended stochastic logic is utilized together with standard SC units and thus, a stochastic divider is required to perform the conversion between these logic representations. However, the conventional divider incurs in a large computation latency, so limits an SC implementation for ANNs used in applications needing high performance. Therefore, there is a need to design fast stochastic dividers for SC-based ANNs. Recent works (e.g., a binary searching and triple modular redundancy (BS-TMR) based stochastic divider) are targeting a reduction in computation latency, while keeping the same accuracy compared with the traditional design. However, this divider still requires NN iterations to deal with 2N2^{N} -bit stochastic sequences, and thus the latency increases in proportion to the sequence length. In this paper, a decimal searching and TMR (DS-TMR) based stochastic divider is initially proposed to further reduce the computation latency; it only requires two iterations to calculate the quotient, so regardless of the sequence length. Moreover, a trade-off design between accuracy and hardware is also presented. An SC-based Multi-Layer Perceptron (MLP) is then considered to show the effectiveness of the proposed dividers over current designs. Results show that when utilizing the proposed dividers, the MLP achieves the lowest computation latency while keeping the same classification accuracy; although incurring in an area increase, the overhead due to the proposed dividers is low over the entire MLP. When using as combined metric for both hardware design and computation complexity the product of the implementation area, latency, power and number of clock cycles, the proposed designs are also shown to be superior to the SC-based MLPs (at the same level of accuracy) employing other dividers found in the technical literature as well as the commonly used 32-bit floating point implementation.The work of Shanshan Liu, Farzad Niknia, and Fabrizio Lombardi was supported by the NSF Grant CCF-1953961 and Grant 1812467. The work of Pedro Reviriego was supported in part by the Spanish Ministry of Science and Innovation under project ACHILLES (Grant PID2019-104207RB-I00) and the Go2Edge Network (Grant RED2018-102585-T), and in part by the Madrid Community Research Agency under Grant TAPIR-CM P2018/TCS-4496. The work of Weiqiang Liu was supported by the NSFC under Grant 62022041 and Grant 61871216. The work of Ahmed Louri was supported by the NSF Grant CCF-1812495 and Grant 1953980

    Study of Different Images in Digital image Processing to Make a Coin Recoginition System

    Get PDF
    The advanced picture preparing manages building up a computerized framework to performs tests and activities on a computerized picture with the utilization of PC calculations. A picture is just a 2D numerical capacity f(x,y) where x and y are two on a level plane and vertically co-ordinates. Money acknowledgment is a standout amongst the most essential uses of picture handling. The cash acknowledgment framework is utilized as a part of numerous situations, for example, bank, business firms, railroads, shopping centers, departmental stores, government association, and so forth. Be that as it may, acknowledgment is done significantly utilizing equipment gadget. Additionally regular man can't think that its achievable to utilize it as equipment. So there is a need to automate the human push to perceive the cash. Think about the case of a bank; it needs to perceive the section from time to time they utilize the gadget which comprise of bright light .The financier keeps the money note on the gadget and endeavor to discover whether the watermark image, serial number and some different qualities of the notes are appropriate to get the category and check its credibility. This expands crafted by the broker. Rather if the financier utilizes the framework and mechanizes his work, the outcome will be considerably more exact. Same is the situation with regions, for example, shopping centers, speculation firms where such frameworks can be utilized. So there is expected to make less demanding approach to perceive the money notes

    Approximate Computing Survey, Part I: Terminology and Software & Hardware Approximation Techniques

    Full text link
    The rapid growth of demanding applications in domains applying multimedia processing and machine learning has marked a new era for edge and cloud computing. These applications involve massive data and compute-intensive tasks, and thus, typical computing paradigms in embedded systems and data centers are stressed to meet the worldwide demand for high performance. Concurrently, the landscape of the semiconductor field in the last 15 years has constituted power as a first-class design concern. As a result, the community of computing systems is forced to find alternative design approaches to facilitate high-performance and/or power-efficient computing. Among the examined solutions, Approximate Computing has attracted an ever-increasing interest, with research works applying approximations across the entire traditional computing stack, i.e., at software, hardware, and architectural levels. Over the last decade, there is a plethora of approximation techniques in software (programs, frameworks, compilers, runtimes, languages), hardware (circuits, accelerators), and architectures (processors, memories). The current article is Part I of our comprehensive survey on Approximate Computing, and it reviews its motivation, terminology and principles, as well it classifies and presents the technical details of the state-of-the-art software and hardware approximation techniques.Comment: Under Review at ACM Computing Survey

    Parameterized Implementation of K-means Clustering on Reconfigurable Systems

    Get PDF
    Processing power of pattern classification algorithms on conventional platforms has not been able to keep up with exponentially growing datasets. However, algorithms such as k-means clustering include significant potential parallelism that could be exploited to enhance processing speed on conventional platforms. A better and effective solution to speed-up the algorithm performance is the use of a hardware assist since parallel kernels can be partitioned and concurrently run on hardware as opposed to the sequential software flow. A parameterized hardware implementation of k-means clustering is presented as a proof of concept on the Pilchard Reconfigurable computing system. The hardware implementation is shown to have speedups of about 500 over conventional implementations on a general-purpose processor. A scalability analysis is done to provide a future direction to take the current implementation of 3 classes and scale it to over N classes

    Automated Deep Lineage Tree Analysis Using a Bayesian Single Cell Tracking Approach

    Get PDF
    Single-cell methods are beginning to reveal the intrinsic heterogeneity in cell populations, arising from the interplay of deterministic and stochastic processes. However, it remains challenging to quantify single-cell behaviour from time-lapse microscopy data, owing to the difficulty of extracting reliable cell trajectories and lineage information over long time-scales and across several generations. Therefore, we developed a hybrid deep learning and Bayesian cell tracking approach to reconstruct lineage trees from live-cell microscopy data. We implemented a residual U-Net model coupled with a classification CNN to allow accurate instance segmentation of the cell nuclei. To track the cells over time and through cell divisions, we developed a Bayesian cell tracking methodology that uses input features from the images to enable the retrieval of multi-generational lineage information from a corpus of thousands of hours of live-cell imaging data. Using our approach, we extracted 20,000 + fully annotated single-cell trajectories from over 3,500 h of video footage, organised into multi-generational lineage trees spanning up to eight generations and fourth cousin distances. Benchmarking tests, including lineage tree reconstruction assessments, demonstrate that our approach yields high-fidelity results with our data, with minimal requirement for manual curation. To demonstrate the robustness of our minimally supervised cell tracking methodology, we retrieve cell cycle durations and their extended inter- and intra-generational family relationships in 5,000 + fully annotated cell lineages. We observe vanishing cycle duration correlations across ancestral relatives, yet reveal correlated cyclings between cells sharing the same generation in extended lineages. These findings expand the depth and breadth of investigated cell lineage relationships in approximately two orders of magnitude more data than in previous studies of cell cycle heritability, which were reliant on semi-manual lineage data analysis

    Power and Thermal Management of System-on-Chip

    Get PDF
    corecore