3,727 research outputs found

    Real-time portable system for fabric defect detection using an ARM processor

    Get PDF
    Modern textile industry seeks to produce textiles as little defective as possible since the presence of defects can decrease the final price of products from 45% to 65%. Automated visual inspection (AVI) systems, based on image analysis, have become an important alternative for replacing traditional inspections methods that involve human tasks. An AVI system gives the advantage of repeatability when implemented within defined constrains, offering more objective and reliable results for particular tasks than human inspection. Costs of automated inspection systems development can be reduced using modular solutions with embedded systems, in which an important advantage is the low energy consumption. Among the possibilities for developing embedded systems, the ARM processor has been explored for acquisition, monitoring and simple signal processing tasks. In a recent approach we have explored the use of the ARM processor for defects detection by implementing the wavelet transform. However, the computation speed of the preprocessing was not yet sufficient for real time applications. In this approach we significantly improve the preprocessing speed of the algorithm, by optimizing matrix operations, such that it is adequate for a real time application. The system was tested for defect detection using different defect types. The paper is focused in giving a detailed description of the basis of the algorithm implementation, such that other algorithms may use of the ARM operations for fast implementations

    Fabric defect detection using the wavelet transform in an ARM processor

    Get PDF
    Small devices used in our day life are constructed with powerful architectures that can be used for industrial applications when requiring portability and communication facilities. We present in this paper an example of the use of an embedded system, the Zeus epic 520 single board computer, for defect detection in textiles using image processing. We implement the Haar wavelet transform using the embedded visual C++ 4.0 compiler for Windows CE 5. The algorithm was tested for defect detection using images of fabrics with five types of defects. An average of 95% in terms of correct defect detection was obtained, achieving a similar performance than using processors with float point arithmetic calculations

    Fuzzy memoization for floating-point multimedia applications

    Get PDF
    Instruction memoization is a promising technique to reduce the power consumption and increase the performance of future low-end/mobile multimedia systems. Power and performance efficiency can be improved by reusing instances of an already executed operation. Unfortunately, this technique may not always be worth the effort due to the power consumption and area impact of the tables required to leverage an adequate level of reuse. In this paper, we introduce and evaluate a novel way of understanding multimedia floating-point operations based on the fuzzy computation paradigm: performance and power consumption can be improved at the cost of small precision losses in computation. By exploiting this implicit characteristic of multimedia applications, we propose a new technique called tolerant memoization. This technique expands the capabilities of classic memoization by associating entries with similar inputs to the same output. We evaluate this new technique by measuring the effect of tolerant memoization for floating-point operations in a low-power multimedia processor and discuss the trade-offs between performance and quality of the media outputs. We report energy improvements of 12 percent for a set of key multimedia applications with small LUT of 6 Kbytes, compared to 3 percent obtained using previously proposed techniques.Peer ReviewedPostprint (published version

    Leveraging register windows to reduce physical registers to the bare minimum

    Get PDF
    Register window is an architectural technique that reduces memory operations required to save and restore registers across procedure calls. Its effectiveness depends on the size of the register file. Such register requirements are normally increased for out-of-order execution because it requires registers for the in-flight instructions, in addition to the architectural ones. However, a large register file has an important cost in terms of area and power and may even affect the cycle time. In this paper, we propose a software/hardware early register release technique that leverage register windows to drastically reduce the register requirements, and hence, reduce the register file cost. Contrary to the common belief that out-of-order processors with register windows would need a large physical register file, this paper shows that the physical register file size may be reduced to the bare minimum by using this novel microarchitecture. Moreover, our proposal has much lower hardware complexity than previous approaches, and requires minimal changes to a conventional register window scheme. Performance studies show that the proposed technique can reduce the number of physical registers to the number of logical registers plus one (minimum number to guarantee forward progress) and still achieve almost the same performance as an unbounded register file.Peer ReviewedPostprint (published version

    Early register release for out-of-order processors with register windows

    Get PDF
    Register windows is an architectural technique that reduces memory operations required to save and restore registers across procedure calls. Its effectiveness depends on the size of the register file. Such register requirements are normally increased for out-of-order execution because it requires registers for the in-flight instructions, in addition to the architectural ones. However, a large register file has an important cost in terms of area and power and may even affect the cycle time. In this paper we propose two early register release techniques that leverages register windows to drastically reduce the register requirements, and hence reduce the register file cost. Contrary to the common belief that out-of-order processors with register windows would need a large physical register file, this paper shows that the physical register file size may be reduced to the bare minimum by using this novel microarchitecture. Moreover, our proposal has much lower hardware complexity than previous approaches, and requires minimal changes to a conventional register window scheme. Performance studies show that the proposed technique can reduce the number of physical registers to the same number as logical registers plus one (minimum number to guarantee forward progress) and still achieve almost the same performance as an unbounded register file.Peer ReviewedPostprint (published version

    Micro-threading and FPGA implementation of a RISC microprocessor : a thesis presented in partial fulfilment of the requirements for the degree of Master of Science in Computer Science at Massey University, Palmerston North, New Zealand

    Get PDF
    Appendix E removed due to copyright restrictions. Articles are available in the print copy held in the libraryThis thesis is the outcome of research in two areas of computer technology: microprocessor and multi-processor architectures (specifically from the perspective of how differently they tolerate highly-latent and non-deterministic events), and hardware design of complex digital systems containing both datapath and control (particularly microprocessors). This thesis starts by pointing out that in order to achieve high processing speeds, current popular superscalar microprocessors (e.g. Intel Pentiums, Digital Alpha, etc) rely heavily on the technique of speculating the outcome of instruction flow in order to predict the behaviour of non-deterministic computing operations (as in loading operands from high-latency memory into the processor). This is fine only if the speculation is correct. But, what if it isn't? If the speculation fails, this would mean that the processor has to abandon its current decision (which now proved to be the wrong one) for the instruction flow path taken and to start all over again with the other path (the actual correct one). This is a waste of valuable processing time and hardware resources and a reduction of performance when speculation fails. Therefore, these processors can achieve high performance only when the majority of speculations are successful (being able to predict the right path). In an attempt to overcome the above shortcomings, the first part of this thesis is an investigation of the novel vector micro-threading architecture as an alternative approach to the current superscalar-based speculative microprocessor designs. Micro-threading is based on the not-so-novel multithreading technique, which avoids speculation altogether and instead, starts running a different thread of instructions while waiting for the non-determinism to be resolved. This utilizes the chip resources more efficiently without waste of any processing power. The rest of this thesis focuses on the baseline RISC processor platform, the MIPS R2000, which is reviewed first then partially synthesized from the RTL (Register Transfer Level) description using VHDL and then simulated and tested. This is conducted in order for future research to build upon and add the micro-threading architectural add-ons and modifications. Keywords: Micro-threading, Latency Tolerance, FPGA Synthesis, RISC Architecture, MIPS R2000 processor, VHDL

    Design of an Adaptable Run-Time Reconfigurable Software-Defined Radio Processing Architecture

    Get PDF
    Processing power is a key technical challenge holding back the development of a high-performance software defined radio (SDR). Traditionally, SDR has utilized digital signal processors (DSPs), but increasingly complex algorithms, higher data rates, and multi-tasking needs have exceed the processing capabilities of modern DSPs. Reconfigurable computers, such as field-programmable gate arrays (FPGAs), are popular alternatives because of their performance gains over software for streaming data applications like SDR. However, FPGAs have not yet realized the ideal SDR because architectures have not fully utilized their partial reconfiguration (PR) capabilities to bring needed flexibility. A reconfigurable processor architecture is proposed that utilizes PR in reconfigurable computers to achieve a more sophisticated SDR. The proposed processor contains run-time swappable blocks whose parameters and interconnects are programmable. The architecture is analyzed for performance and flexibility and compared with available alternate technologies. For a sample QPSK algorithm, hardware performance gains of at least 44x are seen over modern desktop processors and DSPs while most of their flexibility and extensibility is maintained
    corecore