
    An Algorithm for Integrated Subsystem Embodiment and System Synthesis

    Consider the statement, 'A system has two coupled subsystems, one of which dominates the design process. Each subsystem consists of discrete and continuous variables, and is solved using sequential analysis and solution.' To address this type of statement in the design of complex systems, three steps are required: the embodiment of the statement in terms of entities on a computer, the mathematical formulation of subsystem models, and the resulting solution and system synthesis. In complex system decomposition, the subsystems are not isolated, self-supporting entities. Information such as constraints, goals, and design variables may be shared between entities. In many engineering problems, however, full communication and cooperation do not exist, information is incomplete, or one subsystem may dominate the design. Additionally, these problems give rise to mathematical models involving nonlinear functions of both discrete and continuous design variables. In this dissertation an algorithm is developed to handle such scenarios for the domain-independent integration of subsystem embodiment, coordination, and system synthesis, using constructs from Decision-Based Design, Game Theory, and Multidisciplinary Design Optimization. The concept is implemented and the hypotheses are tested using example problems and a motivating case study involving the design of a subsonic passenger aircraft.
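
    The dominant-subsystem scenario described above can be illustrated, in a hedged way, as a leader-follower (Stackelberg) loop: the dominant subsystem fixes its variables first while anticipating the weaker subsystem's best response, and a shared discrete variable is enumerated. The sketch below is a minimal invented example, not the dissertation's algorithm; the objectives, bounds, and discrete variable are assumptions for illustration only.

```python
# Hypothetical leader-follower (Stackelberg) sketch of two coupled
# subsystems, one dominant, with a shared discrete variable d.
# Objectives and bounds are invented; not the dissertation's models.
from scipy.optimize import minimize_scalar

def follower_response(x_leader, d):
    """Weaker subsystem: best continuous response to the leader's choice."""
    res = minimize_scalar(lambda y: (y - x_leader) ** 2 + d * y,
                          bounds=(0.0, 10.0), method="bounded")
    return res.x

def leader_objective(x, d):
    """Dominant subsystem's cost, accounting for the follower's response."""
    y = follower_response(x, d)
    return (x - 3.0) ** 2 + 0.5 * (x - y) ** 2 + d

best = None
for d in (0, 1):  # sequential solution: enumerate the discrete variable
    res = minimize_scalar(lambda x: leader_objective(x, d),
                          bounds=(0.0, 10.0), method="bounded")
    if best is None or res.fun < best[0]:
        best = (res.fun, res.x, d)

print("leader x = %.3f, discrete d = %d, cost = %.3f" % (best[1], best[2], best[0]))
```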

    Exploring Processor and Memory Architectures for Multimedia

    Multimedia has become one of the cornerstones of our 21st-century society and, when combined with mobility, has enabled a tremendous evolution of that society. However, joining these two concepts introduces many technical challenges, ranging from having sufficient performance for handling multimedia content to having the battery stamina for acceptable mobile usage. Projecting where we are heading, these issues become ever more challenging with increased mobility as well as advancements in multimedia content, such as the introduction of stereoscopic 3D and augmented reality. The increased performance needed for handling multimedia comes not only from an ongoing step-up in resolution, from QVGA (320x240) to Full HD (1920x1080), a 27x increase in less than half a decade, but also from codec evolution (MPEG-2 to H.264 AVC), which further increases the computational load. To meet these performance challenges there have been processing and memory architecture advances (SIMD, out-of-order superscalarity, multicore processing, and heterogeneous multilevel memories) in the mobile domain, in conjunction with ever-increasing operating frequencies (200MHz to 2GHz) and on-chip memory sizes (128KB to 2-3MB). At the same time, the requirements for mobility have grown, placing higher demands on battery-powered systems despite the steady increase in battery capacity (500 to 2000mAh). This leaves a negative net result in terms of battery capacity versus performance advances. To make optimal use of these architectural advances and to meet the power limitations of mobile systems, an overall approach to utilizing these systems is needed: the right trade-off between performance and power is crucial, the flexibility aspects of the system must be addressed, and the right architectural balance must be reached. The first goal of this thesis is to examine multimedia applications and propose a flexible solution that can meet the architectural requirements of a mobile system. The second is to propose an automated methodology for optimally mapping multimedia data and instructions to a heterogeneous multilevel memory subsystem. The proposed methodology uses constraint programming to solve a multidimensional optimization problem. Results from this work indicate that today's most advanced mobile processor technology, together with a multilevel heterogeneous on-chip memory subsystem, can meet the performance requirements for handling multimedia. The automated optimal memory-mapping method presented in this thesis lowers total power consumption while improving performance for multimedia applications, through reduced external accesses and better reuse of memory objects. The method shows high accuracy, up to 90%, in predicting multimedia memory accesses for a given architecture.
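
    To make the constraint-programming formulation concrete, here is a hedged sketch that maps a few memory objects to heterogeneous on-chip memory levels, minimizing access energy under capacity constraints. It uses Google OR-Tools CP-SAT; the object sizes, access counts, and per-access energies are invented assumptions, and this is not the thesis's actual model.

```python
# Hypothetical constraint-programming sketch of mapping memory objects
# to a heterogeneous multilevel memory subsystem (OR-Tools CP-SAT).
# Sizes, access counts, and energy costs are invented for illustration.
from ortools.sat.python import cp_model

objects = {"luma_buf": (64, 900), "chroma_buf": (32, 450),
           "mv_table": (16, 300), "code_seg": (48, 700)}   # name: (KB, accesses)
memories = {"L1": (64, 1), "L2": (256, 4), "ext_DRAM": (10**6, 20)}  # name: (KB, energy/access)

model = cp_model.CpModel()
assign = {}  # assign[o, m] == 1 iff object o is placed in memory m
for o in objects:
    for m in memories:
        assign[o, m] = model.NewBoolVar(f"{o}_in_{m}")
    # each object is placed in exactly one memory level
    model.Add(sum(assign[o, m] for m in memories) == 1)

# capacity constraint per memory level
for m, (cap, _) in memories.items():
    model.Add(sum(objects[o][0] * assign[o, m] for o in objects) <= cap)

# minimize total access energy (accesses * energy-per-access)
model.Minimize(sum(objects[o][1] * memories[m][1] * assign[o, m]
                   for o in objects for m in memories))

solver = cp_model.CpSolver()
if solver.Solve(model) in (cp_model.OPTIMAL, cp_model.FEASIBLE):
    for o in objects:
        for m in memories:
            if solver.Value(assign[o, m]):
                print(f"{o} -> {m}")
```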

    Intrinsically Evolvable Artificial Neural Networks

    Dedicated hardware implementations of neural networks promise faster, lower-power operation compared to software implementations executing on processors. Unfortunately, most custom hardware implementations do not support intrinsic training of these networks on-chip. The training is typically done using offline software simulations, and the obtained network is synthesized and targeted to the hardware offline. The FPGA design presented here facilitates on-chip intrinsic training of artificial neural networks. Block-based neural networks (BbNN), the type of artificial neural networks implemented here, are grid-based networks of neuron blocks. These networks are trained using genetic algorithms to simultaneously optimize the network structure and the internal synaptic parameters. The design supports online structure and parameter updates, and is an intrinsically evolvable BbNN platform supporting functional-level hardware evolution. Functional-level evolvable hardware (EHW) uses evolutionary algorithms to evolve interconnections and internal parameters of functional modules in reconfigurable computing systems such as FPGAs. Functional modules can be any hardware modules, such as multipliers, adders, and trigonometric functions. In the implementation presented, the functional module is a neuron block. The designed platform is suitable for applications in dynamic environments, and can be adapted and retrained online. The online training capability has been demonstrated using a case study. A performance characterization model for reconfigurable-computing implementations of BbNNs has also been presented.
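
    The genetic-algorithm training loop described above, which jointly evolves structure and synaptic parameters, can be sketched in a few lines. The following is a hypothetical, minimal illustration of evolving a population of network genomes (connection bits plus weights) against a toy fitness function; the genome layout and fitness are invented, and this is not the dissertation's BbNN implementation.

```python
# Hypothetical GA sketch that jointly evolves structure (connection
# bits) and synaptic parameters, in the spirit of BbNN training.
# Genome layout and fitness function are invented for illustration.
import random

N_GENES, POP, GENS = 16, 30, 50  # same length for structure and weights

def random_genome():
    structure = [random.randint(0, 1) for _ in range(N_GENES)]
    weights = [random.uniform(-1, 1) for _ in range(N_GENES)]
    return structure, weights

def fitness(genome):
    structure, weights = genome
    # placeholder: reward active connections whose weights are near 0.5
    return -sum(s * (w - 0.5) ** 2 for s, w in zip(structure, weights))

def mutate(genome, p=0.05):
    structure, weights = genome
    structure = [1 - b if random.random() < p else b for b in structure]
    weights = [w + random.gauss(0, 0.1) if random.random() < p else w
               for w in weights]
    return structure, weights

def crossover(a, b):
    cut = random.randrange(1, N_GENES)  # one-point crossover on both parts
    return (a[0][:cut] + b[0][cut:], a[1][:cut] + b[1][cut:])

pop = [random_genome() for _ in range(POP)]
for _ in range(GENS):
    pop.sort(key=fitness, reverse=True)
    elite = pop[: POP // 2]  # truncation selection
    children = [mutate(crossover(random.choice(elite), random.choice(elite)))
                for _ in range(POP - len(elite))]
    pop = elite + children

print("best fitness:", fitness(max(pop, key=fitness)))
```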

    Survey on Combinatorial Register Allocation and Instruction Scheduling

    Register allocation (mapping variables to processor registers or memory) and instruction scheduling (reordering instructions to increase instruction-level parallelism) are essential tasks for generating efficient assembly code in a compiler. Over the last three decades, combinatorial optimization has emerged as an alternative to traditional, heuristic algorithms for these two tasks. Combinatorial optimization approaches can deliver optimal solutions according to a model, can precisely capture trade-offs between conflicting decisions, and are more flexible, at the expense of increased compilation time. This paper provides an exhaustive literature review and a classification of combinatorial optimization approaches to register allocation and instruction scheduling, with a focus on the techniques most frequently applied in this context: integer programming, constraint programming, partitioned Boolean quadratic programming, and enumeration. Researchers in compilers and combinatorial optimization can benefit from identifying developments, trends, and challenges in the area; compiler practitioners may discern opportunities and grasp the potential benefit of applying combinatorial optimization.
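
    To give a flavor of these combinatorial formulations, here is a hedged toy sketch of the graph-coloring core of register allocation as a constraint program: interfering live ranges must occupy different registers, and the number of registers used is minimized. The interference graph is an invented assumption, and real formulations surveyed in the paper also model spilling, coalescing, and live-range splitting.

```python
# Hypothetical toy of the graph-coloring core of register allocation,
# expressed as a constraint program (OR-Tools CP-SAT). The interference
# graph is invented for illustration.
from ortools.sat.python import cp_model

variables = ["a", "b", "c", "d", "e"]
interference = [("a", "b"), ("a", "c"), ("b", "c"), ("c", "d"), ("d", "e")]
NUM_REGS = 4  # upper bound on available registers

model = cp_model.CpModel()
reg = {v: model.NewIntVar(0, NUM_REGS - 1, f"reg_{v}") for v in variables}

# interfering live ranges must occupy different registers
for u, v in interference:
    model.Add(reg[u] != reg[v])

# minimize the highest register index used
highest = model.NewIntVar(0, NUM_REGS - 1, "highest")
model.AddMaxEquality(highest, list(reg.values()))
model.Minimize(highest)

solver = cp_model.CpSolver()
if solver.Solve(model) == cp_model.OPTIMAL:
    for v in variables:
        print(f"{v} -> r{solver.Value(reg[v])}")
```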

    On Design and Implementation of Generic Fuzzy Logic Controllers

    Soft computing techniques, unlike traditional deterministic logic-based computing techniques (sometimes called hard computing), are tolerant of imprecision, uncertainty, and approximation. The primary inspiration for soft computing is the human mind and its ability to address day-to-day problems. The primary constituents of soft computing are Artificial Neural Networks, Fuzzy Logic Systems, and Evolutionary Computing. This thesis presents the design and implementation of a generic hardware architecture for a Type-I Mamdani fuzzy logic controller (FLC), implemented on a programmable device that can be remotely configured in real time over Ethernet. This reconfigurability is a feature added to existing FLCs in the literature: it enables users to change the parameters that drive the FLC in real time and eliminates repeated hardware programming whenever a change is needed. Realizing these systems in real time is difficult, as the computational complexity increases exponentially with the number of inputs. The challenge therefore lies in reducing the Rulebase significantly, so that the inference and throughput times are acceptable for real-time applications. To achieve these objectives, a modified thresholded fired rules hypercube (MT-FRHC) algorithm for Rulebase reduction is proposed and implemented. MT-FRHC reduces the useful rules without compromising system accuracy and improves the cycle time in terms of fuzzy logic operations per second (FzLOPS). With over sixty reconfigurable parameters, managing them manually becomes an arduous task; therefore, a genetic-algorithm-based parameter extraction technique is proposed. It provides coarse tuning and default parameters that users can later fine-tune remotely through the web-based user interface. A hardware-software codesign architecture for the FLC is developed on TI C6748 DSP hardware with the SYS/BIOS RTOS and seamlessly integrated with a web-based user interface (WebUI) for reconfigurability. Fuzzy systems employ a defuzzifier to convert the fuzzy output into a real-world crisp output. The Centroid of Area (CoA) method is the most widely used defuzzification method for control applications; however, the prevalent method of CoA computation is based on the Riemann sum, which is computationally complex. A vertices-based CoA (VBCoA) defuzzification method is therefore introduced, and it is observed to be faster than the Riemann-sum-based CoA computation. A code optimization technique exclusive to TI DSPs is implemented to achieve memory and machine-cycle optimization. The WebUI is developed in accordance with a client-server model using ASP.NET: it acquires fuzzy parameters from users, and a server application handles data communication between the hardware and the server. Testing and analysis of the hardware G-FLCS are carried out using hardware-in-the-loop tests controlling various system models in the Simulink environment, including water-level control in a two-tank system, an intelligent cruise control system, speed control of an armature-controlled DC motor, and anti-windup control. The performance of the proposed G-FLCS is compared with the Fuzzy Inference System of the Matlab Fuzzy Logic Toolbox and a PID controller in terms of settling time, transient time, and steady-state error.
    This MT-FRHC-based G-FLCS with VBCoA defuzzification, implemented on the C6748 DSP, was finally deployed to control the radial position of the plasma in the Aditya Tokamak fusion reactor. The proposed G-FLCS is observed to deliver a smooth and fast system response.
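
    For context on the defuzzification step discussed above, below is a hedged sketch of the standard Riemann-sum Centroid of Area computation over a sampled output membership function. The triangular membership function is an invented assumption, and the thesis's VBCoA method (which works from vertices rather than samples) is not reproduced here.

```python
# Hypothetical sketch of Riemann-sum Centroid of Area (CoA)
# defuzzification over a sampled aggregated output membership function.
# The triangular membership function is invented for illustration.

def mu(x):
    """Aggregated output membership: a triangle peaking at x = 5."""
    return max(0.0, 1.0 - abs(x - 5.0) / 3.0)

def coa_riemann(lo, hi, n=1000):
    """Approximate CoA = integral(x*mu(x)) / integral(mu(x)) by Riemann sum."""
    dx = (hi - lo) / n
    num = den = 0.0
    for i in range(n):
        x = lo + (i + 0.5) * dx  # midpoint rule
        m = mu(x)
        num += x * m * dx
        den += m * dx
    return num / den if den else 0.0

print("crisp output:", coa_riemann(0.0, 10.0))  # ~5.0 for this symmetric triangle
```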

    GPT4AIGChip: Towards Next-Generation AI Accelerator Design Automation via Large Language Models

    The remarkable capabilities and intricate nature of Artificial Intelligence (AI) have dramatically escalated the need for specialized AI accelerators. Nonetheless, designing these accelerators for various AI workloads remains both labor- and time-intensive. While existing design exploration and automation tools can partially alleviate the need for extensive human involvement, they still demand substantial hardware expertise, posing a barrier to non-experts and stifling AI accelerator development. Motivated by the astonishing potential of large language models (LLMs) for generating high-quality content in response to human language instructions, this work examines the possibility of harnessing LLMs to automate AI accelerator design. Specifically, we first perform an in-depth investigation into LLMs' limitations and capabilities for AI accelerator design, aiding our understanding of the current state of the art and yielding insights into LLM-powered automated AI accelerator design. Drawing on these insights, we develop GPT4AIGChip, a framework intended to democratize AI accelerator design by using natural human language instead of domain-specific languages; it features an automated demo-augmented prompt-generation pipeline that uses in-context learning to guide LLMs toward creating high-quality AI accelerator designs. To our knowledge, this work is the first to demonstrate an effective pipeline for LLM-powered automated AI accelerator generation. Accordingly, we anticipate that our insights and framework can serve as a catalyst for innovations in next-generation LLM-powered design automation tools.
    Comment: Accepted by ICCAD 202
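
    The demo-augmented prompt-generation idea can be illustrated with a hedged sketch: retrieve the most relevant design demos for a task description and assemble them into an in-context-learning prompt. The demo library, the word-overlap retriever, and the `call_llm` stub below are all invented stand-ins; this is not GPT4AIGChip's actual pipeline.

```python
# Hypothetical sketch of demo-augmented prompt generation for
# in-context learning. The demo library, the toy retriever, and
# call_llm are invented stand-ins, not GPT4AIGChip's pipeline.

DEMO_LIBRARY = [
    {"task": "systolic array matmul accelerator",
     "code": "// HLS demo: 8x8 systolic array ..."},
    {"task": "depthwise conv dataflow engine",
     "code": "// HLS demo: line-buffered depthwise conv ..."},
    {"task": "int8 quantized GEMM with double buffering",
     "code": "// HLS demo: ping-pong buffered int8 GEMM ..."},
]

def score(task, demo):
    """Toy relevance: word overlap between the request and a demo's task."""
    a, b = set(task.lower().split()), set(demo["task"].lower().split())
    return len(a & b)

def build_prompt(task, k=2):
    """Assemble the k most relevant demos into an in-context prompt."""
    demos = sorted(DEMO_LIBRARY, key=lambda d: score(task, d), reverse=True)[:k]
    parts = ["You are an AI accelerator design assistant."]
    for d in demos:  # in-context examples guide the LLM's output format
        parts.append(f"### Example task: {d['task']}\n{d['code']}")
    parts.append(f"### New task: {task}\nGenerate the accelerator design:")
    return "\n\n".join(parts)

def call_llm(prompt):
    """Placeholder for an LLM API call; returns the prompt for inspection."""
    return prompt

print(call_llm(build_prompt("int8 GEMM accelerator with double buffering")))
```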