542 research outputs found
An Algorithm for Integrated Subsystem Embodiment and System Synthesis
Consider the statement,'A system has two coupled subsystems, one of which dominates the design process. Each subsystem consists of discrete and continuous variables, and is solved using sequential analysis and solution.' To address this type of statement in the design of complex systems, three steps are required, namely, the embodiment of the statement in terms of entities on a computer, the mathematical formulation of subsystem models, and the resulting solution and system synthesis. In complex system decomposition, the subsystems are not isolated, self-supporting entities. Information such as constraints, goals, and design variables may be shared between entities. But many times in engineering problems, full communication and cooperation does not exist, information is incomplete, or one subsystem may dominate the design. Additionally, these engineering problems give rise to mathematical models involving nonlinear functions of both discrete and continuous design variables. In this dissertation an algorithm is developed to handle these types of scenarios for the domain-independent integration of subsystem embodiment, coordination, and system synthesis using constructs from Decision-Based Design, Game Theory, and Multidisciplinary Design Optimization. Implementation of the concept in this dissertation involves testing of the hypotheses using example problems and a motivating case study involving the design of a subsonic passenger aircraft
Exploring Processor and Memory Architectures for Multimedia
Multimedia has become one of the cornerstones of our 21st century society and, when combined with mobility, has enabled a tremendous evolution of our society. However, joining these two concepts introduces many technical challenges. These range from having sufficient performance for handling multimedia content to having the battery stamina for acceptable mobile usage. When taking a projection of where we are heading, we see these issues becoming ever more challenging by increased mobility as well as advancements in multimedia content, such as introduction of stereoscopic 3D and augmented reality. The increased performance needs for handling multimedia come not only from an ongoing step-up in resolution going from QVGA (320x240) to Full HD (1920x1080) a 27x increase in less than half a decade. On top of this, there is also codec evolution (MPEG-2 to H.264 AVC) that adds to the computational load increase. To meet these performance challenges there has been processing and memory architecture advances (SIMD, out-of-order superscalarity, multicore processing and heterogeneous multilevel memories) in the mobile domain, in conjunction with ever increasing operating frequencies (200MHz to 2GHz) and on-chip memory sizes (128KB to 2-3MB). At the same time there is an increase in requirements for mobility, placing higher demands on battery-powered systems despite the steady increase in battery capacity (500 to 2000mAh). This leaves negative net result in-terms of battery capacity versus performance advances. In order to make optimal use of these architectural advances and to meet the power limitations in mobile systems, there is a need for taking an overall approach on how to best utilize these systems. The right trade-off between performance and power is crucial. On top of these constraints, the flexibility aspects of the system need to be addressed. All this makes it very important to reach the right architectural balance in the system. The first goal for this thesis is to examine multimedia applications and propose a flexible solution that can meet the architectural requirements in a mobile system. Secondly, propose an automated methodology of optimally mapping multimedia data and instructions to a heterogeneous multilevel memory subsystem. The proposed methodology uses constraint programming for solving a multidimensional optimization problem. Results from this work indicate that using today’s most advanced mobile processor technology together with a multi-level heterogeneous on-chip memory subsystem can meet the performance requirements for handling multimedia. By utilizing the automated optimal memory mapping method presented in this thesis lower total power consumption can be achieved, whilst performance for multimedia applications is improved, by employing enhanced memory management. This is achieved through reduced external accesses and better reuse of memory objects. This automatic method shows high accuracy, up to 90%, for predicting multimedia memory accesses for a given architecture
Intrinsically Evolvable Artificial Neural Networks
Dedicated hardware implementations of neural networks promise to provide faster, lower power operation when compared to software implementations executing on processors. Unfortunately, most custom hardware implementations do not support intrinsic training of these networks on-chip. The training is typically done using offline software simulations and the obtained network is synthesized and targeted to the hardware offline. The FPGA design presented here facilitates on-chip intrinsic training of artificial neural networks. Block-based neural networks (BbNN), the type of artificial neural networks implemented here, are grid-based networks neuron blocks. These networks are trained using genetic algorithms to simultaneously optimize the network structure and the internal synaptic parameters. The design supports online structure and parameter updates, and is an intrinsically evolvable BbNN platform supporting functional-level hardware evolution. Functional-level evolvable hardware (EHW) uses evolutionary algorithms to evolve interconnections and internal parameters of functional modules in reconfigurable computing systems such as FPGAs. Functional modules can be any hardware modules such as multipliers, adders, and trigonometric functions. In the implementation presented, the functional module is a neuron block. The designed platform is suitable for applications in dynamic environments, and can be adapted and retrained online. The online training capability has been demonstrated using a case study. A performance characterization model for RC implementations of BbNNs has also been presented
Recommended from our members
Sparse algorithms for decoding and identification of neural circuits
The brain, as an information processing machine, surpasses any man-made computational device, both in terms of its capabilities and its efficiency. Neuroscience research has made great strides since the foundational works of Cajal and Golgi. However, we still have very little understanding about the algorithmic underpinnings of the brain as an information processor. Identifying mechanistic models of the functional building blocks of the brain will have significant impact not just on neuroscience, but also on artificial computational systems. This provides the main motivation for the work presented in this thesis, summarily i) biologically-inspired algorithms that can be efficiently implemented in silico, ii) functional identification of the processing in certain types of neural circuits, and iii) a collaborative ecosystem for brain research in a model organism, towards the synergistic goal of understanding functional mechanisms employed by the brain.
First, this thesis provides a highly parallelizable, biologically-inspired, motion detection algorithm that is based upon the temporal processing of the local (spatial) phase of a visual stimulus. The relation of the phase based motion detector to the widely studied Reichardt detector model, is discussed. Examples are provided comparing the performance of the proposed algorithm with the Reichardt detector as well as the optic flow algorithm, which is the workhorse for motion detection in computer vision. Further, it is shown through examples that the phase based motion detection model provides intuitive explanations for reverse-phi based illusory motion percepts.
Then, tractable algorithms are presented for decoding with and identification of neural circuits, comprised of processing that can be described by a second-order Volterra kernel (quadratic filter). It is shown that the Reichardt detector, as well as models of cortical complex cells, can be described by this structure. Examples are provided for decoding of visual stimuli encoded by a population of Reichardt detector cells and complex cells, as well as their identification from observed spike times. Further, the phase based motion detection model is shown to be equivalent to a second-order Volterra kernel acting on two normalized inputs. Subsequently, a general model that computes the ratio of two non-linear functionals, each comprising linear (first order Volterra kernel) and quadratic (second-order Volterra kernel) filters, is proposed. It is shown that, even under these highly non-linear operations, a population of cells can encode stimuli faithfully using a number of measurements that are proportional to the bandwidth of the input stimulus. Tractable algorithms are devised to identify the divisive normalization model and examples of identification are provided for both simulated and biological data. Additionally, an extended framework, comprising parallel channels of divisively normalized cells each subjected to further divisive normalization from lateral feedback connections, is proposed. An algorithm is formulated for identifying all the components in this extended framework from controlled stimulus presentation and observed outputs samples.
Finally, the thesis puts forward the Fruit Fly Brain Observatory (FFBO), an initiative to enable a collaborative ecosystem for fruit fly brain research. Key applications in FFBO, and the software and computational infrastructure enabling them, are described along with case studies
Survey on Combinatorial Register Allocation and Instruction Scheduling
Register allocation (mapping variables to processor registers or memory) and
instruction scheduling (reordering instructions to increase instruction-level
parallelism) are essential tasks for generating efficient assembly code in a
compiler. In the last three decades, combinatorial optimization has emerged as
an alternative to traditional, heuristic algorithms for these two tasks.
Combinatorial optimization approaches can deliver optimal solutions according
to a model, can precisely capture trade-offs between conflicting decisions, and
are more flexible at the expense of increased compilation time.
This paper provides an exhaustive literature review and a classification of
combinatorial optimization approaches to register allocation and instruction
scheduling, with a focus on the techniques that are most applied in this
context: integer programming, constraint programming, partitioned Boolean
quadratic programming, and enumeration. Researchers in compilers and
combinatorial optimization can benefit from identifying developments, trends,
and challenges in the area; compiler practitioners may discern opportunities
and grasp the potential benefit of applying combinatorial optimization
On Design and Implementation of Generic Fuzzy Logic Controllers
Soft computing techniques, unlike traditional deterministic logic based computing techniques, sometimes also called as hard computing, are tolerant of imprecision, uncertainty, and approximation. The primary inspiration for soft computing is the human mind and its ability to address day-to-day problems. The primary constituents of soft computing techniques are Artificial Neural Network, Fuzzy Logic Systems, and Evolutionary Computing. This thesis presents design and implementation of a generic hardware architecture based Type-IMamdani fuzzy logic controller (FLC) implemented on a programmable device, which can be remotely configured in real-time over Ethernet. This reconfigurability is added as a feature to existing FLCs in literature. It enables users to change parameters (those drive the FLC systems) in real-time and eliminate repeated hardware programming whenever there is a need. Realization of these systems in real-time is difficult as the computational complexity increases exponentially with an increase in the number of inputs. Hence challenge lies in reducing the Rulebase significantly such that the inference time and the throughput time is perceivable for real-time applications. To achieve these objectives, a modified thresholded fired rules hypercube (MT-FRHC) algorithm for Rulebase reduction is proposed and implemented. MT-FRHC reduces the useful rules without compromising system accuracy and improves the cycle time in terms of fuzzy logic operations per second (FzLOPS). It is imperative to understand that there are over sixty reconfigurable parameters, and it becomes an arduous task for a user to manage them. Therefore, a genetic algorithm based parameter extraction technique is proposed. This will help to develop a course tuning and provide default parameters that can be later fine-tuned by the users remotely through the Web-based User Interface. A hardware software codesign architecture for FLC is developed on TI C6748 DSP hardware with Sys/BIOS RTOS and seamlessly integrated with a webbased user interface (WebUI) for reconfigurability. Fuzzy systems employ defuzzifier to convert the fuzzy output into the real world crisp output. Centroid of Area (CoA) method is most widely used defuzzification method for control applications. However, the prevalent method of CoA computation is based on the principle of Riemann sum which is computationally complex. A vertices based CoA (VBCoA) defuzzification method is introduced. It has been observed that the proposed VBCoA method for COA computation is faster than the Riemann sum based CoA computation. A code optimization technique, exclusive to TI DSPs, is implemented to achieve memory and machine cycle optimization. The WebUI is developed in accordance to a client–server model using ASP.NET. It acquires fuzzy parameters from users, and a server application is dedicated to handling data communication between the hardware and the server. Testing and analysis of this hardware G-FLCS has been carried out by using hardware-in-loop test to control various system models in Simulink environment which includes water level control in a two tank system, intelligent cruise control system, speed control of an armature controlled DC motor and anti-windup control. The performance of the proposed G-FLCS is compared to Fuzzy Inference System of Matlab Fuzzy Logic Toolbox and PID controller in terms of settling time, transient time and steady state error. This proposed MT-FRHC based G-FLCS with VBCoA defuzzification implemented on C6748 DSP was finally deployed to control the radial position of plasma in Aditya Tokamak fusion reactor. The proposed G-FLCS is observed to deliver a smooth and fast system response
GPT4AIGChip: Towards Next-Generation AI Accelerator Design Automation via Large Language Models
The remarkable capabilities and intricate nature of Artificial Intelligence
(AI) have dramatically escalated the imperative for specialized AI
accelerators. Nonetheless, designing these accelerators for various AI
workloads remains both labor- and time-intensive. While existing design
exploration and automation tools can partially alleviate the need for extensive
human involvement, they still demand substantial hardware expertise, posing a
barrier to non-experts and stifling AI accelerator development. Motivated by
the astonishing potential of large language models (LLMs) for generating
high-quality content in response to human language instructions, we embark on
this work to examine the possibility of harnessing LLMs to automate AI
accelerator design. Through this endeavor, we develop GPT4AIGChip, a framework
intended to democratize AI accelerator design by leveraging human natural
languages instead of domain-specific languages. Specifically, we first perform
an in-depth investigation into LLMs' limitations and capabilities for AI
accelerator design, thus aiding our understanding of our current position and
garnering insights into LLM-powered automated AI accelerator design.
Furthermore, drawing inspiration from the above insights, we develop a
framework called GPT4AIGChip, which features an automated demo-augmented
prompt-generation pipeline utilizing in-context learning to guide LLMs towards
creating high-quality AI accelerator design. To our knowledge, this work is the
first to demonstrate an effective pipeline for LLM-powered automated AI
accelerator generation. Accordingly, we anticipate that our insights and
framework can serve as a catalyst for innovations in next-generation
LLM-powered design automation tools.Comment: Accepted by ICCAD 202
- …