967 research outputs found

    Doctor of Philosophy

    Get PDF
    dissertationThe embedded system space is characterized by a rapid evolution in the complexity and functionality of applications. In addition, the short time-to-market nature of the business motivates the use of programmable devices capable of meeting the conflicting constraints of low-energy, high-performance, and short design times. The keys to achieving these conflicting constraints are specialization and maximally extracting available application parallelism. General purpose processors are flexible but are either too power hungry or lack the necessary performance. Application-specific integrated circuits (ASICS) efficiently meet the performance and power needs but are inflexible. Programmable domain-specific architectures (DSAs) are an attractive middle ground, but their design requires significant time, resources, and expertise in a variety of specialties, which range from application algorithms to architecture and ultimately, circuit design. This dissertation presents CoGenE, a design framework that automates the design of energy-performance-optimal DSAs for embedded systems. For a given application domain and a user-chosen initial architectural specification, CoGenE consists of a a Compiler to generate execution binary, a simulator Generator to collect performance/energy statistics, and an Explorer that modifies the current architecture to improve energy-performance-area characteristics. The above process repeats automatically until the user-specified constraints are achieved. This removes or alleviates the time needed to understand the application, manually design the DSA, and generate object code for the DSA. Thus, CoGenE is a new design methodology that represents a significant improvement in performance, energy dissipation, design time, and resources. This dissertation employs the face recognition domain to showcase a flexible architectural design methodology that creates "ASIC-like" DSAs. The DSAs are instruction set architecture (ISA)-independent and achieve good energy-performance characteristics by coscheduling the often conflicting constraints of data access, data movement, and computation through a flexible interconnect. This represents a significant increase in programming complexity and code generation time. To address this problem, the CoGenE compiler employs integer linear programming (ILP)-based 'interconnect-aware' scheduling techniques for automatic code generation. The CoGenE explorer employs an iterative technique to search the complete design space and select a set of energy-performance-optimal candidates. When compared to manual designs, results demonstrate that CoGenE produces superior designs for three application domains: face recognition, speech recognition and wireless telephony. While CoGenE is well suited to applications that exhibit a streaming behavior, multithreaded applications like ray tracing present a different but important challenge. To demonstrate its generality, CoGenE is evaluated in designing a novel multicore N-wide SIMD architecture, known as StreamRay, for the ray tracing domain. CoGenE is used to synthesize the SIMD execution cores, the compiler that generates the application binary, and the interconnection subsystem. Further, separating address and data computations in space reduces data movement and contention for resources, thereby significantly improving performance compared to existing ray tracing approaches

    Working With Incremental Spatial Data During Parallel (GPU) Computation

    Get PDF
    Central to many complex systems, spatial actors require an awareness of their local environment to enable behaviours such as communication and navigation. Complex system simulations represent this behaviour with Fixed Radius Near Neighbours (FRNN) search. This algorithm allows actors to store data at spatial locations and then query the data structure to find all data stored within a fixed radius of the search origin. The work within this thesis answers the question: What techniques can be used for improving the performance of FRNN searches during complex system simulations on Graphics Processing Units (GPUs)? It is generally agreed that Uniform Spatial Partitioning (USP) is the most suitable data structure for providing FRNN search on GPUs. However, due to the architectural complexities of GPUs, the performance is constrained such that FRNN search remains one of the most expensive common stages between complex systems models. Existing innovations to USP highlight a need to take advantage of recent GPU advances, reducing the levels of divergence and limiting redundant memory accesses as viable routes to improve the performance of FRNN search. This thesis addresses these with three separate optimisations that can be used simultaneously. Experiments have assessed the impact of optimisations to the general case of FRNN search found within complex system simulations and demonstrated their impact in practice when applied to full complex system models. Results presented show the performance of the construction and query stages of FRNN search can be improved by over 2x and 1.3x respectively. These improvements allow complex system simulations to be executed faster, enabling increases in scale and model complexity

    Efficient Wave-based Sound Propagation and Optimization for Computer-Aided Design

    Get PDF
    Acoustic phenomena have a large impact on our everyday lives, from influencing our enjoyment of music in a concert hall, to affecting our concentration at school or work, to potentially negatively impacting our health through deafening noises. The sound that reaches our ears is absorbed, reflected, and filtered by the shape, topology, and materials present in the environment. However, many computer simulation techniques for solving these sound propagation problems are either computationally expensive or inaccurate. Additionally, the costs of some methods are dramatically increased in design optimization processes in which several iterations of sound propagation evaluation are necessary. The primary goal of this dissertation is to present techniques for efficiently solving the sound propagation problem and related optimization problems for computer-aided design. First, we propose a parallel method for solving large acoustic propagation problems, scalable to tens of thousands of cores. Second, we present two novel techniques for optimizing certain acoustic characteristics such as reverberation time or sound clarity using wave-based sound propagation. Finally, we show how hybrid sound propagation algorithms can be used to improve the performance of acoustic optimization problems and present two algorithms for noise minimization and speech intelligibility improvement that use this hybrid approach. All the algorithms we present are evaluated on various benchmarks that are computer models of architectural scenes. These benchmarks include challenging environments for existing sound propagation algorithms, such as large indoor or outdoor scenes, structural complex scenes, or the prevalence of difficult-to-model sound propagation phenomena. Using the techniques put forth in this dissertation, we can solve many challenging sound propagation and optimization problems on the scenes in an efficient manner. We are able to accurately model sound propagation using wave-based approaches up to \SI{10}{\kilo\hertz} (the full range of human speech) and for the full range of human hearing (22kHz) using our hybrid approach. Our noise minimization methods show improvements of up to 13dB in noise reduction on some scenes, and we show a 71\% improvement in speech intelligibility using our algorithm.Doctor of Philosoph

    Shadow Computations using Robust Epsilon Visibility

    Get PDF
    Analytic visibility algorithms, for example methods which compute a subdivided mesh to represent shadows, are notoriously unrobust and hard to use in practice. We present a new method based on a generalized definition of extremal stabbing lines, which are the extremities of shadow boundaries. We treat scenes containing multiple edges or vertices in degenerate configurations, (e.g., collinear or coplanar). We introduce a robust epsilon method to determine whether each generalized extremal stabbing line is blocked, or is touched by these scene elements, and thus added to the line's generators. We develop robust blocker predicates for polygons which are smaller than epsilon. For larger values, small shadow features merge and eventually disappear. We can thus robustly connect generalized extremal stabbing lines in degenerate scenes to form shadow boundaries. We show that our approach is consistent, and that shadow boundary connectivity is preserved when features merge. We have implemented our algorithm, and show that we can robustly compute analytic shadow boundaries to the precision of our chosen epsilon threshold for non-trivial models, containing numerous degeneracies

    Localization and cooperative communication methods for cognitive radio

    Get PDF
    We study localization of nearby nodes and cooperative communication for cognitive radios. Cognitive radios sensing their environment to estimate the channel gain between nodes can cooperate and adapt their transmission power to maximize the capacity of the communication between two nodes. We study the end-to-end capacity of a cooperative relaying scheme using orthogonal frequency-division modulation (OFDM) modulation, under power constraints for both the base station and the relay station. The relay uses amplify-and-forward and decodeand-forward cooperative relaying techniques to retransmit messages on a subset of the available subcarriers. The power used in the base station and the relay station transmitters is allocated to maximize the overall system capacity. The subcarrier selection and power allocation are obtained based on convex optimization formulations and an iterative algorithm. Additionally, decode-and-forward relaying schemes are allowed to pair source and relayed subcarriers to increase further the capacity of the system. The proposed techniques outperforms non-selective relaying schemes over a range of relay power budgets. Cognitive radios can be used for opportunistic access of the radio spectrum by detecting spectrum holes left unused by licensed primary users. We introduce a spectrum holes detection approach, which combines blind modulation classification, angle of arrival estimation and number of sources detection. We perform eigenspace analysis to determine the number of sources, and estimate their angles of arrival (AOA). In addition, we classify detected sources as primary or secondary users with their distinct second-orde one-conjugate cyclostationarity features. Extensive simulations carried out indicate that the proposed system identifies and locates individual sources correctly, even at -4 dB signal-to-noise ratios (SNR). In environments with a high density of scatterers, several wireless channels experience non-line-of-sight (NLOS) condition, increasing the localization error, even when the AOA estimate is accurate. We present a real-time localization solver (RTLS) for time-of-arrival (TOA) estimates using ray-tracing methods on the map of the geometry of walls and compare its performance with classical TOA trilateration localization methods. Extensive simulations and field trials for indoor environments show that our method increases the coverage area from 1.9% of the floor to 82.3 % and the accuracy by a 10-fold factor when compared with trilateration. We implemented our ray tracing model in C++ using the CGAL computational geometry algorithm library. We illustrate the real-time property of our RTLS that performs most ray tracing tasks in a preprocessing phase with time and space complexity analyses and profiling of our software

    AUTOMATIC PORTAL GENERATION FOR 3D AUDIO - FROM TRIANGLE SOUP TO A PORTAL SYSTEM

    Get PDF
    The purpose of this paper is to investigate an algorithm for generating an automatic portal system. This has been accomplished based on a given set of triangles. The proposed solution was designed to enhance the performance of a sound beam-tracing engine. This solution can also be used for other areas where portal systems are applicable. The provided technical solution emphasizes the beam tracing engine's requirements. Our approach is based on the work of Haumont et al. (with additional improvements), resulting in improved scene segmentation and lower computational complexity. We examined voxelization techniques and their properties, and have adjusted these to fit the requirements of a beam-tracing engine. As a result of our investigation, a new method for finding portal placement has been developed by adjusting the orientation of the found portals to fit the neighboring scene walls. In addition, we replaced Haumont et al.'s prevoxelization step, which is used for erasing geometrical details (for example, thin walls). This was done by smoothing the distance field that, in effect, eliminated incorrectly positioned portals. The results of our work remove the requirement for walls that separate rooms to have a particular thickness. We also describe a method for building a structure that accelerates real-time queries for determining the area where a given point is located. All of the presented techniques allow for the use of larger sized voxels, which increases performance and reduces memory requirements (not only during the preprocessing phase but also during real-time usage). The proposed solutions were tested using scenarios with scenes of varying complexity

    Visualization for the Physical Sciences

    Get PDF
    • …
    corecore