44 research outputs found

    RMESH Algorithms for Parallel String Matching

    Get PDF
    String matching problem received much attention over the years due to its importance in various applications such as text/file comparison, DNA sequencing, search engines, and spelling correction. Especially with the introduction of search engines dealing with tremendous amount of textual information presented on the world wide web and the research on DNA sequencing, this problem deserves special attention and any algorithmic or hardware improvements to speed up the process will benefit these important applications. In this paper, we present three algorithms for string matching on reconfigurable mesh architectures. Given a text T of length n and a pattern P of length m, the first algorithm finds the exact matching between T and P in O(1) time on a 2-dimensional RMESH of size (n-m+1) * m. The second algorithm finds the approximate matching between T and P in O(k) time on a 2D RMESH, where k is the maximum edit distance between T and P. The third algorithm allows only the replacement operation in the calculation of the edit distance and finds an approximate matching between T and P in constant-time on a 3D RMESH

    A Field Programmable Gate Array Architecture for Two-Dimensional Partial Reconfiguration

    Get PDF
    Reconfigurable machines can accelerate many applications by adapting to their needs through hardware reconfiguration. Partial reconfiguration allows the reconfiguration of a portion of a chip while the rest of the chip is busy working on tasks. Operating system models have been proposed for partially reconfigurable machines to handle the scheduling and placement of tasks. They are called OS4RC in this dissertation. The main goal of this research is to address some problems that come from the gap between OS4RC and existing chip architectures and the gap between OS4RC models and practical applications. Some existing OS4RC models are based on an impractical assumption that there is no data exchange channel between IP (Intellectual Property) circuits residing on a Field Programmable Gate Array (FPGA) chip and between an IP circuit and FPGA I/O pins. For models that do not have such an assumption, their inter-IP communication channels have severe drawbacks. Those channels do not work well with 2-D partial reconfiguration. They are not suitable for intensive data stream processing. And frequently they are very complicated to design and very expensive. To address these problems, a new chip architecture that can better support inter-IP and IP-I/O communication is proposed and a corresponding OS4RC kernel is then specified. The proposed FPGA architecture is based on an array of clusters of configurable logic blocks, with each cluster serving as a partial reconfiguration unit, and a mesh of segmented buses that provides inter-IP and IP-I/O communication channels. The proposed OS4RC kernel takes care of the scheduling, placement, and routing of circuits under the constraints of the proposed architecture. Features of the new architecture in turns reduce the kernel execution times and enable the runtime scheduling, placement and routing. The area cost and the configuration memory size of the new chip architecture are calculated and analyzed. And the efficiency of the OS4RC kernel is evaluated via simulation using three different task models

    Run-time management of many-core SoCs: A communication-centric approach

    Get PDF
    The single core performance hit the power and complexity limits in the beginning of this century, moving the industry towards the design of multi- and many-core system-on-chips (SoCs). The on-chip communication between the cores plays a criticalrole in the performance of these SoCs, with power dissipation, communication latency, scalability to many cores, and reliability against the transistor failures as the main design challenges. Accordingly, we dedicate this thesis to the communicationcentered management of the many-core SoCs, with the goal to advance the state-ofthe-art in addressing these challenges. To this end, we contribute to on-chip communication of many-core SoCs in three main directions. First, we start with a synthesizable SoC with full system simulation. We demonstrate the importance of the networking overhead in a practical system, and propose our sophisticated network interface (NI) that offloads the work from SW to HW. Our results show around 5x and up to 50x higher network performance, compared to previous works. As the second direction of this thesis, we study the significance of run-time application mapping. We demonstrate that contiguous application mapping not only improves the network latency (by 23%) and power dissipation (by 50%), but also improves the system throughput (by 3%) and quality-of-service (QoS) of soft real-time applications (up to 100x less deadline misses). Also our hierarchical run-time application mapping provides 99.41% successful mapping when up to 8 links are broken. As the final direction of the thesis, we propose a fault-tolerant routing algorithm, the maze-routing. It is the first-in-class algorithm that provides guaranteed delivery, a fully-distributed solution, low area overhead (by 16x), and instantaneous reconfiguration (vs. 40K cycles down time of previous works), all at the same time. Besides the individual goals of each contribution, when applicable, we ensure that our solutions scale to extreme network sizes like 12x12 and 16x16. This thesis concludes that the communication overhead and its optimization play a significant role in the performance of many-core SoC

    Control of sectioned on-chip communication

    Get PDF

    Hypercube-Based Topologies With Incremental Link Redundancy.

    Get PDF
    Hypercube structures have received a great deal of attention due to the attractive properties inherent to their topology. Parallel algorithms targeted at this topology can be partitioned into many tasks, each of which running on one node processor. A high degree of performance is achievable by running every task individually and concurrently on each node processor available in the hypercube. Nevertheless, the performance can be greatly degraded if the node processors spend much time just communicating with one another. The goal in designing hypercubes is, therefore, to achieve a high ratio of computation time to communication time. The dissertation addresses primarily ways to enhance system performance by minimizing the communication time among processors. The need for improving the performance of hypercube networks is clearly explained. Three novel topologies related to hypercubes with improved performance are proposed and analyzed. Firstly, the Bridged Hypercube (BHC) is introduced. It is shown that this design is remarkably more efficient and cost-effective than the standard hypercube due to its low diameter. Basic routing algorithms such as one to one and broadcasting are developed for the BHC and proven optimal. Shortcomings of the BHC such as its asymmetry and limited application are clearly discussed. The Folded Hypercube (FHC), a symmetric network with low diameter and low degree of the node, is introduced. This new topology is shown to support highly efficient communications among the processors. For the FHC, optimal routing algorithms are developed and proven to be remarkably more efficient than those of the conventional hypercube. For both BHC and FHC, network parameters such as average distance, message traffic density, and communication delay are derived and comparatively analyzed. Lastly, to enhance the fault tolerance of the hypercube, a new design called Fault Tolerant Hypercube (FTH) is proposed. The FTH is shown to exhibit a graceful degradation in performance with the existence of faults. Probabilistic models based on Markov chain are employed to characterize the fault tolerance of the FTH. The results are verified by Monte Carlo simulation. The most attractive feature of all new topologies is the asymptotically zero overhead associated with them. The designs are simple and implementable. These designs can lead themselves to many parallel processing applications requiring high degree of performance

    Application of knowledge based engineering principles to intelligent automation systems

    Get PDF
    The automation of engineering processes provides many benefits over manual methods including significant cost and scheduling reduction as well as intangible advantages of greater consistency based on agreed methods, standardisation and simplification of complex problems and knowledge retention. Knowledge Based Engineering (KBE) and Design Automation (DA) are two sets of methodologies and technologies for automating engineering processes through software. KBE refers to the structured capture, modelling and deployment of engineering knowledge in high level intelligent systems that provide a wide scope of automation capability. KBE system development is supported by numerous mature methodologies that cover all aspects of the development process including: problem identification and feasibility studies, knowledge capture and modelling, and system design, development and deployment. Conversely, DA is the process of developing automated solutions to specific, well defined engineering tasks. The DA approach is characterised by agile software development methods, producing lower level systems that are intentionally limited in scope. DA-type solutions are more commonly adopted by industry than KBE applications due to shorter development schedules, lower cost and less complex development processes. However, DA application development is not as well supported by theoretical frameworks, and consequently, development processes can be unstructured and best practices not observed. The research presented in this thesis is divided into two key areas. Firstly, a methodology for automating engineering processes is proposed, with the aim of improving the accessibility of mature KBE methods to a broader industrial base. This methodology supports development of automation applications ranging in complexity from high level KBE systems to lower level DA applications. A complexity editing mechanism is introduced that relates detailed processes of KBE methodologies to a set of characteristics that can be exhibited by automated solutions. Depending on individual application requirements, complexity of automated solutions can lowered by deselecting one or more of these characteristics, omitting associated high-level processes from the development methodology. At the lowest level of complexity, the methodology provides a structured process for producing DA applications that incorporates principles of mature KBE methodologies. The second part of this research uses the proposed automation methodology to develop a system to automate the layout design of aircraft electrical harnesses. Increasing complexity of aircraft electrical systems has an associated increase in the number and size of electrical harnesses required to connect subsystems throughout the airframe. Current practices for layout design are highly manual, with many governing rules and best practices. The automation of this process will provide a significant reduction in low level, repetitive, manual work. The resulting automated routing tool implements path-finding techniques from computer game artificial intelligence and microprocessor design domains, together with new methods for incorporating the numerous design rules governing harness placement. The system was tested with a complex industrial test case, and was found to provide harness solutions in a fraction of the time and with comparable quality as equivalent manual design processes. The repeatability of the automated process can also minimise scheduling impacts caused by late design changes

    High level compilation for gate reconfigurable architectures

    Get PDF
    Thesis (Ph. D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2001.Includes bibliographical references (p. 205-215).A continuing exponential increase in the number of programmable elements is turning management of gate-reconfigurable architectures as "glue logic" into an intractable problem; it is past time to raise this abstraction level. The physical hardware in gate-reconfigurable architectures is all low level - individual wires, bit-level functions, and single bit registers - hence one should look to the fetch-decode-execute machinery of traditional computers for higher level abstractions. Ordinary computers have machine-level architectural mechanisms that interpret instructions - instructions that are generated by a high-level compiler. Efficiently moving up to the next abstraction level requires leveraging these mechanisms without introducing the overhead of machine-level interpretation. In this dissertation, I solve this fundamental problem by specializing architectural mechanisms with respect to input programs. This solution is the key to efficient compilation of high-level programs to gate reconfigurable architectures. My approach to specialization includes several novel techniques. I develop, with others, extensive bitwidth analyses that apply to registers, pointers, and arrays. I use pointer analysis and memory disambiguation to target devices with blocks of embedded memory. My approach to memory parallelization generates a spatial hierarchy that enables easier-to-synthesize logic state machines with smaller circuits and no long wires.(cont.) My space-time scheduling approach integrates the techniques of high-level synthesis with the static routing concepts developed for single-chip multiprocessors. Using DeepC, a prototype compiler demonstrating my thesis, I compile a new benchmark suite to Xilinx Virtex FPGAs. Resulting performance is comparable to a custom MIPS processor, with smaller area (40 percent on average), higher evaluation speeds (2.4x), and lower energy (18x) and energy-delay (45x). Specialization of advanced mechanisms results in additional speedup, scaling with hardware area, at the expense of power. For comparison, I also target IBM's standard cell SA-27E process and the RAW microprocessor. Results include sensitivity analysis to the different mechanisms specialized and a grand comparison between alternate targets.by Jonathan William Babb.Ph.D

    Pathfinding Algorithm Optimization Via Evolution

    Get PDF
    Pathfinding is a popular computer science problem in both academic research and industrial development. The objective of pathfinding is to search for a path, often the shortest path, from one location to another on a graph. Many real world applications can be considered as pathfinding problems, including motion planning, video games, logistics, and decision making. Computer scientists have proposed different algorithms to efficiently search for the shortest path. A* search algorithm is the de facto pathfinding algorithm that uses a heuristic function to determine the best action to take based on the given information. It is the most popular pathfinding algorithm due to its simplicity and efficiency. The performance of A* is heavily dependent on the quality of the heuristic function. The heuristic function determines the search speed, accuracy, and memory consumption. Hence, designing good heuristic functions for specific domains becomes the primary research focus on pathfinding algorithm optimization. In this dissertation, we address and solve several commonly known challenges in pathfinding problems and A* algorithm. First, designing new heuristic functions is a difficult and time-consuming task, especially when they are used to solve complex problems. The task requires the user to have expert knowledge of the problem. Moreover, a single heuristic function might not be enough to digest all the provided information and return the best guidance during the search. Previous works suggest that multiple heuristics for complex problems can dramatically speed up the search. However, choosing the appropriate combination of heuristic functions is tricky. Current optimization approaches rely on hand-tuning the parameters via trial and error by engineers over many iterations. There is a need to reduce the difficulty of designing heuristic functions for search performance maximization. Our first contribution is to propose an improved A* with a self-evolving heuristic function named Evolutionary Heuristic A* (EHA*) that reduces engineering effort to design the heuristic function for A* and maximize the search performance. Our experiment results show that EHA* (i) preserves path optimality; (ii) is not limited to a particular application; (iii) speeds up the path searching process; and (iv) most importantly, dramatically reduces the difficulty for software engineers to design heuristic functions for A* search. Moreover, our work can be applied to other existing works on the performance improvement of A* search. Search, A* search suffers from poor performance on large search spaces. Although EHA* improves the quality of heuristic functions, large search space still leads to many unnecessary searches. Our second contribution is Regions Discovery Algorithm (RDA), a map clustering technique to partition a grid based map into different categories to reduce search spaces and increase search speed. Our approach reduces the size of search spaces by partitioning a graph into many segments and identifying the segments by their characteristics. By identifying segments in different categories, we can easily eliminate search space, such as rooms, that are not possible (better use needed?) to be part of the optimal solution. Unlike the existing approaches that might result in non-optimal solutions, our experiment results show that RDA guarantees optimal solutions. Our third contribution, the Hierarchical Evolutionary Heuristic A* (HEHA*), further improves the search ability of handling complex pathfinding problems and boosting the search performance, by reducing search spaces and exploiting parallelism techniques. HEHA* combines the strength of EHA* and RDA to reduce search spaces and improve search speed. HEHA* shows that it provides better search performance with less memory consumption. In the pre-processing phase, first HEHA* partitions a graph into different segments and then applies different optimized heuristic functions for each segment to maximize the search performance. During the online process, HEHA* searches on the abstract level first to reduce search area, and exploits parallelism to speed up the search. Fourth, we improve and apply HEHA* to Multi-Agent Pathfinding (MAPF) problems. MAPF is the fundamental problem of many robotic and logistic applications, where the main constraint is that all agents can find the shortest paths while not colliding with each other. While the current trend favors the central controlled system, our approach is to develop a distributed version of HEHA* that can efficiently plan the optimal path for each agent. Such a system requires data sharing and exchanging among the agents, so that each agent can make its own decision without a supervising system. Our experiment results show that the Multi-Agent version of HEHA* maintains a high success rate when the number of agents increases. While EHA* and HEHA* provide a novel approach for heuristic function design, the pre-processing times are not trivial. To boost the performance of the preprocessing steps in EHA* and HEHA*, we propose a FPGA-based reconfigurable hardware accelerator that is not bound to any specific applications as our fifth contribution. Since GA requires many independent processes, it is suitable to implement it in a hardware accelerator to gain maximum performance. We apply the following techniques to enhance performance: deep pipelining, reconfigurable computing, massive parallel processing, and degree of parallelism maximization. Our results show that the FPGA accelerator for EHA* improves the scalability, throughput, and latency

    Tissu numérique cellulaire à routage et configuration dynamiques

    Get PDF
    In the design of new machines or in the development of new concepts, mankind has often observed nature, looking for useful ideas and sources of inspiration. The design of electronic circuits is no exception, and a considerable number of realizations have drawn inspiration from three aspects of natural systems : the evolution of species (Phylogenesis), the development of an organism starting from a single cell (Ontogenesis), and learning, as performed by our brain (Epigenesis). These three axes, grouped under the acronym POE, have for the most part been exploited separately : evolutionary principles allow to solve problems for which it is hard to find a solution with a deterministic method, while some electronic circuits draw inspiration from healing process in living beings to achieve self-repair, and artificial neural networks have the capability to efficiently execute a wide range of tasks. At this time, no electronic tissue capable of bringing them together seems to exist. The introduction of reconfigurable circuits called Field Programmable Gate Arrays (FPGAs), whose behavior can be redefined as often as desired, made prototyping such systems easier. FPGAs, by allowing a relatively simple implementation in hardware, can considerably increase the systems' performance and are thus extensively used by researchers. However, they lack plasticity, not being able to easily modify themselves without an external intervention. This PhD thesis, developed in the framework of the European POEtic project, proposes to define a new reconfigurable electronic circuit, with the goal of supplying a new substrate for bio-inspired applications that bring all three axes into play. This circuit is mainly composed of a microprocessor and an array of reconfigurable elements, the latter having been designed during this thesis. Evolutionary processes are executed by the microprocessor, while epigenetic and ontogenetic mechanisms are applied in the reconfigurable array, to entities seen as multicellular artificial organisms. Relatively similar to current commercial FPGAs, this subsystem offers however some unique features. First, the basic elements of the array have the capability to partially reconfigure other elements. Auto-replication and differentiation mechanisms can exploit this capability to let an organism grow or to modify its behavior. Second, a distributed routing layer allows to dynamically create connections between parts of the circuit at runtime. With this feature, cells (artificial neurons, for example) implemented in the reconfigurable array can initiate new connections in order to modify the global system behavior. This distributed routing mechanism, one of the major contributions of this thesis, induced the realization of several algorithms. Based on a parallel implementation of Lee's algorithm, these algorithms are totally distributed, no global control being necessary to create new data paths. Four algorithms have been defined implemented in hardware in the form of routing units connected to 3, 4, 6, or 8 neighbors. These units are all identical and are responsible for the routing processes. An analysis of their properties allows us to define the best algorithm, coupled with the most efficient neighborhood, in terms of congestion and of the number of transistors needed for a hardware realization. We finish the routing chapter by proposing a fifth algorithm that, unlike the previous ones, is constructed only through local interactions between routing units. It offers a better scalability, at the price of increased hardware overhead. Finally, the POEtic chip, in which one of our algorithms has been implemented, has been physically realized. We present different POE mechanisms that take advantage of its new features. Among these mechanisms, we can notably cite auto-replication, evolvable hardware, developmental systems, and self-repair. All of these mechanisms have been developed with the help of a circuit simulator, also designed in the framework of this thesis
    corecore