44 research outputs found

    Co-design Hardware and Algorithm for Vector Search

    Vector search has emerged as the foundation for large-scale information retrieval and machine learning systems, with search engines like Google and Bing processing tens of thousands of queries per second on petabyte-scale document datasets by evaluating vector similarities between encoded query texts and web documents. As performance demands for vector search systems surge, accelerated hardware offers a promising solution in the post-Moore's Law era. We introduce FANNS, an end-to-end and scalable vector search framework on FPGAs. Given a user-provided recall requirement on a dataset and a hardware resource budget, FANNS automatically co-designs hardware and algorithm, subsequently generating the corresponding accelerator. The framework also supports scale-out by incorporating a hardware TCP/IP stack in the accelerator. FANNS attains up to 23.0× and 37.2× speedup compared to FPGA and CPU baselines, respectively, and demonstrates superior scalability to GPUs, achieving 5.5× and 7.6× speedup in median and 95th percentile (P95) latency within an eight-accelerator configuration. The remarkable performance of FANNS lays a robust groundwork for future FPGA integration in data centers and AI supercomputers. Comment: 11 pages
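The abstract above takes a recall requirement as input but does not define it. A minimal sketch, assuming the common recall@k definition (the fraction of the true k nearest neighbours that an approximate search returns), is given here. All function names are hypothetical and none of this is taken from FANNS itself.

```python
def l2(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def exact_top_k(query, vectors, k):
    """Brute-force search: ids of the k vectors closest to the query."""
    order = sorted(range(len(vectors)), key=lambda i: l2(query, vectors[i]))
    return order[:k]


def recall_at_k(approx_ids, exact_ids):
    """Fraction of the true top-k neighbours present in the approximate result."""
    return len(set(approx_ids) & set(exact_ids)) / len(exact_ids)


def meets_requirement(approx_ids, exact_ids, required_recall):
    """Check an approximate result against a user-provided recall target."""
    return recall_at_k(approx_ids, exact_ids) >= required_recall
```

In this sketch the brute-force result serves as ground truth, so a candidate configuration can be accepted or rejected by comparing its result ids against it.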

    MicroRec: Efficient Recommendation Inference by Hardware and Data Structure Solutions

    Deep neural networks are widely used in personalized recommendation systems. Unlike regular DNN inference workloads, recommendation inference is memory-bound due to the many random memory accesses needed to look up the embedding tables. The inference is also heavily constrained in terms of latency because producing a recommendation for a user must be done in about tens of milliseconds. In this paper, we propose MicroRec, a high-performance inference engine for recommendation systems. MicroRec accelerates recommendation inference by (1) redesigning the data structures involved in the embeddings to reduce the number of lookups needed and (2) taking advantage of the availability of High-Bandwidth Memory (HBM) in FPGA accelerators to tackle the latency by enabling parallel lookups. We have implemented the resulting design on an FPGA board, including the embedding lookup step as well as the complete inference process. Compared to the optimized CPU baseline (16 vCPU, AVX2-enabled), MicroRec achieves 13.8-14.7x speedup on embedding lookup alone and 2.5-5.4x speedup for the entire recommendation inference in terms of throughput. As for latency, CPU-based engines need milliseconds to infer a recommendation while MicroRec takes only microseconds, a significant advantage in real-time recommendation systems. Comment: Accepted by MLSys'21 (the 4th Conference on Machine Learning and Systems)
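The abstract above does not say how the data structures are redesigned. One way to cut the number of lookups, shown here purely as an assumption and not as MicroRec's stated method, is to merge two small embedding tables into a single table holding every pairwise concatenation. One memory access then returns both embeddings. The function names are hypothetical.

```python
def merge_tables(table_a, table_b):
    """Build one table whose row (i * len(table_b) + j) is table_a[i] + table_b[j]."""
    return [row_a + row_b for row_a in table_a for row_b in table_b]


def lookup_merged(merged, i, j, rows_b):
    """Single lookup returning the concatenation of both embeddings."""
    return merged[i * rows_b + j]
```

The trade-off is memory: the merged table has len(table_a) * len(table_b) rows, so this only pays off for small tables.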

    Molecular Composition of Oxygenated Organic Molecules and Their Contributions to Organic Aerosol in Beijing

    The understanding at a molecular level of ambient secondary organic aerosol (SOA) formation is hampered by poorly constrained formation mechanisms and insufficient analytical methods. Especially in developing countries, SOA-related haze is a great concern due to its significant effects on climate and human health. We present simultaneous measurements of gas-phase volatile organic compounds (VOCs), oxygenated organic molecules (OOMs), and particle-phase SOA in Beijing. We show that condensation of the measured OOMs explains 26-39% of the organic aerosol mass growth, with the contribution of OOMs to SOA enhanced during severe haze episodes. Our novel results provide a quantitative molecular connection from anthropogenic emissions to condensable organic oxidation product vapors, their concentration in particle-phase SOA, and ultimately to haze formation. Peer reviewed

    Sedimentary facies and variation of stable isotope composition of Upper Cambrian to Lower Ordovician strata in Southern Missouri: Implications for the origin of MVT deposits, and the geochemical and hydrological features of regional ore-forming fluids

    Upper Cambrian and Lower Ordovician rocks in southeastern Missouri host the world-class Mississippi Valley-type (MVT) lead-zinc deposits of the region. Sedimentary facies of the lower part of the Upper Cambrian section are dominated by distinct clastic and carbonate facies belts associated with a high relief Precambrian topography. The upper part of the Upper Cambrian (post-Davis Formation) and the Lower Ordovician section were deposited under epeiric sea conditions on a low relief topography. These latter rocks are characterized by cyclic sequences of shallow water platform carbonates, with little lateral variation of facies. The Davis Formation is composed of interbedded carbonates and shales, and forms an effective aquiclude separating the upper and lower parts of the section into two distinct aquifers. Petrographic and cathodoluminescent studies of epigenetic dolomite cements in Cambro-Ordovician rocks document that: (1) dolomite cements of the Bonneterre Dolomite (lower aquifer) in the Viburnum Trend and the Old Lead Belt, which are related closely to Pb-Zn mineralization, have a relatively complex, four zone CL pattern; (2) dolomite cements, in the Bonneterre and Davis Formations, which are not related spatially to mineralization commonly display less complex CL patterns; (3) dolomite cements in the post-Davis part of the Cambrian and in the lower Ordovician section (upper aquifer) display a CL stratigraphy which appears to be unrelated to that observed lower in the section. Carbon isotope compositions of host dolomite show two types of statistical variation. From the bottom of the Bonneterre Dolomite to the top of the Davis Formation, δ13C values become higher (from -2.5 toward +3.0‰). Above the Davis Formation through lower Ordovician strata, the trend reverses and δ13C values become lower (toward -3.0‰). A similar trend exists for δ18O values in the Bonneterre and Davis Formations, as values become higher up section. However, above the Davis Formation, δ18O values for host dolomites display no statistical trends. The trend of upwardly decreasing δ13C values in post-Davis rocks may be the result of a secular trend in ocean carbon during this time. The trend of upwardly increasing δ13C and δ18O values in the Bonneterre Dolomite and Davis Formation is likely the result of interaction with hydrothermal fluids emanating from the underlying Lamotte Sandstone, reflecting increased buffering by host dolomite as 12C- and 16O-enriched fluids moved higher in the section. Distribution of sedimentary facies had a profound effect on the hydrological framework of southern Missouri during the period of MVT Pb-Zn mineralization. Linear facies belts that developed on high-relief topography during Bonneterre and Davis time resulted in focused fluid flow and a greater degree of alteration of host dolomite. Broad, laterally continuous distribution of sedimentary facies in post-Davis rocks resulted in less focused fluid flow and alteration of the host dolomite. The distinct C and O isotopic trend observed in the Bonneterre-Davis Formations versus that observed in post-Davis rocks, coupled with differences in CL microstratigraphies of dolomite cements, indicate that these two parts of the section acted as distinct aquifers, with relatively little fluid communication during the Pb-Zn mineralizing event. --Abstract, pages iii-iv

    Detecting anomalies in large number of moving objects

    The need to detect patterns and behaviors has grown in recent years as the number of moving objects has risen. Moving objects can be vehicles, human beings, animals, or vessels. By acquiring and analyzing the positions of moving objects, we can determine the behaviors of the subjects. Any behavior that deviates from the normal pattern can be interpreted as urgent or important to the subject. Existing sources report on the geometric attributes of the positions and trajectories of moving objects; however, other important properties, such as the semantics and the background geographical information, are often left out. The objective of this FYP is to design and implement a program that detects patterns and moving-object anomalies from historical logs. The program takes in files containing the geometric attributes of a human being and converts the data into a file that can be displayed in Google Earth. Based on the current geometric position of the subject and the historical logs of previous travels, the program can detect any abnormal patterns and behaviors of the subject. Bachelor of Engineering (Computer Science)
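The abstract above does not state which detection rule the program uses. As a minimal sketch of comparing a current position against historical logs, one could flag a position as anomalous when it lies farther than some threshold from every previously logged point. The rule, the 500 m default, and the function names are all assumptions made for illustration.

```python
import math

EARTH_RADIUS_M = 6371000.0  # mean Earth radius in metres


def haversine_m(p, q):
    """Great-circle distance in metres between two (lat, lon) points in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (p[0], p[1], q[0], q[1]))
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def is_anomalous(position, history, threshold_m=500.0):
    """True if the position is farther than threshold_m from every logged point."""
    return all(haversine_m(position, past) > threshold_m for past in history)
```

A real system would likely also consider time of day, speed, and the semantic context the abstract mentions; this sketch uses geometry only.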

    EasyNet: 100 Gbps Network for HLS

    The massive deployment of FPGAs in data centers is opening up new opportunities for accelerating distributed applications. However, developing a distributed FPGA application remains difficult for two reasons. First, commonly available development frameworks (e.g., Xilinx Vitis) lack explicit support for networking. Developers are, thus, forced to build their own infrastructure to handle the data movement between the host, the FPGA, and the network. Second, distributed applications are made even more complex by using low-level interfaces to access the network and process packets. Ideally, one needs to combine high performance with a simple interface for both point-to-point and collective operations. To overcome these inefficiencies and enable further research in networking and distributed applications on FPGAs, we first show how to integrate an open-source 100 Gbps TCP/IP stack into a state-of-the-art FPGA development framework (Xilinx Vitis) without degrading its performance. Further, we provide a set of MPI-like communication primitives for both point-to-point and collective operations as a High Level Synthesis (HLS) library. Our point-to-point primitives saturate a 100 Gbps link and our collective primitives achieve low latency. With our approach, developers can write hardware kernels in high-level languages with the network abstracted away behind standard interfaces. To evaluate the ease of use and performance in a real application, we distribute a K-Means algorithm with the new stack and achieve a 1.9X and 3.5X throughput increase with 2 FPGAs and 4 FPGAs, respectively.

    A Flexible K-Means Operator for Hybrid Databases

    The K-means algorithm is widely used in unsupervised learning and data exploration. It is less used in analytical databases due to its high computational cost. K-means has been explored in great detail, mostly focusing on performance. However, in emerging hybrid CPU-FPGA databases where memory bandwidth is shared across software and hardware operators, two additional requirements arise. One is parameterization to avoid frequent reprogramming. The other is concurrent use to balance memory bandwidth and computation. Our design supports two operational modes that can be chosen at runtime, one for high query throughput and one for evaluating multiple clusters concurrently. The former targets speedup, while the latter targets efficient bandwidth utilization by increasing the amount of computation per input byte. Our design is competitive when compared to both existing FPGA-based solutions and highly optimized multi-core software implementations.
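The idea of increasing computation per input byte can be illustrated in software. The following is a minimal sketch, not the FPGA design from the abstract: one K-means iteration is run for several centroid sets at once, and each input point is read a single time. The function names and the choice to keep a centroid unchanged when no point is assigned to it are assumptions.

```python
def nearest(point, centroids):
    """Index of the centroid closest to the point (squared Euclidean distance)."""
    return min(range(len(centroids)),
               key=lambda c: sum((x - y) ** 2 for x, y in zip(point, centroids[c])))


def kmeans_step_multi(points, centroid_sets):
    """One K-means iteration for several centroid sets, reading each point once."""
    dim = len(points[0])
    sums = [[[0.0] * dim for _ in cs] for cs in centroid_sets]
    counts = [[0] * len(cs) for cs in centroid_sets]
    for point in points:  # single pass over the input
        for s, cs in enumerate(centroid_sets):
            c = nearest(point, cs)
            counts[s][c] += 1
            for d in range(dim):
                sums[s][c][d] += point[d]
    # New centroid = mean of assigned points; keep the old one if none were assigned.
    return [[[v / counts[s][c] for v in sums[s][c]] if counts[s][c] else list(cs[c])
             for c in range(len(cs))]
            for s, cs in enumerate(centroid_sets)]
```

Each point fetched from memory is reused for every centroid set, so the work done per input byte grows with the number of sets evaluated together.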

    Experiment on Natural Frequency Change of Reinforced Concrete Members under Low Cycle Loading

    The natural frequency change of reinforced concrete (RC) members during damage when subjected to low cycle loading was studied through horizontal cyclic loading experiments. Three groups of RC flexural members were subjected to horizontal, harmonic, low cycle loading to simulate earthquake conditions. The relation of instantaneous load, instantaneous displacement, and instantaneous natural frequency during loading was deduced. Using the resulting equation, the test members’ natural frequencies at any moment during loading could be calculated accurately. The natural frequency change curves and their fitting equations were then obtained. The impact of loading period T and loading amplitude A on a test member’s damage rate V was analyzed, which showed that the impact of T on V was quadratic, and the relation between A and V was linear. Finally, by fitting experimental data of number of loading cycles N, loading amplitude A, loading period T, and natural frequency ω, a three-variable function, ω(N, A, T), was determined, revealing the change process of test members’ frequencies under arbitrary harmonic vibrations.