204 research outputs found

    InkStream: Real-time GNN Inference on Streaming Graphs via Incremental Update

    Classic Graph Neural Network (GNN) inference approaches, designed for static graphs, are ill-suited for streaming graphs that evolve over time. The dynamism intrinsic to streaming graphs necessitates constant updates, posing unique challenges to acceleration on GPUs. We address these challenges based on two key insights: (1) inside the k-hop neighborhood, a significant fraction of the nodes is not impacted by the modified edges when the model uses min or max as the aggregation function; (2) when the model weights remain static while the graph structure changes, node embeddings can incrementally evolve over time by computing only the impacted part of the neighborhood. With these insights, we propose a novel method, InkStream, designed for real-time inference with minimal memory access and computation, while ensuring output identical to conventional methods. InkStream operates on the principle of propagating and fetching data only when necessary. It uses an event-based system to control inter-layer effect propagation and intra-layer incremental updates of node embeddings. InkStream is highly extensible and easily configurable, allowing users to create and process customized events. We show that fewer than 10 lines of additional user code are needed to support popular GNN models such as GCN, GraphSAGE, and GIN. Our experiments with three GNN models on four large graphs demonstrate that InkStream achieves speedups of 2.5-427× on a CPU cluster and 2.4-343× on two different GPU clusters while producing outputs identical to GNN model inference on the latest graph snapshot.
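    The first insight above is easy to see in code: under max aggregation, an edge change only forces recomputation when it could actually move the aggregate. The sketch below illustrates that idea in plain Python; the function names and event flag are illustrative assumptions, not InkStream's actual API.

    ```python
    # Sketch of incremental max-aggregation: an added edge changes a node's
    # aggregate only if the new neighbor beats the current max, and a removed
    # edge only if the removed neighbor held the current max. Otherwise no
    # "event" needs to propagate to the next layer.

    def max_aggregate(neighbors, feat):
        """Full recomputation: max over neighbor features (the baseline)."""
        return max((feat[v] for v in neighbors), default=float("-inf"))

    def incremental_add(cur_max, feat, v):
        """Edge (u, v) added: update u's aggregate without visiting other neighbors."""
        new_max = max(cur_max, feat[v])
        changed = new_max != cur_max
        return new_max, changed  # propagate an event only if changed

    def incremental_remove(cur_max, neighbors, feat, v):
        """Edge (u, v) removed: recompute only if v supplied the current max."""
        if feat[v] < cur_max:
            return cur_max, False  # v was not the max, aggregate unchanged
        return max_aggregate(neighbors, feat), True

    feat = {1: 0.3, 2: 0.9, 3: 0.5}
    agg = max_aggregate({1, 2, 3}, feat)                 # 0.9
    agg, changed = incremental_add(agg, {**feat, 4: 0.4}, 4)
    # adding node 4 (feature 0.4) cannot beat 0.9, so no event propagates
    ```

    This is why, with min/max aggregation, most of the k-hop neighborhood stays untouched after an update: the cheap check filters out edges that cannot affect the result.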

    An audit of dressing practice by occupational therapists in acute stroke settings in England

    Introduction: Dressing independence is commonly affected after stroke, with clinical guidelines recommending that dressing practice be routinely provided for those with dressing difficulties. The aim of this study was to update the literature on current practice in the treatment of dressing problems in stroke rehabilitation units. Method: A questionnaire survey of occupational therapists experienced in stroke care was sent to 157 stroke units in England. Results: Responses were received from 70 stroke units. Frequency and duration of dressing practice varied substantially between units, with respondents typically providing dressing practice for six to 10 patients per week and spending 30 to 45 minutes per treatment session. Only 17 respondents (24.3%) stated that they regularly used standardised assessments of dressing ability. The functional approach was used more widely than the remedial approach. Service priorities, working environment, and limitations of time and staffing were reported to influence dressing practice. Conclusion: There is widespread variability in dressing practice, standardised dressing assessments are rarely used, and therapists' rationale for their choice of approach is unclear.

    Development of low cost motion-sensing system

    Micro-electro-mechanical system (MEMS) technology offers sensors with lower cost, smaller size, and lower power consumption. In this paper, a low-cost motion-sensing system based on MEMS sensors is developed, with the objectives of low cost, small volume, and light weight so that it can be used in many fields. The constituent principle of the system is described, and the algorithms and hardware of the system are investigated. The definition of the coordinate frame, calculation of pose angles, transformation of accelerations, and calculation of the velocities and displacement of the moving object are presented with the corresponding mathematical models and algorithms. Experiments are carried out and results are given, showing that the low-cost motion-sensing system is effective and correct.
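    The pose-angle and velocity/displacement steps mentioned in the abstract follow standard strapdown formulas. A minimal sketch, assuming a static sensor for the tilt estimate and trapezoidal integration for motion (the paper's exact algorithms are not given here):

    ```python
    import math

    def tilt_from_accel(ax, ay, az):
        """Pitch and roll (radians) from gravity components of a static accelerometer."""
        pitch = math.atan2(-ax, math.sqrt(ay * ay + az * az))
        roll = math.atan2(ay, az)
        return pitch, roll

    def integrate(samples, dt):
        """Trapezoidal integration: acceleration -> velocity -> displacement."""
        v, x = 0.0, 0.0
        vel, pos = [v], [x]
        for a0, a1 in zip(samples, samples[1:]):
            v_new = v + 0.5 * (a0 + a1) * dt   # trapezoid over acceleration
            x += 0.5 * (v + v_new) * dt        # trapezoid over velocity
            v = v_new
            vel.append(v)
            pos.append(x)
        return vel, pos

    pitch, roll = tilt_from_accel(0.0, 0.0, 9.81)  # flat sensor: both angles 0
    vel, pos = integrate([1.0, 1.0, 1.0], 0.1)     # constant 1 m/s^2 for 0.2 s
    ```

    In practice low-cost MEMS accelerometers drift, so double integration accumulates error quickly; this is why such systems typically fuse accelerometer and gyroscope data rather than rely on integration alone.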

    Flip: Data-Centric Edge CGRA Accelerator

    Coarse-Grained Reconfigurable Arrays (CGRAs) are promising edge accelerators due to their outstanding balance of flexibility, performance, and energy efficiency. Classic CGRAs statically map compute operations onto processing elements (PEs) and route the data dependencies among the operations through the Network-on-Chip. However, CGRAs are designed for fine-grained, static instruction-level parallelism and struggle to accelerate applications with dynamic and irregular data-level parallelism, such as graph processing. To address this limitation, we present Flip, a novel accelerator that enhances traditional CGRA architectures to boost the performance of graph applications. Flip retains the classic CGRA execution model while introducing a special data-centric mode for efficient graph processing. Specifically, it exploits the natural data parallelism of graph algorithms by mapping graph vertices onto PEs rather than operations, and by supporting dynamic routing of temporary data according to the runtime evolution of the graph frontier. Experimental results demonstrate that Flip achieves up to 36× speedup with merely 19% more area compared to classic CGRAs. Compared to state-of-the-art large-scale graph processors, Flip has similar energy efficiency and 2.2× better area efficiency at a much-reduced power/area budget.
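    The data-centric idea can be sketched in software: vertices are statically placed on PEs, and each step of the algorithm routes the current frontier to the owning PE rather than scheduling fixed operations. The PE count, modulo placement, and BFS workload below are illustrative assumptions, not Flip's actual microarchitecture.

    ```python
    from collections import deque

    NUM_PES = 4

    def pe_of(vertex):
        """Static vertex-to-PE placement (simple modulo hash)."""
        return vertex % NUM_PES

    def frontier_bfs(graph, source):
        """BFS where each level routes frontier vertices to their owning PE."""
        dist = {source: 0}
        frontier = deque([source])
        while frontier:
            per_pe = [[] for _ in range(NUM_PES)]  # one work queue per PE
            for _ in range(len(frontier)):
                v = frontier.popleft()
                per_pe[pe_of(v)].append(v)         # dynamic routing to owner PE
            for queue in per_pe:                   # each PE drains its queue
                for v in queue:
                    for w in graph.get(v, ()):
                        if w not in dist:
                            dist[w] = dist[v] + 1
                            frontier.append(w)
        return dist

    graph = {0: [1, 2], 1: [3], 2: [3], 3: []}
    dist = frontier_bfs(graph, 0)
    ```

    The contrast with the classic mode is that the per-PE queues grow and shrink with the frontier at runtime, which is exactly the dynamic, irregular parallelism a static operation mapping cannot express.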

    A framework of lightweight deep cross-connected convolution kernel mapping support vector machines

    Deep kernel mapping support vector machines have achieved good results in numerous tasks by mapping features from a low-dimensional space to a high-dimensional space and then using support vector machines for classification. However, deep kernel mapping support vector machines do not take into account the connections between different dimensional spaces, and they increase the number of model parameters. To further improve the recognition capability of deep kernel mapping support vector machines while reducing the number of model parameters, this paper proposes a framework of Lightweight Deep Cross-Connected Convolution Kernel Mapping Support Vector Machines (LC-CKMSVM). The framework consists of a feature extraction module and a classification module. The feature extraction module first maps the data from the low-dimensional to the high-dimensional space, fusing the representations of different dimensional spaces through cross-connections; it then uses depthwise separable convolution to replace part of the original convolution, reducing the number of parameters in the module. The classification module uses a soft-margin support vector machine for classification. Results on six different visual datasets show that LC-CKMSVM obtains better classification accuracy in most cases than the other five models.
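    The parameter saving from swapping standard convolution for depthwise separable convolution follows from a simple count. A back-of-the-envelope sketch, with illustrative layer shapes (the paper's actual layer sizes are not given here):

    ```python
    def standard_conv_params(c_in, c_out, k):
        """k x k standard convolution: every output channel sees every input channel."""
        return c_in * c_out * k * k

    def depthwise_separable_params(c_in, c_out, k):
        """k x k depthwise conv (one filter per channel) + 1x1 pointwise conv."""
        return c_in * k * k + c_in * c_out

    std = standard_conv_params(64, 128, 3)        # 64 * 128 * 9 = 73728
    sep = depthwise_separable_params(64, 128, 3)  # 576 + 8192 = 8768
    ratio = sep / std                             # roughly 1/c_out + 1/k^2
    ```

    For a 3×3 kernel the separable form costs about 1/c_out + 1/9 of the standard parameter count, which is why the substitution shrinks the feature extraction module substantially at little accuracy cost.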

    Fine-grained ship image recognition based on BCNN with inception and AM-Softmax

    The fine-grained ship image recognition task aims to identify various classes of ships. However, small inter-class differences, large intra-class differences, and a lack of training samples make the task difficult. Therefore, to enhance the accuracy of fine-grained ship image recognition, we design a fine-grained ship image recognition network based on a bilinear convolutional neural network (BCNN) with Inception and additive margin Softmax (AM-Softmax). This network improves the BCNN in two aspects. First, introducing Inception branches into the BCNN helps enhance its ability to extract comprehensive features from ships. Second, by adding a margin value to the decision boundary, the AM-Softmax function can better widen the inter-class differences and reduce the intra-class differences. In addition, as there are few publicly available datasets for fine-grained ship image recognition, we construct a Ship-43 dataset containing 47,300 ship images belonging to 43 categories. Experimental results on the constructed Ship-43 dataset demonstrate that our method can effectively improve the accuracy of ship image recognition, which is 4.08% higher than that of the BCNN model. Moreover, comparison results on three other public fine-grained datasets (CUB, Cars, and Aircraft) further validate the effectiveness of the proposed method.
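    The AM-Softmax mechanism referenced above subtracts an additive margin m from the target class's cosine score before a scaled softmax, so a sample must clear its class boundary by a fixed gap. A minimal sketch; the scale s and margin m below are common defaults from the AM-Softmax literature, not the paper's tuned settings.

    ```python
    import math

    def am_softmax_loss(cosines, target, s=30.0, m=0.35):
        """Cross-entropy over scaled cosine logits with an additive margin
        applied only to the target class."""
        logits = [s * (c - m) if i == target else s * c
                  for i, c in enumerate(cosines)]
        mx = max(logits)                       # stabilize the exponentials
        exps = [math.exp(z - mx) for z in logits]
        return -math.log(exps[target] / sum(exps))

    # A well-separated prediction still incurs near-zero loss...
    low = am_softmax_loss([0.9, 0.1, -0.2], target=0)
    # ...but a prediction that barely beats a rival class is penalized hard,
    # because the margin eats most of its lead.
    high = am_softmax_loss([0.4, 0.35, -0.2], target=0)
    ```

    The effect is exactly the one the abstract describes: classes are pushed apart by at least the margin, widening inter-class gaps while compacting each class.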