111 research outputs found

    Chrion: Optimizing Recurrent Neural Network Inference by Collaboratively Utilizing CPUs and GPUs

    Deploying deep learning models in cloud clusters provides efficient and prompt inference services to accommodate the widespread application of deep learning. These clusters are usually equipped with host CPUs and accelerators with distinct responsibilities to handle serving requests, i.e., general-purpose CPUs for input preprocessing and domain-specific GPUs for forward computation. Recurrent neural networks play an essential role in handling temporal inputs and display distinctive computation characteristics because of their high inter-operator parallelism. Hence, we propose Chrion to optimize recurrent neural network inference by collaboratively utilizing CPUs and GPUs. We formulate the model deployment in the CPU-GPU cluster as an NP-hard scheduling problem of directed acyclic graphs on heterogeneous devices. Given an input model in the ONNX format and a user-defined SLO requirement, Chrion first preprocesses the model by parsing and profiling it, and then partitions the graph to select an execution device for each operator. When an online request arrives, Chrion performs forward computation according to the graph partition by executing the operators on the CPU and GPU in parallel. Our experimental results show that, compared with execution on the GPU alone, the execution time can be reduced by up to 19.4% in the latency-optimal pattern and the GPU memory footprint by 67.5% in the memory-optimal pattern.

    FedCP: Separating Feature Information for Personalized Federated Learning via Conditional Policy

    Recently, personalized federated learning (pFL) has attracted increasing attention in privacy protection, collaborative learning, and tackling statistical heterogeneity among clients, e.g., hospitals, mobile smartphones, etc. Most existing pFL methods focus on exploiting the global information and personalized information in the client-level model parameters while neglecting that data is the source of these two kinds of information. To address this, we propose the Federated Conditional Policy (FedCP) method, which generates a conditional policy for each sample to separate the global information and personalized information in its features and then processes them by a global head and a personalized head, respectively. By considering personalization in a sample-specific manner, FedCP is more fine-grained than existing pFL methods. Extensive experiments in computer vision and natural language processing domains show that FedCP outperforms eleven state-of-the-art methods by up to 6.69%. Furthermore, FedCP maintains its superiority when some clients accidentally drop out, which frequently happens in mobile settings. Our code is public at https://github.com/TsingZ0/FedCP. Comment: Accepted by KDD 202

    FedALA: Adaptive Local Aggregation for Personalized Federated Learning

    A key challenge in federated learning (FL) is the statistical heterogeneity that impairs the generalization of the global model on each client. To address this, we propose Federated learning with Adaptive Local Aggregation (FedALA), a method that captures the desired information in the global model for client models in personalized FL. The key component of FedALA is an Adaptive Local Aggregation (ALA) module, which adaptively aggregates the downloaded global model and the local model towards the local objective on each client to initialize the local model before training in each iteration. To evaluate the effectiveness of FedALA, we conduct extensive experiments with five benchmark datasets in computer vision and natural language processing domains. FedALA outperforms eleven state-of-the-art baselines by up to 3.27% in test accuracy. Furthermore, we also apply the ALA module to other federated learning methods and achieve up to 24.19% improvement in test accuracy. Comment: Accepted by AAAI 202
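    As a rough illustration of what "aggregating the downloaded global model and local model" to initialize a client could look like, the sketch below blends the two parameter vectors element-wise. The blend rule, the function name `ala_initialize`, and the fixed weights are assumptions made for illustration only. The abstract does not give the formula, and in the method the weights would be learned towards the local objective, a step omitted here.

```python
def ala_initialize(local_params, global_params, weights):
    """Blend local and global parameters element-wise (hypothetical rule).

    A weight of 0 keeps the local value and a weight of 1 takes the global
    value. Weights are clipped to [0, 1]. Learning the weights is omitted.
    """
    blended = []
    for local, glob, w in zip(local_params, global_params, weights):
        w = min(1.0, max(0.0, w))  # keep the blend between the two models
        blended.append(local + (glob - local) * w)
    return blended


# Toy parameters: three scalars standing in for model weights.
local_params = [1.0, 2.0, 3.0]
global_params = [3.0, 2.0, -1.0]
weights = [0.0, 0.5, 1.0]  # assumed fixed here, not learned

blended = ala_initialize(local_params, global_params, weights)
print(blended)
```

    With these toy values, the first parameter stays local, the last is taken entirely from the global model, and the middle one is unchanged because both models already agree on it.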

    Eliminating Domain Bias for Federated Learning in Representation Space

    Federated learning (FL) has recently become popular for its privacy-preserving and collaborative learning abilities. However, under statistically heterogeneous scenarios, we observe that biased data domains on clients cause a representation bias phenomenon and further degenerate generic representations during local training, i.e., the representation degeneration phenomenon. To address these issues, we propose a general framework, Domain Bias Eliminator (DBE), for FL. Our theoretical analysis reveals that DBE can promote bi-directional knowledge transfer between server and client, as it reduces the domain discrepancy between server and client in representation space. Besides, extensive experiments on four datasets show that DBE can greatly improve existing FL methods in both generalization and personalization abilities. The DBE-equipped FL method can outperform ten state-of-the-art personalized FL methods by a large margin. Our code is public at https://github.com/TsingZ0/DBE. Comment: Accepted by NeurIPS 2023, 24 page

    GPFL: Simultaneously Learning Global and Personalized Feature Information for Personalized Federated Learning

    Federated Learning (FL) is popular for its privacy-preserving and collaborative learning capabilities. Recently, personalized FL (pFL) has received attention for its ability to address statistical heterogeneity and achieve personalization in FL. However, from the perspective of feature extraction, most existing pFL methods only focus on extracting global or personalized feature information during local training, which fails to meet the collaborative learning and personalization goals of pFL. To address this, we propose a new pFL method, named GPFL, to simultaneously learn global and personalized feature information on each client. We conduct extensive experiments on six datasets in three statistically heterogeneous settings and show the superiority of GPFL over ten state-of-the-art methods regarding effectiveness, scalability, fairness, stability, and privacy. Besides, GPFL mitigates overfitting and outperforms the baselines by up to 8.99% in accuracy. Comment: Accepted by ICCV202

    Fault gouge graphitization as evidence of past seismic slip

    One moderate- to large-magnitude earthquake (M > 6) nucleates in Earth's crust every three days on average, but the geological record of ancient fault slip at meters-per-second seismic velocities (as opposed to subseismic slow-slip creep) remains debated because of the lack of established fault-zone evidence of seismic slip. Here we show that the irreversible temperature-dependent transformation of carbonaceous material (CM, a constituent of many fault gouges) into graphite is a reliable tracer of seismic fault slip. We sheared CM-bearing fault rocks in the laboratory at just above subseismic and at seismic velocities under both water-rich and water-deficient conditions and modeled the temperature evolution with slip. By means of micro-Raman spectroscopy and focused ion beam transmission electron microscopy, we detected graphite grains similar to those found in the principal slip zone of the A.D. 2008 Wenchuan (Mw 7.9) earthquake (southeast Tibet) only in experiments conducted at seismic velocities. The experimental evidence presented here suggests that high-temperature pulses associated with seismic slip induce graphitization of CM. Importantly, the occurrence of graphitized fault-zone CM may allow us to ascertain the seismogenic potential of faults in areas worldwide with incomplete historical earthquake catalogues.

    Tracking-assisted Weakly Supervised Online Visual Object Segmentation in Unconstrained Videos

    This paper tackles the task of online video object segmentation with weak supervision, i.e., labeling the target object and background with pixel-level accuracy in unconstrained videos, given only a single bounding box in the first frame. We present a novel tracking-assisted visual object segmentation framework to achieve this. On the one hand, initialized with a given bounding box in the first frame, the auxiliary object tracking module guides the segmentation module frame by frame by providing motion and region information, which is usually missing in semi-supervised methods. Moreover, compared with the unsupervised approach, our approach with such minimal supervision can focus on the target object without bringing unrelated objects into the final results. On the other hand, the video object segmentation module also improves the robustness of the visual object tracking module by providing pixel-level localization and objectness information. Thus, segmentation and tracking in our framework can mutually help each other in an online manner. To verify the generality and effectiveness of the proposed framework, we evaluate our weakly supervised method on two cross-domain datasets, i.e., the DAVIS and VOT2016 datasets, with the same configuration and parameter setting. Experimental results show the top performance of our method, which is even better than the leading semi-supervised methods. Furthermore, we conduct an extensive ablation study on our approach to investigate the influence of each component and the main parameters.

    Optimised phase disposition pulse-width modulation strategy for hybrid-clamped multilevel inverters using switching state sequences

    This study describes an optimised modulation strategy based on switching state sequences for the hybrid-clamped multilevel converter. Two key control variables, defined as 'phase shift angle' and 'switching state change', are proposed for a five-level hybrid-clamped inverter to improve the operation of all switches; by changing their values, different control methods can be obtained for modulation optimisation purposes. Two example methods solve the voltage imbalance problem of the dc-link capacitors, avoid simultaneous switching transitions of two switches, and improve the inverter's performance compared with the traditional phase disposition pulse-width modulation strategy. A 6 kW prototype inverter is developed and a range of simulations and experiments are carried out for validation. It is found that simulation and experimental results are in good agreement and the proposed modulation strategy is verified in terms of low-order harmonic reduction.

    Upregulation of MIAT Regulates LOXL2 Expression by Competitively Binding MiR-29c in Clear Cell Renal Cell Carcinoma

    Background/Aims: MIAT is a long noncoding RNA (lncRNA) involved in cell proliferation and tumor development. However, the exact effects and molecular mechanisms of MIAT in clear cell renal cell carcinoma (ccRCC) progression are still unknown. Methods: We screened the lncRNA profile of ccRCC in The Cancer Genome Atlas database, and then examined the expression levels of lncRNA MIAT in 45 paired ccRCC tissue specimens and in cell lines by q-RT-PCR. MTS, colony formation, EdU, and Transwell assays were performed to examine the effect of MIAT on proliferation and metastasis of ccRCC. Western blot and luciferase assays were performed to determine whether MIAT can regulate Loxl2 expression by competitively binding miR-29c in ccRCC. Results: MIAT was up-regulated in ccRCC tissues and cell lines. High MIAT expression correlated with worse clinicopathological features and shorter survival. Functional assays showed that knockdown of MIAT inhibited renal cancer cell proliferation and metastasis in vitro and in vivo. Luciferase and western blot assays further confirmed that miR-29c binds with MIAT. Additionally, the correlation of miR-29c with MIAT and Loxl2 was further verified in patients' samples. Conclusion: Our data indicated that MIAT might be an oncogenic lncRNA that promoted proliferation and metastasis of ccRCC, and could be a potential therapeutic target in human ccRCC.