Chrion: Optimizing Recurrent Neural Network Inference by Collaboratively Utilizing CPUs and GPUs
Deploying deep learning models in cloud clusters provides efficient and
prompt inference services to accommodate the widespread application of deep
learning. These clusters are usually equipped with host CPUs and accelerators
with distinct responsibilities to handle serving requests, i.e., general-purpose
CPUs for input preprocessing and domain-specific GPUs for forward computation.
Recurrent neural networks play an essential role in handling temporal inputs
and display distinctive computation characteristics because of their high
inter-operator parallelism. Hence, we propose Chrion to optimize recurrent
neural network inference by collaboratively utilizing CPUs and GPUs. We
formulate the model deployment in the CPU-GPU cluster as an NP-hard scheduling
problem of directed acyclic graphs on heterogeneous devices. Given an input
model in the ONNX format and a user-defined SLO requirement, Chrion first
preprocesses the model by parsing and profiling it, and then partitions the
graph to select execution devices for each operator. When an online request
arrives, Chrion performs forward computation according to the graph partition
by executing the operators on the CPU and GPU in parallel. Our experimental
results show that execution time can be reduced by up to 19.4% in the
latency-optimal pattern and GPU memory footprint by 67.5% in the memory-optimal
pattern compared with execution on the GPU.
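The abstract does not spell out Chrion's partitioning algorithm, so the following is only a minimal sketch of the general idea of placing DAG operators on two devices. It uses a generic greedy earliest-finish-time heuristic, which is a swapped-in technique and not the authors' method. The function name `partition`, the per-operator cost table, and the `transfer` penalty are all hypothetical.

```python
from graphlib import TopologicalSorter


def partition(graph, cost, transfer=0.0):
    """Greedy earliest-finish-time placement of DAG operators on CPU/GPU.

    graph: {op: set of predecessor ops} (hypothetical model graph)
    cost:  {op: {"cpu": seconds, "gpu": seconds}} (hypothetical profile)
    transfer: assumed fixed penalty when an input crosses devices
    """
    device_free = {"cpu": 0.0, "gpu": 0.0}  # time at which each device is idle
    finish, placement = {}, {}
    for op in TopologicalSorter(graph).static_order():
        best = None
        for dev in ("cpu", "gpu"):
            # The operator can start once all inputs are available on `dev`.
            ready = 0.0
            for pred in graph.get(op, ()):
                penalty = transfer if placement[pred] != dev else 0.0
                ready = max(ready, finish[pred] + penalty)
            end = max(ready, device_free[dev]) + cost[op][dev]
            if best is None or end < best[0]:
                best = (end, dev)
        finish[op], placement[op] = best
        device_free[best[1]] = best[0]
    return placement, max(finish.values())


# Toy graph: two independent branches (b, c) between a and d.
graph = {"a": set(), "b": {"a"}, "c": {"a"}, "d": {"b", "c"}}
cost = {
    "a": {"cpu": 4.0, "gpu": 1.0},
    "b": {"cpu": 2.0, "gpu": 2.0},
    "c": {"cpu": 2.0, "gpu": 2.0},
    "d": {"cpu": 4.0, "gpu": 1.0},
}
placement, makespan = partition(graph, cost)
```

In this toy case the two independent branches end up on different devices, which is the kind of inter-operator parallelism the abstract refers to. A real scheduler would also have to account for memory limits and the SLO requirement.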
FedCP: Separating Feature Information for Personalized Federated Learning via Conditional Policy
Recently, personalized federated learning (pFL) has attracted increasing
attention in privacy protection, collaborative learning, and tackling
statistical heterogeneity among clients, e.g., hospitals and mobile
smartphones. Most existing pFL methods focus on exploiting the global information and
personalized information in the client-level model parameters while neglecting
that data is the source of these two kinds of information. To address this, we
propose the Federated Conditional Policy (FedCP) method, which generates a
conditional policy for each sample to separate the global information and
personalized information in its features and then processes them by a global
head and a personalized head, respectively. FedCP is more fine-grained than
existing pFL methods because it considers personalization in a sample-specific manner.
Extensive experiments in computer vision and natural language processing
domains show that FedCP outperforms eleven state-of-the-art methods by up to
6.69%. Furthermore, FedCP maintains its superiority when some clients
accidentally drop out, which frequently happens in mobile settings. Our code is
public at https://github.com/TsingZ0/FedCP.
Comment: Accepted by KDD 202
FedALA: Adaptive Local Aggregation for Personalized Federated Learning
A key challenge in federated learning (FL) is the statistical heterogeneity
that impairs the generalization of the global model on each client. To address
this, we propose Federated learning with Adaptive Local Aggregation (FedALA),
a method that captures the desired information in the global model for client
models in personalized FL. The key component of FedALA is an Adaptive Local
Aggregation (ALA) module, which can adaptively aggregate the downloaded global
model and local model towards the local objective on each client to initialize
the local model before training in each iteration. To evaluate the
effectiveness of FedALA, we conduct extensive experiments with five benchmark
datasets in computer vision and natural language processing domains. FedALA
outperforms eleven state-of-the-art baselines by up to 3.27% in test accuracy.
Furthermore, we also apply the ALA module to other federated learning methods and
achieve up to 24.19% improvement in test accuracy.
Comment: Accepted by AAAI 202
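The abstract describes the ALA module only at a high level. As a rough illustration, the sketch below assumes the aggregation is an element-wise interpolation between the local and global parameters, with per-element weights clipped to [0, 1] and fitted by gradient descent on a local objective. The interpolation form, the toy squared-error objective, and the names `ala_init`, `target`, `lr`, and `steps` are assumptions made for illustration and are not taken from the abstract.

```python
def ala_init(local, global_, target, lr=0.5, steps=50):
    """Initialize a local model as local + w * (global - local), element-wise.

    The weights w are learned against a toy local objective,
    sum((theta - target)^2), standing in for the client's real loss.
    """
    w = [1.0] * len(local)  # start by fully trusting the global model
    for _ in range(steps):
        for i in range(len(local)):
            diff = global_[i] - local[i]
            theta = local[i] + w[i] * diff
            grad = 2.0 * (theta - target[i]) * diff  # d(loss)/d(w_i)
            w[i] = min(1.0, max(0.0, w[i] - lr * grad))  # keep w_i in [0, 1]
    merged = [l + wi * (g - l) for l, g, wi in zip(local, global_, w)]
    return merged, w


# The first parameter benefits from the global value, the second mostly does not.
merged, w = ala_init(local=[0.0, 0.0], global_=[1.0, 1.0], target=[1.0, 0.25])
```

Here the learned weights keep the global value for the first parameter and take only a quarter of the global update for the second, which is the sense in which the aggregation adapts to the local objective.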
Eliminating Domain Bias for Federated Learning in Representation Space
Recently, federated learning (FL) has become popular for its privacy-preserving and
collaborative learning abilities. However, under statistically heterogeneous
scenarios, we observe that biased data domains on clients cause a
representation bias phenomenon and further degenerate generic representations
during local training, i.e., the representation degeneration phenomenon. To
address these issues, we propose a general framework Domain Bias Eliminator
(DBE) for FL. Our theoretical analysis reveals that DBE can promote
bi-directional knowledge transfer between server and client, as it reduces the
domain discrepancy between server and client in representation space. Besides,
extensive experiments on four datasets show that DBE can greatly improve
existing FL methods in both generalization and personalization abilities. The
DBE-equipped FL method can outperform ten state-of-the-art personalized FL
methods by a large margin. Our code is public at
https://github.com/TsingZ0/DBE.
Comment: Accepted by NeurIPS 2023, 24 page
GPFL: Simultaneously Learning Global and Personalized Feature Information for Personalized Federated Learning
Federated Learning (FL) is popular for its privacy-preserving and
collaborative learning capabilities. Recently, personalized FL (pFL) has
received attention for its ability to address statistical heterogeneity and
achieve personalization in FL. However, from the perspective of feature
extraction, most existing pFL methods only focus on extracting global or
personalized feature information during local training, which fails to meet the
collaborative learning and personalization goals of pFL. To address this, we
propose a new pFL method, named GPFL, to simultaneously learn global and
personalized feature information on each client. We conduct extensive
experiments on six datasets in three statistically heterogeneous settings and
show the superiority of GPFL over ten state-of-the-art methods regarding
effectiveness, scalability, fairness, stability, and privacy. Besides, GPFL
mitigates overfitting and outperforms the baselines by up to 8.99% in accuracy.
Comment: Accepted by ICCV202
Fault gouge graphitization as evidence of past seismic slip
One moderate- to large-magnitude earthquake (M > 6) nucleates in Earth's crust every three days on average, but the geological record of ancient fault slip at meters-per-second seismic velocities (as opposed to subseismic slow-slip creep) remains debated because of the lack of established fault-zone evidence of seismic slip. Here we show that the irreversible temperature-dependent transformation of carbonaceous material (CM, a constituent of many fault gouges) into graphite is a reliable tracer of seismic fault slip. We sheared CM-bearing fault rocks in the laboratory at just above subseismic and at seismic velocities under both water-rich and water-deficient conditions and modeled the temperature evolution with slip. By means of micro-Raman spectroscopy and focused-ion beam transmission electron microscopy, we detected graphite grains similar to those found in the principal slip zone of the A.D. 2008 Wenchuan (Mw 7.9) earthquake (southeast Tibet) only in experiments conducted at seismic velocities. The experimental evidence presented here suggests that high-temperature pulses associated with seismic slip induce graphitization of CM. Importantly, the occurrence of graphitized fault-zone CM may allow us to ascertain the seismogenic potential of faults in areas worldwide with incomplete historical earthquake catalogues.
Tracking-assisted Weakly Supervised Online Visual Object Segmentation in Unconstrained Videos
This paper tackles the task of online video object segmentation with weak supervision, i.e., labeling the target object and background with pixel-level accuracy in unconstrained videos, given only a single bounding box in the first frame. We present a novel tracking-assisted visual object segmentation framework to achieve this. On the one hand, initialized with a given bounding box in the first frame, the auxiliary object tracking module guides the segmentation module frame by frame by providing motion and region information, which is usually missing in semi-supervised methods. Moreover, compared with the unsupervised approach, our approach with such minimal supervision can focus on the target object without bringing unrelated objects into the final results. On the other hand, the video object segmentation module also improves the robustness of the visual object tracking module by providing pixel-level localization and objectness information. Thus, segmentation and tracking in our framework help each other in an online manner. To verify the generality and effectiveness of the proposed framework, we evaluate our weakly supervised method on two cross-domain datasets, i.e., the DAVIS and VOT2016 datasets, with the same configuration and parameter settings. Experimental results show the top performance of our method, which is even better than the leading semi-supervised methods. Furthermore, we conduct an extensive ablation study on our approach to investigate the influence of each component and the main parameters.
Optimised phase disposition pulse-width modulation strategy for hybrid-clamped multilevel inverters using switching state sequences
This study describes an optimised modulation strategy based on switching state sequences for the hybrid-clamped multilevel converter. Two key control variables, defined as 'phase shift angle' and 'switching state change', are proposed for a five-level hybrid-clamped inverter to improve the operation of all switches; by changing their values, different control methods can be obtained for modulation optimisation purposes. Two example methods solve the voltage imbalance problem of the dc-link capacitors, avoid simultaneous switching transitions of two switches, and improve the inverter's performance compared with the traditional phase disposition pulse-width modulation strategy. A 6 kW prototype inverter is developed, and a range of simulations and experiments is carried out for validation. The simulation and experimental results are in good agreement, and the proposed modulation strategy is verified in terms of low-order harmonic reduction.
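For orientation, the sketch below illustrates the traditional phase disposition PWM baseline that the abstract compares against, not the optimised strategy itself. It assumes the textbook formulation for a five-level inverter: four in-phase, level-shifted triangular carriers, with the output level given by the number of carriers the sinusoidal reference exceeds. The function names and the default modulation index and frequencies are illustrative assumptions.

```python
import math


def triangle(t, f_carrier):
    """Unit triangle wave in [0, 1] with frequency f_carrier."""
    phase = (t * f_carrier) % 1.0
    return 2.0 * phase if phase < 0.5 else 2.0 * (1.0 - phase)


def pd_pwm_level(t, m=0.9, f_ref=50.0, f_carrier=2000.0, levels=5):
    """Output level (0 .. levels-1) of phase disposition PWM at time t."""
    n_carriers = levels - 1
    ref = m * math.sin(2.0 * math.pi * f_ref * t)  # reference in [-1, 1]
    band = 2.0 / n_carriers  # each carrier occupies one contiguous band
    tri = triangle(t, f_carrier)
    level = 0
    for k in range(n_carriers):
        carrier = -1.0 + band * (k + tri)  # all carriers share one phase (PD)
        if ref > carrier:
            level += 1
    return level


# Sample one 50 Hz fundamental period at 1 microsecond resolution.
samples = [pd_pwm_level(i * 1e-6) for i in range(20000)]
```

With a modulation index of 0.9 the sampled waveform visits all five levels, while a low modulation index only exercises the middle levels.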
Upregulation of MIAT Regulates LOXL2 Expression by Competitively Binding MiR-29c in Clear Cell Renal Cell Carcinoma
Background/Aims: MIAT is a long noncoding RNA (lncRNA) involved in cell proliferation and tumor development. However, the exact effects and molecular mechanisms of MIAT in clear cell renal cell carcinoma (ccRCC) progression are still unknown. Methods: We screened the lncRNA profile of ccRCC in The Cancer Genome Atlas database, and then examined the expression levels of lncRNA MIAT in 45 paired ccRCC tissue specimens and in cell lines by q-RT-PCR. MTS, colony formation, EdU, and Transwell assays were performed to examine the effect of MIAT on proliferation and metastasis of ccRCC. Western blot and luciferase assays were performed to determine whether MIAT can regulate LOXL2 expression by competitively binding miR-29c in ccRCC. Results: MIAT was up-regulated in ccRCC tissues and cell lines. High MIAT expression correlated with worse clinicopathological features and shorter survival. Functional assays showed that knockdown of MIAT inhibited renal cancer cell proliferation and metastasis in vitro and in vivo. Luciferase and western blot assays further confirmed that miR-29c binds with MIAT. Additionally, the correlation of miR-29c with MIAT and LOXL2 was further verified in patients' samples. Conclusion: Our data indicated that MIAT might be an oncogenic lncRNA that promoted proliferation and metastasis of ccRCC, and could be a potential therapeutic target in human ccRCC.