81 research outputs found
A Density Peak-Based Clustering Approach for Fault Diagnosis of Photovoltaic Arrays
Fault diagnosis of photovoltaic (PV) arrays plays a significant role in the safe and reliable operation of PV systems. In this paper, the distribution of PV systems' daily operating data under different operating conditions is analyzed. The results show that the data distribution exhibits significant nonspherical clustering, that each cluster center lies relatively far from any point of higher local density, and that the number of clusters cannot be predetermined. Based on these features, a density peak-based clustering approach is proposed to automatically cluster the PV data. A set of labeled data covering various conditions is then employed to compute the minimum distance vector between each cluster and the reference data. According to this distance vector, the clusters can be identified and categorized into various operating conditions and/or faults. Simulation results demonstrate the feasibility of the proposed method in diagnosing certain faults occurring in a PV array. Moreover, a 1.8 kW grid-connected PV system with a 6×3 PV array is established and experimentally tested to investigate the performance of the developed method.
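The density-peak idea this abstract builds on (cluster centers have high local density and lie far from any denser point) can be sketched in a few lines. The function below is an illustrative toy implementation of that generic clustering rule, not the paper's diagnosis pipeline; the Gaussian density kernel, cutoff distance `dc`, and fixed center count are assumptions made for the sketch.

```python
import numpy as np

def density_peak_cluster(X, dc, n_centers):
    """Toy density-peak clustering in the style of Rodriguez & Laio.

    dc: bandwidth of the local-density estimate.
    n_centers: number of centers to pick (the paper's point is that the
    center count need not be fixed in advance; fixing it keeps this
    sketch short).
    """
    n = len(X)
    # Pairwise Euclidean distances.
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    # Local density via a Gaussian kernel (continuous, so no ties).
    rho = np.exp(-(d / dc) ** 2).sum(axis=1) - 1.0
    # delta: distance to the nearest point of strictly higher density;
    # the globally densest point gets the maximum distance instead.
    delta = np.zeros(n)
    for i in range(n):
        higher = np.where(rho > rho[i])[0]
        delta[i] = d[i, higher].min() if len(higher) else d[i].max()
    # Centers: points maximizing rho * delta (high density, far from
    # anything denser). Assumes the densest point ends up a center.
    centers = np.argsort(rho * delta)[-n_centers:]
    labels = -np.ones(n, dtype=int)
    for c_idx, c in enumerate(centers):
        labels[c] = c_idx
    # Assign remaining points, in order of decreasing density, to the
    # cluster of their nearest higher-density neighbour.
    for i in np.argsort(-rho):
        if labels[i] == -1:
            higher = np.where(rho > rho[i])[0]
            labels[i] = labels[higher[np.argmin(d[i, higher])]]
    return labels
```

On two well-separated blobs this recovers one label per blob without being told where the clusters are, which is the property the paper exploits for unlabeled PV operating data.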
Privacy-preserving collaborative machine learning on genomic data using TensorFlow
Machine learning (ML) methods have been widely used in genomic studies.
However, genomic data are often held by different stakeholders (e.g. hospitals,
universities, and healthcare companies) who consider the data as sensitive
information, even though they desire to collaborate. To address this issue,
recent works have proposed solutions using Secure Multi-party Computation
(MPC), which train on the decentralized data in such a way that the
participants learn nothing from each other beyond the final trained model.
We design and implement several MPC-friendly ML primitives, including class
weight adjustment and a parallelizable approximation of the activation function. In
addition, we develop the solution as an extension to TF
Encrypted (Dahl et al., 2018), enabling us to quickly experiment with
enhancements of both machine learning techniques and cryptographic protocols
while leveraging the advantages of TensorFlow's optimizations. Our
implementation compares favorably with state-of-the-art methods, winning first
place in Track IV of the iDASH 2019 secure genome analysis competition.
Comment: Description of the winning solution at Track IV of the iDASH
competition 2019, to be presented at the Trustworthy ML workshop co-located
with ICLR 2020.
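The parallelizable activation approximation mentioned above refers to replacing a non-linearity by a low-degree polynomial, which MPC protocols evaluate cheaply because only additions and multiplications are needed. As a hedged illustration, the sketch below uses a well-known degree-3 least-squares fit of the sigmoid on roughly [-5, 5]; the exact polynomial used by the authors may differ.

```python
import math

def sigmoid(x):
    """Reference sigmoid, for comparison only (exp is MPC-unfriendly)."""
    return 1.0 / (1.0 + math.exp(-x))

def sigmoid_poly(x):
    """Degree-3 polynomial approximation of the sigmoid:
    sigma(x) ~= 0.5 + 0.197*x - 0.004*x**3.
    Only additions and multiplications are used, so the same arithmetic
    can be carried out directly on secret-shared values in MPC, and each
    input can be evaluated independently (hence parallelizable).
    """
    return 0.5 + 0.197 * x - 0.004 * x ** 3
```

The approximation stays within a few percent of the true sigmoid over the fitting range, which is typically acceptable for logistic-regression-style training on normalized features.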
DiQAD: A Benchmark Dataset for End-to-End Open-domain Dialogue Assessment
Dialogue assessment plays a critical role in the development of open-domain
dialogue systems. Existing work is incapable of providing an end-to-end and
human-epistemic assessment dataset: it either covers only sub-metrics such as
coherence, or collects dialogues conversed between annotators, far from real user
settings. In this paper, we release a large-scale dialogue quality assessment
dataset (DiQAD), for automatically assessing open-domain dialogue quality.
Specifically, we (1) establish the assessment criteria based on the dimensions
conforming to human judgements on dialogue qualities, and (2) annotate
large-scale dialogues conversed between real users based on these
annotation criteria, yielding around 100,000 dialogues. We conduct
several experiments and report the performances of the baselines as the
benchmark on DiQAD. The dataset is openly accessible at
https://github.com/yukunZhao/Dataset_Dialogue_quality_evaluation.
Comment: Accepted to Findings of EMNLP 2023.
PUMA: Secure Inference of LLaMA-7B in Five Minutes
With ChatGPT as a representative example, many companies have begun to provide
services based on large Transformer models. However, using such a service
inevitably leaks users' prompts to the model provider. Prior work has
studied secure inference for Transformer models using secure multiparty
computation (MPC), where model parameters and clients' prompts are kept secret.
Despite this, these frameworks are still limited in terms of model performance,
efficiency, and deployment. To address these limitations, we propose PUMA, a
framework for fast and secure Transformer model inference. Our framework
designs high quality approximations for expensive functions, such as GeLU and
Softmax, which significantly reduce the cost of secure inference while
preserving the model performance. Additionally, we design secure Embedding and
LayerNorm procedures that faithfully implement the desired functionality
without undermining the Transformer architecture. PUMA is about 2x faster than
the state-of-the-art MPC framework MPCFORMER (ICLR 2023) and achieves accuracy
similar to that of plaintext models without fine-tuning (which previous works
failed to achieve).
One more thing: PUMA can evaluate LLaMA-7B in around 5 minutes to generate one
token. To the best of our knowledge, this is the first time a model of this
parameter size has been evaluated under MPC. PUMA has been open-sourced in the
GitHub repository of SecretFlow-SPU.
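MPC frameworks in this line of work build secure arithmetic from additive secret sharing plus precomputed multiplication triples (Beaver triples). The sketch below simulates standard two-party Beaver multiplication locally in one process; it is a generic textbook illustration with a simulated trusted dealer, not PUMA's actual protocol.

```python
import secrets

P = 2**61 - 1  # prime modulus for additive secret sharing

def share(x):
    """Split x into two additive shares mod P."""
    r = secrets.randbelow(P)
    return r, (x - r) % P

def reconstruct(s0, s1):
    return (s0 + s1) % P

def beaver_mul(x_shares, y_shares):
    """Multiply two secret-shared values using a Beaver triple.

    The triple (a, b, c) with c = a*b mod P is assumed to come from an
    offline phase / trusted dealer; here it is generated in the clear.
    """
    a = secrets.randbelow(P)
    b = secrets.randbelow(P)
    c = (a * b) % P
    a0, a1 = share(a); b0, b1 = share(b); c0, c1 = share(c)
    x0, x1 = x_shares
    y0, y1 = y_shares
    # The parties open d = x - a and e = y - b. This leaks nothing,
    # because a and b are uniformly random one-time masks.
    d = reconstruct((x0 - a0) % P, (x1 - a1) % P)
    e = reconstruct((y0 - b0) % P, (y1 - b1) % P)
    # x*y = (a+d)(b+e) = c + d*b + e*a + d*e, so each party computes a
    # local share of that sum (the constant d*e goes to one party only).
    z0 = (c0 + d * b0 + e * a0 + d * e) % P
    z1 = (c1 + d * b1 + e * a1) % P
    return z0, z1
```

Matrix multiplications in secure Transformer inference reduce to many such triple-based products, which is why the expensive parts are the non-linearities (GeLU, Softmax) that cannot be expressed as a few ring multiplications.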
Cheetah: Lean and Fast Secure Two-Party Deep Neural Network Inference
Secure two-party neural network inference (2PC-NN) can offer privacy protection for both the client and the server and is a promising technique in the machine-learning-as-a-service setting. However, the large overhead of current 2PC-NN inference systems remains a major obstacle, especially when applied to deep neural networks such as ResNet50. In this work, we present Cheetah, a new 2PC-NN inference system that is faster and more communication-efficient than the state of the art. The main contributions of Cheetah are two-fold: the first part includes carefully designed homomorphic encryption-based protocols that can evaluate the linear layers (namely convolution, batch normalization, and fully-connected layers) without any expensive rotation operation. The second part includes several lean and communication-efficient primitives for the non-linear functions (e.g., ReLU and truncation). Using Cheetah, we present extensive benchmarks over several large-scale deep neural networks. Taking ResNet50 as an example, an end-to-end execution of Cheetah under a WAN setting costs less than 2.5 minutes and 2.3 gigabytes of communication, outperforming CrypTFlow2 (ACM CCS 2020) by about 5.6× and 12.9×, respectively.
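The truncation primitive mentioned above exists because 2PC-NN systems compute over integers with a fixed-point encoding: multiplying two encoded values doubles the number of fractional bits, which must then be shifted away. The plaintext sketch below illustrates only the encoding arithmetic; performing that shift on secret-shared values is itself a secure protocol, and the choice of F here is an arbitrary assumption for the sketch.

```python
F = 12          # fractional bits (sketch parameter)
SCALE = 1 << F

def encode(x):
    """Encode a real number as a scaled integer (fixed point)."""
    return round(x * SCALE)

def decode(v):
    """Map a scaled integer back to a real number."""
    return v / SCALE

def fp_mul(u, v):
    """Multiply two fixed-point values.

    The raw integer product carries 2F fractional bits, so F bits are
    truncated to restore the scale. In a 2PC system this right shift is
    exactly the 'truncation' primitive that must be done securely.
    """
    return (u * v) >> F
```

Additions of encoded values need no such correction; it is multiplication (ubiquitous in convolutions and fully-connected layers) that forces a truncation after every layer, which is why making it cheap matters so much for end-to-end cost.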
- …