Estimating and Explaining Model Performance When Both Covariates and Labels Shift
Deployed machine learning (ML) models often encounter new user data that
differs from their training data. Therefore, estimating how well a given model
might perform on the new data is an important step toward reliable ML
applications. This is very challenging, however, as the data distribution can
change in flexible ways, and we may not have any labels on the new data, which
is often the case in monitoring settings. In this paper, we propose a new
distribution shift model, Sparse Joint Shift (SJS), which considers the joint
shift of both labels and a few features. This unifies and generalizes several
existing shift models including label shift and sparse covariate shift, where
only marginal feature or label distribution shifts are considered. We describe
mathematical conditions under which SJS is identifiable. We further propose
SEES, an algorithmic framework to characterize the distribution shift under SJS
and to estimate a model's performance on new data without any labels. We
conduct extensive experiments on several real-world datasets with various ML
models. Across different datasets and distribution shifts, SEES achieves
significant reductions in shift estimation error (up to an order of magnitude)
compared to existing approaches. Comment: Accepted to NeurIPS 2022.
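As a rough illustration of the setting, the sketch below handles only the label-shift special case that SJS generalizes: it recovers class-ratio weights from the model's predictions on unlabeled target data (black-box shift estimation) and reweights per-class source accuracy. It is not the authors' SEES algorithm, and all names are illustrative.

    import numpy as np

    def estimate_target_accuracy(y_val, yhat_val, yhat_target, n_classes):
        # y_val, yhat_val: integer labels and model predictions on labeled
        # source validation data; yhat_target: predictions on unlabeled target data.
        y_val, yhat_val, yhat_target = map(np.asarray, (y_val, yhat_val, yhat_target))

        # C[i, j] = P_source(prediction = i, true label = j)
        C = np.zeros((n_classes, n_classes))
        for yt, yp in zip(y_val, yhat_val):
            C[yp, yt] += 1.0 / len(y_val)

        # Distribution of the model's predictions on the unlabeled target data.
        mu = np.bincount(yhat_target, minlength=n_classes) / len(yhat_target)

        # Under label shift, mu = C @ w with w[y] = p_target(y) / p_source(y).
        w, *_ = np.linalg.lstsq(C, mu, rcond=None)

        # Estimated target accuracy = sum_y w[y] * P_source(correct, label = y).
        per_class_correct = np.array(
            [np.mean((yhat_val == c) & (y_val == c)) for c in range(n_classes)]
        )
        return float(w @ per_class_correct)

SJS additionally allows a few features to shift jointly with the label, which this simplified estimator does not capture.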
Timely-Throughput Optimal Coded Computing over Cloud Networks
In modern distributed computing systems, unpredictable and unreliable
infrastructures result in high variability of computing resources. Meanwhile,
there is significantly increasing demand for timely and event-driven services
with deadline constraints. Motivated by measurements over Amazon EC2 clusters,
we consider a two-state Markov model for variability of computing speed in
cloud networks. In this model, each worker can be either in a good state or a
bad state in terms of the computation speed, and the transition between these
states is modeled as a Markov chain which is unknown to the scheduler. We then
consider a Coded Computing framework, in which the data is possibly encoded and
stored at the worker nodes in order to provide robustness against nodes that
may be in a bad state. With timely computation requests submitted to the system
with computation deadlines, our goal is to design the optimal computation-load
allocation scheme and the optimal data encoding scheme that maximize the timely
computation throughput (i.e., the average number of computation tasks that are
accomplished before their deadline). Our main result is the development of a
dynamic computation strategy called Lagrange Estimate-and-Allocate (LEA), which
achieves the optimal timely computation throughput. We show that, compared to
the static allocation strategy, LEA increases the timely computation throughput
by 1.4X - 17.5X in simulations of various scenarios, and by 1.27X - 6.5X in
experiments over Amazon EC2 clusters. Comment: To appear in MobiHoc 2019.
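A toy simulation of the timely-throughput objective (not the LEA strategy itself): worker speeds follow a two-state good/bad Markov chain, and a round counts as accomplished if enough workers finish their assigned load before the deadline. The recovery threshold k_needed stands in for the coded-computing decoding condition; all parameter values are made up for illustration.

    import random

    def simulate_timely_throughput(n_workers=10, n_rounds=1000, deadline=1.0,
                                   load=0.5, p_good_to_bad=0.1, p_bad_to_good=0.3,
                                   good_speed=2.0, bad_speed=0.5, k_needed=6):
        states = [True] * n_workers  # True = good state
        accomplished = 0
        for _ in range(n_rounds):
            # Each worker's state evolves as a two-state Markov chain.
            states = [
                (s and random.random() >= p_good_to_bad)
                or (not s and random.random() < p_bad_to_good)
                for s in states
            ]
            # A worker finishes if its assigned load, divided by its current
            # speed, fits within the deadline.
            finished = sum(
                1 for s in states
                if load / (good_speed if s else bad_speed) <= deadline
            )
            accomplished += finished >= k_needed
        # Timely computation throughput: fraction of rounds whose task is
        # completed before its deadline.
        return accomplished / n_rounds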
How is ChatGPT's behavior changing over time?
GPT-3.5 and GPT-4 are the two most widely used large language model (LLM)
services. However, when and how these models are updated over time is opaque.
Here, we evaluate the March 2023 and June 2023 versions of GPT-3.5 and GPT-4 on
four diverse tasks: 1) solving math problems, 2) answering sensitive/dangerous
questions, 3) generating code and 4) visual reasoning. We find that the
performance and behavior of both GPT-3.5 and GPT-4 can vary greatly over time.
For example, GPT-4 (March 2023) was very good at identifying prime numbers
(accuracy 97.6%) but GPT-4 (June 2023) was very poor on these same questions
(accuracy 2.4%). Interestingly, GPT-3.5 (June 2023) was much better than GPT-3.5
(March 2023) on this task. GPT-4 was less willing to answer sensitive questions
in June than in March, and both GPT-4 and GPT-3.5 had more formatting mistakes
in code generation in June than in March. Overall, our findings show that the
behavior of the same LLM service can change substantially in a relatively short
amount of time, highlighting the need for continuous monitoring of LLM quality.
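A generic monitoring harness of the kind this finding motivates can be as simple as re-running a fixed, deterministic evaluation set against each snapshot of a service and tracking accuracy and answer drift. The query_march and query_june callables below are placeholders for whatever client code calls the two model versions; this is a sketch, not the paper's evaluation code.

    def compare_snapshots(eval_set, query_march, query_june):
        # eval_set: list of (prompt, expected_answer) pairs graded by exact
        # match; the two query_* callables return each model's text answer.
        stats = {"march_acc": 0, "june_acc": 0, "answers_changed": 0}
        for prompt, expected in eval_set:
            a, b = query_march(prompt), query_june(prompt)
            stats["march_acc"] += a.strip() == expected
            stats["june_acc"] += b.strip() == expected
            stats["answers_changed"] += a.strip() != b.strip()
        n = len(eval_set)
        return {key: count / n for key, count in stats.items()}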
Gravity Effects on Information Filtering and Network Evolving
In this paper, based on the gravity principle of classical physics, we
propose a tunable gravity-based model, which considers tag usage patterns to
weight both the mass and the distance of network nodes. We then apply this
model to the problems of information filtering and network evolving.
Experimental results on two real-world data sets, Del.icio.us and MovieLens,
show that it not only enhances algorithmic performance but also better
characterizes the properties of real networks. This work may shed light on a
deeper understanding of the effect of the gravity model.
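For intuition, a gravity-style score for a candidate link or recommendation multiplies the two nodes' "masses" and divides by a power of their "distance", with a tunable exponent. The particular choices of mass (e.g., degree or tag-usage count) and distance (e.g., a tag-based dissimilarity) are illustrative assumptions here, not the paper's exact definitions.

    def gravity_score(mass_u, mass_v, distance_uv, gamma=1.0, eps=1e-12):
        # Larger masses and smaller distances give higher scores; gamma tunes
        # how strongly distance is penalized.
        return (mass_u * mass_v) / max(distance_uv, eps) ** gamma

Candidate items (for information filtering) or new edges (for network evolving) can then be ranked by, or attached with probability proportional to, this score.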
HAPI: A Large-scale Longitudinal Dataset of Commercial ML API Predictions
Commercial ML APIs offered by providers such as Google, Amazon and Microsoft
have dramatically simplified ML adoption in many applications. Numerous
companies and academics pay to use ML APIs for tasks such as object detection,
OCR and sentiment analysis. Different ML APIs tackling the same task can have
very heterogeneous performance. Moreover, the ML models underlying the APIs
also evolve over time. As ML APIs rapidly become a valuable marketplace and a
widespread way to consume machine learning, it is critical to systematically
study and compare different APIs with each other and to characterize how APIs
change over time. However, this topic is currently underexplored due to the
lack of data. In this paper, we present HAPI (History of APIs), a longitudinal
dataset of 1,761,417 instances of commercial ML API applications (involving
APIs from Amazon, Google, IBM, Microsoft and other providers) across diverse
tasks including image tagging, speech recognition and text mining from 2020 to
2022. Each instance consists of a query input for an API (e.g., an image or
text) along with the API's output prediction/annotation and confidence scores.
HAPI is the first large-scale dataset of ML API usages and is a unique resource
for studying ML-as-a-service (MLaaS). As examples of the types of analyses that
HAPI enables, we show that ML APIs' performance changes substantially over
time--several APIs' accuracies dropped on specific benchmark datasets. Even
when the API's aggregate performance stays steady, its error modes can shift
across different subtypes of data between 2020 and 2022. Such changes can
substantially impact entire analytics pipelines that use an ML API as a
component. We further use HAPI to study commercial APIs' performance
disparities across demographic subgroups over time. HAPI can stimulate more
research in the growing field of MLaaS. Comment: Preprint, to appear in NeurIPS 2022.
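Longitudinal analyses of this kind reduce to grouping prediction records by API and time period and comparing accuracy across periods. The record schema below ("api", "year", "prediction", "label") is hypothetical and only meant to show the shape of such an analysis; the released dataset's fields may differ.

    from collections import defaultdict

    def accuracy_by_api_and_year(records):
        # records: iterable of dicts with hypothetical keys
        # "api", "year", "prediction", "label".
        totals = defaultdict(lambda: [0, 0])  # (api, year) -> [n_correct, n_total]
        for r in records:
            key = (r["api"], r["year"])
            totals[key][0] += r["prediction"] == r["label"]
            totals[key][1] += 1
        return {key: n_correct / n_total
                for key, (n_correct, n_total) in totals.items()}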
Genuinely Distributed Byzantine Machine Learning
Machine Learning (ML) solutions are nowadays distributed, according to the
so-called server/worker architecture. One server holds the model parameters
while several workers train the model. Clearly, such an architecture is prone to
various types of component failures, which can all be encompassed within the
spectrum of Byzantine behavior. Several approaches have been proposed
recently to tolerate Byzantine workers. Yet all require trusting a central
parameter server. We initiate in this paper the study of the ``general''
Byzantine-resilient distributed machine learning problem where no individual
component is trusted.
We show that this problem can be solved in an asynchronous system, despite
the presence of 1/3 Byzantine parameter servers and 1/3 Byzantine workers
(which is optimal). We present a new algorithm, ByzSGD, which
solves the general Byzantine-resilient distributed machine learning problem by
relying on three major schemes. The first, Scatter/Gather, is a communication
scheme whose goal is to bound the maximum drift among models on correct
servers. The second, Distributed Median Contraction (DMC), leverages the
geometric properties of the median in high dimensional spaces to bring
parameters within the correct servers back close to each other, ensuring
learning convergence. The third, Minimum-Diameter Averaging (MDA), is a
statistically-robust gradient aggregation rule whose goal is to tolerate
Byzantine workers. MDA requires a loose bound on the variance of non-Byzantine
gradient estimates, compared to existing alternatives (e.g., Krum).
Interestingly, ByzSGD ensures Byzantine resilience without adding communication
rounds (on a normal path), compared to vanilla non-Byzantine alternatives.
ByzSGD requires, however, a larger number of messages which, we show, can be
reduced if we assume synchrony. Comment: This is a merge of arXiv:1905.03853
and arXiv:1911.07537; arXiv:1911.07537 will be retracted.
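To illustrate the aggregation idea, the brute-force sketch below implements a minimum-diameter averaging rule in the spirit of MDA: among all subsets of n - f gradient estimates, it keeps the one whose largest pairwise distance is smallest and averages it. This exhaustive search is exponential in f and assumes n - f >= 2; it is a sketch of the principle, not the paper's implementation.

    import itertools
    import numpy as np

    def minimum_diameter_average(gradients, f):
        # gradients: list of numpy arrays (one per worker); f: number of
        # potentially Byzantine workers to tolerate.
        n = len(gradients)
        best_subset, best_diameter = None, float("inf")
        for subset in itertools.combinations(range(n), n - f):
            diameter = max(
                np.linalg.norm(gradients[i] - gradients[j])
                for i, j in itertools.combinations(subset, 2)
            )
            if diameter < best_diameter:
                best_subset, best_diameter = subset, diameter
        # Average the gradients in the tightest subset, excluding outliers
        # that would inflate the diameter.
        return np.mean([gradients[i] for i in best_subset], axis=0)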