
    Minimum Description Length Penalization for Group and Multi-Task Sparse Learning

    We propose MIC (Multiple Inclusion Criterion), a framework for learning sparse models based on the information-theoretic Minimum Description Length (MDL) principle. MIC provides an elegant way of incorporating arbitrary sparsity patterns in the feature space through two-part MDL coding schemes. We present MIC-based models for grouped feature selection (MIC-GROUP) and multi-task feature selection (MIC-MULTI). MIC-GROUP assumes that the features are divided into groups and induces two-level sparsity: it selects a subset of the feature groups and then selects features within each selected group. MIC-MULTI applies when there are multiple related tasks that share the same set of potentially predictive features. It likewise induces two-level sparsity, selecting a subset of the features and then selecting which of the tasks each feature should be added to. Lastly, we propose TRANSFEAT, a model that transfers knowledge from a set of previously learned tasks to a new task that is expected to share similar features. All three methods are designed to select a small set of predictive features from a large pool of candidates. We demonstrate the effectiveness of our approach with experimental results on data from genomics and from word sense disambiguation problems.
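
    As a rough illustration of the two-part coding idea (not the paper's exact MIC coding scheme), the sketch below scores a candidate feature subset for linear regression by adding a data cost, the usual (n/2)·log2(RSS/n) term for Gaussian residuals, to a model cost that charges bits for naming each selected group, each selected feature within its group, and each fitted coefficient. The group encoding, bit counts, and `bits_per_coef` value are illustrative assumptions.

```python
import numpy as np

def mdl_score(X, y, selected, groups, bits_per_coef=2.0):
    """Two-part MDL-style score for a candidate feature subset.

    Illustrative only: data cost = (n/2) * log2(RSS / n) for Gaussian
    residuals; model cost charges bits for naming each selected group,
    each selected feature within its group, and each fitted coefficient.
    `groups[j]` gives the group id of feature j.
    """
    n, p = X.shape
    sel = sorted(selected)
    if sel:
        beta, *_ = np.linalg.lstsq(X[:, sel], y, rcond=None)
        rss = np.sum((y - X[:, sel] @ beta) ** 2)
    else:
        rss = np.sum((y - y.mean()) ** 2)
    data_cost = 0.5 * n * np.log2(max(rss, 1e-12) / n)

    model_cost = 0.0
    n_groups = len(set(groups))
    for g in {groups[j] for j in sel}:
        group_size = sum(1 for gg in groups if gg == g)
        k_in_group = sum(1 for j in sel if groups[j] == g)
        model_cost += np.log2(n_groups)                  # bits to name the group
        model_cost += k_in_group * np.log2(group_size)   # bits to name its features
        model_cost += k_in_group * bits_per_coef         # bits for the coefficients
    return data_cost + model_cost
```

    A greedy forward search would, at each step, add the feature whose inclusion most reduces this score, which naturally favours features from groups whose group-naming cost has already been paid.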

    Causal Inference for Human-Language Model Collaboration

    In this paper, we examine the collaborative dynamics between humans and language models (LMs), where the interactions typically involve LMs proposing text segments and humans editing or responding to these proposals. Productive engagement with LMs in such scenarios necessitates that humans discern effective text-based interaction strategies, such as editing and response styles, from historical human-LM interactions. This objective is inherently causal, driven by the counterfactual 'what-if' question: how would the outcome of collaboration change if humans employed a different text editing/refinement strategy? A key challenge in answering this causal inference question is formulating an appropriate causal estimand: the conventional average treatment effect (ATE) estimand is inapplicable to text-based treatments due to their high dimensionality. To address this concern, we introduce a new causal estimand -- Incremental Stylistic Effect (ISE) -- which characterizes the average impact of infinitesimally shifting a text towards a specific style, such as increasing formality. We establish the conditions for the non-parametric identification of ISE. Building on this, we develop CausalCollab, an algorithm designed to estimate the ISE of various interaction strategies in dynamic human-LM collaborations. Our empirical investigations across three distinct human-LM collaboration scenarios reveal that CausalCollab effectively reduces confounding and significantly improves counterfactual estimation over a set of competitive baselines. Comment: 9 pages. Accepted for publication at NAACL 2024 (Main Conference).
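
    For intuition about the estimand only (this is not the CausalCollab algorithm), the sketch below computes a plug-in, finite-difference approximation of an incremental stylistic effect, assuming a scalar style score (e.g., a formality rating) plus a vector of confounders suffices for the outcome regression; the `delta`, the choice of regressor, and all variable names are illustrative assumptions.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

def incremental_stylistic_effect(style, confounders, outcome, delta=0.05):
    """Plug-in estimate of the average effect of nudging each text's
    style score by a small amount `delta` (e.g., slightly more formal).

    Illustrative sketch: fit an outcome regression E[Y | style, X] and
    average the finite difference between the shifted and observed
    style scores. Assumes (style, X) identify the effect; this is not
    the CausalCollab estimator itself.
    """
    Z = np.column_stack([style, confounders])
    model = GradientBoostingRegressor().fit(Z, outcome)

    Z_shift = Z.copy()
    Z_shift[:, 0] += delta                      # shift the style coordinate
    return np.mean(model.predict(Z_shift) - model.predict(Z)) / delta
```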

    A Risk Comparison of Ordinary Least Squares vs Ridge Regression

    We compare the risk of ridge regression to a simple variant of ordinary least squares, in which one simply projects the data onto a finite-dimensional subspace (as specified by a Principal Component Analysis) and then performs an ordinary (unregularized) least squares regression in this subspace. This note shows that the risk of this ordinary least squares method is within a constant factor (namely 4) of the risk of ridge regression. Comment: Appearing in JMLR 14, June 2013.
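
    The two estimators being compared are easy to state in code. The snippet below is a minimal sketch on synthetic data with arbitrary choices of the PCA dimension k and the ridge penalty (the note ties the truncation level to the penalty; that coupling is not reproduced here): it fits ridge regression in closed form and the PCA-projected OLS variant via a truncated SVD, then compares prediction risk on noiseless test targets.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, k, lam = 200, 50, 10, 10.0            # samples, features, PCA dim, ridge penalty
beta = rng.normal(size=p)
X = rng.normal(size=(n, p))
y = X @ beta + rng.normal(scale=2.0, size=n)
X_test = rng.normal(size=(1000, p))
y_test_mean = X_test @ beta                 # noiseless targets for the risk comparison

# Ridge regression (closed form).
beta_ridge = np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

# PCA-truncated OLS: project onto the top-k principal directions,
# run unregularized least squares there, then map back to feature space.
_, _, Vt = np.linalg.svd(X, full_matrices=False)
V_k = Vt[:k].T                              # p x k
gamma, *_ = np.linalg.lstsq(X @ V_k, y, rcond=None)
beta_pcr = V_k @ gamma

risk_ridge = np.mean((X_test @ beta_ridge - y_test_mean) ** 2)
risk_pcr = np.mean((X_test @ beta_pcr - y_test_mean) ** 2)
print(f"ridge risk: {risk_ridge:.3f}   PCA-OLS risk: {risk_pcr:.3f}")
```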

    Faster Ridge Regression via the Subsampled Randomized Hadamard Transform

    We propose a fast algorithm for ridge regression when the number of features is much larger than the number of observations (p≫n). The standard way to solve ridge regression in this setting works in the dual space and gives a running time of O(n²p). Our algorithm, Subsampled Randomized Hadamard Transform - Dual Ridge Regression (SRHT-DRR), runs in time O(np log(n)) and works by preconditioning the design matrix with a Randomized Walsh-Hadamard Transform followed by a subsampling of features. We provide risk bounds for our SRHT-DRR algorithm in the fixed design setting and show experimental results on synthetic and real datasets.
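
    The sketch below illustrates the preconditioning idea rather than the paper's exact implementation: feature signs are flipped at random, features are mixed with a (naively implemented) fast Walsh-Hadamard transform, a random subset of the mixed features is kept, and an n × n dual ridge system is solved on the compressed design. It assumes p is a power of two (real implementations pad with zero columns), and all sizes and penalties are illustrative.

```python
import numpy as np

def fwht(a):
    """Fast Walsh-Hadamard transform along the last axis (length must be
    a power of two); returned transform is orthonormal."""
    a = a.copy()
    n = a.shape[-1]
    h = 1
    while h < n:
        for i in range(0, n, 2 * h):
            x = a[..., i:i + h].copy()
            y = a[..., i + h:i + 2 * h].copy()
            a[..., i:i + h] = x + y
            a[..., i + h:i + 2 * h] = x - y
        h *= 2
    return a / np.sqrt(n)

def srht_dual_ridge(X, y, p_sub, lam, rng):
    """Illustrative SRHT-preconditioned dual ridge regression for p >> n.

    Sketch only: flip feature signs, mix features with a Walsh-Hadamard
    transform, keep p_sub random features (rescaled), then solve the
    n x n dual system on the compressed design.
    """
    n, p = X.shape
    signs = rng.choice([-1.0, 1.0], size=p)
    X_mixed = fwht(X * signs)                        # n x p, features mixed
    idx = rng.choice(p, size=p_sub, replace=False)
    X_sub = X_mixed[:, idx] * np.sqrt(p / p_sub)     # rescaled subsample
    K = X_sub @ X_sub.T                              # n x n dual Gram matrix
    alpha = np.linalg.solve(K + lam * np.eye(n), y)
    beta_sub = X_sub.T @ alpha                       # weights in compressed space
    return beta_sub, idx, signs
```

    A test point would be compressed in the same way, reusing the returned `signs` and `idx`, before taking an inner product with `beta_sub`.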

    Metric Learning for Graph-based Domain Adaptation

    In many domain adaptation formulations, one assumes access to a large amount of unlabeled data from the domain of interest (the target domain), some portion of which may be labeled, together with a large amount of labeled data from other domains, known as the source domain(s). Motivated by the fact that labeled data is hard to obtain in any domain, we design algorithms for settings in which there is a large amount of unlabeled data from all domains, only a small portion of which may be labeled. We build on recent advances in graph-based semi-supervised learning and supervised metric learning. Given all instances, labeled and unlabeled, from all domains, we build a large similarity graph between them, where an edge exists between two instances if they are close according to some metric. Instead of using a predefined metric, as is commonly done, we feed the labeled instances into a metric-learning algorithm and (re)construct a data-dependent metric, which is then used to construct the graph. We employ different types of edges depending on the domain identities of the two vertices they connect, and learn the weights of these edges. Experimental results show that our approach leads to a significant reduction in classification error across domains and performs better than two state-of-the-art models on the task of sentiment classification.
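
    A minimal sketch of the pipeline follows, with simplifications: the metric-learning step is replaced by an RCA-style within-class whitening rather than the learners used in the paper, all edges are treated identically instead of being typed by domain, and labels are spread with plain iterative propagation. All function names and parameters below are illustrative.

```python
import numpy as np

def whitening_metric(X_labeled, y_labeled):
    """Simple data-dependent Mahalanobis metric: inverse of the average
    within-class covariance (an RCA-style whitening)."""
    d = X_labeled.shape[1]
    cov = np.zeros((d, d))
    for c in np.unique(y_labeled):
        Xc = X_labeled[y_labeled == c]
        cov += np.cov(Xc, rowvar=False) * len(Xc)
    cov /= len(X_labeled)
    return np.linalg.inv(cov + 1e-6 * np.eye(d))      # metric matrix M

def knn_graph(X, M, k=10):
    """Symmetric k-NN similarity graph under distance (x-y)^T M (x-y)."""
    diff = X[:, None, :] - X[None, :, :]
    d2 = np.einsum('ijk,kl,ijl->ij', diff, M, diff)
    W = np.zeros_like(d2)
    for i in range(len(X)):
        nbrs = np.argsort(d2[i])[1:k + 1]             # skip self at distance 0
        W[i, nbrs] = np.exp(-d2[i, nbrs])
    return np.maximum(W, W.T)

def propagate_labels(W, y, labeled_mask, n_iter=50):
    """Iterative label propagation; y holds integer labels for labeled
    nodes (entries for unlabeled nodes are ignored, e.g., -1)."""
    n_classes = int(y[labeled_mask].max()) + 1
    F = np.zeros((len(y), n_classes))
    F[labeled_mask, y[labeled_mask]] = 1.0
    D_inv = 1.0 / np.maximum(W.sum(axis=1, keepdims=True), 1e-12)
    for _ in range(n_iter):
        F = D_inv * (W @ F)
        F[labeled_mask] = 0.0
        F[labeled_mask, y[labeled_mask]] = 1.0        # clamp the labeled nodes
    return F.argmax(axis=1)
```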

    Frame-semantic parsing

    Frame semantics is a linguistic theory that has been instantiated for English in the FrameNet lexicon. We solve the problem of frame-semantic parsing using a two-stage statistical model that takes lexical targets (i.e., content words and phrases) in their sentential contexts and predicts frame-semantic structures. Given a target in context, the first stage disambiguates it to a semantic frame. This model uses latent variables and semi-supervised learning to improve frame disambiguation for targets unseen at training time. The second stage finds the target's locally expressed semantic arguments. At inference time, a fast exact dual decomposition algorithm collectively predicts all the arguments of a frame at once in order to respect declaratively stated linguistic constraints, resulting in qualitatively better structures than naïve local predictors. Both components are feature-based and discriminatively trained on a small set of annotated frame-semantic parses. On the SemEval 2007 benchmark data set, the approach, along with a heuristic identifier of frame-evoking targets, outperforms the prior state of the art by significant margins. Additionally, we present experiments on the much larger FrameNet 1.5 data set. We have released our frame-semantic parser as open-source software. Funding: United States Defense Advanced Research Projects Agency (DARPA grant NBCH-1080004); National Science Foundation (NSF grants IIS-0836431 and IIS-0915187); Qatar National Research Fund (NPRP 08-485-1-083).
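
    A heavily simplified skeleton of the two-stage structure (not the released parser): stage one picks the highest-scoring frame for each target, and stage two fills each role of that frame with its best-scoring candidate span, resolving overlaps greedily where the paper enforces such constraints exactly via dual decomposition. `frame_scorer`, `role_scorer`, and `frame_roles` are hypothetical stand-ins for the discriminatively trained, feature-based models and the FrameNet role inventory.

```python
def parse_sentence(sentence, targets, frame_scorer, role_scorer, frame_roles):
    """Two-stage skeleton: frame disambiguation, then argument spans.

    frame_roles maps each frame name to its list of role names. A real
    parser also allows a role to remain unfilled; that case is omitted
    here for brevity.
    """
    analyses = []
    for target in targets:
        # Stage 1: disambiguate the target to a frame.
        frame = max(frame_roles, key=lambda f: frame_scorer(sentence, target, f))
        # Stage 2: choose a span for each role, greedily avoiding overlaps.
        chosen, used = {}, []
        candidates = enumerate_spans(sentence)
        for role in frame_roles[frame]:
            scored = sorted(candidates,
                            key=lambda s: role_scorer(sentence, target, frame, role, s),
                            reverse=True)
            for span in scored:
                if not any(overlaps(span, u) for u in used):
                    chosen[role] = span
                    used.append(span)
                    break
        analyses.append((target, frame, chosen))
    return analyses

def enumerate_spans(sentence, max_len=8):
    """All candidate argument spans (half-open token indices) up to max_len."""
    n = len(sentence)
    return [(i, j) for i in range(n) for j in range(i + 1, min(i + max_len, n) + 1)]

def overlaps(a, b):
    return a[0] < b[1] and b[0] < a[1]
```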

    Real-Time Monocular Face Tracking Using an Active Camera

    This paper addresses the problem of facial feature detection and tracking in real time using a single active camera. The variable parameters of the camera (i.e., pan, tilt, and zoom) are changed adaptively to track the face of the agent in successive frames and to detect the facial features, which may later be used for facial expression analysis in surveillance or for mesh generation in animation. Our tracking procedure assumes planar motion of the face. It also detects invalid feature points, i.e., feature points that do not correspond to actual facial features but are outliers; these are subsequently discarded so that 'high-level' information can be extracted from the face for mesh generation, emotion recognition, and similar applications. The performance of the procedure is independent of the velocity of the agent and is robust to velocity changes. The only limitation on its performance is imposed by the maximum pan/tilt range of the camera.
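
    As a minimal sketch of the adaptive-camera idea (not the paper's controller, and without its zoom-adjustment logic), the function below maps the offset of the tracked face from the image centre to incremental pan/tilt commands via a proportional gain; the field-of-view values and gain are hypothetical.

```python
def camera_update(face_center, frame_size, fov_deg=(60.0, 40.0), gain=0.5):
    """Proportional pan/tilt update to keep the tracked face centred.

    face_center: (x, y) pixel position of the face centroid.
    frame_size:  (width, height) of the image in pixels.
    fov_deg:     assumed horizontal/vertical field of view in degrees.
    Returns (pan_delta, tilt_delta) in degrees.
    """
    cx, cy = frame_size[0] / 2.0, frame_size[1] / 2.0
    dx = (face_center[0] - cx) / frame_size[0]      # normalised offset in [-0.5, 0.5]
    dy = (face_center[1] - cy) / frame_size[1]
    pan_delta = gain * dx * fov_deg[0]              # degrees to pan right
    tilt_delta = -gain * dy * fov_deg[1]            # degrees to tilt up (image y grows downward)
    return pan_delta, tilt_delta
```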