Minimum Description Length Penalization for Group and Multi-Task Sparse Learning
We propose a framework, MIC (Multiple Inclusion Criterion), for learning sparse models based on the information-theoretic Minimum Description Length (MDL) principle. MIC provides an elegant way of incorporating arbitrary sparsity patterns in the feature space by using two-part MDL coding schemes. We present MIC-based models for the problems of grouped feature selection (MIC-GROUP) and multi-task feature selection (MIC-MULTI). MIC-GROUP assumes that the features are divided into groups and induces two-level sparsity, selecting a subset of the feature groups and also selecting features within each selected group. MIC-MULTI applies when there are multiple related tasks that share the same set of potentially predictive features. It also induces two-level sparsity, selecting a subset of the features and then selecting which of the tasks each feature should be added to. Lastly, we propose a model, TRANSFEAT, that can be used to transfer knowledge from a set of previously learned tasks to a new task that is expected to share similar features. All three methods are designed for selecting a small set of predictive features from a large pool of candidate features. We demonstrate the effectiveness of our approach with experimental results on data from genomics and from word sense disambiguation problems.
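The two-part coding idea is straightforward to prototype: the total description length of a model is the bits needed to name the selected feature subset plus the bits needed to encode the residuals it leaves. Below is a minimal Python sketch of this flavor of MDL-penalized forward selection; it is an illustration under a Gaussian residual code, not the authors' MIC implementation, and the per-coefficient cost `bits_per_coef` and the greedy search are simplifying assumptions.

    import numpy as np
    from math import lgamma, log2

    def log2_binom(p, k):
        # log2 of C(p, k): bits needed to name which k of p features are used
        return (lgamma(p + 1) - lgamma(k + 1) - lgamma(p - k + 1)) / np.log(2)

    def mdl_score(X, y, subset, bits_per_coef=2.0):
        # two-part code: bits for the model (subset identity + coefficients)
        # plus bits for the data given the model (Gaussian residual code)
        n, p = X.shape
        if subset:
            Xs = X[:, subset]
            beta, *_ = np.linalg.lstsq(Xs, y, rcond=None)
            rss = float(np.sum((y - Xs @ beta) ** 2))
        else:
            rss = float(np.sum((y - y.mean()) ** 2))
        data_bits = 0.5 * n * log2(max(rss / n, 1e-12))
        model_bits = log2_binom(p, len(subset)) + bits_per_coef * len(subset)
        return data_bits + model_bits

    def forward_select(X, y):
        # greedily add the feature that most shortens the total code length
        selected, best = [], mdl_score(X, y, [])
        improved = True
        while improved:
            improved = False
            for j in set(range(X.shape[1])) - set(selected):
                score = mdl_score(X, y, selected + [j])
                if score < best:
                    best, best_j, improved = score, j, True
            if improved:
                selected.append(best_j)
        return selected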
Causal Inference for Human-Language Model Collaboration
In this paper, we examine the collaborative dynamics between humans and language models (LMs), where the interactions typically involve LMs proposing text segments and humans editing or responding to these proposals. Productive engagement with LMs in such scenarios necessitates that humans discern effective text-based interaction strategies, such as editing and response styles, from historical human-LM interactions. This objective is inherently causal, driven by the counterfactual "what-if" question: how would the outcome of collaboration change if humans employed a different text editing/refinement strategy? A key challenge in answering this causal inference question is formulating an appropriate causal estimand: the conventional average treatment effect (ATE) estimand is inapplicable to text-based treatments due to their high dimensionality. To address this concern, we introduce a new causal estimand, the Incremental Stylistic Effect (ISE), which characterizes the average impact of infinitesimally shifting a text towards a specific style, such as increasing formality. We establish the conditions for the non-parametric identification of ISE. Building on this, we develop CausalCollab, an algorithm designed to estimate the ISE of various interaction strategies in dynamic human-LM collaborations. Our empirical investigations across three distinct human-LM collaboration scenarios reveal that CausalCollab effectively reduces confounding and significantly improves counterfactual estimation over a set of competitive baselines. (9 pages. Accepted for publication at NAACL 2024, Main Conference.)
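As a rough picture of what an ISE-style estimand looks like in code, the sketch below fits an outcome model on a one-dimensional style score plus covariates and averages a finite-difference approximation of the derivative with respect to that score. This is a plain plug-in illustration, not the CausalCollab algorithm; the choice of regressor, the `delta` step, and the assumption that style is summarized by a single score are all mine.

    import numpy as np
    from sklearn.ensemble import GradientBoostingRegressor

    def incremental_style_effect(style, covariates, outcomes, delta=0.05):
        # outcome model over (style score, confounding covariates)
        X = np.column_stack([style, covariates])
        model = GradientBoostingRegressor().fit(X, outcomes)
        # shift every text slightly toward the style and difference predictions
        X_shift = X.copy()
        X_shift[:, 0] += delta
        return float((model.predict(X_shift) - model.predict(X)).mean() / delta)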
A Risk Comparison of Ordinary Least Squares vs Ridge Regression
We compare the risk of ridge regression to a simple variant of ordinary least squares, in which one simply projects the data onto a finite-dimensional subspace (as specified by a Principal Component Analysis) and then performs an ordinary (un-regularized) least squares regression in this subspace. This note shows that the risk of this ordinary least squares method is within a constant factor (namely 4) of the risk of ridge regression. (Appearing in JMLR 14, June 2013.)
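The comparison is easy to probe empirically. The snippet below contrasts ridge with project-then-OLS on synthetic data; it is a sketch with arbitrary choices of the regularization strength `lam` and truncation level `k`, not the note's construction or tuning.

    import numpy as np

    rng = np.random.default_rng(0)
    n, p = 200, 50
    X = rng.normal(size=(n, p)) @ np.diag(np.linspace(2.0, 0.1, p))
    beta = rng.normal(size=p)
    y = X @ beta + rng.normal(size=n)

    # ridge regression
    lam = 1.0
    w_ridge = np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

    # PCA projection onto the top-k components, then un-regularized OLS
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    k = 10
    Z = X @ Vt[:k].T
    w_z, *_ = np.linalg.lstsq(Z, y, rcond=None)
    w_pca_ols = Vt[:k].T @ w_z

    for name, w in [("ridge", w_ridge), ("pca+ols", w_pca_ols)]:
        print(name, float(np.mean((X @ (w - beta)) ** 2)))  # prediction risk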
Faster Ridge Regression via the Subsampled Randomized Hadamard Transform
We propose a fast algorithm for ridge regression when the number of features is much larger than the number of observations (p ≫ n). The standard way to solve ridge regression in this setting works in the dual space and gives a running time of O(n²p). Our algorithm, Subsampled Randomized Hadamard Transform Dual Ridge Regression (SRHT-DRR), runs in time O(np log(n)) and works by preconditioning the design matrix with a Randomized Walsh-Hadamard Transform followed by subsampling of features. We provide risk bounds for SRHT-DRR in the fixed design setting and show experimental results on synthetic and real datasets.
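The recipe in the abstract (random sign flips, a fast Walsh-Hadamard mix of the features, column subsampling, then ridge in the n x n dual) can be sketched as follows. This is an illustrative reconstruction that assumes p is a power of two; the subsample size `p_sub` and the regularizer `lam` are left to the caller and are not the paper's prescriptions.

    import numpy as np

    def fwht(a):
        # fast Walsh-Hadamard transform along the last axis (length 2^m),
        # normalized so the transform is orthonormal
        a = a.copy()
        n = a.shape[-1]
        h = 1
        while h < n:
            for i in range(0, n, 2 * h):
                x = a[..., i:i + h].copy()
                y = a[..., i + h:i + 2 * h].copy()
                a[..., i:i + h] = x + y
                a[..., i + h:i + 2 * h] = x - y
            h *= 2
        return a / np.sqrt(n)

    def srht_dual_ridge(X, y, p_sub, lam=1.0, seed=0):
        rng = np.random.default_rng(seed)
        n, p = X.shape                         # assumes p is a power of two
        signs = rng.choice([-1.0, 1.0], size=p)
        Xh = fwht(X * signs)                   # randomized Hadamard mixing
        cols = rng.choice(p, size=p_sub, replace=False)
        Xs = Xh[:, cols] * np.sqrt(p / p_sub)  # subsample and rescale features
        # dual ridge: solve the n x n system instead of the p x p one
        alpha = np.linalg.solve(Xs @ Xs.T + lam * np.eye(n), y)
        return signs, cols, Xs.T @ alpha       # weights in the sketched space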
Metric Learning for Graph-based Domain Adaptation
In many domain adaptation formulations, one is assumed to have a large amount of unlabeled data from the domain of interest (the target domain), some portion of which may be labeled, and a large amount of labeled data from other domains, known as source domain(s). Motivated by the fact that labeled data is hard to obtain in any domain, we design algorithms for settings in which there exists a large amount of unlabeled data from all domains, a small portion of which may be labeled. We build on recent advances in graph-based semi-supervised learning and supervised metric learning. Given all instances, labeled and unlabeled, from all domains, we build a large similarity graph between them, where an edge exists between two instances if they are close according to some metric. Instead of using a predefined metric, as is commonly done, we feed the labeled instances into metric-learning algorithms and (re)construct a data-dependent metric, which is then used to construct the graph. We employ different types of edges depending on the domain identity of the two vertices they connect, and learn the weights of each edge type. Experimental results show that our approach leads to a significant reduction in classification error across domains, and performs better than two state-of-the-art models on the task of sentiment classification.
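A compact way to see the pipeline is with off-the-shelf components: learn a metric from the few labeled instances, map everything into that space, build a k-NN similarity graph, and propagate labels over it. In the sketch below, scikit-learn's NeighborhoodComponentsAnalysis and LabelSpreading stand in for the paper's metric learner and graph-based learner; these substitutions, and the neighborhood size, are assumptions rather than the authors' choices.

    from sklearn.neighbors import NeighborhoodComponentsAnalysis
    from sklearn.semi_supervised import LabelSpreading

    def metric_then_graph(X, y):
        # y holds class labels, with -1 marking unlabeled instances
        labeled = y != -1
        nca = NeighborhoodComponentsAnalysis()
        nca.fit(X[labeled], y[labeled])   # data-dependent metric from labels
        Z = nca.transform(X)              # all domains mapped to learned space
        graph = LabelSpreading(kernel="knn", n_neighbors=10)
        graph.fit(Z, y)                   # k-NN similarity graph + propagation
        return graph.transduction_        # predicted label for every instance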
Frame-semantic parsing
Frame semantics is a linguistic theory that has been instantiated for English in the FrameNet lexicon. We solve the problem of frame-semantic parsing using a two-stage statistical model that takes lexical targets (i.e., content words and phrases) in their sentential contexts and predicts frame-semantic structures. Given a target in context, the first stage disambiguates it to a semantic frame. This model uses latent variables and semi-supervised learning to improve frame disambiguation for targets unseen at training time. The second stage finds the target's locally expressed semantic arguments. At inference time, a fast exact dual decomposition algorithm collectively predicts all the arguments of a frame at once in order to respect declaratively stated linguistic constraints, resulting in qualitatively better structures than naïve local predictors. Both components are feature-based and discriminatively trained on a small set of annotated frame-semantic parses. On the SemEval 2007 benchmark data set, the approach, along with a heuristic identifier of frame-evoking targets, outperforms the prior state of the art by significant margins. Additionally, we present experiments on the much larger FrameNet 1.5 data set. We have released our frame-semantic parser as open-source software. (Supported by DARPA grant NBCH-1080004, NSF grants IIS-0836431 and IIS-0915187, and Qatar National Research Fund grant NPRP 08-485-1-083.)
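Schematically, the parser's control flow is a two-stage pipeline: frame disambiguation for each target, then argument identification for the chosen frame's roles. The skeleton below is only a shape-of-the-computation sketch, not the released parser's code; in particular it fills roles independently, whereas the paper predicts all arguments of a frame jointly under declarative constraints via dual decomposition.

    from dataclasses import dataclass, field

    @dataclass
    class FrameParse:
        target: str
        frame: str
        arguments: dict = field(default_factory=dict)  # role -> token span

    def parse_sentence(tokens, targets, frame_clf, arg_clf, roles_of):
        parses = []
        for target in targets:
            frame = frame_clf(tokens, target)        # stage 1: disambiguation
            args = {}
            for role in roles_of(frame):             # stage 2: argument id.
                span = arg_clf(tokens, target, frame, role)
                if span is not None:                 # a role may stay unfilled
                    args[role] = span
            parses.append(FrameParse(target, frame, args))
        return parses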
Real-Time Monocular Face Tracking Using an Active Camera
This paper addresses the problem of facial feature detection and tracking in real time using a single active camera. The variable parameters of the camera (i.e., pan, tilt, and zoom) are changed adaptively to track the face of the agent in successive frames and to detect the facial features, which may at a later stage be used for facial expression analysis for surveillance, or for mesh generation for animation purposes. Our tracking procedure assumes planar motion of the face. It also detects invalid feature points, i.e., feature points which do not correspond to actual facial features but are outliers; these are subsequently discarded by our procedure in order to extract 'high-level' information from the face for mesh generation, emotion recognition, and the like. The performance of the procedure is independent of the velocity of the agent and is robust to velocity changes. The only limitation on the performance of the procedure is imposed by the maximum pan/tilt range of the camera.
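The passive half of such a tracker (feature detection, frame-to-frame tracking, outlier rejection) can be prototyped with OpenCV as below; the active pan/tilt/zoom control loop is hardware-specific and omitted. This is a generic sketch using a Haar face detector and pyramidal Lucas-Kanade flow, not the paper's procedure, and it assumes a face is visible in the first frame.

    import cv2
    import numpy as np

    cap = cv2.VideoCapture(0)
    cascade = cv2.CascadeClassifier(
        cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

    ok, frame = cap.read()
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    x, y, w, h = cascade.detectMultiScale(gray, 1.3, 5)[0]  # first face found
    mask = np.zeros_like(gray)
    mask[y:y + h, x:x + w] = 255                  # restrict features to the face
    pts = cv2.goodFeaturesToTrack(gray, 50, 0.01, 5, mask=mask)

    while True:
        ok, frame = cap.read()
        if not ok or pts is None or len(pts) == 0:
            break
        gray2 = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        new_pts, status, _ = cv2.calcOpticalFlowPyrLK(gray, gray2, pts, None)
        pts = new_pts[status.ravel() == 1].reshape(-1, 1, 2)  # drop lost points
        gray = gray2
        for px, py in pts.reshape(-1, 2):
            cv2.circle(frame, (int(px), int(py)), 3, (0, 255, 0), -1)
        cv2.imshow("tracked facial features", frame)
        if cv2.waitKey(1) & 0xFF == ord("q"):
            break
    cap.release()
    cv2.destroyAllWindows()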