96,166 research outputs found
A Provable Smoothing Approach for High Dimensional Generalized Regression with Applications in Genomics
In many applications, linear models fit the data poorly. This article studies
an appealing alternative, the generalized regression model. This model only
assumes that there exists an unknown monotonically increasing link function
connecting the response to a single index of explanatory
variables . The generalized regression model is flexible and
covers many widely used statistical models. It fits the data generating
mechanisms well in many real problems, which makes it useful in a variety of
applications where regression models are regularly employed. In low dimensions,
rank-based M-estimators are recommended to deal with the generalized regression
model, giving root- consistent estimators of . Applications of
these estimators to high dimensional data, however, are questionable. This
article studies, both theoretically and practically, a simple yet powerful
smoothing approach to handle the high dimensional generalized regression model.
Theoretically, a family of smoothing functions is provided, and the amount of
smoothing necessary for efficient inference is carefully calculated.
Practically, our study is motivated by an important and challenging scientific
problem: decoding gene regulation by predicting transcription factors that bind
to cis-regulatory elements. Applying our proposed method to this problem shows
substantial improvement over the state-of-the-art alternative in real data.Comment: 53 page
Detecting outlying subspaces for high-dimensional data: the new task, algorithms and performance
[Abstract]: In this paper, we identify a new task for studying the outlying degree (OD) of high-dimensional data, i.e. finding the subspaces (subsets of features)
in which the given points are outliers, which are called their outlying subspaces. Since the state-of-the-art outlier detection techniques fail to handle this
new problem, we propose a novel detection algorithm, called High-Dimension Outlying subspace Detection (HighDOD), to detect the outlying subspaces of
high-dimensional data efficiently. The intuitive idea of HighDOD is that we measure the OD of the point using the sum of distances between this point and its k nearest neighbors. Two heuristic pruning strategies are proposed to realize fast pruning in the subspace search and an efficient dynamic subspace search method with a sample-based learning process has been implemented. Experimental results show that HighDOD is efficient and outperforms other searching alternatives such as the naive topâdown, bottomâup and random search methods, and the existing
outlier detection methods cannot fulfill this new task effectively
Learning Convolutional Text Representations for Visual Question Answering
Visual question answering is a recently proposed artificial intelligence task
that requires a deep understanding of both images and texts. In deep learning,
images are typically modeled through convolutional neural networks, and texts
are typically modeled through recurrent neural networks. While the requirement
for modeling images is similar to traditional computer vision tasks, such as
object recognition and image classification, visual question answering raises a
different need for textual representation as compared to other natural language
processing tasks. In this work, we perform a detailed analysis on natural
language questions in visual question answering. Based on the analysis, we
propose to rely on convolutional neural networks for learning textual
representations. By exploring the various properties of convolutional neural
networks specialized for text data, such as width and depth, we present our
"CNN Inception + Gate" model. We show that our model improves question
representations and thus the overall accuracy of visual question answering
models. We also show that the text representation requirement in visual
question answering is more complicated and comprehensive than that in
conventional natural language processing tasks, making it a better task to
evaluate textual representation methods. Shallow models like fastText, which
can obtain comparable results with deep learning models in tasks like text
classification, are not suitable in visual question answering.Comment: Conference paper at SDM 2018. https://github.com/divelab/sva
- âŚ