The Devil is in the Decoder: Classification, Regression and GANs
Many machine vision applications, such as semantic segmentation and depth
prediction, require predictions for every pixel of the input image. Models for
such problems usually consist of encoders which decrease spatial resolution
while learning a high-dimensional representation, followed by decoders that
recover the original input resolution and result in low-dimensional
predictions. While encoders have been studied rigorously, relatively few
studies address the decoder side. This paper presents an extensive comparison
of a variety of decoders for a variety of pixel-wise tasks, ranging from
classification and regression to synthesis. Our contributions are: (1) Decoders
matter: we observe significant variance in results between different types of
decoders on various problems. (2) We introduce new residual-like connections
for decoders. (3) We introduce a novel decoder: bilinear additive upsampling.
(4) We explore prediction artifacts.
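The bilinear additive upsampling decoder can be sketched as parameter-free bilinear upsampling followed by summation over groups of channels. A minimal NumPy illustration; the factor-2 upsampling and group-of-4 channel summation are illustrative settings, not necessarily the paper's exact configuration:

```python
import numpy as np

def bilinear_resize(img, out_h, out_w):
    """Bilinear interpolation of a 2-D array (align-corners style)."""
    in_h, in_w = img.shape
    r = np.linspace(0, in_h - 1, out_h)
    c = np.linspace(0, in_w - 1, out_w)
    r0 = np.floor(r).astype(int)
    c0 = np.floor(c).astype(int)
    r1 = np.minimum(r0 + 1, in_h - 1)
    c1 = np.minimum(c0 + 1, in_w - 1)
    wr = (r - r0)[:, None]
    wc = (c - c0)[None, :]
    top = img[r0][:, c0] * (1 - wc) + img[r0][:, c1] * wc
    bot = img[r1][:, c0] * (1 - wc) + img[r1][:, c1] * wc
    return top * (1 - wr) + bot * wr

def bilinear_additive_upsample(x, factor=2, group=4):
    """Upsample a (C, H, W) tensor bilinearly, then sum each block of
    `group` consecutive channels, reducing channels by `group`."""
    C, H, W = x.shape
    assert C % group == 0
    up = np.stack([bilinear_resize(x[ch], H * factor, W * factor)
                   for ch in range(C)])
    return up.reshape(C // group, group, H * factor, W * factor).sum(axis=1)

x = np.arange(16, dtype=float).reshape(4, 2, 2)
out = bilinear_additive_upsample(x)
print(out.shape)   # (1, 4, 4)
```

Because the upsampling itself has no parameters, any learned mixing is left to the surrounding convolutions, which is what makes this decoder cheap compared with transposed convolutions.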
Sparse Bilinear Logistic Regression
In this paper, we introduce the concept of sparse bilinear logistic
regression for decision problems involving explanatory variables that are
two-dimensional matrices. Such problems are common in computer vision,
brain-computer interfaces, style/content factorization, and parallel factor
analysis. The underlying optimization problem is bi-convex; we study its
solution and develop an efficient algorithm based on block coordinate descent.
We provide a theoretical guarantee of global convergence and estimate the
asymptotic convergence rate using the Kurdyka-Łojasiewicz inequality. A
range of experiments with simulated and real data demonstrate that sparse
bilinear logistic regression outperforms current techniques in several
important applications.

Comment: 27 pages, 5 figures
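The alternating block scheme can be sketched in NumPy with proximal-gradient (ISTA-style) steps, soft-thresholding each block for the l1 penalty. The step size, synthetic data, and exact update rule below are illustrative assumptions rather than the paper's algorithm:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def soft_threshold(x, t):
    # proximal operator of the l1 norm
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def loss(Xs, y, u, v, lam):
    """l1-penalised logistic loss of the bilinear model u^T X v."""
    z = np.array([u @ X @ v for X in Xs])
    s = 2 * y - 1                       # labels in {-1, +1}
    return (np.mean(np.log1p(np.exp(-s * z)))
            + lam * (np.abs(u).sum() + np.abs(v).sum()))

def fit(Xs, y, lam=0.01, lr=0.5, iters=300, seed=0):
    """Alternate proximal-gradient steps over u (v fixed) and v (u fixed);
    each subproblem is a convex sparse logistic regression."""
    rng = np.random.default_rng(seed)
    p, q = Xs[0].shape
    u = 0.1 * rng.normal(size=p)
    v = 0.1 * rng.normal(size=q)
    n = len(y)
    for _ in range(iters):
        A = np.array([X @ v for X in Xs])            # (n, p), v fixed
        u = soft_threshold(u - lr * A.T @ (sigmoid(A @ u) - y) / n, lr * lam)
        B = np.array([X.T @ u for X in Xs])          # (n, q), u fixed
        v = soft_threshold(v - lr * B.T @ (sigmoid(B @ v) - y) / n, lr * lam)
    return u, v

# toy demo on synthetic separable data (hypothetical parameters)
rng = np.random.default_rng(1)
u_true = np.array([1.0, -1.0, 0.5])
v_true = np.array([0.5, 1.0, -1.0])
Xs = [rng.normal(size=(3, 3)) for _ in range(200)]
y = np.array([1.0 if u_true @ X @ v_true > 0 else 0.0 for X in Xs])
u, v = fit(Xs, y)
```

With v fixed the model is an ordinary sparse logistic regression in u (and vice versa), which is exactly the bi-convex structure that block coordinate descent exploits.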
Deep Fully Residual Convolutional Neural Network for Semantic Image Segmentation
Department of Computer Science and Engineering

The goal of semantic image segmentation is to partition the pixels of an image into semantically meaningful parts and to classify those parts according to a predefined label set. Although object recognition
models have recently achieved remarkable performance, even surpassing humans' ability to recognize
objects, semantic segmentation models still lag behind. One reason semantic
segmentation is a relatively hard problem is that it requires image understanding at the pixel level with global
context, as opposed to object recognition. Another challenge is transferring the knowledge of an object
recognition model to the task of semantic segmentation. In this thesis, we delineate some of the
main challenges we faced in approaching semantic image segmentation with machine learning algorithms.
Our main focus is on how deep learning algorithms can be used for this task, since they require the
least amount of feature engineering and have been shown to scale to large
datasets while exhibiting remarkable performance. More precisely, we worked on a variation of convolutional
neural networks (CNNs) suited to the semantic segmentation task. We propose a model called the deep
fully residual convolutional network (DFRCN) to tackle this problem. Utilizing residual learning makes
training deep models feasible, which ultimately leads to a rich, powerful visual representation.
Our model also benefits from skip-connections, which ease the propagation of information from the
encoder module to the decoder module. This enables our model to have fewer parameters in the
decoder module while also achieving better performance. We also benchmarked the most effective variation
of the proposed model on a semantic segmentation benchmark.
We first give a thorough review of current high-performance models and the problems one might
face when trying to replicate such models, which mainly arise from the lack of sufficient published
information. Then, we describe our own novel method, which we call the deep fully residual convolutional
network (DFRCN). We show that our method exhibits state-of-the-art performance on a challenging
benchmark for aerial image segmentation.
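The role of the skip-connections can be illustrated with a toy encoder-decoder forward pass in NumPy. Convolutions are omitted and nearest-neighbour upsampling stands in for the learned decoder, so this is a structural sketch, not the thesis's actual DFRCN architecture:

```python
import numpy as np

def max_pool2(x):
    """2x2 max pooling on an (H, W) map (H and W even)."""
    H, W = x.shape
    return x.reshape(H // 2, 2, W // 2, 2).max(axis=(1, 3))

def upsample2(x):
    """Nearest-neighbour 2x upsampling of an (H, W) map."""
    return np.repeat(np.repeat(x, 2, axis=0), 2, axis=1)

def forward(x):
    # Encoder: progressively reduce spatial resolution
    # (the learned convolutions of a real network are omitted).
    e1 = x
    e2 = max_pool2(e1)
    e3 = max_pool2(e2)
    # Decoder with additive skip-connections from the encoder:
    # each decoder stage receives the matching encoder feature map,
    # so fine spatial detail bypasses the bottleneck.
    d2 = upsample2(e3) + e2
    d1 = upsample2(d2) + e1
    return d1

x = np.random.default_rng(0).normal(size=(8, 8))
assert forward(x).shape == x.shape
```

Because each decoder stage starts from the corresponding encoder map, it only has to learn a residual correction, which is why fewer decoder parameters can suffice.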
FiBiNET: Combining Feature Importance and Bilinear feature Interaction for Click-Through Rate Prediction
Advertising and feed ranking are essential to many Internet companies such as
Facebook and Sina Weibo. Among many real-world advertising and feed ranking
systems, click-through rate (CTR) prediction plays a central role. There are
many proposed models in this field, such as logistic regression, tree-based
models, factorization machine based models, and deep learning based CTR models.
However, many current works calculate feature interactions in a simple way,
such as a Hadamard product or inner product, and pay little attention to the
importance of individual features. In this paper, a new model named FiBiNET, an
abbreviation for Feature Importance and Bilinear feature Interaction NETwork, is
proposed to dynamically learn feature importance and fine-grained feature
interactions. On the one hand, FiBiNET can dynamically learn the importance
of features via the Squeeze-and-Excitation network (SENET) mechanism; on the other
hand, it is able to effectively learn feature interactions via a bilinear
function. We conduct extensive experiments on two real-world datasets and show
that our shallow model outperforms other shallow models such as the factorization
machine (FM) and the field-aware factorization machine (FFM). To improve
performance further, we combine a classical deep neural network (DNN) component
with the shallow model to form a deep model. The deep FiBiNET consistently
outperforms other state-of-the-art deep models such as DeepFM and the extreme
deep factorization machine (xDeepFM).

Comment: 8 pages, 5 figures
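The two ingredients can be sketched in NumPy: a SENET-style gate that re-weights field embeddings, followed by a bilinear interaction of each field pair. The field/embedding sizes, random weights, and sigmoid gate below are illustrative assumptions; the paper's exact activations and bilinear-weight sharing scheme may differ:

```python
import numpy as np

rng = np.random.default_rng(0)
num_fields, emb_dim = 4, 8                      # illustrative sizes
E = rng.normal(size=(num_fields, emb_dim))      # one embedding per field

# SENET-style feature importance: squeeze each field embedding to a
# scalar, excite through a small two-layer network, rescale embeddings.
z = E.mean(axis=1)                              # squeeze: (num_fields,)
W1 = rng.normal(size=(num_fields, num_fields // 2))
W2 = rng.normal(size=(num_fields // 2, num_fields))
a = 1.0 / (1.0 + np.exp(-(np.maximum(z @ W1, 0.0) @ W2)))
E_se = E * a[:, None]                           # re-weighted embeddings

# Bilinear interaction (shared-W variant): p_ij = (e_i W) * e_j,
# combining an inner-product-style projection with a Hadamard product.
W = rng.normal(size=(emb_dim, emb_dim))
inter = np.stack([(E_se[i] @ W) * E_se[j]
                  for i in range(num_fields)
                  for j in range(i + 1, num_fields)])
print(inter.shape)   # (6, 8): one vector per field pair
```

The learned matrix W lets the model express richer pairwise interactions than a plain Hadamard or inner product, which is the abstract's central point.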