Convolutional Patch Networks with Spatial Prior for Road Detection and Urban Scene Understanding
Classifying single image patches is important in many different applications,
such as road detection or scene understanding. In this paper, we present
convolutional patch networks, which are convolutional networks learned to
distinguish different image patches and which can be used for pixel-wise
labeling. We also show how to incorporate spatial information of the patch as
an input to the network, which allows for learning spatial priors for certain
categories jointly with an appearance model. In particular, we focus on road
detection and urban scene understanding, two application areas where we are
able to achieve state-of-the-art results on the KITTI as well as on the
LabelMeFacade dataset.
Furthermore, our paper offers a guideline for people working in the area and
desperately wandering through all the painstaking details that render training
CNs on image patches extremely difficult. Comment: VISAPP 2015 paper
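The idea of feeding the patch position to the network alongside its appearance can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name, the zero padding at image borders, and the normalization of the position to [0, 1] are all assumptions made for the example.

```python
def extract_patch_with_position(image, row, col, patch_size):
    """Return (patch, position) for the pixel at (row, col).

    image: list of rows (H x W) of grayscale values.
    patch: patch_size x patch_size window centred on the pixel,
           zero-padded outside the image (an assumption of this sketch).
    position: (row, col) scaled to [0, 1], a hypothetical encoding of the
              spatial information that would be given to the network.
    """
    height, width = len(image), len(image[0])
    half = patch_size // 2
    patch = []
    for r in range(row - half, row - half + patch_size):
        patch_row = []
        for c in range(col - half, col - half + patch_size):
            inside = 0 <= r < height and 0 <= c < width
            patch_row.append(image[r][c] if inside else 0)
        patch.append(patch_row)
    position = (row / max(height - 1, 1), col / max(width - 1, 1))
    return patch, position


# Toy 3x3 image; the values are illustrative only.
toy_image = [[1, 2, 3],
             [4, 5, 6],
             [7, 8, 9]]
patch, position = extract_patch_with_position(toy_image, 0, 0, 3)
print(patch, position)
```

A classifier would then receive both the patch and the position for each pixel to be labeled, so that it could learn, for example, that road pixels tend to appear in the lower part of an image.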
Object-Proposal Evaluation Protocol is 'Gameable'
Object proposals have quickly become the de-facto pre-processing step in a
number of vision pipelines (for object detection, object discovery, and other
tasks). Their performance is usually evaluated on partially annotated datasets.
In this paper, we argue that the choice of using a partially annotated dataset
for evaluation of object proposals is problematic -- as we demonstrate via a
thought experiment, the evaluation protocol is 'gameable', in the sense that
progress under this protocol does not necessarily correspond to a "better"
category independent object proposal algorithm.
To alleviate this problem, we: (1) introduce a nearly-fully annotated version
of the PASCAL VOC dataset, which serves as a test-bed to check whether object
proposal techniques are overfitting to a particular list of categories; (2)
perform an exhaustive evaluation of object proposal methods on this nearly-fully
annotated PASCAL dataset, together with cross-dataset generalization experiments;
and (3) introduce a diagnostic experiment to detect the bias capacity in an
object proposal algorithm. This tool circumvents the need to collect a densely
annotated dataset, which can be expensive and cumbersome. Finally,
we plan to release an easy-to-use toolbox that combines various publicly
available implementations of object proposal algorithms and standardizes
proposal generation and evaluation, so that new methods can be added and
evaluated on different datasets. We hope that the results presented in the
paper will motivate the community to test the category independence of various
object proposal methods by carefully choosing the evaluation protocol. Comment: 15 pages, 11 figures, 4 tables
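The 'gameable' argument can be illustrated with a small recall computation. The sketch below assumes the common intersection-over-union (IoU) overlap criterion with a 0.5 threshold; the box coordinates are toy values and the function names are hypothetical, not taken from the paper's toolbox.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def recall(ground_truth, proposals, threshold=0.5):
    """Fraction of ground-truth boxes covered by at least one proposal."""
    if not ground_truth:
        return 0.0
    hits = sum(
        1 for gt in ground_truth
        if any(iou(gt, p) >= threshold for p in proposals)
    )
    return hits / len(ground_truth)


# Toy scene: two objects exist, but only the first one is annotated.
annotated = [(0, 0, 10, 10)]
all_objects = annotated + [(20, 20, 30, 30)]
# A proposal method tuned to the annotated category only.
proposals = [(0, 0, 10, 10)]

print(recall(annotated, proposals))    # 1.0 on the partial annotation
print(recall(all_objects, proposals))  # 0.5 once every object counts
```

Under the partial annotation the method looks perfect, although it misses half of the objects in the scene, which is the kind of category bias the abstract describes.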
An Integrated architecture for recognition of totally unconstrained handwritten numerals
Reprint. Reprinted from the International journal of pattern recognition and artificial intelligence. Vol. 7, no. 4 (1993) "January 1993." Includes bibliographical references (p. 127-128). Supported by the Productivity From Information Technology (PROFIT) Research Initiative at MIT. Amar Gupta ... [et al.]
Feature Detection in Medical Images Using Deep Learning
This project explores the use of deep learning to predict age based on pediatric hand X-rays. Data from the Radiological Society of North America’s pediatric bone age challenge were used to train and evaluate a convolutional neural network. The project used InceptionV3, a CNN developed by Google that was pre-trained on ImageNet, a popular online image dataset. Our fine-tuned version of InceptionV3 yielded an average error of less than 10 months between predicted and actual age. This project shows the effectiveness of deep learning in analyzing medical images and the potential for even greater improvements in the future. In addition to the technological and potential clinical benefits of these methods, this project will serve as a useful pedagogical tool for introducing the challenges and applications of deep learning to the Bryant community.
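An "average error" between predicted and actual age could be computed as a mean absolute error in months, as in the sketch below. Whether the project used exactly this metric is an assumption, and the ages are toy values, not results from the project.

```python
def mean_absolute_error_months(predicted, actual):
    """Mean absolute difference between predicted and actual ages (months)."""
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    total = sum(abs(p - a) for p, a in zip(predicted, actual))
    return total / len(predicted)


# Illustrative ages in months (hypothetical, for demonstration only).
predicted_ages = [120, 96, 150]
actual_ages = [126, 90, 141]
print(mean_absolute_error_months(predicted_ages, actual_ages))  # 7.0
```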
Efficient Yet Deep Convolutional Neural Networks for Semantic Segmentation
Semantic segmentation with deep convolutional neural networks is a
particularly demanding GPU-intensive task. Because millions of
parameters have to be computed, memory consumption is huge. Moreover, extracting finer
features and conducting supervised training tend to increase the complexity.
Since its introduction, the Fully Convolutional Neural Network, which uses finer
strides and deconvolutional layers for upsampling, has been the go-to choice
for image segmentation tasks. In this paper, we propose two segmentation
architectures that not only need one-third of the parameters to compute but also
give better accuracy than similar architectures. The model weights were
transferred from popular neural networks, VGG19 and VGG16, which were trained
on the ImageNet classification dataset. We then transform all the fully connected
layers to convolutional layers and use dilated convolution to decrease the number of
parameters. Lastly, we add finer strides and attach four skip architectures,
which are element-wise summed with the deconvolutional layers in steps. We
train and test on different sparse and fine datasets, such as Pascal VOC2012,
Pascal-Context and NYUDv2, and show how much better our model performs on these tasks.
In addition, our model has a faster inference time and consumes less
memory for training and testing on NVIDIA Pascal GPUs, making it a more efficient
and less memory-consuming architecture for pixel-wise segmentation. Comment: 8 pages
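How dilated convolution can reduce parameters when a fully connected layer is turned into a convolution can be shown with a little arithmetic. The sketch below assumes a VGG-style first fully connected layer (a 7x7x512 input mapped to 4096 outputs) and a 3x3 kernel with dilation 3 as its replacement; these shapes are illustrative assumptions, not the configuration reported in the paper, and biases are ignored.

```python
def effective_kernel_size(kernel, dilation):
    """Spatial extent covered by a dilated kernel along one axis."""
    return kernel + (kernel - 1) * (dilation - 1)


def conv_weights(kernel, in_channels, out_channels):
    """Number of weights in a square 2-D convolution (biases excluded)."""
    return kernel * kernel * in_channels * out_channels


# Fully connected layer rewritten as a dense 7x7 convolution.
dense = conv_weights(7, 512, 4096)
# A 3x3 kernel with dilation 3 spans the same 7x7 window.
dilated = conv_weights(3, 512, 4096)

print(effective_kernel_size(3, 3))  # 7
print(dense)                        # 102760448
print(dilated)                      # 18874368
```

The dilated kernel covers the same 7x7 extent with 9 instead of 49 weights per input-output channel pair, at the cost of sampling that window sparsely.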
Feedback Based Architecture for Reading Check Courtesy Amounts
A number of large-scale applications continue to rely heavily on paper as the
dominant medium, either on an intra-organization or an inter-organization basis; check
processing is one such paper-intensive application. In many countries, the value of each check is
read by human eyes before the check is physically transported, in stages, from the point it was presented
to the location of the branch of the bank which issued the blank check to the concerned account holder.
This manual reading of each check involves significant time and cost. In this research, a new
approach is introduced to read the numerical amount field on the check, also known as the courtesy
amount field. In the case of check processing, the segmentation of unconstrained strings into individual
digits is a challenging task because one needs to accommodate special cases involving connected or
overlapping digits, broken digits, and digits physically connected to a piece of stroke that belongs to a
neighboring digit. The system described in this paper involves three stages: segmentation, normalization,
and the recognition of each character using a neural network classifier, with results better than many other
methods in the literature.
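The three stages named above (segmentation, normalization, recognition) can be sketched as a toy pipeline. This is a deliberately simplified illustration, not the system described in the paper: it splits digits only at empty columns, so it does not handle the connected, overlapping, or broken digits mentioned above, it omits any feedback between stages, and the `classify` callable is a hypothetical stand-in for the neural network classifier.

```python
def segment_columns(binary):
    """Split a binary image (rows of 0/1) into pieces at empty columns."""
    width = len(binary[0])
    has_ink = [any(row[c] for row in binary) for c in range(width)]
    pieces, start = [], None
    # A trailing False closes a piece that touches the right border.
    for c, ink in enumerate(has_ink + [False]):
        if ink and start is None:
            start = c
        elif not ink and start is not None:
            pieces.append([row[start:c] for row in binary])
            start = None
    return pieces


def normalize(piece, size):
    """Resample a piece to size x size using nearest-neighbour sampling."""
    height, width = len(piece), len(piece[0])
    return [
        [piece[r * height // size][c * width // size] for c in range(size)]
        for r in range(size)
    ]


def read_amount(binary, classify, size=4):
    """Segment, normalize, and classify each piece; join the labels."""
    pieces = segment_columns(binary)
    return "".join(classify(normalize(piece, size)) for piece in pieces)


# Toy input: two blobs of ink separated by one empty column.
toy = [[1, 0, 1, 1],
       [1, 0, 1, 1]]
print(len(segment_columns(toy)))  # 2
```

In use, `classify` would be any function that maps a normalized grid to a single-character label.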