Solving Visual Madlibs with Multiple Cues
This paper focuses on answering fill-in-the-blank style multiple choice
questions from the Visual Madlibs dataset. Previous approaches to Visual
Question Answering (VQA) have mainly used generic image features from networks
trained on the ImageNet dataset, despite the wide scope of questions. In
contrast, our approach employs features derived from networks trained for
specialized tasks of scene classification, person activity prediction, and
person and object attribute prediction. We also present a method for selecting
sub-regions of an image that are relevant for evaluating the appropriateness of
a putative answer. Visual features are computed both from the whole image and
from local regions, while sentences are mapped to a common space using a simple
normalized canonical correlation analysis (CCA) model. Our results show a
significant improvement over the previous state of the art, and indicate that
answering different question types benefits from examining a variety of image
cues and carefully choosing informative image sub-regions.
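The idea of scoring candidate answers in a common embedding space can be illustrated with a minimal sketch. It assumes the image and each candidate answer have already been projected into the shared space (for example by a CCA model), and it assumes cosine similarity as the score; the abstract does not state the exact scoring rule. The names `cosine` and `pick_answer` are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def pick_answer(image_vec, answer_vecs):
    """Return (index of best answer, list of scores).

    Assumption: image_vec and every entry of answer_vecs already live in
    the same projected space; the projection itself is not shown here.
    """
    scores = [cosine(image_vec, a) for a in answer_vecs]
    best = max(range(len(scores)), key=scores.__getitem__)
    return best, scores


# Toy usage: three candidate answers for one image embedding.
best, scores = pick_answer([1.0, 0.0], [[0.0, 1.0], [1.0, 0.1], [-1.0, 0.0]])
```

With several cues (whole image, scene, attributes, local regions), one plausible extension is to compute such a score per cue and combine them, though how the paper combines cues is not specified in the abstract.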
Recycling of epidermal growth factor-receptor complexes in A431 cells: identification of dual pathways
The intracellular sorting of EGF-receptor complexes (EGF-RC) has been studied in human epidermoid carcinoma A431 cells. Recycling of EGF was found to occur rapidly after internalization at 37 °C. The initial rate of EGF recycling was reduced at 18 °C. A significant pool of internalized EGF was incapable of recycling at 18 °C but began to recycle when cells were warmed to 37 °C. The relative rate of EGF outflow at 37 °C from cells exposed to an 18 °C temperature block was slower (t1/2 approximately 20 min) than the rate from cells not exposed to a temperature block (t1/2 approximately 5-7 min). These data suggest that there might be both short- and long-time cycles of EGF recycling in A431 cells. Examination of the intracellular EGF-RC dissociation and dynamics of short- and long-time recycling indicated that EGF recycled as EGF-RC. Moreover, EGF receptors that were covalently labeled with a photoactivatable derivative of 125I-EGF recycled via the long-time pathway at a rate similar to that of 125I-EGF. Since EGF-RC degradation was also blocked at 18 °C, we propose that sorting to the lysosomal and long-time recycling pathway may occur after a highly temperature-sensitive step, presumably in the late endosomes.
Salient Objects in Clutter: Bringing Salient Object Detection to the Foreground
We provide a comprehensive evaluation of salient object detection (SOD)
models. Our analysis identifies a serious design bias of existing SOD datasets
which assumes that each image contains at least one clearly outstanding salient
object in low clutter. The design bias has led to a saturated high performance
for state-of-the-art SOD models when evaluated on existing datasets. The
models, however, are still far from satisfactory when applied to
real-world daily scenes. Based on our analyses, we first identify 7 crucial
aspects that a comprehensive and balanced dataset should fulfill. Then, we
propose a new high quality dataset and update the previous saliency benchmark.
Specifically, our SOC (Salient Objects in Clutter) dataset includes images
with salient and non-salient objects from daily object categories. Beyond
object category annotations, each salient image is accompanied by attributes
that reflect common challenges in real-world scenes. Finally, we report
attribute-based performance assessment on our dataset. Comment: ECCV 201
Direct Image to Point Cloud Descriptors Matching for 6-DOF Camera Localization in Dense 3D Point Cloud
We propose a novel concept to directly match feature descriptors extracted
from RGB images, with feature descriptors extracted from 3D point clouds. We
use this concept to localize the position and orientation (pose) of the camera
of a query image in dense point clouds. We generate a dataset of matching 2D
and 3D descriptors, and use it to train a proposed Descriptor-Matcher
algorithm. To localize a query image in a point cloud, we extract 2D keypoints
and descriptors from the query image. Then the Descriptor-Matcher is used to
find the corresponding pairs of 2D and 3D keypoints by matching the 2D descriptors
with the pre-extracted 3D descriptors of the point cloud. This information is
used in a robust pose estimation algorithm to localize the query image in the
3D point cloud. Experiments demonstrate that directly matching 2D and 3D
descriptors is not only a viable idea but also achieves competitive accuracy
compared to other state-of-the-art approaches for camera pose localization.
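The matching step described above can be sketched as follows. This is only an illustration under stated assumptions: `match_score` is a hypothetical callable standing in for the trained Descriptor-Matcher, the `threshold` parameter is invented for the example, and keeping only the best-scoring 3D candidate per 2D keypoint is a simplification that the abstract does not confirm.

```python
def find_correspondences(keypoints_2d, descriptors_2d,
                         keypoints_3d, descriptors_3d,
                         match_score, threshold=0.5):
    """Pair each 2D keypoint with its best-scoring 3D keypoint.

    match_score(d2, d3) is a hypothetical stand-in for a learned matcher;
    it should return a higher value for more plausible 2D-3D pairs.
    Pairs whose best score falls below `threshold` are discarded.
    """
    pairs = []
    if not descriptors_3d:
        return pairs
    for kp2, d2 in zip(keypoints_2d, descriptors_2d):
        scores = [match_score(d2, d3) for d3 in descriptors_3d]
        best = max(range(len(scores)), key=scores.__getitem__)
        if scores[best] >= threshold:
            pairs.append((kp2, keypoints_3d[best]))
    return pairs


# Toy usage with string "descriptors" and an exact-match score.
def toy_score(d2, d3):
    return 1.0 if d2 == d3 else 0.0


pairs = find_correspondences(
    [(10, 20), (30, 40)], ["a", "z"],
    [(0.0, 0.0, 1.0), (1.0, 1.0, 1.0)], ["b", "a"],
    toy_score,
)
```

The resulting 2D-3D pairs are the input that a robust pose estimator would consume; a perspective-n-point solver inside a RANSAC loop is a common choice for that stage, although the abstract does not name the estimator used.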
The history of degenerate (bipartite) extremal graph problems
This paper is a survey on Extremal Graph Theory, primarily focusing on the
case when one of the excluded graphs is bipartite. On one hand we give an
introduction to this field and also describe many important results, methods,
problems, and constructions. Comment: 97 pages, 11 figures, many problems. This is the preliminary version
of our survey presented in Erdos 100. In this version 2 only a citation was
complete
Asymptotic Limits and Zeros of Chromatic Polynomials and Ground State Entropy of Potts Antiferromagnets
We study the asymptotic limiting function $W(\{G\},q) = \lim_{n \to \infty} P(G,q)^{1/n}$,
where $P(G,q)$ is the chromatic polynomial for a graph $G$ with $n$ vertices.
We first discuss a subtlety in the definition of $W(\{G\},q)$ resulting from
the fact that at certain special points $q_s$, the following limits do not
commute: $\lim_{n \to \infty} \lim_{q \to q_s} P(G,q)^{1/n} \ne \lim_{q \to q_s} \lim_{n \to \infty} P(G,q)^{1/n}$.
We then present exact calculations of $W(\{G\},q)$ and determine the corresponding
analytic structure in the complex $q$ plane for a number of families of graphs
$\{G\}$, including circuits, wheels, biwheels, bipyramids, and (cyclic and
twisted) ladders. We study the zeros of the corresponding chromatic polynomials
and prove a theorem that for certain families of graphs, all but a finite
number of the zeros lie exactly on a unit circle, whose position depends on the
family. Using the connection of $P(G,q)$ with the zero-temperature Potts
antiferromagnet, we derive a theorem concerning the maximal finite real point
of non-analyticity in $W(\{G\},q)$, denoted $q_c$, and apply this theorem to
deduce that $q_c(sq) = 3$ and $q_c(hc) = (3+\sqrt{5})/2$ for the square and
honeycomb lattices. Finally, numerical calculations of $W(hc,q)$ and $W(sq,q)$
are presented and compared with series expansions and bounds. Comment: 33 pages, Latex, 5 postscript figures, published version; includes
further comments on large-q serie
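As a small worked illustration of the limiting function for one of the graph families named above, consider the circuit graph $C_n$. Its chromatic polynomial is the standard result $P(C_n,q) = (q-1)^n + (-1)^n (q-1)$, a textbook formula that is not quoted from the abstract. With the conventional definition $W = \lim_{n \to \infty} P(G,q)^{1/n}$, this suggests $W = q-1$ for sufficiently large real $q$. The sketch below checks the closed form against brute-force counting for small $n$ and evaluates $P^{1/n}$ at a larger $n$; the function names are hypothetical.

```python
from itertools import product


def chromatic_cycle(n, q):
    """Closed-form chromatic polynomial of the circuit graph C_n (n >= 3)."""
    return (q - 1) ** n + (-1) ** n * (q - 1)


def count_colorings_cycle(n, q):
    """Brute-force count of proper q-colorings of C_n, for checking small cases."""
    count = 0
    for colors in product(range(q), repeat=n):
        # Adjacent vertices on the cycle (including last-to-first) must differ.
        if all(colors[i] != colors[(i + 1) % n] for i in range(n)):
            count += 1
    return count


def w_estimate(n, q):
    """Finite-n estimate P(C_n, q)^(1/n) of the limiting function."""
    return chromatic_cycle(n, q) ** (1.0 / n)


# The closed form agrees with direct enumeration on small circuits.
for n in range(3, 8):
    assert chromatic_cycle(n, 3) == count_colorings_cycle(n, 3)

# For q = 4 the estimate at n = 200 is very close to q - 1 = 3.
estimate = w_estimate(200, 4)
```

The correction term $(-1)^n (q-1)$ becomes negligible relative to $(q-1)^n$ as $n$ grows when $|q-1| > 1$, which is why the finite-$n$ estimate converges so quickly in this example.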
Single view silhouette fitting techniques for estimating tennis racket position
Stereo camera systems have been used to track markers attached to a racket, allowing its position to be obtained in three-dimensional (3D) space. Typically, markers are manually selected on the image plane, but this can be time-consuming. A markerless system based on one stationary camera estimating 3D racket position data is desirable for research and play. The markerless method presented in this paper relies on a set of racket silhouette views in a common reference frame captured with a calibrated camera and a silhouette of a racket captured with a camera whose relative pose is outside the common reference frame. The aim of this paper is to provide validation of these single view fitting techniques to estimate the pose of a tennis racket. This includes the development of a calibration method to provide the relative pose of a stationary camera with respect to a racket. Mean static racket position was reconstructed to within ±2 mm. Computer generated camera poses and silhouette views of a full size racket model were used to demonstrate the potential of the method to estimate 3D racket position during a simplified serve scenario. From a camera distance of 14 m, 3D racket position was estimated providing a spatial accuracy of 1.9 ± 0.14 mm, similar to recent 3D video marker tracking studies of tennis.
Modelling search for people in 900 scenes: A combined source model of eye guidance
How predictable are human eye movements during search in real world scenes? We recorded 14 observers’ eye movements as they performed a search task (person detection) in 912 outdoor scenes. Observers were highly consistent in the regions fixated during search, even when the target was absent from the scene. These eye movements were used to evaluate computational models of search guidance from three sources: saliency, target features, and scene context. Each of these models independently outperformed a cross-image control in predicting human fixations. Models that combined sources of guidance ultimately predicted 94% of human agreement, with the scene context component providing the most explanatory power. None of the models, however, could reach the precision and fidelity of an attentional map defined by human fixations. This work puts forth a benchmark for computational models of search in real world scenes. Further improvements in modelling should capture mechanisms underlying the selectivity of observers’ fixations during search. National Eye Institute (Integrative Training Program in Vision grant T32 EY013935); Massachusetts Institute of Technology (Singleton Graduate Research Fellowship); National Science Foundation (U.S.) (Graduate Research Fellowship); National Science Foundation (U.S.) (CAREER Award (0546262)); National Science Foundation (U.S.) (NSF contract (0705677)); National Science Foundation (U.S.) (Career Award (0747120)