Harvesting Training Images for Fine-Grained Object Categories using Visual Descriptions
We harvest training images for visual object recognition by casting it as an IR task. In contrast to previous work, we concentrate on fine-grained object categories, such as the large number of particular animal subspecies, for which manual annotation is expensive. We use 'visual descriptions' from nature guides as a novel augmentation to the well-known use of category names. We use these descriptions both in the query process, to find potential category images, and in image reranking, where an image is ranked more highly if the web page text surrounding it is similar to the visual descriptions. We show the potential of this method when harvesting images for 10 butterfly categories: compared to a method that relies on the category name only, using visual descriptions improves precision for many categories.
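The reranking idea in the abstract above (rank an image higher when its surrounding web page text resembles the visual description) can be illustrated with a minimal sketch. This is not the authors' method: the abstract does not say which similarity measure is used, so cosine similarity over raw term counts is an assumption here, and the names `tokenize`, `cosine`, and `rerank`, along with the example file names, are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def cosine(vec_a, vec_b):
    """Cosine similarity between two term-count vectors (Counters)."""
    shared = vec_a.keys() & vec_b.keys()
    dot = sum(vec_a[t] * vec_b[t] for t in shared)
    norm_a = math.sqrt(sum(v * v for v in vec_a.values()))
    norm_b = math.sqrt(sum(v * v for v in vec_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rerank(description, candidates):
    """Order candidate images by text similarity to the visual description.

    `candidates` is a list of (image_id, surrounding_text) pairs; the
    similarity measure is an assumption made for illustration only.
    """
    desc_vec = Counter(tokenize(description))
    scored = [
        (cosine(desc_vec, Counter(tokenize(text))), image_id)
        for image_id, text in candidates
    ]
    # Highest similarity first; sorted() is stable, so ties keep input order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [image_id for _, image_id in scored]


if __name__ == "__main__":
    description = "orange wings with black veins and white spots"
    candidates = [
        ("a.jpg", "cheap flights and hotel deals"),
        ("b.jpg", "the wings are orange with black veins"),
    ]
    print(rerank(description, candidates))
```

A real system would likely weight terms (for example with TF-IDF) and remove stop words, so that common words such as "and" do not contribute to the score.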
Clue: Cross-modal Coherence Modeling for Caption Generation
We use coherence relations inspired by computational models of discourse to
study the information needs and goals of image captioning. Using an annotation
protocol specifically devised for capturing image--caption coherence relations,
we annotate 10,000 instances from publicly-available image--caption pairs. We
introduce a new task for learning inferences in imagery and text, coherence
relation prediction, and show that these coherence annotations can be exploited
to learn relation classifiers as an intermediary step, and also train
coherence-aware, controllable image captioning models. The results show a
dramatic improvement in the consistency and quality of the generated captions
with respect to information needs specified via coherence relations.
Comment: Accepted as a long paper to ACL 202
Few-Shot Object Detection in Real Life: Case Study on Auto-Harvest
Confinement during COVID-19 has seriously affected agriculture all over the
world. As one efficient solution, mechanical harvesting (auto-harvest) based
on object detection and a robotic harvester has become an urgent need. Within
the auto-harvest system, a robust few-shot object detection model is one of
the bottlenecks, since the system is required to deal
with new vegetable/fruit categories and the collection of large-scale annotated
datasets for all the novel categories is expensive. There are many few-shot
object detection models that were developed by the community. Yet whether they
could be employed directly for real life agricultural applications is still
questionable, as there is a context-gap between the commonly used training
datasets and the images collected in real life agricultural scenarios. To this
end, in this study, we present a novel cucumber dataset and propose two data
augmentation strategies that help to bridge the context-gap. Experimental
results show that 1) the state-of-the-art few-shot object detection model
performs poorly on the novel `cucumber' category; and 2) the proposed
augmentation strategies outperform the commonly used ones.
Comment: 6 pages
Harnessing the Power of AI based Image Generation Model DALLE 2 in Agricultural Settings
This study investigates the potential impact of artificial intelligence (AI)
on the enhancement of visualization processes in the agricultural sector, using
the advanced AI image generator, DALLE 2, developed by OpenAI. By
synergistically utilizing the natural language processing proficiency of
ChatGPT and the generative prowess of the DALLE 2 model, which employs a
Generative Adversarial Networks (GANs) framework, our research offers an
innovative method to transform textual descriptors into realistic visual
content. Our rigorously assembled datasets include a broad spectrum of
agricultural elements such as fruits, plants, and scenarios differentiating
crops from weeds, maintained for AI-generated versus original images. The
quality and accuracy of the AI-generated images were evaluated via established
metrics including mean squared error (MSE), peak signal-to-noise ratio (PSNR),
and feature similarity index (FSIM). The results underline the significant role
of the DALLE 2 model in enhancing visualization processes in agriculture,
aiding in more informed decision-making, and improving resource distribution.
The outcomes of this research highlight the imminent rise of an AI-led
transformation in the realm of precision agriculture.
Comment: 22 pages, 13 figures, 2 tables
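Two of the image-quality metrics named in the abstract above, MSE and PSNR, have standard definitions that can be sketched in a few lines. The sketch below is illustrative only and does not reproduce the study's evaluation code: it assumes grayscale images given as equally sized nested lists of pixel values with an 8-bit peak of 255, and the function names `mse` and `psnr` are hypothetical. FSIM is omitted because it requires phase-congruency and gradient features that do not fit a short example.

```python
import math


def mse(img_a, img_b):
    """Mean squared error between two equally sized grayscale images.

    Images are nested lists of pixel values (an assumed representation).
    """
    flat_a = [p for row in img_a for p in row]
    flat_b = [p for row in img_b for p in row]
    if len(flat_a) != len(flat_b) or not flat_a:
        raise ValueError("images must be non-empty and the same size")
    return sum((x - y) ** 2 for x, y in zip(flat_a, flat_b)) / len(flat_a)


def psnr(img_a, img_b, max_value=255.0):
    """Peak signal-to-noise ratio in dB: 10 * log10(max_value^2 / MSE).

    `max_value` is the largest possible pixel value (255 assumes 8-bit data).
    Identical images have zero error, so the ratio is infinite.
    """
    err = mse(img_a, img_b)
    if err == 0:
        return math.inf
    return 10.0 * math.log10(max_value ** 2 / err)


if __name__ == "__main__":
    original = [[10, 20], [30, 40]]
    generated = [[12, 20], [30, 44]]
    print("MSE:", mse(original, generated))
    print("PSNR (dB):", psnr(original, generated))
```

Lower MSE and higher PSNR indicate that the generated image is numerically closer to the original; neither metric measures perceptual or semantic similarity on its own.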
Cataloging Public Objects Using Aerial and Street-Level Images – Urban Trees
Each corner of the inhabited world is imaged from multiple viewpoints with increasing frequency. Online map services like Google Maps or Here Maps provide direct access to huge amounts of densely sampled, georeferenced images from street view and aerial perspective. There is an opportunity to design computer vision systems that will help us search, catalog and monitor public infrastructure, buildings and artifacts. We explore the architecture and feasibility of such a system. The main technical challenge is combining test time information from multiple views of each geographic location (e.g., aerial and street views). We implement two modules: det2geo, which detects the set of locations of objects belonging to a given category, and geo2cat, which computes the fine-grained category of the object at a given location. We introduce a solution that adapts state-of-the-art CNN-based object detectors and classifiers. We test our method on “Pasadena Urban Trees”, a new dataset of 80,000 trees with geographic and species annotations, and show that combining multiple views significantly improves both tree detection and tree species classification, rivaling human performance.
Geo-Information Harvesting from Social Media Data
As unconventional sources of geo-information, massive imagery and text
messages from open platforms and social media form a temporally quasi-seamless,
spatially multi-perspective stream, but with unknown and diverse quality. Due
to its complementarity to remote sensing data, geo-information from these
sources offers promising perspectives, but harvesting is not trivial due to its
data characteristics. In this article, we address key aspects in the field,
including data availability, analysis-ready data preparation and data
management, geo-information extraction from social media text messages and
images, and the fusion of social media and remote sensing data. We then
showcase some exemplary geographic applications. In addition, we present the
first extensive discussion of ethical considerations of social media data in
the context of geo-information harvesting and geographic applications. With
this effort, we wish to stimulate curiosity and lay the groundwork for
researchers who intend to explore social media data for geo-applications. We
encourage the community to join forces by sharing their code and data.
Comment: Accepted for publication in IEEE Geoscience and Remote Sensing Magazine