Survey of Techniques for Producing Blended Images: A Case Study Using Rollins College Archives
The Rollins College Archives are a treasure trove of historic resources relating to the college’s history, yet they are often underutilized or overlooked by both the student body and the surrounding community. In particular, historic resources are all too often excluded from work in computer science and related fields. This project aims to bridge that gap by bringing the two areas together. To that end, its goal is to merge past and present by blending historic photos with a present-day scene in order to reveal changes and juxtapositions of the same scene across eras. This research explores the possibility of accomplishing this principally through computational means. To achieve this, we delve into the domain of computer vision, using techniques in feature detection and matching to ultimately blend images in novel ways. Image blending is often used to create unique images or to emphasize the contrast between two scenes through their convergence. Whether the blend is produced through masks with alpha values, seam carving, or other techniques, most implementations require a great deal of manual input, whether that entails point selection, mask generation, or setting an alpha value. In this project, we identify recognizable regions and features in a given image and use them to identify similar regions and features in a second image. Any matches found are then filtered to remove bad or incorrect matches. The remaining matches are used to compute the difference in perspective between the two images, and the coordinates of the matching points are used to warp the images into the same perspective. We explore various approaches to the feature-matching problem, including built-in library functions as well as a region-based, template-matching algorithm.
We also investigate techniques in image blending, such as automatic mask generation, Laplacian pyramid blending, and various off-the-shelf tools contained within Unity. We also test the applications of our findings with regard to working with 360-degree images.
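The abstract above names Laplacian pyramid blending among the techniques investigated. As a rough illustration of the idea only (not the authors' implementation), a minimal NumPy sketch might look like the following; it uses simple 2×2 average pooling in place of Gaussian filtering and assumes grayscale images whose dimensions are divisible by 2^levels:

```python
import numpy as np

def downsample(img):
    # 2x2 average pooling as a crude stand-in for Gaussian filter + subsample
    return (img[0::2, 0::2] + img[1::2, 0::2]
            + img[0::2, 1::2] + img[1::2, 1::2]) / 4.0

def upsample(img, shape):
    # Nearest-neighbour upsampling back to the finer level's shape
    out = np.repeat(np.repeat(img, 2, axis=0), 2, axis=1)
    return out[:shape[0], :shape[1]]

def laplacian_blend(a, b, mask, levels=4):
    """Blend float images a and b (values in [0, 1]) under a mask
    (1 selects a, 0 selects b), level by level in a Laplacian pyramid."""
    # Gaussian pyramids for both images and the mask
    ga, gb, gm = [a], [b], [mask]
    for _ in range(levels):
        ga.append(downsample(ga[-1]))
        gb.append(downsample(gb[-1]))
        gm.append(downsample(gm[-1]))
    # Laplacian pyramids: band-pass differences, plus the coarsest level
    la = [ga[i] - upsample(ga[i + 1], ga[i].shape) for i in range(levels)] + [ga[-1]]
    lb = [gb[i] - upsample(gb[i + 1], gb[i].shape) for i in range(levels)] + [gb[-1]]
    # Blend each level with the mask pyramid, then collapse coarse-to-fine
    blended = [gm[i] * la[i] + (1 - gm[i]) * lb[i] for i in range(levels + 1)]
    out = blended[-1]
    for level in reversed(blended[:-1]):
        out = upsample(out, level.shape) + level
    return out
```

The point of blending band-pass levels rather than raw pixels is that low frequencies transition over a wide region while fine detail switches sharply, which hides the seam between the two eras.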
CHORUS Deliverable 2.2: Second report - identification of multi-disciplinary key issues for gap analysis toward EU multimedia search engines roadmap
After addressing the state of the art during the first year of CHORUS and establishing the existing landscape in multimedia search engines, we identified and analyzed gaps within the European research effort during our second year. In this period we focused on three directions, notably technological issues, user-centred issues and use-cases, and socio-economic and legal aspects. These were assessed through two central studies: first, a concerted vision of the functional breakdown of a generic multimedia search engine, and second, representative use-case descriptions with a related discussion of the requirements for technological challenges. Both studies were carried out in cooperation and consultation with the community at large through EC concertation meetings (multimedia search engines cluster), several meetings with our Think-Tank, presentations at international conferences, and surveys addressed to EU project coordinators as well as national initiative coordinators. Based on the feedback obtained, we identified two types of gaps, namely core technological gaps that involve research challenges, and “enablers”, which are not necessarily technical research challenges but have an impact on innovation progress. New socio-economic trends are presented, as well as emerging legal challenges.
OpenIns3D: Snap and Lookup for 3D Open-vocabulary Instance Segmentation
Current 3D open-vocabulary scene understanding methods mostly utilize
well-aligned 2D images as the bridge to learn 3D features with language.
However, applying these approaches becomes challenging in scenarios where 2D
images are absent. In this work, we introduce a completely new pipeline,
namely, OpenIns3D, which requires no 2D image inputs, for 3D open-vocabulary
scene understanding at the instance level. The OpenIns3D framework employs a
"Mask-Snap-Lookup" scheme. The "Mask" module learns class-agnostic mask
proposals in 3D point clouds. The "Snap" module generates synthetic scene-level
images at multiple scales and leverages 2D vision language models to extract
interesting objects. The "Lookup" module searches through the outcomes of
"Snap" with the help of Mask2Pixel maps, which contain the precise
correspondence between 3D masks and synthetic images, to assign category names
to the proposed masks. This 2D input-free, easy-to-train, and flexible
approach achieves state-of-the-art results on a wide range of indoor and
outdoor datasets by a large margin. Furthermore, OpenIns3D allows for effortless
switching of 2D detectors without re-training. When integrated with
state-of-the-art 2D open-world models such as ODISE and GroundingDINO, superb
results are observed on open-vocabulary instance segmentation. When integrated
with LLM-powered 2D models like LISA, it demonstrates a remarkable capacity to
process highly complex text queries, including those that require intricate
reasoning and world knowledge. Project page:
https://zheninghuang.github.io/OpenIns3D/
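The "Lookup" step described above can be pictured as a voting problem: each 3D mask projects onto known pixels in the synthetic snap images, and the 2D model's labels at those pixels decide the mask's category. The sketch below illustrates that idea only; the data structures (`mask2pixel`, `detections`) are hypothetical stand-ins, not the paper's actual API:

```python
from collections import Counter

def lookup_categories(mask2pixel, detections):
    """Assign each 3D mask a category by majority vote over its projected pixels.

    mask2pixel: list over 3D masks; each entry is a list of
                (image_idx, row, col) projections into the synthetic snaps.
    detections: list over snap images; each is a 2D grid of category
                labels (None where the 2D model detected nothing).
    """
    labels = []
    for pixels in mask2pixel:
        votes = Counter()
        for img_idx, r, c in pixels:
            cat = detections[img_idx][r][c]
            if cat is not None:
                votes[cat] += 1  # each labelled projection is one vote
        # Most-voted category wins; None if the mask hit no detections
        labels.append(votes.most_common(1)[0][0] if votes else None)
    return labels
```

Voting across multiple snaps at multiple scales is what lets a 3D mask inherit a 2D model's vocabulary without the mask network itself ever seeing language supervision.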
CHORUS Deliverable 2.1: State of the Art on Multimedia Search Engines
Based on the information provided by European projects and national initiatives related to multimedia search, as well as domain experts who participated in the CHORUS Think-Tank and workshops, this document reports on the state of the art in multimedia content search from a technical and socio-economic perspective.
The technical perspective includes an up-to-date view of content-based indexing and retrieval technologies, multimedia search in the context of mobile devices and peer-to-peer networks, and an overview of current evaluation and benchmark initiatives for measuring the performance of multimedia search engines.
From a socio-economic perspective, we survey the impact and legal consequences of these technical advances and point out future directions of research.
Towards Content-based Pixel Retrieval in Revisited Oxford and Paris
This paper introduces the first two pixel retrieval benchmarks. Pixel
retrieval is segmented instance retrieval. Just as semantic segmentation
extends classification to the pixel level, pixel retrieval extends image
retrieval by indicating which pixels are related to the query object. In
addition to retrieving images for a given query, it helps users
quickly identify the query object in true positive images and exclude false
positive images by denoting the correlated pixels. Our user study results show
pixel-level annotation can significantly improve the user experience.
Compared with semantic and instance segmentation, pixel retrieval requires a
fine-grained recognition capability for variable-granularity targets. To this
end, we propose pixel retrieval benchmarks named PROxford and PRParis, which
are based on the widely used image retrieval datasets, ROxford and RParis.
Three professional annotators label 5,942 images with two rounds of
double-checking and refinement. Furthermore, we conduct extensive experiments
and analysis on the SOTA methods in image search, image matching, detection,
segmentation, and dense matching using our pixel retrieval benchmarks. Results
show that the pixel retrieval task is challenging for these approaches and
distinct from existing problems, suggesting that further research can
advance content-based pixel retrieval and thus the user search experience.
The datasets can be downloaded from
\href{https://github.com/anguoyuan/Pixel_retrieval-Segmented_instance_retrieval}{this
link}.
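Evaluating a pixel retrieval method presumably means comparing its predicted pixel masks against the annotated ground truth; the paper's exact metrics are not reproduced here, but per-pixel intersection-over-union is the standard measure for such mask comparisons:

```python
import numpy as np

def pixel_iou(pred, gt):
    """Intersection-over-union between two boolean pixel masks.

    By convention, two empty masks count as a perfect match (IoU = 1)."""
    pred, gt = np.asarray(pred, bool), np.asarray(gt, bool)
    inter = np.logical_and(pred, gt).sum()
    union = np.logical_or(pred, gt).sum()
    return inter / union if union else 1.0
```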
Design of Immersive Online Hotel Walkthrough System Using Image-Based (Concentric Mosaics) Rendering
Conventional hotel booking websites represent their facilities only through static 2D photos, which cannot be moved or rotated. An image-based virtual walkthrough is a promising technology for the hospitality industry to attract more customers. In this project, research is carried out to create an image-based rendering (IBR) virtual walkthrough and a panoramic-based walkthrough using only Macromedia Flash Professional 8, Photovista Panorama 3.0, and Reality Studio for the interaction of the images. The web pages are built with Macromedia Dreamweaver Professional 8, and the images are displayed in Adobe Flash Player 8 or higher. The image-based walkthrough uses a concentric mosaics technique, while an image mosaicing technique is applied in the panoramic-based walkthrough. The two walkthroughs are compared, and the study also examines the relationship between the number of pictures and the smoothness of the walkthrough. Each technique has its advantages: the image-based walkthrough offers real-time navigation, since the user can move right, left, forward, and backward, whereas the panoramic-based walkthrough does not, because the user can only view 360 degrees from a fixed spot.
The Persistence of Austen in the 21st Century: A Reception History of The Lizzie Bennet Diaries
In 2012, the first episode of The Lizzie Bennet Diaries aired online via YouTube.com, offering a modernized serial form of Jane Austen’s 1813 novel Pride and Prejudice. With only word-of-mouth marketing, this series gained hundreds of thousands of views, a loyal following, and an Emmy award. In this paper, I will explore the reception history of The Lizzie Bennet Diaries by referencing its source material, analyzing its target demographics, and explaining its success.
Deep Learning Perspectives on Efficient Image Matching in Natural Image Databases
With the proliferation of digital content, efficient image matching in natural image databases has become paramount. Traditional image matching techniques, while effective to a certain extent, struggle with the high variability inherent in natural images. This research delves into the application of deep learning models, particularly Convolutional Neural Networks (CNNs), Siamese Networks, and Triplet Networks, to address these challenges. We introduce various techniques to enhance efficiency, such as data augmentation, transfer learning, dimensionality reduction, efficient sampling, and the amalgamation of traditional computer vision strategies with deep learning. Our experimental results, garnered from a specific dataset, demonstrate significant improvements in image matching efficiency, as quantified by metrics like precision, recall, F1-score, and matching time. The findings underscore the potential of deep learning as a transformative tool for natural image database matching, setting the stage for further research and optimization in this domain.
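The triplet objective named in this abstract trains an embedding so that matching images lie closer together than non-matching ones. A minimal NumPy sketch of the loss on precomputed embedding vectors follows; the network producing the embeddings is omitted, and the margin value is illustrative:

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=0.2):
    """Triplet loss on embedding vectors: pull the anchor toward the
    positive (matching image) and push it away from the negative
    (non-matching image) by at least `margin` in squared distance."""
    d_pos = np.sum((anchor - positive) ** 2)  # anchor-positive distance
    d_neg = np.sum((anchor - negative) ** 2)  # anchor-negative distance
    # Zero loss once the negative is sufficiently farther than the positive
    return max(d_pos - d_neg + margin, 0.0)
```

At retrieval time, matching then reduces to a nearest-neighbour search in the learned embedding space, which is what makes the approach efficient at database scale.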