Smartphone picture organization: a hierarchical approach
We live in a society where the large majority of the population has a camera-equipped smartphone. In addition, hard drives and cloud storage are getting cheaper and cheaper, leading to tremendous growth in stored personal photos. Unlike photo collections captured by a digital camera, which are typically pre-processed by the user, who organizes them into event-related folders, smartphone pictures are automatically stored in the cloud. As a consequence, photo collections captured by a smartphone are highly unstructured, and because smartphones are ubiquitous, they exhibit greater variability than pictures captured by a digital camera. To address the need to organize large smartphone photo collections automatically, we propose a new methodology for hierarchical photo organization into topics and topic-related categories. Our approach successfully estimates latent topics in the pictures by applying probabilistic Latent Semantic Analysis (pLSA), and automatically assigns a name to each topic by relying on a lexical database. Topic-related categories are then estimated using a set of topic-specific Convolutional Neural Networks. To validate our approach, we assemble and make public a large dataset of more than 8,000 smartphone pictures from 40 persons. Experimental results show higher user satisfaction with the resulting organization than with state-of-the-art solutions. Peer reviewed. Preprint
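As a rough illustration of the topic-estimation step, the sketch below fits pLSA with the standard EM updates on a small document-by-word count matrix. It is not the authors' implementation: how pictures are turned into countable "words" (for example visual words or tags), the number of topics, and the iteration count are all assumptions, and the function name `plsa` is hypothetical.

```python
import random

def plsa(counts, n_topics, n_iter=50, seed=0):
    """Fit pLSA by EM on a document-by-word count matrix (list of lists).

    Assumes every document has at least one non-zero count.
    Returns (p_z_given_d, p_w_given_z) as nested lists of probabilities.
    """
    rng = random.Random(seed)
    n_docs, n_words = len(counts), len(counts[0])

    def normalize(row):
        s = sum(row)
        return [x / s for x in row]

    # Random initialisation of P(z|d) and P(w|z).
    p_z_d = [normalize([rng.random() for _ in range(n_topics)]) for _ in range(n_docs)]
    p_w_z = [normalize([rng.random() for _ in range(n_words)]) for _ in range(n_topics)]

    for _ in range(n_iter):
        # Expected counts accumulated for the M-step.
        new_w_z = [[0.0] * n_words for _ in range(n_topics)]
        new_z_d = [[0.0] * n_topics for _ in range(n_docs)]
        for d in range(n_docs):
            for w in range(n_words):
                n = counts[d][w]
                if n == 0:
                    continue
                # E-step: P(z|d,w) is proportional to P(z|d) * P(w|z).
                joint = [p_z_d[d][z] * p_w_z[z][w] for z in range(n_topics)]
                total = sum(joint)
                for z in range(n_topics):
                    r = n * joint[z] / total
                    new_w_z[z][w] += r
                    new_z_d[d][z] += r
        # M-step: renormalise the expected counts into distributions.
        p_w_z = [normalize(row) for row in new_w_z]
        p_z_d = [normalize(row) for row in new_z_d]
    return p_z_d, p_w_z
```

For example, `plsa([[4, 3, 0, 0], [5, 2, 0, 0], [0, 0, 3, 4], [0, 0, 4, 5]], 2)` returns a topic mixture per document and a word distribution per topic; in the hierarchical scheme described above, each topic would then be named and refined by its own category classifier.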
Using term clouds to represent segment-level semantic content of podcasts
Spoken audio, like any time-continuous medium, is notoriously difficult to browse or skim without the support of an interface providing semantically annotated jump points that signal to the user where to listen in. Creation of time-aligned metadata by human annotators is prohibitively expensive, motivating the investigation of representations of segment-level semantic content based on transcripts generated by automatic speech recognition (ASR). This paper examines the feasibility of using term clouds to provide users with a structured representation of the semantic content of podcast episodes. Podcast episodes are visualized as a series of sub-episode segments, each represented by a term cloud derived from an ASR transcript. The quality of segment-level term clouds is measured quantitatively, and their utility is investigated in a small-scale user study based on human-labeled segment boundaries. Since the segment-level clouds generated from ASR transcripts prove useful, we examine an adaptation of text tiling techniques to speech in order to generate segments as part of a completely automated indexing and structuring system for browsing spoken audio. Results demonstrate that the generated segments are comparable with human-selected segment boundaries.
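To make the segmentation idea concrete, here is a minimal TextTiling-style sketch: lexical similarity is computed across each gap between consecutive transcript chunks, each gap gets a depth score (how far similarity dips below the nearest peaks on either side), and gaps whose depth exceeds a cutoff become boundaries. The abstract does not say how the technique was adapted to speech, so treating ASR output as pre-tokenized chunks, the window size, and the function names are assumptions; classic TextTiling also uses stopword removal, stemming, and smoothing, which are omitted here.

```python
import math
from collections import Counter
from statistics import mean, pstdev

def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(v * b[t] for t, v in a.items())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def gap_scores(units, window=2):
    """Similarity between the `window` units before and after each gap."""
    scores = []
    for g in range(1, len(units)):
        left = Counter(t for u in units[max(0, g - window):g] for t in u)
        right = Counter(t for u in units[g:g + window] for t in u)
        scores.append(cosine(left, right))
    return scores

def depth_scores(scores):
    """Depth of each gap: climb to the nearest peak on each side."""
    depths = []
    for i, s in enumerate(scores):
        l = i
        while l > 0 and scores[l - 1] >= scores[l]:
            l -= 1
        r = i
        while r < len(scores) - 1 and scores[r + 1] >= scores[r]:
            r += 1
        depths.append((scores[l] - s) + (scores[r] - s))
    return depths

def segment(units, window=2, threshold=None):
    """Return indices i such that a segment boundary falls just before units[i]."""
    depths = depth_scores(gap_scores(units, window))
    if not depths:
        return []
    if threshold is None:
        # Cutoff from Hearst's TextTiling: mean depth minus half a standard deviation.
        threshold = mean(depths) - pstdev(depths) / 2
    return [i + 1 for i, d in enumerate(depths) if d > threshold]
```

On a toy transcript whose first three chunks talk about pets and last three about stocks, the deepest gap is the one before the fourth chunk, so `segment(units, threshold=1.0)` places a single boundary there; each resulting segment could then be summarized as a term cloud of its most salient terms.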
Can Large Language Models Be Good Companions? An LLM-Based Eyewear System with Conversational Common Ground
Developing chatbots as personal companions has long been a goal of artificial intelligence researchers. Recent advances in Large Language Models (LLMs) have delivered a practical solution for endowing chatbots with anthropomorphic language capabilities. However, it takes more than LLMs to enable chatbots that can act as companions. Humans use their understanding of individual personalities to drive conversations, and chatbots also require this capability to enable human-like companionship. They should act based on personalized, real-time, and time-evolving knowledge of their owner. We define such essential knowledge as the common ground between chatbots and their owners, and we propose to build a common-ground-aware dialogue system from an LLM-based module, named OS-1, to enable chatbot companionship. Hosted by eyewear, OS-1 can sense the visual and audio signals the user receives and extract real-time contextual semantics. Those semantics are categorized and recorded to formulate historical contexts, from which the user's profile is distilled and evolves over time; that is, OS-1 gradually learns about its user. OS-1 combines knowledge from real-time semantics, historical contexts, and user-specific profiles to produce a common-ground-aware prompt that is input into the LLM module. The LLM's output is converted to audio and spoken to the wearer when appropriate. We conduct laboratory and in-field studies to assess OS-1's ability to build common ground between the chatbot and its user, and we also evaluate the technical feasibility and capabilities of the system. OS-1, with its common-ground awareness, can significantly improve user satisfaction and potentially lead to downstream tasks such as personal emotional support and assistance.
Comment: 36 pages, 25 figures, under review at ACM IMWU
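The abstract does not give OS-1's actual prompt format, so the following is only a hypothetical sketch of the data flow it describes: real-time semantics, categorized historical contexts, and a distilled profile are merged with the user's latest utterance into one text prompt for the LLM. The class `CommonGround`, the function `build_prompt`, the field names, and the wording of the prompt are all illustrative assumptions.

```python
from dataclasses import dataclass, field

@dataclass
class CommonGround:
    """Hypothetical container for the three knowledge sources named in the abstract."""
    realtime: list = field(default_factory=list)  # current semantics, e.g. "user is in a cafe"
    history: list = field(default_factory=list)   # categorized past contexts, oldest first
    profile: dict = field(default_factory=dict)   # distilled traits, e.g. {"hobby": "hiking"}

def build_prompt(cg, utterance, max_history=3):
    """Compose a single text prompt from the common ground and the latest utterance."""
    profile = "; ".join(f"{k}: {v}" for k, v in sorted(cg.profile.items())) or "unknown"
    # Keep only the most recent contexts (a slice of [-0:] would keep everything, hence the guard).
    recent = cg.history[-max_history:] if max_history > 0 else []
    lines = [
        "You are a companion chatbot. Use the shared context below.",
        f"User profile: {profile}",
        "Recent history: " + (" | ".join(recent) or "none"),
        "Current context: " + (" | ".join(cg.realtime) or "none"),
        f"User says: {utterance}",
        "Reply briefly and naturally.",
    ]
    return "\n".join(lines)
```

Truncating the history is one simple way to keep prompts within an LLM's context budget; a real system would more likely select contexts by relevance, which this sketch does not attempt.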
Semantic Concept Co-Occurrence Patterns for Image Annotation and Retrieval.
Describing visual image contents by semantic concepts is an effective and straightforward way to facilitate various high-level applications. Inferring semantic concepts from low-level pictorial feature analysis is challenging due to the semantic gap problem, while manually labeling concepts is impractical given the large number of images in both online and offline collections. In this paper, we present a novel approach to automatically generate intermediate image descriptors by exploiting concept co-occurrence patterns in the pre-labeled training set, which makes it possible to depict complex scene images semantically. Our work is motivated by the fact that multiple concepts that frequently co-occur across images form patterns that can provide contextual cues for individual concept inference. We discover the co-occurrence patterns as hierarchical communities by graph modularity maximization in a network whose nodes and edges represent concepts and co-occurrence relationships, respectively. A random walk process operating on the inferred concept probabilities with the discovered co-occurrence patterns is then applied to obtain the refined concept signature representation. Through experiments in automatic image annotation and semantic image retrieval on several challenging datasets, we demonstrate the effectiveness of the proposed concept co-occurrence patterns, as well as the concept signature representation, in comparison with state-of-the-art approaches.
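The refinement step can be illustrated with a generic random walk with restart: concept probabilities from a classifier spread along row-normalized co-occurrence edges and are repeatedly pulled back toward their initial values. This is a swapped-in, simplified technique, not the paper's exact formulation: it walks on the raw co-occurrence graph rather than on the discovered hierarchical communities, and the restart weight `alpha`, the iteration count, and the name `refine_concepts` are assumptions.

```python
def refine_concepts(p0, cooc, alpha=0.5, n_iter=100):
    """Random walk with restart over a concept co-occurrence graph.

    p0:   initial concept probabilities from a classifier (list of floats).
    cooc: co-occurrence counts between concepts (square list of lists).
    alpha: weight on the propagated scores; (1 - alpha) restarts to p0.
    """
    n = len(p0)
    # Row-normalize co-occurrence counts into transition probabilities.
    trans = []
    for i in range(n):
        s = sum(cooc[i])
        trans.append([c / s for c in cooc[i]] if s else [0.0] * n)
    r = list(p0)
    for _ in range(n_iter):
        # Score flowing into concept j from every concept i.
        spread = [sum(r[i] * trans[i][j] for i in range(n)) for j in range(n)]
        # Blend propagated scores with the original evidence.
        r = [alpha * spread[j] + (1 - alpha) * p0[j] for j in range(n)]
    return r
```

For three concepts (sky, cloud, car) with `p0 = [0.9, 0.1, 0.3]` and sky and cloud co-occurring, `refine_concepts` lifts cloud from 0.1 to 11/30 (about 0.37) because it co-occurs with the confident sky, while the isolated car, which receives no support from the graph, is damped to 0.15. Whether such damping of unconnected concepts is desirable depends on the application.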
CHORUS Deliverable 2.2: Second report - identification of multi-disciplinary key issues for gap analysis toward EU multimedia search engines roadmap
After addressing the state of the art during the first year of CHORUS and establishing the existing landscape in multimedia search engines, we identified and analyzed gaps within the European research effort during our second year. In this period we focused on three directions: technological issues, user-centred issues and use cases, and socio-economic and legal aspects. These were assessed by two central studies: first, a concerted vision of the functional breakdown of a generic multimedia search engine, and second, representative use-case descriptions with a related discussion of the requirements they place on technological challenges. Both studies were carried out in cooperation and consultation with the community at large through EC concertation meetings (multimedia search engines cluster), several meetings with our Think-Tank, presentations at international conferences, and surveys addressed to EU project coordinators as well as national initiative coordinators. Based on the feedback obtained, we identified two types of gaps: core technological gaps that involve research challenges, and "enablers", which are not necessarily technical research challenges but have an impact on innovation progress. New socio-economic trends are presented, as well as emerging legal challenges.
Information Retrieval across Information Visualization
This article presents the analytical and retrieval potential of visualization maps. The resulting maps were tested as an information retrieval (IR) interface. A collection of documents derived from the ACM Digital Library was mapped onto the surface of a sphere. The proposed approach uses a nonlinear similarity between documents, obtained by comparing their ascribed thematic categories, and thereby develops semantic connections between them. For domain analysis, the newest IT trend at the time, cloud computing, was monitored over the period 2007-2009. The visualization reflects the evolution, dynamics, and relational fields of cloud technology, as well as its paradigmatic property.
CHORUS Deliverable 2.1: State of the Art on Multimedia Search Engines
Based on information provided by European projects and national initiatives related to multimedia search, as well as domain experts who participated in the CHORUS Think-Tanks and workshops, this document reports on the state of the art in multimedia content search from a technical and a socio-economic perspective.
The technical perspective includes an up-to-date view of content-based indexing and retrieval technologies, multimedia search in the context of mobile devices and peer-to-peer networks, and an overview of current evaluation and benchmark initiatives to measure the performance of multimedia search engines.
From a socio-economic perspective, we take inventory of the impact and legal consequences of these technical advances and point out future directions of research.