735 research outputs found
Flexible and scalable digital library search
In this report the development of a specialised search engine for a digital library is described. The proposed system architecture consists of three levels: the conceptual, the logical and the physical level. The conceptual level schema enables by its exposure of a domain specific schema semantically rich conceptual search. The logical level provides a description language to achieve a high degree of flexibility for multimedia retrieval. The physical level takes care of scalable and efficient persistent data storage. The role, played by each level, changes during the various stages of a search engine's lifecycle: (1) modeling the index, (2) populating and maintaining the index and (3) querying the index. The integration of all this functionality allows the combination of both conceptual and content-based querying in the query stage. A search engine for the Australian Open tennis tournament website is used as a running example, which shows the power of the complete architecture and its various component
Information extraction from multimedia web documents: an open-source platform and testbed
The LivingKnowledge project aimed to enhance the current state of the art in search, retrieval and knowledge management on the web by advancing the use of sentiment and opinion analysis within multimedia applications. To achieve this aim, a diverse set of novel and complementary analysis techniques have been integrated into a single, but extensible software platform on which such applications can be built. The platform combines state-of-the-art techniques for extracting facts, opinions and sentiment from multimedia documents, and unlike earlier platforms, it exploits both visual and textual techniques to support multimedia information retrieval. Foreseeing the usefulness of this software in the wider community, the platform has been made generally available as an open-source project. This paper describes the platform design, gives an overview of the analysis algorithms integrated into the system and describes two applications that utilise the system for multimedia information retrieval
Goal event detection in soccer videos via collaborative multimodal analysis
Detecting semantic events in sports video is crucial for video indexing and retrieval. Most existing works have exclusively relied on video content features, namely, directly available and extractable data from the visual and/or aural channels. Sole reliance on such data however, can be problematic due to the high-level semantic nature of video and the difficulty to properly align detected events with their exact time of occurrences. This paper proposes a framework for soccer goal event detection through collaborative analysis of multimodal features. Unlike previous approaches, the visual and aural contents are not directly scrutinized. Instead, an external textual source (i.e., minute-by-minute reports from sports websites) is used to initially localize the event search space. This step is vital as the event search space can significantly be reduced. This also makes further visual and aural analysis more efficient since excessive and unnecessary non-eventful segments are discarded, culminating in the accurate identification of the actual goal event segment. Experiments conducted on thirteen soccer matches are very promising with high accuracy rates being reported
Metadata-driven multimedia access
With the growing ubiquity and mobility of multimedia-enabled devices, universal multimedia access (UMA) is emerging as one of the important components for the next generation of multimedia applications. The basic concept underlying UMA is universal or seamless access to multimedia content, by automatic selection and adaptation of content based on the user's environment. UMA promises an integration of these different perspectives into a new class of content adaptive applications that could allow users to access multimedia content without concern for specific coding formats, terminal capabilities, or network conditions. We discuss methods that support UMA and the tools provided by MPEG-7 to achieve this. We also discuss the inclusion of metadata in JPEG 2000 encoded images. We present these methods in the typical order that they may be used in an actual application. Therefore, we first discuss the (personalized) selection of desired content from all available content, followed by the organization of related variations of a single piece of content. Then, we discuss segmentation and summarization of audio video (AV) content, and finally, transcoding of AV content
Recommended from our members
MAC-REALM: A video content feature extraction and modelling framework
This thesis was submitted for the degree of Doctor of Philosophy and awarded by Brunel University.A consequence of the âdata delugeâ is the exponential increase in digital video footage, while the ability to find relevant video clips diminishes. Traditional text based search engines are no longer optimal for searching, as they cannot provide a granular search of the content inside video footage. To be able to search the video in a content based manner, the content features of the video need to be extracted and modelled into a content model, which can then act as a searchable proxy for the video content. This thesis focuses on the extraction of syntactic and semantic content features and content modelling, using machine driven processes, with either little or no user interaction. Our abstract framework design extracts syntactic and semantic content features and compiles them into an integrated content model. The framework integrates a four plane strategy that consists of a pre-processing plane that removes redundant data and filters the media to improve the feature extraction properties of the media; a syntactic feature extraction plane that extracts low level syntactic feature and mid-level syntactic features that have semantic attributes; a semantic relationship analysis and linkage plane, where the spatial and temporal relationships of all the content features are defined, and finally a content modelling stage where the syntactic and semantic content features are integrated into a content model. Each of the four planes can be split into three layers namely, the content layer, where the content to be processed is stored; the application layer, where the content is converted into content descriptions, and the MPEG-7 layer, where content descriptions are serialised. Using MPEG-7 standards to produce the content model will provide wide-ranging interoperability, while facilitating granular multi-content type searches. The framework is aiming to âbridgeâ the semantic gap, by integrating the syntactic and semantic content features from extraction through to modelling. The design of the framework has been implemented into a prototype called MAC-REALM, which has been tested and evaluated for its effectiveness to extract and model content features. Conclusions are drawn about the research output as a whole and whether they have met the objectives. Finally, future work is presented on how concept detection and crowd sourcing can be used with MAC-REALM
Semantic multimedia modelling & interpretation for annotation
The emergence of multimedia enabled devices, particularly the incorporation of cameras in mobile phones, and the accelerated revolutions in the low cost storage devices, boosts the multimedia data production rate drastically. Witnessing such an iniquitousness of digital images and videos, the research community has been projecting the issue of its significant utilization and management. Stored in monumental multimedia corpora, digital data need to be retrieved and organized in an intelligent way, leaning on the rich semantics involved. The utilization of these image and video collections demands proficient image and video annotation and retrieval techniques. Recently, the multimedia research community is progressively veering its emphasis to the personalization of these media. The main impediment in the image and video analysis is the semantic gap, which is the discrepancy among a userâs high-level interpretation of an image and the video and the low level computational interpretation of it. Content-based image and video annotation systems are remarkably susceptible to the semantic gap due to their reliance on low-level visual features for delineating semantically rich image and video contents. However, the fact is that the visual similarity is not semantic similarity, so there is a demand to break through this dilemma through an alternative way. The semantic gap can be narrowed by counting high-level and user-generated information in the annotation. High-level descriptions of images and or videos are more proficient of capturing the semantic meaning of multimedia content, but it is not always applicable to collect this information. It is commonly agreed that the problem of high level semantic annotation of multimedia is still far from being answered. This dissertation puts forward approaches for intelligent multimedia semantic extraction for high level annotation. This dissertation intends to bridge the gap between the visual features and semantics. It proposes a framework for annotation enhancement and refinement for the object/concept annotated images and videos datasets. The entire theme is to first purify the datasets from noisy keyword and then expand the concepts lexically and commonsensical to fill the vocabulary and lexical gap to achieve high level semantics for the corpus. This dissertation also explored a novel approach for high level semantic (HLS) propagation through the images corpora. The HLS propagation takes the advantages of the semantic intensity (SI), which is the concept dominancy factor in the image and annotation based semantic similarity of the images. As we are aware of the fact that the image is the combination of various concepts and among the list of concepts some of them are more dominant then the other, while semantic similarity of the images are based on the SI and concept semantic similarity among the pair of images. Moreover, the HLS exploits the clustering techniques to group similar images, where a single effort of the human experts to assign high level semantic to a randomly selected image and propagate to other images through clustering. The investigation has been made on the LabelMe image and LabelMe video dataset. Experiments exhibit that the proposed approaches perform a noticeable improvement towards bridging the semantic gap and reveal that our proposed system outperforms the traditional systems
- âŠ