2,486 research outputs found

    An examination of automatic video retrieval technology on access to the contents of an historical video archive

    Get PDF
    Purpose – This paper aims to provide an initial understanding of the constraints that historical video collections pose to video retrieval technology and the potential that online access offers to both archive and users. Design/methodology/approach – A small and unique collection of videos on customs and folklore was used as a case study. Multiple methods were employed to investigate the effectiveness of technology and the modality of user access. Automatic keyframe extraction was tested on the visual content while the audio stream was used for automatic classification of speech and music clips. The user access (search vs browse) was assessed in a controlled user evaluation. A focus group and a survey provided insight on the actual use of the analogue archive. The results of these multiple studies were then compared and integrated (triangulation). Findings – The amateur material challenged automatic techniques for video and audio indexing, thus suggesting that the technology must be tested against the material before deciding on a digitisation strategy. Two user interaction modalities, browsing vs searching, were tested in a user evaluation. Results show users preferred searching, but browsing becomes essential when the search engine fails in matching query and indexed words. Browsing was also valued for serendipitous discovery; however the organisation of the archive was judged cryptic and therefore of limited use. This indicates that the categorisation of an online archive should be thought of in terms of users who might not understand the current classification. The focus group and the survey showed clearly the advantage of online access even when the quality of the video surrogate is poor. The evidence gathered suggests that the creation of a digital version of a video archive requires a rethinking of the collection in terms of the new medium: a new archive should be specially designed to exploit the potential that the digital medium offers. Similarly, users' needs have to be considered before designing the digital library interface, as needs are likely to be different from those imagined. Originality/value – This paper is the first attempt to understand the advantages offered and limitations held by video retrieval technology for small video archives like those often found in special collections

    CHORUS Deliverable 2.2: Second report - identification of multi-disciplinary key issues for gap analysis toward EU multimedia search engines roadmap

    Get PDF
    After addressing the state-of-the-art during the first year of Chorus and establishing the existing landscape in multimedia search engines, we have identified and analyzed gaps within European research effort during our second year. In this period we focused on three directions, notably technological issues, user-centred issues and use-cases and socio- economic and legal aspects. These were assessed by two central studies: firstly, a concerted vision of functional breakdown of generic multimedia search engine, and secondly, a representative use-cases descriptions with the related discussion on requirement for technological challenges. Both studies have been carried out in cooperation and consultation with the community at large through EC concertation meetings (multimedia search engines cluster), several meetings with our Think-Tank, presentations in international conferences, and surveys addressed to EU projects coordinators as well as National initiatives coordinators. Based on the obtained feedback we identified two types of gaps, namely core technological gaps that involve research challenges, and “enablers”, which are not necessarily technical research challenges, but have impact on innovation progress. New socio-economic trends are presented as well as emerging legal challenges

    Video browsing interfaces and applications: a review

    Get PDF
    We present a comprehensive review of the state of the art in video browsing and retrieval systems, with special emphasis on interfaces and applications. There has been a significant increase in activity (e.g., storage, retrieval, and sharing) employing video data in the past decade, both for personal and professional use. The ever-growing amount of video content available for human consumption and the inherent characteristics of video data—which, if presented in its raw format, is rather unwieldy and costly—have become driving forces for the development of more effective solutions to present video contents and allow rich user interaction. As a result, there are many contemporary research efforts toward developing better video browsing solutions, which we summarize. We review more than 40 different video browsing and retrieval interfaces and classify them into three groups: applications that use video-player-like interaction, video retrieval applications, and browsing solutions based on video surrogates. For each category, we present a summary of existing work, highlight the technical aspects of each solution, and compare them against each other

    Using Web Archives to Enrich the Live Web Experience Through Storytelling

    Get PDF
    Much of our cultural discourse occurs primarily on the Web. Thus, Web preservation is a fundamental precondition for multiple disciplines. Archiving Web pages into themed collections is a method for ensuring these resources are available for posterity. Services such as Archive-It exists to allow institutions to develop, curate, and preserve collections of Web resources. Understanding the contents and boundaries of these archived collections is a challenge for most people, resulting in the paradox of the larger the collection, the harder it is to understand. Meanwhile, as the sheer volume of data grows on the Web, storytelling is becoming a popular technique in social media for selecting Web resources to support a particular narrative or story . In this dissertation, we address the problem of understanding the archived collections through proposing the Dark and Stormy Archive (DSA) framework, in which we integrate storytelling social media and Web archives. In the DSA framework, we identify, evaluate, and select candidate Web pages from archived collections that summarize the holdings of these collections, arrange them in chronological order, and then visualize these pages using tools that users already are familiar with, such as Storify. To inform our work of generating stories from archived collections, we start by building a baseline for the structural characteristics of popular (i.e., receiving the most views) human-generated stories through investigating stories from Storify. Furthermore, we checked the entire population of Archive-It collections for better understanding the characteristics of the collections we intend to summarize. We then filter off-topic pages from the collections the using different methods to detect when an archived page in a collection has gone off-topic. We created a gold standard dataset from three Archive-It collections to evaluate the proposed methods at different thresholds. From the gold standard dataset, we identified five behaviors for the TimeMaps (a list of archived copies of a page) based on the page’s aboutness. Based on a dynamic slicing algorithm, we divide the collection and cluster the pages in each slice. We then select the best representative page from each cluster based on different quality metrics (e.g., the replay quality, and the quality of the generated snippet from the page). At the end, we put the selected pages in chronological order and visualize them using Storify. For evaluating the DSA framework, we obtained a ground truth dataset of hand-crafted stories from Archive-It collections generated by expert archivists. We used Amazon’s Mechanical Turk to evaluate the automatically generated stories against the stories that were created by domain experts. The results show that the automatically generated stories by the DSA are indistinguishable from those created by human subject domain experts, while at the same time both kinds of stories (automatic and human) are easily distinguished from randomly generated storie

    Improving Collection Understanding for Web Archives with Storytelling: Shining Light Into Dark and Stormy Archives

    Get PDF
    Collections are the tools that people use to make sense of an ever-increasing number of archived web pages. As collections themselves grow, we need tools to make sense of them. Tools that work on the general web, like search engines, are not a good fit for these collections because search engines do not currently represent multiple document versions well. Web archive collections are vast, some containing hundreds of thousands of documents. Thousands of collections exist, many of which cover the same topic. Few collections include standardized metadata. Too many documents from too many collections with insufficient metadata makes collection understanding an expensive proposition. This dissertation establishes a five-process model to assist with web archive collection understanding. This model aims to produce a social media story – a visualization with which most web users are familiar. Each social media story contains surrogates which are summaries of individual documents. These surrogates, when presented together, summarize the topic of the story. After applying our storytelling model, they summarize the topic of a web archive collection. We develop and test a framework to select the best exemplars that represent a collection. We establish that algorithms produced from these primitives select exemplars that are otherwise undiscoverable using conventional search engine methods. We generate story metadata to improve the information scent of a story so users can understand it better. After an analysis showing that existing platforms perform poorly for web archives and a user study establishing the best surrogate type, we generate document metadata for the exemplars with machine learning. We then visualize the story and document metadata together and distribute it to satisfy the information needs of multiple personas who benefit from our model. Our tools serve as a reference implementation of our Dark and Stormy Archives storytelling model. Hypercane selects exemplars and generates story metadata. MementoEmbed generates document metadata. Raintale visualizes and distributes the story based on the story metadata and the document metadata of these exemplars. By providing understanding immediately, our stories save users the time and effort of reading thousands of documents and, most importantly, help them understand web archive collections

    Analysis and Extraction of Tempo-Spatial Events in an Efficient Archival CDN with Emphasis on Telegram

    Full text link
    This paper presents an efficient archival framework for exploring and tracking cyberspace large-scale data called Tempo-Spatial Content Delivery Network (TS-CDN). Social media data streams are renewing in time and spatial dimensions. Various types of websites and social networks (i.e., channels, groups, pages, etc.) are considered spatial in cyberspace. Accurate analysis entails encompassing the bulk of data. In TS-CDN by applying the hash function on big data an efficient content delivery network is created. Using hash function rebuffs data redundancy and leads to conclude unique data archive in large-scale. This framework based on entered query allows for apparent monitoring and exploring data in tempo-spatial dimension based on TF-IDF score. Also by conformance from i18n standard, the Unicode problem has been dissolved. For evaluation of TS-CDN framework, a dataset from Telegram news channels from March 23, 2020 (1399-01-01), to September 21, 2020 (1399-06-31) on topics including Coronavirus (COVID-19), vaccine, school reopening, flood, earthquake, justice shares, petroleum, and quarantine exploited. By applying hash on Telegram dataset in the mentioned time interval, a significant reduction in media files such as 39.8% for videos (from 79.5 GB to 47.8 GB), and 10% for images (from 4 GB to 3.6 GB) occurred. TS-CDN infrastructure in a web-based approach has been presented as a service-oriented system. Experiments conducted on enormous time series data, including different spatial dimensions (i.e., Khabare Fouri, Khabarhaye Fouri, Akhbare Rouze Iran, and Akhbare Rasmi Telegram news channels), demonstrate the efficiency and applicability of the implemented TS-CDN framework

    Review of Semantic Importance and Role of using Ontologies in Web Information Retrieval Techniques

    Get PDF
    The Web contains an enormous amount of information, which is managed to accumulate, researched, and regularly used by many users. The nature of the Web is multilingual and growing very fast with its diverse nature of data including unstructured or semi-structured data such as Websites, texts, journals, and files. Obtaining critical relevant data from such vast data with its diverse nature has been a monotonous and challenging task. Simple key phrase data gathering systems rely heavily on statistics, resulting in a word incompatibility problem related to a specific word's inescapable semantic and situation variants. As a result, there is an urgent need to arrange such colossal data systematically to find out the relevant information that can be quickly analyzed and fulfill the users' needs in the relevant context. Over the years ontologies are widely used in the semantic Web to contain unorganized information systematic and structured manner. Still, they have also significantly enhanced the efficiency of various information recovery approaches. Ontological information gathering systems recover files focused on the semantic relation of the search request and the searchable information. This paper examines contemporary ontology-based information extraction techniques for texts, interactive media, and multilingual data types. Moreover, the study tried to compare and classify the most significant developments utilized in the search and retrieval techniques and their major disadvantages and benefits

    DARIAH and the Benelux

    Get PDF

    Random Assisted Browsing of Rushes Archives

    Get PDF
    How to efficiently browse a large video database if its content is unknown to the user? In this paper we propose new approaches for browsing initialisation, exploration and content access of a rushes archive, where the span of information stored can be huge and difficult to understand at a glance. Exploring and navigating through raw footage is assisted by organising the video material in a meaningful structure and by adopting appropriate visualisation solutions. Un-annotated content is organised in hierarchical previews, while browsing is enabled by novel methods of random exploration and random content access to preview nodes. User tests conducted on professional users in a real-work scenario aim at demonstrating how the hierarchical visualisation and the proposed random browsing solutions assist the process of accessing and retrieving desired content
    • …
    corecore