455 research outputs found

    Searching Spontaneous Conversational Speech: Proceedings of ACM SIGIR Workshop (SSCS2008)

    Indexation et navigation dans les contenus visuels : approches basées sur les graphes

    This thesis deals with the indexing and visualisation of video documents and collections of images. The proposed methods use graphs to represent similarity relationships between indexed video shots and images. The first part of the thesis deals with the indexing of video documents into scenes. A scene is a set of video shots that share common features. We first propose an interactive method to group shots with similar color content using graph clustering. We then present a technique to index video documents into dialogue scenes based on semantic and structural features found in the sequence of video shots. The second part of the thesis deals with visualisation and search in collections of indexed images. We present an algorithm for embedding a metric space in the plane, applied to collections of indexed images. The aim of this technique is to visualise the dissimilarity relationships between images and to identify clusters of similar images. Finally, we present a user interface for searching images, inspired by greedy routing in networks and based on local routing in a graph. Results from experimental validation are presented and discussed.

    Videoscapes: Exploring Unstructured Video Collections

    Doctor of Philosophy

    Serving as a record of what happened during a scientific process, often computational, provenance has become an important piece of computing. The importance of archiving not only data and results but also the lineage of these entities has led to a variety of systems that capture provenance as well as models and schemas for this information. Despite significant work focused on obtaining and modeling provenance, there has been little work on managing and using this information. Using the provenance from past work, it is possible to mine common computational structure or determine differences between executions. Such information can be used to suggest possible completions for partial workflows, summarize a set of approaches, or extend past work in new directions. These applications require infrastructure to support efficient queries and accessible reuse. In order to support knowledge discovery and reuse from provenance information, the management of those data is important. One component of provenance is the specification of the computations; workflows provide structured abstractions of code and are commonly used for complex tasks. Using change-based provenance, it is possible to store large numbers of similar workflows compactly. This storage also allows efficient computation of differences between specifications. However, querying for specific structure across a large collection of workflows is difficult because comparing graphs depends on computing subgraph isomorphism, which is NP-complete. Graph indexing methods identify features that help distinguish the graphs of a collection, so that results for a subgraph containment query can be filtered and the number of subgraph isomorphism computations reduced. For provenance, this work extends these methods to work for more exploratory queries and collections with significant overlap.
However, comparing workflow or provenance graphs may not require exact equality; a match between two graphs may allow paired nodes to be similar yet not equivalent. This work presents techniques to better correlate graphs to help summarize collections. Using this infrastructure, provenance can be reused so that users can learn from their own and others' history. Just as textual search has been augmented with suggested completions based on past or common queries, provenance can be used to suggest how computations can be completed or which steps might connect to a given subworkflow. In addition, provenance can help further science by accelerating publication and reuse. By incorporating provenance into publications, authors can more easily integrate their results, and readers can more easily verify and repeat results. However, reusing past computations requires maintaining stronger associations with any input data and underlying code as well as providing paths for migrating old work to new hardware or algorithms. This work presents a framework for maintaining data and code as well as supporting upgrades for workflow computations.

    Hytexpros : a hypermedia information retrieval system

    A hypermedia information retrieval system combines the specific capabilities of hypermedia systems with information retrieval operations and provides a new kind of information management tool. It combines both hypermedia and information retrieval to offer end-users the possibility of navigating, browsing and searching a large collection of documents to satisfy an information need. TEXPROS is an intelligent document processing and retrieval system that supports storing, extracting, classifying, categorizing, retrieving and browsing enterprise information. TEXPROS is a well-suited application for hypermedia information retrieval techniques. In this dissertation, we extend TEXPROS to a hypermedia information retrieval system called HyTEXPROS with hypertext functionalities, such as nodes, typed and weighted links, anchors, guided tours, network overview, bookmarks, annotations and comments, and an external linkbase. HyTEXPROS describes the whole information base, including the metadata and the original documents, as network nodes connected by links. Through hypertext functionalities, a user can dynamically construct an information path by browsing through pieces of the information base. Adding these functionalities changes the working domain from a personal document processing domain to a personal library domain, accompanied by citation techniques to process original documents. A four-level conceptual architecture is presented as the system architecture of HyTEXPROS. This architecture is also referred to as the reference model of HyTEXPROS. A detailed description of HyTEXPROS, using first-order logic, is also proposed. An early version of a prototype is briefly described.

    Management and Visualisation of Non-linear History of Polygonal 3D Models

    The research presented in this thesis concerns the problems of maintenance and revision control of large-scale three dimensional (3D) models over the Internet. As the models grow in size and the authoring tools grow in complexity, standard approaches to collaborative asset development become impractical. The prevalent paradigm of sharing files on a file system poses serious risks with regard to, among other things, ensuring consistency and concurrency of multi-user 3D editing. Although modifications might be tracked manually using naming conventions or automatically in a version control system (VCS), understanding the provenance of a large 3D dataset is hard because revision metadata is not associated with the underlying scene structures. Some tools and protocols enable seamless synchronisation of file and directory changes in remote locations. However, the existing web-based technologies do not yet fully exploit the modern design patterns for access to and management of alternative shared resources online. Therefore, four distinct but highly interconnected conceptual tools are explored. The first is the organisation of 3D assets within recent document-oriented No Structured Query Language (NoSQL) databases. These "schemaless" databases, unlike their relational counterparts, do not represent data in rigid table structures. Instead, they rely on polymorphic documents composed of key-value pairs that are much better suited to the diverse nature of 3D assets. Hence, a domain-specific non-linear revision control system, 3D Repo, is built around a NoSQL database to enable asynchronous editing similar to traditional VCSs. The second concept is that of visual 3D differencing and merging. The accompanying 3D Diff tool supports interactive conflict resolution at the level of scene graph nodes that are de facto the delta changes stored in the repository. The third is the utilisation of HyperText Transfer Protocol (HTTP) for the purposes of 3D data management.
The XML3DRepo daemon application exposes the contents of the repository and the version control logic in a Representational State Transfer (REST) style of architecture. At the same time, it demonstrates the effects of various 3D encoding strategies on file sizes and download times in modern web browsers. The fourth and final concept is the reverse-engineering of an editing history. Even if the models are being version controlled, the extracted provenance is limited to additions, deletions and modifications. The 3D Timeline tool, therefore, infers a plausible history of common modelling operations such as duplications, transformations, etc. Given a collection of 3D models, it estimates a part-based correspondence and visualises it in a temporal flow. The prototype tools developed as part of the research were evaluated in pilot user studies that suggest they are usable by the end users and well suited to their respective tasks. Together, the results constitute a novel framework that demonstrates the feasibility of a domain-specific 3D version control.

    Navigation, findability and the usage of cultural heritage on the web: an exploratory study

    The present thesis investigates the usage of cultural heritage resources on the web. In recent years cultural heritage objects have been digitalized and made available on the web for the general public to use. The thesis addresses to what extent the digitalized material is used, and how findable it is on the web. On the web, resources need to be findable in order to be visited and used. The study is done at the intersection of several research areas in Library and Information Science: Information Seeking/Human Information Behaviour, Interactive Information Retrieval, and Webometrics. The two thesis research questions focus on different aspects of the study: (1) findability on the web; and (2) the usage and the users. The usage of the cultural heritage is analysed with Savolainen’s Everyday Life Information Seeking (ELIS) framework. The IS&R framework by Ingwersen and Järvelin is the main theoretical foundation, and a conceptual framework is developed so that the examined aspects can be related to each other more clearly. An important distinction in the framework is between object and resource. An object is a single document, file or HTML page, whereas a resource is a collection of objects, e.g. a cultural heritage web site. Three webometric levels are used to both combine and distinguish the data types: usage, content, and structure. The interaction between the system and its users’ information search process is divided into query dependent and query independent aspects. The query dependent aspects contain the information need on the user side and the topic of the content on the system side. The query independent aspects are the structural findability on the system side and the users’ search skills on the user side. The conceptual framework is summarised in the User-Resource Interaction (URI) model.
The research design is a methodological triangulation, in the form of a mixed methods approach, in order to obtain measures and indicators of the resources and the usage from different angles. Four methods are used: site structure analysis; log analysis; web survey; and findability analysis. The research design is both sequential and parallel: the site structure analysis preceded the log analysis and the findability analysis, and the web survey was employed independently of the other methods. Three Danish resources are studied: Arkiv for Dansk Litteratur (ADL), a collection of literary texts written by authors; Kunst Index Danmark (KID), an index of the holdings in the Danish art museums; and Guaman Poma Inca Chronicle (Poma), a digitalized manuscript on the UNESCO list of World cultural heritage. The studied log covers all usage during the period October to December 2010. The site structure is analysed so that the resources can be described as different levels, based on function and content. The results from the site structure analysis are used both in the log analysis and the findability analysis, as well as a way to describe the resources. In the log analysis, navigation strategies and navigation patterns are studied. Navigation through a web search engine is the most common way to reach the resources, but both direct navigation and link navigation are also used in all three resources. Most users arrive at the middle level in ADL and KID, at information on authors and artists. On average, cultural heritage objects are viewed in half of the sessions. In the analysis of the web survey answers, two groups of users are distinguished: the professional user in a work context and users in a hobby or leisure context. School or study as a context is prominent in Guaman Poma, the Inca Chronicle. Generally, pages about the cultural heritage are more frequently visited than the digitized cultural heritage objects.
In the findability framework, six aspects are identified as central for the findability of an object on the web: attributes of the object, accessibility, internal navigation, internal search, reachability and web prestige. The six aspects are evaluated through seven indicators. All studied objects are findable in the analysis using the findability framework. A findability issue in KID is the use of the secure HTTPS protocol instead of HTTP, which leads to the objects in KID having no PageRank value in Google and thereby a lower ranking in comparison to similar objects with a PageRank value. The internal findability is reduced for the objects at the top of all three resources, e.g. the first page, due to the focus of the internal search engine on the cultural heritage objects. Several possible adjustments or developments of the findability framework are discussed, such as changing the weighting between the aspects measured, alternative scores and automated measuring. In conclusion, the investigation adds to our knowledge about how resources with digitalized cultural heritage are accessed and used, as well as how findable they are. The thesis provides both theoretical and conceptual contributions to research. The IS&R framework has been adapted to the web, the information search process was split into query dependent and query independent aspects, and a whole findability framework has been developed. Both the empirical findings and the theoretical advancements support the development of better access to web resources.
