
    Content-based retrieval of hierarchical objects with PicSOM

    The amount of multimedia content available to the public has been increasing rapidly in recent decades and is expected to grow exponentially in the years to come. This development puts an increasing emphasis on automated content-based information retrieval (CBIR) methods, which index and retrieve multimedia based on its contents. Such methods can automatically process huge amounts of data without the human intervention required by traditional methods (e.g. manual categorisation, entering of keywords). Unfortunately, CBIR methods have a serious problem: the so-called semantic gap between the low-level descriptions used by computer systems and the high-level concepts of humans. However, by emulating human skills such as understanding the contexts and relationships of multimedia objects, one might be able to bridge the semantic gap. To this end, this thesis proposes a method of using hierarchical objects combined with relevance sharing. The proposed method can incorporate natural relationships between multimedia objects and take advantage of these in the retrieval process, hopefully improving the retrieval accuracy considerably. The literature survey part of the thesis reviews content-based information retrieval in general and examines multimodal fusion in CBIR systems and how it has been implemented previously in different scenarios. The work performed for this thesis includes the implementation of hierarchical objects and multimodal relevance sharing in the PicSOM CBIR system. Extensive experiments with different kinds of multimedia and other hierarchical objects (segmented images, web-link structures and video retrieval) were also performed to evaluate the usefulness of the hierarchical objects paradigm. Keywords: content-based retrieval, self-organizing map, multimedia database

    Hierarchical Attention Network for Visually-aware Food Recommendation

    Food recommender systems play an important role in helping users identify the food they want to eat. Deciding what to eat is a complex and multi-faceted process, influenced by many factors such as the ingredients, the appearance of the recipe, the user's personal food preferences, and contexts such as what was eaten in past meals. In this work, we formulate the food recommendation problem as predicting user preference for recipes based on three key factors that determine a user's choice of food, namely, 1) the user's (and other users') history; 2) the ingredients of a recipe; and 3) the descriptive image of a recipe. To address this challenging problem, we develop a dedicated neural-network-based solution, Hierarchical Attention based Food Recommendation (HAFR), which is capable of: 1) capturing the collaborative filtering effect, such as what similar users tend to eat; 2) inferring a user's preference at the ingredient level; and 3) learning user preference from the recipe's visual images. To evaluate our proposed method, we construct a large-scale dataset consisting of millions of ratings from AllRecipes.com. Extensive experiments show that our method outperforms several competing recommender solutions, such as Factorization Machine and Visual Bayesian Personalized Ranking, with an average improvement of 12%, offering promising results in predicting user preference for food. Code and dataset will be released upon acceptance.

    Measuring usability for application software using the quality in use integration measurement model

    User interfaces of application software are designed to make user interaction as efficient and as simple as possible. The market accessibility of any application software is determined by the usability of its user interfaces. A poorly designed user interface will have little value no matter how powerful the program is. It is therefore important to measure usability during the system development lifecycle in order to avoid user disappointment. Various methods and standards that help measure usability have been developed. However, these methods define usability inconsistently, which makes software engineers hesitant to implement them. The Quality in Use Integrated Measurement (QUIM) model is a consolidated approach for measuring usability through 10 factors, 26 criteria, and 127 metrics. It is a hierarchical model that decomposes usability into factors, criteria, and metrics, and it helps developers with little or no background in usability metrics. Among the 127 metrics of QUIM, essential efficiency (EE) is the most specific metric used to measure the usability of user interfaces through an equation. This study presents a comparative analysis of three case studies that use the QUIM model to measure usability in terms of EE: (1) Public University Registration System, (2) Restaurant Menu Ordering System, and (3) ATM system. The comparison is based on the percentage of EE for each element of the use cases in each use case diagram. The results revealed that the user interface design for the Restaurant Menu Ordering System scored the highest percentage of EE, making it the most user-friendly application software among its counterparts by this measure.
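
The abstract above does not state the EE equation. A minimal sketch, assuming the form commonly attributed to Constantine and Lockwood (EE = 100 × essential steps / enacted steps), might look as follows. The function name and the step counts in the usage note are hypothetical and are not taken from the study.

```python
def essential_efficiency(essential_steps: int, enacted_steps: int) -> float:
    """Return essential efficiency (EE) as a percentage.

    Assumed definition: the number of user steps in the essential use case
    divided by the number of steps the user actually performs with the
    interface design, times 100.
    """
    if essential_steps < 0:
        raise ValueError("essential_steps must not be negative")
    if enacted_steps <= 0:
        raise ValueError("enacted_steps must be positive")
    return 100.0 * essential_steps / enacted_steps
```

With hypothetical counts, a use case whose essential narrative has 4 user steps but which takes 8 enacted steps in the interface would score `essential_efficiency(4, 8)`, that is, 50 percent. A higher percentage means the design stays closer to the essential use case.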

    A Web video retrieval method using hierarchical structure of Web video groups

    In this paper, we propose a Web video retrieval method that uses the hierarchical structure of Web video groups. Existing retrieval systems require users to input suitable queries that identify the desired contents in order to accurately retrieve Web videos; the proposed method, however, enables retrieval of the desired Web videos even if users cannot input suitable queries. Specifically, we first select representative Web videos from a target video dataset by using link relationships between Web videos, obtained via the “related videos” metadata, and heterogeneous video features. Using the representative Web videos, we then construct a network whose nodes and edges correspond respectively to Web videos and to links between these Web videos. Web video groups, i.e., sets of Web videos with similar topics, are then hierarchically extracted based on strongly connected components, edge betweenness and modularity. By presenting the obtained hierarchical structure of Web video groups, the method lets users easily grasp an overview of many Web videos. Consequently, even if users cannot write suitable queries that identify the desired contents, it becomes feasible to accurately retrieve the desired Web videos by selecting Web video groups according to the hierarchical structure. Experimental results on actual Web videos verify the effectiveness of our method.
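
As a rough illustration of the first grouping step named in the abstract above, the sketch below extracts strongly connected components from a directed "related videos" link graph using Kosaraju's algorithm. It is not the authors' implementation: the edge betweenness and modularity stages are omitted, and the function name and video identifiers are hypothetical.

```python
from collections import defaultdict


def strongly_connected_components(edges):
    """Group the nodes of a directed graph into strongly connected components."""
    graph, reverse = defaultdict(list), defaultdict(list)
    nodes = set()
    for src, dst in edges:
        graph[src].append(dst)
        reverse[dst].append(src)
        nodes.update((src, dst))

    # Pass 1: iterative DFS on the graph, recording nodes by finishing time.
    visited, order = set(), []
    for start in sorted(nodes):
        if start in visited:
            continue
        visited.add(start)
        stack = [(start, iter(graph[start]))]
        while stack:
            node, neighbours = stack[-1]
            for nxt in neighbours:
                if nxt not in visited:
                    visited.add(nxt)
                    stack.append((nxt, iter(graph[nxt])))
                    break
            else:
                # All neighbours explored: the node is finished.
                order.append(node)
                stack.pop()

    # Pass 2: DFS on the reversed graph in decreasing finishing time.
    assigned, components = set(), []
    for start in reversed(order):
        if start in assigned:
            continue
        assigned.add(start)
        component, stack = [], [start]
        while stack:
            node = stack.pop()
            component.append(node)
            for prev in reverse[node]:
                if prev not in assigned:
                    assigned.add(prev)
                    stack.append(prev)
        components.append(sorted(component))
    return sorted(components)


# Hypothetical "related videos" links: (video, related video).
links = [("v1", "v2"), ("v2", "v1"), ("v2", "v3"), ("v3", "v4"), ("v4", "v3")]
groups = strongly_connected_components(links)
```

Here videos that link to each other, directly or through a cycle, end up in the same candidate group. A fuller pipeline would then split or merge such groups, for instance by removing high-betweenness edges and keeping the partition with the best modularity.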

    Context-aware person identification in personal photo collections

    Identifying the people in photos is an important need for users of photo management systems. We present MediAssist, one such system, which facilitates browsing, searching and semi-automatic annotation of personal photos, using analysis of both image content and the context in which the photo is captured. This semi-automatic annotation includes annotation of the identity of people in photos. In this paper, we focus on such person annotation and propose person identification techniques based on a combination of context and content. We propose language modelling and nearest neighbor approaches to context-based person identification, in addition to novel face color and image color content-based features (used alongside face recognition and body patch features). We conduct a comprehensive empirical study of these techniques using the real private photo collections of a number of users, and show that combining context- and content-based analysis improves performance over content or context alone.

    Video browsing interfaces and applications: a review

    We present a comprehensive review of the state of the art in video browsing and retrieval systems, with special emphasis on interfaces and applications. There has been a significant increase in activity (e.g., storage, retrieval, and sharing) employing video data in the past decade, both for personal and professional use. The ever-growing amount of video content available for human consumption and the inherent characteristics of video data (which, if presented in its raw format, is rather unwieldy and costly) have become driving forces for the development of more effective solutions to present video contents and allow rich user interaction. As a result, there are many contemporary research efforts toward developing better video browsing solutions, which we summarize. We review more than 40 different video browsing and retrieval interfaces and classify them into three groups: applications that use video-player-like interaction, video retrieval applications, and browsing solutions based on video surrogates. For each category, we present a summary of existing work, highlight the technical aspects of each solution, and compare the solutions against each other.

    An empirical study of inter-concept similarities in multimedia ontologies

    Generic concept detection has been a widely studied topic in recent research on multimedia analysis and retrieval, but the issue of how to exploit the structure of a multimedia ontology, as well as different inter-concept relations, has not received similar attention. In this paper, we present results from our empirical analysis of different types of similarity among semantic concepts in two multimedia ontologies, LSCOM-Lite and CDVP-206. The results show promise that the proposed methods may be helpful in providing insight into the existing inter-concept relations within an ontology and in selecting the most facilitating set of concepts and hierarchical relations. Such an analysis can be used in various tasks, such as building more reliable concept detectors and designing large-scale ontologies.