10,003 research outputs found

    Image Semantics in the Description and Categorization of Journalistic Photographs

    Get PDF
    This paper reports a study on the description and categorization of images. The aim of the study was to evaluate existing indexing frameworks in the context of reportage photographs and to find out how the use of this particular image genre influences the results. The effect of different tasks on image description and categorization was also studied. Subjects performed keywording and free description tasks and the elicited terms were classified using the most extensive one of the reviewed frameworks. Differences were found in the terms used in constrained and unconstrained descriptions. Summarizing terms such as abstract concepts, themes, settings and emotions were used more frequently in keywording than in free description. Free descriptions included more terms referring to locations within the images, people and descriptive terms due to the narrative form the subjects used without prompting. The evaluated framework was found to lack some syntactic and semantic classes present in the data and modifications were suggested. According to the results of this study image categorization is based on high-level interpretive concepts, including affective and abstract themes. The results indicate that image genre influences categorization and keywording modifies and truncates natural image description

    Log-Euclidean Bag of Words for Human Action Recognition

    Full text link
    Representing videos by densely extracted local space-time features has recently become a popular approach for analysing actions. In this paper, we tackle the problem of categorising human actions by devising Bag of Words (BoW) models based on covariance matrices of spatio-temporal features, with the features formed from histograms of optical flow. Since covariance matrices form a special type of Riemannian manifold, the space of Symmetric Positive Definite (SPD) matrices, non-Euclidean geometry should be taken into account while discriminating between covariance matrices. To this end, we propose to embed SPD manifolds to Euclidean spaces via a diffeomorphism and extend the BoW approach to its Riemannian version. The proposed BoW approach takes into account the manifold geometry of SPD matrices during the generation of the codebook and histograms. Experiments on challenging human action datasets show that the proposed method obtains notable improvements in discrimination accuracy, in comparison to several state-of-the-art methods

    Death of mixed methods?: Or the rebirth of research as a craft

    Get PDF
    The classification by many scholars of numerical research processes as quantitative and other research techniques as qualitative has prompted the construction of a third category, that of ā€˜mixed methodsā€™, to describe studies that use elements from both processes. Such labels might be helpful in structuring our understanding of phenomena. But they can also inhibit our activities when they serve as inaccurate or limiting descriptors. Based on the observation that mixed methods is fast becoming a common research approach in the social sciences, this paper questions whether the assumptions that are used and perpetuated by mixed methods are valid. The paper calls for a critical change in how we perceive research, in order to better describe actual research processes. A more ethological taxonomy of the mechanisms underlying research structures and processes is posited to encourage creative thinking around alternatives to the three purported paradigms of quantitative, qualitative and mixed methods. This ā€˜return to basicsā€™ seeks to encourage new and innovative research designs to emerge, and suggests a rebirth of research from the ashes of mixed methods

    Multi modal multi-semantic image retrieval

    Get PDF
    PhDThe rapid growth in the volume of visual information, e.g. image, and video can overwhelm usersā€™ ability to find and access the specific visual information of interest to them. In recent years, ontology knowledge-based (KB) image information retrieval techniques have been adopted into in order to attempt to extract knowledge from these images, enhancing the retrieval performance. A KB framework is presented to promote semi-automatic annotation and semantic image retrieval using multimodal cues (visual features and text captions). In addition, a hierarchical structure for the KB allows metadata to be shared that supports multi-semantics (polysemy) for concepts. The framework builds up an effective knowledge base pertaining to a domain specific image collection, e.g. sports, and is able to disambiguate and assign high level semantics to ā€˜unannotatedā€™ images. Local feature analysis of visual content, namely using Scale Invariant Feature Transform (SIFT) descriptors, have been deployed in the ā€˜Bag of Visual Wordsā€™ model (BVW) as an effective method to represent visual content information and to enhance its classification and retrieval. Local features are more useful than global features, e.g. colour, shape or texture, as they are invariant to image scale, orientation and camera angle. An innovative approach is proposed for the representation, annotation and retrieval of visual content using a hybrid technique based upon the use of an unstructured visual word and upon a (structured) hierarchical ontology KB model. The structural model facilitates the disambiguation of unstructured visual words and a more effective classification of visual content, compared to a vector space model, through exploiting local conceptual structures and their relationships. The key contributions of this framework in using local features for image representation include: first, a method to generate visual words using the semantic local adaptive clustering (SLAC) algorithm which takes term weight and spatial locations of keypoints into account. Consequently, the semantic information is preserved. Second a technique is used to detect the domain specific ā€˜non-informative visual wordsā€™ which are ineffective at representing the content of visual data and degrade its categorisation ability. Third, a method to combine an ontology model with xi a visual word model to resolve synonym (visual heterogeneity) and polysemy problems, is proposed. The experimental results show that this approach can discover semantically meaningful visual content descriptions and recognise specific events, e.g., sports events, depicted in images efficiently. Since discovering the semantics of an image is an extremely challenging problem, one promising approach to enhance visual content interpretation is to use any associated textual information that accompanies an image, as a cue to predict the meaning of an image, by transforming this textual information into a structured annotation for an image e.g. using XML, RDF, OWL or MPEG-7. Although, text and image are distinct types of information representation and modality, there are some strong, invariant, implicit, connections between images and any accompanying text information. Semantic analysis of image captions can be used by image retrieval systems to retrieve selected images more precisely. To do this, a Natural Language Processing (NLP) is exploited firstly in order to extract concepts from image captions. Next, an ontology-based knowledge model is deployed in order to resolve natural language ambiguities. To deal with the accompanying text information, two methods to extract knowledge from textual information have been proposed. First, metadata can be extracted automatically from text captions and restructured with respect to a semantic model. Second, the use of LSI in relation to a domain-specific ontology-based knowledge model enables the combined framework to tolerate ambiguities and variations (incompleteness) of metadata. The use of the ontology-based knowledge model allows the system to find indirectly relevant concepts in image captions and thus leverage these to represent the semantics of images at a higher level. Experimental results show that the proposed framework significantly enhances image retrieval and leads to narrowing of the semantic gap between lower level machinederived and higher level human-understandable conceptualisation

    Inexpensive fusion methods for enhancing feature detection

    Get PDF
    Recent successful approaches to high-level feature detection in image and video data have treated the problem as a pattern classification task. These typically leverage the techniques learned from statistical machine learning, coupled with ensemble architectures that create multiple feature detection models. Once created, co-occurrence between learned features can be captured to further boost performance. At multiple stages throughout these frameworks, various pieces of evidence can be fused together in order to boost performance. These approaches whilst very successful are computationally expensive, and depending on the task, require the use of significant computational resources. In this paper we propose two fusion methods that aim to combine the output of an initial basic statistical machine learning approach with a lower-quality information source, in order to gain diversity in the classified results whilst requiring only modest computing resources. Our approaches, validated experimentally on TRECVid data, are designed to be complementary to existing frameworks and can be regarded as possible replacements for the more computationally expensive combination strategies used elsewhere

    Journalistic image access : description, categorization and searching

    Get PDF
    The quantity of digital imagery continues to grow, creating a pressing need to develop efficient methods for organizing and retrieving images. Knowledge on user behavior in image description and search is required for creating effective and satisfying searching experiences. The nature of visual information and journalistic images creates challenges in representing and matching images with user needs. The goal of this dissertation was to understand the processes in journalistic image access (description, categorization, and searching), and the effects of contextual factors on preferred access points. These were studied using multiple data collection and analysis methods across several studies. Image attributes used to describe journalistic imagery were analyzed based on description tasks and compared to a typology developed through a meta-analysis of literature on image attributes. Journalistic image search processes and query types were analyzed through a field study and multimodal image retrieval experiment. Image categorization was studied via sorting experiments leading to a categorization model. Advances to research methods concerning search tasks and categorization procedures were implemented. Contextual effects on image access were found related to organizational contexts, work, and search tasks, as well as publication context. Image retrieval in a journalistic work context was contextual at the level of image needs and search process. While text queries, together with browsing, remained the key access mode to journalistic imagery, participants also used visual access modes in the experiment, constructing multimodal queries. Assigned search task type and searcher expertise had an effect on query modes utilized. Journalistic images were mostly described and queried for on the semantic level but also syntactic attributes were used. Constraining the description led to more abstract descriptions. Image similarity was evaluated mainly based on generic semantics. However, functionally oriented categories were also constructed, especially by domain experts. Availability of page context promoted thematic rather than object-based categorization. The findings increase our understanding of user behavior in image description, categorization, and searching, as well as have implications for future solutions in journalistic image access. The contexts of image production, use, and search merit more interest in research as these could be leveraged for supporting annotation and retrieval. Multiple access points should be created for journalistic images based on image content and function. Support for multimodal query formulation should also be offered. The contributions of this dissertation may be used to create evaluation criteria for journalistic image access systems

    The future of corporate reporting: a review article

    Get PDF
    Significant changes in the corporate external reporting environment have led to proposals for fundamental changes in corporate reporting practices. Recent influential reports by major organisations have suggested that a variety of new information types be reported, in particular forward-looking, non-financial and soft information. This paper presents a review and synthesis of these reports and provides a framework for classifying and describing suggested information types. The existence of academic antecedents for certain current proposals are identified and the ambiguous relationship between research and practice is explored. The implications for future academic research are discussed and a research agenda is introduced

    Modeling and Analysis of Scholar Mobility on Scientific Landscape

    Full text link
    Scientific literature till date can be thought of as a partially revealed landscape, where scholars continue to unveil hidden knowledge by exploring novel research topics. How do scholars explore the scientific landscape , i.e., choose research topics to work on? We propose an agent-based model of topic mobility behavior where scholars migrate across research topics on the space of science following different strategies, seeking different utilities. We use this model to study whether strategies widely used in current scientific community can provide a balance between individual scientific success and the efficiency and diversity of the whole academic society. Through extensive simulations, we provide insights into the roles of different strategies, such as choosing topics according to research potential or the popularity. Our model provides a conceptual framework and a computational approach to analyze scholars' behavior and its impact on scientific production. We also discuss how such an agent-based modeling approach can be integrated with big real-world scholarly data.Comment: To appear in BigScholar, WWW 201

    Video summarisation: A conceptual framework and survey of the state of the art

    Get PDF
    This is the post-print (final draft post-refereeing) version of the article. Copyright @ 2007 Elsevier Inc.Video summaries provide condensed and succinct representations of the content of a video stream through a combination of still images, video segments, graphical representations and textual descriptors. This paper presents a conceptual framework for video summarisation derived from the research literature and used as a means for surveying the research literature. The framework distinguishes between video summarisation techniques (the methods used to process content from a source video stream to achieve a summarisation of that stream) and video summaries (outputs of video summarisation techniques). Video summarisation techniques are considered within three broad categories: internal (analyse information sourced directly from the video stream), external (analyse information not sourced directly from the video stream) and hybrid (analyse a combination of internal and external information). Video summaries are considered as a function of the type of content they are derived from (object, event, perception or feature based) and the functionality offered to the user for their consumption (interactive or static, personalised or generic). It is argued that video summarisation would benefit from greater incorporation of external information, particularly user based information that is unobtrusively sourced, in order to overcome longstanding challenges such as the semantic gap and providing video summaries that have greater relevance to individual users
    • ā€¦
    corecore