
    Capturing Synchronous Collaborative Design Activities: A State-Of-The-Art Technology Review


    Indexing, browsing and searching of digital video

    Video is a communications medium that normally brings together moving pictures and a synchronised audio track into a discrete piece or pieces of information. The size of a “piece” of video can variously be referred to as a frame, a shot, a scene, a clip, a programme or an episode, and these are distinguished by their lengths and by their composition. We shall return to the definition of each of these in section 4 of this chapter. In modern society, video is ver…

    Socio-technical lifelogging: deriving design principles for a future-proof digital past

    Lifelogging is a technically inspired approach that attempts to address the problem of human forgetting by developing systems that ‘record everything’. Uptake of lifelogging systems has generally been disappointing, however. One reason for this lack of uptake is the absence of design principles for developing digital systems to support memory. Synthesising multiple studies, we identify and evaluate four new, empirically motivated design principles for lifelogging: Selectivity, Embodiment, Synergy and Reminiscence. We first summarise the four empirical studies that motivate the principles, then describe the evaluation of four novel systems built to embody them. The design principles were generative, leading to the development of new classes of lifelogging system, as well as providing strategic guidance about how those systems should be built. Evaluations suggest support for the Selectivity and Embodiment principles, but more conceptual and technical work is needed to refine the Synergy and Reminiscence principles.

    Scoping study of the feasibility of developing a software tool to assist designers of pedestrian crossing places

    This report is the outcome of a scoping study of how guidance can be provided to practising highway engineers in designing informal pedestrian crossing facilities. The main component of the report is an analysis, carried out by an IT consultant, of a range of mechanisms for delivering such guidance. The study was informed by the opinions of a group of practitioners with a direct interest in the provision of pedestrian facilities. These results are placed in context and their consequences are explored in the first part of the report.

    Utilization of multimodal interaction signals for automatic summarisation of academic presentations

    Multimedia archives are expanding rapidly, yet there is a shortage of retrieval and summarisation techniques for accessing and browsing content in which the main information lies in the audio stream. This thesis describes an investigation into the development of novel feature extraction and summarisation techniques for audio-visual recordings of academic presentations. We report on the development of a multimodal dataset of academic presentations, labelled by human annotators with presentation ratings, audience engagement levels, speaker emphasis, and audience comprehension. We investigate the automatic classification of speaker ratings and audience engagement by extracting audio-visual features from video of the presenter and audience and training classifiers to predict speaker ratings and engagement levels. Following this, we investigate the automatic identification of areas of emphasised speech: by analysing all human-annotated areas of emphasised speech, minimum speech pitch and gesticulation are identified as indicators of emphasis when they occur together. We then investigate how well the speaker can be comprehended by the audience. Following crowdsourced annotation of comprehension levels during academic presentations, a set of audio-visual features considered most likely to affect comprehension is extracted. Classifiers trained on these features predict comprehension levels to an accuracy of 49% on a 7-class scale and 85% on a binary scale. Presentation summaries are built by segmenting speech transcripts into phrases and ranking the segments using keywords extracted from the transcripts in conjunction with extracted paralinguistic features; the highest-ranking segments are then extracted to build the summary. Summaries are evaluated through eye-tracking experiments in which participants watch presentation videos. Participants were found to be consistently more engaged with presentation summaries than with full presentations, and summaries were found to contain a higher concentration of new information than full presentations.
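    As a rough illustration of the summary-building step described above, the following is a minimal Python sketch of ranking phrase-level transcript segments by a weighted combination of keyword and paralinguistic scores; the feature names, weights and time budget are illustrative assumptions, not the thesis's actual formulation.

```python
# A minimal sketch (assumed formulation): score phrase-level segments
# by combining lexical keyword evidence with paralinguistic emphasis
# cues, then greedily fill a time budget and replay in original order.

from dataclasses import dataclass

@dataclass
class Segment:
    text: str                  # phrase-level transcript segment
    start: float               # start time (seconds)
    duration: float            # segment length (seconds)
    keyword_score: float       # e.g. summed TF-IDF weight of keywords present
    pitch_emphasis: float      # normalised pitch deviation in [0, 1]
    intensity_emphasis: float  # normalised loudness deviation in [0, 1]

def rank_segments(segments, w_kw=0.5, w_pitch=0.3, w_int=0.2):
    """Weighted sum of lexical and paralinguistic evidence
    (the weights are illustrative assumptions)."""
    def score(s):
        return (w_kw * s.keyword_score
                + w_pitch * s.pitch_emphasis
                + w_int * s.intensity_emphasis)
    return sorted(segments, key=score, reverse=True)

def build_summary(segments, budget_seconds=120.0):
    """Greedily keep the highest-ranking segments until the time
    budget is spent, then sort back into presentation order."""
    chosen, used = [], 0.0
    for seg in rank_segments(segments):
        if used + seg.duration <= budget_seconds:
            chosen.append(seg)
            used += seg.duration
    return sorted(chosen, key=lambda s: s.start)
```

    A greedy, budgeted selection is one simple way to assemble an extractive summary; the thesis's actual segmentation and ranking procedure may differ in detail.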

    The Non-Verbal Structure of Patient Case Discussions in Multidisciplinary Medical Team Meetings

    Meeting analysis has a long theoretical tradition in social psychology, with established practical ramifications in computer science, especially in computer supported cooperative work. More recently, a good deal of research has focused on the issues of indexing and browsing multimedia records of meetings. Most research in this area, however, is still based on data collected in laboratories, under somewhat artificial conditions. This paper presents an analysis of the discourse structure and spontaneous interactions at real-life multidisciplinary medical team meetings held as part of the work routine in a major hospital. It is hypothesised that the conversational structure of these meetings, as indicated by the sequencing and duration of vocalisations, enables segmentation into individual patient case discussions. The task of segmenting audio-visual records of multidisciplinary medical team meetings is described as a topic segmentation task, and a method for automatic segmentation is proposed. An empirical evaluation based on hand-labelled data is presented which determines the optimal length of vocalisation sequences for segmentation, and establishes the competitiveness of the method with approaches based on more complex knowledge sources. The effectiveness of Bayesian classification as a segmentation method, and its applicability to meeting segmentation in other domains, are discussed.
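    To make the proposed approach concrete, here is a minimal sketch, in Python with scikit-learn, of casting boundary detection between patient case discussions as Bayesian classification over vocalisation statistics. The sliding-window features (durations of, and pauses between, the previous n vocalisations) are assumptions standing in for the paper's vocalisation sequences.

```python
# A minimal sketch (assumed features): treat each candidate boundary
# between patient case discussions as a binary Bayesian classification
# over the durations of, and pauses between, the previous n vocalisations.

import numpy as np
from sklearn.naive_bayes import GaussianNB

def window_features(vocalisations, i, n=5):
    """Features for a candidate boundary before vocalisation i:
    durations of the previous n vocalisations plus the gaps between
    them, zero-padded when fewer than n are available."""
    prev = vocalisations[max(0, i - n):i]
    durs = [v["end"] - v["start"] for v in prev]
    gaps = [b["start"] - a["end"] for a, b in zip(prev, prev[1:])]
    durs = [0.0] * (n - len(durs)) + durs
    gaps = [0.0] * (n - 1 - len(gaps)) + gaps
    return durs + gaps

def train_boundary_classifier(vocalisations, boundary_labels, n=5):
    """boundary_labels[i] is 1 when a new patient case discussion
    starts at vocalisation i in the hand-labelled data."""
    X = np.array([window_features(vocalisations, i, n)
                  for i in range(len(vocalisations))])
    y = np.array(boundary_labels)
    return GaussianNB().fit(X, y)
```

    Here the sequence length n plays the role of the vocalisation-sequence length whose optimal value the paper determines empirically.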

    Situation inference and context recognition for intelligent mobile sensing applications

    The usage of smart devices is an integral element of our daily life. With the richness of data streaming from sensors embedded in these smart devices, the applications of ubiquitous computing are limitless for future intelligent systems. Situation inference is a non-trivial problem in ubiquitous computing research because of the challenges of mobile sensing in unrestricted environments. Robust and intelligent situation inference from mobile sensor data offers many advantages: for instance, a deeper understanding of human behaviour in particular situations, which can in turn be used to recommend resources or actions for enhanced cognitive augmentation, such as improved productivity and better decision making.

    In a pervasive sensing environment (e.g., a smart home), sensor data can be streamed continuously from heterogeneous sources at different frequencies, which makes it difficult and time-consuming to build a model capable of recognising multiple activities, especially when activities are performed simultaneously and at different granularities. We investigate the separability of multiple activities in time-series data and develop OPTWIN, a technique that determines the optimal time-window size for the segmentation process, reducing the need for inherently time-consuming sensitivity analysis. OPTWIN leverages multi-objective optimisation, minimising impurity (the number of windows that overlap more than one activity label in the time series) while maximising class separability, as sketched below.
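    The following is a minimal Python sketch of the window-size search idea behind OPTWIN: for each candidate window length, measure label impurity and class separability, then trade the two objectives off. The scalarised trade-off (weight alpha) and the variance-ratio separability measure are simplifying assumptions; the thesis formulates this as a proper multi-objective optimisation.

```python
# A minimal sketch (simplified objective): for each candidate window
# size, compute label impurity (windows spanning more than one activity
# label) and a crude separability score, then pick the best trade-off.

import numpy as np

def impurity(labels, size):
    """Fraction of non-overlapping windows containing more than one
    distinct activity label."""
    windows = [labels[i:i + size]
               for i in range(0, len(labels) - size + 1, size)]
    mixed = sum(1 for w in windows if len(set(w)) > 1)
    return mixed / max(1, len(windows))

def separability(X, labels, size):
    """Between/within-class variance ratio of per-window feature means,
    computed over label-pure windows (an illustrative stand-in for a
    real separability criterion)."""
    feats, ys = [], []
    for i in range(0, len(labels) - size + 1, size):
        w = labels[i:i + size]
        if len(set(w)) == 1:                 # keep pure windows only
            feats.append(X[i:i + size].mean(axis=0))
            ys.append(w[0])
    if not feats:
        return 0.0
    feats, ys = np.array(feats), np.array(ys)
    grand = feats.mean(axis=0)
    between = sum(((feats[ys == c].mean(axis=0) - grand) ** 2).sum()
                  for c in set(ys))
    within = sum(((feats[ys == c] - feats[ys == c].mean(axis=0)) ** 2).sum()
                 for c in set(ys))
    return between / max(within, 1e-9)

def optimal_window(X, labels, candidates=(16, 32, 64, 128), alpha=0.5):
    """Scalarised trade-off between the two objectives; alpha is an
    assumed weighting."""
    return max(candidates,
               key=lambda s: alpha * separability(X, labels, s)
                             - (1 - alpha) * impurity(labels, s))
```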
    The next issue is to effectively model and recognise multiple activities based on the user's context. An intelligent system should therefore address multi-activity and context recognition before the situation inference process in mobile sensing applications. The performance of simultaneous recognition of human activities and contexts depends heavily on the chosen modelling approach. We investigate the associations between these activities and contexts at multiple levels of the mobile sensing perspective to reveal the dependency structure of the multi-context recognition problem, and design a Mobile Context Recognition System incorporating a Context-based Activity Recognition (CBAR) modelling approach, which combines multi-stage and multi-target inference to recognise human activities and their contexts simultaneously. In empirical evaluations on real-world datasets, the CBAR modelling approach significantly improves the overall accuracy of simultaneous inference of transportation mode and human activity for mobile users.

    The accuracy of activity and context recognition is also shaped progressively by how reliable user annotations are. Reliable annotations, usually acquired during in-the-wild data capture, are essential for activity and context recognition, so we investigate how to reduce user burden during mobile sensor data collection through experience sampling of these annotations in the wild. To this end, we design CoAct-nnotate, a technique that improves the sampling of human activities and contexts by providing accurate annotation prediction and facilitating interactive user feedback acquisition for ubiquitous sensing. CoAct-nnotate incorporates a novel multi-view, multi-instance learning mechanism for more accurate annotation prediction, together with a progressive learning process (model retraining based on co-training and active learning) that improves its predictive performance over time.

    Moving beyond context recognition of mobile users, human activities can be related to the essential tasks users perform in daily life. The boundaries between types of task, however, are inherently difficult to establish, since individuals define them differently. We therefore investigate the implications of contextual signals for user tasks in mobile sensing applications, and incorporate this situation inference process (task recognition) into the proposed Intelligent Task Recognition (ITR) framework, which learns users' Cyber-Physical-Social activities from their mobile sensing data. By accurately recognising the tasks a user is engaged in at a given time, an intelligent system can proactively support the user in progressing and completing those tasks.

    Finally, for robust and effective learning from heterogeneous mobile sensing sources (e.g., Internet-of-Things devices in a mobile crowdsensing scenario), we investigate the utility of sensor data in provisioning its storage and design QDaS, an application-agnostic framework for quality-driven data summarisation. QDaS summarises data effectively by performing density-based clustering on multivariate time series from a selected source (i.e., data provider), with source selection determined by a measure of data quality. The framework allows intelligent systems to retain comparable predictive results by learning from compact representations of mobile sensing data while achieving a higher space-saving ratio.

    This thesis contributes novel techniques for mobile situation inference and context recognition, particularly in the domains of ubiquitous computing and intelligent assistive technologies. The research implements and extends machine learning techniques to solve real-world problems in multi-context recognition, mobile data summarisation and situation inference from mobile sensing, and we believe these contributions will help future work towards building more intelligent systems and applications.
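    As a rough sketch of the quality-driven summarisation step in QDaS described above: cluster fixed-size windows of a multivariate time series with a density-based method (DBSCAN here) and keep one representative window per cluster as the compact summary. The window size, clustering parameters and the quality proxy for source selection are illustrative assumptions, not the framework's actual measures.

```python
# A minimal sketch (assumed parameters): summarise a multivariate time
# series by density-based clustering of fixed-size windows, keeping one
# representative window per cluster; a toy quality proxy stands in for
# QDaS's quality-driven source selection.

import numpy as np
from sklearn.cluster import DBSCAN

def summarise(series, window=50, eps=0.5, min_samples=5):
    """series: (T, d) array from one data source. Returns one
    representative (closest-to-centre) window per DBSCAN cluster."""
    wins = np.array([series[i:i + window].ravel()
                     for i in range(0, len(series) - window + 1, window)])
    labels = DBSCAN(eps=eps, min_samples=min_samples).fit_predict(wins)
    reps = []
    for c in set(labels) - {-1}:             # label -1 is DBSCAN noise
        members = wins[labels == c]
        centre = members.mean(axis=0)
        reps.append(members[np.argmin(((members - centre) ** 2).sum(axis=1))])
    return np.array(reps)

def source_quality(series):
    """Toy quality proxy for source selection: penalise missing values,
    reward signal variance (an assumption, not the QDaS measure)."""
    missing = np.isnan(series).mean()
    return float(np.nan_to_num(series).var()) * (1.0 - missing)
```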