
    Measuring interaction proxemics with wearable light tags

    The proxemics of social interactions (e.g., body distance, relative orientation) influences many aspects of our everyday life: from patients’ reactions to interaction with physicians, successes in job interviews, to effective teamwork. Traditionally, interaction proxemics has been studied via questionnaires and participant observations, imposing high burden on users, low scalability and precision, and often biases. In this paper we present Protractor, a novel wearable technology for measuring interaction proxemics as part of non-verbal behavior cues with fine granularity. Protractor employs near-infrared light to monitor both the distance and relative body orientation of interacting users. We leverage the characteristics of near-infrared light (i.e., line-of-sight propagation) to accurately and reliably identify interactions; a pair of collocated photodiodes aid the inference of relative interaction angle and distance. We achieve robustness against temporary blockage of the light channel (e.g., by the user’s hand or clothes) by designing sensor fusion algorithms that exploit inertial sensors to compensate for the absence of light tracking results. We fabricated Protractor tags and conducted real-world experiments. Results show its accuracy in tracking body distances and relative angles. The framework achieves less than 6° error 95% of the time for measuring relative body orientation and 2.3-cm – 4.9-cm mean error in estimating interaction distance. We deployed Protractor tags to track users’ non-verbal behaviors when conducting collaborative group tasks. Results with 64 participants show that distance and angle data from Protractor tags can help assess an individual’s task role with 84.9% accuracy, and identify task timeline with 93.2% accuracy.

    Proceedings of the 17th International Conference on Group Decision and Negotiation


    Computing on evolving social networks

    Over the past decade, participation in social networking services has seen an exponential growth, so that nowadays most individuals are “virtually” connected to others anywhere in the world. Consistently, analysis of human social behavior has gained momentum in the computer science research community. Several well-known phenomena in the social sciences have been revisited in a computer science perspective, with a new focus on phenomena of emerging behavior, information diffusion, opinion formation and collective intelligence. Furthermore, the recent past has witnessed a growing interest in the dynamics of these phenomena and that of the underlying social structures. This thesis investigates a number of aspects related to the study of evolving social networks and the collective phenomena they mediate. We have mainly pursued three research directions. The first line of research is in a sense functional to the other two and concerns the collection of data tracking the evolution of human interactions in the physical space and the extraction of (time) evolving networks describing these interactions. A number of available datasets describing different kinds of social networks are available on line, but few involve physical proximity of humans in real life scenarios. During our research activity, we have deployed several social experiments tracking face-to-face human interactions in the physical space. The collected datasets have been used to analyze network properties and to investigate social phenomena, as further described below. A second line of research investigates the impact of dynamics on the analytical tools used to extract knowledge from social networks. This is clearly a vast area in which research in many cases is in its early stages. We have focused on centrality, a fundamental notion in the analysis and characterization of social network structure and key to a number of Web applications and services. 
While many social networks of interest (resulting from “virtual” or “physical” activity) are highly dynamic, many Web information retrieval algorithms were originally designed with static networks in mind. In this thesis, we design and analyze decentralized algorithms for computing and maintaining centrality scores over time-evolving networks. These algorithms refer to notions of centrality which are explicitly conceived for evolving settings and which are consistent with PageRank in important cases. A further line of research investigates the wisdom of crowds effect, an important, yet not completely understood phenomenon of collective intelligence, whereby a group typically exhibits higher predictive accuracy than its single members and often experts. Phenomena of collective intelligence involve exchange and processing of information among individuals sharing some common social structure. In many cases of interest, this structure is suitably described by an evolving social network. Studying the interplay between the evolution of the underlying social structure and the computational properties of the resulting process is an interesting and challenging task. We have focused on the quantitative analysis of this aspect, in particular the effect of the network on the accuracy of prediction. To provide a mathematical characterization, we have revisited and modified a number of models of opinion formation and diffusion originally proposed in the social sciences. Experimental analysis using data collected from some of the social experiments we conducted allowed us to test the soundness of the proposed models. While many of these models seem to capture important aspects of the process of opinion formation in (physical) social networks, one variant we propose achieves higher predictive accuracy and is also robust to the presence of outliers.

    Toward Geo-social Information Systems: Methods and Algorithms

    The widespread adoption of GPS-enabled tagging of social media content via smartphones and social media services (e.g., Facebook, Twitter, Foursquare) uncovers a new window into the spatio-temporal activities of hundreds of millions of people. These "footprints" open new possibilities for understanding how people can organize for societal impact and lay the foundation for new crowd-powered geo-social systems. However, there are key challenges to delivering on this promise: the slow adoption of location sharing, the inherent bias in the users that do share location, imbalanced location granularity, respecting location privacy, among many others. With these challenges in mind, this dissertation aims to develop the framework, algorithms, and methods for a new class of geo-social information systems. The dissertation is structured in two main parts: the first focuses on understanding the capacity of existing footprints; the second demonstrates the potential of new geo-social information systems through two concrete prototypes. First, we investigate the capacity of using these geo-social footprints to build new geo-social information systems. (i): we propose and evaluate a probabilistic framework for estimating a microblog user's location based purely on the content of the user's posts. With the help of a classification component for automatically identifying words in tweets with a strong local geo-scope, the location estimator places 51% of Twitter users within 100 miles of their actual location. (ii): we investigate a set of 22 million check-ins across 220,000 users and report a quantitative assessment of human mobility patterns by analyzing the spatial, temporal, social, and textual aspects associated with these footprints. Concretely, we observe that users follow simple reproducible mobility patterns. (iii): we compare a set of 35 million publicly shared check-ins with a set of over 400 million private query logs recorded by a commercial hotel search engine. 
Although generated by users with fundamentally different intentions, we find common conclusions may be drawn from both data sources, indicating the viability of publicly shared location information to complement (and replace, in some cases) privately held location information. Second, we introduce a couple of prototypes of new geo-social information systems that utilize the collective intelligence from the emerging geo-social footprints. Concretely, we propose an activity-driven search system and a local expert finding system that both take advantage of the collective intelligence. Specifically, we study location-based activity patterns revealed through location sharing services and find that these activity patterns can identify semantically related locations, and help with both unsupervised location clustering and supervised location categorization with a high confidence. Based on these results, we show how activity-driven semantic organization of locations may be naturally incorporated into location-based web search. In addition, we propose a local expert finding system that identifies top local experts for a topic in a location. Concretely, the system utilizes semantic labels that people apply to each other and people's locations in current location-based social networks, and can identify top local experts with a high precision. We also observe that the proposed local authority metrics, which utilize collective intelligence from expert candidates' core audience (list labelers), significantly improve the performance of local expert finding over the more intuitive approach that only considers candidates' locations.

    Privacy-sensitive recognition of group conversational context with sociometers

    Recognizing the conversational context in which group interactions unfold has applications in machines that support collaborative work and perform automatic social inference using contextual knowledge. This paper addresses the task of discriminating one conversational context from another, specifically brainstorming from decision-making interactions, using easily computable nonverbal behavioral cues. Privacy-sensitive mobile sociometers are used to record the interaction data. We hypothesize that the difference in the conversational dynamics between brainstorming and decision-making discussions is significant and measurable using speaking activity-based nonverbal cues. We characterize the communication patterns of the entire group by the aggregation (both temporal and person-wise) of their nonverbal behavior. The results on our interaction data set show that the floor-occupation patterns in a brainstorming interaction are different from a decision-making interaction, and our method can obtain a classification accuracy as high as 87.5%

    Privacy-Sensitive Audio Features for Conversational Speech Processing

    The work described in this thesis takes place in the context of capturing real-life audio for the analysis of spontaneous social interactions. Towards this goal, we wish to capture conversational and ambient sounds using portable audio recorders. Analysis of conversations can then proceed by modeling the speaker turns and durations produced by speaker diarization. However, a key factor against the ubiquitous capture of real-life audio is privacy. Particularly, recording and storing raw audio would breach the privacy of people whose consent has not been explicitly obtained. In this thesis, we study audio features instead – for recording and storage – that can respect privacy by minimizing the amount of linguistic information, while achieving state-of-the-art performance in conversational speech processing tasks. Indeed, the main contributions of this thesis are the achievement of state-of-the-art performances in speech/nonspeech detection and speaker diarization tasks using such features, which we refer to, as privacy-sensitive. Besides this, we provide a comprehensive analysis of these features for the two tasks in a variety of conditions, such as indoor (predominantly) and outdoor audio. To objectively evaluate the notion of privacy, we propose the use of human and automatic speech recognition tests, with higher accuracy in either being interpreted as yielding lower privacy. For the speech/nonspeech detection (SND) task, this thesis investigates three different approaches to privacy-sensitive features. These approaches are based on simple, instantaneous, feature extraction methods, excitation source information based methods, and feature obfuscation methods. These approaches are benchmarked against Perceptual Linear Prediction (PLP) features under many conditions on a large meeting dataset of nearly 450 hours. 
Additionally, automatic speech (phoneme) recognition studies on TIMIT showed that the proposed features yield low phoneme recognition accuracies, implying higher privacy. For the speaker diarization task, we interpret the extraction of privacy-sensitive features as an objective that maximizes the mutual information (MI) with speakers while minimizing the MI with phonemes. The source-filter model arises naturally out of this formulation. We then investigate two different approaches for extracting excitation source based features, namely Linear Prediction (LP) residual and deep neural networks. Diarization experiments on the single and multiple distant microphone scenarios from the NIST rich text evaluation datasets show that these features yield a performance close to the Mel Frequency Cepstral coefficients (MFCC) features. Furthermore, listening tests support the proposed approaches in terms of yielding low intelligibility in comparison with MFCC features. The last part of the thesis studies the application of our methods to SND and diarization in outdoor settings. While our diarization study was more preliminary in nature, our study on SND brings about the conclusion that privacy-sensitive features trained on outdoor audio yield performance comparable to that of PLP features trained on outdoor audio. Lastly, we explored the suitability of using SND models trained on indoor conditions for the outdoor audio. Such an acoustic mismatch caused a large drop in performance, which could not be compensated even by combining indoor models

    Computational Modeling of Face-to-Face Social Interaction Using Nonverbal Behavioral Cues

    The computational modeling of face-to-face interactions using nonverbal behavioral cues is an emerging and relevant problem in social computing. Studying face-to-face interactions in small groups helps in understanding the basic processes of individual and group behavior, and in improving team productivity and satisfaction in the modern workplace. Apart from the verbal channel, nonverbal behavioral cues form a rich communication channel through which people infer – often automatically and unconsciously – emotions, relationships, and traits of fellow members. There exists a solid body of knowledge about small groups and the multimodal nature of the nonverbal phenomenon in social psychology and nonverbal communication. However, the problem has only recently begun to be studied in the multimodal processing community. A recent trend is to analyze these interactions in the context of face-to-face group conversations, using multiple sensors, and to make inferences automatically without the need of a human expert. These problems can be formulated in a machine learning framework involving the extraction of relevant audio and video features and the design of supervised or unsupervised learning models. While attempting to bridge social psychology, perception, and machine learning, certain factors have to be considered. Firstly, various group conversation patterns emerge at different time-scales. For example, turn-taking patterns evolve over shorter time scales, whereas dominance or group-interest trends get established over larger time scales. Secondly, a set of audio and visual cues that are not only relevant but also robustly computable needs to be chosen. Thirdly, unlike typical machine learning problems where ground truth is well defined, interaction modeling involves data annotation that needs to factor in inter-annotator variability. Finally, principled ways of integrating the multimodal cues have to be investigated. 
In the thesis, we have investigated individual social constructs in small groups like dominance and status (two facets of the so-called vertical dimension of social relations). In the first part of this work, we have investigated how dominance perceived by external observers can be estimated by different nonverbal audio and video cues, and how it is affected by annotator variability, the estimation method, and the exact task involved. In the second part, we jointly study perceived dominance and role-based status to understand whether dominant people are the ones with high status and whether dominance and status in small-group conversations can be automatically explained by the same nonverbal cues. We employ speaking activity, visual activity, and visual attention cues for both the works. In the second part of the thesis, we have investigated group social constructs using both supervised and unsupervised approaches. We first propose a novel framework to characterize groups. The two-layer framework consists of an individual layer and a group layer. At the individual layer, the floor-occupation patterns of the individuals are captured. At the group layer, the identity information of the individuals is not used. We define group cues by aggregating individual cues over time and person, and use them to classify group conversational contexts – cooperative vs competitive and brainstorming vs decision-making. We then propose a framework to discover group interaction patterns using probabilistic topic models. An objective evaluation of our methodology, involving human judgment and multiple annotators, showed that the learned topics are indeed meaningful, and also that the discovered patterns resemble prototypical leadership styles – autocratic, participative, and free-rein – proposed in social psychology.

    Conversational scene analysis

    Thesis (Ph. D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2002. Includes bibliographical references (p. 106-109). In this thesis, we develop computational tools for analyzing conversations based on nonverbal auditory cues. We develop a notion of conversations as being made up of a variety of scenes: in each scene, either one speaker is holding the floor or both are speaking at equal levels. Our goal is to find conversations, find the scenes within them, determine what is happening inside the scenes, and then use the scene structure to characterize entire conversations. We begin by developing a series of mid-level feature detectors, including a joint voicing and speech detection method that is extremely robust to noise and microphone distance. Leveraging the results of this powerful mechanism, we develop a probabilistic pitch tracking mechanism, methods for estimating speaking rate and energy, and means to segment the stream into multiple speakers, all in significant noise conditions. These features give us the ability to sense the interactions and characterize the style of each speaker's behavior. We then turn to the domain of conversations. We first show how we can very accurately detect conversations from independent or dependent auditory streams with measures derived from our mid-level features. We then move to developing methods to accurately classify and segment a conversation into scenes. We also show preliminary results on characterizing the varying nature of the speakers' behavior during these regions. Finally, we design features to describe entire conversations from the scene structure, and show how we can describe and browse through conversation types in this way. By Sumit Basu. Ph.D.