Self-Supervised Vision-Based Detection of the Active Speaker as Support for Socially-Aware Language Acquisition
This paper presents a self-supervised method for visual detection of the
active speaker in a multi-person spoken interaction scenario. Active speaker
detection is a fundamental prerequisite for any artificial cognitive system
attempting to acquire language in social settings. The proposed method is
intended to complement the acoustic detection of the active speaker, thus
improving the system robustness in noisy conditions. The method can detect an
arbitrary number of possibly overlapping active speakers based exclusively on
visual information about their faces. Furthermore, the method does not rely on
external annotations, thus complying with cognitive development. Instead, the
method uses information from the auditory modality to support learning in the
visual domain. This paper reports an extensive evaluation of the proposed
method using a large multi-person face-to-face interaction dataset. The results
show good performance in a speaker-dependent setting. However, in a
speaker-independent setting the proposed method yields significantly lower
performance. We believe that the proposed method represents an essential
component of any artificial cognitive system or robotic platform engaging in
social interactions. Comment: 10 pages, IEEE Transactions on Cognitive and Developmental System
Bimodal Emotion Recognition using Speech and Physiological Changes
With exponentially evolving technology it is no exaggeration to say that any interface fo
Opinion Detection, Sentiment Analysis and User Attribute Detection from Online Text Data
With the growing use of the internet in most parts of the world, users generate significant amounts of online text on platforms such as online social networks, product review websites, and travel blogs, to name just a few. The variety of content on these platforms has made them an important resource for researchers to gauge user activity, determine user opinions, and analyze user behavior without having to perform surveys that are expensive in both money and time. Gaining insights into user behavior enables us to better understand users' likes and dislikes, which in turn is helpful for economic purposes such as marketing, advertising, and recommendations. Further, because online social networks have recently been instrumental in socio-political revolutions such as the Arab Spring, and in awareness-generation campaigns by MoveOn.org and Avaaz.org, analysis of online data can uncover user preferences. The overarching goal of this Ph.D. thesis is to pose some research questions and propose solutions, mostly pertaining to user opinions and attributes, keeping in mind the large quantities of noise present in online textual data. This thesis illustrates that with the extraction of informative textual features and the use of robust NLP and machine learning techniques, it is possible to perform efficient signal extraction from online text data and use it to better understand user behavior. The first research problem addressed is that of opinion detection and sentiment analysis of users on a given topic, from their self-generated tweets. The key idea is to select relevant hashtags and n-grams using a regularized logistic regression model for opinion detection. The second research problem deals with temporal opinion detection from tweets, i.e., detecting user opinions on a topic in which the conversation evolves over time.
For instance, on the widely discussed topic of Obamacare (the Affordable Care Act in the U.S.), various issues became the focal points of discussion among users over time, as corresponding socio-political events took place in real time. We propose a machine-learning model that builds on seminal work from the sociological literature, which rests on the premise that most opinion changes occur slowly over time. Our model is able to successfully capture opinions over time using publicly available tweets, as well as to uncover the key points of discussion as time progresses. In the third research problem, we utilize distributed representations of words in a method that determines, from user reviews, aspects of products and services that users like and dislike. We harness the contextual similarity between words and effectively build meta-features that capture user sentiment at a granular level. Finally, in the fourth research problem, we propose a method to detect the age of users from their publicly available tweets. Using a method based on distributed representations of words and clustering, we are able to achieve high accuracy in age detection, as well as to simultaneously discover topics of conversation in which users of different age groups engage.
Extraction and Analysis of Dynamic Conversational Networks from TV Series
Identifying and characterizing the dynamics of modern TV series subplots is
an open problem. One way is to study the underlying social network of
interactions between the characters. Standard dynamic network extraction
methods rely on temporal integration, either over the whole considered period,
or as a sequence of several time-slices. However, they turn out to be
inappropriate in the case of TV series, because the scenes shown onscreen
alternately focus on parallel storylines, and do not necessarily respect a
traditional chronology. In this article, we introduce Narrative Smoothing, a
novel network extraction method taking advantage of the plot properties to
solve some of their limitations. We apply our method to a corpus of three popular
series, and compare it to both standard approaches. Narrative Smoothing leads
to more relevant observations when it comes to the characterization of the
protagonists and their relationships, confirming its appropriateness to model
the intertwined storylines constituting the plots. Comment: arXiv admin note: substantial text overlap with arXiv:1602.0781
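The time-slice approach that the abstract above names as a standard baseline can be sketched as follows. This is a minimal illustration only, not the Narrative Smoothing method, whose details are not given here. It assumes each scene is reduced to the list of characters appearing in it and that an interaction is simple co-occurrence in a scene; the function name `time_slice_networks`, the scene data, and the slice size are all hypothetical.

```python
from collections import Counter
from itertools import combinations


def time_slice_networks(scenes, slice_size):
    """Build one weighted co-occurrence network per window of `slice_size` scenes.

    `scenes` is a chronologically ordered list of character lists (an assumed
    input format). Each network maps an alphabetically ordered character pair
    to the number of scenes in the window that the pair shares.
    """
    networks = []
    for start in range(0, len(scenes), slice_size):
        weights = Counter()
        for characters in scenes[start:start + slice_size]:
            # Count every unordered pair of distinct characters once per scene.
            for pair in combinations(sorted(set(characters)), 2):
                weights[pair] += 1
        networks.append(dict(weights))
    return networks


# Hypothetical scenes alternating between two parallel storylines.
scenes = [
    ["Ann", "Bob"],
    ["Cat", "Dan"],
    ["Ann", "Bob", "Cat"],
    ["Cat", "Dan"],
]

# Integration as a sequence of time slices (two scenes per slice).
slices = time_slice_networks(scenes, slice_size=2)

# Integration over the whole considered period is the single-slice case.
whole_period = time_slice_networks(scenes, slice_size=len(scenes))[0]
```

With alternating storylines, a short slice can contain only a few scenes from each storyline, which is one way to see why the abstract considers plain temporal integration ill-suited to TV series.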
User serviceable parts: Practice, technology, sociality and method in live electronic musicking
In live electronic musical research there is a need to confront the interrelationships between the social and the technological in order to understand our music as practice. These interrelationships form a complex and dynamic ecosystem that not only forms the context to, but is constitutive of practice. I interrogate from a variety of perspectives the musical practice that has formed over the course of this research in order to reveal the dispositions towards technology, the social situatedness and the musical approach that underlies my work.
By taking a disposition towards musical practice-led research that is non-hierarchical, performative, ecological, phenomenological and pragmatic, I place into wider context compositional and technological decisions, in terms of their relationships to improvising, skill, design, performance and research.
This work contributes both new theories of live electronic musical practice and new suggestions for practice-led methods aimed at investigating the interplay of social and material factors in musicking, and at interrogating the disciplinary status of our field vis-a-vis musical and technical disciplines.
Contextual Social Networking
The thesis centers on the multi-faceted research question of how to detect and
derive contexts that can be used for new context-aware Social Networking
services and for improving the usefulness of existing Social Networking services, giving
rise to the notion of Contextual Social Networking. In a first foundational part,
we characterize the closely related fields of Contextual, Mobile, and Decentralized
Social Networking using different methods and focusing on different detailed
aspects. A second part focuses on the question of how short-term and long-term
social contexts, as especially interesting forms of context for Social Networking, may
be derived. We focus on NLP-based methods for the characterization of social relations
as a typical form of long-term social context and on Mobile Social Signal
Processing methods for deriving short-term social contexts on the basis of geometry
of interaction and audio. We furthermore investigate how personal social agents
may combine such social context elements on various levels of abstraction. The third
part discusses new and improved context-aware Social Networking service concepts.
We investigate special forms of awareness services, new forms of social information
retrieval, social recommender systems, context-aware privacy concepts, and services
and platforms supporting Open Innovation and creative processes.
This version of the thesis omits the included publications because of
journal copyright restrictions. For the version containing all included
publications, contact Georg Groh, [email protected]
Towards solving computer vision problems: datasets, labels, algorithms, and applications
The solution to a supervised computer vision problem consists of an application, an algorithm, input data, and a set of human-generated labels. Solving these kinds of tasks involves collecting large quantities of data, collecting appropriate labels, and developing machine vision algorithms tailored to the application. Progress on these problems has often benefited from large-scale datasets with high-fidelity labels. Successful algorithms display a synergy between application goals and the size and quality of the dataset. This thesis presents work highlighting the importance of each component of a supervised vision task.
First, the problem of automatically classifying groups of people into social categories is introduced. This problem is called Urban Tribe Classification. To tackle this problem, each individual and the entire group of individuals are modeled. Since this was a newly introduced computer vision problem, a dataset for this task was created. On this dataset, the combined representation of group and individuals outperforms using only the person representations. This model showed promising results for automatic subculture classification.
Second, the problem of creating perceptual embeddings based on human similarity judgements is tackled. This work focuses on triplet similarity comparisons of the form "Is object a more similar to b or c?", which have been useful for computer vision and machine learning applications. Unfortunately, triplet similarity comparisons, like many human labeling efforts, can be prohibitively expensive. This work proposes two techniques for dealing with this obstacle. First, an alternative display for collecting triplets is designed. This display shows a probe image and a grid of query images, allowing multiple triplets to be collected from the user simultaneously. The display is shown to reduce the cost and time of triplet collection. In addition, higher quality embeddings are created with the improved triplet collection UI.
A dataset of human taste similarity covering 10,000 food items was created using this UI. Second, "SNaCK," a low-dimensional perceptual embedding algorithm that combines human expertise with automatic machine kernels, is introduced. Both parts are complementary: human insight can capture relationships that are not apparent from the object's visual similarity, and the machine can help relieve the human from having to exhaustively specify many constraints.
Finally, the precise localization of key frames of an action is explored. This work focuses on detecting the exact starting frame of a behavior, an important task for neuroscience research. To address this problem, a loss is designed to penalize extra and missed action start detections over small misalignments. Recurrent neural networks (RNNs) are trained to optimize this loss. The model is shown to reduce the number of false positives, an important criterion defined by the neuroscientist. The performance of the model is evaluated on a new dataset, the Mouse Reach Dataset, a large, annotated video dataset of mice performing a sequence of actions. The dataset was created for neuroscience research. On this dataset, the proposed model outperforms related approaches and baseline methods using an unstructured loss.
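To illustrate how a probe-and-grid display can yield several triplets from a single screen, here is a minimal sketch. It assumes, as an illustration rather than as the thesis's stated rule, that every grid image the user selects is judged more similar to the probe than every image left unselected. The function name `triplets_from_grid` and the item labels are hypothetical.

```python
def triplets_from_grid(probe, grid, selected):
    """Expand one grid response into (probe, nearer, farther) triplets.

    Assumption for this sketch: each selected grid item is treated as more
    similar to the probe than each unselected grid item.
    """
    chosen = [item for item in grid if item in selected]
    rest = [item for item in grid if item not in selected]
    return [(probe, near, far) for near in chosen for far in rest]


# Hypothetical screen: one probe, four grid items, two of them selected.
triplets = triplets_from_grid("p", ["a", "b", "c", "d"], {"a", "c"})
# Each tuple reads: "p is more similar to the second item than to the third."
```

Under this assumption, selecting k of n grid items produces k * (n - k) triplets from one interaction, compared with one triplet per question in the plain "a, b, or c" format.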
Temporal models for mining, ranking and recommendation in the Web
Owing to their first-hand, diverse, and evolution-aware reflection of nearly all areas of life, heterogeneous temporal datasets, i.e., the Web, collaborative knowledge bases, and social networks, have emerged as gold mines for content analytics of many sorts. In those collections, time plays an essential role in many crucial information retrieval and data mining tasks, ranging from user intent understanding and document ranking to advanced recommendations. There are two semantically close and important constituents when modeling along the time dimension: entities and events. Time crucially serves as the context for changes driven by happenings and phenomena (events) that relate to people, organizations, or places (so-called entities) in our social lives. Thus, determining what users expect, or in other words, resolving the uncertainty confounded by temporal changes, is a compelling task to support consistent user satisfaction.
In this thesis, we address the aforementioned issues and propose temporal models that capture the temporal dynamics of such entities and events to serve the end tasks. Specifically, we make the following contributions:
(1) Query recommendation and document ranking in the Web - we address entity-centric query suggestion and ranking effectiveness around the time period of an associated event. In particular, for the former we propose a multi-criteria optimization framework that facilitates the combination of multiple temporal models to smooth out abrupt changes when transitioning between event phases, and for the latter a probabilistic approach for search result diversification of temporally ambiguous queries.
(2) Entity relatedness in Wikipedia - we study the long-term dynamics of Wikipedia as a global memory place for high-impact events, specifically the revival of memories of past events. Additionally, we propose a neural network-based approach to measure the temporal relatedness of entities and events. The model engages different latent representations of an entity (i.e., from time, link-based graph, and content) and uses the collective attention from user navigation as the supervision.
(3) Graph-based ranking and temporal anchor-text mining in Web Archives - we tackle the problem of discovering important documents along the time-span of Web Archives, leveraging the link graph. Specifically, we combine the problems of relevance, temporal authority, diversity, and time in a unified framework. The model accounts for the incomplete link structure and natural time lag in Web Archives when mining the temporal authority.
(4) Methods for enhancing predictive models at an early stage in the social media and clinical domains - we investigate several methods to control model instability and enrich the contexts of predictive models in the "cold-start" period. We demonstrate their effectiveness for rumor detection and blood glucose prediction, respectively.
Overall, the findings presented in this thesis demonstrate the importance of tracking the temporal dynamics surrounding salient events and entities for IR applications. We show that determining such changes in time-based patterns and trends in prevalent temporal collections can better satisfy user expectations and boost ranking and recommendation effectiveness over time.