    Machine Understanding of Human Behavior

    A widely accepted prediction is that computing will move to the background, weaving itself into the fabric of our everyday living spaces and projecting the human user into the foreground. If this prediction is to come true, then next-generation computing, which we will call human computing, should be about anticipatory user interfaces that are human-centered, built for humans and based on human models. They should transcend the traditional keyboard and mouse to include natural, human-like interactive functions, including understanding and emulating certain human behaviors such as affective and social signaling. This article discusses a number of components of human behavior, how they might be integrated into computers, and how far we are from realizing the front end of human computing, that is, how far we are from enabling computers to understand human behavior.

    SHOE: The extraction of hierarchical structure for machine learning of natural language

    Form-Independent Meaning Representation for Eventualities

    Quality at a Glance: An Audit of Web-Crawled Multilingual Datasets

    With the success of large-scale pre-training and multilingual modeling in Natural Language Processing (NLP), recent years have seen a proliferation of large, web-mined text datasets covering hundreds of languages. We manually audit the quality of 205 language-specific corpora released with five major public datasets (CCAligned, ParaCrawl, WikiMatrix, OSCAR, mC4). Lower-resource corpora have systematic issues: at least 15 corpora have no usable text, and a significant fraction contain fewer than 50% of sentences of acceptable quality. In addition, many are mislabeled or use nonstandard or ambiguous language codes. We demonstrate that these issues are easy to detect even for non-proficient speakers, and we supplement the human audit with automatic analyses. Finally, we recommend techniques to evaluate and improve multilingual corpora and discuss the potential risks that come with low-quality data releases. Comment: Accepted at TACL; pre-MIT Press publication version

    Managing heterogeneous cues in social contexts. A holistic approach for social interactions analysis

    Social interaction refers to any interaction between two or more individuals, in which information sharing is carried out without any mediating technology.
This interaction is a significant part of individual socialization and of the experience gained throughout one's lifetime, and it is of interest to several disciplines (sociology, psychology, medicine, etc.). In testing and observational studies, multiple mechanisms are used to study these interactions, such as questionnaires, direct observation and analysis of events by human operators, or a posteriori observation and analysis of recorded events by specialists (psychologists, sociologists, doctors, etc.). However, such mechanisms are expensive in terms of processing time. They require a high level of attention to analyze several cues simultaneously, they depend on the operator (subjectivity of the analysis), and they can only target one facet of the interaction. To address these issues, the social interaction analysis process needs to be automated, that is, the gap between human-based and machine-based social interaction analysis must be bridged. We therefore propose a holistic approach that integrates multimodal heterogeneous cues and contextual information (complementary "exogenous" data) dynamically and optionally, depending on their availability. Such an approach allows multiple "signals" to be analyzed in parallel (whereas humans can focus on only one). This analysis can be further enriched with data related to the context of the scene (location, date, type of music, event description, etc.) or to the individuals (name, age, gender, data extracted from their social networks, etc.). The contextual information enriches the modeling of the extracted metadata and gives it a more "semantic" dimension. Managing this heterogeneity is an essential step in implementing a holistic approach.
The automation of "in vivo" capture and observation using non-intrusive devices without predefined scenarios raises issues related to data (i) privacy and security, (ii) heterogeneity, and (iii) volume. Hence, within the holistic approach, we propose (1) a comprehensive privacy-preserving data model that decouples metadata extraction from social interaction analysis methods; (2) a geometric, non-intrusive eye-contact detection method; and (3) a deep model for French food classification that extracts information from the video content. The proposed approach manages heterogeneous cues coming from different modalities as multi-layer sources (visual signals, voice signals, contextual information) at different time scales and with different combinations between layers (the cues are represented as time series). The approach was designed to operate without intrusive devices in order to capture real behaviors and achieve naturalistic observation. We have deployed the proposed approach on the OVALIE platform, which aims to study eating behaviors in different real-life contexts and is located at University Toulouse-Jean Jaurès, France.

    GeoCAM: A geovisual analytics workspace to contextualize and interpret statements about movement

    This article focuses on integrating computational and visual methods in a system that helps analysts identify, extract, map, and relate linguistic accounts of movement. We address two objectives: (1) build the conceptual, theoretical, and empirical framework needed to represent and interpret human-generated directions; and (2) design and implement a geovisual analytics workspace for direction document analysis. We have built a set of geo-enabled computational methods to identify documents containing movement statements, and a visual analytics environment that uses natural language processing methods iteratively, with geographic database support, to extract, interpret, and map geographic movement references in context. Additionally, analysts can provide feedback to improve computational results. To demonstrate the value of this integrative approach, we have realized a proof-of-concept implementation focusing on identifying and processing documents that contain human-generated route directions. Using our visual analytic interface, an analyst can explore the results, provide feedback to improve those results, pose queries against a database of route directions, and interactively represent the route on a map.

    Utilizing Multi-modal Weak Signals to Improve User Stance Inference in Social Media

    Social media has become an integral component of daily life. Millions of pieces of content of various types are released into social networks every day, which offers an interesting window into users' views on everyday life. Exploring the opinions of users in social media networks has long been an interesting subject for Natural Language Processing researchers. Knowing the social opinions of the public allows anyone to make informed policy or marketing decisions, which is exactly why it is desirable to obtain comprehensive social opinions. The nature of social media is complex, and therefore obtaining the social opinion is a challenging task. Because social media networks are so diverse and complex, they typically mirror actual social connections, but on a digital platform: just as users make friends and companions in the real world, digital platforms enable them to form similar social connections. This work mainly looks at how to obtain a comprehensive social opinion from a social media network. Typical social opinion quantifiers look at the text contributed by users to find their opinions. This is currently challenging because the majority of social media users consume content rather than express their opinions, which makes natural-language-processing-based methods impractical for them because they provide no linguistic features. In our work, we aim to improve a method named stance inference, which can utilize multi-domain features to extract the social opinion. We also introduce a method that can expose users' opinions even when they have no on-topic content. We further note that introducing weak supervision into the otherwise unsupervised task of stance inference improves performance. The weak supervision we bring into the pipeline comes from hashtags. We show that hashtags are contextual indicators added by humans and are therefore much more likely to be topically relevant than the output of a topic model.
Lastly, we introduce disentanglement methods for chronological social media networks, which allow the methods introduced above to be applied to these types of platforms.