2,481 research outputs found

    CHORUS Deliverable 2.2: Second report - identification of multi-disciplinary key issues for gap analysis toward EU multimedia search engines roadmap

    After surveying the state of the art and mapping the existing landscape of multimedia search engines during the first year of Chorus, we identified and analyzed gaps in the European research effort during our second year. In this period we focused on three directions: technological issues; user-centred issues and use cases; and socio-economic and legal aspects. These were assessed through two central studies: first, a concerted vision of the functional breakdown of a generic multimedia search engine, and second, a set of representative use-case descriptions with a related discussion of the requirements they raise as technological challenges. Both studies were carried out in cooperation and consultation with the community at large through EC concertation meetings (multimedia search engines cluster), several meetings with our Think-Tank, presentations at international conferences, and surveys addressed to coordinators of EU projects and of national initiatives. Based on the feedback obtained, we identified two types of gaps: core technological gaps that involve research challenges, and "enablers", which are not necessarily technical research challenges but do affect the progress of innovation. New socio-economic trends are presented, as well as emerging legal challenges.

    Toward Geo-social Information Systems: Methods and Algorithms

    The widespread adoption of GPS-enabled tagging of social media content via smartphones and social media services (e.g., Facebook, Twitter, Foursquare) opens a new window into the spatio-temporal activities of hundreds of millions of people. These "footprints" open new possibilities for understanding how people can organize for societal impact and lay the foundation for new crowd-powered geo-social systems. However, there are key challenges to delivering on this promise: the slow adoption of location sharing, the inherent bias in the users who do share location, imbalanced location granularity, and respect for location privacy, among many others. With these challenges in mind, this dissertation aims to develop the framework, algorithms, and methods for a new class of geo-social information systems. The dissertation is structured in two main parts: the first focuses on understanding the capacity of existing footprints; the second demonstrates the potential of new geo-social information systems through two concrete prototypes. First, we investigate the capacity of these geo-social footprints to support new geo-social information systems. (i) We propose and evaluate a probabilistic framework for estimating a microblog user's location based purely on the content of the user's posts. With the help of a classification component for automatically identifying words in tweets with a strong local geo-scope, the location estimator places 51% of Twitter users within 100 miles of their actual location. (ii) We investigate a set of 22 million check-ins across 220,000 users and report a quantitative assessment of human mobility patterns by analyzing the spatial, temporal, social, and textual aspects associated with these footprints. Concretely, we observe that users follow simple reproducible mobility patterns. (iii) We compare a set of 35 million publicly shared check-ins with a set of over 400 million private query logs recorded by a commercial hotel search engine. Although the two data sources are generated by users with fundamentally different intentions, we find that common conclusions may be drawn from both, indicating the viability of publicly shared location information to complement (and, in some cases, replace) privately held location information. Second, we introduce two prototypes of new geo-social information systems that utilize the collective intelligence of the emerging geo-social footprints. Concretely, we propose an activity-driven search system and a local expert finding system, both of which take advantage of this collective intelligence. Specifically, we study location-based activity patterns revealed through location sharing services and find that these activity patterns can identify semantically related locations and help with both unsupervised location clustering and supervised location categorization with high confidence. Based on these results, we show how activity-driven semantic organization of locations may be naturally incorporated into location-based web search. In addition, we propose a local expert finding system that identifies top local experts for a topic in a location. Concretely, the system utilizes the semantic labels that people assign to each other and people's locations in current location-based social networks, and it can identify top local experts with high precision. We also observe that the proposed local authority metrics, which utilize collective intelligence from expert candidates' core audience (list labelers), significantly improve the performance of local expert finding over the more intuitive approach that considers only candidates' locations.
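
The "within 100 miles" figure quoted in the abstract above is a distance-thresholded accuracy. The sketch below shows one common way such a metric can be computed, assuming predicted and actual locations are given as (latitude, longitude) pairs in degrees and that great-circle (haversine) distance is used. The function names and the choice of haversine distance are illustrative assumptions, not the dissertation's actual evaluation code.

```python
import math

# Mean Earth radius in miles (a standard approximation).
EARTH_RADIUS_MILES = 3958.8


def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))


def accuracy_within(predicted, actual, threshold_miles=100.0):
    """Fraction of users whose predicted location lies within the threshold.

    `predicted` and `actual` are equal-length lists of (lat, lon) tuples;
    this pairing is a hypothetical input format chosen for illustration.
    """
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    if not predicted:
        return 0.0
    hits = sum(
        1
        for (plat, plon), (alat, alon) in zip(predicted, actual)
        if haversine_miles(plat, plon, alat, alon) <= threshold_miles
    )
    return hits / len(predicted)
```

For example, with a true location in New York, a prediction in Philadelphia counts as a hit and a prediction in Los Angeles as a miss, so `accuracy_within` returns 0.5 for that pair of users.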

    Inferring Degree Of Localization Of Twitter Persons And Topics Through Time, Language, And Location Features

    Identifying authoritative influencers related to a geographic area (geo-influencers) can aid content recommendation systems and local expert finding. This thesis addresses this important problem using Twitter data. A geo-influencer is identified via the locations of its followers. On Twitter, for privacy reasons, the location reported by followers is limited to a textual string in the profile or to messages with coordinates. However, this textual string often cannot be geocoded, and less than 1% of message traffic provides coordinates. First, the error rates associated with Google's geocoder are studied, and a classifier is built that gives a warning for self-reported locations that are likely incorrect. Second, it is shown that city-level geo-influencers can be identified without geocoding by leveraging the power of Google search and the follower-followee network structure. Third, we illustrate that global and local influencers can be distinguished, at the timezone level, by a classifier that uses the temporal features of the followers. For global influencers, spatiotemporal analysis helps in understanding the evolution of their popularity over time. When applied to message traffic, the approach can differentiate top trending topics and persons in different geographical regions. Fourth, we constrain a timezone to a set of possible countries and use language features to train a high-level geocoder that further localizes an influencer's geographic area. Finally, we provide a repository of geo-influencers for applications related to content recommendation. The repository can be used for filtering influencers based on their audience's demographics related to location, time, language, gender, and ethnicity.

    A Survey of Location Prediction on Twitter

    Locations, e.g., countries, states, cities, and points of interest, are central to news, emergency events, and people's daily lives. Automatic identification of locations associated with or mentioned in documents has been explored for decades. As one of the most popular online social network platforms, Twitter has attracted a large number of users who send millions of tweets on a daily basis. Due to the worldwide coverage of its users and the real-time freshness of tweets, location prediction on Twitter has gained significant attention in recent years. Research efforts have been spent on dealing with the new challenges and opportunities brought by the noisy, short, and context-rich nature of tweets. In this survey, we aim to offer an overall picture of location prediction on Twitter. Specifically, we concentrate on the prediction of user home locations, tweet locations, and mentioned locations. We first define the three tasks and review the evaluation metrics. By summarizing Twitter network, tweet content, and tweet context as potential inputs, we then structurally highlight how the problems depend on these inputs. Each dependency is illustrated by a comprehensive review of the corresponding strategies adopted in state-of-the-art approaches. In addition, we briefly review two related problems, i.e., semantic location prediction and point-of-interest recommendation. Finally, we list future research directions. Comment: Accepted to TKDE. 30 pages, 1 figure

    A comparison of statistical machine learning methods in heartbeat detection and classification

    In health care, patients with heart problems require a quick response in a clinical setting or in the operating theatre. To that end, automated classification of heartbeats is vital, as some heartbeat irregularities are time-consuming to detect. Therefore, analysis of electrocardiogram (ECG) signals is an active area of research. The methods proposed in the literature depend on the structure of a heartbeat cycle. In this paper, we use interval- and amplitude-based features together with a few samples from the ECG signal as a feature vector. We studied a variety of classification algorithms, focusing especially on a type of arrhythmia known as the ventricular ectopic fibrillation (VEB). We compare the performance of the classifiers against algorithms proposed in the literature and make recommendations regarding features, sampling rate, and choice of classifier to apply in a real-time clinical setting. The extensive study is based on the MIT-BIH arrhythmia database. Our main contributions are the evaluation of existing classifiers over a range of sampling rates, the recommendation of a detection methodology to employ in a practical setting, and the extension of the notion of a mixture of experts to a larger class of algorithms.
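
The abstract above mentions interval-based features without defining them. As a rough illustration, the sketch below computes the intervals between consecutive R peaks (RR intervals) before and after each beat, which is one widely used kind of interval feature in heartbeat classification. The function name, the input format (R-peak positions as sample indices), and the choice of these two features are assumptions made for illustration; the paper's actual feature set is not given in the abstract.

```python
def rr_interval_features(r_peak_samples, sampling_rate_hz):
    """Return (pre_rr, post_rr) in seconds for every interior beat.

    r_peak_samples: sorted sample indices of detected R peaks (hypothetical input).
    sampling_rate_hz: sampling rate of the ECG recording, e.g. 360 for MIT-BIH.
    The first and last beats are skipped because one neighbor is missing.
    """
    if sampling_rate_hz <= 0:
        raise ValueError("sampling_rate_hz must be positive")
    # Convert sample indices to timestamps in seconds.
    times = [sample / sampling_rate_hz for sample in r_peak_samples]
    features = []
    for i in range(1, len(times) - 1):
        pre_rr = times[i] - times[i - 1]    # interval to the previous beat
        post_rr = times[i + 1] - times[i]   # interval to the next beat
        features.append((pre_rr, post_rr))
    return features
```

A beat that arrives early relative to its neighbors shows up as a short `pre_rr` followed by a longer `post_rr`, which is why such pairs are often fed to a classifier alongside amplitude features.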

    Combating User Misbehavior on Social Media

    Social media encourages user participation and facilitates users' self-expression like never before. While they enrich user behavior in many ways, many social media platforms have also become breeding grounds for user misbehavior. In this dissertation we focus on understanding and combating three specific threads of user misbehavior that exist widely on social media: spamming, manipulation, and distortion. First, we address the challenge of detecting spam links. Rather than rely on traditional blacklist-based or content-based methods, we examine the behavioral factors of both who is posting the link and who is clicking on the link. The core intuition is that these behavioral signals may be more difficult to manipulate than traditional signals. We find that this purely behavioral approach can achieve good performance for robust behavior-based spam link detection. Next, we deal with uncovering manipulated link-sharing behavior. We propose a four-phase approach to model, identify, characterize, and classify organic and organized groups who engage in link sharing. The key motivating insight is that group-level behavioral signals can distinguish manipulated user groups. We find that levels of organized behavior vary by link type and that the proposed approach achieves good performance as measured by commonly used metrics. Finally, we investigate a particular distortion behavior: making bullshit (BS) statements on social media. We explore the factors that affect the perception of BS and what leads users to ultimately perceive and call a post BS. We begin by preparing a crowdsourced collection of real social media posts that have been called BS. We then build a classification model that can determine which posts are more likely to be called BS. Our experiments suggest that our classifier has the potential to leverage linguistic cues for detecting social media posts that are likely to be called BS.
We complement these three studies with a cross-cutting investigation of learning user topical profiles, which can shed light on what subjects each user is associated with and thereby help in understanding the connection between users and misbehavior. Concretely, we propose a unified model for learning user topical profiles that simultaneously considers multiple footprints, and we show how these footprints can be embedded in a generalized optimization framework. Through extensive experiments on millions of real social media posts, we find that our proposed models can effectively combat user misbehavior on social media.

    Biased behavior in web activities: from understanding to unbiased visual exploration

    Current trends on the Web point toward content personalization, which would not be a problem in a uniform, unbiased world, but our world is neither uniform nor free of bias. In this thesis we hypothesize that the systemic and cognitive biases that affect people in the physical world also affect their behavior when they explore content on the Web. We propose that a reduction in biased behavior can be encouraged through a holistic approach that includes bias quantification, algorithm formulation, and user interface design. These three parts of the proposed process are implemented using Web Mining techniques. They are in turn guided by the Social Sciences and presented through Casual Information Visualization systems. We follow a cross-cutting approach in which this process is applied at different levels of depth across three case studies on Wikipedia and Twitter. As a result, we observe that the biases present in the physical world are indeed reflected on Web platforms, affecting content as well as people's perception and behavior. From the cross-cutting analysis of the case studies, we draw the following conclusions: 1) Web Mining tools are effective for measuring and detecting biased behavior; 2) Information Visualization techniques aimed at non-experts encourage unbiased behavior; and 3) there are no universal solutions, and, in addition to social and cultural contexts, biases must be considered when designing systems. To reach these conclusions, systems were deployed "in the wild" and evaluated quantitatively in an uncontrolled setting, with a focus on participation and engagement metrics.
The use of these metrics is a contribution of the thesis, as they proved effective at measuring differences in behavior in exploratory systems.

    Quantifying & characterizing information diets of social media users

    An increasing number of people are relying on online social media platforms like Twitter and Facebook to consume news and information about the world around them. This change has led to a paradigm shift in the way news and information is exchanged in our society, from traditional mass media to online social media. With the changing environment, it is essential to study the information consumption of social media users and to audit how automated algorithms (like search and recommendation systems) are modifying the information that social media users consume. In this thesis, we fulfill this high-level goal with a two-fold approach. First, we propose the concept of information diets as the composition of information produced or consumed. Next, we quantify the diversity and bias in the information diets that social media users consume via the three main consumption channels on social media platforms: (a) word of mouth channels that users curate for themselves by creating social links, (b) recommendations that platform providers give to the users, and (c) search systems that users use to find interesting information on these platforms. We measure the information diets of social media users along three different dimensions of topics, geographic sources, and political perspectives. Our work is aimed at making social media users aware of the potential biases in their consumed diets, and at encouraging the development of novel mechanisms for mitigating the effects of these biases.
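
To make the idea of an information diet as a "composition" more concrete, the sketch below represents a diet as the normalized share of each topic among the posts a user consumed, and scores its diversity with Shannon entropy. Both the function names and the use of entropy as the diversity measure are illustrative assumptions; the abstract does not state which measures the thesis actually uses.

```python
import math
from collections import Counter


def information_diet(topic_labels):
    """Map each topic to its share of the consumed posts.

    topic_labels: one topic label per consumed post (hypothetical input format).
    Returns an empty dict when no posts are given.
    """
    counts = Counter(topic_labels)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {topic: count / total for topic, count in counts.items()}


def shannon_entropy_bits(diet):
    """Shannon entropy (in bits) of a diet; higher means a more varied diet."""
    return -sum(share * math.log2(share) for share in diet.values() if share > 0)
```

Under this sketch, a user who reads only one topic has entropy 0, while a diet split evenly across more topics scores higher. The same composition could be built over geographic sources or political perspectives instead of topics.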

    What’s Happening Around the World? A Survey and Framework on Event Detection Techniques on Twitter

    © 2019, Springer Nature B.V. In the last few years, Twitter has become a popular platform for sharing opinions, experiences, news, and views in real time. Twitter presents an interesting opportunity for detecting events happening around the world. The content published on Twitter (tweets) is short and poses diverse challenges for detecting and interpreting event-related information. This article provides insights into ongoing research and helps in understanding recent research trends and techniques used for event detection using Twitter data. We classify techniques and methodologies according to event types, orientation of content, event detection tasks, their evaluation, and common practices. We highlight the limitations of existing techniques and accordingly propose solutions to address the shortcomings. We propose a framework called EDoT based on the research trends, common practices, and techniques used for detecting events on Twitter. EDoT can serve as a guideline for developing event detection methods, especially for researchers who are new to this area. We also describe and compare data collection techniques and the effectiveness and shortcomings of various Twitter and non-Twitter-based features, and we discuss various evaluation measures and benchmarking methodologies. Finally, we discuss the trends, limitations, and future directions for detecting events on Twitter.