34 research outputs found

    Exploring attributes, sequences, and time in Recommender Systems: From classical to Point-of-Interest recommendation

    Unpublished doctoral thesis defended at the Universidad Autónoma de Madrid, Escuela Politécnica Superior, Departamento de Ingeniería Informática. Defense date: 08-07-2021. Since the emergence of the Internet and the spread of digital communications throughout the world, the amount of data stored on the Web has been growing exponentially. In this new digital era, a large number of companies have emerged with the purpose of filtering the information available on the web and providing users with interesting items. The algorithms and models used to recommend these items are called Recommender Systems. These systems are applied to a large number of domains, from music, books, or movies to dating or Point-of-Interest (POI) recommendation, an increasingly popular domain where users receive recommendations of different places when they arrive in a city. In this thesis, we focus on exploiting contextual information, especially temporal and sequential data, and apply it in novel ways in both traditional and Point-of-Interest recommendation. We believe that this type of information can be used not only for creating new recommendation models but also for developing new metrics for analyzing the quality of these recommendations. In one of our first contributions we propose different metrics, some of them derived from previously existing frameworks, using this contextual information. We also propose an intuitive algorithm that provides recommendations to a target user by exploiting the last common interactions with other similar users of the system. At the same time, we conduct a comprehensive review of the algorithms proposed in the area of POI recommendation between 2011 and 2019, identifying the common characteristics and methodologies used. Once this classification of the algorithms proposed to date is completed, we design a mechanism to recommend complete routes (not only independent POIs) to users, making use of reranking techniques. In addition, due to the great difficulty of making recommendations in the POI domain, we propose data aggregation techniques that use information from different cities to generate POI recommendations in a given target city. In the experimental work we present our approaches on different datasets belonging to both classical and POI recommendation. The results obtained in these experiments confirm the usefulness of our recommendation proposals, in terms of ranking accuracy and other dimensions like novelty, diversity, and coverage, and the appropriateness of our metrics for analyzing temporal information and biases in the recommendations produced.

    Exploration de la dynamique humaine basée sur des données massives de réseaux sociaux de géolocalisation : analyse et applications

    Human dynamics is an essential aspect of human-centric computing. As a transdisciplinary research field, it focuses on understanding the underlying patterns, relationships, and changes of human behavior. By exploring human dynamics, we can understand not only individual behavior, such as a presence at a specific place, but also collective behaviors, such as social movements. Understanding human dynamics can thus enable various applications, such as personalized location-based services. However, before ubiquitous smart devices (e.g., smartphones) became available, it was practically hard to collect large-scale human behavior data. With the ubiquity of GPS-equipped smartphones, location-based social media has gained increasing popularity in recent years, making large-scale user activity data attainable. Via location-based social media, users can share their activities as real-time presences at Points of Interest (POIs), such as a restaurant or a bar, within their social circles. Such data brings an unprecedented opportunity to study human dynamics. In this dissertation, based on large-scale location-centric social media data, we study human dynamics from both individual and collective perspectives. From the individual perspective, we study user preference on POIs at different granularities and its applications in personalized location-based services, as well as the spatial-temporal regularity of user activities. From the collective perspective, we explore global-scale collective activity patterns at both country and city granularities, and also identify their correlations with diverse human cultures.

    Taming Uncertainty in Big Data - Evidence from Social Media in Urban Areas

    While the classic definition of Big Data included the dimensions volume, velocity, and variety, a fourth dimension, veracity, has recently come to the attention of researchers and practitioners. The increasing amount of user-generated data associated with the rise of social media emphasizes the need for methods to deal with the uncertainty inherent to these data sources. In this paper we address one aspect of uncertainty by developing a new methodology to establish the reliability of user-generated data based upon causal links with recurring patterns. We associate a large data set of geo-tagged Twitter messages in San Francisco with points of interest, such as bars, restaurants, or museums, within the city. This model is validated by causal relationships between a point of interest and the number of messages in its vicinity. We subsequently analyze the behavior of these messages over time using a jackknifing procedure to identify categories of points of interest that exhibit consistent patterns over time. Ultimately, we condense this analysis into an indicator that gives evidence on the certainty of a data set based on these causal relationships and recurring patterns in temporal and spatial dimensions.

    A Survey on Point-of-Interest Recommendations Leveraging Heterogeneous Data

    Tourism is an important application domain for recommender systems. In this domain, recommender systems are for example tasked with providing personalized recommendations for transportation, accommodation, points-of-interest (POIs), or tourism services. Among these tasks, in particular the problem of recommending POIs that are of likely interest to individual tourists has gained growing attention in recent years. Providing POI recommendations to tourists during their trip can however be especially challenging due to the variability of the users' context. With the rapid development of the Web and today's multitude of online services, vast amounts of data from various sources have become available, and these heterogeneous data sources represent a huge potential to better address the challenges of in-trip POI recommendation problems. In this work, we provide a comprehensive survey of published research on POI recommendation between 2017 and 2022 from the perspective of heterogeneous data sources. Specifically, we investigate which types of data are used in the literature and which technical approaches and evaluation methods are predominant. Among other aspects, we find that today's research works often focus on a narrow range of data sources, leaving great potential for future works that better utilize heterogeneous data sources and diverse data types for improved in-trip recommendations. Comment: 35 pages, 19 figures

    Location-Based Social Network Data for Exploring Spatial and Functional Urban Tourists and Residents Consumption Patterns

    The increasing popularity of urban tourist destinations has been a catalyst for discussion about the geographical circumscription of tourist activity. In this context, Big Data, and more specifically location-based social networks (LBSN), appear as a valuable source of information for approaching tourist and resident spatial interactions from a renewed perspective. This paper focuses on the similarities and differences between tourists' and residents' geographical and functional use of urban economic units. A user classification algorithm has been developed and applied to YELP's Dataset for that purpose. A resident and tourist integration ratio has then been calculated and applied by type of business category, together with the associated spatial distribution, for the 11 metropolitan areas provided in the sample: Champaign (Illinois, US), Charlotte (North Carolina, US), Cleveland (Ohio, US), Edinburgh (Scotland, UK), Las Vegas (Nevada, US), Madison (Wisconsin, US), Montreal (Quebec, CA), Pittsburgh (Pennsylvania, US), Phoenix (Arizona, US), Stuttgart (DE) and Toronto (Ontario, CA). Business category results show strong similarities in how tourists and residents functionally coincide in the use of urban spaces and leisure offer, while there is a clear geographical concentration of activity for both user types in all analysed case studies.

    Ontology-driven urban issues identification from social media.

    Cities worldwide face many issues directly related to the urban space, especially in infrastructure. Most of these urban issues affect the lives of both residents and visitors. For example, people can report a car parked on a footpath that is forcing pedestrians to walk on the road, or a huge pothole that is causing traffic congestion. Besides being related to the urban space, urban issues generally demand action from city authorities. There are many Location-Based Social Networks (LBSN) in the smart cities domain worldwide where people report urban issues in a structured way and local authorities become aware of them in order to fix them. With the advent of social networks such as Facebook and Twitter, people tend to complain in an unstructured, sparse, and unpredictable way, which makes it difficult to identify the urban issues that may be reported. Social media data, especially Twitter messages, photos, and check-ins, have played an important role in smart cities. A key problem is the challenge of identifying specific and relevant conversations when processing noisy crowdsourced data. In this context, this research investigates computational methods to provide automated identification of urban issues shared in social media streams. Most related work relies on classifiers based on machine learning techniques such as Support Vector Machines (SVM), Naïve Bayes, and Decision Trees, and faces problems concerning semantic knowledge representation, human readability, and inference capability. Aiming to overcome this semantic gap, this research investigates ontology-driven Information Extraction (IE) from the perspective of urban issues, since such issues can be semantically linked in LBSN platforms. Therefore, this work proposes an Urban Issues Domain Ontology (UIDO) to enable the identification and classification of urban issues in an automated approach that focuses mainly on the thematic and geographical facets. Experimental evaluation demonstrates that the performance of the proposed approach is competitive with the most commonly used machine learning algorithms when applied to this particular domain. CNP

    Extração e aplicação de indicadores no processo de recomendação de recursos urbanos utilizando dados estruturados e não-estruturados

    Urban Informatics studies the development of systems aimed at urban environments. Because data about such settings are often scattered, take different forms and structures, and in some cases are of doubtful provenance, retrieving and analyzing information becomes non-trivial. In this setting, methods capable of extracting previously unknown or unmeasured information that is of value to some domain are of fundamental importance. Given this perspective, the main objective of this research is to develop an approach capable of extracting and analyzing information expressed in location-based social networks using Text Mining, relating the polarity of the information to the reliability of the profiles that spread it, and also considering the time of evaluation. The resulting indicators are applied in the process of recommending urban resources, and their influence is verified when estimating evaluation metrics. To this end, a methodology based on premises of social network analysis is applied, together with Web Mining approaches in the process of knowledge discovery and data analysis. As a source of information, a dataset containing 6600 observations collected from Foursquare was used, referring to the city of Gramado in Rio Grande do Sul and organized into 13 variables, along with complementary information provided by the DataViva platform. The extracted features were then applied to recommendation algorithms based on neighborhood and on matrix factorization, in order to measure accuracy metrics with their use. From the results, it is observed that, for neighborhood-based algorithms, the proposed approach presented better results when compared to the traditional evaluation approach. However, when using algorithms based on matrix factorization, error rates remain at low means and standard deviations. The results were compared using Wilcoxon tests with 95% confidence, which allows the conclusion that they portray the non-uniformity in the distribution of the samples, evidencing significant differences between the results obtained with the approaches used.

    Revisiting Urban Dynamics through Social Urban Data

    The study of dynamic spatial and social phenomena in cities has evolved rapidly in recent years, yielding new insights into urban dynamics. This evolution is strongly related to the emergence of new sources of data for cities (e.g. sensors, mobile phones, online social media), which have the potential to capture dimensions of social and geographic systems that are difficult to detect in traditional urban data (e.g. census data). However, as the available sources increase in number, the produced datasets increase in diversity. Besides heterogeneity, emerging social urban data are also characterized by multidimensionality. The latter means that the information they contain may simultaneously address spatial, social, temporal, and topical attributes of people and places. Therefore, integration and geospatial (statistical) analysis of multidimensional data remain a challenge. The question that then arises is how to integrate heterogeneous and multidimensional social urban data into the analysis of human activity dynamics in cities. To address this challenge, this thesis proposes the design of a framework of novel methods and tools for the integration, visualization, and exploratory analysis of large-scale and heterogeneous social urban data to facilitate the understanding of urban dynamics. The research focuses particularly on the spatiotemporal dynamics of human activity in cities, as inferred from different sources of social urban data. The main objective is to provide new means to enable the incorporation of heterogeneous social urban data into city analytics, and to explore the influence of emerging data sources on the understanding of cities and their dynamics. To mitigate the various heterogeneities, a methodology for the transformation of heterogeneous data for cities into multidimensional linked urban data is designed. The methodology follows an ontology-based data integration approach and accommodates a variety of semantic (web) and linked data technologies. A use case of data interlinkage is used as a demonstrator of the proposed methodology. The use case employs nine real-world large-scale spatiotemporal data sets from three public transportation organizations, covering the entire public transport network of the city of Athens, Greece. To further encourage the consumption of linked urban data by planners and policy-makers, a set of web-based tools for the visual representation of ontologies and linked data is designed and developed. The tools, comprising the OSMoSys framework, provide graphical user interfaces for the visual representation, browsing, and interactive exploration of both ontologies and linked urban data. After introducing methods and tools for data integration, visual exploration of linked urban data, and derivation of various attributes of people and places from different social urban data, it is examined how they can all be combined into a single platform. To achieve this, a novel web-based system (coined SocialGlass) for the visualization and exploratory analysis of human activity dynamics is designed. The system combines data from various geo-enabled social media (i.e. Twitter, Instagram, Sina Weibo) and LBSNs (i.e. Foursquare), sensor networks (i.e. GPS trackers, Wi-Fi cameras), and conventional socioeconomic urban records, but also has the potential to employ custom datasets from other sources. A real-world case study is used as a demonstrator of the capacities of the proposed web-based system in the study of urban dynamics. The case study explores the potential impact of a city-scale event (i.e. the Amsterdam Light Festival 2015) on the activity and movement patterns of different social categories (i.e. residents, non-residents, foreign tourists), as compared to their daily and hourly routines in the periods before and after the event. The aim of the case study is twofold: first, to assess the potential and limitations of the proposed system and, second, to investigate how different sources of social urban data could influence the understanding of urban dynamics. The contribution of this doctoral thesis is the design and development of a framework of novel methods and tools that enables the fusion of heterogeneous multidimensional data for cities. The framework could help planners, researchers, and policy makers to capitalize on the new possibilities given by emerging social urban data. Having a deep understanding of the spatiotemporal dynamics of cities, and especially of the activity and movement behavior of people, is expected to play a crucial role in addressing the challenges of rapid urbanization. Overall, the framework proposed by this research has the potential to open avenues of quantitative exploration of urban dynamics, contributing to the development of a new science of cities.