22 research outputs found

    Crowd-sourced Photographic Content for Urban Recreational Route Planning

    Routing services provide travel directions for users of all modes of transport. Most of them focus on functional journeys (i.e. journeys linking a given origin and destination at minimum cost) and pay less attention to recreational trips, in particular leisure walks in an urban context. These walks are additionally constrained by a predefined time or distance and, because their purpose is the process of walking itself, the attractiveness of the areas passed can be an important factor in route selection. This factor is hard to formalise and requires a reliable source of information covering the entire street network. Previous research shows that crowd-sourced data from photo-sharing services has potential as a measure of the attractiveness of space and could thus become the basis for a routing system that suggests leisure walks; ongoing PhD research aims to build such a system. This paper presents findings on four data sources (Flickr, Panoramio, Picasa and Geograph) investigated in Central London and discusses the requirements for the algorithm that is to be implemented in the second half of this PhD research. Visual analytics was chosen as the method for understanding and comparing the obtained datasets, which contain hundreds of thousands of records. Interactive software was developed to identify problems in the data and to estimate the overall suitability of the sources. It was concluded that Picasa and Geograph have problems that make them less suitable for further research, while Panoramio and Flickr require filtering to remove photographs that do not contribute to an understanding of local attractiveness. Based on this analysis, a number of filtering methods were proposed to improve the quality of the datasets and thus provide a more reliable measure to support urban recreational routing.

    Big (Geo)Data in Social Sciences: Challenges and Opportunities

    We are currently witnessing a revolution in the production and processing of massive data (Big Data). Although the main users of such data are companies, social researchers have also found interesting possibilities in the analysis of Big Data, with new approaches to old questions or even by raising issues that could not be addressed with traditional data. 
This article reviews research papers that use geolocated massive data, Big (Geo)Data, and shows examples of their application in research, grouping the papers by data source: mobile phone call records, social networks, communities of geolocated photos, credit card transaction records, transport smart cards, car navigators, etc. The paper concludes with some reflections on the advantages of Big (Geo)Data in social sciences research (high temporal and spatial resolution and, in many cases, global coverage and free availability), but it also highlights some of the main problems arising from their use, such as bias, the difficulty of processing and, in many cases, barriers to access.

    A new heuristic algorithm to create customized tourist routes

    Adapting tourism activities to tourists' preferences is crucial in the twenty-first century. This study proposes a new heuristic algorithm for creating customized tourist routes in historic city centers that can be applied to tourism web platforms or mobile applications. First, the state of the art is reviewed to provide a frame of reference. The new algorithm is then described; it consists of two parts. The first part determines which points of interest the tourist should visit according to their preferences: the parameters to be taken into account are selected and a new method for quantifying them is proposed. The second part determines the order in which the points of interest should be visited so that the distance travelled is as short as possible; to do so, the points of interest are grouped and the groups are linked together. 
The result is an algorithm that adapts to the given preferences and yields short walking distances. This article is part of the SIT-MAD research project, funded by the Fundación Hergar.
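
The two-part structure described in this abstract (choose points of interest by preference, then order them to keep the walk short) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the weighted-sum scoring rule, the sample data, and the nearest-neighbour ordering, which stands in for the paper's own grouping-and-linking step and is not the authors' method.

```python
import math

# Hypothetical sample data: each point of interest has planar coordinates
# and category weights. None of these values come from the paper.
POIS = [
    {"name": "cathedral", "xy": (0.0, 3.0), "categories": {"history": 1.0}},
    {"name": "museum", "xy": (1.0, 1.0), "categories": {"art": 0.8, "history": 0.4}},
    {"name": "market", "xy": (5.0, 0.0), "categories": {"food": 1.0}},
    {"name": "gallery", "xy": (2.0, 2.0), "categories": {"art": 1.0}},
]


def score(poi, prefs):
    """Weighted sum of the tourist's preferences over the POI's categories (assumed rule)."""
    return sum(prefs.get(cat, 0.0) * weight for cat, weight in poi["categories"].items())


def select_pois(pois, prefs, k):
    """Part 1: keep the k points of interest that best match the preferences."""
    return sorted(pois, key=lambda p: score(p, prefs), reverse=True)[:k]


def order_nearest_neighbour(start, pois):
    """Part 2 (stand-in heuristic): repeatedly walk to the closest unvisited POI."""
    route, current, remaining = [], start, list(pois)
    while remaining:
        nearest = min(remaining, key=lambda p: math.dist(current, p["xy"]))
        remaining.remove(nearest)
        route.append(nearest)
        current = nearest["xy"]
    return route


def plan_route(start, prefs, k, pois=POIS):
    """Return the names of the selected POIs in visiting order."""
    return [p["name"] for p in order_nearest_neighbour(start, select_pois(pois, prefs, k))]
```

For example, `plan_route((0.0, 0.0), {"art": 1.0, "history": 0.5}, 3)` drops the market, which matches none of the stated preferences, and orders the remaining three stops greedily by distance.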

    Searching and mining in enriched geo-spatial data

    The emergence of new data collection mechanisms in geo-spatial applications, paired with a heightened tendency of users to volunteer information, provides an ever-increasing flow of data that is high in volume, complex in nature, and often associated with inherent uncertainty. Such mechanisms include crowdsourcing, automated knowledge inference, tracking, and social media data repositories. Data that carry additional information from multiple sources, such as probability distributions, text or numerical attributes, social context, or multimedia content, can be called multi-enriched. Searching and mining this abundance of information poses many challenges if the full potential of the data is to be realised. This thesis addresses several major issues arising in that field, namely path queries using multi-enriched data, trend mining in social media data, and handling uncertainty in geo-spatial data. In all cases, the developed methods have made significant contributions and have appeared in or been accepted by various renowned international peer-reviewed venues. A common use of geo-spatial data is path queries in road networks, where traditional methods optimise results based on absolute and often singular metrics, i.e., finding the shortest path by distance or the best trade-off between distance and travel time. Integrating additional aspects such as qualitative or social data, by enriching the data model with knowledge derived from the sources mentioned above, allows queries to fit a broader scope of needs or preferences. This thesis presents two approaches to incorporating multi-enriched data into road networks. In the first, a range of qualitative data sources is evaluated to gain knowledge about user preferences, which is subsequently matched with locations represented in a road network and integrated into its components. Several methods are presented for highly customisable path queries that incorporate a wide spectrum of data. 
In the second, a framework is described for resource distribution with reappearance in road networks to serve one or more clients, resulting in paths that provide maximum gain based on a probabilistic evaluation of available resources. Applications include finding parking spots. Social media trends are an emerging research area that gives insight into user sentiment and important topics. Such trends consist of bursts of messages concerning a certain topic within a time frame, deviating significantly from the average appearance frequency of the same topic. By investigating the dissemination of such trends in space and time, this thesis presents methods that classify trend archetypes in order to predict the future dissemination of a trend. Processing and querying uncertain data is particularly demanding given the additional knowledge required to yield results with probabilistic guarantees. Since such knowledge is not always available and queries are not easily scaled to larger datasets due to the #P-complete nature of the problem, many existing approaches reduce the data to a deterministic representation of its underlying model to eliminate uncertainty. However, data uncertainty can also provide valuable insight into the nature of the data that cannot be represented in a deterministic manner. This thesis presents techniques for clustering uncertain data and for query processing that take the additional information from uncertainty models into account while preserving scalability through a sampling-based approach, whereas previous approaches could provide only one of the two. 
The given solutions enable the application of various existing clustering techniques or query types to a framework that manages the uncertainty.
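
The abstract characterises a social media trend as a burst of messages on a topic whose frequency deviates significantly from that topic's average. A minimal sketch of that idea follows; the z-score test, the default threshold of 2.0 and the function name `find_bursts` are assumptions chosen for illustration and are not taken from the thesis.

```python
from statistics import mean, pstdev


def find_bursts(counts, threshold=2.0):
    """Return the indices of time windows whose message count is a burst.

    counts: number of messages about one topic in consecutive time windows.
    A window is flagged when its count lies more than `threshold` population
    standard deviations above the mean count (an assumed criterion).
    """
    mu = mean(counts)
    sigma = pstdev(counts)
    if sigma == 0:
        # A perfectly flat series has no deviation, hence no bursts.
        return []
    return [i for i, c in enumerate(counts) if (c - mu) / sigma > threshold]
```

With hourly counts such as `[3, 4, 2, 5, 3, 40, 4, 3]`, only the window holding 40 messages is flagged. A single extreme window inflates both the mean and the standard deviation, so a robust baseline (for instance a median) might be preferable in practice.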

    Exploiting Flickr meta-data for predicting environmental features

    The photo-sharing website Flickr has come to be used as an informal information source in disciplines such as geography and ecology. Many recent studies have highlighted the fact that Flickr tags capture valuable ecological information, which can complement more traditional sources. A shortcoming of most of these existing methods is that they rely on manual interpretation of Flickr content, with little automated exploitation of the associated tags. Therefore, they fail to exploit the full potential of the data. Automatically extracting and analysing information from unstructured and noisy data remains a hard task. This research aims to investigate the use of Flickr meta-data for predicting a wide variety of environmental phenomena. In particular, we consider the problem of predicting scenicness, species distribution, land cover, and climate-related features. To this end, we developed several novel machine learning methods that can efficiently utilise Flickr tags as a supplementary source to the structured information that is available from traditional scientific resources. The first proposed method aims at modelling locations, and hence inferring environmental phenomena, using georeferenced Flickr tags. Our focus was on comparing the predictive power of Flickr tags with that of structured environmental data. This method represents each location as a concatenation of two feature vectors: a bag-of-words representation derived from Flickr and a feature vector encoding the numerical and categorical features obtained from the structured dataset. We found that Flickr was generally competitive with the structured environmental data for prediction, being sometimes better and sometimes worse. However, combining Flickr tags with existing ecological data sources consistently improved the results, which suggests that Flickr can indeed be regarded as complementary to traditional sources. 
The second method that we propose is based on a collective prediction model, which crucially relies on Flickr tags to define the neighbourhood structure. The use of a collective prediction formulation is motivated by the fact that most environmental features are strongly spatially autocorrelated. While this suggests that geographic distance should play a key role in determining neighbourhoods, we show that considerable gains can be made by additionally taking Flickr tags and traditional data into consideration. The thesis considers two further novel methods, both based on a low-dimensional vector space representation. The first model, called EGEL (Embedding Geographic Locations), learns vector space embeddings of geographic locations by integrating the textual information derived from Flickr with the numerical and categorical information derived from environmental datasets. We show experimentally that this method improves on bag-of-words representation approaches, especially in cases where structured data are available. This model has been extended by considering a spatiotemporal representation of regions. In particular, we propose a spatiotemporal embeddings model, called SPATE (Spatiotemporal Embeddings), which learns a vector space embedding for each geographic region and each month of the year. This allows the model to capture environmental phenomena that may depend on monthly or seasonal variation. Apart from extending our primary model, SPATE also includes a new smoothing method to deal with the sparsity of Flickr tags over the considered spatiotemporal setup. The experimental results presented in this thesis confirm our hypothesis that Flickr tags contain valuable information that can be used to predict environmental features.
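
The first method in this abstract represents a location as the concatenation of a bag-of-words vector built from Flickr tags and a vector of structured features. The sketch below shows only that representation step. The raw-count weighting, the fixed vocabulary, the function names and the sample values are assumptions made for illustration; the thesis may weight or normalise tags differently.

```python
from collections import Counter


def bag_of_words(tags, vocabulary):
    """Count how often each vocabulary term occurs among a location's tags.

    Tags outside the vocabulary are ignored (assumed behaviour).
    """
    counts = Counter(tags)
    return [float(counts[term]) for term in vocabulary]


def location_vector(tags, vocabulary, structured):
    """Concatenate the tag-based vector with the structured feature vector."""
    return bag_of_words(tags, vocabulary) + list(structured)
```

Under these assumptions, `location_vector(["forest", "forest", "snow", "lake"], ["beach", "forest", "snow"], [12.5, 0.3])` yields a five-dimensional vector: three tag counts followed by the two structured values, which could then be passed to any standard classifier or regressor.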