35 research outputs found

    Georeferencing flickr resources based on textual meta-data

    The task of automatically estimating the location of web resources is of central importance in location-based services on the Web. Much attention has been focused on Flickr photos and videos, for which it was found that language modeling approaches are particularly suitable. In particular, state-of-the-art systems for georeferencing Flickr photos tend to cluster the locations on Earth into a relatively small set of disjoint regions, apply feature selection to identify location-relevant tags, then use a form of text classification to identify which area is most likely to contain the true location of the resource, and finally attempt to find an appropriate location within the identified area. In this paper, we present a systematic discussion of each of the aforementioned components, based on the lessons we have learned from participating in the 2010 and 2011 editions of MediaEval’s Placing Task. Extensive experimental results allow us to analyze why certain methods work well on this task and show that a median error of just over 1 km can be achieved on a standard benchmark test set.
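
    The text classification step described in this abstract can be illustrated with a minimal sketch. It assumes the areas have already been produced by clustering and that tags have already been filtered by feature selection. The multinomial model with add-one smoothing, the function names `train` and `most_likely_area`, and the toy data are assumptions made for illustration; the abstract does not state which smoothing or scoring the paper uses.

```python
import math
from collections import Counter, defaultdict


def train(photos):
    """Collect per-area tag counts from (tags, area_id) training pairs."""
    tag_counts = defaultdict(Counter)  # area -> tag -> count
    area_counts = Counter()            # area -> number of training photos
    vocab = set()
    for tags, area in photos:
        area_counts[area] += 1
        tag_counts[area].update(tags)
        vocab.update(tags)
    return tag_counts, area_counts, vocab


def most_likely_area(tags, model):
    """Return the area maximising log P(area) + sum of log P(tag | area)."""
    tag_counts, area_counts, vocab = model
    total_photos = sum(area_counts.values())
    best_area, best_score = None, -math.inf
    for area, n_photos in area_counts.items():
        # Add-one (Laplace) smoothing over the training vocabulary.
        denom = sum(tag_counts[area].values()) + len(vocab)
        score = math.log(n_photos / total_photos)
        for tag in tags:
            if tag in vocab:  # tags never seen in training are ignored
                score += math.log((tag_counts[area][tag] + 1) / denom)
        if score > best_score:
            best_area, best_score = area, score
    return best_area


# Hypothetical toy data: area identifiers "A" and "B" stand for two clusters.
training = [
    (["eiffel", "paris"], "A"),
    (["louvre", "paris"], "A"),
    (["bigben", "london"], "B"),
]
model = train(training)
print(most_likely_area(["paris", "tower"], model))
```

    Once an area is chosen, a location inside it still has to be picked, for example from the training photos in that area; that last step is not shown here.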

    Georeferencing text using social media

    Georeferencing Flickr photos using language models at different levels of granularity: an evidence based approach

    The topic of automatically assigning geographic coordinates to Web 2.0 resources based on their tags has recently gained considerable attention. However, the coordinates that are produced by automated techniques are necessarily variable, since not all resources are described by tags that are sufficiently descriptive. Thus there is a need for adaptive techniques that assign locations to photos at the right level of granularity or, in some cases, even refrain from making any estimate of location at all. To this end, we consider the idea of training language models at different levels of granularity, and combining the evidence provided by these language models using Dempster and Shafer’s theory of evidence. We provide experimental results which clearly confirm that the increased spatial awareness that is thus gained allows us to make better informed decisions, and moreover increases the overall accuracy of the individual language models.
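
    As a rough illustration of how evidence from two granularity levels could be combined, the sketch below implements Dempster's rule of combination for two mass functions. The region names and mass values are hypothetical, and the abstract does not say how the paper derives masses from its language models, so this shows only the combination rule itself.

```python
from itertools import product


def combine(m1, m2):
    """Dempster's rule of combination.

    Each mass function maps a frozenset of hypotheses to a mass in [0, 1],
    with the masses of each function summing to 1.
    """
    combined = {}
    conflict = 0.0
    for (a, mass_a), (b, mass_b) in product(m1.items(), m2.items()):
        intersection = a & b
        if intersection:
            combined[intersection] = combined.get(intersection, 0.0) + mass_a * mass_b
        else:
            # Mass assigned to incompatible hypotheses counts as conflict.
            conflict += mass_a * mass_b
    if conflict >= 1.0:
        raise ValueError("the two sources are in total conflict")
    # Renormalise so that the remaining masses sum to 1.
    return {s: mass / (1.0 - conflict) for s, mass in combined.items()}


# Hypothetical frame of discernment with three fine-grained regions.
frame = frozenset({"paris_centre", "paris_north", "london"})
# A coarse model supports "somewhere in Paris"; a fine model supports one cell.
coarse = {frozenset({"paris_centre", "paris_north"}): 0.8, frame: 0.2}
fine = {frozenset({"paris_centre"}): 0.6, frame: 0.4}

for hypothesis, mass in combine(coarse, fine).items():
    print(sorted(hypothesis), round(mass, 2))
```

    Mass left on the whole frame expresses ignorance, which is what would let a system refrain from committing to a fine-grained location when the tags are uninformative.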

    Knowledge-based and data-driven approaches for geographical information access

    Geographical Information Access (GeoIA) can be defined as a way of retrieving information from textual collections that includes the automatic analysis and interpretation of the geographical constraints and terms present in queries and documents. This PhD thesis presents, describes and evaluates several heterogeneous approaches for the following three GeoIA tasks: Geographical Information Retrieval (GIR), Geographical Question Answering (GeoQA), and Textual Georeferencing (TG). The GIR task deals with user queries that search over documents (e.g. "vineyards in California") and the GeoQA task treats questions that retrieve answers (e.g. "What is the capital of France?"). TG, on the other hand, is the task of associating one or more georeferences (such as polygons or coordinates in a geodetic reference system) with electronic documents. Current state-of-the-art AI algorithms do not yet fully understand the semantic meaning and the geographical constraints and terms present in queries and document collections. This thesis attempts to improve the effectiveness of GeoIA tasks by: 1) improving the detection, understanding, and use of a part of the geographical and the thematic content of queries and documents with Toponym Recognition, Toponym Disambiguation and Natural Language Processing (NLP) techniques, and 2) combining Geographical Knowledge-Based Heuristics based on common sense with Data-Driven IR algorithms. The main contributions of this thesis to the state of the art of GeoIA tasks are: 1) The presentation of 10 novel approaches for GeoIA tasks: 3 approaches for GIR, 3 for GeoQA, and 4 for Textual Georeferencing (TG). 2) The evaluation of these novel approaches in these contexts: within official evaluation benchmarks, after evaluation benchmarks with the test collections, and with other specific datasets. 
Most of these algorithms have been evaluated in international evaluations and some of them achieved top-ranked state-of-the-art results, including top-performing results in GIR (GeoCLEF 2007) and TG (MediaEval 2014) benchmarks. 3) The experiments reported in this PhD thesis show that the approaches can effectively combine Geographical Knowledge and NLP with Data-Driven techniques to improve the effectiveness measures of the three Geographical Information Access tasks investigated. 4) TALPGeoIR: a novel GIR approach that combines Geographical Knowledge ReRanking (GeoKR), NLP and Relevance Feedback (RF) and that achieved state-of-the-art results in official GeoCLEF benchmarks (Ferrés and Rodríguez, 2008; Mandl et al., 2008) and subsequent experiments (Ferrés and Rodríguez, 2015a). This approach has been evaluated with the full GeoCLEF corpus (100 topics) and showed that GeoKR, NLP, and RF techniques, evaluated separately or in combination, improve the MAP and R-Precision results of the state-of-the-art IR algorithms TF-IDF, BM25 and InL2, with statistical significance in most of the experiments. 5) GeoTALP-QA: a scope-based GeoQA approach for Spanish and English and its evaluation with a set of questions on Spanish geography (Ferrés and Rodríguez, 2006). 6) Four Textual Georeferencing approaches for informal and formal documents that achieved state-of-the-art results in evaluation benchmarks (Ferrés and Rodríguez, 2014) and subsequent experiments (Ferrés and Rodríguez, 2011; Ferrés and Rodríguez, 2015b).

    Detection, Modelling and Visualisation of Georeferenced Emotions from User-Generated Content

    In recent years, emotion-related applications, such as smartphone apps that document and analyse the emotions of the user, have become very popular. Research can also deal with human emotions in a very technology-driven way. Space-related emotions, which can be captured in different ways and visualised cartographically, are therefore of interest as well. The research project of this dissertation deals with the extraction of georeferenced emotions from the written language in the metadata of Flickr and Panoramio photos, that is, from user-generated content, as well as with their modelling and visualisation. The motivation is the integration of an emotional component into location-based services for tourism, since only factual information has been considered thus far although places have an emotional impact. The metadata of these user-generated photos contain descriptions of the place depicted in the respective picture. The words used have affective connotations, which are determined with the help of emotional word lists. The emotion associated with a particular word in the word list is described on the basis of the two dimensions ‘valence’ and ‘arousal’. Together with the coordinates of the respective photo, the extracted emotion forms a georeferenced emotion. The algorithm developed for the extraction of these emotions applies different approaches from the field of computational linguistics and considers grammatical special cases such as the amplification or negation of words. The algorithm was applied to a dataset of Flickr and Panoramio photos of Dresden (Germany). The results are an emotional characterisation of space which makes it possible to assess and investigate specific features of georeferenced emotions. These features relate, on the one hand, to the temporal dependence and the temporal reference of emotions; on the other hand, collectively and individually perceived emotions have to be distinguished. 
As a consequence, a place does not necessarily have to be connected with merely one emotion but possibly with several. The analysis was carried out with the help of different cartographic visualisations. The temporal occurrence of georeferenced emotions was examined in detail. Hence the dissertation focuses on fundamental research into the extraction of space-related emotions from georeferenced user-generated content as well as their visualisation. As an outlook, however, further research questions and core themes that arose during the investigations are identified, which shows that this subject is far from being exhausted.
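
    The extraction idea summarised in this abstract (word-list lookup of valence and arousal, with degree words and negation treated as special cases) can be sketched as follows. This is not the dissertation's algorithm: the word list, its scores on a scale from -1 to 1, the degree factors, the rule that a modifier affects only the word immediately after it, and the function names are all assumptions made for illustration.

```python
# Hypothetical emotional word list: word -> (valence, arousal), both in [-1, 1].
LEXICON = {
    "beautiful": (0.8, 0.5),
    "ugly": (-0.7, 0.4),
    "calm": (0.5, -0.6),
}
# Hypothetical degree words and negation words.
DEGREE = {"very": 1.5, "slightly": 0.5}
NEGATIONS = {"not", "never", "no"}


def extract_emotion(text):
    """Return the mean (valence, arousal) of the emotional words, or None."""
    scores = []
    scale, negated = 1.0, False
    for raw in text.lower().split():
        token = raw.strip(".,!?")
        if token in NEGATIONS:
            negated = True
        elif token in DEGREE:
            scale *= DEGREE[token]
        elif token in LEXICON:
            valence, arousal = LEXICON[token]
            # Amplify or dampen the valence, keeping it inside [-1, 1].
            valence = max(-1.0, min(1.0, valence * scale))
            if negated:
                valence = -valence  # simplification: negation flips the sign
            scores.append((valence, arousal))
            scale, negated = 1.0, False
        else:
            # Simplification: modifiers only affect the very next word.
            scale, negated = 1.0, False
    if not scores:
        return None
    n = len(scores)
    return (sum(v for v, _ in scores) / n, sum(a for _, a in scores) / n)


def georeferenced_emotion(lat, lon, text):
    """Pair the extracted emotion with the photo coordinates."""
    emotion = extract_emotion(text)
    if emotion is None:
        return None
    return {"lat": lat, "lon": lon, "valence": emotion[0], "arousal": emotion[1]}


# Hypothetical photo description with approximate coordinates of Dresden.
print(georeferenced_emotion(51.05, 13.74, "Very beautiful view, not ugly at all"))
```

    Averaging is only one way to aggregate the words of a description; syntactic handling of negation, as the thesis outline suggests for English and German, would need a parser rather than this adjacent-word rule.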

    Geotagging Text Content With Language Models and Feature Mining

    Knowledge management: the issue of multimedia contents

    Knowledge Management is a very important topic in business and in academic research. There are many fields of application for knowledge management, including cognitive science, sociology, management science, information science, knowledge engineering, artificial intelligence, and economics. Many studies on different aspects of Knowledge Management have been published, becoming common in the early 1990s. In this work, we represent knowledge through a mixed-iterative approach, in which top-down and bottom-up analyses of the knowledge domain to be represented are applied; both are typical approaches for this kind of problem. Here they are applied iteratively, which allows, through successive refinements, an efficient formalization able to represent the domain knowledge of interest. We start from the concept of the “domain knowledge base”. The fundamental body of knowledge available on a domain is the knowledge valuable for the knowledge users. We need to represent and manage this knowledge, and to define a formalization and codification of the knowledge in the domain. After this formalization we can manage this knowledge using knowledge repositories. In this thesis, we present four different formalizations and ways of managing knowledge for multimedia contents, using our proposed approach: 1. User Generated Contents from well-known platforms (Flickr, YouTube, etc.); 2. audio recordings from a linguistic corpus, together with the information added to that corpus through annotations; 3. knowledge associated with construction processes; 4. descriptions and reviews of Italian wines. The most important result we achieved with this thesis was the opportunity to make this disparate knowledge available and manageable. In the current market, exploiting existing knowledge is a mainstream business, but in order to exploit it, one must be able to manage it first. 
As a token of this importance, the studies discussed in this thesis gave rise not only to about ten scientific publications but, above all, to a number of industrial research projects in partnership with ICT companies, one of which has a total value above one million Euros.
