63 research outputs found
Extracting Personal Behavioral Patterns from Geo-Referenced Tweets
This paper presents an exploratory study of the potential of geo-referenced Twitter data for extracting knowledge about significant personal places, behaviors and potential interests of people. The study analysed two months’ worth of tweets from residents of the greater Seattle area.
Tracing the German Centennial Flood in the Stream of Tweets: First Lessons Learned
Social microblogging services such as Twitter result in massive streams of georeferenced messages and geolocated status updates. This real-time source of information is invaluable for many application areas, in particular for disaster detection and response scenarios. Consequently, a considerable number of works has dealt with issues of their acquisition, analysis and visualization. Most of these works assume not only an appropriate percentage of georeferenced messages that allows for detecting relevant events for a specific region and time frame, but also that these geolocations are reasonably correct in representing places and times of the underlying spatio-temporal situation. In this paper, we review these two key assumptions based on the results of applying a visual analytics approach to a dataset of georeferenced Tweets from Germany over eight months witnessing several large-scale flooding situations throughout the country. Our results confirm the potential of Twitter as a distributed 'social sensor' but at the same time highlight some caveats in interpreting immediate results. To overcome these limits we explore incorporating evidence from other data sources, including further social media and mobile phone network metrics, to detect, confirm and refine events with respect to location and time. We summarize the lessons learned from our initial analysis by proposing recommendations and outlining possible future work directions.
Normalization of Dutch user-generated content
This paper describes a phrase-based machine translation approach to normalize Dutch user-generated content (UGC). We compiled a corpus of three different social media genres (text messages, message board posts and tweets) to have a sample of this recent domain. We describe the various characteristics of this noisy text material and explain how it has been manually normalized using newly developed guidelines. For the automatic normalization task we focus on text messages, and find that a cascaded SMT system where a token-based module is followed by a translation at the character level gives the best word error rate reduction. After these initial experiments, we investigate the system's robustness on the complete domain of UGC by testing it on the other two social media genres, and find that the cascaded approach performs best on these genres as well. To our knowledge, we deliver the first proof-of-concept system for Dutch UGC normalization, which can serve as a baseline for future work.
Twitter Based Information Extraction
In the modern world of social media dominance, platforms like Twitter and Facebook are probably the best source of up-to-date information. The amount of information available on these platforms is huge, although most of it is unstructured and redundant, which makes the task of extracting information from it much more challenging. Automatic extraction of information from such noisy sources has opened up new opportunities for querying and analyzing data.
This paper is a review of the research that has been done on extracting information such as event dates [1] and on classifying information from social networking platforms like Twitter. We present a brief study of this work, which shows that extracting useful information from Twitter and other social media platforms is indeed feasible. We also give a brief overview of the extraction techniques used by applications in this area: the extraction tasks, the input exploited for extraction, the types of extraction methods used and the type of output produced.
- …