
    A Survey on Cybercrime Using Social Media

    The increased use of social media for victimization and criminal activities has generated growing interest in automating crime detection and prevention for large populations. The area is frequently researched because social media enables criminals to reach a large audience. While several studies have investigated specific crimes on social media, a comprehensive review that examines all types of social media crime, their similarities, and their detection methods is still lacking. Identifying similarities among crimes and detection methods can facilitate knowledge and data transfer across domains. The goal of this study is to collect a library of social media crimes and establish their connections using a crime taxonomy. The survey also identifies publicly accessible datasets and suggests directions for further research in this area.
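
    As a loose illustration of what such a crime taxonomy could look like in code (the crime types and detection methods below are invented placeholders, not the survey's actual taxonomy), a minimal Python sketch might map each crime to detection methods and invert the mapping to expose methods shared across domains:

    # Hypothetical sketch of a social-media crime taxonomy: crime types mapped to
    # detection methods, so methods shared across crimes can be identified.
    from collections import defaultdict

    taxonomy = {
        "cyberbullying":  {"text classification", "sentiment analysis"},
        "hate speech":    {"text classification", "lexicon matching"},
        "phishing/scams": {"URL analysis", "text classification"},
        "identity theft": {"anomaly detection", "URL analysis"},
    }

    # Invert the mapping: which crimes can each detection method address?
    methods_to_crimes = defaultdict(set)
    for crime, methods in taxonomy.items():
        for method in methods:
            methods_to_crimes[method].add(crime)

    for method, crimes in methods_to_crimes.items():
        print(f"{method}: {sorted(crimes)}")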

    Location Reference Recognition from Texts: A Survey and Comparison

    A vast amount of location information exists in unstructured texts, such as social media posts, news stories, scientific articles, web pages, travel blogs, and historical archives. Geoparsing refers to recognizing location references from texts and identifying their geospatial representations. While geoparsing can benefit many domains, a summary of its specific applications is still missing. Further, there is a lack of a comprehensive review and comparison of existing approaches for location reference recognition, which is the first and core step of geoparsing. To fill these research gaps, this review first summarizes seven typical application domains of geoparsing: geographic information retrieval, disaster management, disease surveillance, traffic management, spatial humanities, tourism management, and crime management. We then review existing approaches for location reference recognition by categorizing these approaches into four groups based on their underlying functional principle: rule-based, gazetteer matching-based, statistical learning-based, and hybrid approaches. Next, we thoroughly evaluate the correctness and computational efficiency of the 27 most widely used approaches for location reference recognition based on 26 public datasets with different types of texts (e.g., social media posts and news stories) containing 39,736 location references worldwide. Results from this thorough evaluation can help inform future methodological developments and can help guide the selection of proper approaches based on application needs.
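
    For illustration only, the sketch below contrasts two of the approach families named above, a gazetteer-matching baseline and a statistical-learning recognizer; it relies on spaCy's pretrained English model and a toy gazetteer, neither of which comes from the survey:

    # Sketch: two simple location-reference recognizers.
    # Assumes `pip install spacy` and `python -m spacy download en_core_web_sm`.
    import spacy

    text = "Flooding reported near Porto and along the Douro river this morning."

    # 1) Gazetteer-matching baseline: case-insensitive lookup in a toy gazetteer.
    gazetteer = {"porto", "lisbon", "douro"}          # illustrative entries only
    matches = [tok for tok in text.replace(".", "").replace(",", "").split()
               if tok.lower() in gazetteer]
    print("gazetteer matches:", matches)

    # 2) Statistical-learning approach: pretrained NER, keeping location entities.
    nlp = spacy.load("en_core_web_sm")
    doc = nlp(text)
    locations = [ent.text for ent in doc.ents if ent.label_ in ("GPE", "LOC", "FAC")]
    print("NER locations:", locations)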

    Crime prediction and monitoring in Porto, Portugal, using machine learning, spatial and text analytics

    Crime is a common societal concern that affects quality of life and economic growth. Despite a global decrease in crime statistics, specific types of crime and feelings of insecurity have often increased, leaving safety and security agencies with the need to apply novel approaches and advanced systems to better predict and prevent occurrences. The use of geospatial technologies, combined with data mining and machine learning techniques, allows for significant advances in the criminology of place. In this study, official police data from Porto, Portugal, covering 2016 to 2018, was georeferenced and processed with spatial analysis methods, which allowed the identification of spatial patterns and relevant hotspots. Machine learning processes were then applied for space-time pattern mining. Using lasso regression analysis, significant crime variables were identified, with random forest and decision tree models supporting the selection of important variables. Lastly, tweets related to insecurity were collected, and topic modeling and sentiment analysis were performed. Together, these methods assist the interpretation of patterns, prediction, and ultimately the performance of both police and planning professionals.
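
    The variable-selection step described above can be sketched roughly as follows; the data are synthetic stand-ins for the georeferenced Porto records, and scikit-learn's lasso and random forest are used as generic substitutes for the study's exact setup:

    # Sketch: lasso-based selection cross-checked with random-forest importances.
    import numpy as np
    from sklearn.linear_model import LassoCV
    from sklearn.ensemble import RandomForestRegressor

    rng = np.random.default_rng(0)
    n = 500
    X = rng.normal(size=(n, 6))                 # candidate spatio-temporal predictors
    y = 2.0 * X[:, 0] - 1.5 * X[:, 3] + rng.normal(scale=0.5, size=n)  # crime-count proxy

    # Lasso shrinks uninformative coefficients to exactly zero.
    lasso = LassoCV(cv=5).fit(X, y)
    selected = np.flatnonzero(lasso.coef_ != 0)
    print("lasso-selected variables:", selected)

    # A random forest gives a second, non-linear ranking of the same candidates.
    forest = RandomForestRegressor(n_estimators=200, random_state=0).fit(X, y)
    ranking = np.argsort(forest.feature_importances_)[::-1]
    print("forest importance ranking:", ranking)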

    A combined classification-clustering framework for identifying disruptive events

    Twitter is a popular micro-blogging web application serving hundreds of millions of users. Users publish short messages to communicate with friends and family, express their opinions, and broadcast news and information about a variety of topics, all in real time. This user-generated content can be utilized as a rich source for identifying real-world events and for extracting useful knowledge about disruptive events in a given region. In this paper, we propose a novel detection framework for identifying real-time events, including a main event and associated disruptive events, from Twitter data. The approach is based on five steps: data collection, pre-processing, classification, online clustering and summarization. We use a Naïve Bayes classification model and an online clustering method, and validate the framework on a major real-world event (the Formula 1 Abu Dhabi Grand Prix 2013).
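
    A minimal sketch of the classify-then-cluster idea, assuming TF-IDF features, scikit-learn's MultinomialNB, and MiniBatchKMeans as a stand-in for the paper's online clustering step; all tweets and labels below are invented placeholders:

    # Sketch: filter event-related tweets with Naive Bayes, then group the
    # survivors into candidate (disruptive) sub-events.
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.naive_bayes import MultinomialNB
    from sklearn.cluster import MiniBatchKMeans

    train_tweets = ["huge traffic jam near the circuit", "great dinner with friends",
                    "road closed before the grand prix", "watching a movie tonight"]
    train_labels = [1, 0, 1, 0]                     # 1 = event-related, 0 = not

    vec = TfidfVectorizer()
    clf = MultinomialNB().fit(vec.fit_transform(train_tweets), train_labels)

    stream = ["accident near the circuit entrance",
              "fans stuck in traffic before the grand prix",
              "road closed again near gate 3",
              "new phone arrived today"]
    event_related = [t for t, y in zip(stream, clf.predict(vec.transform(stream)))
                     if y == 1]

    # Online-style clustering of the retained posts into candidate sub-events.
    clusters = MiniBatchKMeans(n_clusters=2, random_state=0)
    labels = clusters.fit_predict(vec.transform(event_related))
    print(list(zip(event_related, labels)))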

    Predictive Analysis on Twitter: Techniques and Applications

    Predictive analysis of social media data has attracted considerable attention from the research community as well as the business world because of the essential and actionable information it can provide. Over the years, extensive experimentation and analysis have been carried out on Twitter data in domains such as healthcare, public health, politics, social sciences, and demographics. In this chapter, we discuss techniques, approaches and state-of-the-art applications of predictive analysis of Twitter data. Specifically, we present fine-grained analysis involving aspects such as sentiment and emotion, discuss the use of domain knowledge in the coarse-grained analysis of Twitter data for making decisions and taking actions, and relate a few success stories.
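
    As one small, hedged example of the sentiment aspect mentioned above (not the chapter's own tooling), NLTK's VADER lexicon, which was built with social media text in mind, can assign coarse sentiment labels to individual tweets:

    # Sketch: lexicon-based sentiment scoring of tweets with VADER (NLTK).
    import nltk
    from nltk.sentiment import SentimentIntensityAnalyzer

    nltk.download("vader_lexicon", quiet=True)
    sia = SentimentIntensityAnalyzer()

    tweets = [                                   # invented placeholder tweets
        "Loving the new vaccination drive, great job!",
        "Stuck in traffic for two hours, this city is a mess.",
    ]
    for t in tweets:
        scores = sia.polarity_scores(t)          # neg/neu/pos plus compound in [-1, 1]
        c = scores["compound"]
        label = "positive" if c >= 0.05 else "negative" if c <= -0.05 else "neutral"
        print(label, c, t)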

    Robust part-of-speech tagging of social media text

    Part-of-speech (PoS) taggers are an important processing component in many Natural Language Processing (NLP) applications, which has led to a variety of taggers for tackling this task. Recent work in this field has shown that tagging accuracy on informal text domains is poor in comparison to formal text domains. In particular, social media text, which is inherently different from formal standard text, leads to a drastically increased error rate. These challenges originate in a lack of robustness of taggers towards domain transfers, and the increased error rate has an impact on NLP applications that depend on PoS information. The main contribution of this thesis is the exploration of the concept of robustness under the following three aspects: (i) domain robustness, (ii) language robustness and (iii) long tail robustness. Regarding (i), we start with an analysis of the phenomena found in informal text that make tagging this kind of text challenging. Furthermore, we conduct a comprehensive robustness comparison of many commonly used taggers for English and German by evaluating them on text from several domains. We find that the tagging of informal text is poorly supported by available taggers. A review and analysis of currently used methods to adapt taggers to informal text showed that these methods improve tagging accuracy but offer no satisfactory solution. We propose an alternative tagging approach that achieves increased multi-domain tagging robustness. This approach is based on tagging in two steps: the first step tags at a coarse-grained level, and the second step refines these coarse tags into fine-grained tags. Regarding (ii), we investigate whether each language requires a language-tailored PoS tagger or whether the construction of a competitive language-independent tagger is feasible. We explore the technical details that contribute to a tagger's language robustness by comparing taggers based on different algorithms to learn models of 21 languages. We find that language robustness is a less severe issue and that the impact of the tagger choice depends more on the granularity of the tagset to be learned than on the language. Regarding (iii), we investigate methods to improve the tagging of infrequent phenomena for which no sufficient amount of annotated training data is available, a common challenge in the social media domain. We propose a new method to overcome this lack of data, offering an inexpensive way of producing more training data. In a field study, we show that the quality of the produced data suffices to train tagger models that can recognize these under-represented phenomena. Furthermore, we present two software tools, FlexTag and DeepTC, which we developed in the course of this thesis. These tools provide the necessary flexibility for conducting all the experiments in this thesis and ensure their reproducibility.
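
    The two-step idea can be sketched with off-the-shelf NLTK components; this is a conceptual illustration, not the thesis's FlexTag implementation, and the refinement step is simplified to a most-frequent-tag lookup learned from the Penn Treebank sample:

    # Sketch: coarse-to-fine PoS tagging in two steps (conceptual only).
    from collections import Counter, defaultdict
    import nltk
    from nltk import pos_tag
    from nltk.corpus import treebank
    from nltk.tag.mapping import map_tag

    for res in ("averaged_perceptron_tagger", "averaged_perceptron_tagger_eng",
                "treebank", "universal_tagset"):
        nltk.download(res, quiet=True)

    # Learn a refinement table: (word, coarse tag) -> counts of fine-grained tags.
    refine = defaultdict(Counter)
    for word, fine in treebank.tagged_words():
        coarse = map_tag("en-ptb", "universal", fine)
        refine[(word.lower(), coarse)][fine] += 1

    def tag_two_step(sentence):
        tokens = sentence.split()
        coarse_tags = pos_tag(tokens, tagset="universal")   # step 1: coarse tagging
        result = []
        for word, coarse in coarse_tags:                     # step 2: refine per token
            counts = refine.get((word.lower(), coarse))
            fine = counts.most_common(1)[0][0] if counts else coarse
            result.append((word, coarse, fine))
        return result

    print(tag_two_step("The tagger handles informal text surprisingly well"))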

    Media’s influence on the 21st century society: A global criminological systematic review

    This investigation assumes that the media can reduce or spread criminal activities and tendencies, depending on how the concerned parties apply the policies and community standards that guide these platforms’ use. In total, 254 materials were gathered across several search systems between October 2021 and September 2022. Qualitative data from the selected materials were used to synthesise and summarise the content on the examined 21st-century events and the media’s influence on crime. It is not possible to reject the premise that the media influences opinions on crime and the legal system. Nevertheless, the data reveal that no causal media effect can be directly established; they do show, however, that how the media portrays an activity affects how people perceive it. Advances in technology, media, and criminology may have affected the analysis of records, including the time and quality of resources. More accurate and fair media coverage of crime would lead to a more informed and aware population, and media houses that promote and reward good behaviour should be applauded. These two steps ensure that the media cannot be ignored when assessing crime and how the public perceives it, as it can both encourage crime and shift perceptions. Therefore, further research, stricter laws and policies, and community education on crime prevention and media screening are needed. The fact that unfavourable media coverage of crime can ruin a business, either directly or indirectly (through changes in consumer behaviour due to crime), makes this paper of utmost importance for businessmen, politicians, and local agencies.

    A Deep Multi-View Learning Framework for City Event Extraction from Twitter Data Streams

    Cities have been thriving places for citizens over the centuries due to their complex infrastructure. The emergence of Cyber-Physical-Social Systems (CPSS) and context-aware technologies has boosted interest in analysing, extracting and ultimately understanding city events, which can in turn be used to leverage citizens' observations of their cities. In this paper, we investigate the feasibility of using Twitter textual streams for extracting city events. We propose a hierarchical multi-view deep learning approach to contextualise citizen observations of various city systems and services. Our goal has been to build a flexible architecture that can learn representations useful for the target tasks, thus avoiding excessive task-specific feature engineering. We apply our approach to a real-world dataset consisting of event reports and tweets collected over four months from the San Francisco Bay Area, as well as additional datasets collected from London. The results of our evaluations show that the proposed solution outperforms existing models and can extract city-related events with an average accuracy of 81% over all classes. To further evaluate the impact of our Twitter event extraction model, we used two sources of authorised reports: road traffic disruption data collected from the Transport for London API, and sociocultural events parsed from the Time Out London website. The analysis showed that 49.5% of the Twitter traffic comments were posted approximately five hours before the authorities' official records. Moreover, we found that, among the scheduled sociocultural event topics, tweets reporting transportation, cultural and social events are 31.75% more likely to influence the distribution of the Twitter comments than sport, weather and crime topics.
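
    A minimal PyTorch sketch of the multi-view idea, under assumed input shapes rather than the paper's actual architecture: each view (e.g. a tweet-text embedding and a time/location context vector) gets its own encoder, and the fused representation feeds a city-event classifier.

    # Sketch: two-view fusion network for city-event classification (illustrative).
    import torch
    import torch.nn as nn

    class TwoViewEventClassifier(nn.Module):
        def __init__(self, text_dim=300, context_dim=16, hidden=64, n_classes=5):
            super().__init__()
            self.text_enc = nn.Sequential(nn.Linear(text_dim, hidden), nn.ReLU())
            self.ctx_enc = nn.Sequential(nn.Linear(context_dim, hidden), nn.ReLU())
            self.head = nn.Linear(2 * hidden, n_classes)   # fuse views by concatenation

        def forward(self, text_view, context_view):
            fused = torch.cat([self.text_enc(text_view), self.ctx_enc(context_view)],
                              dim=-1)
            return self.head(fused)                        # unnormalised class scores

    model = TwoViewEventClassifier()
    text_batch = torch.randn(8, 300)       # e.g. averaged word embeddings per tweet
    context_batch = torch.randn(8, 16)     # e.g. encoded time-of-day / location cells
    logits = model(text_batch, context_batch)
    print(logits.shape)                    # torch.Size([8, 5])

    Concatenation is the simplest fusion choice here; hierarchical or attention-based fusion, as described in the paper, is a natural extension of this skeleton.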