
    A systematic survey of online data mining technology intended for law enforcement

    As an increasing amount of crime takes on a digital aspect, law enforcement bodies must tackle an online environment generating huge volumes of data. With manual inspection becoming increasingly infeasible, law enforcement bodies are optimising online investigations through data-mining technologies. Such technologies must be well designed and rigorously grounded, yet no survey of the online data-mining literature exists which examines their techniques, applications and rigour. This article remedies this gap through a systematic mapping study describing online data-mining literature which visibly targets law enforcement applications, using evidence-based survey practices to produce a replicable analysis which can be methodologically examined for deficiencies.

    An architecture for establishing legal semantic workflows in the context of integrated law enforcement

    A previous version of this paper was presented at the Third Workshop on Legal Knowledge and the Semantic Web (LK&SW-2016), EKAW-2016, November 19th, Bologna, Italy. Traditionally, the integration of data from multiple sources is done on an ad-hoc basis for each task, creating "silos" that prevent sharing data across different agencies or tasks; this approach is unable to cope with the modern environment, where workflows, tasks, and priorities frequently change. Operating within the Data to Decision Cooperative Research Centre (D2D CRC), the authors are currently involved in the Integrated Law Enforcement Project, which has the goal of developing a federated data platform that will enable the execution of integrated analytics on data accessed from different external and internal sources, thereby providing effective support to an investigator or analyst working to evaluate evidence and manage lines of inquiry in the investigation. Technical solutions should also operate ethically, in compliance with the law, and subject to good governance principles.
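The core idea of a federated data platform — fan a query out to heterogeneous sources behind a common adapter interface and merge provenance-tagged results — can be sketched as follows. This is an illustrative minimal sketch, not the D2D CRC platform's actual API; the `Source` class, source names, and record format are all assumptions.

```python
class Source:
    """Minimal adapter a data source implements to join the federation
    (hypothetical interface; the real platform's API is not described here)."""
    def __init__(self, name, records):
        self.name, self.records = name, records

    def query(self, term):
        # Each source answers queries over its own records.
        return [r for r in self.records if term in r]

def federated_query(sources, term):
    """Fan the query out to every registered source and merge the hits,
    tagging each result with its provenance (the source it came from)."""
    hits = []
    for s in sources:
        hits.extend((s.name, r) for r in s.query(term))
    return hits
```

Tagging every hit with its source preserves provenance, which matters when results feed an evidentiary chain.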

    A survey on extremism analysis using natural language processing: definitions, literature review, trends and challenges

    Extremism has grown as a global problem for society in recent years, especially after the emergence of movements such as jihadism. Jihadist and other extremist groups have taken advantage of different approaches, such as the use of social media, to spread their ideology, promote their acts and recruit followers. The extremist discourse, therefore, is reflected in the language used by these groups. Natural language processing (NLP) provides a way of detecting this type of content, and several authors make use of it to describe and discriminate the discourse held by these groups, with the final objective of detecting and preventing its spread. Following this approach, this survey aims to review the contributions of NLP to the field of extremism research, providing the reader with a comprehensive picture of the state of the art of this research area. The content includes a first conceptualization of the term extremism, the elements that compose an extremist discourse and the differences with other terms. After that, a description and comparison of the frequently used NLP techniques is presented, including how they were applied, the insights they provided, the most frequently used NLP software tools, descriptive and classification applications, and the availability of datasets and data sources for research. Finally, research questions are approached and answered with highlights from the review, while future trends, challenges and directions derived from these highlights are suggested towards stimulating further research in this research area. Open Access funding provided thanks to the CRUE-CSIC agreement with Springer Nature.
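The classification applications the survey reviews typically train a supervised text classifier on labelled examples. A minimal stdlib-only sketch of one such technique, multinomial Naive Bayes with add-one smoothing, is shown below; the training data, labels, and tokenisation are invented for illustration and do not come from the surveyed works.

```python
from collections import Counter
import math

def train(docs):
    """Train a multinomial Naive Bayes model from (text, label) pairs."""
    word_counts = {}          # label -> Counter of tokens
    label_counts = Counter()  # label -> number of training documents
    for text, label in docs:
        label_counts[label] += 1
        word_counts.setdefault(label, Counter()).update(text.lower().split())
    return word_counts, label_counts

def classify(text, model):
    """Return the label with the highest posterior log-probability."""
    word_counts, label_counts = model
    vocab = {w for c in word_counts.values() for w in c}
    total_docs = sum(label_counts.values())
    best_label, best_score = None, float("-inf")
    for label, counts in word_counts.items():
        # log prior + token log-likelihoods with add-one (Laplace) smoothing
        score = math.log(label_counts[label] / total_docs)
        denom = sum(counts.values()) + len(vocab)
        for token in text.lower().split():
            score += math.log((counts[token] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label
```

Real systems in this literature use richer features and models; the point here is only the shape of the descriptive-statistics-to-classifier pipeline.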

    Spatio-temporal modelling for the detection of public safety events from a Twitter stream

    Twitter is a social media platform that is very popular in North America, giving law enforcement agencies an opportunity to detect events of public interest. Twitter messages (tweets) tied to an event often contain street names indicating where the event takes place, which can be used to infer the event's geographical coordinates in real time. Many commercial software tools are available to monitor social media. The usefulness of these tools to law enforcement could be greatly improved with access to a larger sample of tweets, a sorting mechanism to surface pertinent events more quickly, and a measure of the reliability of the detected events. The goal of this master's thesis is to detect, from a public Twitter stream, events relevant to the public safety of a territory, automatically and with an acceptable level of reliability. To achieve this objective, a computer model based on four components was developed: a) capture of public tweets based on keywords, with a geographic filter applied; b) natural language processing of the text of these tweets, using a street gazetteer to identify tweets that can be localized and to geocode them from street names and intersections; c) a spatio-temporal method to form tweet clusters; and d) event detection by isolating clusters containing at least two tweets treating the same subject. This research project differs from existing scientific research in that it combines natural language processing, search and geocoding of toponyms against a street gazetteer, the creation of clusters using geomatics, and the identification of event clusters based on common tweets to detect public safety events in a public Twitter stream.
The application of the model to the 90,347 tweets collected for the Toronto-Niagara region in Ontario, Canada resulted in the identification and geocoding of 1,614 tweets and the creation of 172 clusters, of which 79 event clusters contain at least two tweets on the same subject, a reliability rate of 45.9%.
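Components c) and d) of the pipeline above can be sketched as follows. This is an illustrative reconstruction, not the thesis's actual implementation: the distance/time thresholds, the greedy single-link grouping rule, and the tweet record layout are all assumptions.

```python
import math

def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points."""
    lat1, lon1, lat2, lon2 = map(math.radians, (a[0], a[1], b[0], b[1]))
    dlat, dlon = lat2 - lat1, lon2 - lon1
    h = math.sin(dlat / 2) ** 2 + \
        math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2
    return 6371 * 2 * math.asin(math.sqrt(h))

def cluster_tweets(tweets, max_km=1.0, max_minutes=60):
    """Greedy single-link grouping: a tweet joins a cluster if it is close
    in both space and time to any member of that cluster."""
    clusters = []
    for t in tweets:
        placed = False
        for c in clusters:
            if any(haversine_km(t["pos"], m["pos"]) <= max_km
                   and abs(t["minute"] - m["minute"]) <= max_minutes
                   for m in c):
                c.append(t)
                placed = True
                break
        if not placed:
            clusters.append([t])
    return clusters

def event_clusters(clusters):
    """Keep only clusters in which at least two tweets share a subject,
    mirroring the thesis's event-detection criterion."""
    events = []
    for c in clusters:
        subjects = [t["subject"] for t in c]
        if any(subjects.count(s) >= 2 for s in set(subjects)):
            events.append(c)
    return events
```

The two-tweets-same-subject filter is what separates the 79 reported event clusters from the 172 raw clusters: a lone geocoded tweet is not treated as corroborated evidence of an event.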

    Open-Source Intelligence Investigations: Development and Application of Efficient Tools

    Open-source intelligence is a branch within cybercrime investigation that focuses on information collection and aggregation. Through this aggregation, investigators and analysts can analyze the data for connections relevant to the investigation. There are many tools that assist with information collection and aggregation; however, these often require enterprise licensing. An alternative to enterprise-licensed tools is using open-source tools to collect information, often by scraping websites. These tools provide useful information, but they produce a large number of disjointed reports. The framework we developed automates information collection, aggregates these reports, and generates one single graphical report. The graphical report also reduces the time required for analysis. This framework can be used for different investigations. We performed a case study of the framework's performance on missing person case information; it showed a significant improvement in the time required for information collection and report analysis.
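The aggregation step described above — merging many disjointed per-tool reports into one entity-centric view that can be rendered as a single graph — can be sketched as below. This is a hypothetical illustration of the general technique; the report format and the `aggregate_reports`/`to_edges` helpers are not the paper's actual framework.

```python
def aggregate_reports(reports):
    """Merge per-tool OSINT reports (tool -> {entity: [findings]}) into one
    entity-centric view, recording which tool produced each finding."""
    merged = {}
    for tool, findings in reports.items():
        for entity, items in findings.items():
            bucket = merged.setdefault(entity, [])
            for item in items:
                bucket.append({"finding": item, "source": tool})
    return merged

def to_edges(merged):
    """Flatten the merged view into (entity, finding, source) edges that a
    graph library could render as a single graphical report."""
    return sorted((entity, f["finding"], f["source"])
                  for entity, items in merged.items() for f in items)
```

Keeping the producing tool on every edge lets an analyst judge each connection's provenance from the combined graph instead of cross-checking the original scraper outputs.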

    Human decision-making in computer security incident response

    Background: Cybersecurity has risen to international importance. Almost every organization will fall victim to a successful cyberattack. Yet, guidance for computer security incident response analysts is inadequate. Research Questions: What heuristics should an incident analyst use to construct general knowledge and analyse attacks? Can we construct formal tools to enable automated decision support for the analyst with such heuristics and knowledge? Method: We take an interdisciplinary approach. To answer the first question, we use the research tradition of philosophy of science, specifically the study of mechanisms. To answer the question on formal tools, we use the research tradition of program verification and logic, specifically Separation Logic. Results: We identify several heuristics from biological sciences that cybersecurity researchers have re-invented to varying degrees. We consolidate the new mechanisms literature to yield heuristics related to the fact that knowledge is of clusters of multi-field mechanism schema on four dimensions. General knowledge structures such as the intrusion kill chain provide context and provide hypotheses for filling in details. The philosophical analysis answers this research question, and also provides constraints on building the logic. Finally, we succeed in defining an incident analysis logic resembling Separation Logic and translating the kill chain into it as a proof of concept. Conclusion: These results benefit incident analysis, enabling it to expand from a tradecraft or art to also integrate science. Future research might realize our logic as automated decision support. Additionally, we have opened the field of cybersecurity to collaboration with philosophers of science and logicians.
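The claim that a general knowledge structure like the intrusion kill chain "provides hypotheses for filling in details" can be made concrete with a small sketch: if evidence places an attack at a given stage, the model implies the earlier, unobserved stages occurred too, and those become hypotheses to investigate. This is an illustrative toy, not the thesis's Separation-Logic-style formalism; the stage names follow the standard intrusion kill chain.

```python
# Kill-chain stages in their conventional order.
KILL_CHAIN = ["reconnaissance", "weaponization", "delivery", "exploitation",
              "installation", "command_and_control", "actions_on_objectives"]

def chain_hypotheses(observed):
    """Given the set of stages an analyst has evidence for, return the
    earlier stages the kill-chain model implies must also have occurred
    but are not yet observed, i.e. hypotheses worth investigating."""
    if not observed:
        return []
    latest = max(KILL_CHAIN.index(s) for s in observed)
    return [s for s in KILL_CHAIN[:latest] if s not in observed]
```

The thesis's logic goes much further (composing mechanistic knowledge formally), but this captures the basic inferential move: a general schema plus partial evidence yields directed hypotheses.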