A systematic survey of online data mining technology intended for law enforcement
As an increasing amount of crime takes on a digital aspect, law enforcement bodies must tackle an online environment that generates huge volumes of data. With manual inspection becoming increasingly infeasible, law enforcement bodies are optimising online investigations through data-mining technologies. Such technologies must be well designed and rigorously grounded, yet no survey of the online data-mining literature exists that examines their techniques, applications and rigour. This article remedies that gap through a systematic mapping study of the online data-mining literature that visibly targets law enforcement applications, using evidence-based practices in survey design to produce a replicable analysis that can be methodologically examined for deficiencies.
An architecture for establishing legal semantic workflows in the context of integrated law enforcement
A previous version of this paper was presented at the Third Workshop on Legal Knowledge and the Semantic Web (LK&SW-2016), EKAW-2016, November 19th, Bologna, Italy. Traditionally, the integration of data from multiple sources is done on an ad-hoc basis for each task; this approach leads to "silos" that prevent sharing data across different agencies or tasks, and it cannot cope with the modern environment, where workflows, tasks, and priorities frequently change. Operating within the Data to Decision Cooperative Research Centre (D2D CRC), the authors are currently involved in the Integrated Law Enforcement Project, whose goal is to develop a federated data platform that will enable the execution of integrated analytics on data accessed from different external and internal sources, thereby providing effective support to an investigator or analyst working to evaluate evidence and manage lines of inquiry in an investigation. Technical solutions should also operate ethically, in compliance with the law, and subject to good governance principles.
A survey on extremism analysis using natural language processing: definitions, literature review, trends and challenges
Extremism has grown into a global problem for society in recent years, especially after the emergence of movements such as jihadism. This and other extremist groups have taken advantage of different approaches, such as the use of social media, to spread their ideology, promote their acts and recruit followers. The extremist discourse, therefore, is reflected in the language used by these groups. Natural language processing (NLP) provides a way of detecting this type of content, and several authors make use of it to describe and discriminate the discourse held by these groups, with the final objective of detecting and preventing its spread. Following this approach, this survey reviews the contributions of NLP to the field of extremism research, providing the reader with a comprehensive picture of the state of the art in this area. The content includes a first conceptualization of the term extremism, the elements that compose an extremist discourse and the differences from other terms. After that, a description and comparison of the frequently used NLP techniques is presented, including how they were applied, the insights they provided, the most frequently used NLP software tools, descriptive and classification applications, and the availability of datasets and data sources for research. Finally, research questions are posed and answered with highlights from the review, while future trends, challenges and directions derived from these highlights are suggested to stimulate further research in this area. Open Access funding provided thanks to the CRUE-CSIC agreement with Springer Nature.
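The classification applications the survey covers can be illustrated with a minimal bag-of-words classifier. The multinomial Naive Bayes sketch below is a generic illustration with invented placeholder labels and texts; it is not a technique or dataset attributed to any specific paper in the survey.

```python
from collections import Counter
import math

def tokenize(text):
    return [t.lower().strip(".,!?") for t in text.split()]

def train(docs):
    """Build per-class token counts for a multinomial Naive Bayes classifier."""
    counts, priors = {}, Counter()
    for text, label in docs:
        priors[label] += 1
        c = counts.setdefault(label, Counter())
        for tok in tokenize(text):
            c[tok] += 1
    vocab = {t for c in counts.values() for t in c}
    return counts, priors, vocab

def classify(text, model):
    """Score each class by log-prior plus add-one-smoothed log-likelihoods."""
    counts, priors, vocab = model
    total_docs = sum(priors.values())
    best, best_score = None, -math.inf
    for label, c in counts.items():
        score = math.log(priors[label] / total_docs)
        denom = sum(c.values()) + len(vocab)
        for tok in tokenize(text):
            score += math.log((c[tok] + 1) / denom)
        if score > best_score:
            best, best_score = label, score
    return best

# Placeholder training data -- purely illustrative labels and texts.
docs = [
    ("join our cause and fight", "flagged"),
    ("we must fight the enemy", "flagged"),
    ("the weather is nice today", "neutral"),
    ("lovely sunny day in the park", "neutral"),
]
model = train(docs)
print(classify("fight for the cause", model))  # flagged
```

In practice the surveyed work uses richer features and larger corpora; the point here is only the shape of a descriptive-then-discriminative NLP pipeline.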
Modélisation spatio-temporelle pour la détection d'événements de sécurité publique à partir d'un flux Twitter (Spatio-temporal modelling for the detection of public safety events from a Twitter stream)
Abstract: Twitter is a social medium that is very popular in North America, giving law enforcement agencies an opportunity to detect events of public interest. Twitter messages (tweets) tied to an event often contain street names indicating where the event takes place, which can be used to infer the event's geographical coordinates in real time.
Many commercial software tools are available to monitor social media. The performance of these tools for law enforcement could be greatly improved with access to a larger sample of tweets, a sorting mechanism to identify pertinent events more quickly, and a measure of the reliability of the detected events.
The goal of this master's thesis is to detect, from a public Twitter stream, events relating to the public safety of a territory, automatically and with an acceptable level of reliability. To achieve this objective, a computer model based on four components has been developed: a) capture of public tweets based on keywords, with the application of a geographic filter; b) natural language processing of the text of these tweets, use of a street gazetteer to identify tweets that can be localized, and geocoding of tweets based on street names and intersections; c) a spatio-temporal method to form tweet clusters; and d) event detection by isolating clusters containing at least two tweets treating the same subject.
This research project differs from existing scientific research in that it combines natural language processing, the search for and geocoding of toponyms based on a street gazetteer, the creation of clusters using geomatics, and the identification of event clusters based on common tweets to detect public safety events in a public Twitter stream.
The application of the model to the 90,347 tweets collected for the Toronto-Niagara region in Ontario, Canada resulted in the identification and geocoding of 1,614 tweets and the creation of 172 clusters, of which 79 event clusters contain at least two tweets on the same subject, a reliability rate of 45.9%.
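The clustering and event-detection components (c and d) can be sketched in miniature. The grid-cell size, one-hour time window, and toy tweets below are illustrative assumptions, not the thesis's actual parameters; only the "at least two tweets on the same subject" rule comes from the abstract.

```python
from collections import defaultdict
from datetime import datetime

CELL = 0.01  # grid-cell size in degrees (illustrative, not the thesis value)

def cluster(tweets):
    """Bucket geocoded tweets by spatial grid cell and hour-truncated timestamp."""
    buckets = defaultdict(list)
    for lat, lon, ts, keywords in tweets:
        key = (round(lat / CELL), round(lon / CELL),
               ts.replace(minute=0, second=0, microsecond=0))
        buckets[key].append(keywords)
    return buckets

def detect_events(buckets):
    """Keep clusters where at least two tweets share a keyword (the >= 2 rule)."""
    events = []
    for key, kw_sets in buckets.items():
        freq = defaultdict(int)
        for kws in kw_sets:
            for kw in kws:
                freq[kw] += 1
        shared = {kw for kw, n in freq.items() if n >= 2}
        if shared:
            events.append((key, shared, len(kw_sets)))
    return events

# Toy geocoded tweets: (lat, lon, timestamp, extracted keywords).
tweets = [
    (43.651, -79.347, datetime(2020, 5, 1, 14, 5),  {"fire", "king street"}),
    (43.652, -79.348, datetime(2020, 5, 1, 14, 20), {"fire"}),
    (45.500, -73.600, datetime(2020, 5, 1, 14, 10), {"parade"}),
]
events = detect_events(cluster(tweets))
print(events)  # one event cluster sharing the keyword "fire"
```

The first two tweets fall in the same grid cell and hour and share "fire", so they form an event cluster; the third is spatially isolated and is discarded.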
Open-Source Intelligence Investigations: Development and Application of Efficient Tools
Open-source intelligence is a branch within cybercrime investigation that focuses on information collection and aggregation. Through this aggregation, investigators and analysts can analyze the data for connections relevant to the investigation. Many tools assist with information collection and aggregation, but these often require enterprise licensing. An alternative to enterprise-licensed tools is to use open-source tools to collect information, often by scraping websites. These tools provide useful information, but they produce a large number of disjointed reports. The framework we developed automates information collection, aggregates these reports, and generates one single graphical report. By using a graphical report, the time required for analysis is also reduced. This framework can be used for different investigations. We performed a case study of the framework's performance using missing-person case information; it showed a significant improvement in the time required for information collection and report analysis.
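The aggregation step such a framework automates might look like the following sketch. The report shape (entity mapped to related entities) and the Graphviz DOT output are assumptions made for illustration, not the framework's actual interfaces.

```python
from collections import defaultdict

def aggregate(reports):
    """Merge per-tool reports (entity -> related entities) into one adjacency map,
    deduplicating edges reported by multiple tools."""
    graph = defaultdict(set)
    for report in reports:
        for entity, related in report.items():
            graph[entity].update(related)
    return graph

def to_dot(graph):
    """Render the merged graph as Graphviz DOT, one consolidated visual report."""
    lines = ["graph osint {"]
    for entity in sorted(graph):
        for rel in sorted(graph[entity]):
            lines.append(f'  "{entity}" -- "{rel}";')
    lines.append("}")
    return "\n".join(lines)

# Hypothetical outputs from two different collection tools.
reports = [
    {"alice@example.com": ["alice_handle", "example.com"]},
    {"alice@example.com": ["example.com"], "alice_handle": ["photo_site"]},
]
dot = to_dot(aggregate(reports))
print(dot)
```

Deduplicating into a single graph is what lets one rendered report replace many disjointed per-tool outputs: the repeated `example.com` link appears only once.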
Human decision-making in computer security incident response
Background: Cybersecurity has risen to international importance. Almost every organization will fall victim to a successful cyberattack. Yet, guidance for computer security incident response analysts is inadequate. Research Questions: What heuristics should an incident analyst use to construct general knowledge and analyse attacks? Can we construct formal tools to enable automated decision support for the analyst with such heuristics and knowledge? Method: We take an interdisciplinary approach. To answer the first question, we use the research tradition of philosophy of science, specifically the study of mechanisms. To answer the question on formal tools, we use the research tradition of program verification and logic, specifically Separation Logic. Results: We identify several heuristics from the biological sciences that cybersecurity researchers have re-invented to varying degrees. We consolidate the new mechanisms literature to yield heuristics based on the view that knowledge consists of clusters of multi-field mechanism schemas along four dimensions. General knowledge structures such as the intrusion kill chain provide context and supply hypotheses for filling in details. The philosophical analysis answers this research question and also provides constraints on building the logic. Finally, we succeed in defining an incident analysis logic resembling Separation Logic and translating the kill chain into it as a proof of concept. Conclusion: These results benefit incident analysis, enabling it to expand from a tradecraft or art to also integrate science. Future research might realize our logic in automated decision support. Additionally, we have opened the field of cybersecurity to collaboration with philosophers of science and logicians.