
    Exploring Text Mining and Analytics for Applications in Public Security: An in-depth dive into a systematic literature review

    Text mining and related analytics have emerged as a technological approach to support human activities by extracting useful knowledge from texts in several formats. From a managerial point of view, text mining can help organizations in planning and decision-making, providing information that was not previously evident in textual materials produced internally or externally. In this context, within the public/governmental scope, public security agencies are major beneficiaries of text mining tools in several respects, from applications in the criminal area to the collection of people's opinions and sentiments about the actions taken to promote their welfare. This article reports the details of a systematic literature review focused on identifying the main areas of text mining application in public security, the most recurrent technological tools, and future research directions. The searches covered four major article databases (Scopus, Web of Science, IEEE Xplore, and ACM Digital Library) and selected 194 publications from between 2014 and the first half of 2021, spanning journals, conferences, and book chapters. The findings concerning each of these targets are presented in the results of this article.
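
    The screening step of such a review protocol lends itself to a brief illustration. Below is a minimal Python sketch of the kind of inclusion filter the abstract implies; the record fields ("year", "month", "venue_type") are hypothetical stand-ins, not the authors' actual screening tooling.

```python
# Hypothetical sketch of the screening step implied by the review protocol:
# keep records published between 2014 and mid-2021 from accepted venue types.
# Field names ("year", "month", "venue_type") are illustrative, not from the paper.

def passes_screening(record: dict) -> bool:
    accepted_venues = {"journal", "conference", "book_chapter"}
    year, month = record["year"], record.get("month", 1)
    in_window = (2014 <= year <= 2020) or (year == 2021 and month <= 6)
    return in_window and record["venue_type"] in accepted_venues

records = [
    {"title": "Crime text analytics", "year": 2019, "venue_type": "journal"},
    {"title": "Old survey", "year": 2012, "venue_type": "conference"},
]
selected = [r for r in records if passes_screening(r)]
print(len(selected))  # -> 1
```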

    Flipping 419 Cybercrime Scams: Targeting the Weak and the Vulnerable

    Most cyberscam-related studies focus on threats perpetrated against Western society, with particular attention to the USA and Europe. Regrettably, no research has been done on scams targeting African countries, especially Nigeria, where the notorious 419 advance-fee scam, targeted towards other countries, originated. However, as we know, cybercrime is a global problem affecting all parties. In this study, we investigate a form of advance-fee fraud scam unique to Nigeria and targeted at Nigerians, but unknown to the Western world. For the study, we rely substantially on almost two years' worth of data harvested from an online discussion forum used by criminals. We complement this dataset with recent data from three other active forums to consolidate and generalize the research. We apply machine learning to the data to understand the criminals' modus operandi. We show that the criminals exploit the socio-political and economic problems prevalent in the country to craft various fraud schemes to defraud vulnerable groups such as secondary school students and unemployed graduates. The results of our research can help potential victims and policy makers develop measures to counter the activities of these criminal groups.
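
    The abstract does not name the specific machine learning method applied to the forum data. As a hedged illustration only, the sketch below clusters toy forum posts with TF-IDF features and k-means to surface recurring scheme themes; it should not be read as the authors' actual pipeline.

```python
# Generic sketch: cluster scam-forum posts to surface recurring fraud schemes.
# This shows one plausible pipeline (TF-IDF + k-means); the paper does not
# state that this exact method was used.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans

posts = [
    "guaranteed admission slot for secondary school students, pay small fee",
    "job placement for unemployed graduates, processing fee required",
    "admission assistance, registration fee upfront",
]

vectorizer = TfidfVectorizer(stop_words="english")
X = vectorizer.fit_transform(posts)

km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
for label, post in zip(km.labels_, posts):
    print(label, post[:50])
```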

    Quantitative Legal Prediction--or--How I Learned to Stop Worrying and Start Preparing for the Data-Driven Future of the Legal Services Industry

    Welcome to law's information revolution, a revolution already in progress.

    Data Mining Tools and Techniques: a review

    Data mining automates the detection of relevant patterns in a database, using defined approaches and algorithms to look into current and historical data that can then be analyzed to predict future trends. Because data mining tools predict future trends and behaviors by reading through databases for hidden patterns, they allow organizations to make proactive, knowledge-driven decisions and answer questions that were previously too time-consuming to resolve. Data mining methods such as clustering, association rules, sequential patterns, statistical analysis, and characteristic rules can be used to extract useful knowledge, enabling such data to become the real fortune of logistics companies and to support their decisions and development. This paper introduces the significant uses of data mining tools and techniques in logistics management systems, and their implications. Finally, it points out that data mining technology is becoming more and more powerful in logistics management.
    Keywords: Logistics management, data mining concepts, application areas, tools and techniques
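
    As a concrete illustration of one of the methods listed above, the following sketch runs association-rule mining over a toy one-hot table of logistics transactions using the mlxtend library; the data and thresholds are illustrative, not drawn from the paper.

```python
# Sketch of association-rule mining on toy logistics transaction data,
# using the mlxtend Apriori implementation; data and thresholds are illustrative.
import pandas as pd
from mlxtend.frequent_patterns import apriori, association_rules

# One-hot encoded shipments: which services each order used.
df = pd.DataFrame(
    [
        {"express": 1, "insurance": 1, "tracking": 1},
        {"express": 1, "insurance": 0, "tracking": 1},
        {"express": 0, "insurance": 1, "tracking": 0},
        {"express": 1, "insurance": 1, "tracking": 1},
    ]
).astype(bool)

frequent = apriori(df, min_support=0.5, use_colnames=True)
rules = association_rules(frequent, metric="confidence", min_threshold=0.8)
print(rules[["antecedents", "consequents", "support", "confidence"]])
```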

    Integration of Data Mining into Scientific Data Analysis Processes

    In recent years, the use of advanced semi-interactive data analysis algorithms, such as those from the field of data mining, has gained more and more importance in the life sciences in general, and in bioinformatics, genetics, medicine, and biodiversity in particular. Today, there is a trend away from collecting and evaluating data only in the context of a specific problem or study, towards extensively collecting data from different sources into repositories that are potentially useful for subsequent analysis, e.g. the Gene Expression Omnibus (GEO) repository of high-throughput gene expression data. At the time the data are collected, they are analysed in a specific context which influences the experimental design. However, the types of analyses that the data will be used for after they have been deposited are not known. Content and data format are geared to the first experiment only, not to future re-use. Thus, complex process chains are needed for the analysis of the data, and such process chains need to be supported by the environments that are used to set up analysis solutions. Building specialized software for each individual problem is not a solution, as this effort can only be justified for huge projects running for several years. Hence, data mining functionality has been bundled into toolkits, which provide it in the form of a collection of different components. Depending on the research questions of the users, solutions consist of distinct compositions of these components.
    Today, existing solutions for data mining processes comprise different components that represent different steps in the analysis process, and there exist graphical or script-based toolkits for combining such components. However, the data mining tools that can serve as components in analysis processes are based on single-computer environments, local data sources, and single users, whereas analysis scenarios in medical informatics and bioinformatics have to deal with multi-computer environments, distributed data sources, and multiple users who have to cooperate. Users need support for integrating data mining into analysis processes in the context of such scenarios, which is lacking today. Typically, analysts working in single-computer environments face the problem of large data volumes, since tools do not address scalability or access to distributed data sources. Distributed environments such as grids provide scalability and access to distributed data sources, but integrating existing components into such environments is complex, and new components often cannot be developed directly in them. Moreover, in scenarios involving multiple computers, multiple distributed data sources, and multiple users, the reuse of components, scripts, and analysis processes becomes more important, as more steps and more configuration are necessary and thus much bigger efforts are needed to develop and set up a solution.
    In this thesis we introduce an approach for supporting interactive and distributed data mining for multiple users, based on infrastructure principles that allow building on data mining components and processes that are already available, instead of designing a completely new infrastructure, so that users can keep working with their well-known tools. In order to achieve the integration of data mining into scientific data analysis processes, this thesis proposes a stepwise approach to supporting the user in the development of analysis solutions that include data mining.
We see our major contributions as the following. First, we propose an approach for integrating data mining components developed for single-processor environments into grid environments. By this, we support users in reusing standard data mining components with little effort. The approach is based on a metadata schema definition which is used to grid-enable existing data mining components. Second, we describe an approach for interactively developing data mining scripts in grid environments. The approach efficiently supports users when it is necessary to enhance available components, to develop new data mining components, and to compose these components. Third, building on these, we present an approach for facilitating the reuse of existing data mining processes based on process patterns. It supports users in scenarios that cover different steps of the data mining process and involve several components or scripts. The data mining process patterns support the description of data mining processes at different levels of abstraction, between the CRISP model as the most general and executable workflows as the most concrete representation.
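
    A hedged sketch of what such a grid-enabling component descriptor might look like follows; all field names are illustrative inventions, since the actual metadata schema is defined in the thesis itself.

```python
# Hypothetical sketch of a component descriptor of the kind the thesis describes
# for grid-enabling existing data mining components. All field names are
# illustrative; the actual metadata schema is defined in the thesis.
from dataclasses import dataclass, field

@dataclass
class ComponentDescriptor:
    name: str                      # component identifier
    entry_point: str               # class/executable to invoke on a grid node
    input_formats: list = field(default_factory=list)   # accepted data formats
    output_formats: list = field(default_factory=list)  # produced data formats
    parameters: dict = field(default_factory=dict)      # tunable settings

kmeans_component = ComponentDescriptor(
    name="weka-kmeans",
    entry_point="weka.clusterers.SimpleKMeans",
    input_formats=["arff"],
    output_formats=["arff"],
    parameters={"numClusters": 3},
)
print(kmeans_component)
```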

    Keyword Assisted Topic Models

    For a long time, many social scientists have conducted content analysis by using their substantive knowledge and manually coding documents. In recent years, however, fully automated content analysis based on probabilistic topic models has become increasingly popular because of its scalability. Unfortunately, applied researchers find that these models often fail to yield topics of substantive interest, inadvertently creating multiple topics with similar content and combining different themes into a single topic. In this paper, we empirically demonstrate that providing topic models with a small number of keywords can substantially improve their performance. The proposed keyword assisted topic model (keyATM) offers an important advantage: the specification of keywords requires researchers to label topics prior to fitting a model to the data. This contrasts with the widespread practice of post-hoc topic interpretation and adjustment that compromises the objectivity of empirical findings. In our applications, we find that the keyATM provides more interpretable results, has better document classification performance, and is less sensitive to the number of topics than standard topic models. Finally, we show that the keyATM can also incorporate covariates and model time trends. An open-source software package is available for implementing the proposed methodology.
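
    keyATM itself is distributed as an R package; the sketch below is not that implementation but a minimal illustration of the underlying idea, seeding topic-word priors with keywords so that topics are labeled before fitting, using gensim's LDA, whose eta parameter accepts a (num_topics x num_terms) prior matrix. The documents, keywords, and prior values are toy assumptions.

```python
# Not the keyATM implementation (an R package): a hedged illustration of the
# underlying idea, seeding topic-word priors with keywords, via gensim's LDA.
import numpy as np
from gensim.corpora import Dictionary
from gensim.models import LdaModel

docs = [
    ["tax", "budget", "spending"],
    ["election", "vote", "campaign"],
    ["budget", "deficit", "tax"],
]
dictionary = Dictionary(docs)
corpus = [dictionary.doc2bow(d) for d in docs]

num_topics = 2
keywords = {0: ["tax", "budget"], 1: ["election", "vote"]}  # label topics up front

# Flat symmetric prior everywhere, boosted for each topic's keywords.
eta = np.full((num_topics, len(dictionary)), 0.01)
for topic, words in keywords.items():
    for w in words:
        eta[topic, dictionary.token2id[w]] = 1.0

lda = LdaModel(corpus, num_topics=num_topics, id2word=dictionary, eta=eta,
               random_state=0)
print(lda.show_topics())
```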

    Adaptively selecting occupations to detect skill shortages from online job ads

    Labour demand and skill shortages have historically been difficult to assess, given the high costs of conducting representative surveys and the inherent delays of these indicators. This is particularly consequential for fast-developing skills and occupations, such as those relating to Data Science and Analytics (DSA). This paper develops a data-driven solution for detecting skill shortages from online job advertisement (ad) data. We first propose a method to generate sets of highly similar skills from a set of seed skills drawn from job ads. This provides researchers with a novel method to adaptively select occupations based on granular skills data. Next, we apply this adaptive skills similarity technique to a dataset of over 6.7 million Australian job ads in order to identify the occupations with the highest proportions of DSA skills, uncovering 306,577 DSA job ads across 23 occupational classes from 2012 to 2019. Finally, we propose five variables for detecting skill shortages from online job ads: (1) posting frequency; (2) salary levels; (3) education requirements; (4) experience demands; and (5) job ad posting predictability. This contributes further evidence towards the goal of detecting skill shortages in real time. In conducting this analysis, we also find strong evidence of skill shortages in Australia for highly technical DSA skills and occupations. These results provide insights to Data Science researchers, educators, and policy-makers from other advanced economies about the types of skills that should be cultivated to meet growing DSA labour demand in the future.
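
    As a hedged illustration of the seed-skill expansion step, the sketch below grows a skill set by cosine similarity over toy embedding vectors; the vectors and threshold are stand-ins, not the paper's actual similarity technique.

```python
# Hedged sketch of expanding a set of seed skills by embedding similarity.
# The vectors below are toy stand-ins for skill embeddings learned from job
# ads, and the threshold is illustrative rather than the paper's.
import numpy as np

skill_vectors = {
    "python": np.array([0.9, 0.1, 0.0]),
    "machine learning": np.array([0.8, 0.2, 0.1]),
    "sql": np.array([0.7, 0.3, 0.0]),
    "carpentry": np.array([0.0, 0.1, 0.9]),
}

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def expand(seeds, threshold=0.95):
    expanded = set(seeds)
    for skill, vec in skill_vectors.items():
        if any(cosine(vec, skill_vectors[s]) >= threshold for s in seeds):
            expanded.add(skill)
    return expanded

print(expand({"python"}))  # pulls in skills whose vectors sit close to the seed
```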

    SeMi: A SEmantic Modeling machIne to build Knowledge Graphs with graph neural networks

    SeMi (SEmantic Modeling machIne) is a tool to semi-automatically build large-scale Knowledge Graphs from structured sources such as CSV, JSON, and XML files. To achieve this goal, SeMi builds semantic models of the data sources, in terms of concepts and relations within a domain ontology. Most of the research contributions on automatic semantic modeling focus on the detection of the semantic types of source attributes. However, inferring the correct semantic relations between these attributes is critical to reconstructing the precise meaning of the data. SeMi covers the entire semantic modeling process: (i) it provides a semi-automatic step to detect semantic types; (ii) it exploits a novel approach to infer semantic relations, based on a graph neural network trained on background linked data. To the best of our knowledge, this is the first technique that exploits a graph neural network to support the semantic modeling process. Furthermore, the pipeline implemented in SeMi is modular, and each component can be replaced to tailor the process to very specific domains or requirements. This contribution can be considered a step towards automatic and scalable approaches for building Knowledge Graphs.
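
    SeMi's relation inference relies on a graph neural network trained on background linked data; reproducing that is beyond a short sketch. The illustration below covers only the surrounding step of assembling a semantic model as a low-cost tree connecting detected semantic types, with hand-set edge weights standing in for the learned relation scores; the ontology nodes and relations are toy assumptions.

```python
# Sketch: assemble a semantic model as a minimum-cost tree over ontology nodes.
# Edge weights are hand-set stand-ins for SeMi's GNN-learned relation scores.
import networkx as nx
from networkx.algorithms.approximation import steiner_tree

G = nx.Graph()
# Ontology nodes and candidate relations; lower weight = more plausible relation.
G.add_edge("Person", "City", relation="bornIn", weight=0.2)
G.add_edge("Person", "Organization", relation="worksFor", weight=0.3)
G.add_edge("Organization", "City", relation="locatedIn", weight=0.4)

# Semantic types detected for the source attributes (step (i) in the abstract).
detected_types = ["Person", "City", "Organization"]

model = steiner_tree(G, detected_types, weight="weight")
for u, v, data in model.edges(data=True):
    print(u, data["relation"], v)
```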