Search CORE

272 research outputs found

Document-level sentiment analysis of email data

Author: Liu Sisi
Publication date: 01/01/2020

Sisi Liu investigated machine learning methods for document-level sentiment analysis of email. She developed a systematic framework that has been shown, both qualitatively and quantitatively, to be effective and efficient at identifying sentiment in large volumes of email data. Analytical results obtained from the document-level email sentiment analysis framework can support better decision-making in various business settings.

ResearchOnline@JCU

ResearchOnline at James Cook University

Cyber Security

Publication venue: 'Springer Science and Business Media LLC'
Publication date: 13/12/2022

This open access book constitutes the refereed proceedings of the 18th China Annual Conference on Cyber Security, CNCERT 2022, held in Beijing, China, in August 2022. The 17 papers presented were carefully reviewed and selected from 64 submissions. The papers are organized into the following topical sections: data security; anomaly detection; cryptocurrency; information security; vulnerabilities; mobile internet; threat intelligence; text recognition.

Directory of Open Access Books (DOAB)

NLP-Based Techniques for Cyber Threat Intelligence

Author: A. Rafidha Rehiman K.
Arazzi Marco
Arikkat Dincy R.
Conti Mauro
Nicolazzo Serena
Nocera Antonino
P. Vinod
Publication date: 15/11/2023

In the digital era, threat actors employ sophisticated techniques that often leave digital traces in the form of textual data. Cyber Threat Intelligence (CTI) covers all solutions for collecting, processing, and analyzing data in order to understand a threat actor's targets and attack behavior. CTI is playing an increasingly crucial role in identifying and mitigating threats and in enabling proactive defense strategies. In this context, natural language processing (NLP), a branch of artificial intelligence, has emerged as a powerful tool for enhancing threat intelligence capabilities. This survey paper provides a comprehensive overview of NLP-based techniques applied to threat intelligence. It begins by describing the foundational definitions and principles of CTI as a major tool for safeguarding digital assets. It then examines in detail NLP-based techniques for crawling CTI data from Web sources, CTI data analysis, relation extraction from cybersecurity data, CTI sharing and collaboration, and security threats to CTI. Finally, it thoroughly examines the challenges and limitations of NLP in threat intelligence, including data quality issues and ethical considerations. This survey outlines a complete framework and serves as a valuable resource for security professionals and researchers seeking to understand state-of-the-art NLP-based threat intelligence techniques and their potential impact on cybersecurity.

arXiv.org e-Print Archive

Cyber Security

Publication venue: 'Springer Science and Business Media LLC'

OAPEN Library

On the use of Machine Learning and Deep Learning for Text Similarity and Categorization and its Application to Troubleshooting Automation

Author: Callegari Daniel
Couto Julia
Godoy Julia
Kniest Davi
Meneguzzi Felipe
Ruiz Duncan
Tomaz Laura
Publication venue: 'HICSS Conference Office'
Publication date: 03/01/2022

Troubleshooting is a labor-intensive task that involves applying repetitive solutions to similar problems. This task can be partially or fully automated by using text-similarity matching to find previous solutions, lowering the technicians' workload. We conduct a systematic literature review to identify the best approaches to troubleshooting automation and to classifying incidents effectively. We identify promising approaches and point toward a comprehensive set of solutions that could be employed to solve the troubleshooting automation problem.

ScholarSpace at University of Hawai'i at Manoa

AIS Electronic Library (AISeL)
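The troubleshooting entry above describes automating troubleshooting by matching a new incident's text against earlier ones. As a minimal sketch, assuming a plain TF-IDF representation with cosine similarity (a common text-similarity baseline, not necessarily one of the approaches the review identifies), retrieving the closest past ticket could look like this. The function names and sample tickets are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Lowercase and keep runs of ASCII letters/digits as tokens.
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf_vectors(docs: list[str]) -> tuple[list[dict[str, float]], dict[str, float]]:
    # Build sparse TF-IDF vectors (term -> weight) for each document.
    tokenized = [tokenize(d) for d in docs]
    n = len(tokenized)
    df = Counter(term for toks in tokenized for term in set(toks))
    # Smoothed IDF: terms present in every document still get a small positive weight.
    idf = {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}
    vecs = [{t: tf * idf[t] for t, tf in Counter(toks).items()} for toks in tokenized]
    return vecs, idf


def cosine(a: dict[str, float], b: dict[str, float]) -> float:
    # Cosine similarity between two sparse vectors; 0.0 if either is empty.
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def most_similar_ticket(query: str, past_tickets: list[str]) -> tuple[int, float]:
    # Return the index and cosine score of the past ticket closest to the query.
    vecs, idf = tfidf_vectors(past_tickets)
    # Query terms never seen in past tickets have no IDF entry and are ignored.
    q_vec = {t: tf * idf[t] for t, tf in Counter(tokenize(query)).items() if t in idf}
    scores = [cosine(q_vec, v) for v in vecs]
    best = max(range(len(scores)), key=scores.__getitem__)
    return best, scores[best]


if __name__ == "__main__":
    past = [
        "printer not responding after driver update",
        "vpn connection drops every few minutes",
        "password reset link expired",
    ]
    idx, score = most_similar_ticket("VPN keeps dropping connection", past)
    print(past[idx], round(score, 3))
```

Here the VPN ticket is retrieved because it shares "vpn" and "connection" with the query; "dropping" and "drops" do not match because the sketch does no stemming, which is one reason the reviewed literature also considers learned representations.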

Knowledge-based Biomedical Data Science 2019

Author: Callahan Tiffany J.
Hunter Lawrence E.
Pielke-Lombardo Harrison
Tripodi Ignacio J.
Publication date: 08/10/2019

Knowledge-based biomedical data science (KBDS) involves the design and implementation of computer systems that act as if they knew about biomedicine. Such systems depend on formally represented knowledge, often in the form of knowledge graphs. Here we survey the past year's progress in systems that use formally represented knowledge to address data science problems in both clinical and biological domains, as well as in approaches for creating knowledge graphs. Major themes include the relationships between knowledge graphs and machine learning, the use of natural language processing, and the expansion of knowledge-based approaches to novel domains, such as Traditional Chinese Medicine and biodiversity.

Comment: Manuscript 43 pages with 3 tables; supplemental material 43 pages with 3 tables

arXiv.org e-Print Archive

Bringing order into the realm of Transformer-based language models for artificial intelligence and law

Author: Greco Candida M.
Tagarelli Andrea
Publication date: 03/02/2024

Transformer-based language models (TLMs) are widely recognized as a cutting-edge technology for developing deep-learning-based solutions to problems and applications that require natural language processing and understanding. As in other textual domains, TLMs have pushed the state of the art of AI approaches for many tasks of interest in the legal domain. Although the first Transformer model was proposed only about six years ago, the technology has progressed at an unprecedented rate, with BERT and related models serving as a major reference, including in the legal domain. This article provides the first systematic overview of TLM-based methods for AI-driven problems and tasks in the legal sphere. A major goal is to highlight research advances in this field in order to understand, on the one hand, how Transformers have contributed to the success of AI in supporting legal processes and, on the other hand, what the current limitations and opportunities for further research are.

Comment: Please refer to the published version: Greco, C.M., Tagarelli, A. (2023) Bringing order into the realm of Transformer-based language models for artificial intelligence and law. Artif Intell Law, Springer Nature. November 2023. https://doi.org/10.1007/s10506-023-09374-

arXiv.org e-Print Archive

Mapping (Dis-)Information Flow about the MH17 Plane Crash

Author: Augenstein Isabelle
Golovchenko Yevgeniy
Hartmann Mareike
Publication date: 01/01/2019

Digital media enable not only the fast sharing of information but also the spread of disinformation. One prominent case of an event that led to the circulation of disinformation on social media is the MH17 plane crash. Studies analysing the spread of information about this event on Twitter have focused on small, manually annotated datasets or used proxies for data annotation. In this work, we examine to what extent text classifiers can be used to label data for subsequent content analysis; in particular, we focus on predicting pro-Russian and pro-Ukrainian Twitter content related to the MH17 plane crash. Even though we find that a neural classifier improves over a hashtag-based baseline, labeling pro-Russian and pro-Ukrainian content with high precision remains a challenging problem. We provide an error analysis underlining the difficulty of the task and identify factors that might help improve classification in future work. Finally, we show how the classifier can facilitate the annotation task for human annotators.

arXiv.org e-Print Archive

Crossref

Copenhagen University Research Information System
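The MH17 entry compares a neural classifier against a hashtag-based baseline. The abstract does not say how that baseline works; one plausible reading, sketched here purely as an assumption, labels a tweet by which of two hashtag lists it matches and abstains otherwise. The hashtag sets, the function name, and the tie-breaking rule are hypothetical placeholders, not the paper's baseline.

```python
import re
from typing import Optional

# Hypothetical placeholder lexicons: the hashtags behind the paper's baseline are not listed here.
HASHTAG_LEXICON: dict[str, set[str]] = {
    "pro-Russian": {"#example_ru_tag1", "#example_ru_tag2"},
    "pro-Ukrainian": {"#example_ua_tag1", "#example_ua_tag2"},
}

HASHTAG_RE = re.compile(r"#\w+")


def hashtag_baseline(tweet: str) -> Optional[str]:
    # Collect the distinct hashtags in the tweet, case-insensitively.
    tags = {t.lower() for t in HASHTAG_RE.findall(tweet)}
    # Count how many lexicon hashtags of each label the tweet contains.
    counts = {label: len(tags & lexicon) for label, lexicon in HASHTAG_LEXICON.items()}
    best = max(counts.values())
    winners = [label for label, c in counts.items() if c == best]
    # Abstain when no lexicon hashtag occurs or when the labels tie.
    if best == 0 or len(winners) > 1:
        return None
    return winners[0]


if __name__ == "__main__":
    for text in ["Update on the crash #Example_RU_Tag1", "No hashtags here"]:
        print(text, "->", hashtag_baseline(text))
```

A rule like this can only label tweets that carry one of the listed hashtags, so it leaves much of the data unlabeled; a classifier trained on the tweet text does not share that limitation.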

Analyzing fluctuation of topics and public sentiment through social media data

Author: Liu Haoyue
Publication venue: Digital Commons @ NJIT
Publication date: 31/08/2022

Over the past decade, the number of Internet users worldwide has grown rapidly. These users form various online social networks through Internet platforms such as Twitter, Facebook, and Instagram, which give them a fast way to receive and disseminate information and to express personal opinions in virtual space. When dealing with massive and chaotic social media data, accurately determining which events or concepts users are discussing is an interesting and important problem. This dissertation consists of two main parts. First, it focuses on mining hidden topics and user-interest trends by analyzing real-world social media activity. Topic modeling and sentiment analysis methods are proposed to classify social media posts into different sentiment classes and then to discover how sentiment on different topics trends over time. The presented case study focuses on the COVID-19 pandemic that started in 2019. A large amount of Twitter data is collected and used to discover vaccine-related topics during the periods before and after vaccine emergency use authorization. Using the proposed framework, 11 vaccine-related trending topics are discovered. Ultimately, the discovered topics can be used to improve the readability of confusing messages about vaccines on social media and to support policymakers in making informed public-health decisions. Second, conventional topic models cannot handle the sparsity of short texts. A novel topic model, named Topic Noise based-Biterm Topic Model with FastText embeddings (TN-BTMF), is proposed to address this problem. Word co-occurrence patterns (i.e., biterms) are generated directly in BTM. A scoring method based on word co-occurrence and semantic similarity is proposed to detect noise biterms. In th

Digital Commons @ New Jersey Institute of Technology (NJIT)
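The TN-BTMF entry notes that BTM generates biterms directly. In the Biterm Topic Model, a biterm is an unordered pair of words that co-occur in the same short text, and the model is fit to corpus-wide biterm statistics rather than to per-document word counts, which is how it sidesteps short-text sparsity. A minimal sketch of that extraction step follows, assuming pre-tokenized texts and treating each whole short text as the co-occurrence window. Skipping pairs of identical words is a choice made for this sketch, the function names are hypothetical, and TN-BTMF's noise-biterm scoring and FastText embeddings are not reproduced.

```python
from collections import Counter
from itertools import combinations


def extract_biterms(tokens: list[str]) -> list[tuple[str, str]]:
    # Every unordered pair of word positions in a short text yields one biterm.
    # Each pair is sorted so ("a", "b") and ("b", "a") count as the same biterm;
    # pairs of identical words are skipped (a choice made for this sketch).
    biterms = []
    for w1, w2 in combinations(tokens, 2):
        if w1 != w2:
            biterms.append((w1, w2) if w1 < w2 else (w2, w1))
    return biterms


def corpus_biterm_counts(docs: list[list[str]]) -> Counter:
    # Aggregate biterm frequencies over the whole corpus, since BTM models
    # the corpus-level collection of biterms.
    counts: Counter = Counter()
    for tokens in docs:
        counts.update(extract_biterms(tokens))
    return counts


if __name__ == "__main__":
    docs = [["vaccine", "dose", "booster"], ["booster", "vaccine", "side", "effects"]]
    for biterm, n in corpus_biterm_counts(docs).most_common(3):
        print(biterm, n)
```

Pooling biterms across the corpus lets two short posts that each mention "booster" and "vaccine" reinforce the same co-occurrence pattern, even though neither post alone has enough words to support reliable document-level topic estimates.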