272 research outputs found
Document-level sentiment analysis of email data
Sisi Liu investigated machine learning methods for Email document sentiment analysis. She developed a systematic framework that has been qualitatively and quantitatively proved to be effective and efficient in identifying sentiment from massive amount of Email data. Analytical results obtained from the document-level Email sentiment analysis framework are beneficial for better decision making in various business settings
Cyber Security
This open access book constitutes the refereed proceedings of the 18th China Annual Conference on Cyber Security, CNCERT 2022, held in Beijing, China, in August 2022. The 17 papers presented were carefully reviewed and selected from 64 submissions. The papers are organized according to the following topical sections: data security; anomaly detection; cryptocurrency; information security; vulnerabilities; mobile internet; threat intelligence; text recognition
NLP-Based Techniques for Cyber Threat Intelligence
In the digital era, threat actors employ sophisticated techniques for which,
often, digital traces in the form of textual data are available. Cyber Threat
Intelligence~(CTI) is related to all the solutions inherent to data collection,
processing, and analysis useful to understand a threat actor's targets and
attack behavior. Currently, CTI is assuming an always more crucial role in
identifying and mitigating threats and enabling proactive defense strategies.
In this context, NLP, an artificial intelligence branch, has emerged as a
powerful tool for enhancing threat intelligence capabilities. This survey paper
provides a comprehensive overview of NLP-based techniques applied in the
context of threat intelligence. It begins by describing the foundational
definitions and principles of CTI as a major tool for safeguarding digital
assets. It then undertakes a thorough examination of NLP-based techniques for
CTI data crawling from Web sources, CTI data analysis, Relation Extraction from
cybersecurity data, CTI sharing and collaboration, and security threats of CTI.
Finally, the challenges and limitations of NLP in threat intelligence are
exhaustively examined, including data quality issues and ethical
considerations. This survey draws a complete framework and serves as a valuable
resource for security professionals and researchers seeking to understand the
state-of-the-art NLP-based threat intelligence techniques and their potential
impact on cybersecurity
Cyber Security
This open access book constitutes the refereed proceedings of the 18th China Annual Conference on Cyber Security, CNCERT 2022, held in Beijing, China, in August 2022. The 17 papers presented were carefully reviewed and selected from 64 submissions. The papers are organized according to the following topical sections: data security; anomaly detection; cryptocurrency; information security; vulnerabilities; mobile internet; threat intelligence; text recognition
On the use of Machine Learning and Deep Learning for Text Similarity and Categorization and its Application to Troubleshooting Automation
Troubleshooting is a labor-intensive task that includes repetitive solutions to similar problems. This task can be partially or fully automated using text-similarity matching to find previous solutions, lowering the workload of technicians. We develop a systematic literature review to identify the best approaches to solve the problem of troubleshooting automation and classify incidents effectively. We identify promising approaches and point in the direction of a comprehensive set of solutions that could be employed in solving the troubleshooting automation problem
Knowledge-based Biomedical Data Science 2019
Knowledge-based biomedical data science (KBDS) involves the design and
implementation of computer systems that act as if they knew about biomedicine.
Such systems depend on formally represented knowledge in computer systems,
often in the form of knowledge graphs. Here we survey the progress in the last
year in systems that use formally represented knowledge to address data science
problems in both clinical and biological domains, as well as on approaches for
creating knowledge graphs. Major themes include the relationships between
knowledge graphs and machine learning, the use of natural language processing,
and the expansion of knowledge-based approaches to novel domains, such as
Chinese Traditional Medicine and biodiversity.Comment: Manuscript 43 pages with 3 tables; Supplemental material 43 pages
with 3 table
Bringing order into the realm of Transformer-based language models for artificial intelligence and law
Transformer-based language models (TLMs) have widely been recognized to be a
cutting-edge technology for the successful development of deep-learning-based
solutions to problems and applications that require natural language processing
and understanding. Like for other textual domains, TLMs have indeed pushed the
state-of-the-art of AI approaches for many tasks of interest in the legal
domain. Despite the first Transformer model being proposed about six years ago,
there has been a rapid progress of this technology at an unprecedented rate,
whereby BERT and related models represent a major reference, also in the legal
domain. This article provides the first systematic overview of TLM-based
methods for AI-driven problems and tasks in the legal sphere. A major goal is
to highlight research advances in this field so as to understand, on the one
hand, how the Transformers have contributed to the success of AI in supporting
legal processes, and on the other hand, what are the current limitations and
opportunities for further research development.Comment: Please refer to the published version: Greco, C.M., Tagarelli, A.
(2023) Bringing order into the realm of Transformer-based language models for
artificial intelligence and law. Artif Intell Law, Springer Nature. November
2023. https://doi.org/10.1007/s10506-023-09374-
Mapping (Dis-)Information Flow about the MH17 Plane Crash
Digital media enables not only fast sharing of information, but also
disinformation. One prominent case of an event leading to circulation of
disinformation on social media is the MH17 plane crash. Studies analysing the
spread of information about this event on Twitter have focused on small,
manually annotated datasets, or used proxys for data annotation. In this work,
we examine to what extent text classifiers can be used to label data for
subsequent content analysis, in particular we focus on predicting pro-Russian
and pro-Ukrainian Twitter content related to the MH17 plane crash. Even though
we find that a neural classifier improves over a hashtag based baseline,
labeling pro-Russian and pro-Ukrainian content with high precision remains a
challenging problem. We provide an error analysis underlining the difficulty of
the task and identify factors that might help improve classification in future
work. Finally, we show how the classifier can facilitate the annotation task
for human annotators
Analyzing fluctuation of topics and public sentiment through social media data
Over the past decade years, Internet users were expending rapidly in the world. They form various online social networks through such Internet platforms as Twitter, Facebook and Instagram. These platforms provide a fast way that helps their users receive and disseminate information and express personal opinions in virtual space. When dealing with massive and chaotic social media data, how to accurately determine what events or concepts users are discussing is an interesting and important problem.
This dissertation work mainly consists of two parts. First, this research pays attention to mining the hidden topics and user interest trend by analyzing real-world social media activities. Topic modeling and sentiment analysis methods are proposed to classify the social media posts into different sentiment classes and then discover the trend of sentiment based on different topics over time. The presented case study focuses on COVID-19 pandemic that started in 2019. A large amount of Twitter data is collected and used to discover the vaccine-related topics during the pre- and post-vaccine emergency use period. By using the proposed framework, 11 vaccine-related trend topics are discovered. Ultimately the discovered topics can be used to improve the readability of confusing messages about vaccines on social media and provide effective results to support policymakers in making their policy their informed decisions about public health. Second, using conventional topic models cannot deal with the sparsity problem of short text. A novel topic model, named Topic Noise based-Biterm Topic Model with FastText embeddings (TN-BTMF), is proposed to deal with this problem. Word co-occurrence patterns (i.e. biterms) are dirctly generated in BTM. A scoring method based on word co-occurrence and semantic similarity is proposed to detect noise biterms. In th
- …