
    NLP-Based Techniques for Cyber Threat Intelligence

    In the digital era, threat actors employ sophisticated techniques that often leave digital traces in the form of textual data. Cyber Threat Intelligence (CTI) encompasses the solutions for collecting, processing, and analyzing data in order to understand a threat actor's targets and attack behavior. CTI is playing an increasingly crucial role in identifying and mitigating threats and in enabling proactive defense strategies. In this context, Natural Language Processing (NLP), a branch of artificial intelligence, has emerged as a powerful tool for enhancing threat intelligence capabilities. This survey provides a comprehensive overview of NLP-based techniques applied to threat intelligence. It begins by describing the foundational definitions and principles of CTI as a major tool for safeguarding digital assets. It then examines NLP-based techniques for crawling CTI data from Web sources, analyzing CTI data, extracting relations from cybersecurity data, sharing CTI and collaborating on it, and the security threats facing CTI. Finally, it examines the challenges and limitations of NLP in threat intelligence, including data quality issues and ethical considerations. The survey presents a complete framework and serves as a resource for security professionals and researchers seeking to understand state-of-the-art NLP-based threat intelligence techniques and their potential impact on cybersecurity.
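    As a concrete illustration of the CTI data analysis step, the sketch below pulls common indicators of compromise (IPv4 addresses, URLs, CVE identifiers, SHA-256 hashes) out of free text. It is a minimal rule-based baseline, not one of the NLP techniques covered by the survey: the pattern set, the `extract_iocs` name, and the sample report are hypothetical, and real pipelines typically add defanging support, validation, and learned entity recognizers on top.

```python
import re

# Hypothetical, deliberately simple IOC patterns. Real CTI pipelines also handle
# defanged forms such as "hxxp://" or "203.0.113[.]7", which these do not.
IOC_PATTERNS = {
    "ipv4": re.compile(
        r"\b(?:(?:25[0-5]|2[0-4]\d|1?\d?\d)\.){3}(?:25[0-5]|2[0-4]\d|1?\d?\d)\b"
    ),
    "url": re.compile(r"\bhttps?://[^\s\"'<>]+"),
    "cve": re.compile(r"\bCVE-\d{4}-\d{4,}\b"),
    "sha256": re.compile(r"\b[a-fA-F0-9]{64}\b"),
}


def extract_iocs(text: str) -> dict[str, list[str]]:
    """Map each IOC type to its unique matches, in order of first appearance."""
    found: dict[str, list[str]] = {}
    for kind, pattern in IOC_PATTERNS.items():
        matches: list[str] = []
        for raw in pattern.findall(text):
            # Drop trailing sentence punctuation that the URL pattern can swallow.
            value = raw.rstrip(".,;:)")
            if value not in matches:
                matches.append(value)
        found[kind] = matches
    return found


if __name__ == "__main__":
    # Hypothetical threat report snippet.
    report = (
        "The dropper contacted 203.0.113.7 and fetched "
        "http://malicious.example/payload.bin. It exploits CVE-2021-44228."
    )
    print(extract_iocs(report))
```

    Regular expressions are precise for well-formed indicators but miss context (who used the IP, against what target); that contextual layer is where the NLP and relation extraction techniques discussed in the survey come in.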

    Navigating Healthcare Challenges: Text Analytics, Data Integration, and Decision-Making in the COVID-19 Era

    During the COVID-19 pandemic, Integrated Healthcare Systems emerged as crucial components in managing healthcare challenges. This study examines the multifaceted role of integrated systems, with a particular focus on text analytics, and surveys the diverse applications of text analytics across the healthcare landscape. Reviews of the problems encountered by different organizations, together with insights from prior research, provide a comprehensive picture of the challenges faced by Health and Human Services (HHS). These challenges, closely linked to issues such as strain on hospitals and consumers' personal experiences, are examined in order to propose actionable solutions. The study emphasizes that data integration is indispensable and discusses how various analytic approaches can be employed within a well-integrated database system. It then scrutinizes the implementation of an integrated model, highlighting the primary challenges that organizations, particularly HHS, may encounter, and presents potential solutions that use Online Analytical Processing (OLAP) to build a dashboard addressing the identified problems. Beyond the technical details, the study explores how an integrated approach affects decision-making within HHS, in particular how it can accelerate decisions, underlining the need for timely and informed action in the face of healthcare challenges. In summary, the study explores the role of Integrated Healthcare Systems during the COVID-19 pandemic, drawing on text analytics, data integration, and analytic methodologies. The findings aim to offer healthcare organizations, particularly HHS, useful perspectives as they navigate and mitigate the complexities of the ongoing global health crisis.
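    The abstract does not specify how the dashboard is designed. As a minimal sketch, assuming a hypothetical fact table of weekly hospital bed occupancy, two OLAP operations a dashboard typically exposes, roll-up (aggregating a measure over chosen dimensions) and slice (fixing one dimension to a single value), can be expressed in plain Python. The field names (`state`, `hospital`, `week`, `occupied_beds`) and the helpers `roll_up` and `slice_cube` are illustrative assumptions, not the study's schema.

```python
from collections import defaultdict
from typing import Iterable, Mapping, Sequence

Row = Mapping[str, object]


def slice_cube(facts: Iterable[Row], dim: str, value: object) -> list[Row]:
    """OLAP slice: keep only the rows whose `dim` equals `value`."""
    return [row for row in facts if row[dim] == value]


def roll_up(facts: Iterable[Row], dims: Sequence[str], measure: str) -> dict[tuple, float]:
    """OLAP roll-up: sum `measure` for every combination of values of `dims`.

    Dimensions not listed in `dims` are aggregated away; an empty `dims`
    yields a single grand total keyed by the empty tuple.
    """
    cube: dict[tuple, float] = defaultdict(float)
    for row in facts:
        key = tuple(row[d] for d in dims)
        cube[key] += float(row[measure])
    return dict(cube)


if __name__ == "__main__":
    # Hypothetical weekly occupancy records; not data from the study.
    facts = [
        {"state": "A", "hospital": "h1", "week": 1, "occupied_beds": 40},
        {"state": "A", "hospital": "h2", "week": 1, "occupied_beds": 25},
        {"state": "B", "hospital": "h3", "week": 1, "occupied_beds": 30},
        {"state": "A", "hospital": "h1", "week": 2, "occupied_beds": 45},
    ]
    print(roll_up(facts, ["state", "week"], "occupied_beds"))
    print(roll_up(slice_cube(facts, "week", 1), ["state"], "occupied_beds"))
```

    Rolling up by `state` alone collapses the `hospital` and `week` dimensions into one total per state. A production system would push these aggregations into an OLAP engine or a SQL `GROUP BY` rather than Python loops.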

    A Topic Modeling Guided Approach for Semantic Knowledge Discovery in e-Commerce

    Mining large unstructured text archives, extracting useful patterns, and organizing them into a knowledge base has attracted considerable attention because of its many immediate business applications. Businesses therefore need new, efficient algorithms for extracting potentially useful patterns from heterogeneous data sources that produce huge volumes of unstructured data. Because they can bring out hidden themes from large text repositories, topic modeling algorithms have attracted significant attention in recent years. This paper proposes an efficient and scalable method, guided by topic modeling, for extracting concepts and relationships from e-commerce product descriptions and organizing them into a knowledge base. Semantic graphs can be generated from such a knowledge base, and a meaning-aware product discovery experience for potential buyers can be built on top of them. Extensive experiments applying the proposed unsupervised algorithms to e-commerce product descriptions collected from the open web show that the proposed method outperforms some existing methods for extracting concepts and relationships, enabling efficient knowledge base construction.
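    The abstract does not name the topic model used. Latent Dirichlet Allocation (LDA) is a common choice, so the sketch below shows a compact collapsed Gibbs sampler for LDA on tokenized product descriptions. It covers only the topic discovery step; how the authors turn topics into concepts and relationships is not described in the abstract and is not attempted here. The function names, hyperparameters (`alpha`, `beta`, `iters`), and sample documents are hypothetical.

```python
import random


def lda_gibbs(docs, n_topics, alpha=0.1, beta=0.01, iters=200, seed=0):
    """Fit LDA to `docs` (lists of tokens) with collapsed Gibbs sampling.

    Returns (topic_word, doc_topic, vocab): count matrices and the vocabulary.
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    V = len(vocab)
    doc_topic = [[0] * n_topics for _ in docs]        # n_dk
    topic_word = [[0] * V for _ in range(n_topics)]   # n_kw
    topic_total = [0] * n_topics                      # n_k
    assignments = []

    # Random initial topic for every token.
    for d, doc in enumerate(docs):
        z_d = []
        for w in doc:
            k = rng.randrange(n_topics)
            z_d.append(k)
            doc_topic[d][k] += 1
            topic_word[k][word_id[w]] += 1
            topic_total[k] += 1
        assignments.append(z_d)

    topics = range(n_topics)
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                wi, k = word_id[w], assignments[d][i]
                # Remove the token's current assignment from the counts.
                doc_topic[d][k] -= 1
                topic_word[k][wi] -= 1
                topic_total[k] -= 1
                # Full conditional p(z = t | all other assignments), unnormalized.
                weights = [
                    (doc_topic[d][t] + alpha) * (topic_word[t][wi] + beta)
                    / (topic_total[t] + V * beta)
                    for t in topics
                ]
                k = rng.choices(topics, weights=weights)[0]
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][wi] += 1
                topic_total[k] += 1
    return topic_word, doc_topic, vocab


def top_words(topic_word, vocab, n=3):
    """Return the `n` highest-count words of each topic."""
    result = []
    for row in topic_word:
        ranked = sorted(range(len(vocab)), key=row.__getitem__, reverse=True)
        result.append([vocab[i] for i in ranked[:n]])
    return result


if __name__ == "__main__":
    # Hypothetical tokenized product descriptions.
    docs = [
        ["cotton", "shirt", "sleeve", "cotton"],
        ["denim", "shirt", "cotton", "slim"],
        ["laptop", "battery", "screen", "ssd"],
        ["battery", "charger", "laptop", "usb"],
    ]
    tw, dt, vocab = lda_gibbs(docs, n_topics=2)
    print(top_words(tw, vocab))
```

    Each step resamples one token's topic with probability proportional to (n_dk + alpha)(n_kw + beta)/(n_k + V·beta), computed with that token's own assignment removed. The highest-count words per topic could then serve as candidate concept terms for a knowledge base.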

    Local Ranking Problem on the BrowseGraph

    The "Local Ranking Problem" (LRP) concerns computing a centrality-like rank on a local graph, where node scores can differ significantly from those computed on the global graph. Previous work has studied the LRP on the hyperlink graph but never on the BrowseGraph, a graph whose nodes are webpages and whose edges are browsing transitions. This graph has recently received growing attention in tasks such as ranking, prediction, and recommendation. However, a web server observes only the browsing traffic on its own pages (the local BrowseGraph), so local computation can introduce estimation errors that undermine the growing number of applications built on this graph. Moreover, although the divergence between local and global ranks has been measured, the possibility of estimating that divergence using only local knowledge has largely been overlooked. These questions matter to online service providers who want to (i) gauge how well they can assess the importance of their resources from local knowledge alone, and (ii) take into account real user browsing flows, which capture actual user interest better than the static hyperlink network. We study the LRP on a BrowseGraph from a large news provider, taking as subgraphs the aggregated browsing traces of users coming from different domains. We show that the distance between rankings can be accurately predicted from the structural information of the local graph alone, achieving an average rank correlation as high as 0.8.
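    To make the LRP concrete, the sketch below computes a weighted PageRank (one centrality-like rank; the abstract does not say which one the authors use) on a global BrowseGraph and on a local subgraph built from the traces of users arriving from a single referring domain, then compares the two rankings with Kendall's tau-a over the pages they share. Edge weights are transition counts. The page names, counts, and helper functions are hypothetical, and the sketch only measures the local/global divergence; predicting it from local structure, the paper's contribution, is not attempted.

```python
from collections import defaultdict


def pagerank(edges, damping=0.85, iters=100):
    """Weighted PageRank by power iteration.

    `edges` maps (source, target) to a positive browsing-transition count.
    Mass held by pages with no outgoing transitions is spread uniformly.
    """
    nodes = sorted({v for edge in edges for v in edge})
    out_weight = defaultdict(float)
    for (src, _), w in edges.items():
        out_weight[src] += w
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iters):
        dangling = sum(rank[v] for v in nodes if out_weight[v] == 0)
        new_rank = {v: (1 - damping) / n + damping * dangling / n for v in nodes}
        for (src, dst), w in edges.items():
            new_rank[dst] += damping * rank[src] * w / out_weight[src]
        rank = new_rank
    return rank


def kendall_tau_a(a, b):
    """Kendall's tau-a between two score dicts, over the keys they share."""
    keys = sorted(set(a) & set(b))
    concordant = discordant = 0
    for i in range(len(keys)):
        for j in range(i + 1, len(keys)):
            s = (a[keys[i]] - a[keys[j]]) * (b[keys[i]] - b[keys[j]])
            if s > 0:
                concordant += 1
            elif s < 0:
                discordant += 1
    pairs = len(keys) * (len(keys) - 1) / 2
    return (concordant - discordant) / pairs if pairs else 0.0


if __name__ == "__main__":
    # Hypothetical transition counts; the local graph keeps only traces of
    # users who arrived from one referring domain, so its counts are smaller.
    global_edges = {
        ("home", "politics"): 50, ("home", "sport"): 40, ("politics", "home"): 20,
        ("sport", "scores"): 30, ("scores", "home"): 10, ("politics", "sport"): 5,
    }
    local_edges = {("home", "sport"): 12, ("sport", "scores"): 10, ("scores", "home"): 3}
    tau = kendall_tau_a(pagerank(local_edges), pagerank(global_edges))
    print(f"local vs global Kendall tau-a: {tau:.2f}")
```

    Tau-a counts tied pairs as neither concordant nor discordant; if many pages tie, a tie-corrected variant (tau-b) or Spearman's rho may be a better fit.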