268 research outputs found

    A DATA DRIVEN APPROACH TO IDENTIFY JOURNALISTIC 5WS FROM TEXT DOCUMENTS

    Get PDF
    Textual understanding is the process of automatically extracting accurate high-quality information from text. The amount of textual data available from different sources such as news, blogs and social media is growing exponentially. These data encode significant latent information which if extracted accurately can be valuable in a variety of applications such as medical report analyses, news understanding and societal studies. Natural language processing techniques are often employed to develop customized algorithms to extract such latent information from text. Journalistic 5Ws refer to the basic information in news articles that describes an event and include where, when, who, what and why. Extracting them accurately may facilitate better understanding of many social processes including social unrest, human rights violations, propaganda spread, and population migration. Furthermore, the 5Ws information can be combined with socio-economic and demographic data to analyze state and trajectory of these processes. In this thesis, a data driven pipeline has been developed to extract the 5Ws from text using syntactic and semantic cues in the text. First, a classifier is developed to identify articles specifically related to social unrest. The classifier has been trained with a dataset of over 80K news articles. We then use NLP algorithms to generate a set of candidates for the 5Ws. Then, a series of algorithms to extract the 5Ws are developed. These algorithms based on heuristics leverage specific words and parts-of-speech customized for individual Ws to compute their scores. The heuristics are based on the syntactic structure of the document as well as syntactic and semantic representations of individual words and sentences. These scores are then combined and ranked to obtain the best answers to Journalistic 5Ws. The classification accuracy of the algorithms is validated using a manually annotated dataset of news articles

    Natural Language Processing: Emerging Neural Approaches and Applications

    Get PDF
    This Special Issue highlights the most recent research being carried out in the NLP field to discuss relative open issues, with a particular focus on both emerging approaches for language learning, understanding, production, and grounding interactively or autonomously from data in cognitive and neural systems, as well as on their potential or real applications in different domains

    A Survey on Semantic Processing Techniques

    Full text link
    Semantic processing is a fundamental research domain in computational linguistics. In the era of powerful pre-trained language models and large language models, the advancement of research in this domain appears to be decelerating. However, the study of semantics is multi-dimensional in linguistics. The research depth and breadth of computational semantic processing can be largely improved with new technologies. In this survey, we analyzed five semantic processing tasks, e.g., word sense disambiguation, anaphora resolution, named entity recognition, concept extraction, and subjectivity detection. We study relevant theoretical research in these fields, advanced methods, and downstream applications. We connect the surveyed tasks with downstream applications because this may inspire future scholars to fuse these low-level semantic processing tasks with high-level natural language processing tasks. The review of theoretical research may also inspire new tasks and technologies in the semantic processing domain. Finally, we compare the different semantic processing techniques and summarize their technical trends, application trends, and future directions.Comment: Published at Information Fusion, Volume 101, 2024, 101988, ISSN 1566-2535. The equal contribution mark is missed in the published version due to the publication policies. Please contact Prof. Erik Cambria for detail

    A Survey on Arabic Named Entity Recognition: Past, Recent Advances, and Future Trends

    Full text link
    As more and more Arabic texts emerged on the Internet, extracting important information from these Arabic texts is especially useful. As a fundamental technology, Named entity recognition (NER) serves as the core component in information extraction technology, while also playing a critical role in many other Natural Language Processing (NLP) systems, such as question answering and knowledge graph building. In this paper, we provide a comprehensive review of the development of Arabic NER, especially the recent advances in deep learning and pre-trained language model. Specifically, we first introduce the background of Arabic NER, including the characteristics of Arabic and existing resources for Arabic NER. Then, we systematically review the development of Arabic NER methods. Traditional Arabic NER systems focus on feature engineering and designing domain-specific rules. In recent years, deep learning methods achieve significant progress by representing texts via continuous vector representations. With the growth of pre-trained language model, Arabic NER yields better performance. Finally, we conclude the method gap between Arabic NER and NER methods from other languages, which helps outline future directions for Arabic NER.Comment: Accepted by IEEE TKD

    Adverse Drug Event Detection, Causality Inference, Patient Communication and Translational Research

    Get PDF
    Adverse drug events (ADEs) are injuries resulting from a medical intervention related to a drug. ADEs are responsible for nearly 20% of all the adverse events that occur in hospitalized patients. ADEs have been shown to increase the cost of health care and the length of stays in hospital. Therefore, detecting and preventing ADEs for pharmacovigilance is an important task that can improve the quality of health care and reduce the cost in a hospital setting. In this dissertation, we focus on the development of ADEtector, a system that identifies ADEs and medication information from electronic medical records and the FDA Adverse Event Reporting System reports. The ADEtector system employs novel natural language processing approaches for ADE detection and provides a user interface to display ADE information. The ADEtector employs machine learning techniques to automatically processes the narrative text and identify the adverse event (AE) and medication entities that appear in that narrative text. The system will analyze the entities recognized to infer the causal relation that exists between AEs and medications by automating the elements of Naranjo score using knowledge and rule based approaches. The Naranjo Adverse Drug Reaction Probability Scale is a validated tool for finding the causality of a drug induced adverse event or ADE. The scale calculates the likelihood of an adverse event related to drugs based on a list of weighted questions. The ADEtector also presents the user with evidence for ADEs by extracting figures that contain ADE related information from biomedical literature. A brief summary is generated for each of the figures that are extracted to help users better comprehend the figure. This will further enhance the user experience in understanding the ADE information better. The ADEtector also helps patients better understand the narrative text by recognizing complex medical jargon and abbreviations that appear in the text and providing definitions and explanations for them from external knowledge resources. This system could help clinicians and researchers in discovering novel ADEs and drug relations and also hypothesize new research questions within the ADE domain
    • …
    corecore