21 research outputs found

Non-Standard Vietnamese Word Detection and Normalization for Text-to-Speech

Author: Dang Huu-Tien
Phan Xuan-Hieu
Vuong Thi-Hai-Yen
Publication date: 07/09/2022

Converting written texts into their spoken forms is an essential problem in any text-to-speech (TTS) system. However, building an effective text normalization solution for a real-world TTS system faces two main challenges: (1) the semantic ambiguity of non-standard words (NSWs), e.g., numbers, dates, ranges, scores, and abbreviations, and (2) transforming NSWs such as URLs, email addresses, hashtags, and contact names into pronounceable syllables. In this paper, we propose a new two-phase normalization approach to deal with these challenges. First, a model-based tagger is designed to detect NSWs. Then, depending on NSW types, a rule-based normalizer expands those NSWs into their final verbal forms. We conducted three empirical experiments for NSW detection using Conditional Random Fields (CRFs), BiLSTM-CNN-CRF, and BERT-BiGRU-CRF models on a manually annotated dataset of 5819 sentences extracted from Vietnamese news articles. In the second phase, we propose a forward lexicon-based maximum matching algorithm to split hashtags, emails, URLs, and contact names. The experimental results of the tagging phase show that the average F1 scores of the BiLSTM-CNN-CRF and CRF models are above 90.00%, reaching the highest F1 of 95.00% with the BERT-BiGRU-CRF model. Overall, our approach has low sentence error rates, at 8.15% with CRF and 7.11% with BiLSTM-CNN-CRF taggers, and only 6.67% with the BERT-BiGRU-CRF tagger.

Comment: The 14th International Conference on Knowledge and Systems Engineering (KSE 2022)
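The abstract names a forward lexicon-based maximum matching algorithm for splitting hashtags, emails, URLs, and contact names. The following is a minimal sketch of generic forward maximum matching, not the authors' implementation; the function name, the single-character fallback for unknown spans, and the toy lexicon are assumptions made for illustration.

```python
def forward_max_match(text, lexicon, max_len=None):
    """Greedy left-to-right segmentation.

    At each position, take the longest lexicon entry that starts there.
    If nothing matches, emit one character (an assumed fallback).
    """
    if max_len is None:
        max_len = max((len(word) for word in lexicon), default=1)
    tokens = []
    i = 0
    while i < len(text):
        match = None
        # Try the longest candidate first, then shrink until a lexicon hit.
        for j in range(min(len(text), i + max_len), i, -1):
            if text[i:j] in lexicon:
                match = text[i:j]
                break
        if match is None:
            match = text[i]  # unknown span: fall back to a single character
        tokens.append(match)
        i += len(match)
    return tokens


# Hypothetical toy lexicon; a real system would use a syllable or word list.
toy_lexicon = {"viet", "nam", "vietnam", "news"}
segments = forward_max_match("vietnamnews", toy_lexicon)  # ['vietnam', 'news']
```

Because the match is greedy, the longer entry "vietnam" wins over "viet" followed by "nam"; whether that preference suits a given language or lexicon is a design choice the abstract does not detail.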

arXiv.org e-Print Archive

RMDM: A Multilabel Fakenews Dataset for Vietnamese Evidence Verification

Author: Le Thai-Son
Nguyen Ha-Thanh
Nguyen Hai-Long
Nguyen Tan-Minh
Pham Thi-Kieu-Trang
Vuong Thi-Hai-Yen
Publication date: 16/09/2023

In this study, we present a novel and challenging multilabel Vietnamese dataset (RMDM) designed to assess the performance of large language models (LLMs) in verifying electronic information related to legal contexts, focusing on fake news as potential input for electronic evidence. The RMDM dataset comprises four labels: real, mis, dis, and mal, representing real information, misinformation, disinformation, and mal-information, respectively. By including these diverse labels, RMDM captures the complexities of differing fake news categories and offers insights into the abilities of different language models to handle various types of information that could be part of electronic evidence. The dataset consists of a total of 1,556 samples, with 389 samples for each label. Preliminary tests on the dataset using GPT-based and BERT-based models reveal variations in the models' performance across different labels, indicating that the dataset effectively challenges the ability of various language models to verify the authenticity of such information. Our findings suggest that verifying electronic information related to legal contexts, including fake news, remains a difficult problem for language models, warranting further attention from the research community to advance toward more reliable AI models for potential legal applications.

Comment: ISAILD@KSE 202

arXiv.org e-Print Archive

Constructing a Knowledge Graph for Vietnamese Legal Cases with Heterogeneous Graphs

Author: Hoang Minh-Quan
Nguyen Ha-Thanh
Nguyen Hoang-Trung
Nguyen Tan-Minh
Vuong Thi-Hai-Yen
Publication date: 16/09/2023

This paper presents a knowledge graph construction method for legal case documents and related laws, aiming to organize legal information efficiently and enhance various downstream tasks. Our approach consists of three main steps: data crawling, information extraction, and knowledge graph deployment. First, the data crawler collects a large corpus of legal case documents and related laws from various sources, providing a rich database for further processing. Next, the information extraction step employs natural language processing techniques to extract entities such as courts, cases, domains, and laws, as well as their relationships from the unstructured text. Finally, the knowledge graph is deployed, connecting these entities based on their extracted relationships, creating a heterogeneous graph that effectively represents legal information and caters to users such as lawyers, judges, and scholars. The established baseline model leverages unsupervised learning methods, and by incorporating the knowledge graph, it demonstrates the ability to identify relevant laws for a given legal case. This approach opens up opportunities for various applications in the legal domain, such as legal case analysis, legal recommendation, and decision support.

Comment: ISAILD@KSE 202
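The abstract describes a heterogeneous graph whose nodes are typed entities (courts, cases, domains, laws) joined by extracted relationships. A rough sketch of such a structure follows; the class name, the relation names ("cites", "heard_by"), and the node identifiers are hypothetical and are not taken from the paper.

```python
from collections import defaultdict


class HeteroGraph:
    """Tiny typed graph: every node has a type, every edge a relation label."""

    def __init__(self):
        self.node_types = {}           # node id -> node type
        self.edges = defaultdict(set)  # (source id, relation) -> target ids

    def add_node(self, node_id, node_type):
        self.node_types[node_id] = node_type

    def add_edge(self, source, relation, target):
        if source not in self.node_types or target not in self.node_types:
            raise KeyError("both endpoints must be added as nodes first")
        self.edges[(source, relation)].add(target)

    def neighbors(self, source, relation, node_type=None):
        """Targets reachable by one relation, optionally filtered by type."""
        targets = self.edges.get((source, relation), set())
        return sorted(
            t for t in targets
            if node_type is None or self.node_types[t] == node_type
        )


# Hypothetical records, for illustration only.
graph = HeteroGraph()
graph.add_node("case:001", "case")
graph.add_node("law:A", "law")
graph.add_node("court:X", "court")
graph.add_edge("case:001", "cites", "law:A")
graph.add_edge("case:001", "heard_by", "court:X")
laws_for_case = graph.neighbors("case:001", "cites", node_type="law")
```

Keeping the relation label in the edge key makes a lookup such as "laws cited by this case" a single dictionary access, which is one simple way to support the law-identification use the abstract mentions.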

arXiv.org e-Print Archive

LBMT team at VLSP2022-Abmusu: Hybrid method with text correlation and generative models for Vietnamese multi-document summarization

Author: Nguyen Ha-Thanh
Nguyen Hai-Long
Nguyen Hoang-Trung
Nguyen Tan-Minh
Nguyen Thai-Binh
Thanh Tam Doan
Vuong Thi-Hai-Yen
Publication date: 11/04/2023

Multi-document summarization is challenging because the summaries should not only describe the most important information from all documents but also provide a coherent interpretation of the documents. This paper proposes a method for multi-document summarization based on cluster similarity. In the extractive method, we use a hybrid model based on a modified version of the PageRank algorithm and a text correlation mechanism. After generating summaries by selecting the most important sentences from each cluster, we apply BARTpho and ViT5 to construct the abstractive models. Both extractive and abstractive approaches were considered in this study. The proposed method achieves competitive results in the VLSP 2022 competition.

Comment: In Proceedings of the 9th International Workshop on Vietnamese Language and Speech Processing (VLSP 2022)
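The extractive step is described as building on a modified PageRank. The sketch below shows only standard weighted PageRank over a sentence-similarity graph, not the authors' modification; the Jaccard word-overlap similarity, the damping factor of 0.85, and the function names are assumptions chosen for illustration.

```python
def jaccard(a, b):
    """Word-overlap similarity between two sentences (assumed measure)."""
    words_a, words_b = set(a.lower().split()), set(b.lower().split())
    if not words_a or not words_b:
        return 0.0
    return len(words_a & words_b) / len(words_a | words_b)


def rank_sentences(sentences, damping=0.85, iterations=50):
    """Score sentences by power iteration on a similarity-weighted graph."""
    n = len(sentences)
    if n == 0:
        return []
    # Edge weight between distinct sentences is their similarity.
    weights = [
        [jaccard(sentences[i], sentences[j]) if i != j else 0.0
         for j in range(n)]
        for i in range(n)
    ]
    out_sum = [sum(row) for row in weights]
    scores = [1.0 / n] * n
    for _ in range(iterations):
        updated = []
        for i in range(n):
            # Sentences with no outgoing weight pass on nothing (simplification).
            incoming = sum(
                scores[j] * weights[j][i] / out_sum[j]
                for j in range(n) if out_sum[j] > 0
            )
            updated.append((1 - damping) / n + damping * incoming)
        scores = updated
    return scores


toy_sentences = [
    "the court issued a ruling",
    "the court ruling was appealed",
    "bananas are yellow",
]
toy_scores = rank_sentences(toy_sentences)
```

In this toy input the third sentence shares no words with the others, so it receives only the baseline score and would be the last one selected for an extractive summary.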

arXiv.org e-Print Archive

NeCo@ALQAC 2023: Legal Domain Knowledge Acquisition for Low-Resource Languages through Data Enrichment

Author: Nguyen Dieu-Quynh
Nguyen Ha-Thanh
Nguyen Hai-Long
Nguyen Hoang-Trung
Nguyen Huu-Dong
Nguyen Thach-Anh
Pham Thu-Trang
Vuong Thi-Hai-Yen
Publication date: 11/09/2023

In recent years, natural language processing has gained significant popularity in various sectors, including the legal domain. This paper presents NeCo Team's solutions to the Vietnamese text processing tasks provided in the Automated Legal Question Answering Competition 2023 (ALQAC 2023), focusing on legal domain knowledge acquisition for low-resource languages through data enrichment. Our methods for the legal document retrieval task employ a combination of similarity ranking and deep learning models, while for the second task, which requires extracting an answer from a relevant legal article in response to a question, we propose a range of adaptive techniques to handle different question types. Our approaches achieve outstanding results on both tasks of the competition, demonstrating the potential benefits and effectiveness of question answering systems in the legal field, particularly for low-resource languages.

Comment: ISAILD@KSE 202

arXiv.org e-Print Archive

NOWJ1@ALQAC 2023: Enhancing Legal Task Performance with Classic Statistical Models and Pre-trained Language Models

Author: Hoang Minh-Quan
Mai Ngoc-Duy
Nguyen Ha-Thanh
Nguyen Hoang-Viet
Nguyen Tan-Minh
Nguyen Van-Huan
Nguyen Xuan-Hoa
Vuong Thi-Hai-Yen
Publication date: 16/09/2023

This paper describes the NOWJ1 Team's approach for the Automated Legal Question Answering Competition (ALQAC) 2023, which focuses on enhancing legal task performance by integrating classical statistical models and pre-trained language models (PLMs). For the document retrieval task, we implement a pre-processing step to overcome input limitations and apply learning-to-rank methods to consolidate features from various models. The question-answering task is split into two sub-tasks: sentence classification and answer extraction. We incorporate state-of-the-art models to develop distinct systems for each sub-task, utilizing both classical statistical models and PLMs. Experimental results demonstrate the promising potential of our proposed methodology in the competition.

Comment: ISAILD@KSE 202

arXiv.org e-Print Archive

Prospects for Food Fermentation in South-East Asia, Topics From the Tropical Fermentation and Biotechnology Network at the End of the AsiFood Erasmus+Project

Author: Anil Kumar Anal
Da Lorn
Dinh-Vuong Mai
Dominique Valentin
Duc-Chien Vu
Dzung-Hoang Nguyen
Hai-Vu Pham
Hasika Mith
Hélène Licandro
Kitiya Vongkamjan
Mai-Huong Ly-Chatain
Maxime Haure
Nguyen-Thanh Vu
Phu-Ha Ho
Quoc-Bao Vo-Van
Quyet-Tien Phi
Reasmey Tan
Sokny Ly
Son Chu-Ky
Sophal Try
Thanh-Tam Phan
Thi-Bao-Hoa Do
Thi-Kim-Chi Nguyen
Thi-Minh-Tu Nguyen
Thi-Thanh-Thuy Nguyen
Thi-Viet-Anh Nguyen
Thi-Yen Do
Thierry Tran
Thuy-Le Do
Tien-Nam Tien
Tuan-Anh Pham
Van-Viet-Man Le
Warapa Mahakarnchanakul
Wen-Jun Li
Yves Waché
Publication venue: 'Frontiers Media SA'
Publication date: 01/10/2018

Fermentation has been used for centuries to produce food in South-East Asia, and some foods of this region are famous throughout the world. However, in the twenty-first century, issues like food safety and quality must be addressed in a world changing from local business to globalization. In Western countries, these questions have been answered through hygienisation, generalization of the use of starters, specialization of agriculture and use of long-distance transportation. This may have resulted in a loss in the taste and typicity of the products, in an extensive use of antibiotics and other chemicals and, eventually, in a loss of consumer confidence in the products. The challenges awaiting fermentation in South-East Asia are thus to improve safety and quality in a sustainable system producing tasty and typical fermented products and valorising by-products. At the end of the “AsiFood Erasmus+ project” (www.asifood.org), the goal of this paper is to present and discuss these challenges as addressed by the Tropical Fermentation Network, a group of researchers from universities, research centers and companies in Asia and Europe. This paper presents current actions and prospects on hygienic, environmental, sensorial and nutritional qualities of traditional fermented food, including screening of functional bacteria and starters, food safety strategies, research for new antimicrobial compounds, development of more sustainable fermentations and valorisation of by-products. A specificity of this network is also the multidisciplinary approach dealing with microbiology, food, chemical, sensorial, and genetic analyses, biotechnology, food supply chain, consumers and ethnology.

Directory of Open Access Journals

Awareness and preparedness of healthcare workers against the first wave of the COVID-19 pandemic: A cross-sectional survey across 57 countries.

Author: Abbas Kirellos Said
Abdul Aziz Jeza
Alhady Shamael Thabit Mohammed
Balogun Emmanuel Oluwadare
Chico R Matthew
contributors of the TMGH-Global COVID-19 Collaborative
Cox Sharon
Dat Vu Quoc
Dhouibi Nacir
Dong Vinh
Duc Nguyen Tran Minh
Dumre Shyam Prakash
Dung Tran Nu Thuy
Duru Vincent
Duy Nguyen The
Gad Abdelrahman
Ghozy Sherief
Giang Hoang Thi Nam
Hai Yen Tran
Hashan Mohammad Rashidul
Hirayama Kenji
Huan Vuong Thanh
Hue Nguyen Thi Linh
Hung Pham Dinh Long
Huy Nguyen Tien
Huynh Trang
Huynh Vy Thi Nhat
Imoto Atsuko
Jee Yap Siang
Karimzadeh Sedighe
Khue Bui Diem
Koonrungsesomboon Nut
Kubota Kazumi
Lee Peter N
Linh Le Khac
Luu Mai Ngoc
Matsui Mitsuaki
Mohamed Eltaras Mennatullah
Mohammed Ali Al-Ahdal Tareq
Moji Kazuhiko
Nam Nguyen Hai
Ng Sze Jia
Nguyen Hoang-Minh
Pavlenko Dmytro
Phan Truc
Phuong Dang Thuy Ha
Qarawi Ahmad Taysir Atieh
Quynh Tran Thuy Huong
Shah Jaffer
Shaikhkhalil Hosam Waleed
Sharma Akash
Smith Chris
Soliman Mohammed
Tam Dao Ngoc Hien
Tawfik Gehad Mohamed
Thi Nguyen Anh
TMGH-Global COVID-19 Collaborative
Trang Luong Thi
Trang Vu Thi Thu
Truong Le Van
Uyen Vuong Ngoc Thao
Vu Le Thuong
Vuong Nguyen Lam
Yen-Xuan Nguyen Thi
Publication venue: PLoS One
Publication date: 01/01/2021

BACKGROUND: Since the COVID-19 pandemic began, there have been concerns related to the preparedness of healthcare workers (HCWs). This study aimed to describe the level of awareness and preparedness of hospital HCWs at the time of the first wave. METHODS: This multinational, multicenter, cross-sectional survey was conducted among hospital HCWs from February to May 2020. We used a hierarchical logistic regression multivariate analysis to adjust the influence of variables based on awareness and preparedness. We then used association rule mining to identify relationships between HCW confidence in handling suspected COVID-19 patients and prior COVID-19 case-management training. RESULTS: We surveyed 24,653 HCWs from 371 hospitals across 57 countries and received 17,302 responses (an overall response rate of 70.2%). The median COVID-19 preparedness score was 11.0 (interquartile range [IQR] = 6.0-14.0) and the median awareness score was 29.6 (IQR = 26.6-32.6). HCWs at COVID-19 designated facilities with previous outbreak experience, or HCWs who were trained for dealing with the SARS-CoV-2 outbreak, had significantly higher levels of preparedness and awareness (p<0.001). Association rule mining suggests that nurses and doctors who had a 'great-extent-of-confidence' in handling suspected COVID-19 patients had participated in COVID-19 training courses. Male participants (mean difference = 0.34; 95% CI = 0.22, 0.46; p<0.001) and nurses (mean difference = 0.67; 95% CI = 0.53, 0.81; p<0.001) had higher preparedness scores compared to female participants and doctors. INTERPRETATION: There was an unsurprisingly high level of awareness and preparedness among HCWs who participated in COVID-19 training courses. However, disparity existed along the lines of gender and type of HCW. It is unknown whether the difference in COVID-19 preparedness that we detected early in the pandemic may have translated into disproportionate SARS-CoV-2 burden of disease by gender or HCW type.
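The methods mention association rule mining between HCW confidence and prior training. As a rough illustration of the two basic quantities behind such rules, support and confidence, here is a sketch on invented toy records; the item labels and the data are hypothetical and are not drawn from the survey.

```python
def rule_metrics(records, antecedent, consequent):
    """Return (support, confidence) for the rule antecedent -> consequent.

    support    = share of records containing both item sets
    confidence = share of antecedent records that also contain the consequent
    """
    antecedent, consequent = set(antecedent), set(consequent)
    total = len(records)
    with_antecedent = sum(1 for r in records if antecedent <= set(r))
    with_both = sum(
        1 for r in records if (antecedent | consequent) <= set(r)
    )
    support = with_both / total if total else 0.0
    confidence = with_both / with_antecedent if with_antecedent else 0.0
    return support, confidence


# Invented toy records: each set lists attributes of one respondent.
toy_records = [
    {"trained", "high_confidence"},
    {"trained", "high_confidence"},
    {"trained"},
    {"high_confidence"},
    set(),
]
toy_support, toy_confidence = rule_metrics(
    toy_records, antecedent={"high_confidence"}, consequent={"trained"}
)
```

In the toy data, two of five records contain both items and two of the three high-confidence records are also trained, so the rule has support 0.4 and confidence of about 0.67.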

LSHTM Research Online

PubMed Central

EUR Research Repository

Apollo (Cambridge)

Archivio della ricerca- Università di Roma La Sapienza