
    Thematic Annotation: extracting concepts out of documents

    Get PDF
    Contrary to standard approaches to topic annotation, the technique used in this work does not centrally rely on some form of -- possibly statistical -- keyword extraction. Instead, the proposed annotation algorithm uses a large-scale semantic database -- the EDR Electronic Dictionary -- that provides a concept hierarchy based on hyponym and hypernym relations. This concept hierarchy is used to generate a synthetic representation of the document by aggregating the words present in topically homogeneous document segments into a set of concepts that best preserves the document's content. This extraction technique takes an unexplored approach to topic selection. Instead of using semantic similarity measures based on a semantic resource, the latter is processed to extract the part of the conceptual hierarchy relevant to the document's content. This conceptual hierarchy is then searched to extract the most relevant set of concepts for representing the topics discussed in the document. Notably, the algorithm is able to extract generic concepts that are not directly present in the document. Comment: Technical report EPFL/LIA. 81 pages, 16 figures
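The hypernym-aggregation idea can be illustrated with a toy hierarchy; the actual system uses the EDR Electronic Dictionary, which is far larger, and the words and concepts below are invented. Climbing hypernym links lets the annotator surface a shared concept even when that concept never appears in the text.

```python
# Toy hypernym map (child -> parent); a stand-in for the EDR hierarchy.
HYPERNYMS = {
    "poodle": "dog", "beagle": "dog",
    "dog": "animal", "cat": "animal",
    "car": "vehicle", "truck": "vehicle",
    "animal": "entity", "vehicle": "entity",
}

def ancestors(concept):
    """Chain of increasingly generic concepts above `concept`."""
    chain = []
    while concept in HYPERNYMS:
        concept = HYPERNYMS[concept]
        chain.append(concept)
    return chain

def aggregate(words):
    """Most specific concept subsuming every word, or None."""
    common = None
    for w in words:
        covers = set([w] + ancestors(w))
        common = covers if common is None else common & covers
    if not common:
        return None
    # The most specific shared concept has the longest ancestor chain.
    return max(common, key=lambda c: len(ancestors(c)))

print(aggregate(["poodle", "beagle"]))  # dog
print(aggregate(["poodle", "cat"]))     # animal
```

Note that "dog" is extracted for {poodle, beagle} even though "dog" itself never occurs, mirroring the algorithm's ability to name generic concepts absent from the document.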

    AI-assisted patent prior art searching - feasibility study

    Get PDF
    This study seeks to understand the feasibility, technical complexity, and effectiveness of using artificial intelligence (AI) solutions to improve the operational processes of registering IP rights. The Intellectual Property Office commissioned Cardiff University to undertake this research. The research was funded through the BEIS Regulators' Pioneer Fund (RPF), which was set up to help address barriers to innovation in the UK economy

    - The Cases of Japan, South Korea, and China -

    Get PDF
    Thesis (M.A.) -- Seoul National University, Graduate School of Public Administration, Department of Public Administration (Public Policy major), February 2022. Advisor: Koo Min Gyo. Although the international community faces challenges with the rise of protectionism, the World Trade Organization (WTO) has contributed to the expansion and stabilization of the world economy as the center of the international trade system. Among the key institutional pillars of the WTO, the Dispute Settlement Mechanism (DSM) has attracted major scholarly attention in contemporary research on trade organizations. This study instead focuses on the least studied Trade Policy Review Mechanism (TPRM), another key function safeguarding against protectionism. The TPRM imposes peer pressure -- social criticism related to 'naming and shaming' rather than coercive sanctions -- which can help raise awareness of member states' trade practices and policies and increase responsibility and transparency. Despite its significance, the main reasons for the lack of attention by trade scholars are the semantic complexity of the review reports and the vast amount of text. To overcome these limitations, this study analyzed TPR reports using information extraction (IE) techniques. A total of 18 TPR reports on the three East Asian trading partners (Japan, Korea, and China) were analyzed with the Rapid Automatic Keyword Extraction (RAKE) and TextRank algorithms, and the major trade issues of the three countries were extracted. In the second phase, for an in-depth understanding and rich interpretation of these issues, a qualitative case study was conducted following the stages of peer pressure formation.
    Although the international community still faces enormous challenges in an era of growing protectionist pressure, the WTO has contributed substantially to the expansion and stabilization of the world economy as the center of the international trade system. Among the WTO's key institutional pillars, this study notes that the TPRM has been the least studied, despite its importance as a safeguard against protectionism, compared with the WTO's other institutional pillar, the DSM. Based on peer pressure, a mechanism centered on social criticism rather than physical sanctions, the TPRM can help the international community increase responsibility and transparency regarding trade policies and practices. The main reasons for the lack of scholarly attention nevertheless lie in the subtle wording of the review reports and their vast amount of text. To move beyond these limitations, the first phase of this study analyzed TPRs using IE techniques: 18 TPR reports on the three major East Asian trading partners (Korea, China, and Japan) were analyzed with the RAKE and TextRank algorithms, and the major trade issues of the three countries were extracted. For an in-depth understanding and rich interpretation of these issues, the second phase conducted a case study following the stages of peer pressure formation. The findings show in depth the major trade policy patterns of Japan, Korea, and China, the influence of the TPR, and the forms and outcomes of the peer pressure that arose in the process, and suggest a new direction for the international trading community.
    Table of contents:
    Chapter 1. Introduction: 1.1 Study Background; 1.2 Purpose of Research
    Chapter 2. Theoretical Background and Literature Review: 2.1 Transparency in Trade Environment; 2.2 Trade Policy Review Mechanism; 2.3 Relationship Between Three East Asian States
    Chapter 3. Research Design: 3.1 Conceptual Framework; 3.2 Peer Pressure Mechanism; 3.3 Data and Methodology
    Chapter 4. Result of Text Mining: 4.1 Analysis of Japan's Text Mining Results; 4.2 Analysis of South Korea's Text Mining Results; 4.3 Analysis of China's Text Mining Results
    Chapter 5. Case Study on the Trade Policy: Stages of Peer Pressure Formation: 5.1 Japan's Trade Issue: Change in Position towards Regional Economic Integration; 5.2 Korea's Trade Issue: Moratorium on Rice Tariffication; 5.3 China's Trade Issue: The Government's Market Intervention via SOEs
    Chapter 6. Conclusion and Implications
    References
    Abstract in Korean
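The scoring idea behind RAKE, one of the two algorithms the study applies to the TPR reports, can be sketched in a few lines. The stopword list and sample sentence below are invented for illustration, not the study's actual configuration.

```python
import re

# Minimal RAKE-style sketch (Rapid Automatic Keyword Extraction).
STOPWORDS = {"the", "of", "on", "and", "a", "to", "in", "is", "are", "for"}

def rake(text, top_n=3):
    # 1. Split into candidate phrases at stopwords.
    words = re.findall(r"[a-z']+", text.lower())
    phrases, current = [], []
    for w in words:
        if w in STOPWORDS:
            if current:
                phrases.append(current)
            current = []
        else:
            current.append(w)
    if current:
        phrases.append(current)
    # 2. Score each word as degree (co-membership in phrases) / frequency.
    freq, degree = {}, {}
    for phrase in phrases:
        for w in phrase:
            freq[w] = freq.get(w, 0) + 1
            degree[w] = degree.get(w, 0) + len(phrase)
    # 3. A phrase's score is the sum of its word scores.
    scored = {" ".join(p): sum(degree[w] / freq[w] for w in p) for p in phrases}
    return sorted(scored, key=scored.get, reverse=True)[:top_n]

print(rake("Trade policy review reports describe tariff measures and trade remedies"))
```

The degree/frequency ratio favors words that tend to appear inside long multi-word phrases, which is why RAKE extracts phrase-level keywords rather than isolated terms.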

    Constructing a Knowledge Graph for Vietnamese Legal Cases with Heterogeneous Graphs

    Full text link
    This paper presents a knowledge graph construction method for legal case documents and related laws, aiming to organize legal information efficiently and enhance various downstream tasks. Our approach consists of three main steps: data crawling, information extraction, and knowledge graph deployment. First, the data crawler collects a large corpus of legal case documents and related laws from various sources, providing a rich database for further processing. Next, the information extraction step employs natural language processing techniques to extract entities such as courts, cases, domains, and laws, as well as their relationships from the unstructured text. Finally, the knowledge graph is deployed, connecting these entities based on their extracted relationships, creating a heterogeneous graph that effectively represents legal information and caters to users such as lawyers, judges, and scholars. The established baseline model leverages unsupervised learning methods, and by incorporating the knowledge graph, it demonstrates the ability to identify relevant laws for a given legal case. This approach opens up opportunities for various applications in the legal domain, such as legal case analysis, legal recommendation, and decision support. Comment: ISAILD@KSE 202
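The deployed graph can be pictured as typed nodes linked by extracted relations. The toy edges and relation names below are illustrative, not the paper's actual schema, but they show how a heterogeneous graph can surface relevant laws for a given case.

```python
# Toy heterogeneous legal knowledge graph as (subject, relation, object) triples.
# Node names and relation types are invented for illustration.
edges = [
    ("case:123/2020", "heard_by", "court:Hanoi People's Court"),
    ("case:123/2020", "belongs_to", "domain:civil"),
    ("case:123/2020", "cites", "law:Civil Code 2015"),
    ("case:456/2021", "cites", "law:Civil Code 2015"),
    ("case:456/2021", "cites", "law:Land Law 2013"),
]

def neighbors(node, relation):
    """All nodes reachable from `node` via `relation` edges."""
    return [t for s, r, t in edges if s == node and r == relation]

def relevant_laws(case):
    """Laws the case cites, plus laws cited by cases sharing a citation."""
    direct = set(neighbors(case, "cites"))
    related = set()
    for law in direct:
        for s, r, t in edges:
            if r == "cites" and t == law and s != case:
                related.update(neighbors(s, "cites"))
    return direct | related

print(sorted(relevant_laws("case:123/2020")))
# ['law:Civil Code 2015', 'law:Land Law 2013']
```

Walking two hops through shared citations is one simple way a graph-backed baseline can recommend laws that the case itself never cites directly.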

    Exploring the State of the Art in Legal QA Systems

    Full text link
    Answering questions related to the legal domain is a complex task, primarily due to the intricate nature and diverse range of legal document systems. Providing an accurate answer to a legal query typically necessitates specialized knowledge in the relevant domain, which makes this task all the more challenging, even for human experts. Question answering (QA) systems are designed to generate answers to questions asked in human languages. They use natural language processing to understand questions and search through information to find relevant answers. QA has various practical applications, including customer service, education, research, and cross-lingual communication. However, these systems face challenges such as improving natural language understanding and handling complex and ambiguous questions. At this time, there is a lack of surveys that discuss legal question answering. To address this gap, we provide a comprehensive survey that reviews 14 benchmark datasets for question answering in the legal field and presents a comprehensive review of state-of-the-art legal question answering deep learning models. We cover the different architectures and techniques used in these studies as well as the performance and limitations of these models. Moreover, we have established a public GitHub repository where we regularly upload the most recent articles, open data, and source code. The repository is available at: \url{https://github.com/abdoelsayed2016/Legal-Question-Answering-Review}
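The search-then-answer idea can be made concrete with a minimal retrieval-style baseline: rank candidate passages by word overlap with the question. The passages and question below are invented, and the deep learning models surveyed here use far richer learned representations than this.

```python
import re

# Invented candidate passages; a real system would index a legal corpus.
passages = [
    "A contract requires offer, acceptance, and consideration to be valid.",
    "The statute of limitations for breach of contract is six years.",
    "Trademark registration protects a brand name or logo.",
]

def tokens(text):
    """Lowercased word set, ignoring punctuation."""
    return set(re.findall(r"[a-z]+", text.lower()))

def answer(question):
    """Return the passage sharing the most words with the question."""
    q = tokens(question)
    return max(passages, key=lambda p: len(q & tokens(p)))

print(answer("What makes a contract valid?"))
# A contract requires offer, acceptance, and consideration to be valid.
```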

    Keywords at Work: Investigating Keyword Extraction in Social Media Applications

    Full text link
    This dissertation examines a long-standing problem in Natural Language Processing (NLP) -- keyword extraction -- from a new angle. We investigate how keyword extraction can be formulated on social media data, such as emails, product reviews, student discussions, and student statements of purpose. We design novel graph-based features for supervised and unsupervised keyword extraction from emails, and successfully use the resulting system to uncover patterns in a new dataset -- student statements of purpose. Furthermore, the system is used with new features on the problem of usage expression extraction from product reviews, where we obtain interesting insights. When used on student discussions, the system uncovers new and exciting patterns. While each of the above problems is conceptually distinct, they share two key common elements -- keywords and social data. Social data can be messy, hard to interpret, and not easily amenable to existing NLP resources. We show that our system is robust enough in the face of such challenges to discover useful and important patterns. We also show that the problem definition of keyword extraction itself can be expanded to accommodate new and challenging research questions and datasets. PhD, Computer Science & Engineering, University of Michigan, Horace H. Rackham School of Graduate Studies. https://deepblue.lib.umich.edu/bitstream/2027.42/145929/1/lahiri_1.pd
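One simple member of the graph-based feature family is a word co-occurrence graph over a sliding window, with words ranked by weighted degree. The sketch below is a toy stand-in under that assumption, not the dissertation's actual feature set, and the sample text is invented.

```python
import re

def keyword_scores(text, window=2):
    """Rank words by degree in a co-occurrence graph over a sliding window."""
    words = re.findall(r"[a-z]+", text.lower())
    degree = {}
    for i, w in enumerate(words):
        # Link each word to the next `window` words; both endpoints gain degree.
        for j in range(i + 1, min(i + 1 + window, len(words))):
            if words[j] != w:
                degree[w] = degree.get(w, 0) + 1
                degree[words[j]] = degree.get(words[j], 0) + 1
    return sorted(degree, key=degree.get, reverse=True)

print(keyword_scores("graduate admissions value research experience and research plans")[0])
# research
```

Words that recur in many local contexts accumulate degree, which is the same intuition behind the random-walk weighting used by TextRank-style extractors.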

    Summarization of COVID-19 news documents deep learning-based using transformer architecture

    Get PDF
    Following news on the internet about the spread of coronavirus disease 2019 (COVID-19) is challenging because extracting valuable information from the news takes a long time. Deep learning has had a significant impact on NLP research. However, the deep learning models used in several studies, especially for document summarization, still have deficiencies. For example, long texts may be summarized incorrectly at the maximum output length, results may be redundant, or characters may appear repeatedly, so the resulting sentences are poorly organized and the recall obtained is low. This study aims to summarize COVID-19 news documents using a deep learning model. We propose a transformer as the base language model, with architectural modifications as the basis for designing a model that significantly improves document summarization results. We built a transformer-based architecture with an encoder and decoder that can be stacked repeatedly, and compared layer modifications based on scoring. In the resulting experiments, ROUGE-1 and ROUGE-2 show good performance for the proposed model, with scores of 0.58 and 0.42, respectively, and a training time of 11,438 seconds. The proposed model was evidently effective in improving abstractive document summarization performance
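The reported ROUGE-1 and ROUGE-2 scores measure n-gram overlap against a reference summary. A minimal sketch of the recall-style computation, with invented sentences, looks like this:

```python
def ngrams(tokens, n):
    """All contiguous n-grams of a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def rouge_n(candidate, reference, n=1):
    """Fraction of reference n-grams also found in the candidate (clipped)."""
    cand = ngrams(candidate.lower().split(), n)
    ref = ngrams(reference.lower().split(), n)
    if not ref:
        return 0.0
    # Clip matches so a candidate n-gram cannot be counted more often
    # than it actually appears.
    cand_counts = {}
    for g in cand:
        cand_counts[g] = cand_counts.get(g, 0) + 1
    overlap = 0
    for g in ref:
        if cand_counts.get(g, 0) > 0:
            cand_counts[g] -= 1
            overlap += 1
    return overlap / len(ref)

ref = "new covid cases reported in the capital"
cand = "covid cases reported in capital today"
print(round(rouge_n(cand, ref, 1), 2))  # 0.71
print(round(rouge_n(cand, ref, 2), 2))  # 0.5
```

ROUGE-2 drops faster than ROUGE-1 because bigram matches require word order to be preserved, which is why the paper reports both.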