
    Thematic Annotation: extracting concepts out of documents

    Get PDF
    Contrary to standard approaches to topic annotation, the technique used in this work does not centrally rely on some form of -- possibly statistical -- keyword extraction. Instead, the proposed annotation algorithm uses a large-scale semantic database -- the EDR Electronic Dictionary -- that provides a concept hierarchy based on hyponym and hypernym relations. This concept hierarchy is used to generate a synthetic representation of the document by aggregating the words present in topically homogeneous document segments into a set of concepts that best preserves the document's content. This extraction technique takes an unexplored approach to topic selection. Instead of using semantic similarity measures based on a semantic resource, the latter is processed to extract the part of the conceptual hierarchy relevant to the document's content. This conceptual hierarchy is then searched to extract the most relevant set of concepts for representing the topics discussed in the document. Notably, the algorithm is able to extract generic concepts that are not directly present in the document. Comment: Technical report EPFL/LIA. 81 pages, 16 figures
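The hypernym-aggregation idea can be illustrated with a toy hierarchy; the actual system uses the EDR Electronic Dictionary, which is far larger, and the words and concepts below are invented. Climbing hypernym links lets the annotator surface a shared concept even when that concept never appears in the text.

```python
# Toy hypernym map (child -> parent); a stand-in for the EDR hierarchy.
HYPERNYMS = {
    "poodle": "dog", "beagle": "dog",
    "dog": "animal", "cat": "animal",
    "car": "vehicle", "truck": "vehicle",
    "animal": "entity", "vehicle": "entity",
}

def ancestors(concept):
    """Chain of increasingly generic concepts above `concept`."""
    chain = []
    while concept in HYPERNYMS:
        concept = HYPERNYMS[concept]
        chain.append(concept)
    return chain

def aggregate(words):
    """Most specific concept subsuming every word, or None."""
    common = None
    for w in words:
        covers = set([w] + ancestors(w))
        common = covers if common is None else common & covers
    if not common:
        return None
    # The most specific shared concept has the longest ancestor chain.
    return max(common, key=lambda c: len(ancestors(c)))

print(aggregate(["poodle", "beagle"]))  # dog
print(aggregate(["poodle", "cat"]))     # animal
```

Note that "dog" is extracted for {poodle, beagle} even though "dog" itself never occurs, mirroring the algorithm's ability to name generic concepts absent from the document.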

    AI-assisted patent prior art searching - feasibility study

    Get PDF
    This study seeks to understand the feasibility, technical complexity, and effectiveness of using artificial intelligence (AI) solutions to improve the operational processes of registering IP rights. The Intellectual Property Office commissioned Cardiff University to undertake this research. The research was funded through the BEIS Regulators' Pioneer Fund (RPF), which was set up to help address barriers to innovation in the UK economy

    - The Cases of Japan, South Korea, and China -

    Get PDF
    Thesis (M.A.) -- Seoul National University, Graduate School of Public Administration, Department of Public Administration (Public Policy major), February 2022. Advisor: Koo Min Gyo. Although the international community faces challenges with the rise of protectionism, the World Trade Organization (WTO) has contributed to the expansion and stabilization of the world economy as the center of the international trade system. Among the key institutional pillars of the WTO, the Dispute Settlement Mechanism (DSM) has attracted major scholarly attention in contemporary research on trade organizations. This study instead focuses on the least studied Trade Policy Review Mechanism (TPRM), another key function safeguarding against protectionism. The TPRM imposes peer pressure -- social criticism related to 'naming and shaming' rather than coercive sanctions -- which can help raise awareness of member states' trade practices and policies and increase responsibility and transparency. Despite its significance, the main reasons for the lack of attention by trade scholars are the semantic complexity of the review reports and the vast amount of text. To overcome these limitations, this study analyzed TPR reports using information extraction (IE) techniques. A total of 18 TPR reports on the three East Asian trading partners (Japan, Korea, and China) were analyzed with the Rapid Automatic Keyword Extraction (RAKE) and TextRank algorithms, and the major trade issues of the three countries were extracted. In the second phase, for an in-depth understanding and rich interpretation of these issues, a qualitative case study was conducted following the stages of peer pressure formation.
    Although the international community still faces enormous challenges in an era of growing protectionist pressure, the WTO has contributed substantially to the expansion and stabilization of the world economy as the center of the international trade system. Among the WTO's key institutional pillars, this study notes that the TPRM has been the least studied, despite its importance as a safeguard against protectionism, compared with the WTO's other institutional pillar, the DSM. Based on peer pressure, a mechanism centered on social criticism rather than physical sanctions, the TPRM can help the international community increase responsibility and transparency regarding trade policies and practices. The main reasons for the lack of scholarly attention nevertheless lie in the subtle wording of the review reports and their vast amount of text. To move beyond these limitations, the first phase of this study analyzed TPRs using IE techniques: 18 TPR reports on the three major East Asian trading partners (Korea, China, and Japan) were analyzed with the RAKE and TextRank algorithms, and the major trade issues of the three countries were extracted. For an in-depth understanding and rich interpretation of these issues, the second phase conducted a case study following the stages of peer pressure formation. The findings show in depth the major trade policy patterns of Japan, Korea, and China, the influence of the TPR, and the forms and outcomes of the peer pressure that arose in the process, and suggest a new direction for the international trading community.
    Table of contents:
    Chapter 1. Introduction: 1.1 Study Background; 1.2 Purpose of Research
    Chapter 2. Theoretical Background and Literature Review: 2.1 Transparency in Trade Environment; 2.2 Trade Policy Review Mechanism; 2.3 Relationship Between Three East Asian States
    Chapter 3. Research Design: 3.1 Conceptual Framework; 3.2 Peer Pressure Mechanism; 3.3 Data and Methodology
    Chapter 4. Result of Text Mining: 4.1 Analysis of Japan's Text Mining Results; 4.2 Analysis of South Korea's Text Mining Results; 4.3 Analysis of China's Text Mining Results
    Chapter 5. Case Study on the Trade Policy: Stages of Peer Pressure Formation: 5.1 Japan's Trade Issue: Change in Position towards Regional Economic Integration; 5.2 Korea's Trade Issue: Moratorium on Rice Tariffication; 5.3 China's Trade Issue: The Government's Market Intervention via SOEs
    Chapter 6. Conclusion and Implications
    References
    Abstract in Korean
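The scoring idea behind RAKE, one of the two algorithms the study applies to the TPR reports, can be sketched in a few lines. The stopword list and sample sentence below are invented for illustration, not the study's actual configuration.

```python
import re

# Minimal RAKE-style sketch (Rapid Automatic Keyword Extraction).
STOPWORDS = {"the", "of", "on", "and", "a", "to", "in", "is", "are", "for"}

def rake(text, top_n=3):
    # 1. Split into candidate phrases at stopwords.
    words = re.findall(r"[a-z']+", text.lower())
    phrases, current = [], []
    for w in words:
        if w in STOPWORDS:
            if current:
                phrases.append(current)
            current = []
        else:
            current.append(w)
    if current:
        phrases.append(current)
    # 2. Score each word as degree (co-membership in phrases) / frequency.
    freq, degree = {}, {}
    for phrase in phrases:
        for w in phrase:
            freq[w] = freq.get(w, 0) + 1
            degree[w] = degree.get(w, 0) + len(phrase)
    # 3. A phrase's score is the sum of its word scores.
    scored = {" ".join(p): sum(degree[w] / freq[w] for w in p) for p in phrases}
    return sorted(scored, key=scored.get, reverse=True)[:top_n]

print(rake("Trade policy review reports describe tariff measures and trade remedies"))
```

The degree/frequency ratio favors words that tend to appear inside long multi-word phrases, which is why RAKE extracts phrase-level keywords rather than isolated terms.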

    Constructing a Knowledge Graph for Vietnamese Legal Cases with Heterogeneous Graphs

    Full text link
    This paper presents a knowledge graph construction method for legal case documents and related laws, aiming to organize legal information efficiently and enhance various downstream tasks. Our approach consists of three main steps: data crawling, information extraction, and knowledge graph deployment. First, the data crawler collects a large corpus of legal case documents and related laws from various sources, providing a rich database for further processing. Next, the information extraction step employs natural language processing techniques to extract entities such as courts, cases, domains, and laws, as well as their relationships from the unstructured text. Finally, the knowledge graph is deployed, connecting these entities based on their extracted relationships, creating a heterogeneous graph that effectively represents legal information and caters to users such as lawyers, judges, and scholars. The established baseline model leverages unsupervised learning methods, and by incorporating the knowledge graph, it demonstrates the ability to identify relevant laws for a given legal case. This approach opens up opportunities for various applications in the legal domain, such as legal case analysis, legal recommendation, and decision support. Comment: ISAILD@KSE 202
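The deployed graph can be pictured as typed nodes linked by extracted relations. The toy edges and relation names below are illustrative, not the paper's actual schema, but they show how a heterogeneous graph can surface relevant laws for a given case.

```python
# Toy heterogeneous legal knowledge graph as (subject, relation, object) triples.
# Node names and relation types are invented for illustration.
edges = [
    ("case:123/2020", "heard_by", "court:Hanoi People's Court"),
    ("case:123/2020", "belongs_to", "domain:civil"),
    ("case:123/2020", "cites", "law:Civil Code 2015"),
    ("case:456/2021", "cites", "law:Civil Code 2015"),
    ("case:456/2021", "cites", "law:Land Law 2013"),
]

def neighbors(node, relation):
    """All nodes reachable from `node` via `relation` edges."""
    return [t for s, r, t in edges if s == node and r == relation]

def relevant_laws(case):
    """Laws the case cites, plus laws cited by cases sharing a citation."""
    direct = set(neighbors(case, "cites"))
    related = set()
    for law in direct:
        for s, r, t in edges:
            if r == "cites" and t == law and s != case:
                related.update(neighbors(s, "cites"))
    return direct | related

print(sorted(relevant_laws("case:123/2020")))
# ['law:Civil Code 2015', 'law:Land Law 2013']
```

Walking two hops through shared citations is one simple way a graph-backed baseline can recommend laws that the case itself never cites directly.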

    Exploring the State of the Art in Legal QA Systems

    Full text link
    Answering questions related to the legal domain is a complex task, primarily due to the intricate nature and diverse range of legal document systems. Providing an accurate answer to a legal query typically necessitates specialized knowledge in the relevant domain, which makes this task all the more challenging, even for human experts. Question answering (QA) systems are designed to generate answers to questions asked in human languages. They use natural language processing to understand questions and search through information to find relevant answers. QA has various practical applications, including customer service, education, research, and cross-lingual communication. However, these systems face challenges such as improving natural language understanding and handling complex and ambiguous questions. At this time, there is a lack of surveys that discuss legal question answering. To address this gap, we provide a comprehensive survey that reviews 14 benchmark datasets for question answering in the legal field and presents a comprehensive review of state-of-the-art legal question answering deep learning models. We cover the different architectures and techniques used in these studies as well as the performance and limitations of these models. Moreover, we have established a public GitHub repository where we regularly upload the most recent articles, open data, and source code. The repository is available at: \url{https://github.com/abdoelsayed2016/Legal-Question-Answering-Review}
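The search-then-answer idea can be made concrete with a minimal retrieval-style baseline: rank candidate passages by word overlap with the question. The passages and question below are invented, and the deep learning models surveyed here use far richer learned representations than this.

```python
import re

# Invented candidate passages; a real system would index a legal corpus.
passages = [
    "A contract requires offer, acceptance, and consideration to be valid.",
    "The statute of limitations for breach of contract is six years.",
    "Trademark registration protects a brand name or logo.",
]

def tokens(text):
    """Lowercased word set, ignoring punctuation."""
    return set(re.findall(r"[a-z]+", text.lower()))

def answer(question):
    """Return the passage sharing the most words with the question."""
    q = tokens(question)
    return max(passages, key=lambda p: len(q & tokens(p)))

print(answer("What makes a contract valid?"))
# A contract requires offer, acceptance, and consideration to be valid.
```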

    Keywords at Work: Investigating Keyword Extraction in Social Media Applications

    Full text link
    This dissertation examines a long-standing problem in Natural Language Processing (NLP) -- keyword extraction -- from a new angle. We investigate how keyword extraction can be formulated on social media data, such as emails, product reviews, student discussions, and student statements of purpose. We design novel graph-based features for supervised and unsupervised keyword extraction from emails, and successfully use the resulting system to uncover patterns in a new dataset -- student statements of purpose. Furthermore, the system is used with new features on the problem of usage expression extraction from product reviews, where we obtain interesting insights. When used on student discussions, the system uncovers new and exciting patterns. While each of the above problems is conceptually distinct, they share two key common elements -- keywords and social data. Social data can be messy, hard to interpret, and not easily amenable to existing NLP resources. We show that our system is robust enough in the face of such challenges to discover useful and important patterns. We also show that the problem definition of keyword extraction itself can be expanded to accommodate new and challenging research questions and datasets. PhD, Computer Science & Engineering, University of Michigan, Horace H. Rackham School of Graduate Studies. https://deepblue.lib.umich.edu/bitstream/2027.42/145929/1/lahiri_1.pd
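One simple member of the graph-based feature family is a word co-occurrence graph over a sliding window, with words ranked by weighted degree. The sketch below is a toy stand-in under that assumption, not the dissertation's actual feature set, and the sample text is invented.

```python
import re

def keyword_scores(text, window=2):
    """Rank words by degree in a co-occurrence graph over a sliding window."""
    words = re.findall(r"[a-z]+", text.lower())
    degree = {}
    for i, w in enumerate(words):
        # Link each word to the next `window` words; both endpoints gain degree.
        for j in range(i + 1, min(i + 1 + window, len(words))):
            if words[j] != w:
                degree[w] = degree.get(w, 0) + 1
                degree[words[j]] = degree.get(words[j], 0) + 1
    return sorted(degree, key=degree.get, reverse=True)

print(keyword_scores("graduate admissions value research experience and research plans")[0])
# research
```

Words that recur in many local contexts accumulate degree, which is the same intuition behind the random-walk weighting used by TextRank-style extractors.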

    Summarization of COVID-19 news documents deep learning-based using transformer architecture

    Get PDF
    Following news on the internet about the spread of coronavirus disease 2019 (COVID-19) is challenging because extracting valuable information from the news takes a long time. Deep learning has had a significant impact on NLP research. However, the deep learning models used in several studies, especially for document summarization, still have deficiencies. For example, long texts may be summarized incorrectly at the maximum output length, results may be redundant, or characters may appear repeatedly, so the resulting sentences are poorly organized and the recall obtained is low. This study aims to summarize COVID-19 news documents using a deep learning model. We propose a transformer as the base language model, with architectural modifications as the basis for designing a model that significantly improves document summarization results. We built a transformer-based architecture with an encoder and decoder that can be stacked repeatedly, and compared layer modifications based on scoring. In the resulting experiments, ROUGE-1 and ROUGE-2 show good performance for the proposed model, with scores of 0.58 and 0.42, respectively, and a training time of 11,438 seconds. The proposed model was evidently effective in improving abstractive document summarization performance
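The reported ROUGE-1 and ROUGE-2 scores measure n-gram overlap against a reference summary. A minimal sketch of the recall-style computation, with invented sentences, looks like this:

```python
def ngrams(tokens, n):
    """All contiguous n-grams of a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def rouge_n(candidate, reference, n=1):
    """Fraction of reference n-grams also found in the candidate (clipped)."""
    cand = ngrams(candidate.lower().split(), n)
    ref = ngrams(reference.lower().split(), n)
    if not ref:
        return 0.0
    # Clip matches so a candidate n-gram cannot be counted more often
    # than it actually appears.
    cand_counts = {}
    for g in cand:
        cand_counts[g] = cand_counts.get(g, 0) + 1
    overlap = 0
    for g in ref:
        if cand_counts.get(g, 0) > 0:
            cand_counts[g] -= 1
            overlap += 1
    return overlap / len(ref)

ref = "new covid cases reported in the capital"
cand = "covid cases reported in capital today"
print(round(rouge_n(cand, ref, 1), 2))  # 0.71
print(round(rouge_n(cand, ref, 2), 2))  # 0.5
```

ROUGE-2 drops faster than ROUGE-1 because bigram matches require word order to be preserved, which is why the paper reports both.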