848 research outputs found

    Recognizing Coordinate Structures for Machine Translation of English Patent Documents

    Get PDF
    PACLIC / The University of the Philippines Visayas Cebu College Cebu City, Philippines / November 20-22, 200

    Patent Claim Structure Recognition

    Get PDF
    In patents, the claims section is the most relevant part. It is written in legal jargon and contains independent and dependent claims that form a hierarchy. We present our work on automatically identifying that hierarchy within complete patent claim texts. Beginning with a short introduction to patent claims and typical use cases for searching in claims, we proceed to show results from a preliminary context analysis of English claims from the European Patents Fulltext (EPFULL) database. We point out some of the ways in which claim dependency is indicated in the text and show how to identify them. Additionally, we describe several of the problems encountered, in particular those resulting from noisy data. Finally, we show results from our internal evaluations, in which accuracies greater than 93% were measured. We also indicate areas for further research.
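A minimal sketch of how textual dependency references might be detected and turned into a hierarchy; the regular expression and the phrasing it matches are illustrative assumptions, not the paper's actual method:

```python
import re

# Hypothetical pattern for dependent-claim references such as
# "according to claim 3" or "of claim 1"; real claims use many more variants.
REF_PATTERN = re.compile(
    r"(?:according to|of|as claimed in)\s+claims?\s+(\d+)", re.IGNORECASE
)

def claim_parents(claims):
    """Map each claim number to the claim it depends on (None if independent)."""
    parents = {}
    for number, text in claims.items():
        match = REF_PATTERN.search(text)
        parents[number] = int(match.group(1)) if match else None
    return parents

claims = {
    1: "A widget comprising a frame and a hinge.",
    2: "The widget according to claim 1, wherein the hinge is metallic.",
    3: "The widget of claim 2, further comprising a lock.",
}
print(claim_parents(claims))  # {1: None, 2: 1, 3: 2}
```

Walking the resulting parent map yields the claim hierarchy; noisy data (as the abstract notes) would require more robust matching than this single pattern.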

    Automatic discourse structure generation using rhetorical structure theory

    Get PDF
    This thesis addresses a difficult problem in text processing: creating a system to automatically derive the rhetorical structure of text. Although rhetorical structure has proven useful in many fields of text processing, such as text summarisation and information extraction, systems that automatically generate rhetorical structures with high accuracy are difficult to find. This is because discourse is one of the biggest and yet least well-defined areas in linguistics: researchers have not agreed on the best method for analysing the rhetorical structure of text. This thesis focuses on investigating a method to generate the rhetorical structures of text. By exploiting different cohesive devices, it proposes a method to recognise rhetorical relations between spans by checking for the appearance of these devices. These devices include cue phrases, noun-phrase cues, verb-phrase cues, reference words, time references, substitution words, ellipses, and syntactic information. The discourse analyser is divided into two levels: sentence-level and text-level. The former uses syntactic information and cue phrases to segment sentences into elementary discourse units and to generate a rhetorical structure for each sentence. The latter derives rhetorical relations between large spans and then replaces each sentence by its corresponding rhetorical structure to produce the rhetorical structure of the text. The rhetorical structure at the text level is derived by selecting rhetorical relations that connect adjacent, non-overlapping spans into a discourse structure covering the entire text. Constraints of textual organisation and textual adjacency are used in a beam search to reduce the search space when generating such structures.
Experiments carried out in this research achieved an 89.4% F-score for discourse segmentation, 52.4% for the sentence-level discourse analyser, and 38.1% for the final output of the system. These results show that the approach performs well in comparison with current research in discourse.
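The cue-phrase device described above can be sketched roughly as follows; the cue lexicon and relation labels here are illustrative assumptions, not the thesis's actual inventory:

```python
# Hypothetical cue-phrase lexicon mapping surface cues to rhetorical relations;
# a real analyser would also use syntactic and other cohesive evidence.
CUE_RELATIONS = {
    "because": "Cause",
    "although": "Concession",
    "however": "Contrast",
    "for example": "Elaboration",
}

def guess_relation(span):
    """Return the first rhetorical relation signalled by a cue phrase, if any."""
    lowered = span.lower()
    for cue, relation in CUE_RELATIONS.items():
        if cue in lowered:
            return relation
    return None

print(guess_relation("However, the parser handles embedded clauses."))  # Contrast
```

In the thesis's architecture such relation hypotheses would feed a beam search that assembles adjacent, non-overlapping spans into a tree covering the whole text.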

    Argumentative zoning information extraction from scientific text

    Get PDF
    Let me tell you, writing a thesis is not always a barrel of laughs, and strange things can happen, too. For example, at the height of my thesis paranoia, I had a recurrent dream in which my cat Amy gave me detailed advice on how to restructure the thesis chapters, which was awfully nice of her. But I also had a lot of human help throughout this time, whether things were going fine or berserk. Most of all, I want to thank Marc Moens: I could not have had a better or more knowledgeable supervisor. He always took time for me, however busy he might have been, reading chapters thoroughly in two days. He both had the calmness of mind to give me lots of freedom in research, and the right judgement to guide me away, tactfully but determinedly, from the occasional catastrophe or other waiting along the way. He was great fun to work with and also became a good friend. My work has profited from the interdisciplinary, interactive and enlightened atmosphere at the Human Communication Centre and the Centre for Cognitive Science (which is now called something else). The Language Technology Group was a great place to work in, as my research was grounded in practical applications developed


    NLP Driven Models for Automatically Generating Survey Articles for Scientific Topics.

    Full text link
    This thesis presents new methods that use natural language processing (NLP) driven models for summarizing research in scientific fields. Given a topic query in the form of a text string, we present methods for finding research articles relevant to the topic as well as summarization algorithms that use lexical and discourse information present in the text of these articles to generate coherent and readable extractive summaries of past research on the topic. In addition to summarizing prior research, good survey articles should also forecast future trends. With this motivation, we present work on forecasting future impact of scientific publications using NLP driven features.
    PhD, Computer Science and Engineering, University of Michigan, Horace H. Rackham School of Graduate Studies
    http://deepblue.lib.umich.edu/bitstream/2027.42/113407/1/rahuljha_1.pd

    Rhetoric Of Young Non-Regular Workers In Post-Bubble Japan: A Genealogical Analysis

    Get PDF
    This work explores the development and struggle of a rhetorical subject of Japanese young non-regular workers against the recent slow economic trend. In Japan, the bubble burst of 1991 ushered in a long economic recession, and companies began to adopt non-regular (low-wage, short-term and insecure) contracts in place of the quintessential Fordist full-time seishain regular contract; yet a large body of older seishain workers has retained this stable and affordable status. As a result, the vast majority of the workforce that has entered the job market since then has suffered from a low living standard, many on the verge of survival, while domestic mass media discourses have legitimated this unfair treatment by portraying these workers as incompetent and lazy, and therefore undeserving of seishain positions. Combining Michel Foucault’s framework of genealogy with Louis Althusser’s idea of interpellation, this study investigates the development of discourses that have legitimated their inferior material and symbolic status, as well as activists’ attempts to challenge that status. After I provide an overview of the project in Chapter 1, Chapter 2 reexamines the birth and development of kaisha-shugi, or companyism, a set of normative ideas that treats the ongoing development of private companies as the national mission. Chapter 2 also remarks on an effect of this system: an impoverished notion of civic distributional justice and of a minimum civic life as a citizens’ right. Chapter 3 investigates discourses on the national economy, labor relations and youth culture, exploring how domestic mass media, together with the state hegemony, rearticulated the subject of young non-regular workers. I claim that, in the early era of the post-bubble period, this public subject was conveniently obliterated as a workforce, while its future risk was optimistically calculated and underrated.
I also contend, however, that a few new denominations in the middle of the 2000s reformed their public subject in a way that explicitly degrades their symbolic status. Chapter 4 analyzes activists’ efforts, highlighting the effectiveness of their rhetoric against the dominant neoliberal capitalist powers. In conclusion, Chapter 5 sets out the contributions of this study to rhetorical studies and neoliberal studies.

    Elaboration of a RST Chinese Treebank

    Get PDF
    [EN] As a subfield of Artificial Intelligence (AI), Natural Language Processing (NLP) aims to process human languages automatically. Fruitful results have been achieved across many NLP research fields; among these, discourse analysis is becoming more and more popular, and discourse information is crucial for NLP studies. As the most spoken language in the world, Chinese occupies a very important position in NLP research. This work therefore presents a discourse treebank for Chinese whose theoretical framework is Rhetorical Structure Theory (RST) (Mann and Thompson, 1988). The research corpus consists of 50 Chinese texts, and the treebank can be consulted at three annotation levels: segmentation, central unit (CU) and discourse structure. Finally, we provide an open online interface for consulting the Chinese treebank. (A Basque-language version of this abstract, duplicating the above, accompanied the original record.)
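One record of such a treebank, with its three annotation levels, might be represented as follows; the field names and relation labels are illustrative assumptions, not the project's actual schema:

```python
from dataclasses import dataclass, field

# Hypothetical record structure for one annotated text in an RST treebank.
@dataclass
class RstAnnotation:
    text_id: str
    edus: list          # elementary discourse units (segmentation level)
    central_unit: int   # index of the EDU marked as the central unit (CU)
    relations: list = field(default_factory=list)  # (nucleus, satellite, label)

record = RstAnnotation(
    text_id="zh-001",
    edus=["EDU 1 text", "EDU 2 text", "EDU 3 text"],
    central_unit=0,
    relations=[(0, 1, "Elaboration"), (0, 2, "Background")],
)
print(len(record.edus))  # 3
```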

    Combining content-based and EAP approaches to academic writing: Towards an eclectic program

    Get PDF
    Over the past decade, Australian universities have experienced an exponential increase in the enrolment of fee-paying overseas students whose preparation for tertiary studies may differ significantly from that of local students. Despite English language proficiency requirements, there is some concern that international entry tests do not adequately measure the complex features of university writing; an important concern given that student success is heavily dependent on their mastery of academic writing. As a result, many international students require additional support structures. Until the present, debate about the most effective way to meet the diverse needs of English as an Additional Language (EAL) writers entering universities has concerned a choice between two alternatives: on one hand a separate, short-term English for Academic Purposes (EAP) language program and on the other, direct entry into disciplines with lecturers taking responsibility for assisting students to learn the discipline-specific language skills required. While the Australian Universities Quality Agency (AUQA, 2009, 2013) supports the latter view, this research investigates a third alternative; that is, an English for Academic Purposes Pathway program (EAPP) that not only teaches general academic English skills, but also English required in discipline specific contexts, as well as important and necessary adjunct skills that support writing. This three-phase, mixed-methods study used both qualitative and quantitative data to investigate the efficacy of such a program. The study, which was analytic, descriptive and comparative in approach, was conducted in a naturalistic setting and, where possible, qualitative data were used to support the findings from quantitative data. Theoretical propositions guided the data collection and provided important links to connect primary and secondary research. 
Phase 1 investigated the academic writing needs perceived by 60 students who were either studying in the 20-week or 10-week EAPP program at Swan University (a pseudonym). Perceptions of student needs by 13 EAPP teachers were also analysed and writing samples collected. In Phase 2, the cohort decreased to 31 students representing seven faculties. Perceptions of 17 faculty staff from across and within these seven faculties were sought regarding the tasks and genres required for EAL students to meet the writing expectations within these disciplines. The marked faculty writing assignments of ex-EAPP students were collected and analysed at the end of the first semester. At this stage, because the volume of student writing produced over the course of the study was so large, disproportional stratified random sampling was used to select and analyse the EAPP and faculty writing of a sample of seven students. Research by Kaldor, Herriman and Rochecouste (1998) provided direction for the frame analysis used to analyse the student writing. In Phase 3, conducted one year after the students entered their chosen faculties, 22 students replied to a request to judge which, if any, writing skills from their EAPP program had transferred to assist them with their faculty writing. Findings are discussed in relation to four major issues. Firstly, reflections provided by ex-EAPP students ascertained that, on entering the EAPP program, the majority of them had been academically, linguistically, culturally and socially unprepared for study at master's degree level in an Australian university. Secondly, analysis determined that in the students' first year of faculty study, writing tasks and genres were almost identical in type, complexity and word-count restrictions to those taught in the EAPP program, and that students readily adapted to the highly specified frameworks of any tasks that were unfamiliar.
A third major finding was the significance that students placed on the type of feedback necessary to support their writing. Finally, students identified major areas of improvement in their academic writing at the end of the program, and provided suggestions in key pedagogical areas about how the EAPP program could be improved to better address their needs. This study found that EAL writing development involves much more than content knowledge, mastery over discipline-specific genre requirements and a wide vocabulary. Academic writing comprises a complex combination of extratextual, circumtextual, intratextual and intertextual features and skills, some of which are completely new to international students. A model was proposed to illustrate elements that provide: circumtextual assistance for prewriting support; intertextual assistance through reading and writing support; extratextual assistance through sociocultural support; and intratextual assistance through the scaffolding of academic writing skills. To conclude, recommended modifications to the program are presented.

    Natural Language Processing for Technology Foresight Summarization and Simplification: the case of patents

    Get PDF
    Technology foresight aims to anticipate possible developments, understand trends, and identify technologies of high impact. To this end, monitoring emerging technologies is crucial. Patents, the legal documents that protect novel inventions, can be a valuable source for technology monitoring: millions of patent applications are filed yearly, 3.4 million in 2021 alone. Patent documents are primarily textual documents and disclose innovative and potentially valuable inventions. However, their processing is currently under-researched. This is due to several reasons, including high document complexity: patents are very lengthy and are written in an extremely hard-to-read mix of technical and legal jargon. This thesis explores how Natural Language Processing, the discipline that enables machines to process human language automatically, can aid patent processing. Specifically, we focus on two tasks: patent summarization (reducing the document length while preserving its core content) and patent simplification (reducing the document's linguistic complexity while preserving its original core meaning). We found that older patent summarization approaches were not compared on shared benchmarks (thus making it hard to draw conclusions), and even the most recent abstractive dataset presents important issues that might make comparisons meaningless. We try to fill both gaps: we first document the issues related to the BigPatent dataset and then benchmark extractive, abstractive, and hybrid approaches in the patent domain. We also explore transferring summarization methods from the scientific-paper domain, with limited success. For the automatic text simplification task, we noticed a lack of simplified texts and parallel corpora. We fill this gap by defining a method to automatically generate a silver standard for patent simplification.
Lay human judges evaluated the simplified sentences in the corpus as grammatical, adequate, and simpler, and we show that the corpus can be used to train a state-of-the-art simplification model. This thesis describes the first steps toward Natural Language Processing-aided patent summarization and simplification. We hope it will encourage more research on the topic, opening doors for a productive dialog between NLP researchers and domain experts.
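As a rough illustration of the extractive side of the task, a word-frequency sentence scorer (a generic baseline, not one of the systems benchmarked in the thesis) might look like:

```python
import re
from collections import Counter

# Minimal extractive-summarization sketch: score each sentence by the corpus
# frequency of its words and keep the top-scoring sentences in document order.
def extractive_summary(text, n_sentences=2):
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    freq = Counter(re.findall(r"[a-z]+", text.lower()))
    ranked = sorted(
        range(len(sentences)),
        key=lambda i: sum(freq[w] for w in re.findall(r"[a-z]+", sentences[i].lower())),
        reverse=True,
    )
    keep = sorted(ranked[:n_sentences])  # restore original sentence order
    return " ".join(sentences[i] for i in keep)
```

Real patent summarizers must additionally cope with the extreme length and legal-technical jargon the abstract describes, which is precisely why shared benchmarks matter.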