29,631 research outputs found

    A Comparison of Nuggets and Clusters for Evaluating Timeline Summaries

    There is growing interest in systems that generate timeline summaries by filtering high-volume streams of documents to retain only those that are relevant to a particular event or topic. Continued advances in algorithms and techniques for this task depend on standardized and reproducible evaluation methodologies for comparing systems. However, timeline summary evaluation is still in its infancy, with competing methodologies currently being explored in international evaluation forums such as TREC. One area of active exploration is how to explicitly represent the units of information that should appear in a 'good' summary. Currently, there are two main approaches, one based on identifying nuggets in an external 'ground truth', and the other based on clustering system outputs. In this paper, by building test collections that have both nugget and cluster annotations, we are able to compare these two approaches. Specifically, we address questions related to evaluation effort, differences in the final evaluation products, and correlations between scores and rankings generated by both approaches. We summarize advantages and disadvantages of nuggets and clusters to offer recommendations for future system evaluation.

    Facilitating access to pre-processed research evidence in public health

    Background: Evidence-informed decision making is accepted in Canada and worldwide as necessary for the provision of effective health services. This process involves: 1) clearly articulating a practice-based issue; 2) searching for and accessing relevant evidence; 3) appraising methodological rigor and choosing the most synthesized evidence of the highest quality and relevance to the practice issue and setting that is available; and 4) extracting, interpreting, and translating knowledge, in light of the local context and resources, into practice, program and policy decisions. While the public health sector in Canada is working toward evidence-informed decision making, considerable barriers exist, including a lack of efficient access to synthesized resources. Methods: In this paper we map relevant resources that include public health-related effectiveness evidence to a previously developed 6-level pyramid of pre-processed research evidence. The resources were identified through extensive searches of both the published and unpublished domains. Results: Many resources with public health-related evidence were identified. While there were very few resources dedicated solely to public health evidence, many clinically focused resources include public health-related evidence, making tools such as the pyramid, which identify these resources, particularly helpful for public health decision makers. A practical example illustrates the application of this model and highlights its potential to reduce the time and effort that would be required by public health decision makers to address their practice-based issues. Conclusions: This paper describes an existing hierarchy of pre-processed evidence and its adaptation to the public health setting. A number of resources with public health-relevant content that are either freely accessible or require a subscription are identified. This will facilitate easier and faster access to pre-processed, public health-relevant evidence, with the intent of promoting evidence-informed decision making. Access to such resources addresses several barriers to evidence-informed decision making identified by public health decision makers, most importantly time, as well as lack of knowledge of resources that house public health-relevant evidence.

    Generating indicative-informative summaries with SumUM

    We present and evaluate SumUM, a text summarization system that takes a raw technical text as input and produces an indicative-informative summary. The indicative part of the summary identifies the topics of the document, and the informative part elaborates on some of these topics according to the reader's interest. SumUM motivates the topics, describes entities, and defines concepts. It is a first step for exploring the issue of dynamic summarization. This is accomplished through a process of shallow syntactic and semantic analysis, concept identification, and text regeneration. Our method was developed through the study of a corpus of abstracts written by professional abstractors. Relying on human judgment, we have evaluated indicativeness, informativeness, and text acceptability of the automatic summaries. The results thus far indicate good performance when compared with other summarization technologies.

    The reforming appeal of distributed leadership

    With a systematic literature review, this article examines the significance of distributed leadership in healthcare, assessing the extent to which it reflects a consistent set of values, meanings, practices and outcomes. It identifies key mediating factors and their importance in enabling or constraining distributive leadership processes. The findings indicate that clinicians without formal leadership titles are inspiring change and driving improvements, although countervailing pressures are limiting this in practice. Distributed leadership is evident in the way that clinical teams function, and more could be made of this for the modernisation of healthcare. At present, this potential tends to be constrained, and subject to competing interpretations that reflect distinct occupational identities. Greater attention could be given to educational and developmental programmes that claim space for distributed influence among current and aspiring leaders, and for enabling arrangements that can help ‘ordinary leaders’ to feel less vulnerable and more confident about this aspect of their practice. Established approaches to leader development could be usefully refocused to prioritise collective processes and refine relational abilities, ideally with more inclusive, joint venture initiatives that bring formal and informal leaders together for mutual learning and effective engagement.

    Modulating semantic speech-gesture matching in healthy subjects and patients with schizophrenia spectrum disorder via transcranial direct current stimulation

    Background: Severe deficits in the processing of speech and gesture are an important feature of patients with schizophrenia spectrum disorders. Because co-speech gestures are an essential part of human communication, it is not surprising that impairments in perceiving and producing co-speech gestures contribute considerably to these patients' suffering. Imaging studies have shown that left frontal cortical areas play a major role in the processing of co-speech gestures, both in healthy people and in patients with schizophrenia spectrum disorders. The left inferior frontal gyrus appears to be more important for the perception of metaphoric gestures, i.e. gestures that accompany a sentence with abstract content (e.g. raising the hand to depict the high quality of a discussion), than for the perception of iconic gestures, i.e. gestures that accompany a sentence with concrete content (e.g. a circular movement of the hand to illustrate a round table). Patients with schizophrenia additionally show excessive activation of left frontal brain areas. Whether transcranial direct current stimulation can influence the disturbed processing of co-speech gestures in patients with schizophrenia has not yet been investigated. Objective: In the first part of our study (Publication 1), we used transcranial direct current stimulation to investigate the functional significance of the left frontal lobe for the processing of metaphoric co-speech gestures in healthy subjects. We hypothesized that left frontal transcranial direct current stimulation affects, in a polarity-dependent manner, ratings of the semantic match between speech and gesture in a speech-gesture match rating task, and that this effect can be detected as a change in reaction times and in match ratings. In the second part of the study (Publication 2), we investigated the effects of transcranial direct current stimulation on the processing of co-speech gestures in patients with schizophrenia spectrum disorders. Our hypothesis was that inhibitory transcranial direct current stimulation of the left frontal lobe improves patients' performance on the speech-gesture match rating task. Methods: We applied anodal, cathodal and sham stimulation to frontal, parietal and frontoparietal brain areas in twenty-nine healthy subjects and twenty patients with schizophrenia spectrum disorders. During stimulation, subjects were shown video clips of an actor. The actor spoke a concrete or abstract sentence and accompanied it with a semantically matching or mismatching, iconic or metaphoric gesture. Immediately after each video clip, subjects rated the extent to which the sentence content matched the gesture (question: "Do the sentence content and the gesture match?", answered on a scale from one, "very poorly", to seven, "very well"). Results: For the first sample, consisting of seventeen healthy subjects (Publication 1), we found changes in reaction times and ratings that depended on stimulation site and polarity for metaphoric co-speech gestures. Anodal stimulation of the left frontal lobe reduced reaction times and speech-gesture match ratings for this gesture type. When comparing the healthy subjects with the patients with schizophrenia spectrum disorders (Publication 2), we found a specific effect of transcranial direct current stimulation on speech-gesture match ratings in patients. Left frontal cathodal stimulation significantly improved patients' discrimination between matching and mismatching gestures and thereby reduced the difference in speech-gesture match ratings between patients and healthy subjects. Conclusion: We first showed that left frontal transcranial direct current stimulation influences the processing of metaphoric co-speech gestures in healthy subjects (Publication 1). We then demonstrated that transcranial direct current stimulation can also improve semantic speech-gesture processing in patients with schizophrenia spectrum disorders. In the future, transcranial direct current stimulation could possibly be used to modulate disturbed processing in the left frontal lobe of patients with schizophrenia and thereby alleviate these patients' deficits in social communication.

    Supporting evidence-based adaptation decision-making in the Australian Capital Territory: a synthesis of climate change adaptation research

    This research synthesis provides policy-makers and practitioners with an understanding of the building blocks for effective adaptation decision-making, as evidenced through the NCCARF research program. It synthesised a portfolio of adaptation research for each Australian state and territory and addressed the complex relationships between research and policy development. Each state and territory synthesis report directs users to research relevant to identified priorities. Authored by Jennifer Cane, Laura Cacho, Nicolas Dircks and Peter Steele.

    Automated PDF highlighting to support faster curation of literature for Parkinson's and Alzheimer's disease

    Neurodegenerative disorders such as Parkinson’s and Alzheimer’s disease are devastating and costly illnesses, a source of major global burden. In order to provide successful interventions for patients and reduce costs, both causes and pathological processes need to be understood. The ApiNATOMY project aims to contribute to our understanding of neurodegenerative disorders by manually curating and abstracting data from the vast body of literature amassed on these illnesses. As curation is labour-intensive, we aimed to speed up the process by automatically highlighting those parts of the PDF document of primary importance to the curator. Using techniques similar to those of summarisation, we developed an algorithm that relies on linguistic, semantic and spatial features. Employing this algorithm on a test set manually corrected for tool imprecision, we achieved a macro F1-measure of 0.51, which is an increase of 132% compared to the best bag-of-words baseline model. A user-based evaluation was also conducted to assess the usefulness of the methodology on 40 unseen publications; it revealed that in 85% of cases all highlighted sentences are relevant to the curation task, and in about 65% of cases the highlights are sufficient to support the knowledge curation task without needing to consult the full text. In conclusion, we believe that these are promising results for a step in automating the recognition of curation-relevant sentences. Refining our approach to pre-digest papers will lead to faster processing and cost reduction in the curation process.

    Factual Consistency Evaluation for Conditional Text Generation Systems

    Thesis (Ph.D.) -- Seoul National University Graduate School: College of Engineering, Department of Electrical and Computer Engineering, August 2022. Kyomin Jung.
    Despite recent advances in conditional text generation systems built on pre-trained language models, the factual consistency of these systems is still insufficient. Moreover, the widely used n-gram similarity metrics are poorly suited to evaluating factual consistency. Hence, in order to develop a factually consistent system, an automatic factuality metric is first necessary. In this dissertation, we propose four metrics that show much higher correlation with human judgments than previous metrics in evaluating factual consistency, for diverse conditional text generation systems. To build such metrics, we utilize (1) auxiliary tasks and (2) data augmentation methods. First, we focus on the keywords or keyphrases that are critical for evaluating factual consistency and propose two factual consistency metrics using two different auxiliary tasks. We first integrate a keyphrase weight prediction task into previous metrics to propose a KPQA (Keyphrase Prediction for Question Answering)-metric for generative QA. We also apply question generation and answering to develop a captioning metric, QACE (Question Answering for Captioning Evaluation). QACE generates questions on the keywords of the candidate caption and checks factual consistency by comparing the answers to these questions for the source image and for the caption. Second, unlike the auxiliary-task approach, we directly train a metric with a data-driven approach to propose two further metrics. Specifically, we train a metric to distinguish augmented inconsistent texts from consistent texts. We first propose UMIC (Unreferenced Metric for Image Captioning), for which we modify the original reference captions with several rule-based methods, such as substituting keywords, to generate inconsistent captions. As a next step, we introduce an MFMA (Mask-and-Fill with Masked-Article)-metric by generating inconsistent summaries from the masked source and the masked summary. Finally, as an extension of developing data-driven factual consistency metrics, we also propose a faster post-editing system that can fix factual errors in system outputs.
    Contents:
    1 Introduction
    2 Background: 2.1 Text Evaluation Metrics (2.1.1 N-gram Similarity Metrics; 2.1.2 Embedding Similarity Metrics; 2.1.3 Auxiliary Task Based Metrics; 2.1.4 Entailment Based Metrics); 2.2 Evaluating Automated Metrics
    3 Integrating Keyphrase Weights for Factual Consistency Evaluation: 3.1 Related Work; 3.2 Proposed Approach: KPQA-Metric (3.2.1 KPQA; 3.2.2 KPQA Metric); 3.3 Experimental Setup and Dataset (3.3.1 Dataset; 3.3.2 Implementation Details); 3.4 Empirical Results (3.4.1 Comparison with Other Methods; 3.4.2 Analysis); 3.5 Conclusion
    4 Question Generation and Question Answering for Factual Consistency Evaluation: 4.1 Related Work; 4.2 Proposed Approach: QACE (4.2.1 Question Generation; 4.2.2 Question Answering; 4.2.3 Abstractive Visual Question Answering; 4.2.4 QACE Metric); 4.3 Experimental Setup and Dataset (4.3.1 Dataset; 4.3.2 Implementation Details); 4.4 Empirical Results (4.4.1 Comparison with Other Methods; 4.4.2 Analysis); 4.5 Conclusion
    5 Rule-Based Inconsistent Data Augmentation for Factual Consistency Evaluation: 5.1 Related Work; 5.2 Proposed Approach: UMIC (5.2.1 Modeling; 5.2.2 Negative Samples; 5.2.3 Contrastive Learning); 5.3 Experimental Setup and Dataset (5.3.1 Dataset; 5.3.2 Implementation Details); 5.4 Empirical Results (5.4.1 Comparison with Other Methods; 5.4.2 Analysis); 5.5 Conclusion
    6 Inconsistent Data Augmentation with Masked Generation for Factual Consistency Evaluation: 6.1 Related Work; 6.2 Proposed Approach: MFMA and MSM (6.2.1 Mask-and-Fill with Masked Article; 6.2.2 Masked Summarization; 6.2.3 Training Factual Consistency Checking Model); 6.3 Experimental Setup and Dataset (6.3.1 Dataset; 6.3.2 Implementation Details); 6.4 Empirical Results (6.4.1 Comparison with Other Methods; 6.4.2 Analysis); 6.5 Conclusion
    7 Factual Error Correction for Improving Factual Consistency: 7.1 Related Work; 7.2 Proposed Approach: RFEC (7.2.1 Problem Formulation; 7.2.2 Training Dataset Construction; 7.2.3 Evidence Sentence Retrieval; 7.2.4 Entity Retrieval Based Factual Error Correction); 7.3 Experimental Setup and Dataset (7.3.1 Dataset; 7.3.2 Implementation Details); 7.4 Empirical Results (7.4.1 Comparison with Other Methods; 7.4.2 Analysis); 7.5 Conclusion
    8 Conclusion
    Abstract (In Korean)