
    LHIP: Extended DCGs for Configurable Robust Parsing

    We present LHIP, a system for incremental grammar development using an extended DCG formalism. The system uses a robust island-based parsing method controlled by user-defined performance thresholds. Comment: 10 pages, in Proc. Coling9

    Apportioning Development Effort in a Probabilistic LR Parsing System through Evaluation

    We describe an implemented system for robust domain-independent syntactic parsing of English, using a unification-based grammar of part-of-speech and punctuation labels coupled with a probabilistic LR parser. We present evaluations of the system's performance along several different dimensions; these enable us to assess the contribution that each individual part is making to the success of the system as a whole, and thus prioritise the effort to be devoted to its further enhancement. Currently, the system is able to parse around 80% of sentences in a substantial corpus of general text containing a number of distinct genres. On a random sample of 250 such sentences the system has a mean crossing bracket rate of 0.71 and recall and precision of 83% and 84% respectively when evaluated against manually-disambiguated analyses. Comment: 10 pages, 1 Postscript figure. To Appear in Proceedings of the Conference on Empirical Methods in Natural Language Processing, University of Pennsylvania, May 199
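    The evaluation metrics mentioned above (crossing brackets, bracket recall and precision) can be computed mechanically from the constituent spans of a gold and a test parse. The following is a minimal sketch of that PARSEVAL-style scoring, not the paper's evaluation code; brackets are represented simply as `(start, end)` spans.

    ```python
    def bracket_scores(gold, test):
        """Return (recall, precision, crossing count) for two lists of
        (start, end) constituent spans from a gold and a test parse."""
        gold_set, test_set = set(gold), set(test)
        matched = gold_set & test_set
        recall = len(matched) / len(gold_set) if gold_set else 1.0
        precision = len(matched) / len(test_set) if test_set else 1.0

        # A test bracket "crosses" a gold bracket when the two spans
        # overlap partially (neither contains the other).
        def crosses(a, b):
            return (a[0] < b[0] < a[1] < b[1]) or (b[0] < a[0] < b[1] < a[1])

        crossing = sum(1 for t in test_set if any(crosses(t, g) for g in gold_set))
        return recall, precision, crossing
    ```

    Dividing the crossing count by the number of evaluated sentences gives the mean crossing bracket rate reported in the abstract.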

    MBT: A Memory-Based Part of Speech Tagger-Generator

    We introduce a memory-based approach to part of speech tagging. Memory-based learning is a form of supervised learning based on similarity-based reasoning. The part of speech tag of a word in a particular context is extrapolated from the most similar cases held in memory. Supervised learning approaches are useful when a tagged corpus is available as an example of the desired output of the tagger. Based on such a corpus, the tagger-generator automatically builds a tagger which is able to tag new text the same way, diminishing development time for the construction of a tagger considerably. Memory-based tagging shares this advantage with other statistical or machine learning approaches. Additional advantages specific to a memory-based approach include (i) the relatively small tagged corpus size sufficient for training, (ii) incremental learning, (iii) explanation capabilities, (iv) flexible integration of information in case representations, (v) its non-parametric nature, (vi) reasonably good results on unknown words without morphological analysis, and (vii) fast learning and tagging. In this paper we show that a large-scale application of the memory-based approach is feasible: we obtain a tagging accuracy that is on a par with that of known statistical approaches, and with attractive space and time complexity properties when using {\em IGTree}, a tree-based formalism for indexing and searching huge case bases. An additional advantage of using IGTree is that the optimal context size for disambiguation is computed dynamically. Comment: 14 pages, 2 Postscript figures
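    The core idea above can be sketched as nearest-neighbour tagging over stored cases. This is an illustrative toy, not MBT itself: the feature layout (previous tag, focus word, next word) and the plain overlap similarity are simplifying assumptions, and MBT's IGTree indexing is omitted entirely.

    ```python
    from collections import Counter

    def make_cases(tagged_sents):
        """Store one case per token: ((prev_tag, word, next_word), tag)."""
        cases = []
        for sent in tagged_sents:
            words = [w for w, _ in sent]
            tags = [t for _, t in sent]
            for i, (w, t) in enumerate(sent):
                prev_tag = tags[i - 1] if i > 0 else "<s>"
                nxt = words[i + 1] if i + 1 < len(words) else "</s>"
                cases.append(((prev_tag, w, nxt), t))
        return cases

    def tag_word(cases, features, k=3):
        """Extrapolate the tag from the k most similar stored cases,
        using a simple overlap metric (count of matching feature slots)."""
        sims = [(sum(a == b for a, b in zip(features, f)), t) for f, t in cases]
        sims.sort(key=lambda x: -x[0])
        return Counter(t for _, t in sims[:k]).most_common(1)[0][0]
    ```

    A linear scan over the case base, as here, is what IGTree replaces with a compressed decision-tree index to achieve the space and time properties the abstract mentions.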

    Cue Phrase Classification Using Machine Learning

    Cue phrases may be used in a discourse sense to explicitly signal discourse structure, but also in a sentential sense to convey semantic rather than structural information. Correctly classifying cue phrases as discourse or sentential is critical in natural language processing systems that exploit discourse structure, e.g., for performing tasks such as anaphora resolution and plan recognition. This paper explores the use of machine learning for classifying cue phrases as discourse or sentential. Two machine learning programs (Cgrendel and C4.5) are used to induce classification models from sets of pre-classified cue phrases and their features in text and speech. Machine learning is shown to be an effective technique for not only automating the generation of classification models, but also for improving upon previous results. When compared to manually derived classification models already in the literature, the learned models often perform with higher accuracy and contain new linguistic insights into the data. In addition, the ability to automatically construct classification models makes it easier to comparatively analyze the utility of alternative feature representations of the data. Finally, the ease of retraining makes the learning approach more scalable and flexible than manual methods. Comment: 42 pages, uses jair.sty, theapa.bst, theapa.st
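    The induced models are decision rules over textual features. The function below is a hypothetical illustration of the *shape* of such rules, not the models the paper learned: the specific features (sentence-initial position, preceding punctuation) and thresholds are assumptions made here for the example.

    ```python
    def classify_cue(token_index, preceded_by_punct):
        """Toy rule set of the kind a learner such as C4.5 might induce:
        sentence-initial cue phrases and cue phrases following punctuation
        are classified as discourse uses; otherwise sentential."""
        if token_index == 0:
            return "discourse"
        if preceded_by_punct:
            return "discourse"
        return "sentential"
    ```

    In the actual work, rules like these are induced automatically from pre-classified examples, which is what makes retraining on alternative feature representations cheap.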

    Korean Part-of-Speech Tagging Based on Syllables

    ์ธํ„ฐ๋„ท์˜ ๊ธ‰์†ํ•œ ๋ฐœ์ „์œผ๋กœ ๊ฐ์ข… ํฌํ„ธ ์‚ฌ์ดํŠธ์˜ ๊ฒŒ์‹œํŒ, ์นดํŽ˜, ๋™ํ˜ธํšŒ, ๋ธ”๋กœ๊ทธ ๋“ฑ์—๋Š” ์ˆ˜๋งŽ์€ ๋ฌธ์„œ๊ฐ€ ์ƒ์„ฑ๋˜๊ณ  ์žˆ๋‹ค. ์˜ˆ๋ฅผ ๋“ค์–ด ๊ฐœ์ธ ๋ธ”๋กœ๊ทธ์—๋Š” ๊ด€์‹ฌ๋ถ„์•ผ์— ๋”ฐ๋ฅธ ์ˆ˜๋งŽ์€ ์ •๋ณด๋“ค์ด ๊ฒŒ์‹œ๋˜๊ณ  ์žˆ๊ณ , ๊ฐ์ข… ๋™ํ˜ธํšŒ ๊ฒŒ์‹œํŒ์—๋Š” ๋™ํ˜ธํšŒ์˜ ๋ชฉ์ ๊ณผ ๊ด€๋ จ๋œ ์ˆ˜๋งŽ์€ ์ •๋ณด ๋“ฑ์ด ๋งค์ผ ๊ฒŒ์‹œ๋˜๊ณ  ์žˆ๋‹ค. ์ด๋ ‡๊ฒŒ ๋งŽ์€ ๋ฌธ์„œ๋“ค์€ ๋ถ„์„๊ณผ ๋ถ„๋ฅ˜๋ฅผ ํ†ตํ•ด ๋ณด๋‹ค ๋งŽ์€ ์‚ฌ๋žŒ๋“ค์—๊ฒŒ ์ค‘์š”ํ•œ ์ •๋ณด๋กœ ํ™œ์šฉ๋  ์ˆ˜ ์žˆ๊ณ , ์ด๋Ÿฌํ•œ ์ด์œ ๋กœ ๋ฌธ์„œ์˜ ๋ถ„์„ ๋ฐ ๋ถ„๋ฅ˜์™€ ๊ฐ™์€ ์ •๋ณด์ฒ˜๋ฆฌ์˜ ํ•„์š”์„ฑ์ด ๋Œ€๋‘๋˜๊ณ  ์žˆ๋‹ค. ์ด๋Ÿฌํ•œ ํ•„์š”์„ฑ์— ๋”ฐ๋ผ ๋งŽ์€ ํ•™์ž๋“ค์ด ๋ฌธ์„œ๋ฅผ ๋ณด๋‹ค ์ •ํ™•ํ•˜๊ฒŒ ๋ถ„์„ํ•˜๊ณ  ๋ถ„๋ฅ˜ํ•˜๊ธฐ ์œ„ํ•œ ๋ฐฉ๋ฒ•๋“ค์„ ์—ฐ๊ตฌํ•˜๊ณ  ์ œ์•ˆํ•˜๋ฉฐ ์‹ค์ œ๋กœ ์‚ฌ์šฉ๋˜๊ณ  ์žˆ๋‹ค(Manning et al., 2010). ์ด๋Ÿฌํ•œ ์ˆ˜๋งŽ์€ ๋ฐฉ๋ฒ•๋“ค ์ค‘์—์„œ ํ˜•ํƒœ์†Œ ๋ถ„์„ ๋ฐ ํ’ˆ์‚ฌ ๋ถ€์ฐฉ์€ ๋ฌธ์„œ๋ฅผ ๋ถ„์„ํ•˜๊ณ  ๋ถ„๋ฅ˜ํ•˜์—ฌ ์ •๋ณด๋กœ ํ™œ์šฉํ•˜๊ธฐ ์œ„ํ•œ ์—ฌ๋Ÿฌ ๋ฐฉ๋ฒ•๋“ค์˜ ๊ณตํ†ต๋œ ์ตœํ•˜์œ„ ๋‹จ๊ณ„์— ์†ํ•œ๋‹ค. ํ˜•ํƒœ์†Œ ๋ถ„์„์ด๋ž€ ์ž…๋ ฅ๋œ ๋ฌธ์„œ์— ๋Œ€ํ•ด ํ˜•ํƒœ์†Œ์˜ ๋ณ€ํ˜•๊ณผ ๋ถ„๋ฆฌ ๊ฒฝ๊ณ„๋ฅผ ๊ฒฐ์ •ํ•˜๋Š” ๋ฌธ์ œ๋ฅผ ์ฒ˜๋ฆฌํ•˜๋Š” ๊ณผ์ •์œผ๋กœ ์–ธ์–ด์  ํŠน์„ฑ์— ๋งž๊ฒŒ ๊ตฌํ˜„๋œ๋‹ค(Dale, etal., 2000). ํŠนํžˆ ํ•œ๊ตญ์–ด๋Š” ๋‚ด์šฉ์–ด์™€ ๊ธฐ๋Šฅ์–ด์˜ ๊ฒฐํ•ฉ์œผ๋กœ ๋‹ค์–‘ํ•œ ํ˜•ํƒœ์˜ ๋ณ€ํ˜•์ด ๋ฐœ์ƒ๋œ๋‹ค(์„œ์ •์ˆ˜, 1996). ์ด๋Ÿฌํ•œ ์ด์œ ๋กœ ํ•œ๊ตญ์–ด ํ˜•ํƒœ์†Œ ๋ถ„์„๊ธฐ๋Š” ์˜์–ด์™€ ๊ฐ™์€ ์™ธ๊ตญ์–ด ํ˜•ํƒœ์†Œ ๋ถ„์„๊ธฐ ๋ณด๋‹ค ๋ณต์žกํ•œ ๊ตฌ์กฐ๋ฅผ ๊ฐ€์ง€๊ณ  ์žˆ๋‹ค ์šฉ์–ธ์— ๋Œ€ํ•œ ํ˜•ํƒœ์†Œ ๋ถ„์„์€ ํ™œ์šฉ ์ฒ˜๋ฆฌ, ๋ถˆ๊ทœ์น™ ์ฒ˜๋ฆฌ, ์Œ์šดํ˜„์ƒ ์ฒ˜๋ฆฌ ๋“ฑ ๋งค์šฐ ๋ณต์žกํ•œ ๊ณผ์ •์„ ํฌํ•จํ•˜๊ณ  ์žˆ๋‹ค. . ์ด๋ ‡๊ฒŒ ๋ณต์žกํ•œ ๊ตฌ์กฐ์˜ ํ˜•ํƒœ์†Œ ๋ถ„์„๊ธฐ๋ฅผ ์„ค๊ณ„ํ•˜๊ณ  ๊ตฌํ˜„ํ•˜๊ธฐ ์œ„ํ•ด์„œ๋Š” ๋ณต์žกํ•œ ์ง€์‹๊ณผ ๋ฐฉ๋Œ€ํ•œ ์‚ฌ์ „์ •๋ณด๊ฐ€ ์š”๊ตฌ๋œ๋‹ค(๊น€์žฌํ›ˆ, ์ด๊ณต์ฃผ, 2003). ๋ฟ๋งŒ ์•„๋‹ˆ๋ผ ๋งค์šฐ ๊นŒ๋‹ค๋กœ์šด ๊ตฌํ˜„๊ณผ์ •์„ ๊ฑฐ์น˜๊ธฐ ๋•Œ๋ฌธ์— ์œ ์ง€๋ณด์ˆ˜๋ฅผ ํ•œ๋‹ค๋Š” ๊ฒƒ์€ ํ˜•ํƒœ์†Œ ๋ถ„์„๊ธฐ๋ฅผ ๊ตฌํ˜„ํ•˜๋Š” ๊ฒƒ๋งŒํผ ์–ด๋ ค์šด ๊ฒƒ์ด ํ˜„์‹ค์ด๋‹ค. 
๊ทธ๋Ÿฌ๋‚˜ ์ผ๋ถ€ ์ •๋ณด๊ฒ€์ƒ‰ ์‹œ์Šคํ…œ์€ ์ฃผ์–ด์ง„ ๋ฌธ์žฅ์—์„œ ๋ช…์‚ฌ๋งŒ ์ถ”์ถœํ•˜์—ฌ ์ƒ‰์ธํ•˜๋Š”๋ฐ ์‘์šฉ๋ถ„์•ผ์— ๋”ฐ๋ผ์„œ๋Š” ๋ชจ๋“  ์ข…๋ฅ˜์˜ ํ˜•ํƒœ์†Œ ๋ถ„์„๊ฒฐ๊ณผ๋ฅผ ํ•„์š”๋กœ ํ•˜์ง€ ์•Š๋Š”๋‹ค. ๋˜ํ•œ ํ’ˆ์‚ฌ๋ถ€์ฐฉ์€ ํ˜•ํƒœ์†Œ ๋ถ„์„์—์„œ ๋ฐœ์ƒ๋œ ์—ฌ๋Ÿฌ ๋ถ„์„ ๊ฒฐ๊ณผ๋ฅผ ์ฃผ์–ด์ง„ ๋ฌธ์žฅ์— ๊ฐ€์žฅ ์ ํ•ฉํ•œ ๋ถ„์„์„ ์„ ํƒํ•˜์—ฌ ์—ฌ๋Ÿฌ ์‘์šฉ๋ถ„์•ผ์— ์‚ฌ์šฉ๋œ๋‹ค. ์ด๋Ÿฌํ•œ ๋ฌธ์ œ๋“ค์„ ํ•ด๊ฒฐํ•˜๊ตญ์–ด ํ’ˆ์‚ฌ๋ฅผ๊ธฐ ์œ„ํ•ด ์Œ์ ˆ๋‹จ์œ„๋กœ ํ•œ ๋ถ€์ฐฉํ•œ ์—ฐ๊ตฌ(์‹ฌ๊ด‘์„ญ, 2011)๊ฐ€ ์žˆ์œผ๋‚˜ ๋ณตํ•ฉ๋ช…์‚ฌ๋ฅผ ๋ถ„์„ํ•˜๊ธฐ ์–ด๋ ค์šฐ๋ฉฐ ๊ทœ์น™์„ ์‚ฌ์šฉํ•˜๊ธฐ ๋•Œ๋ฌธ์— ๊ทœ์น™์˜ ๋ชจํ˜ธ์„ฑ ๋ฌธ์ œ๊ฐ€ ์กด์žฌํ•œ๋‹ค. ๋ณธ ๋…ผ๋ฌธ์—์„œ๋Š” ์ด์™€ ๊ฐ™์€ ๋ฌธ์ œ๋ฅผ ํ•ด๊ฒฐํ•˜๊ณ ์ž ๊ธฐ๊ณ„ํ•™์Šต ๊ธฐ๋ฒ•์„ ์ด์šฉํ•œ ์Œ์ ˆ๊ธฐ๋ฐ˜ ํ’ˆ์‚ฌ ๋ถ€์ฐฉ ๋ฐฉ๋ฒ•์„ ์ œ์•ˆํ•œ๋‹ค. ์ด ๋ฐฉ๋ฒ•์€ ์–ธ์–ด์ฒ˜๋ฆฌ ์‹œ์Šคํ…œ์ด๋‚˜ ๋Œ€๋Ÿ‰์˜ ์‚ฌ์ „์ •๋ณด๋ฅผ ์ด์šฉํ•˜์—ฌ ํ˜•ํƒœ์†Œ ๋ถ„์„์„ ํ•˜์ง€ ์•Š๊ณ  ๊ธฐ๊ณ„ํ•™์Šต ๋„๊ตฌ๋ฅผ ์ด์šฉํ•˜์—ฌ ์Œ์ ˆ๋‹จ์œ„๋กœ ํ’ˆ์‚ฌ ๋ถ€์ฐฉ์ด ๊ฐ€๋Šฅํ•œ ํ•™์Šต๋ชจ๋ธ์„ ์ƒ์„ฑํ•˜์—ฌ ์ž…๋ ฅ๋œ ๋ฌธ์žฅ์„ ์Œ์ ˆ๋‹จ์œ„๋กœ ์Œ์ ˆํ’ˆ์‚ฌ๋ฅผ ๋ถ€์ฐฉํ•˜๊ณ  ์–ด์ ˆ๊ฒฝ๊ณ„๋ฅผ ํ‘œ์‹œํ•˜์—ฌ ๋ณตํ•ฉ๋ช…์‚ฌ์˜ ๋ถ„์„์ด ๊ฐ€๋Šฅํ•˜๋‹ค. ์Œ์ ˆํ’ˆ์‚ฌ๊ฐ€ ๋ถ€์ฐฉ๋œ ๋ฌธ์žฅ์€ ์Œ์ ˆ ๋ณต์›๊ธฐ๋ฅผ ํ†ตํ•ด ์Œ์ ˆ์˜ ์›ํ˜• ๋ณต์› ๊ฒฐ๊ณผ๋ฅผ ์–ป๋Š”๋‹ค. ์Œ์ ˆ์„ ๋ณต์›ํ•˜๋Š” ๊ณผ์ •์—์„œ ๋ฐœ์ƒํ•˜๋Š” ๋ชจํ˜ธ์„ฑ ๋ฌธ์ œ๋Š” Na&iumlve Bayes ๋ถ„๋ฅ˜๊ธฐ๋ฅผ ์ด์šฉํ•ด์„œ ํ•ด๊ฒฐํ•œ๋‹ค. ๋ณธ ๋…ผ๋ฌธ์—์„œ ์ œ์•ˆํ•˜๋Š” ํ˜•ํƒœ์†Œ ๋ถ„์„ ๋ฐ ํ’ˆ์‚ฌ๋ถ€์ฐฉ์€ ๊ธฐ๊ณ„ํ•™์Šต ๊ธฐ๋ฒ•์„ ์ด์šฉํ•˜๊ณ  ์žˆ์œผ๋ฉฐ, ๊ตฌํ˜„์ด ์‰ฝ๊ณ  ๊ฐ„๋‹จํ•˜๊ธฐ ๋•Œ๋ฌธ์— ๋‹จ๊ธฐ๊ฐ„ ๋‚ด์— ๊ตฌํ˜„ํ•  ์ˆ˜ ์žˆ์œผ๋ฉฐ, ๋ณต์žกํ•œ ๊ตฌ์กฐ๋ฅผ ๊ฐ€์ง„ ๊ธฐํƒ€ ํ’ˆ์‚ฌ ๋ถ€์ฐฉ๊ธฐ์™€ ๋น„์Šทํ•œ ์ˆ˜์ค€์˜ ์„ฑ๋Šฅ์„ ๊ฐ€์ง€๊ณ  ์žˆ๋‹ค. ๋ณธ ๋…ผ๋ฌธ์˜ ๊ตฌ์„ฑ์€ ๋‹ค์Œ๊ณผ ๊ฐ™๋‹ค. 2์žฅ์—์„œ ๊ธฐ์กด์˜ ํ˜•ํƒœ์†Œ ๋ถ„์„ ๋ฐ ํ’ˆ์‚ฌ ๋ถ€์ฐฉ ๋ฐฉ๋ฒ•๋“ค๊ณผ ์Œ์ ˆ๊ธฐ๋ฐ˜ ์–ธ์–ด ์ฒ˜๋ฆฌ ๋ฐฉ๋ฒ•๋“ค์— ๋Œ€ํ•ด ์‚ดํŽด๋ณด๊ณ , 3์žฅ์—์„œ ๊ธฐ๊ณ„ํ•™์Šต์— ํ•„์š”ํ•œ ํ•™์Šต๋ง๋ญ‰์น˜์˜ ๊ฐ€๊ณต๋ฐฉ๋ฒ•์— ๋Œ€ํ•ด ์‚ดํŽด๋ณธ๋‹ค. 
4์žฅ์—์„œ ๊ธฐ๊ณ„ํ•™์Šต์„ ์ด์šฉํ•œ ์Œ์ ˆ ๊ธฐ๋ฐ˜ ํ˜•ํƒœ์†Œ ๋ถ„์„์— ๋Œ€ํ•ด ๋…ผํ•˜๋ฉฐ 5์žฅ์—์„œ๋Š” ๋ณธ ๋…ผ๋ฌธ์—์„œ ์ œ์•ˆํ•œ ๋ฐฉ๋ฒ•์œผ๋กœ ๊ตฌํ˜„ํ•œ ์‹œ์Šคํ…œ์˜ ์„ฑ๋Šฅ์„ ํ‰๊ฐ€ํ•œ๋‹ค. ๋งˆ์ง€๋ง‰์œผ๋กœ 6์žฅ์—์„œ ๊ฒฐ๋ก ์„ ๋งบ๊ณ  ์•ž์œผ๋กœ์˜ ์—ฐ๊ตฌ ๋ฐฉํ–ฅ์„ ์ œ์‹œํ•œ๋‹ค.์ œ 1 ์žฅ ์„œ๋ก  ์ œ 2 ์žฅ ๊ด€๋ จ ์—ฐ๊ตฌ 2.1 ํ˜•ํƒœ์†Œ ๋ถ„์„ ๋ฐ ํ’ˆ์‚ฌ๋ถ€์ฐฉ 2.2 ํ•œ๊ตญ์–ด ํ˜•ํƒœ์†Œ ๋ถ„์„ ๋ฐฉ๋ฒ• 2.3 ํ•œ๊ตญ์–ด ํ’ˆ์‚ฌ ๋ถ€์ฐฉ ๋ฐฉ๋ฒ• 2.4 ์Œ์ ˆ์ •๋ณด๋ฅผ ์ด์šฉํ•œ ์–ธ์–ด์ฒ˜๋ฆฌ 2.4.1 ๋‹จ์–ด ๋ถ„๋ฆฌ ๋ฐ ๋ฒ”์ฃผ ๊ฒฐ์ • 2.4.2 ํ•œ๊ตญ์–ด ํ’ˆ์‚ฌ ๋ถ€์ฐฉ 2.4.3 ๋ณตํ•ฉ๋ช…์‚ฌ ๋ถ„ํ•ด 2.5 CRF๋ฅผ ์ด์šฉํ•œ ํ•œ๊ตญ์–ด ํ’ˆ์‚ฌ ๋ถ€์ฐฉ 2.5.1 ์Œ์ ˆํ’ˆ์‚ฌ ๋ถ€์ฐฉ๊ธฐ 2.5.2 ๊ทœ์น™์„ ์ด์šฉํ•œ ์›ํ˜•๋ณต์› 2.5.3 ์‹œ์Šคํ…œ์˜ ๋ฌธ์ œ์  ์ œ 3 ์žฅ ํ•™์Šต๋ง๋ญ‰์น˜์˜ ๊ตฌ์„ฑ ๋ฐ ๊ฐ€๊ณต 3.1 ํ’ˆ์‚ฌ ํƒœ๊ทธ ์ง‘ํ•ฉ 3.2 ํ•™์Šต๋ง๋ญ‰์น˜์˜ ๊ตฌ์„ฑ 3.3 ํ•™์Šต๋ง๋ญ‰์น˜ ๊ตฌ์ถ• 3.3.1 ์–ด์ ˆ ๋ฐ ํ˜•ํƒœ์†Œ ๋ถ„์„ ๊ฒฐ๊ณผ์˜ ์ •๋ ฌ 3.3.2 ์›์‹œ๋ง๋ญ‰์น˜์˜ ๊ฐ€๊ณต ์ œ 4 ์žฅ ๊ธฐ๊ณ„ํ•™์Šต์„ ์ด์šฉํ•œ ์Œ์ ˆ๊ธฐ๋ฐ˜ ํ’ˆ์‚ฌ๋ถ€์ฐฉ 4.1 ์Œ์ ˆํ’ˆ์‚ฌ ๋ถ€์ฐฉ๊ธฐ 4.1.1 ์Œ์ ˆํ’ˆ์‚ฌ ๋ถ€์ฐฉ ํ•™์Šต๋ง๋ญ‰์น˜์˜ ์ž์งˆ์ถ”์ถœ 4.1.2 ๊ธฐ๊ณ„ํ•™์Šต ๋ชจ๋ธ 4.2 ์Œ์ ˆ ๋ณต์›๊ธฐ 4.3 ํ˜•ํƒœ์†Œ ๋ณต์›๊ธฐ 4.4 ํ’ˆ์‚ฌ ๋ณต์›๊ธฐ ์ œ 5 ์žฅ ์‹คํ—˜ ๋ฐ ํ‰๊ฐ€ 5.1 ๊ธฐ๊ณ„ํ•™์Šต ๋„๊ตฌ 5.2 ์„ฑ๋Šฅํ‰๊ฐ€ ์ฒ™๋„ 5.3 ์„ฑ๋Šฅํ‰๊ฐ€ 5.3.1 ์ „์ฒด ์‹œ์Šคํ…œ์˜ ์„ฑ๋Šฅํ‰๊ฐ€ 5.3.2 ๊ฐ ์‹œ์Šคํ…œ ๋ณ„ ์„ฑ๋Šฅ ํ‰๊ฐ€ 5.4 ์˜ค๋ฅ˜๋ถ„์„ 5.4.1 ์Œ์ ˆํ’ˆ์‚ฌ ๋ถ€์ฐฉ๊ฒฐ๊ณผ์˜ ์˜ค๋ฅ˜๋ถ„์„ 5.4.2 ์Œ์ ˆ ๋ณต์› ๊ฒฐ๊ณผ์˜ ์˜ค๋ฅ˜๋ถ„์„ 5.4.3 ์Œ์ ˆํ’ˆ์‚ฌ ๋ณต์›๊ฒฐ๊ณผ์˜ ์˜ค๋ฅ˜๋ถ„์„ ์ œ 6 ์žฅ ๊ฒฐ๋ก  ๋ฐ ํ–ฅํ›„ ์—ฐ๊ตฌ๊ณผ์ œ ์ฐธ๊ณ ๋ฌธํ—Œ ๋ถ€

    The application of linguistic processing to automatic abstract generation

    One approach to the problem of generating abstracts by computer is to extract from a source text those sentences which give a strong indication of the central subject matter and findings of the paper. Not surprisingly, concatenations of extracted sentences show a lack of cohesion, due partly to the frequent occurrence of anaphoric references. This paper describes the text processing which was necessary to identify these anaphors so that they may be utilised in the enhancement of the sentence selection criteria. It is assumed that sentences which contain non-anaphoric noun phrases and introduce key concepts into the text are worthy of inclusion in an abstract. The results suggest that the key concepts are indeed identified but the abstracts are too long. Further recommendations are made to continue this work in abstracting which makes use of text structure.
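    The selection criterion described above can be sketched as a simple sentence scorer: prefer sentences that mention key concepts, and penalise sentences that open with a likely anaphor (which would read poorly out of context). The anaphor list and keyword scoring here are simplified assumptions for illustration, not the paper's system.

    ```python
    # Small closed class of likely anaphoric sentence openers (an assumption).
    ANAPHORS = {"it", "this", "that", "these", "those", "they", "such"}

    def extract_abstract(sentences, keywords, max_sents=2):
        """Pick up to max_sents sentences that introduce key concepts and
        do not open with an anaphoric reference; keep source order."""
        def score(sent):
            words = sent.lower().split()
            if words and words[0].strip(",.") in ANAPHORS:
                return -1  # anaphoric opening: incoherent when extracted
            return sum(w.strip(",.") in keywords for w in words)

        ranked = sorted(sentences, key=score, reverse=True)
        chosen = [s for s in ranked[:max_sents] if score(s) > 0]
        return [s for s in sentences if s in chosen]
    ```

    Capping `max_sents` is one crude answer to the "abstracts are too long" finding; the paper's recommendation is instead to exploit text structure.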