
    Constraint Complexity of Realizations of Linear Codes on Arbitrary Graphs

    A graphical realization of a linear code C consists of an assignment of the coordinates of C to the vertices of a graph, along with a specification of linear state spaces and linear "local constraint" codes to be associated with the edges and vertices, respectively, of the graph. The κ-complexity of a graphical realization is defined to be the largest dimension of any of its local constraint codes. κ-complexity is a reasonable measure of the computational complexity of a sum-product decoding algorithm specified by a graphical realization. The main focus of this paper is on the following problem: given a linear code C and a graph G, how small can the κ-complexity of a realization of C on G be? As useful tools for attacking this problem, we introduce the Vertex-Cut Bound, and the notion of "vc-treewidth" for a graph, which is closely related to the well-known graph-theoretic notion of treewidth. Using these tools, we derive tight lower bounds on the κ-complexity of any realization of C on G. Our bounds enable us to conclude that good error-correcting codes can have low-complexity realizations only on graphs with large vc-treewidth. Along the way, we also prove the interesting result that the ratio of the κ-complexity of the best conventional trellis realization of a length-n code C to the κ-complexity of the best cycle-free realization of C grows at most logarithmically with codelength n. Such a logarithmic growth rate is, in fact, achievable. Comment: Submitted to IEEE Transactions on Information Theory.
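The complexity measure defined in this abstract can be restated compactly. The notation below is an assumption made for illustration and is not taken from the listing: Γ denotes a realization of C on G, V the vertex set of G, and C_v the local constraint code at vertex v. The second expression simply names the quantity the abstract asks about.

```latex
\kappa(\Gamma) \;=\; \max_{v \in V} \dim C_v ,
\qquad
\kappa(C; G) \;=\; \min_{\Gamma \text{ realizes } C \text{ on } G} \kappa(\Gamma).
```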

    The Treewidth of MDS and Reed-Muller Codes

    The constraint complexity of a graphical realization of a linear code is the maximum dimension of the local constraint codes in the realization. The treewidth of a linear code is the least constraint complexity of any of its cycle-free graphical realizations. This notion provides a useful parametrization of the maximum-likelihood decoding complexity for linear codes. In this paper, we prove the surprising fact that for maximum distance separable codes and Reed-Muller codes, treewidth equals trelliswidth, which, for a code, is defined to be the least constraint complexity (or branch complexity) of any of its trellis realizations. From this, we obtain exact expressions for the treewidth of these codes, which constitute the only known explicit expressions for the treewidth of algebraic codes. Comment: This constitutes a major upgrade of previous versions; submitted to IEEE Transactions on Information Theory.
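The two quantities compared in this abstract can be written side by side. The symbols are assumptions introduced here for illustration: Γ ranges over graphical realizations of the code C, and κ(Γ) is the constraint complexity of Γ. The inequality holds under the usual assumption that trellis realizations are a special case of cycle-free realizations; the abstract's result is that it is an equality for MDS and Reed-Muller codes.

```latex
\operatorname{treewidth}(C) \;=\; \min_{\Gamma \text{ cycle-free}} \kappa(\Gamma),
\qquad
\operatorname{trelliswidth}(C) \;=\; \min_{\Gamma \text{ trellis}} \kappa(\Gamma),
\qquad
\operatorname{treewidth}(C) \;\le\; \operatorname{trelliswidth}(C).
```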

    Information visualization for DNA microarray data analysis: A critical review

    Graphical representation may provide an effective means of making sense of the complexity and sheer volume of data produced by DNA microarray experiments that monitor the expression patterns of thousands of genes simultaneously. The ability to use "abstract" graphical representation to draw attention to areas of interest, and more in-depth visualizations to answer focused questions, would enable biologists to move from a large amount of data to the particular records they are interested in, and therefore gain deeper insight into the results of microarray experiments. This paper starts by providing some background knowledge of microarray experiments, then explains how graphical representation can be applied in general to this problem domain, and then explores the role of visualization in gene expression data analysis. Having set the problem scene, the paper then examines various multivariate data visualization techniques that have been applied to microarray data analysis. These techniques are critically reviewed so that the strengths and weaknesses of each technique can be tabulated. Finally, several key problem areas, as well as possible solutions to them, are discussed as a source for future work.

    Data-Driven Fault Detection and Diagnosis Using Machine Learning Techniques and Information Theory

    Thesis (Ph.D.) -- Seoul National University Graduate School: College of Engineering, School of Chemical and Biological Engineering, 2021.8. 문경빈.
A process monitoring system is an essential component of efficient and safe operation. Process faults can degrade product quality or interfere with normal operation of the process, hindering productivity. In chemical processes that handle explosive and flammable materials, a process fault can threaten process safety, which should be the top priority.
Meanwhile, modern processes demand more advanced monitoring systems as process scope expands and automation and intensification progress. A process monitoring framework can be divided into three stages: fault detection, which determines in real time whether a fault is present in the system; fault diagnosis, which identifies the root cause of the fault; and process recovery, which removes the cause and returns the process to normal operation. Various methodologies for fault detection and diagnosis have been proposed, and they fall into three categories: analytical methods based on detailed first-principles models, knowledge-based methods built on specific domain knowledge, and data-driven methods. Data-driven methodologies are widely used because they are generally applicable and because modern processes supply the abundant process data they require. Furthermore, their advantage becomes more prominent as the scale and complexity of the process increase. In this thesis, fault detection and diagnosis methodologies that improve the performance of existing data-driven methods are proposed. Conventional data-driven fault detection systems have been developed on the basis of dimensionality reduction. Fault detection models of this kind identify a low-dimensional latent space defined by features inherent in the process data and monitor the process in that space. Representative methods are principal component analysis, the conventional multivariate process monitoring approach, and the autoencoder, a machine learning technique. Although monitoring systems using various machine learning techniques have been widely adopted thanks to abundant process data and good performance, the characteristics of modern processes described above call for monitoring schemes with further improved performance.
To improve the performance of such data-driven monitoring systems, most proposed approaches change the structure of the model or its training procedure. Data-driven methods nevertheless remain ultimately dependent on the quality of the training dataset. In other words, a methodology is needed that makes the monitoring system more complete by supplementing information missing from the training dataset. The first part of the thesis therefore proposes a process fault detection method that incorporates data augmentation. Data augmentation has mostly been used to handle between-class imbalance, that is, a shortage of samples for certain classes in a classification problem. In that setting, it improves training by balancing the number of samples per class. In this study, by contrast, data augmentation is applied to alleviate within-class imbalance. Process data from normal operation are relatively sparse near the boundary between the normal and abnormal states. Given that building a fault detection system amounts to defining a low-dimensional feature space and monitoring the system in it, supplementing samples on the boundary of the normal state can be expected to benefit training. In this context, the proposed method is as follows. First, a variational autoencoder, a generative model, is trained on the original training data so that it can generate synthetic data. Sample vectors corresponding to the boundary region of the low-dimensional distribution of the normal state learned by the generative model are then generated as synthetic data and added to the original training data. Finally, the fault detection system is built on the augmented training data using an autoencoder, a machine learning algorithm for feature extraction.
Using the augmented training data makes the autoencoder's feature learning more effective, which can in turn improve a fault detection system that distinguishes normal from abnormal states. Dimensionality reduction methods have also been used for fault isolation, in the form of contribution charts. However, these approaches showed limited performance and inconsistent results because information is lost during dimensionality reduction. To overcome this limitation, approaches that directly determine the causal relationships between process variables have been developed. One of them, transfer entropy, an information-theoretic causality measure, is generally known to perform well in fault isolation for nonlinear processes because it relies on neither a linearity assumption nor a specific model. However, its application has been limited to small-scale processes because causal analysis with transfer entropy requires costly density estimation. To address this, a method that combines transfer entropy with graphical lasso, a regularization method, is proposed. Graphical lasso is a sparse structure learning algorithm for undirected graphical models that can be used to extract the most relevant subgroup from the full graph. Because graphical lasso outputs one highly correlated subgroup together with the remaining variables, applying it iteratively to the remainder partitions the entire set of process variables into several subgroups. This step greatly narrows the scope of causal analysis by excluding weakly related variable pairs in advance. It thereby mitigates the high computational cost of transfer entropy and extends the applicability of transfer-entropy-based fault isolation.
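To make the causality measure concrete, here is a minimal sketch of a transfer entropy estimate. It is not the thesis's implementation: it assumes discrete-valued series, a history length of one sample, and simple frequency counts, whereas the abstract refers to density estimation for continuous process variables. The function name `transfer_entropy` is hypothetical.

```python
import math
from collections import Counter


def transfer_entropy(source, target):
    """Plug-in estimate of TE(source -> target) in bits, history length 1.

    Assumes both inputs are equal-length sequences of discrete symbols.
    """
    triples = Counter()      # counts of (target[t+1], target[t], source[t])
    pairs_now = Counter()    # counts of (target[t], source[t])
    pairs_next = Counter()   # counts of (target[t+1], target[t])
    singles = Counter()      # counts of target[t]
    n = len(target) - 1
    for t in range(n):
        y_next, y_now, x_now = target[t + 1], target[t], source[t]
        triples[(y_next, y_now, x_now)] += 1
        pairs_now[(y_now, x_now)] += 1
        pairs_next[(y_next, y_now)] += 1
        singles[y_now] += 1

    te = 0.0
    for (y_next, y_now, x_now), count in triples.items():
        p_joint = count / n
        # p(y_next | y_now, x_now): prediction using the source as well
        p_with_source = count / pairs_now[(y_now, x_now)]
        # p(y_next | y_now): prediction from the target's own past only
        p_without_source = pairs_next[(y_next, y_now)] / singles[y_now]
        te += p_joint * math.log2(p_with_source / p_without_source)
    return te
```

If `y` is a one-step delayed copy of a random binary series `x`, the estimate from `x` to `y` should be close to 1 bit and the estimate in the reverse direction close to 0, which is the asymmetry that lets the measure point toward a cause variable.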
Combining the two methods yields the following fault isolation procedure. First, iterative graphical lasso is applied to data recorded after the fault has occurred, dividing the full set of process variables into the five most relevant subgroups. The causality measure based on transfer entropy is then calculated only within each subgroup, and the root cause variable is isolated from the most significant relationship. By efficiently narrowing the scope of causal analysis through graphical lasso, the computational cost of transfer entropy can be reduced significantly. The proposed method is therefore noteworthy in that it enables transfer-entropy-based fault isolation for industrial-scale processes. The methodologies proposed for each stage are verified by applying them to an industrial-scale benchmark process model, the Tennessee Eastman process (TEP). The benchmark is well suited to testing the proposed methods because, with multiple unit operations, a recycle stream, and chemical reactions, it has complexity similar to that of a real chemical process. The performance test covers the 28 predefined process fault scenarios in the TEP model. The proposed fault detection method achieved a higher fault detection rate than the conventional approach. In some fault cases, the fault detection delay, the time from the occurrence of a fault to its first detection, also improved. The proposed fault isolation method, which integrates transfer entropy with graphical lasso, effectively identified the cause of the process fault at only about 20% of the computational cost of the base case, in which transfer entropy is applied directly to the entire process.
In addition, the results suggested that, for some faults, the proposed method can outperform the base case in accuracy.
Contents:
Chapter 1 Introduction
  1.1. Research Motivation
  1.2. Research Objectives
  1.3. Outline of the Thesis
Chapter 2 Backgrounds and Preliminaries
  2.1. Autoencoder
  2.2. Variational Autoencoder
  2.3. Transfer Entropy
  2.4. Graphical Lasso
Chapter 3 Process Fault Detection Using Autoencoder with Data Augmentation via Variational Autoencoder
  3.1. Introduction
  3.2. Process Fault Detection Model Integrated with Data Augmentation
    3.2.1. Info-Variational Autoencoder for Data Augmentation
    3.2.2. Autoencoder for Process Monitoring
  3.3. Case study and Discussion
    3.3.1. Tennessee Eastman Process
    3.3.2. Implementation of the Proposed Methodology
    3.3.3. Discussion of the Results
Chapter 4 Process Fault Isolation using Transfer Entropy and Graphical Lasso
  4.1. Introduction
  4.2. Fault Isolation using Transfer Entropy Integrated with Graphical Lasso
    4.2.1. Graphical Lasso for Sub-group Modeling
    4.2.2. Transfer Entropy for Fault Isolation
  4.3. Case study and Discussion 1
    4.3.1. Selective Catalytic Reduction Process
    4.3.2. Implementation of the Proposed Methodology
    4.3.3. Discussion of the Results
  4.4. Case study and Discussion 2
    4.4.1. Tennessee Eastman Process
    4.4.2. Implementation of the Proposed Methodology
    4.4.3. Discussion of the Results
Chapter 5 Concluding Remarks
  5.1. Summary of the Contributions
  5.2. Future Work
Bibliography

    Representation Learning: A Review and New Perspectives

    The success of machine learning algorithms generally depends on data representation, and we hypothesize that this is because different representations can entangle and hide more or less the different explanatory factors of variation behind the data. Although specific domain knowledge can be used to help design representations, learning with generic priors can also be used, and the quest for AI is motivating the design of more powerful representation-learning algorithms implementing such priors. This paper reviews recent work in the area of unsupervised feature learning and deep learning, covering advances in probabilistic models, auto-encoders, manifold learning, and deep networks. This motivates longer-term unanswered questions about the appropriate objectives for learning good representations, for computing representations (i.e., inference), and the geometrical connections between representation learning, density estimation and manifold learning.