1,711 research outputs found

    A quantum Jensen-Shannon graph kernel for unattributed graphs

    In this paper, we use the quantum Jensen-Shannon divergence as a means of measuring the information theoretic dissimilarity of graphs and thus develop a novel graph kernel. In quantum mechanics, the quantum Jensen-Shannon divergence can be used to measure the dissimilarity of quantum systems specified in terms of their density matrices. We commence by computing the density matrix associated with a continuous-time quantum walk over each graph being compared. In particular, we adopt the closed-form solution of the density matrix introduced in Rossi et al. (2013) [27,28] to reduce the computational complexity and to avoid the cumbersome task of simulating the quantum walk evolution explicitly. Next, we compare the mixed states represented by the density matrices using the quantum Jensen-Shannon divergence. With the quantum states for a pair of graphs described by their density matrices to hand, the quantum graph kernel between the pair of graphs is defined using the quantum Jensen-Shannon divergence between the graph density matrices. We evaluate the performance of our kernel on several standard graph datasets from both bioinformatics and computer vision. The experimental results demonstrate the effectiveness of the proposed quantum graph kernel.

    HAQJSK: Hierarchical-Aligned Quantum Jensen-Shannon Kernels for Graph Classification

    In this work, we propose a family of novel quantum kernels, namely the Hierarchical Aligned Quantum Jensen-Shannon Kernels (HAQJSK), for un-attributed graphs. Unlike most existing classical graph kernels, the proposed HAQJSK kernels can incorporate hierarchical aligned structure information between graphs and transform graphs of arbitrary sizes into fixed-sized aligned graph structures, i.e., the Hierarchical Transitive Aligned Adjacency Matrix of vertices and the Hierarchical Transitive Aligned Density Matrix of the Continuous-Time Quantum Walk (CTQW). For a pair of graphs to hand, the resulting HAQJSK kernels are defined by measuring the Quantum Jensen-Shannon Divergence (QJSD) between their transitive aligned graph structures. We show that the proposed HAQJSK kernels not only reflect richer intrinsic global graph characteristics in terms of the CTQW, but also address the drawback of neglecting structural correspondence information that arises in most existing R-convolution kernels. Furthermore, unlike the previous Quantum Jensen-Shannon Kernels associated with the QJSD and the CTQW, the proposed HAQJSK kernels can simultaneously guarantee permutation invariance and positive definiteness, which explains the theoretical advantages of the HAQJSK kernels. Experiments indicate the effectiveness of the proposed kernels.

    A nested alignment graph kernel through the dynamic time warping framework

    In this paper, we propose a novel nested alignment graph kernel drawing on depth-based complexity traces and the dynamic time warping framework. Specifically, for a pair of graphs, we commence by computing the depth-based complexity traces rooted at the centroid vertices. The resulting kernel for the graphs is defined by measuring the global alignment kernel, which is developed through the dynamic time warping framework, between the complexity traces. We show that the proposed kernel not only considers the local and global graph characteristics simultaneously in terms of the complexity traces, but also provides richer statistical measures by incorporating the whole spectrum of alignment costs between these traces. Our experiments demonstrate the effectiveness and efficiency of the proposed kernel.
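
A minimal sketch of the two alignment ideas named in this abstract, assuming scalar-valued traces and an absolute-difference local cost (both are illustrative assumptions, not details given in the abstract). `dtw_distance` keeps only the cheapest alignment, whereas `global_alignment_kernel` sums exp(-cost) over every alignment, which is one common way to use the whole spectrum of alignment costs. The function names and the `sigma` bandwidth are hypothetical.

```python
import math


def dtw_distance(a, b):
    """Classical dynamic time warping cost between two 1-D sequences."""
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = cheapest alignment cost of a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])  # assumed local cost
            cost[i][j] = local + min(
                cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1]
            )
    return cost[n][m]


def global_alignment_kernel(a, b, sigma=1.0):
    """Sum of exp(-cost / sigma) over all monotone alignments of a and b."""
    n, m = len(a), len(b)
    # k[i][j] accumulates the contribution of every alignment of a[:i], b[:j]
    k = [[0.0] * (m + 1) for _ in range(n + 1)]
    k[0][0] = 1.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = math.exp(-abs(a[i - 1] - b[j - 1]) / sigma)
            k[i][j] = local * (k[i - 1][j] + k[i][j - 1] + k[i - 1][j - 1])
    return k[n][m]
```

The only difference between the two recurrences is that the minimum over predecessor cells is replaced by a sum, so near-optimal alignments also contribute to the kernel value.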

    A transitive aligned Weisfeiler-Lehman subtree kernel

    In this paper, we develop a new transitive aligned Weisfeiler-Lehman subtree kernel. This kernel not only overcomes the shortcoming of ignoring correspondence information between isomorphic substructures that arises in existing R-convolution kernels, but also guarantees the transitivity between the correspondence information that is not available for existing matching kernels. Our kernel outperforms state-of-the-art graph kernels in terms of classification accuracy on standard graph datasets.

    Entropic Dynamic Time Warping Kernels for Co-evolving Financial Time Series Analysis

    In this work, we develop a novel framework to measure the similarity between dynamic financial networks, i.e., time-varying financial networks. In particular, we explore whether the proposed similarity measure can be employed to understand the structural evolution of the financial networks over time. For a set of time-varying financial networks, with each vertex representing the individual time series of a different stock and each edge between a pair of time series representing the absolute value of their Pearson correlation, our starting point is to compute the commute time matrix associated with the weighted adjacency matrix of each network structure, where each element of the matrix can be seen as the enhanced correlation value between a pair of stocks. For each network, we show how the commute time matrix allows us to identify a reliable set of dominant correlated time series as well as an associated dominant probability distribution of the stocks belonging to this set. Furthermore, we represent each original network as a discrete dominant Shannon entropy time series computed from the dominant probability distribution. With the dominant entropy time series for each pair of financial networks to hand, we develop a similarity measure based on the classical dynamic time warping framework for analyzing the time-varying financial networks. We show that the proposed similarity measure is positive definite and thus corresponds to a kernel measure on graphs. The proposed kernel bridges the gap between graph kernels and the classical dynamic time warping framework for multiple financial time series analysis. Experiments on time-varying networks extracted from the New York Stock Exchange (NYSE) database demonstrate the effectiveness of the proposed approach.
    Comment: Previously, the original version of this manuscript appeared as arXiv:1902.09947v2, which was submitted as a replacement by mistake. That article has now been replaced to correct the error, and this manuscript is distinct from that article
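
The commute time matrix mentioned in this abstract can be sketched as follows, assuming the standard definition CT(u, v) = vol(G) (L⁺_uu + L⁺_vv − 2 L⁺_uv), where L⁺ is the pseudoinverse of the graph Laplacian and vol(G) is the sum of weighted degrees. This is an illustrative sketch of that definition only; the function name is hypothetical, and the dominant-set and entropy steps of the paper are not reproduced here.

```python
import numpy as np


def commute_time_matrix(weights):
    """Commute times between all vertex pairs of a weighted undirected graph.

    `weights` is a symmetric, non-negative adjacency matrix, e.g. absolute
    Pearson correlations between stock time series.
    """
    w = np.asarray(weights, dtype=float)
    degrees = w.sum(axis=1)
    laplacian = np.diag(degrees) - w
    # Moore-Penrose pseudoinverse of the (singular) Laplacian
    pinv = np.linalg.pinv(laplacian)
    diag = np.diag(pinv)
    volume = degrees.sum()
    # CT(u, v) = vol * (L+_uu + L+_vv - 2 L+_uv)
    return volume * (diag[:, None] + diag[None, :] - 2.0 * pinv)
```

For an unweighted path on three vertices, the commute time between the two end vertices is 8 and between adjacent vertices is 4, which matches the expected round-trip length of a random walk on that path.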

    Quantum kernels for unattributed graphs using discrete-time quantum walks

    In this paper, we develop a new family of graph kernels where the graph structure is probed by means of a discrete-time quantum walk. Given a pair of graphs, we let a quantum walk evolve on each graph and compute a density matrix for each walk. With the density matrices for the pair of graphs to hand, the kernel between the graphs is defined as the negative exponential of the quantum Jensen–Shannon divergence between their density matrices. In order to cope with large graph structures, we propose to construct a sparser version of the original graphs using the simplification method introduced in Qiu and Hancock (2007). To this end, we compute the minimum spanning tree over the commute time matrix of a graph. This spanning tree representation minimizes the number of edges of the original graph while preserving most of its structural information. The kernel between two graphs is then computed on their respective minimum spanning trees. We evaluate the performance of the proposed kernels on several standard graph datasets and we demonstrate their effectiveness and efficiency.
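
As a rough illustration of the kernel definition in this abstract, the sketch below computes the quantum Jensen-Shannon divergence QJSD(ρ, σ) = S((ρ + σ)/2) − S(ρ)/2 − S(σ)/2, with S the von Neumann entropy, and then takes its negative exponential. The density matrices are taken as given inputs; how they are obtained from a quantum walk is not shown. Zero-padding matrices of different sizes, the natural logarithm, and the `decay` parameter are assumptions made for this example, and all function names are hypothetical.

```python
import numpy as np


def von_neumann_entropy(rho):
    """S(rho) = -sum_i lambda_i ln(lambda_i) over the nonzero eigenvalues."""
    eigvals = np.linalg.eigvalsh(rho)
    eigvals = eigvals[eigvals > 1e-12]  # treat 0 * ln(0) as 0
    return float(-np.sum(eigvals * np.log(eigvals)))


def _pad(rho, n):
    """Embed a density matrix in an n x n matrix by zero-padding (assumption)."""
    rho = np.asarray(rho, dtype=complex)
    out = np.zeros((n, n), dtype=complex)
    out[: rho.shape[0], : rho.shape[1]] = rho
    return out


def qjsd(rho, sigma):
    """Quantum Jensen-Shannon divergence between two density matrices."""
    n = max(np.shape(rho)[0], np.shape(sigma)[0])
    rho, sigma = _pad(rho, n), _pad(sigma, n)
    mixture = 0.5 * (rho + sigma)
    return (
        von_neumann_entropy(mixture)
        - 0.5 * von_neumann_entropy(rho)
        - 0.5 * von_neumann_entropy(sigma)
    )


def qjsd_kernel(rho, sigma, decay=1.0):
    """Negative exponential of the QJSD; `decay` is a hypothetical parameter."""
    return float(np.exp(-decay * qjsd(rho, sigma)))
```

With natural logarithms the divergence lies between 0 (identical states) and ln 2 (states with orthogonal support), so with `decay=1.0` the kernel value lies between 0.5 and 1.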

    A Hierarchical Transitive-Aligned Graph Kernel for Un-attributed Graphs

    In this paper, we develop a new graph kernel, namely the Hierarchical Transitive-Aligned kernel, by transitively aligning the vertices between graphs through a family of hierarchical prototype graphs. Compared with most existing state-of-the-art graph kernels, the proposed kernel has three theoretical advantages. First, it incorporates the locational correspondence information between graphs into the kernel computation, and thus overcomes the shortcoming of ignoring structural correspondences that arises in most R-convolution kernels. Second, it guarantees the transitivity between the correspondence information that is not available for most existing matching kernels. Third, it incorporates the information of all graphs under comparison into the kernel computation process, and thus encapsulates richer characteristics. By transductively training the C-SVM classifier, experimental evaluations demonstrate the effectiveness of the new transitive-aligned kernel. The proposed kernel can outperform state-of-the-art graph kernels on standard graph-based datasets in terms of classification accuracy.

    Machine learning methods for decoding information in RNA interactions and DNA sequences

    Thesis (Ph.D.) -- Seoul National University Graduate School: College of Engineering, Department of Computer Science and Engineering, February 2020. Sun Kim.

    Phenotypic differences among organisms are mainly due to differences in genetic information. As a result of changes in genetic information, an organism may evolve into a different species, and patients with the same disease may have different prognoses. This important biological information can be observed in the form of various omics data using high-throughput technologies such as sequencing instruments. However, interpreting such omics data is challenging because the data have very high dimensionality but relatively few samples. Typically, the number of dimensions is higher than the number of samples, which makes the interpretation of omics data one of the most challenging machine learning problems. My doctoral study aims to develop new bioinformatics methods for decoding information in these high-dimensional data by utilizing machine learning algorithms.

    The first study analyzes the difference in the amount of information between different regions of the DNA sequence. To achieve this goal, a ranked k-spectrum string kernel, the RKSS kernel, is developed for comparative and evolutionary comparison of various genomic region sequences among multiple species. The RKSS kernel extends the existing k-spectrum string kernel by utilizing rank information of k-mers and landmarks of k-mers that represent a species. By using a landmark as a reference point for comparison, the number of k-mers needed to calculate sequence similarities is dramatically reduced. In experiments on three different genomic regions, the RKSS kernel captured more reliable distances between species according to the genetic information content of the target region. The RKSS kernel was also able to order the regions by information content in a way that agrees with common biological knowledge.

    The second study aims to efficiently decode complex genetic interactions using biological networks and then to classify cancer subtypes by interpreting biological functions. To achieve this goal, a pathway-based deep learning model using a graph convolutional network and a multi-attention based ensemble (GCN+MAE) is developed for cancer subtype classification. In order to efficiently reduce the relationships between genes using pathway information, GCN+MAE is designed as an explainable deep learning structure using a graph convolutional network and an attention mechanism. Extracted pathway-level information of cancer subtypes is transported back to the gene level by network propagation. In experiments on five cancer data sets, GCN+MAE showed better cancer subtype classification performance than existing cancer subtype classification models and captured subtype-specific pathways and their biological functions.

    The third study identifies sub-networks of a biological pathway. The goal is to dissect a biological pathway into multiple sub-networks, each of which is a single functional unit. To achieve this goal, a condition-specific sub-module detection method for biological networks, MIDAS (MIning Differentially Activated Subpaths), is developed. From the pathway, edge activities are measured from gene expression and network topology. Using the activities, differentially activated subpaths are explored by a statistical approach. Also, by extending this idea to graph convolutional networks, different sub-networks are highlighted by attention mechanisms. In the experiment with breast cancer data, MIDAS and the deep learning model successfully decomposed gene-level features into sub-modules of single functions.

    In summary, my doctoral study proposes new computational methods to compare genomic DNA sequences as information contents, to model pathway-based cancer subtype classifications and regulations, and to identify condition-specific sub-modules among multiple cancer subtypes.

    Contents:
    Chapter 1 Introduction
    1.1 Biological questions with genetic information
    1.1.1 Biological Sequences
    1.1.2 Gene expression
    1.2 Formulating computational problems for the biological questions
    1.2.1 Decoding biological sequences by k-mer vectors
    1.2.2 Interpretation of complex relationships between genes
    1.3 Three computational problems for the biological questions
    1.4 Outline of the thesis
    Chapter 2 Ranked k-spectrum kernel for comparative and evolutionary comparison of DNA sequences
    2.1 Motivation
    2.1.1 String kernel for sequence comparison
    2.1.2 Approach: RKSS kernel
    2.2 Methods
    2.2.1 Mapping biological sequences to k-mer space: the k-spectrum string kernel
    2.2.2 The ranked k-spectrum string kernel with a landmark
    2.2.3 Single landmark-based reconstruction of phylogenetic tree
    2.2.4 Multiple landmark-based distance comparison of exons, introns, CpG islands
    2.2.5 Sequence Data for analysis
    2.3 Results
    2.3.1 Reconstruction of phylogenetic tree on the exons, introns, and CpG islands
    2.3.2 Landmark space captures the characteristics of three genomic regions
    2.3.3 Cross-evaluation of the landmark-based feature space
    Chapter 3 Pathway-based cancer subtype classification and interpretation by attention mechanism and network propagation
    3.1 Motivation
    3.2 Methods
    3.2.1 Encoding biological prior knowledge using Graph Convolutional Network
    3.2.2 Re-producing comprehensive biological process by Multi-Attention based Ensemble
    3.2.3 Linking pathways and transcription factors by network propagation with permutation-based normalization
    3.3 Results
    3.3.1 Pathway database and cancer data set
    3.3.2 Evaluation of individual GCN pathway models
    3.3.3 Performance of ensemble of GCN pathway models with multi-attention
    3.3.4 Identification of TFs as regulator of pathways and GO term analysis of TF target genes
    Chapter 4 Detecting sub-modules in biological networks with gene expression by statistical approach and graph convolutional network
    4.1 Motivation
    4.1.1 Pathway based analysis of transcriptome data
    4.1.2 Challenges and Summary of Approach
    4.2 Methods
    4.2.1 Convert single KEGG pathway to directed graph
    4.2.2 Calculate edge activity for each sample
    4.2.3 Mining differentially activated subpath among classes
    4.2.4 Prioritizing subpaths by the permutation test
    4.2.5 Extension: graph convolutional network and class activation map
    4.3 Results
    4.3.1 Identifying 36 subtype specific subpaths in breast cancer
    4.3.2 Subpath activities have a good discrimination power for cancer subtype classification
    4.3.3 Subpath activities have a good prognostic power for survival outcomes
    4.3.4 Comparison with an existing tool, PATHOME
    4.3.5 Extension: detection of subnetwork on PPI network
    Chapter 5 Conclusions
    Abstract in Korean
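
For orientation, the plain k-spectrum string kernel that the RKSS kernel is said to extend can be sketched as the inner product of k-mer count vectors. This is a minimal illustration of the baseline only; the rank information and landmarks of the RKSS kernel are not modeled, and the function names are hypothetical.

```python
from collections import Counter


def kmer_counts(sequence, k):
    """Count every overlapping substring of length k in the sequence."""
    return Counter(sequence[i : i + k] for i in range(len(sequence) - k + 1))


def k_spectrum_kernel(seq_a, seq_b, k):
    """Inner product of the k-mer count vectors of two sequences."""
    counts_a = kmer_counts(seq_a, k)
    counts_b = kmer_counts(seq_b, k)
    # Only k-mers present in both sequences contribute to the sum
    return sum(count * counts_b[kmer] for kmer, count in counts_a.items())
```

Because a DNA alphabet yields up to 4^k distinct k-mers, the feature space grows quickly with k, which is the cost that a small landmark set of k-mers is meant to reduce.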

    Designing labeled graph classifiers by exploiting the Rényi entropy of the dissimilarity representation

    Representing patterns as labeled graphs is becoming increasingly common in the broad field of computational intelligence. Accordingly, a wide repertoire of pattern recognition tools, such as classifiers and knowledge discovery procedures, is nowadays available and tested on various datasets of labeled graphs. However, the design of effective learning procedures operating in the space of labeled graphs is still a challenging problem, especially from the computational complexity viewpoint. In this paper, we present a major improvement of a general-purpose classifier for graphs, which is conceived on an interplay between dissimilarity representation, clustering, information-theoretic techniques, and evolutionary optimization algorithms. The improvement focuses on a specific key subroutine devised to compress the input data. We prove different theorems which are fundamental to the setting of the parameters controlling such a compression operation. We demonstrate the effectiveness of the resulting classifier by benchmarking the developed variants on well-known datasets of labeled graphs, considering as distinct performance indicators the classification accuracy, computing time, and parsimony in terms of structural complexity of the synthesized classification models. The results show state-of-the-art standards in terms of test set accuracy and a considerable speed-up in computing time.
    Comment: Revised version