11 research outputs found

    Learning mixed graphical models with separate sparsity parameters and stability-based model selection

    Background: Mixed graphical models (MGMs) are graphical models learned over a combination of continuous and discrete variables. Mixed variable types are common in biomedical datasets. An MGM consists of a parameterized joint probability density, which implies a network structure over these heterogeneous variables. The network structure reveals direct associations between the variables, and the joint probability density allows one to ask arbitrary probabilistic questions of the data. This information can be used for feature selection, classification and other important tasks. Results: We studied the properties of MGM learning and applications of MGMs to high-dimensional data (biological and simulated). Our results show that MGMs reliably uncover the underlying graph structure, and when used for classification, their performance is comparable to popular discriminative methods (lasso regression and support vector machines). We also show that imposing separate sparsity penalties for edges connecting different types of variables significantly improves edge recovery performance. To choose these sparsity parameters, we propose a new efficient model selection method, named Stable Edge-specific Penalty Selection (StEPS). StEPS is an extension of an earlier method, StARS, to mixed variable types. In terms of edge recovery, StEPS-selected MGMs outperform models selected using standard techniques, including AIC, BIC and cross-validation. In addition, we use a heuristic search whose cost is linear in the size of the sparsity value search space, as opposed to the cubic grid search required by other model selection methods. We applied our method to clinical and mRNA expression data from the Lung Genomics Research Consortium (LGRC), and the learned MGM correctly recovered connections between the diagnosis of obstructive or interstitial lung disease, two diagnostic breathing tests, and cigarette smoking history. Our model also suggested biologically relevant mRNA markers that are linked to these three clinical variables. Conclusions: MGMs are able to accurately recover dependencies between sets of continuous and discrete variables in both simulated and biomedical datasets. Separation of sparsity penalties by edge type is essential for accurate network edge recovery. Furthermore, our stability-based method for model selection determines sparsity parameters faster and more accurately (in terms of edge recovery) than other model selection methods. With the ongoing availability of comprehensive clinical and biomedical datasets, MGMs are expected to become a valuable tool for investigating disease mechanisms and answering an array of critical healthcare questions.
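The stability idea that StEPS takes over from StARS can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes networks have already been estimated on several subsamples at one penalty setting, and it uses the standard StARS per-edge instability 2θ(1 − θ), where θ is the fraction of subsamples in which an edge appears. The function name `edge_instability` and the edge-type labels `cc`, `cd` and `dd` (continuous-continuous, continuous-discrete, discrete-discrete) are hypothetical.

```python
import numpy as np

def edge_instability(adjs, is_discrete):
    """Average StARS-style instability per edge type.

    adjs        : (n_subsamples, p, p) symmetric 0/1 adjacency matrices,
                  one per subsample, all fitted at the same penalty setting.
    is_discrete : (p,) booleans, True where the variable is discrete.
    """
    adjs = np.asarray(adjs, dtype=float)
    is_discrete = np.asarray(is_discrete, dtype=bool)

    theta = adjs.mean(axis=0)          # selection frequency of each edge
    xi = 2.0 * theta * (1.0 - theta)   # per-edge instability

    # Each unordered pair once: upper triangle, diagonal excluded.
    iu, ju = np.triu_indices(theta.shape[0], k=1)
    n_disc = is_discrete[iu].astype(int) + is_discrete[ju].astype(int)
    xi_pairs = xi[iu, ju]

    result = {}
    for count, name in ((0, "cc"), (1, "cd"), (2, "dd")):
        mask = n_disc == count
        # NaN marks an edge type that does not occur among the variables.
        result[name] = float(xi_pairs[mask].mean()) if mask.any() else float("nan")
    return result
```

In a stability-based scheme of this kind, one such instability value would be tracked for each edge type as its penalty is varied, which is what allows a separate penalty to be chosen per edge type.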

    Implementation of Lesson Study in Learning Process: A Study of Biology Student Learning Activities

    Prospective student educators can use Lesson Study to improve their ability to plan and design learning. However, understanding of how to implement Lesson Study in the learning process is still lacking among first-semester students in the Postgraduate Biology Education Study Program, Universitas Negeri Malang. A study of its implementation by these students is therefore needed. This study aims to determine student learning activities through the application of Lesson Study, which also serves as training for students to transform their knowledge as prospective educators. The research is descriptive and qualitative, and proceeded in three stages: planning, implementing, and reflecting. The data obtained are the implementation of Lesson Study activities with the Jigsaw cooperative learning model, the implementation of Lesson Study activities with the STAD cooperative learning model combined with Snowball Throwing, student learning activities, and the results of reflection on the implementation of Lesson Study. The findings indicate that Lesson Study can accurately reflect student activities and that its implementation in the learning process can also give prospective educators an understanding of pedagogical competence.

    Identification and comparison of pandemic-to-symptom networks of South Korea and the United States

    Background: The Coronavirus (COVID-19) pandemic resulted in a dramatic increase in the prevalence of anxiety and depression globally. Although the impact on the mental health of young adults was especially strong, its underlying mechanisms remain elusive. Materials and methods: Using a network approach, the present study investigated the putative pathways between pandemic-related factors and anxiety and depressive symptoms among young adults in South Korea and the U.S. Network analyses were conducted on cross-country data collected during the COVID-19 lockdown period (n = 1,036). Our model included depression symptoms (PHQ-9), generalized anxiety symptoms (GAD-7), and COVID-19-related factors (e.g., COVID-19-related traumatic stress, pandemic concerns, access to medical/mental health services). Results: The overall structures of the pandemic-to-symptom networks of South Korea and the U.S. were found to be similar. In both countries, COVID-related stress and negative future anticipation (an anxiety symptom) were identified as bridging nodes between pandemic-related factors and psychological distress. In addition, worry-related symptoms (e.g., excessive worry, uncontrollable worry) were identified as key contributors in maintaining the overall pandemic-to-symptom network in both countries. Conclusion: The similar network structures and patterns observed in both countries imply that there may exist a stable relationship between the pandemic and internalizing symptoms above and beyond sociocultural differences. The current findings provide new insights into the common potential pathway between the pandemic and internalizing symptoms in South Korea and in the U.S. and inform policymakers and mental health professionals of potential intervention targets to alleviate internalizing symptoms.

    Estimation and application of a mixed graphical model for two groups using a Markov random field model

    Thesis (Ph.D.) -- Seoul National University Graduate School: College of Natural Sciences, Interdisciplinary Program in Bioinformatics, August 2022. 원성호 (Sungho Won).
    Background: Large datasets with a huge number of variables or subjects, such as multi-omics data, are now widely generated. Many of these datasets are of mixed type, including both numeric and categorical variables, which makes their analysis difficult. In some studies, the networks underlying a large dataset are of interest. Several methods have been suggested for inferring such networks, but most can be used only for a single data type or a single class.
    Objective: The objective of the study is to develop and propose a new method, named fused MGM (FMGM), that infers network structures underlying mixed data in 2 groups, under the assumption that both the networks and their differences are sparse. In addition, statistical analyses including the proposed method were conducted to find biological markers of atopic dermatitis (AD) and the underlying network structures from multi-omics data of 6-month-old infants.
    Methods: In FMGM, the statistical models of the networks are based on the pairwise Markov random field model, and the penalty functions implement the main assumption that the networks in the 2 groups and their differences are sparse. The fast proximal gradient method (PGM) was used to optimize the objective function. An extension of FMGM that allows the inclusion of prior knowledge, named prior-induced FMGM (piFMGM), was also developed. Performance was measured on synthetic datasets that simulate power-law network structures. The multi-omics profiles of 6-month-old infants were also analyzed. The profiles include the host gene transcriptome (N=199), intestinal microbial compositions (N=197), and predicted intestinal microbial functions (N=98), with 84 samples in common. For the analysis, differential analysis with limma and network inference with FMGM were applied.
    Results: In the analysis of simulated 2-class datasets generated from simulated scale-free networks, FMGM outperformed the previous method, which infers the networks one by one, especially in terms of F1-score (0.392 for the previous method vs. 0.546 for FMGM). FMGM performed better not only in inferring the differences (0.217 vs. 0.410) but also in inferring the networks (0.492 vs. 0.572). Utilizing prior information with piFMGM yielded slightly better F1-scores for the inference of the networks (0.572 vs. 0.589) and of the differences (0.410 vs. 0.423), so overall performance improved slightly (0.546 vs. 0.562). In the network inference on the AD data of 6-month-old infants, 10 pairs of variables showed different correlations by disease status, including host expression of LINC01036 and MIR4788 and the abundance of microbial genes related to carotenoid biosynthesis and RNA degradation.
    Conclusions: The proposed method, FMGM, inferred the network structures in 2 classes better than the previous method. Including prior information through piFMGM may yield more accurate network inference, but because the change was subtle, additional studies may be needed to improve it. Network inference revealed several markers of AD, such as microbial genes related to carotenoid biosynthesis and RNA degradation, suggesting possible underlying metabolic processes related to AD such as oxidative stress and microbial RNA balance.
    Contents:
    Chapter 1. Introduction: 1.1 Study Background; 1.2 Prior Works; 1.3 Purpose of Research
    Chapter 2. Network Inference of 2-class Mixed Data: 2.1 Introduction; 2.2 Notations; 2.3 Model Formulation; 2.4 Optimization with Fast Proximal Gradient Method; 2.5 Code Implementation; 2.6 Simulated Data Analysis; 2.7 Real Data Analysis: DNA Methylation Data; 2.8 Discussion
    Chapter 3. Integration of Prior Information for Network Inference: 3.1 Introduction; 3.2 Use of Separate Parameter for Prior Information; 3.3 Determination of Regularization Parameters; 3.4 Simulated Data Analysis; 3.5 Real Data Analysis: Multi-Omics Data from Asthma Patients; 3.6 Discussion
    Chapter 4. Multi-Omics Data Analysis of Atopic Dermatitis (AD): 4.1 Background; 4.2 Data Description; 4.3 Statistical Analysis; 4.4 Results; 4.5 Discussion
    Chapter 5. Conclusion
    Appendix; Bibliography; Abstract in Korean
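The F1-scores quoted in the abstract above measure how well estimated edges match the true ones. As a minimal sketch of that metric, assuming both networks are given as symmetric 0/1 adjacency matrices over the same variables, the following computes the F1-score of edge recovery. The function name `edge_f1` is hypothetical, and the thesis may define or aggregate its scores differently.

```python
import numpy as np

def edge_f1(true_adj, est_adj):
    """F1-score of edge recovery between two undirected graphs.

    Both arguments are (p, p) symmetric 0/1 adjacency matrices.
    """
    true_adj = np.asarray(true_adj, dtype=bool)
    est_adj = np.asarray(est_adj, dtype=bool)

    # Count each unordered pair once: upper triangle, diagonal excluded.
    iu = np.triu_indices(true_adj.shape[0], k=1)
    t, e = true_adj[iu], est_adj[iu]

    tp = int(np.sum(t & e))    # true edges that were recovered
    fp = int(np.sum(~t & e))   # estimated edges that are not real
    fn = int(np.sum(t & ~e))   # true edges that were missed

    denom = 2 * tp + fp + fn
    # F1 = 2TP / (2TP + FP + FN); defined as 0 when there are no edges at all.
    return 2 * tp / denom if denom else 0.0
```

The same function could be applied to a "difference network" (edges present in one group but not the other) to score how well between-group differences are recovered.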

    Graphical models for de novo and pathway-based network prediction over multi-modal high-throughput biological data

    It is now standard practice in the study of complex disease to perform many high-throughput -omic experiments (genome-wide SNP, copy number, mRNA and miRNA expression) on the same set of patient samples. These multi-modal data should allow researchers to form a more complete, systems-level picture of a sample, but this is only possible if they have a suitable model for integrating the data. Given the variety of data modalities and possible combinations of data, general, flexible integration methods that are applicable in many settings are desirable. In this dissertation I present my work using graphical models for de novo structure learning of both undirected and directed sparse graphs over a mixture of Gaussian and categorical variables. Using synthetic and biological data, I show that these models are useful for both variable selection and inference. Selecting the regularization parameters is an important challenge for these models, so I also cover stability-based methods for efficiently setting these parameters and for controlling the false discovery rate of edge predictions. I also show results from a biological application to data from metastatic melanoma patients, where our methods identified a PARP1 splice site variant that is predictive of response to chemotherapy. Finally, I present work incorporating miRNA into a pathway-based graphical model called PARADIGM. This extension of the model allows us to study patient-specific changes in miRNA-induced silencing in cancer.