11 research outputs found
Learning mixed graphical models with separate sparsity parameters and stability-based model selection
Background: Mixed graphical models (MGMs) are graphical models learned over a combination of continuous and discrete variables. Mixed variable types are common in biomedical datasets. MGMs consist of a parameterized joint probability density, which implies a network structure over these heterogeneous variables. The network structure reveals direct associations between the variables and the joint probability density allows one to ask arbitrary probabilistic questions on the data. This information can be used for feature selection, classification and other important tasks. Results: We studied the properties of MGM learning and applications of MGMs to high-dimensional data (biological and simulated). Our results show that MGMs reliably uncover the underlying graph structure, and when used for classification, their performance is comparable to popular discriminative methods (lasso regression and support vector machines). We also show that imposing separate sparsity penalties for edges connecting different types of variables significantly improves edge recovery performance. To choose these sparsity parameters, we propose a new efficient model selection method, named Stable Edge-specific Penalty Selection (StEPS). StEPS is an expansion of an earlier method, StARS, to mixed variable types. In terms of edge recovery, StEPS-selected MGMs outperform those models selected using standard techniques, including AIC, BIC and cross-validation. In addition, we use a heuristic search that is linear in the size of the sparsity value search space, as opposed to the cubic grid search required by other model selection methods. We applied our method to clinical and mRNA expression data from the Lung Genomics Research Consortium (LGRC) and the learned MGM correctly recovered connections between the diagnosis of obstructive or interstitial lung disease, two diagnostic breathing tests, and cigarette smoking history. Our model also suggested biologically relevant mRNA markers that are linked to these three clinical variables. Conclusions: MGMs are able to accurately recover dependencies between sets of continuous and discrete variables in both simulated and biomedical datasets. Separation of sparsity penalties by edge type is essential for accurate network edge recovery. Furthermore, our stability-based method for model selection determines sparsity parameters faster and more accurately (in terms of edge recovery) than other model selection methods. With the ongoing availability of comprehensive clinical and biomedical datasets, MGMs are expected to become a valuable tool for investigating disease mechanisms and answering an array of critical healthcare questions.
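The stability-based selection described in this abstract can be illustrated with a minimal sketch. It follows the general StARS idea (refit the model on subsamples, measure how often each edge flips in and out, and keep the least-regularized penalty whose instability stays under a threshold) and is not the authors' StEPS implementation. The function names, the threshold `beta=0.05`, and the layout of the input arrays are assumptions made for illustration. In StEPS this kind of selection would be run separately for each edge type (continuous-continuous, continuous-discrete, discrete-discrete).

```python
import numpy as np

def edge_instability(selected):
    """Average instability over edges.

    selected: boolean array of shape (n_subsamples, n_edges); entry [s, e]
    is True when edge e was selected in the model fit on subsample s.
    """
    theta = selected.mean(axis=0)            # selection frequency per edge
    return float(np.mean(2.0 * theta * (1.0 - theta)))

def pick_lambda(lambdas, selections, beta=0.05):
    """Return the smallest penalty whose monotonized instability is <= beta.

    lambdas: candidate penalties (any order).
    selections: one boolean array per candidate, as in edge_instability.
    beta: instability threshold (hypothetical default).
    """
    order = np.argsort(lambdas)[::-1]         # strongest penalty first
    best, running_max = None, 0.0
    for i in order:
        # Monotonize: instability may not decrease as the penalty shrinks.
        running_max = max(running_max, edge_instability(selections[i]))
        if running_max > beta:
            break
        best = lambdas[i]
    return best

# Toy input: 10 subsamples, 4 candidate edges, 3 candidate penalties.
stable = np.zeros((10, 4), dtype=bool)
mild = stable.copy()
mild[0, 0] = True                             # one edge picked in 1 of 10 fits
noisy = stable.copy()
noisy[:5, 0] = True                           # one edge picked in 5 of 10 fits
chosen = pick_lambda([0.4, 0.2, 0.1], [stable, mild, noisy])
print(chosen)
```

Because each edge type gets its own one-dimensional search in this scheme, the number of model fits grows with the number of candidate values rather than with its cube, which is the efficiency argument the abstract makes.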
Implementation of Lesson Study in Learning Process: A Study of Biology Student Learning Activities
Prospective student educators can use Lesson Study to improve their ability to plan and design learning. However, understanding of the implementation of Lesson Study in the learning process among 1st semester students at the Postgraduate Biology Education Study Program, Universitas Negeri Malang, is still lacking. Thus, a study of the implementation of Lesson Study by these students is necessary. This study aims to determine student learning activities through the application of Lesson Study and to serve as a form of training for students to transform their knowledge as prospective educators. The research is descriptive and qualitative. It proceeded in three stages: planning, implementing, and reflecting. The data obtained from this study are the implementation of Lesson Study activities with the application of the Jigsaw type cooperative learning model, the implementation of Lesson Study activities with the application of the STAD cooperative learning model combined with Snowball Throwing, student learning activities, and the results of reflection on the implementation of Lesson Study. According to the study's findings, Lesson Study can accurately reflect student activities, and its implementation in the learning process is also able to provide prospective educators with an understanding of pedagogical competence.
Identification and comparison of pandemic-to-symptom networks of South Korea and the United States
Background: The Coronavirus (COVID-19) pandemic resulted in a dramatic increase in the prevalence of anxiety and depression globally. Although the impact on the mental health of young adults was especially strong, its underlying mechanisms remain elusive. Materials and methods: Using a network approach, the present study investigated the putative pathways between pandemic-related factors and anxiety and depressive symptoms among young adults in South Korea and the U.S. Network analyses were conducted on cross-country data collected during the COVID-19 lockdown period (n = 1,036). Our model included depression symptoms (PHQ-9), generalized anxiety symptoms (GAD-7), and COVID-19-related factors (e.g., COVID-19-related traumatic stress, pandemic concerns, access to medical/mental health services). Results: The overall structures of the pandemic-to-symptom networks of South Korea and the U.S. were found to be similar. In both countries, COVID-related stress and negative future anticipation (an anxiety symptom) were identified as bridging nodes between pandemic-related factors and psychological distress. In addition, worry-related symptoms (e.g., excessive worry, uncontrollable worry) were identified as key contributors in maintaining the overall pandemic-to-symptom network in both countries. Conclusion: The similar network structures and patterns observed in both countries imply that there may exist a stable relationship between the pandemic and internalizing symptoms above and beyond the sociocultural differences. The current findings provide new insights into the common potential pathway between the pandemic and internalizing symptoms in South Korea and in the U.S. and inform policymakers and mental health professionals of potential intervention targets to alleviate internalizing symptoms.
Distinct COPD subtypes in former smokers revealed by gene network perturbation analysis
Background
Chronic obstructive pulmonary disease (COPD) varies significantly in symptomatic and physiologic presentation. Identifying disease subtypes from molecular data, collected from easily accessible blood samples, can help stratify patients and guide disease management and treatment.
Methods
Blood gene expression measured by RNA-sequencing in the COPDGene Study was analyzed using a network perturbation analysis method. Each COPD sample was compared against a learned reference gene network to determine the part that is deregulated. Gene deregulation values were used to cluster the disease samples.
Results
The discovery set included 617 former smokers from COPDGene. Four distinct gene network subtypes are identified with significant differences in symptoms, exercise capacity and mortality. These clusters do not necessarily correspond with the levels of lung function impairment and are independently validated in two external cohorts: 769 former smokers from COPDGene and 431 former smokers in the Multi-Ethnic Study of Atherosclerosis (MESA). Additionally, we identify several genes that are significantly deregulated across these subtypes, including DSP and GSTM1, which have been previously associated with COPD through genome-wide association study (GWAS).
Conclusions
The identified subtypes differ in mortality and in their clinical and functional characteristics, underlining the need for multi-dimensional assessment potentially supplemented by selected markers of gene expression. The subtypes were consistent across cohorts and could be used for new patient stratification and disease prognosis
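Estimation and application of mixed graphical models for two groups using a Markov random field model
Thesis (Ph.D.) -- Seoul National University Graduate School: College of Natural Sciences, Interdisciplinary Program in Bioinformatics, 2022. 8. Sungho Won.
Background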
Large datasets with a huge number of variables or subjects, such as multi-omics data, have been widely generated recently. Many of these datasets are mixed type including both numeric and categorical variables, which makes their analyses difficult. In some studies, the networks underlying the large dataset may be of interest. There have been several methods that are suggested for the inference of the networks, but most of them can be used only for a single type of data or single class cases.
Objective
The objective of the study is to develop and propose a new method, named fused MGM (FMGM), that infers network structures underlying mixed data in 2 groups, under the assumption that both the networks and their differences are sparse. In addition, statistical analyses including the proposed method were conducted to find biological markers of atopic dermatitis (AD) and the underlying network structures from multi-omics data of 6-month-old infants.
Methods
For FMGM, the statistical models of the networks are based on a pairwise Markov random field model, and the penalty functions implement the main assumption that the networks in 2 groups and their differences are sparse. A fast proximal gradient method (PGM) was used to optimize the target function. An extension of FMGM that allows the inclusion of prior knowledge, named prior-induced FMGM (piFMGM), was also developed. The performance of the method was measured with synthetic datasets that simulate power-law network structures. The multi-omics profiles of 6-month-old infants were also analyzed. The profiles include host gene transcriptome (N=199), intestinal microbial compositions (N=197), and predicted intestinal microbial functions (N=98; 84 in common). For the analysis, differential analysis with limma and network inference with FMGM were applied.
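As a generic illustration of the fast proximal gradient method mentioned above, the sketch below applies it (in its common FISTA form) to a plain lasso problem, minimizing 0.5*||Ax - b||^2 + lam*||x||_1. This is not the FMGM objective: the thesis uses a Markov random field likelihood with penalties on both networks and their differences, whereas this example swaps in least squares and a single L1 penalty so that the proximal step reduces to soft-thresholding. All names (`soft_threshold`, `fista_lasso`, `lam`, `n_iter`) are hypothetical.

```python
import numpy as np

def soft_threshold(v, tau):
    """Proximal operator of tau * ||.||_1 (elementwise shrinkage)."""
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

def fista_lasso(A, b, lam, n_iter=200):
    """Minimize 0.5*||Ax - b||^2 + lam*||x||_1 with accelerated proximal steps."""
    L = np.linalg.norm(A, 2) ** 2             # Lipschitz constant of the gradient
    x = np.zeros(A.shape[1])
    y = x.copy()                              # extrapolated point
    t = 1.0                                   # momentum parameter
    for _ in range(n_iter):
        grad = A.T @ (A @ y - b)              # gradient of the smooth part at y
        x_new = soft_threshold(y - grad / L, lam / L)
        t_new = (1.0 + np.sqrt(1.0 + 4.0 * t * t)) / 2.0
        y = x_new + ((t - 1.0) / t_new) * (x_new - x)
        x, t = x_new, t_new
    return x

# With A equal to the identity, the solution is soft-thresholding of b.
x_hat = fista_lasso(np.eye(3), np.array([3.0, 0.5, -2.0]), lam=1.0)
print(x_hat)
```

The same loop structure carries over to other penalties as long as their proximal operator can be evaluated; only `soft_threshold` and the gradient line would change.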
Results
In the analysis of simulated 2-class datasets generated from simulated scale-free networks, FMGM showed superior performance, especially in terms of F1-scores, compared to the previous method that infers the networks one by one (0.392 vs. 0.546). FMGM performed better not only in inferring the differences (0.217 vs. 0.410), but also in inferring the networks (0.492 vs. 0.572). Utilizing prior information with piFMGM gave slightly better F1-scores for the inference of networks (0.572 vs. 0.589) and for the inference of the differences (0.410 vs. 0.423). As a result, the overall performance showed a slight improvement (0.546 vs. 0.562). In the networks inferred from the 6-month-old infants' AD data, 10 pairs of variables were shown to have different correlations by disease status, including host expression of LINC01036 and MIR4788 and abundance of microbial genes related to carotenoid biosynthesis and RNA degradation.
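For readers unfamiliar with how F1-scores are used for edge recovery, a minimal sketch follows. It scores an estimated undirected network against a known one by counting each node pair once. Treating the "difference" between two groups as the set of edges present in exactly one group is an assumption made here for illustration; the thesis may define differences on parameter values instead. The function name `edge_f1` and the toy matrices are hypothetical.

```python
import numpy as np

def edge_f1(true_adj, est_adj):
    """F1-score of recovered edges for symmetric 0/1 adjacency matrices."""
    true_adj = np.asarray(true_adj, dtype=bool)
    est_adj = np.asarray(est_adj, dtype=bool)
    upper = np.triu_indices_from(true_adj, k=1)   # each undirected pair once
    t, e = true_adj[upper], est_adj[upper]
    tp = np.sum(t & e)                            # true edges recovered
    fp = np.sum(~t & e)                           # spurious edges
    fn = np.sum(t & ~e)                           # missed edges
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return float(2 * precision * recall / (precision + recall))

# Toy 3-node example: true edges 0-1 and 1-2, estimated edges 0-1 and 0-2.
truth = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]])
estimate = np.array([[0, 1, 1], [1, 0, 0], [1, 0, 0]])
score = edge_f1(truth, estimate)
print(score)

# Difference network under the stated assumption: edges in exactly one group.
truth_b = np.array([[0, 1, 0], [1, 0, 0], [0, 0, 0]])
diff_score = edge_f1(truth ^ truth_b, estimate ^ truth_b)
print(diff_score)
```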
Conclusions
The proposed method, FMGM, inferred the network structures in 2 classes better than the previous method. Inclusion of prior information in piFMGM may be useful for more accurate inference of networks, but since the change was subtle, additional studies may be needed to improve it. Network inference revealed several markers of AD such as microbial genes related to carotenoid biosynthesis and RNA degradation, suggesting a number of possible underlying metabolisms related to AD such as oxidative stress and microbial RNA balance.
Chapter 1. Introduction
1.1 Study Background
1.2 Prior Works
1.3 Purpose of Research
Chapter 2. Network Inference of 2-class Mixed Data
2.1 Introduction
2.2 Notations
2.3 Model Formulation
2.4 Optimization with Fast Proximal Gradient Method
2.5 Code Implementation
2.6 Simulated Data Analysis
2.7 Real Data Analysis: DNA Methylation Data
2.8 Discussion
Chapter 3. Integration of Prior Information for Network Inference
3.1 Introduction
3.2 Use of Separate Parameter for Prior Information
3.3 Determination of Regularization Parameters
3.4 Simulated Data Analysis
3.5 Real Data Analysis: Multi-Omics Data from Asthma Patients
3.6 Discussion
Chapter 4. Multi-Omics Data Analysis of Atopic Dermatitis (AD)
4.1 Background
4.2 Data Description
4.3 Statistical Analysis
4.4 Results
4.5 Discussion
Chapter 5. Conclusion
Appendix
Bibliography
Abstract in Korean
Graphical models for de novo and pathway-based network prediction over multi-modal high-throughput biological data
It is now standard practice in the study of complex disease to perform many high-throughput -omic experiments (genome-wide SNP, copy number, mRNA and miRNA expression) on the same set of patient samples. These multi-modal data should allow researchers to form a more complete, systems-level picture of a sample, but this is only possible if they have a suitable model for integrating the data. Due to the variety of data modalities and possible combinations of data, general, flexible integration methods that will be widely applicable in many settings are desirable. In this dissertation I will present my work using graphical models for de novo structure learning of both undirected and directed sparse graphs over a mixture of Gaussian and categorical variables. Using synthetic and biological data I will show that these models are useful for both variable selection and inference. Selecting the regularization parameters is an important challenge for these models, so I will also cover stability-based methods for efficiently setting these parameters and for controlling the false discovery rate of edge predictions. I will also show results from a biological application to data from metastatic melanoma patients, where our methods identified a PARP1 splice site variant that is predictive of response to chemotherapy. Finally, I present work incorporating miRNA into a pathway-based graphical model called PARADIGM. This extension of the model allows us to study patient-specific changes in miRNA-induced silencing in cancer.