11 research outputs found

Learning mixed graphical models with separate sparsity parameters and stability-based model selection

Author: Benos PV
Donovan RM
Sedgewick AJ
Shi I
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 06/06/2016

Background: Mixed graphical models (MGMs) are graphical models learned over a combination of continuous and discrete variables. Mixed variable types are common in biomedical datasets. MGMs consist of a parameterized joint probability density, which implies a network structure over these heterogeneous variables. The network structure reveals direct associations between the variables, and the joint probability density allows one to ask arbitrary probabilistic questions on the data. This information can be used for feature selection, classification and other important tasks. Results: We studied the properties of MGM learning and applications of MGMs to high-dimensional data (biological and simulated). Our results show that MGMs reliably uncover the underlying graph structure, and when used for classification, their performance is comparable to popular discriminative methods (lasso regression and support vector machines). We also show that imposing separate sparsity penalties for edges connecting different types of variables significantly improves edge recovery performance. To choose these sparsity parameters, we propose a new efficient model selection method, named Stable Edge-specific Penalty Selection (StEPS). StEPS is an expansion of an earlier method, StARS, to mixed variable types. In terms of edge recovery, StEPS-selected MGMs outperform those models selected using standard techniques, including AIC, BIC and cross-validation. In addition, we use a heuristic search that is linear in the size of the sparsity value search space, as opposed to the cubic grid search required by other model selection methods. We applied our method to clinical and mRNA expression data from the Lung Genomics Research Consortium (LGRC), and the learned MGM correctly recovered connections between the diagnosis of obstructive or interstitial lung disease, two diagnostic breathing tests, and cigarette smoking history. Our model also suggested biologically relevant mRNA markers that are linked to these three clinical variables. Conclusions: MGMs are able to accurately recover dependencies between sets of continuous and discrete variables in both simulated and biomedical datasets. Separation of sparsity penalties by edge type is essential for accurate network edge recovery. Furthermore, our stability-based method for model selection determines sparsity parameters faster and more accurately (in terms of edge recovery) than other model selection methods. With the ongoing availability of comprehensive clinical and biomedical datasets, MGMs are expected to become a valuable tool for investigating disease mechanisms and answering an array of critical healthcare questions.
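The stability idea behind the abstract above can be illustrated with a minimal sketch. It assumes the StARS-style instability measure, 2θ(1−θ), where θ is the fraction of subsamples in which an edge is selected, and averages it separately over continuous-continuous, continuous-discrete and discrete-discrete edges. The function names and the "cc"/"cd"/"dd" labels are hypothetical and are not taken from the authors' implementation; how the per-type instabilities are then used to pick each penalty is not shown.

```python
import numpy as np


def edge_instability(adjacencies):
    """Per-edge instability 2*theta*(1-theta) across subsampled graphs.

    adjacencies: sequence of (p, p) 0/1 adjacency matrices, one per subsample.
    theta is the fraction of subsamples in which each edge was selected.
    """
    stack = np.asarray(adjacencies, dtype=float)  # shape (n_subsamples, p, p)
    theta = stack.mean(axis=0)
    return 2.0 * theta * (1.0 - theta)


def instability_by_edge_type(adjacencies, is_discrete):
    """Average instability over each edge type (assumed grouping, for illustration).

    is_discrete: length-p boolean sequence marking the discrete variables.
    Returns a dict with keys "cc", "cd", "dd"; an empty edge type maps to 0.0.
    """
    xi = edge_instability(adjacencies)
    is_discrete = np.asarray(is_discrete, dtype=bool)
    p = len(is_discrete)
    rows, cols = np.triu_indices(p, k=1)  # each undirected edge counted once
    n_discrete = is_discrete[rows].astype(int) + is_discrete[cols].astype(int)
    result = {}
    for count, name in ((0, "cc"), (1, "cd"), (2, "dd")):
        mask = n_discrete == count
        result[name] = float(xi[rows[mask], cols[mask]].mean()) if mask.any() else 0.0
    return result
```

In a selection loop of this kind, one would typically refit the model on each subsample for a candidate penalty, collect the adjacency matrices, and compare the per-type instabilities against a chosen threshold.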

Crossref

Springer - Publisher Connector

PubMed Central

D-Scholarship@Pitt

Implementation of Lesson Study in Learning Process: A Study of Biology Student Learning Activities

Author: Akbar Muh. Nur
Ikalor Allvanialista
Rohma Aida Fitriyatur
Publication venue: Program Studi Pendidikan Biologi FKIP Universitas Pakuan
Publication date: 30/11/2022

Prospective student educators can use Lesson Study to improve their ability to plan and design learning. However, understanding of the implementation of Lesson Study in the learning process among first-semester students in the Postgraduate Biology Education Study Program, Universitas Negeri Malang, is still lacking. Thus, research on the implementation of Lesson Study by students is necessary. This study aims to determine student learning activities through the application of Lesson Study and to serve as training for students in transforming their knowledge as prospective educators. The research is descriptive and qualitative, and was carried out in three stages: planning, implementing, and reflecting. The data obtained from this study are the implementation of Lesson Study activities with the Jigsaw cooperative learning model, the implementation of Lesson Study activities with the STAD cooperative learning model combined with Snowball Throwing, student learning activities, and the results of reflection on the implementation of Lesson Study. According to the study's findings, Lesson Study can accurately reflect student activities, and its implementation in the learning process can also give prospective educators an understanding of pedagogical competence.

Scientific Journals of Universitas Pakuan

Identification and comparison of pandemic-to-symptom networks of South Korea and the United States

Author: Deachul Seo
Gayeon Lee
Hyunjung Yang
Ji Geun Kim
Juyoen Hur
Larkin S. McReynolds
Lawrence Amsel
Mijeong Park
Sanghoon Han
Soo Hyun Park
Young-Hoon Kim
Publication venue: 'Frontiers Media SA'
Publication date: 01/06/2023

Background: The Coronavirus (COVID-19) pandemic resulted in a dramatic increase in the prevalence of anxiety and depression globally. Although the impact on the mental health of young adults was especially strong, its underlying mechanisms remain elusive. Materials and methods: Using a network approach, the present study investigated the putative pathways between pandemic-related factors and anxiety and depressive symptoms among young adults in South Korea and the U.S. Network analyses were conducted on cross-country data collected during the COVID-19 lockdown period (n = 1,036). Our model included depression symptoms (PHQ-9), generalized anxiety symptoms (GAD-7), and COVID-19-related factors (e.g., COVID-19-related traumatic stress, pandemic concerns, access to medical/mental health services). Results: The overall structures of the pandemic-to-symptom networks of South Korea and the U.S. were found to be similar. In both countries, COVID-related stress and negative future anticipation (an anxiety symptom) were identified as bridging nodes between pandemic-related factors and psychological distress. In addition, worry-related symptoms (e.g., excessive worry, uncontrollable worry) were identified as key contributors in maintaining the overall pandemic-to-symptom network in both countries. Conclusion: The similar network structures and patterns observed in both countries imply that there may exist a stable relationship between the pandemic and internalizing symptoms above and beyond sociocultural differences. The current findings provide new insights into the common potential pathway between the pandemic and internalizing symptoms in South Korea and in the U.S. and inform policymakers and mental health professionals of potential intervention targets to alleviate internalizing symptoms.

Directory of Open Access Journals

Distinct COPD subtypes in former smokers revealed by gene network perturbation analysis

Author: Aguet Francois
Ardlie Kristin G.
Benos Panayiotis V.
Buschur Kristina L.
Castaldi Peter
Craig Johnson W.
Durda Peter
Graham Barr R.
Hersh Craig P.
Kasela Silva
Lappalainen Tuuli
Liu Yongmei
Manichaikul Ani
Rich Stephen S.
Riley Craig
Rotter Jerome I.
Saferali Aabida
Sciurba Frank
Smith Josh
Taylor Kent D.
Tracy Russell P.
Zhang Grace
Publication date: 01/01/2023

Background: Chronic obstructive pulmonary disease (COPD) varies significantly in symptomatic and physiologic presentation. Identifying disease subtypes from molecular data, collected from easily accessible blood samples, can help stratify patients and guide disease management and treatment. Methods: Blood gene expression measured by RNA-sequencing in the COPDGene Study was analyzed using a network perturbation analysis method. Each COPD sample was compared against a learned reference gene network to determine the part that is deregulated. Gene deregulation values were used to cluster the disease samples. Results: The discovery set included 617 former smokers from COPDGene. Four distinct gene network subtypes are identified with significant differences in symptoms, exercise capacity and mortality. These clusters do not necessarily correspond with the levels of lung function impairment and are independently validated in two external cohorts: 769 former smokers from COPDGene and 431 former smokers in the Multi-Ethnic Study of Atherosclerosis (MESA). Additionally, we identify several genes that are significantly deregulated across these subtypes, including DSP and GSTM1, which have been previously associated with COPD through genome-wide association study (GWAS). Conclusions: The identified subtypes differ in mortality and in their clinical and functional characteristics, underlining the need for multi-dimensional assessment potentially supplemented by selected markers of gene expression. The subtypes were consistent across cohorts and could be used for new patient stratification and disease prognosis.

Columbia University Academic Commons

Estimation and Application of Mixed Graphical Models for Two Groups Using a Markov Random Field Model

Author: 박재현
Publication venue: Seoul National University Graduate School
Publication date: 01/08/2022

Thesis (Ph.D.) -- Seoul National University Graduate School: College of Natural Sciences, Interdisciplinary Program in Bioinformatics, 2022. 8. 원성호.

Background: Large datasets with a huge number of variables or subjects, such as multi-omics data, have been widely generated recently. Many of these datasets are of mixed type, including both numeric and categorical variables, which makes their analysis difficult. In some studies, the networks underlying the large dataset may be of interest. Several methods have been suggested for network inference, but most of them can be used only for a single type of data or for a single class. Objective: The objective of the study is to develop and propose a new method, named fused MGM (FMGM), that infers network structures underlying mixed data in 2 groups, under the assumption that both the networks and their differences are sparse. Statistical analyses including the proposed method were also conducted to find biological markers of atopic dermatitis (AD) and the underlying network structures from multi-omics data of 6-month-old infants. Methods: For FMGM, the statistical models of the networks are based on a pairwise Markov random field model, and the penalty functions implement the main assumption that the networks in 2 groups and their differences are sparse. A fast proximal gradient method (PGM) was used for the optimization of the target function. An extension of FMGM that allows the inclusion of prior knowledge, named prior-induced FMGM (piFMGM), was also developed. The performance of the method was measured with synthetic datasets that simulate power-law network structures. The multi-omics profiles of 6-month-old infants were also analyzed. The profiles include host gene transcriptome (N=199), intestinal microbial compositions (N=197), and predicted intestinal microbial functions (N=98; 84 in common). For the analysis, differential analysis with limma and network inference with FMGM were applied. Results: In the analysis of simulated 2-class datasets, generated from simulated scale-free networks, FMGM showed superior performance, especially in terms of F1-scores, compared to the previous method of inferring the networks one by one (0.392 vs. 0.546). FMGM performed better not only in inferring the differences (0.217 vs. 0.410), but also in inferring the networks (0.492 vs. 0.572). Utilizing prior information with piFMGM yielded slightly better F1-scores for the inference of networks (0.572 vs. 0.589) and for the inference of the differences (0.410 vs. 0.423). As a result, the overall performance showed a slight improvement (0.546 vs. 0.562). In the inference of networks from the 6-month-old infants' AD data, 10 pairs of variables were shown to have different correlations by disease status, including host expression of LINC01036 and MIR4788 and abundance of microbial genes related to carotenoid biosynthesis and RNA degradation. Conclusions: The proposed method, FMGM, inferred the network structures in 2 classes better than the previous method. Inclusion of prior information in piFMGM may be useful for more accurate inference of networks, but since the change was subtle, additional studies may be conducted to improve it. Network inference revealed several markers of AD, such as microbial genes related to carotenoid biosynthesis and RNA degradation, suggesting a number of possible underlying mechanisms related to AD, such as oxidative stress and microbial RNA balance.

Chapter 1. Introduction: 1.1 Study Background; 1.2 Prior Works; 1.3 Purpose of Research
Chapter 2. Network Inference of 2-class Mixed Data: 2.1 Introduction; 2.2 Notations; 2.3 Model Formulation; 2.4 Optimization with Fast Proximal Gradient Method; 2.5 Code Implementation; 2.6 Simulated Data Analysis; 2.7 Real Data Analysis: DNA Methylation Data; 2.8 Discussion
Chapter 3. Integration of Prior Information for Network Inference: 3.1 Introduction; 3.2 Use of Separate Parameter for Prior Information; 3.3 Determination of Regularization Parameters; 3.4 Simulated Data Analysis; 3.5 Real Data Analysis: Multi-Omics Data from Asthma Patients; 3.6 Discussion
Chapter 4. Multi-Omics Data Analysis of Atopic Dermatitis (AD): 4.1 Background; 4.2 Data Description; 4.3 Statistical Analysis; 4.4 Results; 4.5 Discussion
Chapter 5. Conclusion
Appendix
Bibliography
Abstract in Korean
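The F1-scores quoted in the abstract above summarize how well an inferred network matches the true one. As a rough illustration, the sketch below computes the standard F1-score (harmonic mean of precision and recall) over undirected edge sets. The function name `edge_f1` and the edge-list representation are assumptions made here for illustration; the thesis may compute the score differently.

```python
def edge_f1(true_edges, predicted_edges):
    """F1-score for undirected edge recovery.

    Both arguments are iterables of (u, v) pairs; (u, v) and (v, u)
    are treated as the same edge.
    """
    true_set = {frozenset(edge) for edge in true_edges}
    pred_set = {frozenset(edge) for edge in predicted_edges}
    true_positives = len(true_set & pred_set)
    if true_positives == 0:
        return 0.0  # also covers empty true or predicted edge sets
    precision = true_positives / len(pred_set)
    recall = true_positives / len(true_set)
    return 2 * precision * recall / (precision + recall)
```

For example, with true edges (0,1), (1,2), (2,3) and predicted edges (1,0), (2,3), (0,3), two of three predictions are correct and two of three true edges are found, so precision and recall are both 2/3 and so is the F1-score.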

SNU Open Repository and Archive

Learning mixed graphical models with separate sparsity parameters and stability-based model selection

Author: AA Walsh
AJ Sedgewick
Andrew J. Sedgewick
B Bollobás
B Efron
B Fellinghauer
BW Matthews
DM Mannino
E Yang
G Schwarz
GO Consortium
GT Huang
H Akaike
H Liu
H Liu
H Zou
I Tsamardinos
I Tur
IO Rosas
Ivy Shi
J Besag
J Friedman
J Lee
L Zhang
N Meinshausen
Panayiotis V. Benos
RB Brem
Rory M. Donovan
S Chen
SL Lauritzen
T Abeel
T Zhao
W Mazur
W Wang
Publication venue: 'Springer Science and Business Media LLC'

Crossref

Graphical models for de novo and pathway-based network prediction over multi-modal high-throughput biological data

Author: Sedgewick Andrew
Publication date: 07/09/2016

It is now standard practice in the study of complex disease to perform many high-throughput -omic experiments (genome-wide SNP, copy number, mRNA and miRNA expression) on the same set of patient samples. These multi-modal data should allow researchers to form a more complete, systems-level picture of a sample, but this is only possible if they have a suitable model for integrating the data. Due to the variety of data modalities and possible combinations of data, general, flexible integration methods that will be widely applicable in many settings are desirable. In this dissertation I will present my work using graphical models for de novo structure learning of both undirected and directed sparse graphs over a mixture of Gaussian and categorical variables. Using synthetic and biological data I will show that these models are useful for both variable selection and inference. Selecting the regularization parameters is an important challenge for these models, so I will also cover stability-based methods for efficiently setting these parameters and for controlling the false discovery rate of edge predictions. I will also show results from a biological application to data from metastatic melanoma patients, where our methods identified a PARP1 splice site variant that is predictive of response to chemotherapy. Finally, I present work incorporating miRNA into a pathway-based graphical model called PARADIGM. This extension of the model allows us to study patient-specific changes in miRNA-induced silencing in cancer.

D-Scholarship@Pitt