7 research outputs found

    Visualizing engineering design data using a modified two-level self-organizing map clustering approach

    Engineers designing large, complex systems continually need decision-making aids that can sift through the enormous amounts of data produced by simulation and experimentation. Understanding these systems often requires visualizing multidimensional design data. Visual cues such as size, color, and symbols are often used to denote specific variables (dimensions) as well as characteristics of the data. However, these cues cannot effectively convey information about a system with more than three dimensions. Two general techniques can reduce the complexity of the information presented to an engineer: dimension reduction and individual variable comparison. Each approach can provide a comprehensible visualization of the resulting design space, which is vital for an engineer choosing an appropriate optimization algorithm. Visualization techniques such as self-organizing maps (SOMs) reduce the complexity of n-dimensional data by producing easy-to-understand visual representations that quickly highlight trends to support decision-making. The SOM can be extended by providing relevant output information in the form of contextual labels. Furthermore, these contextual labels can be leveraged to visualize a set of output maps containing statistical evaluations of each node in a trained SOM. These maps give a designer visual context for the data set’s natural topology by highlighting nodal performance across the maps. A drawback of SOMs is that promising points are clustered with predominantly less desirable data. Visualization techniques such as the SOM can reveal similar data groupings in the trained output maps, but they are not inherently cluster analysis methods.
Cluster analysis assembles similar data objects into “natural groups” without prior knowledge of the data set. Engineering data composed of design alternatives with associated variable parameters often contain data objects with unknown classification labels. Consequently, identifying the correct classifications can be difficult and costly. This thesis applies a cluster analysis technique to SOMs to segment a high-dimensional dataset into “meta-clusters”. It also describes the algorithm created to establish these meta-clusters through the development of several computational metrics involving intra- and inter-cluster densities. The results show the algorithm’s ability to narrow a large, complex system’s many design alternatives into a few overarching design groups with similar principal characteristics, which saves the time a designer would otherwise spend analyzing numerous design alternatives.
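The first level of the approach described in this abstract, training a SOM and mapping each design point to a node, can be sketched as follows. This is a minimal illustration under stated assumptions, not the thesis's algorithm: the function names `train_som` and `best_matching_unit`, the 3x3 grid, and the linear decay schedules are all hypothetical choices, and the meta-clustering step built on intra- and inter-cluster densities is not reproduced here.

```python
import math
import random


def best_matching_unit(x, weights):
    """Index of the node whose weight vector is closest to x (squared Euclidean)."""
    return min(range(len(weights)),
               key=lambda k: sum((w - xi) ** 2 for w, xi in zip(weights[k], x)))


def train_som(data, rows=3, cols=3, epochs=30, lr0=0.5, sigma0=1.5, seed=0):
    """Train a small rectangular SOM; returns one weight vector per grid node.

    `data` is a list of equal-length numeric sequences. Grid size, learning
    rate, and neighborhood radius are illustrative assumptions.
    """
    rng = random.Random(seed)
    grid = [(r, c) for r in range(rows) for c in range(cols)]
    # Start each node at a randomly chosen data point.
    weights = [list(rng.choice(data)) for _ in grid]
    total_steps = epochs * len(data)
    step = 0
    for _ in range(epochs):
        for x in rng.sample(data, len(data)):  # shuffled pass over the data
            frac = step / total_steps
            lr = lr0 * (1.0 - frac)              # decaying learning rate
            sigma = sigma0 * (1.0 - frac) + 0.1  # shrinking neighborhood radius
            bmu = best_matching_unit(x, weights)
            for k, (r, c) in enumerate(grid):
                # Gaussian neighborhood around the best-matching unit on the grid.
                d2 = (r - grid[bmu][0]) ** 2 + (c - grid[bmu][1]) ** 2
                h = math.exp(-d2 / (2.0 * sigma ** 2))
                weights[k] = [w + lr * h * (xi - w) for w, xi in zip(weights[k], x)]
            step += 1
    return weights
```

After training, `best_matching_unit(point, weights)` gives the node a design alternative falls on; a second-level clustering would then group the node weight vectors rather than the raw points.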

    Particle swarm optimization for support vector clustering: Separating hyper-plane of unlabeled data

    The objective of this work is to design a new method for integrating Vapnik's theory of support vector machines (SVMs) into data clustering. To this end we turn to bio-inspired meta-heuristics, which aim to develop models that solve a class of problems by drawing on patterns of behavior studied in ethology. Particle Swarm Optimization (PSO) is one of the most recent and widely used methods of this kind. Inspired by this paradigm, we propose a new clustering method, PSvmC, which ensures the best separation of an unlabeled data set into two groups. It explores the basic principles of SVM and combines them with the particle swarm optimization meta-heuristic to solve the clustering problem, contributing to the analysis of multivariate data. The resulting groups are as homogeneous as possible: the intra-class value is better than those obtained by hierarchical clustering, simple K-means, and EM algorithms on several benchmark databases.
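The general idea of letting a particle swarm search for a hyperplane that splits unlabeled data into two homogeneous groups can be sketched as below. This is not PSvmC itself, whose objective and update rules the abstract does not give. It is a plain PSO over the parameters of a 2-D line, with within-group scatter standing in for the intra-class criterion. The function names, the swarm coefficients, and the initialization bounds (chosen for data with coordinates roughly between 0 and 10) are all assumptions.

```python
import random


def within_scatter(points, labels):
    """Sum of squared distances from each point to its own group's centroid."""
    total = 0.0
    for g in (0, 1):
        members = [p for p, l in zip(points, labels) if l == g]
        if not members:
            continue
        cx = sum(p[0] for p in members) / len(members)
        cy = sum(p[1] for p in members) / len(members)
        total += sum((p[0] - cx) ** 2 + (p[1] - cy) ** 2 for p in members)
    return total


def split(points, params):
    """Label each 2-D point by the side of the line w1*x + w2*y + b = 0 it falls on."""
    w1, w2, b = params
    return [1 if w1 * x + w2 * y + b > 0 else 0 for x, y in points]


def pso_split(points, n_particles=30, iters=100, seed=0,
              inertia=0.7, c_personal=1.5, c_global=1.5):
    """Standard global-best PSO over (w1, w2, b); returns (best params, best cost)."""
    rng = random.Random(seed)

    def cost(params):
        return within_scatter(points, split(points, params))

    # Assumed initialization box for (w1, w2, b); tune to the data scale.
    lo, hi = (-1.0, -1.0, -15.0), (1.0, 1.0, 15.0)
    pos = [[rng.uniform(l, h) for l, h in zip(lo, hi)] for _ in range(n_particles)]
    vel = [[0.0] * 3 for _ in range(n_particles)]
    best_pos = [p[:] for p in pos]
    best_cost = [cost(p) for p in pos]
    g = min(range(n_particles), key=best_cost.__getitem__)
    g_pos, g_cost = best_pos[g][:], best_cost[g]
    for _ in range(iters):
        for i in range(n_particles):
            for d in range(3):
                r1, r2 = rng.random(), rng.random()
                # Velocity: inertia + pull toward personal best + pull toward global best.
                vel[i][d] = (inertia * vel[i][d]
                             + c_personal * r1 * (best_pos[i][d] - pos[i][d])
                             + c_global * r2 * (g_pos[d] - pos[i][d]))
                pos[i][d] += vel[i][d]
            c = cost(pos[i])
            if c < best_cost[i]:
                best_pos[i], best_cost[i] = pos[i][:], c
                if c < g_cost:
                    g_pos, g_cost = pos[i][:], c
    return g_pos, g_cost
```

Calling `split(points, params)` with the returned parameters gives the two-group labeling. Because the cost is piecewise constant in the line parameters, a gradient-free search such as PSO is a natural fit for this kind of objective.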

    Use of microbiome data to explain the expression of productive traits in domestic species

    Unpublished thesis of the Universidad Complutense de Madrid, Facultad de Veterinaria, defended on 11-03-2022. The discovery of microbial communities symbiotically associated with eukaryotic organisms has led to a paradigm shift in the definition of the biological individual, which is now seen as a co-dependent combination of the host and its microbiome, or holobiont. Thus, the study of microbiomes has become essential to understanding the biology of complex living organisms. Indeed, current research points to a crucial role of microbial communities in host health, survivability, development and metabolism.
Recent advances in DNA sequencing have significantly boosted microbial research, allowing the generation of massive sequencing databases encompassing a large proportion of the diversity inside microbiomes. The era of next-generation sequencing has brought new knowledge about the role of microbial communities, with special significance for gut microbiomes, in host phenotype. For the livestock industry, this has led to important advances in the understanding of biological mechanisms influencing animal welfare, productivity and sustainability, which could be useful to face existing challenges in animal production...

    Development and application of statistical methodology for 16S rRNA metagenome association analysis

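    Doctoral thesis -- Seoul National University Graduate School: Graduate School of Public Health, Department of Public Health (Public Health major), 2021.8. Won Sungho (원성호). Background (translated from the Korean abstract): Advances in sequencing technology and falling sequencing costs have enabled large-scale analysis of microbial communities, giving rise to metagenomics and its broad development. Compositional bias and zero inflation make statistical methods for association analysis of metagenomic data difficult to apply. These problems further complicate the modeling of longitudinal analyses, which must account for complex correlations among repeated measurements. This sparsity, together with the choice among various databases and clustering methods, induces heterogeneity across microbiome datasets. Objective: The aims of this study are (1) to develop a statistical method that corrects for compositional bias and zero inflation, with a package implementation that allows results to be compared across clustering methods and databases; (2) to develop a statistical method that corrects for compositional bias, zero inflation, and correlation among repeated measurements in longitudinal datasets; and (3) to identify microorganisms that may affect type 2 diabetes risk indicators and to uncover the biological background explaining them through longitudinal association analysis using multi-omics data. Methods: To account for the characteristics of microbiome data and correct compositional bias and zero inflation, abundances are normalized and combined with the reference tree structure. A package covering the preprocessing procedure, comparison of results across databases, and clustering methods was developed to handle the heterogeneity problem. Correlation among repeated measurements is handled with generalized estimating equations, using robust score and Wald statistics respectively. Type 2 diabetes risk indicators were modeled with generalized estimating equations, and biological mechanisms were explored through inferred functional genomes and SNPs. Mendelian randomization was also performed to examine the causal relationship between the target microbes and type 2 diabetes risk. Results and Conclusions: The phylogenetic tree-based microbiome association test (TMAT) normalized microbial abundances and combined them with the phylogenetic tree structure. Pooling sequencing reads based on the phylogenetic tree reduced zero inflation, and taking the ratio between two microbial abundances corrected compositional bias. The all-inclusive microbiome association analysis (AMAA) package, which includes pipelines for building microbial count tables from various databases and clustering methods, and metagenome-wide association analysis methods were developed; with unified preprocessing and comparison of results across databases or clustering schemes, they make various microbiome association analysis methods convenient to use. mTMAT, the extended version of TMAT, uses a robust variance estimator and can be applied to repeatedly measured samples. The statistical power of mTMAT exceeded that of other methods in most scenarios while preserving the nominal type-1 error. We found that GU174097, of the Lachnospiraceae lineage, correlated with type 2 diabetes risk indicators. This genus may also be related to pathways involving short-chain fatty acids (SCFAs), and the MR analysis and investigation of biological background suggest that this genus may increase the risk of diabetes. Background: Increased availability of affordable sequencing technology and advances in throughput technology have led to the birth and widespread development of a new scientific discipline, metagenomics, which includes large-scale analysis of microbial communities.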
However, analysis of metagenomics data suffers from compositional bias and zero inflation, and the statistical methods available for association analysis with 16S rRNA data are very limited, especially for repeatedly observed 16S rRNA data. Therefore, investigation of statistical methods and software development is necessary. Objective: The main goals are (1) to develop new methods for cross-sectional and repeatedly observed 16S rRNA data that correct for problems including compositional bias and zero inflation, with a package implementation that can unify the preprocessing procedures; and (2) to identify microorganisms that can affect type-2 diabetes (T2D)-related traits using repeatedly observed 16S rRNA data. Methods: To account for the characteristics of microbiome data and correct compositional bias and zero inflation, the phylogenetic tree-based method, TMAT, and its extension to repeatedly observed 16S rRNA measurements, mTMAT, were developed. I also implemented a new package that can generate both statistics and conduct OTU clustering with different databases. This package also allows the comparison of different statistics. Furthermore, association analysis of microorganisms with T2D was conducted using repeatedly measured EV in urine samples. EV-derived metagenomic (N = 393), clinical (N = 5032), and metabolite (N = 574) data were observed three times for a prospective, longitudinal Korean community-based cohort (KARE), and genetic data were available. They were analyzed with a generalized linear mixed model to identify microbes associated with T2D and their interaction with metabolites. Results and Conclusions: The proposed phylogenetic tree-based microbiome association test (TMAT) normalized microbial abundances, and abundances pooled according to the phylogenetic tree structure were used for association analysis. Results from simulation studies showed that TMAT correctly controls type-1 error rates and is statistically more powerful.
Second, I also implemented the all-inclusive microbiome association analysis (AMAA) package. The AMAA package provides the analysis results of various methods, including TMAT, under unified preprocessing and allows comparison of results based on different databases or clustering methods. Third, mTMAT, the extended version of TMAT for repeatedly measured 16S rRNA data, was developed. It uses generalized estimating equations with a robust variance estimator and can be applied to repeatedly measured samples. The statistical power of mTMAT was superior to existing methods in terms of controlling the type-1 error and minimizing the type-2 error, and it is robust against compositional bias. Fourth, from the association analysis with repeatedly measured EV-based metagenome data, it was found that GU174097_g, an uncultured Lachnospiraceae, was associated with T2D (β = −189.13; p = 0.00006). These results indicate that GU174097_g may decrease the HbA1c level and the risk of T2D.
Contents: Chapter 1. Introduction (1. Study Background; 2. Literature Review; 3. Purpose of Research). Chapter 2. Phylogenetic Tree-based Microbiome Association Test and Package Development for Microbiome Analysis (2.1 Introduction; 2.2 Materials and Methods; 2.3 Results; 2.4 Discussion). Chapter 3. Longitudinal Microbiome Association Test based on Phylogenetic Tree (3.1 Introduction; 3.2 Materials and Methods; 3.3 Results; 3.4 Discussion). Chapter 4. Longitudinal Measurement of Urine Microbiome Reveals the Role of uncultured Lachnospiraceae on Type-2 Diabetes Pathogenesis (4.1 Introduction; 4.2 Materials and Methods; 4.3 Results; 4.4 Discussion). Chapter 5. Conclusions. References. Abstract in Korean.
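The abstract above attributes the correction of zero inflation to pooling reads along the phylogenetic tree, and the correction of compositional bias to taking a ratio of two abundances. The toy sketch below illustrates why those two steps help. It is not the TMAT statistic: the function name `pooled_log_ratio`, the pseudocount, the log transform, and the choice of two sibling subtrees are all assumptions made for illustration.

```python
import math


def pooled_log_ratio(left_counts, right_counts, pseudo=0.5):
    """Log-ratio of read counts pooled over two sibling subtrees for one sample.

    Pooling sums the counts of all leaves under each subtree, so zeros in
    individual taxa matter less; `pseudo` (an assumed pseudocount) guards
    against a pooled count of zero.
    """
    left = sum(left_counts) + pseudo
    right = sum(right_counts) + pseudo
    return math.log(left / right)


# Hypothetical counts for one sample: several individual taxa are zero,
# but both pooled subtrees are non-zero.
left, right = [0, 12, 3], [5, 0, 0]

# The same sample sequenced ten times deeper (all counts scaled by 10).
deep_left = [10 * c for c in left]
deep_right = [10 * c for c in right]

shallow = pooled_log_ratio(left, right, pseudo=0.0)
deep = pooled_log_ratio(deep_left, deep_right, pseudo=0.0)
print(shallow, deep)  # the two values agree: the ratio does not depend on depth
```

With the pseudocount set to zero the ratio is exactly invariant to sequencing depth; a non-zero pseudocount trades a small depth dependence for robustness when a pooled count is zero.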

    Database clustering methods


    Learning topic description from clustering of trusted user roles and event models characterizing distributed provenance networks: a reinforcement learning approach

    Abstract: This paper proposes a reinforcement-learning-based message transfer model for transferring news report messages along a selected path in a trusted provenance network, with the objective of maximizing reward values based on trust- or importance-based measures and on network-congestion- or utility-based cost measures. The reward values are calculated along a dynamically defined policy path connecting a start topic or event node to goal topic, event, or issue nodes over incrementally defined time windows for a given network congestion situation. A hierarchy of agents with trusted roles accomplishes the sub-goals associated with each sub-story or sub-topic in the provenance structure, where an agent role may assume the semantic role of the associated sub-topic. The tweeted news story thread, or plan of events, is defined in this work from the starting topic or event node to the goal topic or event node over incrementally defined intervals of time. The graphs are clustered into sub-topics, and the sub-goal or sub-topic nodes of a topic node at every level of granularity are associated with clusters of news reports that describe activities associated with the sub-goal or sub-topic events. Such a cluster of nodes may also represent a drilled-down sequence of sub-events describing a sub-topic or sub-goal node. The policy path in a topic or story graph model is defined by applying reinforcement learning principles to dynamically defined event models associated with the evolution of the topic definition, observed from incrementally acquired samples of input training data spanning multiple time windows. We provide a methodology for unifying similar provenance graph models, adapting and averaging the policy path classifiers associated with individual models to produce a reduced set of unified models derived during training. A minimum set cover of classifiers is identified for the models, and a clustering procedure for the models is suggested based on these classifiers.
Other database clustering methods have also been suggested as alternatives for clustering these models. A collection of unified models is identified from the models within a cluster, and the policy path classifiers associated with these models provide the story or topic descriptions destined for the goal topic or event nodes characterizing these models within a cluster.