9 research outputs found

    A Weiszfeld algorithm for the solution of an asymmetric extension of the generalized Fermat location problem

    Abstract: The Generalized Fermat Problem (in the plane) is: given n≥3 destination points, find the point x̄∗ which minimizes the sum of Euclidean distances from x̄∗ to each of the destination points. The Weiszfeld iterative algorithm for this problem is globally convergent, independent of the initial guess. Also, a test is available, a priori, to determine when x̄∗ is a destination point. This paper generalizes earlier work by the first author by introducing an asymmetric Euclidean distance in which, at each destination, the x-component is weighted differently from the y-component. A Weiszfeld algorithm is studied to compute x̄∗ and is shown to be a descent method which is globally convergent (except possibly for a denumerable number of starting points). Local convergence properties are characterized. When x̄∗ is not a destination point, the iteration matrix at x̄∗ is shown to be convergent and local convergence is always linear. When x̄∗ is a destination point, local convergence can be linear, sub-linear or super-linear, depending upon a computable criterion. A test for x̄∗ to be a destination point, which does not require iteration, is derived. Comparisons are made between the symmetric and asymmetric problems. Numerical examples are given.
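    As an illustration of the iteration this abstract refers to, here is a minimal sketch of the classical (symmetric, unweighted) Weiszfeld update in the plane. It is not the paper's asymmetric extension; the function name `weiszfeld`, the centroid starting guess, the tolerance, and the choice to stop when an iterate lands exactly on a destination point are all assumptions made for this example.

```python
import math

def weiszfeld(points, start=None, tol=1e-10, max_iter=1000):
    """Approximate the point minimizing the sum of Euclidean distances to `points`.

    Classical symmetric Weiszfeld iteration: each step replaces the current
    iterate by the average of the destination points weighted by 1/distance.
    """
    if start is None:
        # Centroid as an arbitrary starting guess (an assumption of this sketch).
        x = sum(p[0] for p in points) / len(points)
        y = sum(p[1] for p in points) / len(points)
    else:
        x, y = start
    for _ in range(max_iter):
        num_x = num_y = denom = 0.0
        for px, py in points:
            d = math.hypot(x - px, y - py)
            if d == 0.0:
                # The plain update is undefined at a destination point.
                # This sketch simply stops there; it does not test optimality.
                return (x, y)
            w = 1.0 / d
            num_x += w * px
            num_y += w * py
            denom += w
        new_x, new_y = num_x / denom, num_y / denom
        if math.hypot(new_x - x, new_y - y) < tol:
            return (new_x, new_y)
        x, y = new_x, new_y
    return (x, y)

def total_distance(points, q):
    """Objective value: sum of Euclidean distances from q to every point."""
    return sum(math.hypot(q[0] - px, q[1] - py) for px, py in points)
```

    For the four corners of the unit square, the minimizer is the center (0.5, 0.5), and the objective at the returned point should not exceed its value at the starting guess, consistent with the descent property mentioned in the abstract.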

    Some Applications of the Weighted Combinatorial Laplacian

    The weighted combinatorial Laplacian of a graph is a symmetric matrix which is the discrete analogue of the Laplacian operator. In this thesis, we study a new application of this matrix to matching theory, yielding a new characterization of factor-criticality in graphs and matroids. Other applications come from the physical design of very large scale integrated circuits. The placement of the gates involves the minimization of a quadratic form given by a weighted Laplacian. A method based on the dual constrained subgradient method is proposed to solve the general simultaneous placement and gate-sizing problem. A crucial step of this method is the projection onto the flow space of an associated graph, which can be performed by minimizing a quadratic form given by the unweighted combinatorial Laplacian.
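    To make the quadratic form concrete, the sketch below builds a weighted Laplacian from an edge list and evaluates x^T L x, which equals the weighted sum of squared differences across edges. Reading x as one-dimensional gate coordinates and the weights as connection strengths is an assumption for illustration only; the helper names are hypothetical and nothing here reproduces the thesis's placement method.

```python
def weighted_laplacian(n, edges):
    """Return the n x n weighted Laplacian for edges given as (i, j, weight)."""
    L = [[0.0] * n for _ in range(n)]
    for i, j, w in edges:
        L[i][i] += w      # weighted degree on the diagonal
        L[j][j] += w
        L[i][j] -= w      # negative weight off the diagonal
        L[j][i] -= w
    return L

def quadratic_form(L, x):
    """Evaluate x^T L x with plain nested sums."""
    n = len(x)
    return sum(x[i] * L[i][j] * x[j] for i in range(n) for j in range(n))

def edge_sum(edges, x):
    """Evaluate sum over edges of w_ij * (x_i - x_j)^2."""
    return sum(w * (x[i] - x[j]) ** 2 for i, j, w in edges)
```

    The two evaluations agree for any x, which is why minimizing the quadratic form pulls strongly connected nodes toward each other.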

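    Estimation and application of a two-group mixed graphical model using a Markov random field model (마르코프 랜덤 필드 모형을 이용한 2개 집단의 혼합 그래프 모형 추정 및 적용)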

    Doctoral thesis -- Seoul National University Graduate School: Interdisciplinary Program in Bioinformatics, College of Natural Sciences, August 2022. 원성호.
    Background: Large datasets with a huge number of variables or subjects, such as multi-omics data, have been widely generated recently. Many of these datasets are of mixed type, including both numeric and categorical variables, which makes their analysis difficult. In some studies, the networks underlying the large dataset may be of interest. Several methods have been suggested for inferring such networks, but most of them can be used only for a single type of data or for a single class.
    Objective: The objective of the study is to develop and propose a new method, named fused MGM (FMGM), that infers network structures underlying mixed data in 2 groups, under the assumption that both the networks and their differences are sparse. In addition, statistical analyses including the proposed method were conducted to find biological markers of atopic dermatitis (AD) and the underlying network structures from multi-omics data of 6-month-old infants.
    Methods: For FMGM, the statistical models of the networks are based on the pairwise Markov random field model, and the penalty functions implement the main assumption that the networks in the 2 groups and their differences are sparse. The fast proximal gradient method (PGM) was used to optimize the target function. An extension of FMGM that allows the inclusion of prior knowledge, named prior-induced FMGM (piFMGM), was also developed. The performance of the method was measured with synthetic datasets that simulate power-law network structures. The multi-omics profiles of 6-month-old infants were also analyzed. The profiles include the host gene transcriptome (N=199), intestinal microbial compositions (N=197), and predicted intestinal microbial functions (N=98), with 84 samples in common. For the analysis, differential analysis with limma and network inference with FMGM were applied.
    Results: In the analysis of simulated 2-class datasets generated from simulated scale-free networks, FMGM achieved a higher overall F1-score than the previous method, which infers the networks one by one (0.546 vs. 0.392 for the previous method). FMGM performed better not only in inferring the differences (0.410 vs. 0.217) but also in inferring the networks (0.572 vs. 0.492). Utilizing prior information with piFMGM gave slightly better F1-scores for the inference of the networks (0.589 vs. 0.572 for FMGM) and of the differences (0.423 vs. 0.410). As a result, the overall performance improved slightly (0.562 vs. 0.546). In the networks inferred from the AD data of 6-month-old infants, 10 pairs of variables showed different correlations by disease status, including host expression of LINC01036 and MIR4788 and the abundance of microbial genes related to carotenoid biosynthesis and RNA degradation.
    Conclusions: The proposed method, FMGM, inferred the network structures in 2 classes better than the previous method. Inclusion of prior information in piFMGM may be useful for more accurate inference of networks, but since the change was subtle, additional studies may be needed to improve it. Network inference revealed several markers of AD, such as microbial genes related to carotenoid biosynthesis and RNA degradation, suggesting possible underlying mechanisms related to AD such as oxidative stress and microbial RNA balance.
    Contents: Chapter 1. Introduction (1.1 Study Background; 1.2 Prior Works; 1.3 Purpose of Research). Chapter 2. Network Inference of 2-class Mixed Data (2.1 Introduction; 2.2 Notations; 2.3 Model Formulation; 2.4 Optimization with Fast Proximal Gradient Method; 2.5 Code Implementation; 2.6 Simulated Data Analysis; 2.7 Real Data Analysis: DNA Methylation Data; 2.8 Discussion). Chapter 3. Integration of Prior Information for Network Inference (3.1 Introduction; 3.2 Use of Separate Parameter for Prior Information; 3.3 Determination of Regularization Parameters; 3.4 Simulated Data Analysis; 3.5 Real Data Analysis: Multi-Omics Data from Asthma Patients; 3.6 Discussion). Chapter 4. Multi-Omics Data Analysis of Atopic Dermatitis (AD) (4.1 Background; 4.2 Data Description; 4.3 Statistical Analysis; 4.4 Results; 4.5 Discussion). Chapter 5. Conclusion. Appendix. Bibliography. Abstract in Korean.
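    The proximal gradient idea mentioned in the Methods can be sketched on a toy problem. The example below minimizes 0.5*||x - b||^2 + lam*||x||_1 with a plain (non-accelerated) proximal gradient loop, where the proximal step for the L1 penalty is soft-thresholding. The L1 penalty, the toy smooth term, the step size, and the function names are assumptions for illustration; the thesis's actual FMGM objective, its penalties, and its accelerated ("fast") variant are not reproduced here.

```python
def soft_threshold(v, tau):
    """Proximal operator of tau * |v|: shrink v toward zero by tau."""
    if v > tau:
        return v - tau
    if v < -tau:
        return v + tau
    return 0.0

def proximal_gradient(b, lam, step=0.5, n_iter=200):
    """Minimize 0.5 * ||x - b||^2 + lam * ||x||_1 by proximal gradient steps."""
    x = [0.0] * len(b)
    for _ in range(n_iter):
        # Gradient of the smooth part 0.5 * ||x - b||^2 is (x - b).
        grad = [xi - bi for xi, bi in zip(x, b)]
        # Gradient step on the smooth part, then the prox of the scaled penalty.
        x = [soft_threshold(xi - step * gi, step * lam) for xi, gi in zip(x, grad)]
    return x
```

    For this toy objective the minimizer is known in closed form (soft-thresholding b by lam), so small entries of b are set exactly to zero, which is the kind of sparsity such penalties are meant to induce.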

    Joint optimization of location and inventory decisions for improving supply chain cost performance

    This dissertation investigates the integration of inventory and facility location decisions in different supply chain settings. Facility location and inventory decisions are interdependent due to the economies of scale that are inherent in transportation and replenishment costs. Facility location decisions affect the transportation and replenishment costs which, in turn, affect the optimal inventory policy. Conversely, the inventory policy dictates the frequency of shipments to replenish inventory which, in turn, affects the number of deliveries and, hence, the transportation costs between the facilities. Therefore, our main research objectives are to:
    • compare the optimal facility location determined by minimizing total transportation costs to the one determined by models that also consider the timing and quantity of inventory replenishments and the corresponding costs,
    • investigate the effect of facility location decisions on optimal inventory decisions, and
    • measure the impact of integrated decision-making on overall supply chain cost performance.
    Placing special emphasis on the explicit modeling of transportation costs, we develop several novel mixed-integer linear and nonlinear programming models. Based on how the underlying facility location problem is modeled, these models fall into two main groups: 1) continuous facility location problems, and 2) discrete facility location problems. For the stylized models, the focus is on the development of analytical solutions. For the more general models, the focus is on the development of efficient algorithms.
    Our results demonstrate:
    • the impact of explicit transportation costs on integrated decisions,
    • the impact of different transportation cost functions on integrated decisions in the context of the continuous facility location problems of interest,
    • the value of integrated decision-making in different supply chain settings, and
    • the performance of solution methods that jointly optimize facility location and inventory decisions.

    "Rotterdam econometrics": publications of the econometric institute 1956-2005

    This paper contains a list of all publications over the period 1956-2005, as reported in the Rotterdam Econometric Institute Reprint series during 1957-2005.

    CONTRIBUTIONS IN CLASSIFICATION: VISUAL PRUNING FOR DECISION TREES, P-SPLINE BASED CLUSTERING OF CORRELATED SERIES, BOOSTED-ORIENTED PROBABILISTIC CLUSTERING OF SERIES.

    This work consists of three papers written during my Ph.D. period. The thesis consists of five chapters. In Chapter 2 the basic building blocks of our work are introduced. In particular, we briefly recall the concepts of classification (supervised and unsupervised) and penalized splines.
    In Chapter 3 we present a paper whose idea was presented at the Cladag 2013 Symposium. Within the framework of recursive partitioning algorithms by tree-based methods, this paper contributes to both the visual representation of the data partition in a geometrical space and the selection of the decision tree. In our visual approach, both the best tree and the weakest links can be identified immediately by graphical analysis of the tree structure, without considering the pruning sequence. The error rates are very similar to those returned by the Classification And Regression Trees procedure, showing that this new way of selecting the best tree is a valid alternative to the well-known cost-complexity pruning.
    In Chapter 4 we present a paper on parsimonious clustering of correlated series. Clustering of time series has become an important topic, motivated by the increased interest in this type of data. Most of the time, these procedures do not facilitate the removal of noise from data, have difficulty handling time series of unequal length, and require a preprocessing step, i.e. modeling each series with an appropriate time series model. In this work we propose a new way of clustering data (time) series, which can be considered as belonging to both the model-based and the feature-based approach. Our method consists of modeling each series with a penalized spline (P-spline) smoother and performing clustering directly on the spline coefficients. Using the P-spline smoothers, the signal of each series is separated from the noise, capturing the different shapes of the series. The P-spline coefficients are close to the fitted curve and present the skeleton of the fit. Thus, summarizing each series by its coefficients reduces the dimensionality of the problem, significantly improving computation time without reducing the performance of the clustering procedure. To select the smoothing parameter we adopt a V-curve procedure. This criterion does not require the computation of the effective model dimension and is insensitive to serial correlation in the noise around the trend. Using the P-spline smoothers, moments of the original data are conserved. This implies that the mean and variance of the estimated series are equal to those of the raw series. This allows a similar approach to be used for series of different lengths. The performance is evaluated on a simulated data set, also considering series of different lengths. An application of our proposal to financial time series is also presented.
    In Chapter 5 we present a paper that proposes a fuzzy clustering algorithm that is independent of the choice of the fuzzifier. It draws on two approaches, theoretically motivated for unsupervised and supervised classification, respectively. The first is the Probabilistic Distance (PD) clustering procedure. The second is the well-known Boosting philosophy. From the PD approach we took the idea of determining the probabilities of each series belonging to any of the k clusters. As this probability is unequivocally related to the distance of each series from the cluster centers, there are no degrees of freedom in determining the membership matrix. From the Boosting approach we took the idea of weighting each series according to some measure of badness of fit in order to define an unsupervised learning process based on a weighted re-sampling procedure. Our idea is to adapt the boosting philosophy to unsupervised learning problems, especially to non-hierarchical cluster analysis. In such a case there is no target variable, but as the goal is to assign each instance (i.e. a series) of a data set to a cluster, we have a target instance. The representative instance of a given cluster (i.e. the center of a cluster) can be taken as a target instance, a loss function to be minimized can be taken as a synthetic index of the global performance, and the probability of each series belonging to a given cluster can be taken as the individual contribution of a given instance to the overall solution. In contrast to the boosting approach, the higher the probability that a given series is a member of a given cluster, the higher the weight of that instance in the re-sampling process. As a learner we use a P-spline smoother. To define the probabilities of each series belonging to a given cluster we use the PD clustering approach. This approach allows us to define a suitable loss function and, at the same time, to propose a fuzzy clustering procedure that does not depend on the definition of a fuzzifier parameter. The global performance of the proposed method is investigated in three experiments (one on simulated data and the other two on data sets known in the literature), evaluated using a fuzzy variant of the Rand Index.
    Chapter 6 concludes the thesis.
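    The statement that PD clustering leaves no free parameter in the membership matrix can be illustrated with a short sketch. It assumes the standard PD-clustering principle that, for each item, probability times distance is constant across clusters, so memberships are proportional to inverse distances. The function name, the use of plain Euclidean distance on raw vectors (rather than on P-spline coefficients), and the handling of zero distance are assumptions of this example, not the thesis's implementation.

```python
import math

def pd_memberships(item, centers):
    """Membership probabilities of `item` under the rule p_k * d_k = constant."""
    dists = [
        math.sqrt(sum((a - b) ** 2 for a, b in zip(item, center)))
        for center in centers
    ]
    # An item sitting exactly on a center belongs to that cluster with
    # probability 1 (split evenly if several centers coincide with it).
    zeros = [k for k, d in enumerate(dists) if d == 0.0]
    if zeros:
        return [1.0 / len(zeros) if k in zeros else 0.0 for k in range(len(dists))]
    inv = [1.0 / d for d in dists]
    total = sum(inv)
    return [v / total for v in inv]
```

    With distances 1 and 3 to two centers, the memberships are 0.75 and 0.25: they follow from the distances alone, with no fuzzifier exponent to tune.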

    Advances in multidimensional unfolding

    Multidimensional unfolding is an analysis technique that produces maps of two sets of objects, for example persons and products, based on the persons' preferences for those products. The distances between the persons and the products in the map should correspond as closely as possible to these preferences, such that a small distance corresponds to a strong preference and a large distance to a weak preference. Since its conception in the 1960s, however, unfolding has suffered from the so-called degeneracy problem, which yields solutions that are perfect in terms of the loss function (the distances represent the preferences perfectly) but completely useless in terms of interpretation (the perfect representation is meaningless). This thesis proposes two possible solutions to the degeneracy problem. The most general solution uses a penalty function that penalizes a solution when it threatens to degenerate. The algorithm has been used to implement PREFSCAL, the unfolding program of IBM SPSS STATISTICS. With the degeneracy problem under control, the way is clear to develop the unfolding model further: additional explanatory variables can be added for interpretation and for making predictions. The extent to which data may be missing without decisively influencing the final solution, the map, has also been investigated extensively. LEI Universiteit Leiden. Multivariate analysis of psychological data - ou