4,535 research outputs found

    The Design of Pre-Processing Multidimensional Data Based on Component Analysis

    The growing use of new databases for multidimensional data, together with techniques that support efficient query processing, creates opportunities for more extensive research. Pre-processing is required because of missing attribute values, noisy data, errors, inconsistencies, outliers, and differences in coding. Several types of pre-processing based on component analysis are carried out for data cleaning, data integration and transformation, and dimensionality reduction. Component analysis can be performed with statistical methods that aim to separate the various data sources into statistically independent patterns. This paper aims to improve the quality of pre-processed data using component analysis. RapidMiner is used for data pre-processing with the FastICA algorithm. Kernel K-means is used to cluster the pre-processed data, and Expectation Maximization (EM) is used for modelling. The model was tested on the Wisconsin breast cancer, lung cancer, and prostate cancer datasets. The results show a higher cluster vector performance value and a shorter processing time.
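    Kernel k-means, the clustering step named above, replaces Euclidean distances to centroids with distances computed purely from kernel values. A minimal plain-Python sketch follows, assuming a Gaussian (RBF) kernel with a hypothetical bandwidth parameter `gamma` and a caller-supplied initial labelling; it illustrates the technique only, is not the RapidMiner operator used in the paper, and omits the FastICA and EM steps.

```python
import math
from typing import List, Sequence


def rbf_kernel(a: Sequence[float], b: Sequence[float], gamma: float) -> float:
    """Gaussian (RBF) kernel k(a, b) = exp(-gamma * ||a - b||^2)."""
    return math.exp(-gamma * sum((x - y) ** 2 for x, y in zip(a, b)))


def kernel_kmeans(points: List[Sequence[float]], labels: List[int], k: int,
                  gamma: float = 1.0, max_iter: int = 100) -> List[int]:
    """Kernel k-means (Lloyd-style) starting from the given initial labels.

    The squared feature-space distance from point i to the mean of cluster C is
        K[i][i] - 2/|C| * sum_{j in C} K[i][j] + 1/|C|^2 * sum_{j,q in C} K[j][q],
    so only kernel values are needed, never the feature map itself.
    """
    n = len(points)
    K = [[rbf_kernel(points[i], points[j], gamma) for j in range(n)] for i in range(n)]
    labels = list(labels)
    for _ in range(max_iter):
        members = [[i for i in range(n) if labels[i] == c] for c in range(k)]
        # The within-cluster term does not depend on i, so compute it once per cluster.
        within = [sum(K[j][q] for j in m for q in m) / len(m) ** 2 if m else 0.0
                  for m in members]
        new_labels = []
        for i in range(n):
            best_c, best_d = labels[i], math.inf
            for c, m in enumerate(members):
                if not m:
                    continue  # empty clusters cannot attract points
                d = K[i][i] - 2.0 * sum(K[i][j] for j in m) / len(m) + within[c]
                if d < best_d:
                    best_c, best_d = c, d
            new_labels.append(best_c)
        if new_labels == labels:
            break  # converged: no point changed cluster
        labels = new_labels
    return labels


if __name__ == "__main__":
    data = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (5.0, 5.0), (5.1, 4.9), (4.9, 5.2)]
    print(kernel_kmeans(data, [0, 1, 0, 1, 0, 1], k=2, gamma=0.5))  # [0, 0, 0, 1, 1, 1]
```

    The initial labels deliberately mix the two groups; one reassignment pass separates them. In practice the result depends on `gamma` and on the initialisation, so several restarts are usual.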

    A variational approach to linear control structure problems

    Imperial Users only

    Models of consumer shopping behaviour in urban areas

    SIGLE. Available from British Library Lending Division - LD:D65637/86 / BLDSC - British Library Document Supply Centre. GB, United Kingdom

    New approaches in statistical network data analysis

    This cumulative dissertation is dedicated to the statistical analysis of network data. Combining network science with statistical methodology has become very popular in recent years, largely because statistical network data analysis provides a means to model and quantify the interdependencies of complex systems. A network can be understood as a structure consisting of nodes and edges: the nodes represent general entities, and the edges relate them to one another. Depending on the research question, the interest lies either in the dependence structure among the nodes or in the distribution of the edges given the nodes. This thesis consists of six contributed manuscripts concerned with the latter. Based on statistical models, edges in various dynamic and weighted networks are investigated or reconstructed. To place the contributing articles in a general context, the thesis starts with an introductory chapter that explains central concepts and models of statistical network data analysis. Besides giving an overview of the available methodology, it discusses the advantages and drawbacks of the models, along with potential extensions and modifications. In terms of content, the articles can be divided into two projects. The first project focuses on the statistical analysis of international arms trade networks. Two articles are devoted to the global exchange of major conventional weapons, focusing on the dynamic structure of the system and the volume traded. A third article explores latent patterns in the international trade of small arms and ammunition. Additionally, the arms trade data are used in a survey paper on dynamic network models. The second project, comprising two articles, concerns the reconstruction of financial networks from their marginals. All contributing articles are attached in the form in which they were published as preprints; for articles published in scientific journals, the respective sources are given. The contributions of all authors are also stated. All computations were done with the statistical software R, and the corresponding code is available on GitHub.

    This cumulative dissertation deals with the statistical analysis of network data. The general approach of conceptualizing interdependent systems as networks and then analyzing them with statistical methodology has gained considerably in relevance in recent years. In particular, the flexibility of the methodology, together with the possibility of modeling complex dependence structures, has contributed to its popularity. A network is a system composed of nodes and edges, where the nodes are general entities connected to one another by the edges. Depending on the research question, the interest lies either in the dependencies between the nodes or in the distribution of the edges given the nodes. With a total of six articles, this thesis takes up the second approach. With the help of statistical models, the edges in various binary and weighted networks are analyzed or reconstructed. To give the thesis a general context, the attached articles are preceded by an introductory part that addresses central concepts and models of statistical network analysis, discusses both the advantages and the disadvantages of the models, and describes potential extensions and modifications. The articles in this dissertation can be roughly divided into two projects. One project focuses on the statistical modeling of the international arms trade. Two articles examine the global exchange of major conventional weapons, analyzing both the dynamic structure and the volume of weapons traded. A further article is devoted to latent structures in the international trade of small arms and ammunition. In addition, the arms trade data are used in a survey article on dynamic network models. The second project, spread over two articles, deals with the reconstruction of financial networks based on the marginal sums of network matrices. All articles attached to this dissertation are in the form in which they were published as preprints. For publications in scientific journals, the respective source is given. In addition, the contribution of each author is stated before each article. All analyses were carried out with the statistical software R. The corresponding code is available on GitHub.
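    Reconstructing a weighted network from its marginals means finding a non-negative matrix whose row sums (for an interbank network, e.g., each bank's total lending) and column sums (total borrowing) match observed totals. One classical baseline, shown here only as an illustration and not necessarily the approach taken in the dissertation's articles, is iterative proportional fitting (IPF, also known as the RAS algorithm), which alternately rescales rows and columns of a seed matrix until both marginals match. The seed, tolerance, and example totals are assumptions; the zero diagonal encodes the common convention of excluding self-loops.

```python
from typing import List


def ipf_reconstruct(row_sums: List[float], col_sums: List[float],
                    seed: List[List[float]], tol: float = 1e-10,
                    max_iter: int = 10_000) -> List[List[float]]:
    """Scale `seed` so that its row and column sums match the given marginals.

    Zero entries in the seed stay zero, so structural zeros such as
    self-loops can be excluded up front.
    """
    if abs(sum(row_sums) - sum(col_sums)) > tol * max(1.0, sum(row_sums)):
        raise ValueError("row and column marginals must have the same total")
    w = [row[:] for row in seed]
    n, m = len(w), len(w[0])
    for _ in range(max_iter):
        # Row step: rescale each row to hit its target row sum.
        for i in range(n):
            s = sum(w[i])
            if s > 0:
                w[i] = [x * row_sums[i] / s for x in w[i]]
        # Column step: rescale each column to hit its target column sum.
        for j in range(m):
            s = sum(w[i][j] for i in range(n))
            if s > 0:
                for i in range(n):
                    w[i][j] *= col_sums[j] / s
        # Columns now match exactly; stop once the rows match as well.
        if max(abs(sum(w[i]) - row_sums[i]) for i in range(n)) < tol:
            break
    return w


if __name__ == "__main__":
    out_totals = [10.0, 20.0, 30.0]  # hypothetical total lending per bank
    in_totals = [25.0, 15.0, 20.0]   # hypothetical total borrowing per bank
    seed = [[0.0, 1.0, 1.0], [1.0, 0.0, 1.0], [1.0, 1.0, 0.0]]  # no self-loops
    for row in ipf_reconstruct(out_totals, in_totals, seed):
        print([round(x, 3) for x in row])
```

    Among all matrices with the seed's zero pattern that match the marginals, IPF returns the one closest to the seed in relative entropy; it gives a single point estimate and says nothing about the uncertainty of individual edges.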

    Constraining the metallicities, ages, star formation histories, and ionizing continua of extragalactic massive star populations

    We infer the properties of massive star populations using the far-ultraviolet stellar continua of 61 star-forming galaxies: 42 at low redshift observed with HST and 19 at $z \sim 2$ from the Megasaura sample. We fit each stellar continuum with a linear combination of up to 50 single-age, single-metallicity Starburst99 models. From these fits, we derive light-weighted ages and metallicities, which agree with stellar wind and photospheric spectral features, and infer the spectral shapes and strengths of the ionizing continua. Inferred light-weighted stellar metallicities span $0.05$ to $1.5\,Z_\odot$ and are similar to the measured nebular metallicities. We quantify the ionizing continua using the ratio of the ionizing flux at 900 Å to the non-ionizing flux at 1500 Å and demonstrate the evolution of this ratio with stellar age and metallicity using theoretical single-burst models. These single-burst models match the inferred ionizing continua of only half of the sample, while the other half are described by a mixture of stellar ages. Mixed-age populations produce stronger and harder ionizing spectra than continuous star formation histories, but, contrary to previous studies that assume constant star formation, have similar stellar and nebular metallicities. Stellar population age and metallicity affect the far-UV continua in different and distinguishable ways; assuming a constant star formation history diminishes the diagnostic power. Finally, we provide simple prescriptions to determine the ionizing photon production efficiency ($\xi_{\rm ion}$) from the stellar population properties. $\log(\xi_{\rm ion})$ ranges from 24.4 to 25.7 (with $\xi_{\rm ion}$ in Hz erg$^{-1}$) and depends on stellar age, metallicity, star formation history, and contributions from binary star evolution.
These stellar population properties must be determined observationally in order to establish the number of ionizing photons generated by massive stars.
    Comment: 31 pages, 23 figures, resubmitted to ApJ after incorporating the referee's comments. Comments encouraged.
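    The light-weighted ages and metallicities come from the coefficients of the linear-combination fit: each single-age, single-metallicity template is weighted by its fitted contribution to the continuum. The sketch below assumes the fit has already produced non-negative weights normalised as light fractions (the fitting itself is omitted), and uses the usual definition of $\xi_{\rm ion}$ as the ionizing photon rate $Q({\rm H^0})$ divided by the UV luminosity density $L_\nu$ at 1500 Å, which gives units of Hz erg$^{-1}$. The template values in the example are made up for illustration.

```python
import math
from typing import Sequence, Tuple


def light_weighted(weights: Sequence[float], ages_myr: Sequence[float],
                   metallicities: Sequence[float]) -> Tuple[float, float]:
    """Light-weighted mean age and metallicity of a multi-template continuum fit.

    `weights` are the non-negative fit coefficients of the single-age,
    single-metallicity templates, assumed to be normalised so that each one is
    that template's share of the continuum at the normalisation wavelength.
    """
    total = sum(weights)
    if total <= 0 or any(w < 0 for w in weights):
        raise ValueError("weights must be non-negative with a positive sum")
    age = sum(w * a for w, a in zip(weights, ages_myr)) / total
    metallicity = sum(w * z for w, z in zip(weights, metallicities)) / total
    return age, metallicity


def log_xi_ion(q_h0: float, l_nu_1500: float) -> float:
    """log10(xi_ion) with xi_ion = Q(H0) / L_nu(1500 A), in Hz erg^-1.

    q_h0: hydrogen-ionizing photon production rate, photons s^-1.
    l_nu_1500: luminosity density at 1500 A, erg s^-1 Hz^-1.
    """
    return math.log10(q_h0 / l_nu_1500)


if __name__ == "__main__":
    # Two hypothetical templates: 75% of the light from a 2 Myr, 0.2 Z_sun burst
    # and 25% from a 10 Myr, 0.4 Z_sun burst.
    print(light_weighted([0.75, 0.25], [2.0, 10.0], [0.2, 0.4]))
    print(log_xi_ion(1e53, 1e28))  # close to 25.0
```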

    High-Frequency Principal Components and Evolution of Liquidity in a Limit Order Market

    The paper applies the popular competing-risks methodology to analyze the timing of, and interaction among, Deutsche Mark/U.S. dollar transactions, quotes, and cancellations in the Reuters D2000-2 electronic brokerage system. Consistent with previous stock market studies, the bid-ask spread and the market depth at the best bid and ask quotes are found to be major determinants of limit order market dynamics at ultra-high frequencies. Consistent with the microstructure approach to exchange rate determination, signed transaction activity appears to be the main factor behind limit order market dynamics at lower frequencies. Applying principal component analysis to the covariate indices of the competing risks identifies five pervasive factors that capture 85% of the Reuters D2000-2 limit order book activity. Relative to popular moving-average-type forecasting rules, the multifactor competing-risks model substantially improves the quality of short-term probability forecasts for buyer- and seller-initiated transactions.
    Keywords: foreign exchange, limit order, market order, order flow, liquidity, competing risks, principal component, probability forecast
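    A statement such as "five factors capture 85%" refers to cumulative explained variance in principal component analysis: the eigenvalues of the covariates' covariance (or correlation) matrix, taken in decreasing order, are accumulated until their share of the total variance reaches a threshold. The dependency-free sketch below assumes the covariates are already arranged as rows of observations and uses power iteration with deflation to obtain the leading eigenvalues; the 0.85 default mirrors the abstract, and the example data are hypothetical.

```python
import random
from typing import List, Sequence


def covariance(rows: Sequence[Sequence[float]]) -> List[List[float]]:
    """Sample covariance matrix of the columns of `rows` (observations x variables)."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    centred = [[r[j] - means[j] for j in range(p)] for r in rows]
    return [[sum(c[a] * c[b] for c in centred) / (n - 1) for b in range(p)]
            for a in range(p)]


def matvec(a: List[List[float]], v: List[float]) -> List[float]:
    return [sum(aij * vj for aij, vj in zip(row, v)) for row in a]


def top_eigenvalues(cov: List[List[float]], k: int, iters: int = 1000,
                    seed: int = 0) -> List[float]:
    """Leading k eigenvalues of a symmetric PSD matrix (power iteration + deflation)."""
    p = len(cov)
    a = [row[:] for row in cov]
    rng = random.Random(seed)
    eigvals = []
    for _ in range(k):
        v = [rng.uniform(-1.0, 1.0) for _ in range(p)]
        for _ in range(iters):
            w = matvec(a, v)
            norm = sum(x * x for x in w) ** 0.5
            if norm == 0.0:
                break  # what is left of the matrix is zero
            v = [x / norm for x in w]
        # Rayleigh quotient v^T A v (v has unit length after the first step).
        lam = sum(vi * wi for vi, wi in zip(v, matvec(a, v)))
        eigvals.append(lam)
        # Deflation: subtract lam * v v^T so the next pass finds the next eigenvalue.
        for i in range(p):
            for j in range(p):
                a[i][j] -= lam * v[i] * v[j]
    return eigvals


def n_components_for(rows: Sequence[Sequence[float]], threshold: float = 0.85) -> int:
    """Smallest number of principal components whose variance share reaches `threshold`."""
    cov = covariance(rows)
    total = sum(cov[i][i] for i in range(len(cov)))  # trace = sum of all eigenvalues
    if total <= 0:
        raise ValueError("data have zero total variance")
    cumulative = 0.0
    for k, lam in enumerate(top_eigenvalues(cov, len(cov)), start=1):
        cumulative += lam
        if cumulative / total >= threshold:
            return k
    return len(cov)


if __name__ == "__main__":
    # Hypothetical covariates: two perfectly collinear columns plus a tiny noise column.
    rows = [(t, 2 * t, 0.01 * (-1) ** t) for t in range(10)]
    print(n_components_for(rows))  # 1: one factor carries almost all the variance
```

    Whether to use the covariance or the correlation matrix is a modelling choice; when covariates are on very different scales, standardising the columns first (i.e. working with correlations) is common.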

    Twenty years of P-splines

    P-splines first appeared in the limelight twenty years ago. Since then they have become popular in applications and in theoretical work. The combination of a rich B-spline basis and a simple difference penalty lends itself well to a variety of generalizations, because it is based on regression. In effect, P-splines allow the building of a “backbone” for the “mixing and matching” of a variety of additive smooth structure components, while inviting all sorts of extensions: varying-coefficient effects, signal (functional) regressors, two-dimensional surfaces, non-normal responses, and quantile (expectile) modelling, among others. Strong connections with mixed models and Bayesian analysis have been established. We give an overview of many of the central developments during the first two decades of P-splines.
    Peer Reviewed
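    The "rich B-spline basis plus simple difference penalty" recipe amounts to penalized least squares: with basis matrix $B$, difference matrix $D$ of order $d$, and smoothing parameter $\lambda$, the coefficients solve $(B^\top B + \lambda D^\top D)\,\alpha = B^\top y$. The dependency-free sketch below assumes equally spaced knots, cubic B-splines, a second-order penalty, and a user-chosen $\lambda$; it omits the choice of $\lambda$ (e.g. by cross-validation or mixed-model estimation) and is not code from the paper.

```python
from typing import List, Sequence


def bspline_row(x: float, xl: float, xr: float, nseg: int, deg: int = 3) -> List[float]:
    """Values at x of all equally spaced B-splines of degree `deg` (Cox-de Boor recursion)."""
    dx = (xr - xl) / nseg
    # Knots extend `deg` segments beyond each end of [xl, xr].
    knots = [xl + (i - deg) * dx for i in range(nseg + 2 * deg + 1)]
    b = [1.0 if knots[j] <= x < knots[j + 1] else 0.0 for j in range(len(knots) - 1)]
    for d in range(1, deg + 1):
        b = [(x - knots[j]) / (knots[j + d] - knots[j]) * b[j]
             + (knots[j + d + 1] - x) / (knots[j + d + 1] - knots[j + 1]) * b[j + 1]
             for j in range(len(b) - 1)]
    return b  # nseg + deg basis functions


def solve(a: List[List[float]], rhs: List[float]) -> List[float]:
    """Solve a x = rhs by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [r] for row, r in zip(a, rhs)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    sol = [0.0] * n
    for r in range(n - 1, -1, -1):
        sol[r] = (m[r][n] - sum(m[r][c] * sol[c] for c in range(r + 1, n))) / m[r][r]
    return sol


def pspline_fit(x: Sequence[float], y: Sequence[float], lam: float,
                nseg: int = 10, deg: int = 3, pord: int = 2) -> List[float]:
    """Fitted values of a P-spline: solve (B'B + lam D'D) alpha = B'y."""
    xl, xr = min(x), max(x)
    B = [bspline_row(xi, xl, xr, nseg, deg) for xi in x]
    n = len(B[0])
    # Difference matrix of order `pord`, built by repeatedly differencing the identity.
    D = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(pord):
        D = [[D[i + 1][j] - D[i][j] for j in range(n)] for i in range(len(D) - 1)]
    lhs = [[sum(row[i] * row[j] for row in B) + lam * sum(d[i] * d[j] for d in D)
            for j in range(n)] for i in range(n)]
    rhs = [sum(row[i] * yi for row, yi in zip(B, y)) for i in range(n)]
    alpha = solve(lhs, rhs)
    return [sum(bi * ai for bi, ai in zip(row, alpha)) for row in B]
```

    With a second-order penalty, coefficients that lie on a straight line are not penalized, so a large `lam` pulls the fit towards a linear fit, while `lam = 0` gives ordinary B-spline regression.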
