The Design of Pre-Processing Multidimensional Data Based on Component Analysis
The growing number of databases holding multidimensional data, together with techniques that support efficient query processing, creates opportunities for more extensive research. Pre-processing is required because of missing attribute values, noisy data, errors, inconsistencies or outliers, and differences in coding. Several types of pre-processing based on component analysis are carried out for data cleaning, data integration and transformation, and dimensionality reduction. Component analysis can be performed with statistical methods whose aim is to separate the various sources in the data into statistically independent patterns. This paper aims to improve the quality of pre-processed data using component analysis. RapidMiner is used for data pre-processing with the FastICA algorithm, Kernel K-means is used to cluster the pre-processed data, and Expectation Maximization (EM) is used for modelling. The model was tested on the Wisconsin breast cancer, lung cancer, and prostate cancer datasets. The results show that the cluster vector value is higher and the processing time is shorter.
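The abstract names FastICA (run in RapidMiner) as the component-analysis step but gives no details. Below is a minimal pure-Python sketch of the textbook one-unit FastICA fixed-point iteration, restricted to two mixed signals, with closed-form whitening and a tanh nonlinearity. The function names, the two-signal restriction, and the toy sine/square sources are assumptions for illustration; RapidMiner's operator may differ in defaults and implementation.

```python
import math
import random


def _center(xs):
    m = sum(xs) / len(xs)
    return [v - m for v in xs]


def whiten_2d(x1, x2):
    """Center two mixed signals and whiten them (unit variance, zero covariance)."""
    a, b = _center(x1), _center(x2)
    n = len(a)
    c11 = sum(u * u for u in a) / n
    c22 = sum(v * v for v in b) / n
    c12 = sum(u * v for u, v in zip(a, b)) / n
    # Closed-form eigendecomposition of the symmetric 2x2 covariance matrix.
    theta = 0.5 * math.atan2(2 * c12, c11 - c22)
    ct, st = math.cos(theta), math.sin(theta)
    l1 = c11 * ct * ct + 2 * c12 * st * ct + c22 * st * st
    l2 = c11 * st * st - 2 * c12 * st * ct + c22 * ct * ct
    z1 = [(ct * u + st * v) / math.sqrt(l1) for u, v in zip(a, b)]
    z2 = [(-st * u + ct * v) / math.sqrt(l2) for u, v in zip(a, b)]
    return z1, z2


def fastica_2d(x1, x2, max_iter=200, tol=1e-10, seed=0):
    """Recover two independent components with one-unit FastICA (tanh nonlinearity)."""
    z1, z2 = whiten_2d(x1, x2)
    n = len(z1)
    phi = random.Random(seed).uniform(0.0, math.pi)
    w = (math.cos(phi), math.sin(phi))
    for _ in range(max_iter):
        g = [math.tanh(w[0] * a + w[1] * b) for a, b in zip(z1, z2)]
        mean_g_prime = sum(1.0 - t * t for t in g) / n
        # Fixed-point update: w <- E[z g(w^T z)] - E[g'(w^T z)] w, then normalize.
        w_new = (
            sum(t * a for t, a in zip(g, z1)) / n - mean_g_prime * w[0],
            sum(t * b for t, b in zip(g, z2)) / n - mean_g_prime * w[1],
        )
        norm = math.hypot(*w_new)
        w_new = (w_new[0] / norm, w_new[1] / norm)
        # Directions are defined only up to sign, so compare |w_new . w| with 1.
        done = abs(abs(w_new[0] * w[0] + w_new[1] * w[1]) - 1.0) < tol
        w = w_new
        if done:
            break
    # In the whitened 2-D space the second component is the orthogonal direction.
    w2 = (-w[1], w[0])
    s1 = [w[0] * a + w[1] * b for a, b in zip(z1, z2)]
    s2 = [w2[0] * a + w2[1] * b for a, b in zip(z1, z2)]
    return s1, s2


def pearson(u, v):
    """Pearson correlation, used to compare recovered components with known sources."""
    a, b = _center(u), _center(v)
    return sum(x * y for x, y in zip(a, b)) / math.sqrt(
        sum(x * x for x in a) * sum(y * y for y in b)
    )


# Toy example: mix a sine wave and a square wave, then try to unmix them.
t = range(2000)
src_a = [math.sin(2 * math.pi * k / 50) for k in t]
src_b = [1.0 if math.sin(2 * math.pi * k / 37) >= 0 else -1.0 for k in t]
mix_1 = [a + 0.6 * b for a, b in zip(src_a, src_b)]
mix_2 = [0.4 * a + b for a, b in zip(src_a, src_b)]
rec_1, rec_2 = fastica_2d(mix_1, mix_2)
```

As usual for ICA, the recovered components match the sources only up to order, sign, and scale, so the check compares absolute correlations.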
A variational approach to linear control structure problems
Imperial Users only
Models of consumer shopping behaviour in urban areas
Available from the British Library Lending Division (LD:D65637/86) / BLDSC - British Library Document Supply Centre, United Kingdom
New approaches in statistical network data analysis
This cumulative dissertation is dedicated to the statistical analysis of network data. Combining network science with statistical methodology has become very popular in recent years. An important reason for this development is that statistical network data analysis provides a means to model and quantify the interdependencies of complex systems.
A network can be understood as a structure consisting of nodes and edges, where the nodes represent general entities that are related via the edges. Depending on the research question at hand, the interest lies either in the dependence structure among the nodes or in the distribution of the edges given the nodes. This thesis consists of six contributed manuscripts concerned with the latter. Based on statistical models, edges in various dynamic and weighted networks are investigated or reconstructed.
To place the contributing articles in a general context, the thesis begins with an introductory chapter that explains central concepts and models of statistical network data analysis. Besides giving an overview of the available methodology, it discusses the advantages and drawbacks of the models, along with potential extensions and modifications.
In terms of content, the articles can be divided into two projects. The first project focuses on the statistical analysis of international arms trade networks. Two articles are devoted to the global exchange of major conventional weapons, with a focus on the dynamic structure of the system and the volume traded. A third article explores latent patterns in the international trade in small arms and ammunition. In addition, the arms trade data are used in a survey paper on dynamic network models. The second project, comprising two articles, concerns the reconstruction of financial networks from their marginals.
All contributing articles are attached in the form in which they were published as preprints. For articles published in scientific journals, the respective sources are given. Additionally, the contributions of all authors are stated. All computations were done with the statistical software R, and the corresponding code is available on GitHub.
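The abstract does not say which method the two financial-network articles use to reconstruct a network from its marginals. Purely as a point of reference, one classic baseline for this problem is iterative proportional fitting, also called the RAS algorithm: start from a seed matrix and alternately rescale rows and columns until the row sums equal the out-strengths and the column sums equal the in-strengths. The sketch below assumes a uniform seed with a zero diagonal (no self-links); the function name and the toy marginals are hypothetical.

```python
def ras_reconstruct(out_strength, in_strength, max_iter=1000, tol=1e-10):
    """Scale a zero-diagonal seed matrix until its row/column sums match the marginals."""
    n = len(out_strength)
    if abs(sum(out_strength) - sum(in_strength)) > 1e-9:
        raise ValueError("total out-strength must equal total in-strength")
    # Uniform seed with zero diagonal: no node has an edge to itself.
    w = [[0.0 if i == j else 1.0 for j in range(n)] for i in range(n)]
    for _ in range(max_iter):
        for i in range(n):  # rescale rows to match out-strengths
            s = sum(w[i])
            if s > 0:
                w[i] = [v * out_strength[i] / s for v in w[i]]
        for j in range(n):  # rescale columns to match in-strengths
            s = sum(w[i][j] for i in range(n))
            if s > 0:
                for i in range(n):
                    w[i][j] *= in_strength[j] / s
        # Columns are exact after the column step; stop once rows also match.
        if max(abs(sum(w[i]) - out_strength[i]) for i in range(n)) < tol:
            break
    return w


# Toy marginals for three nodes (total weight 6 on both sides).
exposures = ras_reconstruct([3.0, 2.0, 1.0], [2.0, 2.0, 2.0])
```

Zero entries in the seed stay zero under rescaling, which is why the diagonal remains empty; the result is the reconstruction closest to the seed in the relative-entropy sense among matrices with the given marginals.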
The Graphical Representation of Structured Multivariate Data
During the past two decades or so, graphical representations have been used increasingly for the examination, summarisation and communication of statistical data. Many graphical techniques exist for exploratory data analysis (i.e., for deciding which model is appropriate to fit to the data), and a number of graphical diagnostic techniques exist for checking the appropriateness of a fitted model. However, very few techniques exist for representing the fitted model itself. This thesis is concerned with the development of some new and existing graphical representation techniques for the communication and interpretation of fitted statistical models.
The first part of this thesis gives a general overview of the use of graphical representations in statistics for exploratory data analysis and diagnostic model checking. Given the focus of this thesis, particular consideration is given to the few graphical techniques that already exist for representing fitted models. A number of novel two-dimensional approaches are then proposed that go partway towards a graphical representation of the main effects and interaction terms of fitted models. This leads to a description of conditional independence graphs and a consideration of their suitability as a technique for representing fitted models. Conditional independence graphs are then developed further in line with the research aims.
Since it becomes apparent that none of the approaches taken can yield a simple two-dimensional pen-and-paper technique for the unambiguous graphical representation of all fitted statistical models, an interactive computer package based on the conditional independence graph approach is developed for the construction, communication and interpretation of graphical representations of fitted statistical models. This package, called the "Conditional Independence Graph Enhancer" (CIGE), does provide unambiguous graphical representations for all fitted statistical models considered.
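A conditional independence graph draws an edge between two variables unless they are conditionally independent given all the others. In the special case of a multivariate Gaussian model (an assumption here; the thesis is not limited to Gaussian models), a missing edge corresponds to a zero entry in the inverse covariance (precision) matrix, equivalently a zero partial correlation. The sketch below reads the edge set off a covariance matrix; the function names and the numerical threshold are illustrative, and this is not the CIGE package.

```python
import math


def invert(matrix):
    """Invert a square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(matrix)
    aug = [[float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def partial_correlations(cov):
    """Partial correlation of each pair given all other variables, from the precision matrix."""
    k = invert(cov)
    n = len(k)
    return [[1.0 if i == j else -k[i][j] / math.sqrt(k[i][i] * k[j][j]) for j in range(n)]
            for i in range(n)]


def ci_graph_edges(cov, threshold=1e-6):
    """Edges (i, j) whose partial correlation is non-negligible (Gaussian case)."""
    rho = partial_correlations(cov)
    n = len(rho)
    return [(i, j) for i in range(n) for j in range(i + 1, n) if abs(rho[i][j]) > threshold]


# Toy chain X0 - X1 - X2: the precision matrix is tridiagonal [[2,-1,0],[-1,2,-1],[0,-1,2]].
chain_cov = [[0.75, 0.5, 0.25], [0.5, 1.0, 0.5], [0.25, 0.5, 0.75]]
edges = ci_graph_edges(chain_cov)
```

In this toy chain, X0 and X2 are marginally correlated (covariance 0.25) yet conditionally independent given X1, so the graph has no (0, 2) edge. With estimated covariances, the threshold would in practice be replaced by a statistical test or model selection step.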
Constraining the metallicities, ages, star formation histories, and ionizing continua of extragalactic massive star populations
We infer the properties of massive star populations using the far-ultraviolet stellar continua of 61 star-forming galaxies: 42 at low redshift observed with HST and 19 at z ~ 2 from the Megasaura sample. We fit each stellar continuum with a linear combination of up to 50 single-age, single-metallicity Starburst99 models. From these fits, we derive light-weighted ages and metallicities, which agree with stellar wind and photospheric spectral features, and infer the spectral shapes and strengths of the ionizing continua. The inferred light-weighted stellar metallicities span 0.05-1.5 Z☉ and are similar to the measured nebular metallicities. We quantify the ionizing continua using the ratio of the ionizing flux at 900 Å to the non-ionizing flux at 1500 Å, and we demonstrate the evolution of this ratio with stellar age and metallicity using theoretical single-burst models. These single-burst models match the inferred ionizing continua of only half of the sample, while the other half are described by a mixture of stellar ages. Mixed-age populations produce stronger and harder ionizing spectra than continuous star formation histories but, contrary to previous studies that assume constant star formation, have similar stellar and nebular metallicities. Stellar population age and metallicity affect the far-UV continua in different and distinguishable ways; assuming a constant star formation history diminishes the diagnostic power. Finally, we provide simple prescriptions to determine the ionizing photon production efficiency (ξ_ion) from the stellar population properties. ξ_ion (in Hz erg⁻¹) spans a range in log(ξ_ion) that depends on stellar age, metallicity, star formation history, and contributions from binary star evolution. These stellar population properties must be determined observationally in order to determine the number of ionizing photons generated by massive stars.
Comment: 31 pages, 23 figures, resubmitted to ApJ after incorporating the referee's comments. Comments encouraged.
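The ionizing-continuum diagnostic in this abstract is a ratio of two flux densities, at 900 Å and at 1500 Å. As an illustration only, here is a sketch that evaluates that ratio from a tabulated spectrum by linear interpolation. The toy wavelength grid and fluxes are invented, the function names are hypothetical, and a real measurement would more likely average the flux over a narrow window around each wavelength; the ratio is only meaningful if both fluxes use the same flux-density convention.

```python
from bisect import bisect_left


def flux_at(wavelengths, fluxes, target):
    """Linearly interpolate a tabulated spectrum (wavelengths ascending) at `target`."""
    if not wavelengths[0] <= target <= wavelengths[-1]:
        raise ValueError("target wavelength lies outside the tabulated range")
    k = bisect_left(wavelengths, target)
    if wavelengths[k] == target:
        return fluxes[k]
    x0, x1 = wavelengths[k - 1], wavelengths[k]
    f0, f1 = fluxes[k - 1], fluxes[k]
    return f0 + (f1 - f0) * (target - x0) / (x1 - x0)


def ionizing_ratio(wavelengths, fluxes, ionizing=900.0, non_ionizing=1500.0):
    """Ratio of the flux at the ionizing wavelength to the flux at the non-ionizing one."""
    return flux_at(wavelengths, fluxes, ionizing) / flux_at(wavelengths, fluxes, non_ionizing)


# Toy spectrum in Angstrom with arbitrary flux units.
toy_wl = [800.0, 1000.0, 1400.0, 1600.0]
toy_flux = [0.2, 0.4, 1.0, 1.2]
ratio = ionizing_ratio(toy_wl, toy_flux)
```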
High-Frequency Principal Components and Evolution of Liquidity in a Limit Order Market
The paper applies a popular competing-risks methodology to the analysis of the timing of, and interaction between, Deutsche Mark/U.S. dollar transactions, quotes, and cancellations in the Reuters D2000-2 electronic brokerage system. Consistent with previous stock market studies, the bid-ask spread and market depth at the best bid and ask quotes are found to be major determinants of limit order market dynamics at ultra-high frequencies. Consistent with the microstructure approach to exchange rate determination, signed transaction activity appears to be the main factor behind limit order market dynamics at lower frequencies. Applying principal component analysis to the covariate indices of competing risks identifies five pervasive factors that capture 85% of the Reuters D2000-2 limit order book activity. Relative to popular moving-average-type forecasting rules, the multifactor competing risks model substantially improves the quality of short-term probability forecasts for buyer- and seller-initiated transactions.
Keywords: foreign exchange, limit order, market order, order flow, liquidity, competing risks, principal component, probability forecast
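A statement such as "five factors capture 85% of the activity" refers to the cumulative share of variance explained by the leading principal components, which is the cumulative sum of the sorted eigenvalues of the covariance (or correlation) matrix divided by their total. A minimal pure-Python sketch, using cyclic Jacobi rotations to obtain the eigenvalues; the function names and toy matrices are assumptions, and the paper's covariate indices are not reproduced here.

```python
import math


def jacobi_eigenvalues(matrix, tol=1e-12, max_sweeps=50):
    """Eigenvalues of a symmetric matrix via cyclic Jacobi rotations."""
    a = [[float(v) for v in row] for row in matrix]
    n = len(a)
    for _ in range(max_sweeps):
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                # Rotation angle chosen so that the (p, q) entry becomes zero.
                theta = 0.5 * math.atan2(2 * a[p][q], a[q][q] - a[p][p])
                c, s = math.cos(theta), math.sin(theta)
                for k in range(n):  # A <- A J (columns p and q)
                    akp, akq = a[k][p], a[k][q]
                    a[k][p] = c * akp - s * akq
                    a[k][q] = s * akp + c * akq
                for k in range(n):  # A <- J^T A (rows p and q)
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k] = c * apk - s * aqk
                    a[q][k] = s * apk + c * aqk
    return [a[i][i] for i in range(n)]


def explained_variance_ratios(cov):
    """Share of total variance carried by each principal component, largest first."""
    eig = sorted(jacobi_eigenvalues(cov), reverse=True)
    total = sum(eig)
    return [v / total for v in eig]


def components_needed(cov, share):
    """Smallest number of leading components whose cumulative share reaches `share`."""
    cumulative = 0.0
    for k, ratio in enumerate(explained_variance_ratios(cov), start=1):
        cumulative += ratio
        if cumulative >= share:
            return k
    return len(cov)
```

For example, the covariance matrix [[2, 1], [1, 2]] has eigenvalues 3 and 1, so the first component explains 75% of the variance and both are needed to reach 85%.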
Twenty years of P-splines
P-splines first appeared in the limelight twenty years ago. Since then they have become popular in applications and in theoretical work. The combination of a rich B-spline basis and a simple difference penalty lends itself well to a variety of generalizations, because it is based on regression. In effect, P-splines allow the building of a "backbone" for the "mixing and matching" of a variety of additive smooth structure components, while inviting all sorts of extensions: varying-coefficient effects, signal (functional) regressors, two-dimensional surfaces, non-normal responses, quantile (expectile) modelling, among others. Strong connections with mixed models and Bayesian analysis have been established. We give an overview of many of the central developments during the first two decades of P-splines.
Peer Reviewed
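The abstract summarizes P-splines as a rich B-spline basis combined with a simple difference penalty on the coefficients, i.e. minimizing |y - Ba|² + λ|D_d a|². Below is a minimal pure-Python sketch of that basic recipe, assuming cubic B-splines on equally spaced knots and a second-order penalty; the function names and the defaults for `nseg` and `lam` are illustrative choices, not code from the review.

```python
import math


def bspline_basis(x, knots, j, degree):
    """Value of the j-th B-spline of the given degree at x (Cox-de Boor recursion)."""
    if degree == 0:
        return 1.0 if knots[j] <= x < knots[j + 1] else 0.0
    left_den = knots[j + degree] - knots[j]
    right_den = knots[j + degree + 1] - knots[j + 1]
    left = right = 0.0
    if left_den > 0:
        left = (x - knots[j]) / left_den * bspline_basis(x, knots, j, degree - 1)
    if right_den > 0:
        right = (knots[j + degree + 1] - x) / right_den * bspline_basis(x, knots, j + 1, degree - 1)
    return left + right


def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (m[i][n] - sum(m[i][c] * x[c] for c in range(i + 1, n))) / m[i][i]
    return x


def pspline_fit(x, y, nseg=10, degree=3, order=2, lam=1.0):
    """Penalized B-spline fit: minimize |y - B a|^2 + lam * |D_order a|^2."""
    xl, xr = min(x), max(x)
    # Equally spaced knots, extended by `degree` segments beyond each end of the data.
    knots = [xl + (xr - xl) * (j - degree) / nseg for j in range(nseg + 2 * degree + 1)]
    k = nseg + degree  # number of B-spline basis functions
    basis = [[bspline_basis(xi, knots, j, degree) for j in range(k)] for xi in x]
    # Each row of D holds the signed binomial coefficients of an order-th difference.
    stencil = [(-1) ** (order - m) * math.comb(order, m) for m in range(order + 1)]
    diff = [[0.0] * k for _ in range(k - order)]
    for i in range(k - order):
        for m, c in enumerate(stencil):
            diff[i][i + m] = float(c)
    # Normal equations: (B^T B + lam D^T D) a = B^T y.
    lhs = [[sum(row[i] * row[j] for row in basis) + lam * sum(d[i] * d[j] for d in diff)
            for j in range(k)] for i in range(k)]
    rhs = [sum(row[i] * yi for row, yi in zip(basis, y)) for i in range(k)]
    alpha = solve_linear(lhs, rhs)
    fitted = [sum(b * a for b, a in zip(row, alpha)) for row in basis]
    return alpha, fitted


def roughness(alpha, order=2):
    """Sum of squared order-th differences of the coefficients (the penalty term)."""
    d = list(alpha)
    for _ in range(order):
        d = [b - a for a, b in zip(d, d[1:])]
    return sum(v * v for v in d)
```

With a second-order penalty, a straight line incurs no penalty at all (its B-spline coefficients are linear in the index), so linear data are reproduced exactly for any λ, and increasing λ pulls a wiggly fit towards a straight line.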