
4,535 research outputs found

The Design of Pre-Processing Multidimensional Data Based on Component Analysis

Author: Jasni Mohamad Zain
Rahmat Widia Sembiring
Publication venue: 'Canadian Center of Science and Education'
Publication date: 01/01/2011
Field of study

The growing number of new databases for multidimensional data, together with techniques that support efficient query processing, creates opportunities for more extensive research. Pre-processing is required because of missing attribute values, noisy data, errors, inconsistencies, outliers, and differences in coding. Several types of pre-processing based on component analysis are carried out for data cleaning, integration, and transformation, as well as for dimensionality reduction. Component analysis can be performed with statistical methods, with the aim of separating the various data sources into statistically independent patterns. This paper aims to improve the quality of pre-processed data based on component analysis. RapidMiner is used for data pre-processing with the FastICA algorithm. Kernel K-means is used to cluster the pre-processed data, and Expectation Maximization (EM) is used to build the model. The model was tested on the Wisconsin breast cancer, lung cancer, and prostate cancer datasets. The results show that the performance of the cluster vector value is higher and the processing time is shorter.

CiteSeerX

UMP Institutional Repository

A variational approach to linear control structure problems

Author: Johnson M. A
Publication venue: Department of Electrical Engineering, Imperial College London
Publication date: 01/01/1978
Field of study

Imperial Users only

Spiral - Imperial College Digital Repository

Models of consumer shopping behaviour in urban areas

Author: Uncles M. D
Publication venue
Publication date: 01/01/1985
Field of study

SIGLE. Available from British Library Lending Division - LD:D65637/86 / BLDSC - British Library Document Supply Centre. GB. United Kingdom

OpenGrey Repository

Explore Bristol Research

New approaches in statistical network data analysis

Author: Lebacher Michael
Publication venue: Ludwig-Maximilians-Universität München
Publication date: 15/11/2019
Field of study

This cumulative dissertation is dedicated to the statistical analysis of network data. The general approach of combining network science with statistical methodology became very popular in recent years. An important reason for this development lies in the ability of statistical network data analysis to provide a means to model and quantify interdependencies of complex systems. A network can be comprehended as a structure consisting of nodes and edges. The nodes represent general entities that are related via the edges. Depending on the research question at hand, it is either of interest to analyze the dependence structure among the nodes or the distribution of the edges given the nodes. This thesis consists of six contributed manuscripts that are concerned with the latter. Based on statistical models, edges in different dynamic and weighted networks are investigated or reconstructed. To put the contributing articles in a general context, the thesis starts with an introductory chapter. In this introduction, central concepts and models from statistical network data analysis are explained. Besides giving an overview of the available methodology, the advantages and drawbacks of the models are given, supplemented with a discussion of potential extensions and modifications. Content-wise it is possible to divide the articles into two projects. One project is focused on the statistical analysis of international arms trade networks. Two articles are devoted to the global exchange of major conventional weapons with a focus on the dynamic structure of the system and the volume traded. A third article explores latent patterns in the international trade system of small arms and ammunition. Additionally, the arms trade data is used in a survey paper that is concerned with dynamic network models. The second project regards the reconstruction of financial networks from their marginals and includes two articles. 
All contributing articles are attached in the form in which they were published as preprints. For publications in scientific journals, the respective sources are given. Additionally, the contributions of all authors are included. All computations were done with the statistical software R, and the corresponding code is available from GitHub.


The Graphical Representation of Structured Multivariate Data

Author: Cottee Michaela J.
Publication venue
Publication date: 07/01/1997
Field of study

During the past two decades or so, graphical representations have been used increasingly for the examination, summarisation and communication of statistical data. Many graphical techniques exist for exploratory data analysis (i.e. for deciding which model it is appropriate to fit to the data) and a number of graphical diagnostic techniques exist for checking the appropriateness of a fitted model. However, very few techniques exist for the representation of the fitted model itself. This thesis is concerned with the development of some new and existing graphical representation techniques for the communication and interpretation of fitted statistical models. The first part of this thesis takes the form of a general overview of the use in statistics of graphical representations for exploratory data analysis and diagnostic model checking. In relation to the concern of this thesis, particular consideration is given to the few graphical techniques which already exist for the representation of fitted models. A number of novel two-dimensional approaches are then proposed which go partway towards providing a graphical representation of the main effects and interaction terms for fitted models. This leads on to a description of conditional independence graphs, and consideration of the suitability of conditional independence graphs as a technique for the representation of fitted models. Conditional independence graphs are then developed further in accordance with the research aims. Since it becomes apparent that it is not possible to use any of the approaches taken in order to develop a simple two-dimensional pen-and-paper technique for the unambiguous graphical representation of all fitted statistical models, an interactive computer package based on the conditional independence graph approach is developed for the construction, communication and interpretation of graphical representations for fitted statistical models.
This package, called the "Conditional Independence Graph Enhancer" (CIGE), does provide unambiguous graphical representations for all fitted statistical models considered.

Open Research Online (The Open University)

Constraining the metallicities, ages, star formation histories, and ionizing continua of extragalactic massive star populations

Author: Bayliss M.
Berg D. A.
Chisholm J.
Dahle H.
Gladders M.
Rigby J. R.
Sharon K.
Publication venue: 'American Astronomical Society'
Publication date: 01/01/2019
Field of study

We infer the properties of massive star populations using the far-ultraviolet stellar continua of 61 star-forming galaxies: 42 at low-z observed with HST and 19 at z~2 from the Megasaura sample. We fit each stellar continuum with a linear combination of up to 50 single age and single metallicity Starburst99 models. From these fits, we derive light-weighted ages and metallicities, which agree with stellar wind and photospheric spectral features, and infer the spectral shapes and strengths of the ionizing continua. Inferred light-weighted stellar metallicities span 0.05-1.5 $Z_\odot$ and are similar to the measured nebular metallicities. We quantify the ionizing continua using the ratio of the ionizing flux at 900 Å to the non-ionizing flux at 1500 Å and demonstrate the evolution of this ratio with stellar age and metallicity using theoretical single burst models. These single burst models only match the inferred ionizing continua of half of the sample, while the other half are described by a mixture of stellar ages. Mixed age populations produce stronger and harder ionizing spectra than continuous star formation histories, but, contrary to previous studies that assume constant star formation, have similar stellar and nebular metallicities. Stellar population age and metallicity affect the far-UV continua in different and distinguishable ways; assuming a constant star formation history diminishes the diagnostic power. Finally, we provide simple prescriptions to determine the ionizing photon production efficiency ($\xi_{ion}$) from the stellar population properties. $\xi_{ion}$ has a range of log($\xi_{ion}$) = 24.4-25.7 Hz erg$^{-1}$ that depends on stellar age, metallicity, star formation history, and contributions from binary star evolution. These stellar population properties must be observationally determined to determine the number of ionizing photons generated by massive stars.

Comment: 31 pages, 23 figures, resubmitted to ApJ after incorporating the referee's comments. Comments encouraged

arXiv.org e-Print Archive

NORA - Norwegian Open Research Archives

High-Frequency Principal Components and Evolution of Liquidity in a Limit Order Market

Author: Konstantin Tyurin
Publication venue
Publication date
Field of study

The paper applies a popular methodology of competing risks to the analysis of the timing and interaction between the Deutsche Mark/U.S. dollar transactions, quotes, and cancellations in the Reuters D2000-2 electronic brokerage system. Consistent with previous stock market studies, the bid-ask spread and market depth at the best bid and ask quotes are found to be major determinants of limit order market dynamics at ultra-high frequencies. Consistent with the microstructure approach to exchange rate determination, the signed transaction activity appears to be the main factor behind the limit order market dynamics at lower frequencies. Application of principal component analysis to the covariate indices of competing risks identifies five pervasive factors that capture 85% of the Reuters D2000-2 limit order book activity. The multifactor competing risks model substantially improves the quality of short-term probability forecasts for buyer- and seller-initiated transactions, relative to popular moving average-type forecasting rules.

Keywords: foreign exchange, limit order, market order, order flow, liquidity, competing risks, principal component, probability forecast
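
As a reading aid for the "five pervasive factors that capture 85%" statement in the abstract above: in principal component analysis, the share of activity captured by the first k components is conventionally the ratio of the leading eigenvalues to the total. The symbols below ($\lambda_i$, $k$, $p$) are notation assumed here for illustration and do not come from the listing; the abstract's figure would correspond to $k = 5$ and a ratio of about 0.85, with $p$ not stated.

```latex
% Eigenvalues of the covariance (or correlation) matrix of the p covariate
% indices, sorted so that \lambda_1 \ge \lambda_2 \ge \dots \ge \lambda_p \ge 0.
% Fraction of total variance captured by the first k principal components:
\[
  \mathrm{EV}(k) \;=\; \frac{\sum_{i=1}^{k} \lambda_i}{\sum_{j=1}^{p} \lambda_j},
  \qquad 1 \le k \le p .
\]
```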

Research Papers in Economics

New approaches in statistical network data analysis

Author: Lebacher Michael
Publication venue: Ludwig-Maximilians-Universität München
Publication date: 15/11/2019
Field of study

Digitale Hochschulschriften der LMU

Twenty years of P-splines

Author: Durbán Maria
Eilers Paul H.C.
Marx Brian D.
Publication venue: Institut d'Estadística de Catalunya
Publication date: 01/12/2015
Field of study

P-splines first appeared in the limelight twenty years ago. Since then they have become popular in applications and in theoretical work. The combination of a rich B-spline basis and a simple difference penalty lends itself well to a variety of generalizations, because it is based on regression. In effect, P-splines allow the building of a “backbone” for the “mixing and matching” of a variety of additive smooth structure components, while inviting all sorts of extensions: varying-coefficient effects, signal (functional) regressors, two-dimensional surfaces, non-normal responses, quantile (expectile) modelling, among others. Strong connections with mixed models and Bayesian analysis have been established. We give an overview of many of the central developments during the first two decades of P-splines.

Peer Reviewed
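
The "rich B-spline basis and a simple difference penalty" named in the abstract above is usually written as a penalized least-squares problem. The sketch below covers the basic case of a normally distributed response only. The symbols ($y$, $B$, $\alpha$, $D_d$, $\lambda$) are notation assumed here for illustration and are not quoted from the listing.

```latex
% y      : response vector of length n
% B      : n x m matrix of B-spline basis functions evaluated at the data
% \alpha : vector of m B-spline coefficients
% D_d    : (m-d) x m matrix forming d-th order differences of \alpha
% \lambda: smoothing parameter, \lambda \ge 0
\[
  \hat{\alpha} \;=\; \arg\min_{\alpha}\;
  \lVert y - B\alpha \rVert^{2} \;+\; \lambda\, \lVert D_d\,\alpha \rVert^{2}
\]
% Setting the gradient to zero gives the penalized normal equations,
% assuming the matrix on the left is invertible:
\[
  \bigl(B^{\top}B + \lambda\, D_d^{\top} D_d\bigr)\,\hat{\alpha} \;=\; B^{\top} y .
\]
```

Because the penalty only adds the term $\lambda D_d^{\top} D_d$ to an ordinary regression system, the same structure carries over to the extensions the abstract lists.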

UPCommons. Portal del coneixement obert de la UPC