
    On the consistency of a spatial-type interval-valued median for random intervals

    The sample $d_\theta$-median is a robust estimator of the central tendency or location of an interval-valued random variable. While the interval-valued sample mean can be highly influenced by outliers, this spatial-type interval-valued median remains much more reliable. In this paper, we show that under general conditions the sample $d_\theta$-median is a strongly consistent estimator of the $d_\theta$-median of an interval-valued random variable.
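    The estimator can be illustrated numerically. The snippet below is a minimal sketch, not the paper's implementation: it assumes the usual midpoint/spread parametrisation of intervals and the standard form of the generalized Bertoluzza-type $d_\theta$ metric, and the choice theta = 1/3 as well as all function names are purely illustrative.

```python
# Minimal sketch (not the authors' code) of the sample d_theta-median:
# the interval minimizing the mean d_theta distance to a sample of intervals.
# Intervals are parametrised by (midpoint, spread), and
# d_theta(A, B)^2 = (midA - midB)^2 + theta * (sprA - sprB)^2 is assumed.
import numpy as np
from scipy.optimize import minimize

def d_theta(mid_a, spr_a, mid_b, spr_b, theta=1/3):
    """Generalized Bertoluzza-type distance between intervals."""
    return np.sqrt((mid_a - mid_b) ** 2 + theta * (spr_a - spr_b) ** 2)

def sample_d_theta_median(mids, sprs, theta=1/3):
    """Sample d_theta-median: argmin over intervals of the mean d_theta distance."""
    def objective(p):
        mid, spr = p
        return np.mean(d_theta(mids, sprs, mid, spr, theta))
    x0 = np.array([np.median(mids), np.median(sprs)])        # robust starting point
    res = minimize(objective, x0, bounds=[(None, None), (0.0, None)])
    mid, spr = res.x
    return mid - spr, mid + spr                               # [inf, sup]

# toy usage: ten intervals plus one gross outlier
rng = np.random.default_rng(0)
mids = np.append(rng.normal(0, 1, 10), 50.0)
sprs = np.append(rng.uniform(0.5, 1.5, 10), 20.0)
print("d_theta-median interval:", sample_d_theta_median(mids, sprs))
```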

    Set-valued Data: Regression, Design and Outliers

    The focus of this dissertation is to study set-valued data from three aspects, namely regression, optimal design and outlier identification. The dissertation consists of three peer-reviewed published articles, each addressing one aspect. Their titles and abstracts are listed below:
    1. Local regression smoothers with set-valued outcome data: This paper proposes a method to conduct local linear regression smoothing in the presence of set-valued outcome data. The proposed estimator is shown to be consistent, and its mean squared error and asymptotic distribution are derived. A method to build error tubes around the estimator is provided, and a small Monte Carlo exercise is conducted to confirm the good finite sample properties of the estimator. The usefulness of the method is illustrated on a novel dataset from a clinical trial to assess the effect of certain genes' expressions on the outcomes of different lung cancer treatments.
    2. Optimal design for multivariate multiple linear regression with set-identified response: We consider the partially identified regression model with set-identified responses, where the estimator is the set of the least squares estimators obtained for all possible choices of points sampled from the set-identified observations. We address the issue of determining the optimal design for this case and show that, for objective functions mimicking those of several classical optimal designs, their set-identified analogues coincide with the optimal designs for point-identified real-valued responses.
    3. Depth and outliers for samples of sets and random sets distributions: We suggest several constructions suitable to define the depth of set-valued observations with respect to a sample of convex sets or with respect to the distribution of a random closed convex set. With the concept of a depth, it is possible to determine whether a given convex set should be regarded as an outlier with respect to a sample of convex closed sets. Some of our constructions are motivated by the known concepts of half-space depth and band depth for function-valued data. A novel construction derives the depth from a family of non-linear expectations of random sets. Furthermore, we address the role of the positions of sets in the evaluation of their depth. Two case studies concern interval regression for Greek wine data and the detection of outliers in a sample of particles.
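    To give a flavour of the depth constructions mentioned in the third article, the snippet below sketches one possible band-depth-style depth for intervals: the depth of a candidate interval is the fraction of sample pairs whose interval hull contains it. This is only an illustration in the spirit of band depth, not the dissertation's construction; the function name and the toy data are assumptions.

```python
# Band-depth-style depth for intervals (illustrative sketch): a candidate
# interval is deep if many pairs of sample intervals "cover" it, i.e. its
# endpoints lie within the hull of the pair.
import numpy as np
from itertools import combinations

def band_depth_interval(candidate, sample):
    """candidate: (lo, hi); sample: array of shape (n, 2) with rows (lo, hi)."""
    lo, hi = candidate
    count, total = 0, 0
    for (lo_i, hi_i), (lo_j, hi_j) in combinations(sample, 2):
        total += 1
        if min(lo_i, lo_j) <= lo and hi <= max(hi_i, hi_j):
            count += 1
    return count / total

# toy sample: four comparable intervals and one clear outlier
sample = np.array([[0.0, 1.0], [0.2, 1.5], [-0.5, 0.8], [0.1, 1.2], [5.0, 9.0]])
for iv in sample:
    print(iv, "depth:", round(band_depth_interval(iv, sample), 3))
```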

    A spatial-type interval-valued median for random intervals

    To estimate the central tendency or location of a sample of interval-valued data, a standard statistic is the interval-valued sample mean. Its strong sensitivity to outliers or data changes motivates the search for more robust alternatives. In this respect, a more robust location statistic is studied in this paper. This measure is inspired by the concept of the spatial median and makes use of the versatile generalized Bertoluzza metric between intervals, the so-called $d_\theta$ distance. The problem of minimizing the mean $d_\theta$ distance to the values the random interval takes, which defines the spatial-type $d_\theta$-median, is analysed. Existence and uniqueness of the sample version are shown. Furthermore, the robustness of this proposal is investigated by deriving its finite sample breakdown point. Finally, a real-life example from the field of Economics illustrates the robustness of the sample $d_\theta$-median, and simulation studies show some comparisons with respect to the mean and several recently introduced robust location measures for interval-valued data.
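    The robustness contrast with the interval-valued sample mean can be reproduced in a few lines. The snippet below is a self-contained illustration rather than the paper's algorithm: theta = 1/3 is assumed and the $d_\theta$-median is computed by generic numerical optimisation; a single gross outlier drags the sample mean far away while the spatial-type median barely moves.

```python
# Illustrative robustness check: interval sample mean vs. d_theta-median
# under a single contaminated observation (assumed metric, theta = 1/3).
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(1)
mids = rng.normal(0.0, 1.0, 30)
sprs = rng.uniform(0.5, 1.0, 30)
mids[0], sprs[0] = 1e3, 1e3            # one gross outlier

def d_theta_median(mids, sprs, theta=1/3):
    obj = lambda p: np.mean(np.sqrt((mids - p[0])**2 + theta*(sprs - p[1])**2))
    return minimize(obj, [np.median(mids), np.median(sprs)],
                    bounds=[(None, None), (0.0, None)]).x

print("sample mean (mid, spr):   ", mids.mean(), sprs.mean())
print("d_theta-median (mid, spr):", d_theta_median(mids, sprs))
```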

    Cox regression survival analysis with compositional covariates: application to modelling mortality risk from 24-h physical activity patterns

    Survival analysis is commonly conducted in medical and public health research to assess the association of an exposure or intervention with a hard endpoint such as mortality. The Cox (proportional hazards) regression model is probably the most popular statistical tool used in this context. However, when the exposure includes compositional covariates (that is, variables representing a relative makeup, such as a nutritional or physical activity behaviour composition), some basic assumptions of the Cox regression model and the associated significance tests are violated. Compositional variables are intrinsically interrelated, which precludes results and conclusions based on considering them in isolation, as is ordinarily done. In this work, we introduce a formulation of the Cox regression model in terms of log-ratio coordinates which suitably deals with the constraints of compositional covariates, facilitates the use of common statistical inference methods, and allows for scientifically meaningful interpretations. We illustrate its practical application to a public health problem: the estimation of the mortality hazard associated with the composition of daily activity behaviour (physical activity, sitting time and sleep) using data from the U.S. National Health and Nutrition Examination Survey (NHANES).
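    The log-ratio formulation can be sketched compactly. The snippet below is a minimal illustration, not the paper's code: it maps a three-part daily activity composition to isometric log-ratio (ilr) coordinates and fits an ordinary Cox model on them; the particular ilr contrast, the column names, the synthetic data and the use of the `lifelines` package are all assumptions made for the example.

```python
# Sketch: ilr coordinates for a 3-part activity composition, then a Cox model.
import numpy as np
import pandas as pd
from lifelines import CoxPHFitter

def ilr(parts):
    """Pivot-style ilr coordinates for an (n, 3) matrix of positive parts."""
    x1, x2, x3 = parts[:, 0], parts[:, 1], parts[:, 2]
    z1 = np.sqrt(2/3) * np.log(x1 / np.sqrt(x2 * x3))   # part 1 vs. the rest
    z2 = np.sqrt(1/2) * np.log(x2 / x3)                  # part 2 vs. part 3
    return np.column_stack([z1, z2])

rng = np.random.default_rng(0)
comp = rng.dirichlet([4, 8, 7], size=200) * 24.0         # hours/day, sums to 24
df = pd.DataFrame(ilr(comp), columns=["ilr1", "ilr2"])
df["time"] = rng.exponential(10, size=200)               # synthetic follow-up time
df["event"] = rng.integers(0, 2, size=200)               # synthetic event indicator

cph = CoxPHFitter()
cph.fit(df, duration_col="time", event_col="event")      # ilr1, ilr2 enter as covariates
cph.print_summary()
```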

    3rd Workshop in Symbolic Data Analysis: book of abstracts

    This workshop is the third regular meeting of researchers interested in Symbolic Data Analysis. The main aim of the event is to bring people together and to foster the exchange of ideas across the different fields that contribute to Symbolic Data Analysis, including Mathematics, Statistics, Computer Science, Engineering and Economics, among others.

    Uncertainty propagation in nonlinear systems.

    This thesis examines the effects of uncertainty on a variety of different engineering systems. Uncertainty can best be described as a lack of knowledge about a particular system, and can come from a variety of different sources. Within this thesis the possibilistic branch of uncertainty quantification is used. A combination of simulated and real-life engineering systems is studied, covering some of the most popular types of computational models. An outline of various background topics is presented first, as these topics are all subsequently used within the thesis. The most important of these is the transformation method, a possibilistic uncertainty approach derived from fuzzy arithmetic. Most of the work examines uncertain systems by applying Ben-Haim's information-gap theory. Uncertainty is deliberately introduced into the parameters of the various computational models to exploit the concept of “opportunity”. The basic rationale is that if some degree of tolerance can be accepted on a model prediction of a system, it is possible to obtain a lower prediction error than with a standard crisp-valued model. For interval-valued computational models there is generally a trade-off between minimising the prediction error of the model and minimising the range of predicted outputs, so as to reduce the tolerance on the solution. The studied models all use a “degree of uncertainty” parameter that allows a user to select the suitable trade-off level for their particular application. The thesis concludes with a real-life engineering study, undertaken as a nine-month placement on a European Union project entitled MADUSE. The work was done at Centro Ricerche Fiat and examined the dynamic effects of uncertainties related to automotive spot welds. This study used both finite element modelling and experimental modal testing of manufactured specimens.
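    The transformation method mentioned above can be sketched in a few lines. The snippet below implements the reduced variant under assumptions of my own (triangular fuzzy parameters and a toy model): each fuzzy parameter is cut at a set of alpha-levels, the model is evaluated at all corner combinations of the resulting intervals, and the min/max of the outputs give the output alpha-cuts. This is exact only for monotonic models; the general transformation method additionally evaluates interior points.

```python
# Reduced transformation method from fuzzy arithmetic (illustrative sketch).
import numpy as np
from itertools import product

def triangular_alpha_cut(a, peak, b, alpha):
    """Alpha-cut [lo, hi] of a triangular fuzzy number (a, peak, b)."""
    return a + alpha * (peak - a), b - alpha * (b - peak)

def reduced_transformation_method(model, fuzzy_params, alphas):
    """Return {alpha: (lo, hi)} alpha-cuts of the model output."""
    cuts = {}
    for alpha in alphas:
        intervals = [triangular_alpha_cut(*p, alpha) for p in fuzzy_params]
        outputs = [model(np.array(corner)) for corner in product(*intervals)]
        cuts[alpha] = (min(outputs), max(outputs))
    return cuts

# toy nonlinear model with two uncertain (triangular fuzzy) parameters
model = lambda p: p[0] ** 2 + np.sin(p[1])
fuzzy_params = [(0.8, 1.0, 1.3), (0.0, 0.5, 1.0)]
for alpha, (lo, hi) in reduced_transformation_method(model, fuzzy_params, [0.0, 0.5, 1.0]).items():
    print(f"alpha = {alpha:.1f}: output in [{lo:.3f}, {hi:.3f}]")
```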

    Uncertainty modelling in power spectrum estimation of environmental processes

    For efficient reliability analysis of buildings and structures, robust load models are required in stochastic dynamics; these can be estimated in particular from environmental processes such as earthquakes or wind loads. To determine the response behaviour of a dynamic system under such loads, the power spectral density (PSD) function is a widely used tool for identifying the frequency components and corresponding amplitudes of environmental processes. Since the real data records required for this purpose are often subject to aleatory and epistemic uncertainties, and the PSD estimation process itself can induce further uncertainties, a rigorous quantification of these is essential, as otherwise a highly inaccurate load model could be generated, which may yield misleading simulation results. A system behaviour that is actually catastrophic can thus be shifted into an acceptable range, classifying the system as safe even though it is exposed to a high risk of damage or collapse.

    To address these issues, alternative load models are proposed using probabilistic and non-deterministic models that are able to efficiently account for these uncertainties and to model the loadings accordingly. Various methods are used in the generation of these load models, selected in particular according to the character of the data and the number of available records.

    When multiple data records are available, reliable statistical information can be extracted from a set of similar PSD functions that differ, for instance, only slightly in shape and peak frequency. Based on these statistics, a PSD function model is derived utilising subjective probabilities to capture the epistemic uncertainties and represent this information effectively. The spectral densities are characterised as random variables instead of discrete values, and thus the PSD function itself represents a non-stationary random process comprising a range of possible valid PSD functions for a given data set.

    If only a limited number of data records is available, such reliable statistical information cannot be derived. Therefore, an interval-based approach is proposed that determines only an upper and a lower bound and does not rely on any distribution within these bounds. A set of discrete-valued PSD functions is transformed into an interval-valued PSD function by optimising the weights of pre-derived basis functions from a Radial Basis Function Network such that they compose an upper and a lower bound that encompass the data set. In this way, a range of possible values and system responses is identified rather than discrete values, which makes it possible to quantify the epistemic uncertainties.

    When generating such a load model from real data records, the problem can arise that the individual records exhibit a high spectral variance in the frequency domain and therefore differ too much from each other, although they appear to be similar in the time domain. A load model derived from these data may not cover the entire spectral range and is therefore not representative. The data are therefore grouped according to their similarity using the Bhattacharyya distance and the k-means algorithm, which may generate two or more load models from the entire data set. These can be applied separately to the structure under investigation, leading to more accurate simulation results.
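    The grouping step can be sketched as follows. The snippet is an illustration under assumptions of my own rather than the thesis' implementation: it computes Bhattacharyya distances between normalized PSD records and, as a simple proxy for Bhattacharyya-based grouping, clusters the square-root-normalized spectra (a Hellinger embedding) with k-means.

```python
# Illustrative grouping of PSD records by spectral similarity.
import numpy as np
from sklearn.cluster import KMeans

def bhattacharyya_distance(psd_a, psd_b):
    """-log of the Bhattacharyya coefficient of two normalized spectra."""
    p = psd_a / psd_a.sum()
    q = psd_b / psd_b.sum()
    return -np.log(np.sum(np.sqrt(p * q)))

def group_psds(psds, n_groups=2):
    """psds: (n_records, n_freqs) array of PSD estimates."""
    p = psds / psds.sum(axis=1, keepdims=True)            # normalize each record
    embedding = np.sqrt(p)                                 # Hellinger embedding
    return KMeans(n_clusters=n_groups, n_init=10, random_state=0).fit_predict(embedding)

# toy data: two families of records with different peak frequencies
freqs = np.linspace(0.1, 10, 200)
peak = lambda f0: 1.0 / ((freqs**2 - f0**2)**2 + (0.5 * freqs)**2)
psds = np.array([peak(2.0 + 0.1 * np.random.rand()) for _ in range(5)]
                + [peak(6.0 + 0.1 * np.random.rand()) for _ in range(5)])
print("group labels:", group_psds(psds, n_groups=2))
print("Bhattacharyya distance between groups:", bhattacharyya_distance(psds[0], psds[5]))
```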
    This approach can also be used to estimate the spectral similarity of individual data sets in the frequency domain, which is particularly relevant for the load models mentioned above.

    If the uncertainties are modelled directly in the time signal, it can be a challenging task to transform them efficiently into the frequency domain. Such a signal may consist only of reliable bounds within which the actual signal lies. A method is presented that can automatically propagate this interval uncertainty through the discrete Fourier transform, obtaining the exact bounds on the Fourier amplitude and an estimate of the PSD function. The method allows such an interval signal to be propagated without making assumptions about the dependence and distribution of the error over the time steps.

    These novel representations of load models are able to quantify epistemic uncertainties that are inherent in real data records or induced by the PSD estimation process. The strengths and advantages of these approaches in practice are demonstrated by means of several numerical examples concentrated in the field of stochastic dynamics.
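    The interval propagation through the DFT can be illustrated with a naive bounding scheme. The snippet below is not the thesis' algorithm: it only exploits the fact that the real and imaginary parts of each DFT coefficient are linear in the samples, so their bounds follow from endpoint selection, and it then forms conservative (outer) bounds on the Fourier amplitude; exact amplitude bounds, as described in the abstract, additionally require handling the dependency between real and imaginary parts.

```python
# Naive interval propagation of an interval-valued time signal through the DFT.
import numpy as np

def interval_dft_amplitude_bounds(lo, hi):
    """lo, hi: arrays of lower/upper bounds on the time signal (length N)."""
    n = len(lo)
    t = np.arange(n)
    amp_lo, amp_hi = np.zeros(n), np.zeros(n)
    for k in range(n):
        c, s = np.cos(2*np.pi*k*t/n), -np.sin(2*np.pi*k*t/n)
        re_hi = np.sum(np.where(c >= 0, hi, lo) * c)       # exact bounds on Re X_k
        re_lo = np.sum(np.where(c >= 0, lo, hi) * c)
        im_hi = np.sum(np.where(s >= 0, hi, lo) * s)       # exact bounds on Im X_k
        im_lo = np.sum(np.where(s >= 0, lo, hi) * s)
        # conservative amplitude bounds from the box [re_lo, re_hi] x [im_lo, im_hi]
        re_min = 0.0 if re_lo <= 0.0 <= re_hi else min(abs(re_lo), abs(re_hi))
        im_min = 0.0 if im_lo <= 0.0 <= im_hi else min(abs(im_lo), abs(im_hi))
        amp_lo[k] = np.hypot(re_min, im_min)
        amp_hi[k] = np.hypot(max(abs(re_lo), abs(re_hi)), max(abs(im_lo), abs(im_hi)))
    return amp_lo, amp_hi

# toy interval signal: a sine wave with +/- 0.2 measurement bounds
t = np.linspace(0, 1, 64, endpoint=False)
x = np.sin(2 * np.pi * 5 * t)
amp_lo, amp_hi = interval_dft_amplitude_bounds(x - 0.2, x + 0.2)
print(amp_lo[5], amp_hi[5])   # bounds on the amplitude at the 5 Hz bin
```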
