
    Spatial depth-based methods for functional data

    International Mention in the doctoral degree. In this thesis we deal with functional data, and in particular with the notion of functional depth. A functional depth is a measure that allows one to order and rank the curves of a functional sample from the most central to the least central. In univariate statistics, $\mathbb{R}$ provides a natural ordering of the observations; in functional data analysis (FDA) there is no such natural order, and the existing functional depths rank curves in different ways. Moreover, there is no agreement on whether a best functional depth exists. For these reasons, among others, functional depth is still an active research topic, and this thesis aims to advance this area of FDA. As a first contribution, we enlarge the set of available functional depths by introducing the kernelized functional spatial depth (KFSD). Throughout the dissertation, we show that KFSD results from a modification of an existing functional depth known as the functional spatial depth (FSD). FSD is a global functional depth: the FSD value of a given curve relative to a functional sample depends equally on all the other curves in the sample. However, first in the multivariate setting, where the notion of depth is also used, and then in FDA, several authors have suggested that a local approach to depth may be useful. As a result, several local depths have been proposed in the literature, in which the depth value of an observation depends more on nearby observations than on distant ones. Unlike FSD, KFSD is a local depth, and it can be interpreted as a local version of FSD. As its name suggests, the transition from global to local is achieved through a kernel-type modification of FSD. Like any functional depth, KFSD can be useful for several purposes.
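To make the global/local distinction concrete, here is a minimal pure-Python sketch, not the thesis's implementation. It assumes curves discretized on a common, equally spaced grid with step `h` and approximates the $L^2$ norm by a Riemann sum. `fsd` uses the commonly cited sample form of the functional spatial depth: one minus the norm of the average unit vector pointing from each sample curve towards $x$. `kfsd` assumes the kernelized version is the same quantity computed in the feature space of a Gaussian kernel $k(x, y) = \exp(-\lVert x - y \rVert^2 / \sigma^2)$, evaluated through kernel values only. The bandwidth `sigma` is left to the user, the helper names are illustrative, and the thesis's exact definitions and bandwidth choice may differ.

```python
import math
from typing import List, Sequence

Curve = Sequence[float]  # values of one curve on a common, equally spaced grid


def l2_norm(x: Curve, h: float) -> float:
    """Riemann-sum approximation of the L2 norm of a discretized curve."""
    return math.sqrt(h * sum(v * v for v in x))


def fsd(x: Curve, sample: List[Curve], h: float = 1.0) -> float:
    """Global spatial depth: 1 - || average of unit vectors (x - x_i) / ||x - x_i|| ||."""
    acc = [0.0] * len(x)
    for xi in sample:
        diff = [a - b for a, b in zip(x, xi)]
        d = l2_norm(diff, h)
        if d == 0.0:  # x coincides with x_i: its unit vector is taken as zero
            continue
        for t, v in enumerate(diff):
            acc[t] += v / d
    return 1.0 - l2_norm([v / len(sample) for v in acc], h)


def gaussian_kernel(x: Curve, y: Curve, sigma: float, h: float) -> float:
    """Gaussian kernel exp(-||x - y||^2 / sigma^2) on discretized curves."""
    d = l2_norm([a - b for a, b in zip(x, y)], h)
    return math.exp(-(d * d) / (sigma * sigma))


def kfsd(x: Curve, sample: List[Curve], sigma: float, h: float = 1.0) -> float:
    """Spatial depth computed in the Gaussian-kernel feature space (kernel trick)."""
    n = len(sample)
    kxx = gaussian_kernel(x, x, sigma, h)
    kx = [gaussian_kernel(x, xi, sigma, h) for xi in sample]
    # Feature-space distance ||phi(x) - phi(x_i)|| = sqrt(k(x,x) + k(x_i,x_i) - 2 k(x,x_i)).
    delta = [
        math.sqrt(max(kxx + gaussian_kernel(xi, xi, sigma, h) - 2.0 * kx[i], 0.0))
        for i, xi in enumerate(sample)
    ]
    total = 0.0  # squared norm of the sum of feature-space unit vectors
    for i in range(n):
        if delta[i] == 0.0:
            continue
        for j in range(n):
            if delta[j] == 0.0:
                continue
            kij = gaussian_kernel(sample[i], sample[j], sigma, h)
            # <phi(x) - phi(x_i), phi(x) - phi(x_j)> expressed through kernel values
            inner = kxx + kij - kx[i] - kx[j]
            total += inner / (delta[i] * delta[j])
    return 1.0 - math.sqrt(max(total, 0.0)) / n


if __name__ == "__main__":
    # Five constant curves on a 5-point grid of [0, 1] (h = 0.2), at levels -2..2.
    sample = [[c] * 5 for c in (-2.0, -1.0, 0.0, 1.0, 2.0)]
    for curve in sample:
        print(curve[0], round(fsd(curve, sample, h=0.2), 3),
              round(kfsd(curve, sample, sigma=1.0, h=0.2), 3))
```

Both functions return values in $[0, 1]$. In `fsd` every sample curve contributes a unit vector of the same weight, which is the global behaviour described above. For a very large `sigma` the feature-space inner products become nearly proportional to the ordinary ones, so `kfsd` approaches `fsd`; smaller bandwidths are intended to make the depth more local.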
One use of KFSD is to identify the most central curve in a functional sample, that is, the KFSD-based sample median. Also, using the p% most central curves, we can draw a p%-central region ($0 < p < 100$). Another application is the computation of robust means such as the $\alpha$-trimmed mean, $0 < \alpha < 1$, which is the functional mean computed after deleting the proportion $\alpha$ of least central curves. The use of functional depths in FDA has gone beyond these examples, and functional depths are nowadays also used to solve other types of problems. In particular, in this thesis we consider supervised functional classification and functional outlier detection, and we study and propose methods based on KFSD. Our approach to both classification and outlier detection has one main feature: we are interested in scenarios where the solution of the problem is not graphically obvious. More precisely, in classification we focus on cases in which the different groups of curves are hard to recognize by looking at a plot, and we do not consider problems where the classes of curves are easy to detect graphically. Similarly, we do not deal with outliers that are very distant from the rest of the curves; instead, we consider low-magnitude, shape and partial outliers, which are harder to detect. We focus on these problems because in such challenging scenarios the differences among depths and among methods become apparent, whereas they tend to be much smaller in easier problems. Depth-based methods for classification are already available. In this thesis we consider three existing depth-based procedures and, for the first time, implement them with several functional depths (KFSD and six others). The main result is that KFSD stands out among its competitors.
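Once depth values are available, from KFSD or any other depth, the median, the p% central region and the $\alpha$-trimmed mean described above reduce to sorting curves by depth. The sketch below assumes curves on a common grid and takes precomputed depth values as input. The rounding conventions (keeping $\lceil np/100 \rceil$ curves for the central region, discarding $\lfloor n\alpha \rfloor$ curves for the trimmed mean) and the representation of the central region as a pointwise envelope are assumptions, not necessarily the choices made in the thesis; the function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Curve = Sequence[float]


def _by_depth(depths: Sequence[float]) -> List[int]:
    """Indices sorted from the most to the least central curve (stable for ties)."""
    return sorted(range(len(depths)), key=lambda i: depths[i], reverse=True)


def depth_median(curves: Sequence[Curve], depths: Sequence[float]) -> List[float]:
    """Depth-based sample median: the curve with the largest depth value."""
    return list(curves[_by_depth(depths)[0]])


def central_region(curves: Sequence[Curve], depths: Sequence[float],
                   p: float) -> Tuple[List[float], List[float]]:
    """Pointwise lower and upper envelope of the p% most central curves, 0 < p < 100."""
    if not 0 < p < 100:
        raise ValueError("p must lie in (0, 100)")
    k = max(1, math.ceil(len(curves) * p / 100))
    chosen = [curves[i] for i in _by_depth(depths)[:k]]
    lower = [min(values) for values in zip(*chosen)]
    upper = [max(values) for values in zip(*chosen)]
    return lower, upper


def trimmed_mean(curves: Sequence[Curve], depths: Sequence[float],
                 alpha: float) -> List[float]:
    """Pointwise mean after discarding the proportion alpha of least central curves."""
    if not 0 < alpha < 1:
        raise ValueError("alpha must lie in (0, 1)")
    keep = len(curves) - math.floor(alpha * len(curves))  # at least one curve is kept
    kept = [curves[i] for i in _by_depth(depths)[:keep]]
    return [sum(values) / len(kept) for values in zip(*kept)]
```

Any depth can supply the `depths` argument. Because all three summaries depend only on the ranking, a local depth such as KFSD can yield a different median, region or trimmed mean than a global depth on the same sample.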
When combined with one of the three depth-based classification methods, namely the within maximum depth procedure, KFSD shows the best and most stable performance in a simulation study covering six different curve-generating processes, as well as in the classification of two real datasets. These results therefore support the introduction of KFSD as a new functional depth. Regarding outlier detection, we also consider some existing depth-based procedures together with the battery of functional depths mentioned above. In addition, we propose three new methods designed specifically for KFSD. All of them rely on a desirable property of a functional depth: it should assign a low depth value to an outlier. During our research, we observed that KFSD has this property. Moreover, thanks to its local approach, KFSD generally ranks correctly outliers that do not stand out clearly in a plot. However, a low KFSD value is not enough to detect outliers: a threshold value for KFSD is needed to distinguish between normal curves and outliers, and the three methods that we present provide alternative ways to choose this threshold. The simulation study for outlier detection is as extensive as the one for classification. Besides our proposals, we consider three existing depth-based methods with the seven depths, and two techniques that do not use functional depths. The results of this second simulation study are also encouraging: the proposed KFSD-based methods are the only procedures that achieve good outlier detection performance in all six scenarios and for both contamination probabilities considered.
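The within maximum depth procedure assigns a new curve to the group in which it is most central. A minimal sketch, assuming curves on a common grid and a pluggable depth function. For self-containment, the default depth here is a compact spatial depth based on the Euclidean norm of the grid values, an illustrative stand-in; according to the abstract, the best results were obtained by plugging KFSD into this procedure. The names `spatial_depth` and `within_max_depth` are hypothetical.

```python
import math
from typing import Callable, Dict, List, Sequence

Curve = Sequence[float]
DepthFn = Callable[[Curve, List[Curve]], float]


def spatial_depth(x: Curve, sample: List[Curve]) -> float:
    """Compact spatial depth: 1 - norm of the average unit vector from sample curves to x."""
    acc = [0.0] * len(x)
    for xi in sample:
        diff = [a - b for a, b in zip(x, xi)]
        d = math.sqrt(sum(v * v for v in diff))
        if d > 0.0:  # a curve identical to x contributes nothing
            for t, v in enumerate(diff):
                acc[t] += v / d
    return 1.0 - math.sqrt(sum(v * v for v in acc)) / len(sample)


def within_max_depth(x: Curve, training: Dict[str, List[Curve]],
                     depth: DepthFn = spatial_depth) -> str:
    """Assign x to the class whose training curves give it the largest depth."""
    return max(training, key=lambda label: depth(x, training[label]))
```

Because the depth is a parameter, the same classifier can be run with different depth functions on identical data, which is the kind of comparison described above.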
To summarize, in this thesis we present a new local functional depth, KFSD, which turns out to be a useful tool in supervised classification, when used in conjunction with some existing depth-based methods, and in outlier detection, by means of some new procedures that we also present in this work.
The subject of this thesis is the analysis of functional data, and in particular the notion of functional depth. A functional depth measure makes it possible to order the curves of a functional sample from the most central to the least central. Unlike in $\mathbb{R}$, where there is a natural way to order observations, in functional data analysis (FDA) there is no unique way to order curves, so the different existing functional depths order curves in different ways. Moreover, there is no agreement on whether one of the available functional depths is best in all situations. For these reasons, among others, functional depth is still an active research area, and this thesis aims to contribute to progress in this field of FDA. As a first contribution, this thesis enlarges the number of available functional depths by introducing the kernelized functional spatial depth (KFSD). Throughout this work, it is shown that KFSD results from a modification of an existing functional depth known as the functional spatial depth (FSD). FSD belongs to the category of global functional depths, which means that the FSD value of a given curve, relative to a functional sample, depends equally on the rest of the curves in the sample.
However, as in the multivariate context, where the concept of depth is also used, several authors have suggested that a local approach to defining a depth may also be useful in FDA. For this reason, some local depths have been proposed in the literature, for which the depth value of an observation depends more on nearby observations than on distant ones. Unlike FSD, KFSD can be classified as a local depth, and it can be interpreted as a local version of FSD. As the name KFSD suggests, the transition from global to local is achieved through a kernel-based modification of FSD. KFSD, like any other functional depth, can be useful for several purposes in statistical data analysis. For example, using KFSD it is possible to identify the most central curve in a functional sample, that is, the KFSD-based sample median. Moreover, using the p% most central curves, it is possible to define the p%-central region ($0 < p < 100$). Another application is the computation of robust means, such as the $\alpha$-trimmed mean, with $0 < \alpha < 1$, which is the functional mean computed without the proportion $\alpha$ of least central curves. The use of functional depths in FDA has gone beyond the previous examples, and functional depths are currently also used to solve other types of problems. In particular, this thesis considers supervised functional classification and the detection of atypical curves, and it studies and proposes methods based on KFSD. The approach to classification and outlier detection presented in this thesis has one main feature: the work focuses on scenarios in which the solution of the problem is not graphically clear.
Specifically, for classification we consider cases in which the different groups of curves are barely recognizable by looking at a plot, while problems where the classes of curves are easily detectable graphically are not considered. Similarly, detecting atypical curves that are graphically very far from the rest of the curves is not among our objectives; instead, we consider low-magnitude, shape and partial outliers, which are harder to detect with the procedures already available in the literature. In this respect, it will be shown that in this type of problem there are substantial differences among depths and among analysis methods, whereas these differences tend to be smaller in simpler or more visually evident problems. Regarding functional classification, methods based on functional depths already exist in the literature. This thesis considers three procedures of this type and, for the first time, combines them with several functional depths (KFSD and six others) in order to compare methods and/or depths on the same scenarios. The main result is that KFSD stands out among its competitors. Indeed, when KFSD is used together with one of these methods, known as the within maximum depth procedure, it shows the best and most stable results throughout a simulation study that considers six different curve-generating processes, as well as in the classification of two real datasets. The results obtained therefore support the introduction of KFSD as a new functional depth. Regarding the detection of atypical curves, some existing depth-based procedures are also considered, together with the group of seven depths mentioned above.
In addition, three new methods designed exclusively for KFSD are proposed. All of them rely on a desirable feature of a functional depth, namely that it should assign a low depth value to an atypical curve. During our research, we observed that KFSD has this feature. Moreover, thanks to its local approach, KFSD is generally able to rank correctly outliers that do not stand out clearly in a plot. However, a low KFSD value is not sufficient to detect atypical curves, and a threshold value for KFSD is needed to distinguish between normal and atypical curves. Indeed, the three methods presented offer alternative ways to choose a threshold for KFSD. From a methodological point of view, these procedures are supported by probabilistic theoretical results. The simulation study carried out for outlier detection is as extensive as the one for classification. Besides our proposals, three existing depth-based methods and two techniques that do not use functional depths are considered. The results of this second simulation study are also positive: the KFSD-based methods proposed in this thesis turn out to be the procedures that best detect outliers across a set of six simulated scenarios and for the two contamination probabilities considered.
In summary, this thesis presents a new local functional depth, KFSD, which proves to be a useful tool in supervised classification, when used together with some depth-based methods, and in the detection of atypical curves, by means of some new procedures that are also presented in this work.
The author and the advisors were partially supported by the following research projects: Spanish Ministry of Science and Innovation grant ECO2011-25706 and Spanish Ministry of Economy and Competition grant ECO2012-38442.
Programa Oficial de Doctorado en Economía de la Empresa y Métodos Cuantitativos (Official Doctoral Programme in Business Economics and Quantitative Methods). Chair: Juan Romo Urroz; Secretary: Manuel Febrero Bande; Member: Ricardo Fraima

    Spatial depth-based classification for functional data

    Functional data are becoming increasingly available and tractable thanks to recent technological advances. We enlarge the set of functional depths by defining two new depth functions for curves. Both depths are based on a spatial approach: the functional spatial depth (FSD), which shows an interesting connection with the functional extension of the notion of spatial quantiles, and the kernelized functional spatial depth (KFSD), which is useful for studying functional samples that require analysis at a local level. We then consider supervised functional classification problems, focusing in particular on cases in which the samples may contain outlying curves. For these situations, some robust methods based on functional depths are available. Through a simulation study, we show how FSD and KFSD perform as depth functions within these depth-based methods. The results indicate that a spatial depth-based classification approach may be helpful when the datasets are contaminated, and that it is generally stable and satisfactory when compared with a benchmark procedure such as the functional k-nearest neighbor classifier. Finally, we illustrate our approach with a real dataset.
This research was partially supported by Spanish Ministry of Education and Science grant 2007/04438/001, by Madrid Region grant 2011/00068/001, by Spanish Ministry of Science and Innovation grant 2012/00084/001 and by MCI grant MTM2008-03010
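The functional k-nearest neighbor classifier used as a benchmark assigns a curve to the majority class among its k closest training curves. A minimal sketch, assuming curves on a common grid, the $L^2$ distance approximated by a Riemann sum with step `h`, and ties in the vote broken in favour of the label encountered first among the nearest neighbours. The distance, the default `k` and the tie-breaking rule are assumptions rather than the settings used in the paper, and the function names are hypothetical.

```python
import math
from collections import Counter
from typing import List, Sequence

Curve = Sequence[float]


def l2_distance(x: Curve, y: Curve, h: float = 1.0) -> float:
    """Riemann-sum approximation of the L2 distance between two discretized curves."""
    return math.sqrt(h * sum((a - b) ** 2 for a, b in zip(x, y)))


def knn_classify(x: Curve, curves: List[Curve], labels: List[str],
                 k: int = 3, h: float = 1.0) -> str:
    """Majority vote among the k training curves closest to x."""
    nearest = sorted(range(len(curves)), key=lambda i: l2_distance(x, curves[i], h))[:k]
    votes = Counter(labels[i] for i in nearest)
    # most_common orders equal counts by first occurrence, i.e. by proximity here
    return votes.most_common(1)[0][0]
```

Swapping in another functional distance only requires replacing `l2_distance`; unlike the depth-based rules, this classifier does not use any notion of centrality.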

    Functional outlier detection with a local spatial depth

    This paper proposes methods to detect outliers in functional datasets. We are interested in challenging scenarios where functional samples are contaminated by outliers that may be difficult to recognize. The task of identifying atypical curves is carried out using the recently proposed kernelized functional spatial depth (KFSD). KFSD is a local depth that can be used to order the curves of a sample from the most to the least central. Since outliers are usually among the least central curves, we introduce three new procedures that provide a threshold value for KFSD such that curves with depth values lower than the threshold are detected as outliers. The results of a simulation study show that our proposals generally outperform a battery of competitors. Finally, we consider a real application with environmental data consisting of levels of nitrogen oxides.
This research was partially supported by Spanish Ministry of Science and Innovation grant ECO2011-25706 and by Spanish Ministry of Economy and Competition grant ECO2012-3844
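The final detection step, flagging curves whose depth falls below a threshold, can be sketched as follows. The three procedures proposed in the paper are different ways of choosing that threshold and are not reproduced here. As a stand-in, this sketch assumes a simple rule: the threshold is the lower empirical `q`-quantile of depth values from a reference sample assumed to be free of outliers. The names `empirical_quantile` and `flag_outliers` and the parameter `q` are hypothetical.

```python
import math
from typing import List, Sequence


def empirical_quantile(values: Sequence[float], q: float) -> float:
    """Lower empirical quantile: the ceil(q * n)-th smallest value, for 0 < q < 1."""
    if not 0 < q < 1:
        raise ValueError("q must lie in (0, 1)")
    ordered = sorted(values)
    k = max(1, math.ceil(q * len(ordered)))
    return ordered[k - 1]


def flag_outliers(depths: Sequence[float], reference_depths: Sequence[float],
                  q: float = 0.05) -> List[int]:
    """Indices of curves whose depth is strictly below the reference q-quantile."""
    threshold = empirical_quantile(reference_depths, q)
    return [i for i, d in enumerate(depths) if d < threshold]
```

With depth values computed by KFSD for the curves under study, the function returns the indices of the curves detected as outliers. How a suitable threshold is obtained without a clean reference sample is precisely the question the paper's procedures address.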

    Advances in Understanding High-Mass X-ray Binaries with INTEGRAL and Future Directions

    High-mass X-ray binaries are among the brightest X-ray sources in the Milky Way, as well as in nearby galaxies. Thanks to their highly variable emission and complex phenomenology, they have attracted the interest of the high-energy astrophysics community since the dawn of X-ray astronomy. In more recent years, they have challenged our understanding of physical processes in many more energy bands, ranging from the infrared to very high energies. In this review, we provide a broad but concise summary of the physical processes dominating the emission from high-mass X-ray binaries across virtually the whole electromagnetic spectrum. These include the interaction of stellar winds with the strong gravitational and magnetic fields of compact objects, the behaviour of matter under extreme magnetic and gravitational conditions, and the perturbation of massive-star evolution by their presence in a binary system. We highlight the role of the INTEGRAL mission in the discovery of many of the most interesting objects in the high-mass X-ray binary class and its contribution to reviving interest in these sources over the past two decades. We show how INTEGRAL discoveries have not only significantly increased the number of known high-mass X-ray binaries, thus advancing our understanding of the population as a whole, but have also opened new windows of investigation that stimulated the multi-wavelength approach now common in most astrophysical research fields.
We conclude the review with an overview of future facilities planned from the X-ray to the very-high-energy domain, which will hopefully help answer the many questions left open after more than 18 years of INTEGRAL scientific observations.
The INTEGRAL teams in the participating countries acknowledge the continuous support from their space agencies and funding organizations: the Italian Space Agency ASI (via different agreements including the latest one, 2019-35HH, and the ASI-INAF agreement 2017-14-H.0), the French Centre national d’études spatiales (CNES), the Russian Foundation for Basic Research (KP, 19-02-00790), the Russian Science Foundation (ST, VD, AL; 19-12-00423), and the Spanish State Research Agency (via different grants including ESP2017-85691-P, ESP2017-87676-C5-1-R and Unidad de Excelencia María de Maeztu – CAB MDM-2017-0737). IN is partially supported by the Spanish Government under grant PGC2018-093741-B-C21/C22 (MICIU/AEI/FEDER, UE). LD acknowledges grant 50 OG 1902
