Spatial depth-based methods for functional data
International Mention in the doctoral degree.
In this thesis we deal with functional data, and in particular with the notion of functional depth. A functional depth is a measure that allows us to order and rank the curves in a functional
sample from the most to the least central curve. In functional data analysis (FDA), unlike in
univariate statistics where R provides a natural order criterion for observations, the
existing functional depths rank curves in different ways. Moreover, there is no agreement
on whether a best functional depth exists. For these reasons, among others,
functional depth is still an area of ongoing research, and this thesis intends to contribute to
the progress in this field of FDA.
As a first contribution, we enlarge the number of available functional depths by introducing
the kernelized functional spatial depth (KFSD). In the course of the dissertation, we show that
KFSD is the result of a modification of an existing functional depth known as functional spatial
depth (FSD). FSD falls into the category of global functional depths, which means that the
FSD value of a given curve relative to a functional sample depends equally on the rest of the
curves in the sample. However, first in the multivariate framework, where the notion of
depth is also used, and then in FDA, several authors suggested that a local approach to the depth
problem may prove useful. Therefore, some local depths, for which the depth value of a given
observation depends more on close than on distant observations, have been proposed in the literature.
Unlike FSD, KFSD falls into the category of local depths, and it can be interpreted as a
local version of FSD. As the name of KFSD suggests, we achieve the transition from global to
local by proposing a kernel-type modification of FSD.
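The global versus local distinction can be illustrated with a minimal sketch. The abstract does not state the formulas, so the code below assumes the usual spatial-depth form (one minus the norm of the average unit vector pointing from the sample curves to the curve of interest) and, for the kernelized version, a Gaussian kernel evaluated through the kernel trick. The function names, the bandwidth `sigma`, the grid step `dt` and the exact form of the kernel are illustrative assumptions, not the thesis's implementation.

```python
import numpy as np

def fsd(x, sample, dt=1.0):
    """Sketch of a functional spatial depth for curves observed on a common grid.

    x: array (T,), sample: array (n, T), dt: grid step used in the L2 norm.
    """
    diffs = x - sample                                  # (n, T)
    norms = np.sqrt((diffs ** 2).sum(axis=1) * dt)      # discretized L2 norms
    keep = norms > 0                                    # skip curves equal to x
    units = diffs[keep] / norms[keep, None]             # unit "direction" curves
    mean_unit = units.sum(axis=0) / len(sample)
    return 1.0 - np.sqrt((mean_unit ** 2).sum() * dt)

def kfsd(x, sample, sigma, dt=1.0):
    """Sketch of a kernelized spatial depth (assumed Gaussian kernel).

    Distances are taken in the feature space induced by the kernel, so
    curves far from x (relative to sigma) contribute almost nothing local.
    """
    def kappa(a, b):
        d2 = ((a - b) ** 2).sum(axis=-1) * dt
        return np.exp(-d2 / sigma ** 2)

    n = len(sample)
    kxy = kappa(x, sample)                                   # (n,)
    kyy = kappa(sample[:, None, :], sample[None, :, :])      # (n, n)
    # Inner products <phi(x)-phi(y_i), phi(x)-phi(y_j)>, using kappa(x, x) = 1.
    num = 1.0 + kyy - kxy[:, None] - kxy[None, :]
    den = np.sqrt(2.0 - 2.0 * kxy)                           # ||phi(x)-phi(y_i)||
    keep = den > 0
    num = num[np.ix_(keep, keep)]
    den = den[keep]
    total = (num / np.outer(den, den)).sum()                 # squared norm of the sum
    return 1.0 - np.sqrt(max(total, 0.0)) / n
```

With a large `sigma` the kernel is nearly flat and all curves matter about equally; with a small `sigma` the depth value is driven mostly by the curves close to `x`, which is the local behaviour described above.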
KFSD, like any functional depth, may prove useful for several purposes. For instance, using KFSD it is possible to identify the most central curve in a functional sample, that
is, the KFSD-based sample median. Also, using the p% most central curves, we can draw a
p%-central region (0 < p < 100). Another application is the computation of robust means such
as the α-trimmed mean, 0 < α < 1, which consists of the functional mean calculated after
deleting the proportion α of least central curves. The use of functional depths in FDA has gone
beyond the previous examples, and nowadays functional depths are also used to solve other
types of problems. In particular, in this thesis we consider supervised functional classification
and functional outlier detection, and we study and propose methods based on KFSD.
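The three descriptive uses mentioned above (median, central region, trimmed mean) can be sketched once a depth value is available for every curve. The snippet below is an illustration only: it takes the depth values as given (from KFSD or any other depth), the function names are hypothetical, and drawing the central region as a pointwise envelope of the deepest curves is an assumption about how the region is displayed.

```python
import numpy as np

def depth_median(sample, depths):
    """Return the deepest (most central) curve of the sample."""
    return sample[np.argmax(depths)]

def central_region(sample, depths, p):
    """Pointwise envelope (lower, upper) of the p% deepest curves, 0 < p < 100."""
    n_keep = max(1, int(np.ceil(len(sample) * p / 100.0)))
    deepest = np.argsort(depths)[::-1][:n_keep]      # indices by decreasing depth
    region = sample[deepest]
    return region.min(axis=0), region.max(axis=0)

def trimmed_mean(sample, depths, alpha):
    """Functional mean after dropping the proportion alpha of least deep curves."""
    n_drop = int(np.floor(len(sample) * alpha))
    kept = np.argsort(depths)[n_drop:]               # discard the n_drop shallowest
    return sample[kept].mean(axis=0)
```

Here `sample` is an `(n, T)` array of discretized curves and `depths` an array of `n` depth values in the same order.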
Our approach to both classification and outlier detection has a main feature: we are interested
in scenarios where the solution of the problem is not graphically obvious. In more
detail, in classification we focus on cases in which the different groups of curves are hardly recognizable
by looking at a graph, and we set aside problems where the classes of curves are easily
detectable graphically. Similarly, we do not deal with outliers that are excessively distant from
the rest of the curves, but we consider low magnitude, shape and partial outliers, which are
harder to detect. We deal with this type of problem because in these challenging scenarios it
is possible to appreciate important differences among both depths and methods, while these
differences tend to be much smaller in easier problems.
Regarding classification, methods based on functional depths are already available. In this
thesis we consider three existing depth-based procedures. For the first time, several functional
depths (KFSD and six more depths) are employed to implement these depth-based techniques.
The main result is that KFSD stands out among its competitors. Indeed, KFSD, when used
together with one of the depth-based methods, namely the within maximum depth procedure,
shows the best and most stable performance throughout a simulation study that considers six different
curve generating processes, as well as in the classification of two real datasets. Therefore, the
results support the introduction of KFSD as a new functional depth.
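As a rough illustration of a maximum-depth classification rule, the sketch below assigns a new curve to the group in which it is deepest. The abstract does not spell out the within maximum depth procedure, so the reading used here (the curve is added to each group's sample before its depth in that group is computed) is an assumption, as are the function names. A plain spatial depth stands in for KFSD to keep the example short; any depth function with the same signature could be passed instead.

```python
import numpy as np

def spatial_depth(x, sample):
    """Stand-in depth: one minus the norm of the average unit difference vector."""
    diffs = x - sample
    norms = np.linalg.norm(diffs, axis=1)
    keep = norms > 0                                  # ignore copies of x itself
    units = diffs[keep] / norms[keep, None]
    return 1.0 - np.linalg.norm(units.sum(axis=0)) / len(sample)

def max_depth_classify(x, groups, depth=spatial_depth):
    """Assign x to the label whose (augmented) group gives x the highest depth.

    groups: dict mapping a label to an (n_label, T) array of training curves.
    """
    scores = {
        label: depth(x, np.vstack([curves, x]))       # x joins the group first
        for label, curves in groups.items()
    }
    return max(scores, key=scores.get)
```

Calling `max_depth_classify(x, {"a": curves_a, "b": curves_b})` returns the label of the group in which `x` looks most central.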
Regarding outlier detection, we also consider some existing depth-based procedures
and the above-mentioned battery of functional depths. In addition, we propose three
new methods exclusively designed for KFSD. They are all based on a desirable feature of a functional depth, that is, a functional depth should assign a low depth value to an outlier. During
our research, we have observed that KFSD is endowed with this feature. Moreover, thanks
to its local approach, KFSD in general succeeds in correctly ranking outliers that do not stand
out evidently in a graph. However, a low KFSD value is not enough to detect outliers, and it is
necessary to have a threshold value for KFSD to distinguish between normal curves
and outliers. Indeed, the three methods that we present provide alternative ways to choose a
threshold for KFSD. The simulation study that we carry out for outlier detection is as
extensive as the one for classification. Besides our proposals, we consider three existing depth-based
methods and seven depths, and two techniques that do not use functional depths. The results
of this second simulation study are also encouraging: the proposed KFSD-based methods are
the only procedures that show good outlier detection performance in all six scenarios
and for the two contamination probabilities that we consider.
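The final detection step described above reduces to comparing depth values with a threshold. The sketch below shows only that step: the three threshold-selection methods proposed in the thesis are not described in this abstract and are not reproduced, so `threshold` is a placeholder supplied by the caller and `flag_outliers` is a hypothetical helper name.

```python
import numpy as np

def flag_outliers(depths, threshold):
    """Return the indices of curves whose depth is strictly below the threshold."""
    depths = np.asarray(depths, dtype=float)
    return np.flatnonzero(depths < threshold)
```

The depth values would come from KFSD computed for each curve with respect to the whole sample; how strict the rule is depends entirely on how the threshold is chosen.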
To summarize, in this thesis we will present a new local functional depth, KFSD, which will
turn out to be a useful tool in supervised classification, when it is used in conjunction with some
existing depth-based methods, and in outlier detection, by means of some new procedures that
we will also present in this work. From a methodological point of view, the proposed outlier detection procedures are supported by theoretical results of a probabilistic nature.
The author and the advisors had the partial support of the following research projects: Spanish Ministry of Science and Innovation grant ECO2011-25706 and Spanish Ministry of Economy and Competition grant ECO2012-38442.
Doctoral program: Programa Oficial de Doctorado en Economía de la Empresa y Métodos Cuantitativos
Chair: Juan Romo Urroz; Secretary: Manuel Febrero Bande; Member: Ricardo Fraima
Spatial depth-based classification for functional data
Functional data are becoming increasingly available and tractable thanks to recent
technological advances. We enlarge the number of functional depths by defining two
new depth functions for curves. Both depths are based on a spatial approach: the
functional spatial depth (FSD), that shows an interesting connection with the functional
extension of the notion of spatial quantiles, and the kernelized functional spatial depth
(KFSD), which is useful for studying functional samples that require an analysis at a
local level. Afterwards, we consider supervised functional classification problems, and
in particular we focus on cases in which the samples may contain outlying curves. For
these situations, some robust methods based on the use of functional depths are
available. By means of a simulation study, we show how FSD and KFSD perform as
depth functions for these depth-based methods. The results indicate that a spatial depth-based
classification approach may prove helpful when the datasets are contaminated,
and that in general it is stable and satisfactory when compared with a benchmark procedure
such as the functional k-nearest neighbor classifier. Finally, we also illustrate our
approach with a real dataset.
This research was partially supported by
Spanish Ministry of Education and Science grant 2007/04438/001, by Madrid Region
grant 2011/00068/001, by Spanish Ministry of Science and Innovation grant
2012/00084/001 and by MCI grant MTM2008-03010
Functional outlier detection with a local spatial depth
This paper proposes methods to detect outliers in functional datasets. We are interested in challenging scenarios where functional samples are contaminated by outliers that may be difficult to recognize. The task of identifying atypical curves is carried out using the recently proposed kernelized functional spatial depth (KFSD). KFSD is a local depth that can be used to order the curves of a sample from the most to the least central. Since outliers are usually among the least central curves, we introduce three new procedures that provide a threshold value for KFSD such that curves with depth values lower than the threshold are detected as outliers. The results of a simulation study show that our proposals generally outperform a battery of competitors. Finally, we consider a real application with environmental data consisting of levels of nitrogen oxides.
This research was partially supported by Spanish Ministry of
Science and Innovation grant ECO2011-25706 and by Spanish Ministry of Economy
and Competition grant ECO2012-3844
Advances in Understanding High-Mass X-ray Binaries with INTEGRAL and Future Directions
High mass X-ray binaries are among the brightest X-ray sources in the Milky Way, as well as in nearby galaxies. Thanks to their highly variable emissions and complex phenomenology, they have attracted the interest of the high energy astrophysical community since the dawn of X-ray astronomy. In more recent years, they have challenged our comprehension of physical processes in many more energy bands, ranging from the infrared to very high energies. In this review, we provide a broad but concise summary of the physical processes dominating the emission from high mass X-ray binaries across virtually the whole electromagnetic spectrum. These comprise the interaction of stellar winds with the high gravitational and magnetic fields of compact objects, the behaviour of matter under extreme magnetic and gravity conditions, and the perturbation of the massive star evolutionary processes by their presence in a binary system. We highlight the role of the INTEGRAL mission in the discovery of many of the most interesting objects in the high mass X-ray binary class and its contribution to reviving interest in these sources over the past two decades. We show how the INTEGRAL discoveries have not only contributed to significantly increasing the number of high mass X-ray binaries known, thus advancing our understanding of the population as a whole, but have also opened new windows of investigation that stimulated the multi-wavelength approach nowadays common in most astrophysical research fields.
We conclude the review by providing an overview of future facilities being planned from the X-ray to the very high energy domain that will hopefully help us in finding an answer to the many questions left open after more than 18 years of INTEGRAL scientific observations.
The INTEGRAL teams in the participating countries acknowledge the continuous support from their space agencies and funding organizations: the Italian Space Agency ASI (via different agreements including the latest one, 2019-35HH, and the ASI-INAF agreement 2017-14-H.0), the French Centre national d’études spatiales (CNES), the Russian Foundation for Basic Research (KP, 19-02-00790), the Russian Science Foundation (ST, VD, AL; 19-12-00423), the Spanish State Research Agency (via different grants including ESP2017-85691-P, ESP2017-87676-C5-1-R and Unidad de Excelencia María de Maeztu – CAB MDM-2017-0737). IN is partially supported by the Spanish Government under grant PGC2018-093741-B-C21/C22 (MICIU/AEI/FEDER, UE). LD acknowledges grant 50 OG 1902