Deviation from power law of the global seismic moment distribution
The distribution of seismic moment is of central interest for evaluating earthquake hazard, particularly regarding the most extreme events. We use likelihood-ratio tests to compare the simple Gutenberg-Richter power-law (PL) distribution with two statistical models that incorporate an exponential tail, the so-called tapered Gutenberg-Richter (Tap) and the truncated gamma, when fitted to the global CMT earthquake catalog. Although the Tap distribution does not introduce any significant improvement of fit with respect to the PL, the truncated gamma does. Simulated samples of this distribution, with parameters β = 0.68 and m = 9.15, reshuffled in order to mimic the time occurrence of the order statistics of the empirical data, are able to explain the temporal heterogeneity of global seismicity both before and after the great Sumatra-Andaman earthquake of 2004.
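As an illustration of the likelihood-ratio comparison described above, the sketch below fits a pure power law (closed-form MLE) and a tapered Gutenberg-Richter model (numerical MLE) to a synthetic Pareto sample and computes the likelihood-ratio statistic. The synthetic data, the unit threshold, and the optimizer settings are illustrative assumptions, not the paper's actual CMT-catalog analysis.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(42)

# Hypothetical sample of scaled seismic moments above threshold m_min = 1
# (pure Pareto here, so the power law should not be rejected).
m = 1.0 + rng.pareto(0.7, size=3000)
n = len(m)
log_m = np.log(m)

# Gutenberg-Richter power law: f(m) = beta * m^(-beta-1) for m >= 1.
beta_pl = n / log_m.sum()                      # closed-form MLE
ll_pl = n * np.log(beta_pl) - (beta_pl + 1) * log_m.sum()

# Tapered Gutenberg-Richter: S(m) = m^(-beta) * exp((1 - m)/theta),
# f(m) = (beta/m + 1/theta) * S(m).  The PL is recovered as theta -> inf.
def neg_ll_tap(params):
    beta, log_theta = params
    theta = np.exp(log_theta)
    if beta <= 0:
        return np.inf
    return -(np.log(beta / m + 1.0 / theta).sum()
             - beta * log_m.sum()
             + ((1.0 - m) / theta).sum())

res = minimize(neg_ll_tap, x0=[beta_pl, 10.0], method="Nelder-Mead")
ll_tap = -res.fun

# Likelihood-ratio statistic; large values favour the tapered model.
lr = 2.0 * (ll_tap - ll_pl)
print(f"beta_PL = {beta_pl:.3f}, LR = {lr:.3f}")
```

Because the models are nested only at the boundary (corner moment going to infinity), the usual chi-square calibration of the statistic is only approximate, which is one reason careful likelihood-ratio testing matters here.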
Modelling and predicting extreme behavior in critical real-time systems with advanced statistics
In the last decade, the market for Critical Real-Time Embedded Systems (CRTES) has grown significantly. According to Global Markets Insight [1], the embedded systems market will reach a total size of US $258 billion in 2023, at an average annual growth rate of 5.6%. The extensive use of CRTES in domains such as the automotive, aerospace and avionics industries demands ever-increasing performance [2]. To satisfy those requirements, the CRTES industry has adopted more complex processors, more memory modules, and accelerator units; the demanding performance requirements have thus led to a convergence of CRTES with high-performance systems. All of these industries work within the framework of CRTES, which places several restrictions on their design and implementation. Real-time systems are required to deliver a response to an event within a restricted time frame, or deadline. Real-time systems in which missing a deadline provokes a total system failure (hard real-time systems) must satisfy guidelines and standards showing that they comply with tests for functional and timing behaviour. These standards vary by industry: for instance, the automotive industry follows ISO 26262 [3] and the aerospace industry follows DO-178C [4]. Researchers have developed techniques to analyse the timing correctness of a CRTES. Here, we examine how these techniques perform in the estimation of the Worst-Case Execution Time (WCET), the maximum time that a particular piece of software takes to execute. Estimating its value is crucial from a timing-analysis point of view. However, there is still no generalised, precise and safe method to produce WCET estimates [5]. In a CRTES, WCET estimates cannot be lower than the true WCET, as such estimates are deemed unsafe; but neither can they exceed it by a significant margin, as they would then be deemed pessimistic and impractical.
Software timing analysis for complex hardware with survivability and risk analysis
The increasing automation of safety-critical real-time systems, such as those in cars and planes, leads to more complex and performance-demanding on-board software and the subsequent adoption of multicores and accelerators. This increases the dispersion of software execution times, due to variable-latency resources such as caches, NoCs, advanced memory controllers and the like. Statistical analysis has been proposed to model the Worst-Case Execution Time (WCET) of software running on such complex systems by providing reliable probabilistic WCET (pWCET) estimates. However, the statistical models used so far, which are based on risk analysis, are overly pessimistic by construction. In this paper we prove that statistical survivability and risk analyses are equivalent in terms of tail analysis and, building upon survivability analysis theory, we show that Weibull tail models can be used to estimate pWCET distributions reliably and tightly. In particular, our methodology proves the correctness-by-construction of the approach, and our evaluation provides evidence of the tightness of the pWCET estimates obtained, which can reliably be decreased by 40% for a railway case study with respect to state-of-the-art exponential tails. This work is a collaboration between Argonne National Laboratory and the Barcelona Supercomputing Center within the Joint Laboratory for Extreme-Scale Computing. This research is supported by the U.S. Department of Energy, Office of Science, Office of Advanced Scientific Computing Research, under contract number DE-AC02-06CH11357, program manager Laura Biven; by the Spanish Government (SEV2015-0493); by the Spanish Ministry of Science and Innovation (contract TIN2015-65316-P); and by Generalitat de Catalunya (contract 2014-SGR-1051). Peer Reviewed. Postprint (author's final draft).
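A generic peaks-over-threshold sketch of the idea of fitting a Weibull tail model to execution-time measurements is shown below. The synthetic timing data, the 95% threshold, and the 1e-9 target exceedance probability are assumptions chosen for illustration; this is not the survivability-based methodology of the paper itself.

```python
import numpy as np
from scipy.stats import weibull_min

rng = np.random.default_rng(1)

# Hypothetical measured execution times (arbitrary units).
times = rng.gamma(shape=9.0, scale=10.0, size=20_000)

# Model the upper tail: exceedances over a high empirical threshold.
u = np.quantile(times, 0.95)
exceed = times[times > u] - u

# Fit a Weibull tail to the exceedances (location pinned at 0).
c, loc, scale = weibull_min.fit(exceed, floc=0.0)

# pWCET estimate at a target per-run exceedance probability:
# P(T > u + x) ~= P(T > u) * P(exceedance > x).
p_target = 1e-9
p_u = (times > u).mean()
x = weibull_min.ppf(1.0 - p_target / p_u, c, loc=0.0, scale=scale)
pwcet = u + x
print(f"Weibull shape = {c:.2f}, pWCET(1e-9) ~= {pwcet:.1f}")
```

The point of a Weibull rather than an exponential tail is flexibility: with shape parameter above 1 the tail decays faster than exponential, which is what allows tighter (less pessimistic) pWCET estimates when the data support it.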
Introduction to statistics with appropriate technological (ICT) resources for courses with multiple groups
In this paper we present an experience developed over four years with first-year Biology students. Through it we have been able to motivate them to approach the study of Statistics as a subject that belongs to the degree they have chosen, decisively increasing their participation in the exams. The initial idea was to carry out with the students a statistics fieldwork project through each of its phases. To achieve this, we introduced practical computer sessions and adapted the teaching methodology, updating parts of the syllabus. In parallel, the number of students passing the course has clearly risen. The new information and communication technologies make it possible to work with a large number of students (500) in a way that appears personalised: each student is assigned a different but equivalent piece of work, which leads them to work collaboratively while feeling individually attended to and assessed.
Introduction to statistics with appropriate technological (ICT) resources for courses with multiple groups: practical-work grader
Over the last four years, the professors from the mathematics department teaching the statistics course in the first year of Biology have developed a teaching model using technological resources for an introductory statistics course. This model can be applied to many of the degrees our university offers, and it is particularly interesting for courses with multiple groups. Anyone who has taught such courses knows there are two possibilities: either each professor works independently with their own group, or they work together as a team. Our choice is to work as a team. The model we present is a piece of software we have created for the practical part of the course; it supports teamwork and solves the problems that teamwork entails (such as marking). Note that every student has a different but equivalent assignment, which leads them to work collaboratively while feeling individually attended to and assessed.
Probability estimation of a Carrington-like geomagnetic storm
Other grants: RecerCaixa 2015ACUP00129 and Fundación Santander Universidades. Intense geomagnetic storms can cause severe damage to electrical systems and communications. This work proposes a counting process with Weibull inter-occurrence times in order to estimate the probability of extreme geomagnetic events. It is found that the scale parameter of the inter-occurrence time distribution grows exponentially with the absolute value of the intensity threshold defining the storm, whereas the shape parameter remains rather constant. The model is able to forecast the probability of occurrence of an event for a given intensity threshold; in particular, the probability of occurrence in the next decade of an extreme event of magnitude comparable to or larger than the well-known Carrington event of 1859 is explored, and estimated to be between 0.46% and 1.88% (with 95% confidence), a much lower value than those reported in the existing literature.
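The renewal-process idea above can be sketched in a few lines: with Weibull-distributed inter-occurrence times, the probability of at least one event in the next decade, given the time elapsed since the last qualifying storm, follows from the conditional survival function. The shape and scale parameters and the elapsed time below are illustrative placeholders, not the fitted values from the study.

```python
import numpy as np

# Weibull survival S(t) = exp(-(t/scale)^shape) for the inter-occurrence
# times between storms above a given intensity threshold.
# Parameter values are hypothetical, for illustration only.
shape, scale = 0.8, 300.0          # scale in years (placeholder)

def weibull_sf(t, shape, scale):
    """Weibull survival function."""
    return np.exp(-((t / scale) ** shape))

def prob_next(horizon, elapsed, shape, scale):
    """P(at least one event within `horizon` years | `elapsed` years
    have already passed since the last event)."""
    s0 = weibull_sf(elapsed, shape, scale)
    s1 = weibull_sf(elapsed + horizon, shape, scale)
    return 1.0 - s1 / s0

# Decade-ahead probability, conditioning on the years since 1859.
p = prob_next(horizon=10.0, elapsed=2024 - 1859, shape=shape, scale=scale)
print(f"P(Carrington-like event in next decade) ~= {p:.4f}")
```

With a shape parameter below 1 the hazard rate decreases with elapsed time, which is one mechanism by which such a model can yield lower decade-ahead probabilities than memoryless (exponential) estimates.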
Using Markov’s inequality with power-of-k function for probabilistic WCET estimation
Deriving WCET estimates for software programs with probabilistic means (a.k.a. pWCET estimation) has received significant attention in recent years as a way to deal with the increased complexity of the processors used in real-time systems. Many works build on Extreme Value Theory (EVT), which is fed with a sample of the collected data (execution times). In its application, EVT carries two sources of uncertainty: the first is intrinsic to the EVT model and relates to determining the subset of the sample that belongs to the (upper) tail, and hence is actually used by EVT for prediction; the second is induced by the sampling process and hence is inherent to all sample-based methods. In this work, we show that Markov's inequality can be used to obtain provably trustworthy probabilistic bounds on the tail of a distribution without incurring any model-intrinsic uncertainty. It produces pessimistic estimates, however, which we shave substantially by proposing the use of a power-of-k function instead of the default identity function in Markov's inequality. Lastly, we propose a method to deal with sampling uncertainty for Markov's inequality that consistently improves on EVT estimates on synthetic and real data obtained from a railway application. This work has been partially supported by the Spanish Ministry of Economy and Competitiveness (MINECO) under grant PID2019-110854RB-I00 / AEI / 10.13039/501100011033 and the European Research Council (ERC) under the European Union's Horizon 2020 research and innovation programme (grant agreement No. 772773). Peer Reviewed. Postprint (published version).
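The power-of-k refinement can be illustrated directly: for a non-negative X, P(X >= t) = P(X^k >= t^k) <= E[X^k] / t^k, so one may pick the k that minimises the bound. The sample distribution and candidate bound below are illustrative assumptions, and the empirical moments stand in for the true ones, which is the sampling uncertainty the paper addresses separately.

```python
import numpy as np

rng = np.random.default_rng(7)

# Hypothetical positive execution-time sample (arbitrary units).
x = rng.lognormal(mean=3.0, sigma=0.3, size=50_000)

t = 40.0   # candidate pWCET bound to certify

# Plain Markov bound: P(X >= t) <= E[X] / t.
markov = x.mean() / t

# Power-of-k Markov bound: P(X >= t) <= E[X^k] / t^k.
# Scan candidate exponents and keep the tightest empirical bound.
ks = np.arange(1, 21)
bounds = [(x ** k).mean() / t ** k for k in ks]
k_best = int(ks[np.argmin(bounds)])
power_k = min(bounds)

print(f"Markov: {markov:.3e}   power-of-{k_best}: {power_k:.3e}")
```

Since k = 1 recovers plain Markov, the optimised bound is never worse; for light-enough tails the improvement is typically several orders of magnitude.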
HRM: merging hardware event monitors for improved timing analysis of complex MPSoCs
The Performance Monitoring Unit (PMU) in MPSoCs is at the heart of the latest measurement-based timing analysis techniques in Critical Embedded Systems. In particular, hardware event monitors (HEMs) in the PMU are used as building blocks in the process of budgeting and verifying software timing by tracking and controlling access counts to shared resources. While the number of HEMs in current MPSoCs reaches hundreds, they are read via Performance Monitoring Counters, whose number is usually limited to 4-8, thus requiring multiple runs of each experiment in order to collect all desired HEMs. Despite the effort of engineers in controlling the execution conditions of each experiment, the complexity of current MPSoCs makes it arguably impossible to completely remove the noise affecting each run. As a result, HEMs read in different runs are subject to different variability, and hence HEMs captured in different runs cannot be 'blindly' merged. In this work, we focus on the NXP T2080 platform, where we observed up to 59% variability across different runs of the same experiment for some relevant HEMs (e.g. processor cycles). We develop a HEM reading and merging (HRM) approach to reliably merge HEMs across different runs as a fundamental element of any measurement-based timing budgeting and verification technique. Our method builds on order statistics and the selection of an anchor HEM, read in all runs, to derive the most plausible combination of HEM readings that keeps the distribution of each HEM and its relationship with the anchor HEM intact. This work has been partially supported by the Spanish Ministry of Science and Innovation under grant PID2019-107255GB, the European Research Council (ERC) under the European Union's Horizon 2020 research and innovation programme (grant agreement No. 772773) and the HiPEAC Network of Excellence. Peer Reviewed. Postprint (author's final draft).
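A minimal sketch of a rank-based merge around an anchor HEM is shown below. The HEM names, the synthetic correlations, and the simple "pair readings by order-statistic position of the anchor" rule are illustrative simplifications of the approach, not the HRM algorithm itself.

```python
import numpy as np

rng = np.random.default_rng(3)
n_runs = 100

# Two experiment batches over the same program: each batch reads the
# anchor HEM (e.g. processor cycles) plus one batch-specific HEM.
# Values and correlations are synthetic, for illustration.
anchor_a = rng.normal(1e6, 5e4, n_runs)
hem_cache = anchor_a * 0.10 + rng.normal(0, 2e3, n_runs)   # batch 1
anchor_b = rng.normal(1e6, 5e4, n_runs)
hem_bus = anchor_b * 0.05 + rng.normal(0, 1e3, n_runs)     # batch 2

# Rank-based merge: pair readings that occupy the same order-statistic
# position of the anchor in their respective batches. This preserves
# each HEM's marginal distribution and its relation to the anchor.
order_a = np.argsort(anchor_a)
order_b = np.argsort(anchor_b)
merged = np.column_stack([
    anchor_a[order_a],        # anchor (batch-1 copy), ascending
    hem_cache[order_a],
    hem_bus[order_b],
])
print(merged.shape)
```

The design point is that simply concatenating batch-1 row i with batch-2 row i would pair readings taken under different noise conditions; aligning on the anchor's order statistics is what makes the merged rows mutually consistent.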
Statistical models for extreme values and applications
Extreme values arise in many fields of science, and their modelling is used in areas such as hydrology, insurance, finance and environmental science. The singularity of extreme values means they must be treated separately from the rest of the observed data.
From a statistical viewpoint, the object that captures extreme values is the tail of a distribution above a threshold (which we will simply call the tail). Generally, tails refer to what may happen once in a thousand times, in contrast to ordinary statistics, which focuses at most on what happens once in 20 or 100 times.
Extreme value theory (EVT) gained importance in the 1920s through problems mainly related to hydrology, which led to the first fundamental theorem of EVT, due to Fisher-Tippett (1928) and Gnedenko (1948), characterising the asymptotic distribution of the observed maximum. When everything seemed settled, another point of view emerged in the 1970s with the second fundamental theorem of EVT, due to Pickands (1975) and Balkema-de Haan (1974). This result characterises the asymptotic distribution of tails as a distribution in the generalized Pareto (GPD) family. Since then, extreme value theory has continued to evolve while often departing from the practical needs of statistical modelling; see Diebold et al. (1998).
Currently, the fields presenting the most problems related to extreme values are classified according to where the risk they produce derives from: the financial field, the environmental field or the field of health. In this work we address practical applications in the first two.
Lately, the tools, techniques and processes used in the statistical modelling of extreme values have been questioned, since from a practical point of view they have limitations. Moreover, the fact that the GPD characterises the distribution of tails has made this model the reference model, when in fact it sometimes produces unsatisfactory results; see Dutta and Perry (2006).
Let us enumerate the main challenges in the statistical modelling of extreme values. The first and second are the estimation of the tail index and of the optimal threshold at which to link with the GPD model. The third is to find alternative models to the GPD that give satisfactory results. In Coles (2001), Embrechts et al. (1997), McNeil et al. (2005) and Beirlant et al. (2004) we find satisfactory reviews of these key points in statistical modelling, but even so, as we will see in this work, there is still work to be done.
This work is divided into five chapters. Chapter 1 introduces some basic preliminaries. Chapter 2 critically reviews the state of the statistical modelling of extreme values; this review shows that the problem of estimating the GPD parameters is an obstacle to progress in modelling. We therefore address this issue in Chapter 3, where a new formulation of the model resolves the question. In this way, and together with the work of Castillo et al. (2013) on the residual coefficient of variation, we are able to conclude in Chapter 5 with a protocol for estimating the optimal threshold and the tail index that is satisfactory, manageable and, from a theoretical point of view, more rigorous than other commonly used methods. The challenge of finding new models for tails is taken up in Chapter 4, where we present a new analytical model that allows us to establish criteria for deciding whether a model is suitable for modelling tails. Finally, Chapter 5 contains the general conclusions of this work.
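The peaks-over-threshold/GPD machinery discussed above can be sketched with standard tools. The synthetic data, the 95% threshold choice, and the quantile level below are illustrative assumptions; in practice, threshold selection is precisely one of the open problems the work addresses.

```python
import numpy as np
from scipy.stats import genpareto

rng = np.random.default_rng(5)

# Hypothetical heavy-tailed observations (e.g. losses).
data = rng.pareto(2.5, size=10_000)

# Peaks-over-threshold: model exceedances over a high threshold u
# with the generalized Pareto distribution (GPD).
u = np.quantile(data, 0.95)
exceed = data[data > u] - u

# Fit the GPD (location pinned at 0); xi is the tail index.
xi, loc, sigma = genpareto.fit(exceed, floc=0.0)

# Extrapolated high quantile (the 1-in-1000 level):
# P(X > q) = P(X > u) * P(exceedance > q - u) = 1 - p.
p = 0.999
zeta_u = (data > u).mean()
q = u + genpareto.ppf(1.0 - (1.0 - p) / zeta_u, xi, loc=0.0, scale=sigma)
print(f"xi = {xi:.3f}, q_999 = {q:.2f}")
```

For a Pareto sample with index 2.5 the theoretical tail index is xi = 0.4; the estimation error of xi and the sensitivity of q to the chosen threshold u are exactly the two challenges listed in the abstract.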