
    Principal Patterns on Graphs: Discovering Coherent Structures in Datasets

    Graphs are now ubiquitous in almost every field of research. Recently, new research areas devoted to the analysis of graphs and data associated with their vertices have emerged. Focusing on dynamical processes, we propose a fast, robust and scalable framework for retrieving and analyzing recurring patterns of activity on graphs. Our method relies on a novel type of multilayer graph that encodes the spreading or propagation of events between successive time steps. We demonstrate the versatility of our method by applying it to three different real-world examples. First, we study how a rumor spreads on a social network. Second, we reveal congestion patterns of pedestrians in a train station. Finally, we show how patterns of audio playlists can be used in a recommender system. In each example, relevant information previously hidden in the data is extracted in a very efficient manner, emphasizing the scalability of our method. With a parallel implementation scaling linearly with the size of the dataset, our framework easily handles millions of nodes on a single commodity server.

    Non-stationarities in stock returns

    The paper outlines a methodology for analyzing daily stock returns that relinquishes the assumption of global stationarity. Giving up this common working hypothesis reflects our belief that fundamental features of the financial markets are continuously and significantly changing. Our approach locally approximates the non-stationary data by stationary models. The methodology is applied to the S&P 500 series of returns covering a period of over seventy years of market activity. We find most of the dynamics of this time series to be concentrated in shifts of the unconditional variance. The forecasts based on our non-stationary unconditional modeling were found to be superior to those obtained in a stationary long memory framework or to those based on a stationary Garch(1,1) data generating process.

    Keywords: stock returns, non-stationarities, locally stationary processes, volatility, sample autocorrelation, long range dependence, Garch(1,1) data generating process.
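To make the idea of shifts in the unconditional variance concrete, here is a minimal sketch that estimates variance over a sliding window, treating the series as roughly stationary inside each window. The function name `rolling_variance`, the window length, and the toy returns are assumptions for illustration only; the abstract does not specify the authors' estimator, and this is not their method.

```python
from statistics import pvariance


def rolling_variance(returns, window):
    """Population variance of each consecutive block of `window` returns.

    Hypothetical helper: a crude stand-in for a locally stationary
    approximation, not the estimator used in the paper.
    """
    if window < 2 or window > len(returns):
        raise ValueError("window must be between 2 and len(returns)")
    return [
        pvariance(returns[end - window:end])
        for end in range(window, len(returns) + 1)
    ]


# Toy series whose variance jumps halfway through (made-up numbers).
toy_returns = [1.0, -1.0, 1.0, -1.0, 3.0, -3.0, 3.0, -3.0]
local_variances = rolling_variance(toy_returns, window=4)
```

A short window reacts quickly to a variance shift but gives a noisier estimate; a long window does the opposite.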

    Doubly-online changepoint detection for monitoring health status during sports activities

    We provide an online framework for analyzing data recorded by smart watches during running activities. In particular, we focus on identifying variations in the behavior of one or more measurements caused by changes in physical condition, such as physical discomfort, periods of prolonged de-training, or even the malfunction of measuring devices. Our framework considers data as a sequence of running activities represented by multivariate time series of physical and biometric data. We combine classical changepoint detection models with an unknown number of components with Gaussian state space models to detect distributional changes between a sequence of activities. The model considers multiple sources of dependence due to the sequential nature of subsequent activities, the autocorrelation structure within each activity, and the contemporaneous dependence between different variables. We provide an online expectation-maximization (EM) algorithm involving a sequential Monte Carlo (SMC) approximation of changepoint predicted probabilities. As a byproduct of our model assumptions, our proposed approach processes sequences of multivariate time series in a doubly-online framework. While classical changepoint models detect changes between subsequent activities, the state space framework, coupled with the online EM algorithm, provides the additional benefit of estimating the real-time probability that a current activity is a changepoint.

    Predicting distributional profiles of physical activity in the NHANES database using a Partially Linear Single-Index Fréchet Regression model

    Object-oriented data analysis is a fascinating and developing field in modern statistical science with the potential to make significant and valuable contributions to biomedical applications. This statistical framework allows for the formalization of new methods to analyze complex data objects that capture more information than traditional clinical biomarkers. The paper applies the object-oriented framework to analyzing and predicting physical activity measured by accelerometers. As opposed to traditional summary metrics, we utilize a recently proposed representation of physical activity data as a distributional object, providing a more sophisticated and complete profile of individual energetic expenditure in all ranges of monitoring intensity. For the purpose of predicting these distributional objects, we propose a novel hybrid Fréchet regression model and apply it to US population accelerometer data from NHANES 2011-2014. The semi-parametric character of the new model allows us to introduce non-linear effects for essential variables, such as age, that are known from a biological point of view to have nuanced effects on physical activity. At the same time, the inclusion of a global linear term retains the advantage of interpretability for other variables, particularly categorical covariates such as ethnicity and sex. The results obtained in our analysis are helpful from a public health perspective and may lead to new strategies for optimizing physical activity interventions in specific American subpopulations.

    Detecting recessions in the Great Moderation: a real-time analysis

    The nature of the business cycle, particularly in the United States, has changed dramatically over the past several decades. In the 1970s and early 1980s, the U.S. economy often whipsawed up and down. Since then, real economic activity has stabilized considerably, entering a period economists call the “Great Moderation.” With the ups and downs of the economy becoming less dramatic, it has become harder to determine in real time when the economy dips into recession.

    Economists have a variety of methods to determine when the economy is entering a recession. These methods range from directly analyzing a broad spectrum of data to the formal use of recession prediction models. The National Bureau of Economic Research (NBER) uses the first approach, relying on several data series to determine when the economy enters or exits a recession. Its decisions are intended to be accurate, not timely. More formal recession prediction models are designed to send a timely signal, but often do not take account of how the Great Moderation has altered the business cycle.

    Davig uses a framework that efficiently uses a large set of data in a “business cycle tracking” model. The model accounts for shifts in overall economic volatility, to capture the Great Moderation, and sends a signal when the economy is shifting between periods of low and high economic activity. The model can be used in different ways to extract a signal regarding whether the economy is likely heading for an NBER recession.

    Statistical Analysis of Zebrafish Locomotor Response

    Zebrafish larvae display rich locomotor behavior upon external stimulation. The movement can be simultaneously tracked from many larvae arranged in multi-well plates. The resulting time-series locomotor data have been used to reveal new insights into neurobiology and pharmacology. However, the data are of large scale, and the corresponding locomotor behavior is affected by multiple factors. These issues pose a statistical challenge for comparing larval activities. To address this gap, this study has analyzed a visually-driven locomotor behavior named the visual motor response (VMR) by the Hotelling's T-squared test. This test is well suited to comparing locomotor profiles over a time period. Different wild-type (WT) strains were compared using the test, which shows that they responded differently to light change at different developmental stages. The performance of this test was evaluated by a power analysis, which shows that the test was sensitive for detecting differences between experimental groups with sample numbers that were commonly used in various studies. In addition, this study investigated the effects of various factors that might affect the VMR by multivariate analysis of variance (MANOVA). The results indicate that the larval activity was generally affected by stage, light stimulus, their interaction, and location in the plate. Nonetheless, different factors affected larval activity differently over time, as indicated by a dynamical analysis of the activity at each second. Intriguingly, this analysis also shows that biological and technical repeats had negligible effect on larval activity. This finding is consistent with that from the Hotelling's T-squared test, and suggests that experimental repeats can be combined to enhance statistical power. Together, these investigations have established a statistical framework for analyzing VMR data, a framework that should be generally applicable to other locomotor data with similar structure.
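For readers unfamiliar with the test named in this abstract, the following is a minimal sketch of the textbook two-sample Hotelling's T-squared statistic with a pooled covariance matrix. The function names and toy data are assumptions for illustration; the abstract does not describe the study's actual pipeline, and a p-value would still require the F distribution with (p, n1 + n2 - p - 1) degrees of freedom.

```python
def mean_vector(rows):
    """Column-wise mean of a list of equal-length observations."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def scatter_matrix(rows, mean):
    """Sum of outer products of deviations from the mean."""
    p = len(mean)
    s = [[0.0] * p for _ in range(p)]
    for row in rows:
        d = [x - m for x, m in zip(row, mean)]
        for a in range(p):
            for b in range(p):
                s[a][b] += d[a] * d[b]
    return s


def solve(matrix, rhs):
    """Solve matrix @ x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [list(row) + [b] for row, b in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("pooled covariance matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        tail = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - tail) / aug[r][r]
    return x


def hotelling_t2(group_a, group_b):
    """Return (T-squared, F statistic) for two groups of p-dimensional rows."""
    n1, n2 = len(group_a), len(group_b)
    m1, m2 = mean_vector(group_a), mean_vector(group_b)
    p = len(m1)
    s1, s2 = scatter_matrix(group_a, m1), scatter_matrix(group_b, m2)
    pooled = [[(s1[a][b] + s2[a][b]) / (n1 + n2 - 2) for b in range(p)]
              for a in range(p)]
    diff = [x - y for x, y in zip(m1, m2)]
    weighted = solve(pooled, diff)  # pooled^{-1} @ diff
    t2 = (n1 * n2) / (n1 + n2) * sum(d * w for d, w in zip(diff, weighted))
    f_stat = (n1 + n2 - p - 1) / (p * (n1 + n2 - 2)) * t2
    return t2, f_stat


# Toy two-dimensional activity profiles (made-up numbers).
toy_a = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
toy_b = [[2.0, 0.0], [3.0, 0.0], [2.0, 1.0], [3.0, 1.0]]
t2, f_stat = hotelling_t2(toy_a, toy_b)
```

With a single measurement per observation (p = 1), the statistic reduces to the square of the pooled two-sample t statistic.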

    Flexible Methods for the Analysis of Clustered Event Data in Observational Studies

    Clustered event data are frequently encountered in observational studies. In this dissertation, I focus on correlated event outcomes clustered by subjects (multivariate events), facilities, and both hierarchically. The main approaches to analyzing correlated event data include frailty models with random effects and marginal models with robust variance estimation. Difficulties for the existing methods include a) computational demands and speed in the presence of numerous clusters (e.g., recurrent events); b) a lack of rigorous diagnostic tools to prespecify the distribution of the random effects; c) analyzing a multi-state model that follows a semi-Markov renewal process. The growing need for flexible, computationally fast, and accurate estimating approaches to analyzing clustered event data motivates my methodological exploration in the following chapters. In Chapter II, I propose a log-normal correlated frailty model to analyze recurrent event incidence rates and duration jointly. The regression parameters are estimated through a penalized partial likelihood, and the variance-covariance matrix of the frailty is estimated via a recursive estimating formula. The proposed methods are more flexible and faster than existing approaches and have the potential to be extended to other frequently encountered data structures (e.g., joint modeling with longitudinal outcomes). In Chapter III, I propose a class of semiparametric frailty models that leave the distribution of frailties unspecified. Parameter estimation proceeds through estimating equations derived from first- and second-moment conditions. Estimation techniques have been developed for three different models: a shared frailty model for a single event; a correlated frailty model for multiple events; and a hierarchically structured nested failure time model.
    Extensive simulation studies demonstrate that the proposed approach can accurately estimate the regression parameters, baseline event rates, and variance components. Moreover, the computation time is short, permitting application to very large data sets. In Chapter IV, I develop a class of multi-state rate models to study the association of exposure to lead, a major endocrine-disrupting agent, with behavioral changes captured by accelerometer measurements from the wearable device ActiGraph GT3X. Activity states are defined by categorizing personal activity counts over time with validated cutoffs, and are analyzed through their in-state transitions using the proposed multi-state rate models, in which the baseline rates are estimated nonparametrically. The proposed models combine the advantage of regular event rate models with the concept of competing risks, making it possible to incorporate a daily renewal property and share baselines in the activity transition rates across different days. The regression parameters are specified in the event rate functions, leading to a semiparametric modeling framework. Statistical inference is based on a robust sandwich variance estimator that accounts for correlations between different event types and their recurrences. I found that the evaluated exposure to lead is associated with an increased transition from low activity to vigorous activity. Chapter V is a special project on modeling the COVID-19 surveillance data in China, in which I develop two extended susceptible-infected-recovered (SIR) models under a Bayesian state-space model framework. I propose to include a time-varying transmission rate or a time-dependent quarantine process in the classical SIR model to assess the effectiveness of macro-control measures issued by the government to mitigate the pandemic. The proposed compartment models make it possible to predict both short-term and long-term prevalence of the COVID-19 infection with quantification of prediction uncertainty.
    I provide and maintain an open-source R package on GitHub (lilywang1988/eSIR) for the developed analytics.

    PhD, Biostatistics, University of Michigan, Horace H. Rackham School of Graduate Studies
    http://deepblue.lib.umich.edu/bitstream/2027.42/163013/1/lilywang_1.pd
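The idea of a time-varying transmission rate in the classical SIR model can be sketched with a deterministic, discrete-time simulation. This is only an illustration: the exponential decay form, the parameter values, and the name `simulate_sir` are assumptions, and the sketch omits the Bayesian state-space layer described in the abstract. It is not the eSIR package's implementation.

```python
import math


def simulate_sir(beta0, gamma, decay, days, s0=0.99, i0=0.01, r0=0.0):
    """Daily-step SIR on population shares with beta_t = beta0 * exp(-decay * t).

    `decay` is a hypothetical knob standing in for the effect of control
    measures; decay = 0 recovers the classical constant-rate SIR model.
    """
    s, i, r = s0, i0, r0
    path = [(s, i, r)]
    for t in range(days):
        beta_t = beta0 * math.exp(-decay * t)  # transmission rate on day t
        new_infections = beta_t * s * i
        new_recoveries = gamma * i
        s = s - new_infections
        i = i + new_infections - new_recoveries
        r = r + new_recoveries
        path.append((s, i, r))
    return path


# Compare an uncontrolled epidemic with one whose transmission rate decays.
uncontrolled = simulate_sir(beta0=0.5, gamma=0.1, decay=0.0, days=100)
controlled = simulate_sir(beta0=0.5, gamma=0.1, decay=0.2, days=100)
```

Comparing the final susceptible share of the two runs shows how a declining transmission rate leaves more of the population uninfected.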

    Descriptive And Review Study Adaptive Control Of Nonlinear Systems In Discrete Time

    Nowadays, analyzing different control systems is a must for virtually all types of modern industries and factories. Analyzing these control systems allows optimizing and streamlining processes, which in many cases are carried out manually, leading to large errors, delays and costly processes. Continuous-time adaptive control of nonlinear systems has been an area of increasing research activity [1], and global regulation and tracking results have been obtained for several types of nonlinear systems [2]. However, the adaptive technique is gradually becoming more dynamic after 25 years of research and experimentation. Important theoretical results on stability and structure have been established. There is still much theoretical work to be done [3]. On the other hand, adaptive control in discrete-time nonlinear systems has received much less attention, in part because of the difficulties associated with the sampled data of nonlinear systems [2]. Thus, investigations into the regulation of the system arise in theories where adaptive control laws are implemented that admit the nonlinearities present in the real system [4]. The purpose of this is to implement a very simple adaptive control law and to check the convergence of the closed loop. However, Zhongsheng Hou, author of several well-regarded papers, proposes a model-free adaptive control approach for a class of discrete-time nonlinear SISO systems with a systematic framework [5]-[6].