1,356 research outputs found

    Recent advances in directional statistics

    Mainstream statistical methodology is generally applicable to data observed in Euclidean space. There are, however, numerous contexts of considerable scientific interest in which the natural supports for the data under consideration are Riemannian manifolds such as the unit circle, torus, sphere and their extensions. Typically, such data can be represented using one or more directions, and directional statistics is the branch of statistics that deals with their analysis. In this paper we provide a review of the many recent developments in the field since the publication of Mardia and Jupp (1999), still the most comprehensive text on directional statistics. Many of those developments have been stimulated by interesting applications in fields as diverse as astronomy, medicine, genetics, neurology, aeronautics, acoustics, image analysis, text mining, environmetrics, and machine learning. We begin by considering developments for the exploratory analysis of directional data before progressing to distributional models, general approaches to inference, hypothesis testing, regression, nonparametric curve estimation, methods for dimension reduction, classification and clustering, and the modelling of time series, spatial and spatio-temporal data. An overview of currently available software for analysing directional data is also provided, and potential future developments discussed. Comment: 61 pages.
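
To illustrate why Euclidean methods can fail on directional data, the following is a minimal sketch (not taken from the reviewed paper) that compares the arithmetic mean of two compass bearings with their circular mean. The function names and the sample bearings are assumptions chosen for illustration.

```python
import math

def circular_mean(angles_deg):
    """Mean direction of a sample of angles (degrees), returned in [0, 360)."""
    s = sum(math.sin(math.radians(a)) for a in angles_deg)
    c = sum(math.cos(math.radians(a)) for a in angles_deg)
    return math.degrees(math.atan2(s, c)) % 360.0

def mean_resultant_length(angles_deg):
    """Concentration measure in [0, 1]; values near 1 mean tightly clustered angles."""
    s = sum(math.sin(math.radians(a)) for a in angles_deg)
    c = sum(math.cos(math.radians(a)) for a in angles_deg)
    return math.hypot(s, c) / len(angles_deg)

# Illustrative bearings either side of north (hypothetical data).
bearings = [350.0, 10.0]
arithmetic = sum(bearings) / len(bearings)   # 180.0, i.e. due south: misleading
circular = circular_mean(bearings)           # close to 0 (mod 360), i.e. due north
resultant = mean_resultant_length(bearings)  # close to cos(10 degrees)
```

Because the circular mean is only defined modulo 360 degrees, floating-point rounding may return a value just below 360 rather than exactly 0; comparisons should use angular distance.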

    Distributed Load Testing by Modeling and Simulating User Behavior

    Modern human-machine systems such as microservices rely upon agile engineering practices which require changes to be tested and released more frequently than classically engineered systems. A critical step in the testing of such systems is the generation of realistic workloads, or load testing. Generated workload emulates the expected behaviors of users and machines within a system under test in order to find potentially unknown failure states. Typical testing tools rely on static testing artifacts to generate realistic workload conditions. Such artifacts can be cumbersome and costly to maintain; however, even model-based alternatives can prevent adaptation to changes in a system or its usage. Lack of adaptation can prevent the integration of load testing into system quality assurance, leading to an incomplete evaluation of system quality. The goal of this research is to improve the state of software engineering by addressing open challenges in load testing of human-machine systems with a novel process that a) models and classifies user behavior from streaming and aggregated log data, b) adapts to changes in system and user behavior, and c) generates distributed workload by realistically simulating user behavior. This research contributes a Learning, Online, Distributed Engine for Simulation and Testing based on the Operational Norms of Entities within a system (LODESTONE): a novel approach to distributed load testing by modeling and simulating user behavior. We specify LODESTONE within the context of a human-machine system to illustrate distributed adaptation and execution in load testing processes. LODESTONE uses log data to generate and update user behavior models, cluster them into similar behavior profiles, and instantiate distributed workload on software systems. We analyze user behavioral data having differing characteristics to replicate human-machine interactions in a modern microservice environment. We discuss tools, algorithms, software design, and implementation in two different computational environments: client-server and cloud-based microservices. We illustrate the advantages of LODESTONE through a qualitative comparison of key feature parameters and experimentation based on shared data and models. LODESTONE continuously adapts to changes in the system to be tested, which allows for the integration of load testing into the quality assurance process for cloud-based microservices.

    Demand response performance and uncertainty: A systematic literature review

    The present review was carried out following the PRISMA methodology and analyzes 218 published articles. A comprehensive analysis has been conducted regarding the consumer's role in the energy market. Moreover, the methods used to address demand response (DR) uncertainty and the strategies used to enhance performance and motivate participation have been reviewed. The authors find that participants will be willing to change their consumption pattern and behavior provided that they have complete awareness of the market environment, seeking the optimal decision. The authors also find that a contextual solution, giving the right signals according to the different behaviors and to the different types of participants in the DR event, can improve the performance of consumers' participation, providing a reliable response. DR is a means of demand-side management, so both these concepts are addressed in the present paper. Finally, the pathways for future research are discussed. This article is a result of the project RETINA (NORTE-01-0145-FEDER-000062), supported by Norte Portugal Regional Operational Programme (NORTE 2020), under the PORTUGAL 2020 Partnership Agreement, through the European Regional Development Fund (ERDF). We also acknowledge the work facilities and equipment provided by GECAD research center (UIDB/00760/2020) to the project team, and grants CEECIND/02887/2017 and SFRH/BD/144200/2019.

    Proceedings of the 35th International Workshop on Statistical Modelling: July 20-24, 2020, Bilbao, Basque Country, Spain

    466 p. The International Workshop on Statistical Modelling (IWSM) is a reference workshop for promoting statistical modelling and applications of statistics among researchers, academics and industrialists in a broad sense. Unfortunately, the global COVID-19 pandemic has not allowed the 35th edition of the IWSM to be held in Bilbao in July 2020. Despite the situation, and following the spirit of the Workshop and the Statistical Modelling Society, we are delighted to bring you the proceedings book of extended abstracts.

    Hidden Markov models reveal complexity in the diving behaviour of short-finned pilot whales

    This work was supported by award RC-2154 from the Strategic Environmental Research and Development Program and funding from the Naval Facilities Engineering Command Atlantic and NOAA Fisheries, Southeast Region. DS was supported by the United States Office of Naval Research grant N00014-12-1-0204, under the project entitled Multi-study Ocean acoustics Human effects Analysis (MOCHA). Diving behaviour of short-finned pilot whales is often described by two states: deep foraging dives and shallow, non-foraging dives. However, this simple classification system ignores much of the variation that occurs during subsurface periods. We used multi-state hidden Markov models (HMM) to characterize states of diving behaviour and the transitions between states in short-finned pilot whales. We used three parameters (number of buzzes, maximum dive depth and duration) measured in 259 dives by digital acoustic recording tags (DTAGs) deployed on 20 individual whales off Cape Hatteras, North Carolina, USA. The HMM identified a four-state model as the best descriptor of diving behaviour. The state-dependent distributions for the diving parameters showed variation between states, indicative of different diving behaviours. Transition probabilities were considerably higher for state persistence than state switching, indicating that dive types occurred in bouts. Our results indicate that subsurface behaviour in short-finned pilot whales is more complex than a simple dichotomy of deep and shallow diving states, and labelling all subsurface behaviour as deep dives or shallow dives discounts a significant amount of important variation. We discuss potential drivers of these patterns, including variation in foraging success, prey availability and selection, bathymetry, physiological constraints and socially mediated behaviour.
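
The link between high state-persistence probabilities and dives occurring in bouts can be sketched numerically. In a Markov chain, the number of consecutive steps spent in a state with self-transition probability p is geometrically distributed with mean 1 / (1 - p). The four-state transition matrix below is hypothetical and is not taken from the study; it only shows how larger diagonal entries translate into longer expected bouts.

```python
# Hypothetical 4-state transition matrix; each row sums to 1.
# These values are illustrative assumptions, not the fitted estimates.
P = [
    [0.80, 0.10, 0.05, 0.05],
    [0.10, 0.75, 0.10, 0.05],
    [0.05, 0.10, 0.70, 0.15],
    [0.05, 0.05, 0.10, 0.80],
]

def expected_bout_lengths(transition_matrix):
    """Expected number of consecutive dives in each state.

    Dwell time in state i is geometric with success probability
    1 - P[i][i], so its mean is 1 / (1 - P[i][i]).
    """
    return [1.0 / (1.0 - row[i]) for i, row in enumerate(transition_matrix)]

bouts = expected_bout_lengths(P)
for state, length in enumerate(bouts):
    print(f"state {state}: expected bout length {length:.2f} dives")
```

With these assumed values, a state with persistence 0.80 is expected to last about five consecutive dives, whereas a state with persistence 0.70 lasts a little over three.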

    Effective demand response gathering and deployment in smart grids for intensive renewable integration using aggregation and machine learning

    Thesis by compendium of publications. Distributed generation, namely renewables-based technologies, has emerged as a crucial component in the transition to mitigate the effects of climate change, providing a decentralized approach to electricity production. However, the volatile behavior of distributed generation has created new challenges in maintaining system balance and reliability. In this context, the demand response concept and corresponding programs arise, giving prominence to local energy communities. Under the demand response concept, an empowerment of the consumer in the electricity sector is expected. This has a significant impact on grid operations and brings complex interactions due to volatile behavior, privacy concerns, and lack of consumer knowledge in the energy market context. Aggregators play a crucial role in addressing these challenges. It is crucial to develop tools that allow aggregators to help consumers make informed decisions, maximize the benefits of their flexibility resources, and contribute to the overall success of grid operations. This thesis, through innovative solutions and resorting to artificial intelligence models, addresses the integration of renewables, promoting fair participation among all demand response providers. The thesis ultimately results in an innovative decision support system: MAESTRO, the Machine learning Assisted Energy System management Tool for Renewable integration using demand respOnse. MAESTRO is composed of a set of diversified models that together contribute to handling the complexity of managing energy communities with distributed generation resources, demand response providers, energy storage systems and electric vehicles. This PhD thesis comprises a comprehensive analysis of state-of-the-art techniques, system design and development, experimental results, and key findings. Twenty-six scientific papers were published in the course of this research, in both international journals and conference proceedings. Contributions were also made to international and Portuguese projects.

    Bayesian networks for omics data analysis

    This thesis focuses on two aspects of high throughput technologies, i.e. data storage and data analysis, in particular in transcriptomics and metabolomics. Both technologies are part of a research field that is generally called ‘omics’ (or ‘-omics’, with a leading hyphen), which refers to genomics, transcriptomics, proteomics, or metabolomics. Although these techniques study different entities (genes, gene expression, proteins, or metabolites), they all have in common that they use high-throughput technologies such as microarrays and mass spectrometry, and thus generate huge amounts of data. Experiments conducted using these technologies allow one to compare different states of a living cell, for example a healthy cell versus a cancer cell or the effect of food on cell condition, and at different levels. The tools needed to apply omics technologies, in particular microarrays, are often manufactured by different vendors and require separate storage and analysis software for the data generated by them. Moreover, experiments conducted using different technologies cannot be analyzed simultaneously to answer a biological question. Chapter 3 presents MADMAX, our software system which supports storage and analysis of data from multiple microarray platforms. It consists of a vendor-independent database which is tightly coupled with vendor-specific analysis tools. Upcoming technologies like metabolomics, proteomics and high-throughput sequencing can easily be incorporated in this system. Once the data are stored in this system, one wants to deduce a biologically relevant meaning from these data, and here statistical and machine learning techniques play a key role. The aim of such analysis is to search for relationships between entities of interest, such as genes, metabolites or proteins. One of the major goals of these techniques is to search for causal relationships rather than mere correlations. It is often emphasized in the literature that "correlation is not causation" because people tend to jump to conclusions by making inferences about causal relationships when they actually only see correlations. Statistical methods are often good at finding these correlations; linear regression and analysis of variance form the core of applied multivariate statistics. However, these techniques cannot find causal relationships, nor can they incorporate prior knowledge of the biological domain. Graphical models, a machine learning technique, on the other hand do not suffer from these limitations. Graphical models, a combination of graph theory, statistics and information science, are one of the most exciting developments today in the field of machine learning applied to biological problems (see chapter 2 for a general introduction). This thesis deals with a special type of graphical models known as probabilistic graphical models, belief networks or Bayesian networks. The advantage of Bayesian networks over classical statistical techniques is that they allow the incorporation of background knowledge from a biological domain, and that analysis of data is intuitive as it is represented in the form of graphs (nodes and edges). Standard statistical techniques are good at describing the data but are not able to find non-linear relations, whereas Bayesian networks allow prediction and the discovery of nonlinear relations. Moreover, Bayesian networks allow hierarchical representation of data, which makes them particularly useful for representing biological data, since most biological processes are hierarchical by nature. Once we have such a causal graph, made either by a computer program or constructed manually, we can predict the effects of a certain entity by manipulating the state of other entities, or make backward inferences from effects to causes. Of course, if the graph is big, doing the necessary calculations can be very difficult and CPU-expensive, and in such cases approximate methods are used. Chapter 4 demonstrates the use of Bayesian networks to determine the metabolic state of feeding and fasting mice to determine the effect of a high fat diet on gene expression. This chapter also shows how selection of genes based on key biological processes generates more informative results than standard statistical tests. In chapter 5 the use of Bayesian networks is shown on the combination of gene expression data and clinical parameters, to determine the effect of smoking on gene expression and which genes are responsible for the DNA damage and the rise in plasma cotinine levels in the blood of a smoking population. This study was conducted at Maastricht University where 22 twin smokers were profiled. Chapter 6 presents the reconstruction of a key metabolic pathway which plays an important role in ripening of tomatoes, thus showing the versatility of the use of Bayesian networks in metabolomics data analysis. The general trend in research shows a flood of data emerging from sequencing and metabolomics experiments. This means that to perform data mining on these data one requires intelligent techniques that are computationally feasible and able to take the knowledge of experts into account to generate relevant results. Graphical models fit this paradigm well and we expect them to play a key role in mining the data generated from omics experiments.
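
The "backward inference from effects to causes" mentioned in the abstract can be sketched on the smallest possible Bayesian network: one cause node with an edge to one effect node. The example below is an illustrative assumption, not the thesis's model; the node names and all probabilities are hypothetical. It applies Bayes' rule to obtain the probability of the cause given that the effect was observed.

```python
def posterior_cause_given_effect(p_cause, p_effect_given_cause, p_effect_given_not_cause):
    """Backward inference in a two-node network Cause -> Effect.

    Returns P(cause | effect observed) by Bayes' rule:
    P(C | E) = P(C) P(E | C) / [P(C) P(E | C) + P(not C) P(E | not C)].
    """
    joint_cause = p_cause * p_effect_given_cause
    joint_not_cause = (1.0 - p_cause) * p_effect_given_not_cause
    return joint_cause / (joint_cause + joint_not_cause)

# Hypothetical numbers for a network Smoking -> ElevatedCotinine.
p_smoker = 0.30                  # prior P(smoking), assumed
p_elevated_if_smoker = 0.90      # P(elevated cotinine | smoking), assumed
p_elevated_if_nonsmoker = 0.05   # P(elevated cotinine | not smoking), assumed

posterior = posterior_cause_given_effect(
    p_smoker, p_elevated_if_smoker, p_elevated_if_nonsmoker
)
print(f"P(smoking | elevated cotinine) = {posterior:.3f}")
```

In larger networks the same posterior requires summing over all unobserved nodes, which is why the abstract notes that exact calculation becomes expensive and approximate methods are used.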