553 research outputs found

    A Comprehensive Survey on Rare Event Prediction

    Rare event prediction involves identifying and forecasting low-probability events using machine learning and data analysis. Because of imbalanced data distributions, in which common events vastly outnumber rare ones, specialized methods are required at each step of the machine learning pipeline, from data processing to algorithms to evaluation protocols. Predicting the occurrence of rare events is important for real-world applications, such as Industry 4.0, and is an active research area in statistics and machine learning. This paper comprehensively reviews current approaches to rare event prediction along four dimensions: rare event data, data processing, algorithmic approaches, and evaluation approaches. Specifically, we consider 73 datasets from different modalities (i.e., numerical, image, text, and audio), four major categories of data processing, five major algorithmic groupings, and two broader evaluation approaches. This paper aims to identify gaps in the current literature, highlight the challenges of predicting rare events, and suggest potential research directions to guide practitioners and researchers.
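    As a concrete illustration of the data-processing step the survey's taxonomy covers, the sketch below shows one of the simplest specialized methods, random oversampling of the minority class, together with precision and recall, which are far more informative than accuracy under extreme imbalance. This is a generic numpy sketch with hypothetical function names, not code from the survey.

```python
import numpy as np

def random_oversample(X, y, seed=0):
    """Balance a binary dataset by resampling the minority class with replacement."""
    rng = np.random.default_rng(seed)
    classes, counts = np.unique(y, return_counts=True)
    minority = classes[np.argmin(counts)]
    n_needed = counts.max() - counts.min()
    minority_idx = np.flatnonzero(y == minority)
    # draw extra minority samples until both classes have equal size
    extra = rng.choice(minority_idx, size=n_needed, replace=True)
    idx = np.concatenate([np.arange(len(y)), extra])
    return X[idx], y[idx]

def precision_recall(y_true, y_pred):
    """Precision and recall for the rare (positive) class; under heavy
    imbalance a classifier can reach high accuracy while recall is zero."""
    tp = np.sum((y_pred == 1) & (y_true == 1))
    fp = np.sum((y_pred == 1) & (y_true == 0))
    fn = np.sum((y_pred == 0) & (y_true == 1))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall
```

    Oversampling is only one of the data-level options the survey discusses; cost-sensitive losses and threshold tuning address the same imbalance at the algorithm and evaluation levels.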

    Contribution On the Estimation of the Copulas Parameters

    Modeling high-dimensional datasets has become increasingly difficult and challenging because of the large amount of redundancy present in the data. This redundancy introduces noise and leads to inaccurate modeling and analysis results. While numerous statistical methods have been proposed to address this problem, many involve multiple operations and have high time complexity, often resulting in poor classification performance. To address this, this thesis proposes three dimensionality reduction techniques based on the inter-correlation between attributes of large datasets, where the correlation is modeled using the theory of copulas. The first two techniques reduce redundancy by selecting only relevant attributes, while the third is a feature extraction process that combines Principal Component Analysis (PCA) with bivariate copulas. All techniques are evaluated on real-world datasets and compared against strong dimensionality reduction methods in terms of reduction, information capture, and the accuracy of models trained on the reduced data.
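    To make the copula connection concrete: Kendall's tau is the rank correlation that parameterises many bivariate copula families (for an Archimedean copula such as Clayton, the copula parameter is a simple function of tau). The sketch below is a hypothetical illustration in that spirit, not the thesis's actual algorithms: it uses tau as the inter-attribute dependence measure in a greedy redundancy filter.

```python
import numpy as np

def kendall_tau(x, y):
    """Kendall's tau-a rank correlation: captures monotone dependence,
    which is exactly what a bivariate copula models."""
    n = len(x)
    concordant = discordant = 0
    for i in range(n):
        for j in range(i + 1, n):
            s = np.sign(x[i] - x[j]) * np.sign(y[i] - y[j])
            concordant += s > 0
            discordant += s < 0
    return (concordant - discordant) / (n * (n - 1) / 2)

def select_features(X, threshold=0.8):
    """Greedy filter: keep a column only if its |tau| with every already
    kept column stays below the threshold, dropping near-redundant attributes."""
    kept = []
    for j in range(X.shape[1]):
        if all(abs(kendall_tau(X[:, j], X[:, k])) < threshold for k in kept):
            kept.append(j)
    return kept
```

    Because tau is rank-based, the filter discards a column that is any monotone transform of a kept one, a case where Pearson-correlation filters can fail.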

    The literal/non-literal divide synchronically and diachronically: The lexical semantics of an English posture verb

    This thesis' main research goal is to provide an account of the English posture verb sit from both synchronic and diachronic perspectives. My proposed account of sit comprises various components, including a characterisation of its different possible meanings and a comparison with stand and lie. The two relevant meanings are a literal one and a non-literal one (The girl is sitting on the chair vs. The wine bottle is sitting on the chair; in the former the subject is described as being in a sitting position, while in the latter it is not). I analyse each meaning/use separately, noting which semantic patterns occur with one type only and which occur with both. I argue that the non-literal use is diachronically connected to the literal one, and I motivate this claim based on the shared components identified in the thesis and on data from the corpus studies reported in it. A consequence of acknowledging a divide between the literal and non-literal uses, a perspective not usually taken in theoretical linguistics, is that I am able to account for important semantic details which might otherwise be overlooked. The cognitive and typological literature includes accounts of posture verbs cross-linguistically, but in the theoretical literature these verbs have not received much attention. In this thesis, I review existing proposals and highlight the open questions surrounding posture verbs. To fill these gaps in the literature and to better understand the phenomena, I analyse data from synchronic and diachronic corpus studies and incorporate these insights into my account of sit.

    2009 Undergraduate Research Symposium Abstract Book

    Abstract book from the 2009 UMM Undergraduate Research Symposium (URS), which celebrates student scholarly achievement and creative activities.

    Variational Approaches For Learning Finite Scaled Dirichlet Mixture Models

    With a massive amount of data created daily, the demand for data analysis is ubiquitous. Recent technological developments have made machine learning techniques applicable to a wide range of problems. We focus on cluster analysis, an important aspect of data analysis. Recent work achieving excellent results on this task with finite mixture models has motivated us to explore their reach across different applications. The main idea of a mixture model is that the observations are generated from a mixture of components, each with a probability distribution flexible enough to fit many types of data. Indeed, the Dirichlet family of distributions is known to achieve better clustering performance than the Gaussian when the data are clearly non-Gaussian, especially for proportional data. We therefore introduce several variational approaches for finite Scaled Dirichlet mixture models. The proposed algorithms guarantee convergence while avoiding the computational complexity of conventional Bayesian inference. In summary, our contributions are threefold. First, we propose a variational Bayesian learning framework for finite Scaled Dirichlet mixture models, in which the parameters and the complexity of the model are estimated by minimizing the Kullback-Leibler (KL) divergence between the approximate posterior distribution and the true one. Second, we integrate component splitting, a local model selection scheme, into the first model; it gradually splits components based on their mixing weights to obtain the optimal number of components. Finally, we develop an online variational inference framework for finite Scaled Dirichlet mixture models using a stochastic approximation method, improving the scalability of finite mixture models for handling large-scale data in real time. The effectiveness of our models is validated on challenging real-life problems, including object, texture, and scene categorization as well as text-based and image-based spam email detection.
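    The idea of letting mixing weights drive model selection can be illustrated on a much simpler analogue. The sketch below runs plain EM on a 1-D Gaussian mixture (not the Scaled Dirichlet model, and EM rather than variational Bayes) and drops components whose mixing weights become negligible, mirroring how weights govern component splitting and pruning in the abstract.

```python
import numpy as np

def gauss_pdf(x, mu, sigma):
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

def em_with_pruning(x, mus, sigmas, weights, n_iter=50, prune_below=0.02):
    """EM for a 1-D Gaussian mixture with component pruning: after each
    weight update, components whose mixing weight falls below a threshold
    are removed, a simple stand-in for weight-driven model selection."""
    mus, sigmas, weights = map(np.asarray, (mus, sigmas, weights))
    for _ in range(n_iter):
        # E-step: responsibilities r[n, k] of component k for point n
        dens = weights * gauss_pdf(x[:, None], mus, sigmas)
        r = dens / dens.sum(axis=1, keepdims=True)
        # M-step: update weights, means, and standard deviations
        nk = r.sum(axis=0)
        weights = nk / nk.sum()
        mus = (r * x[:, None]).sum(axis=0) / nk
        sigmas = np.sqrt((r * (x[:, None] - mus) ** 2).sum(axis=0) / nk) + 1e-9
        # prune negligible components and renormalise the weights
        keep = weights > prune_below
        mus, sigmas = mus[keep], sigmas[keep]
        weights = weights[keep] / weights[keep].sum()
    return mus, sigmas, weights
```

    In the variational Bayesian setting of the thesis the pruning is automatic, since redundant components are driven toward zero weight by the KL objective rather than removed by an explicit threshold.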

    Quantitative Modelling of Climate Change Impact on Hydro-climatic Extremes

    In recent decades, climate change has made the climate more volatile, and extreme events such as severe rainstorms, heatwaves, and floods are likely to become more frequent. Aiming to quantify the impact of climate change on hydroclimatic extremes, this thesis presents a comprehensive analysis along three main strands. The first strand develops a quantitative modelling framework to characterise the spatiotemporal variation of hydroclimatic extremes over areas of concern. A spatial random sampling toolbox (SRS-GDA) is designed for randomising regions of interest (ROIs) with different geographic locations, sizes, shapes, and orientations, within which the hydroclimatic extremes are parameterised by a non-stationary distribution model whose parameters are assumed to be time-varying. The variation of these parameters with respect to the spatial features of the ROIs and to climate change is then quantified with statistical models such as the generalised linear model. The framework is applied to quantify the spatiotemporal variation of rainfall extremes in Great Britain (GB) and Australia, and is further used in a comparison study quantifying the bias between observed and climate-projected extremes. It is then extended to a multivariate setting to estimate the time-varying joint probability of more than one hydroclimatic variable from a non-stationary perspective; a case study of compound floods in Ho Chi Minh City, Vietnam demonstrates the application. The second strand develops a stable computer algorithm (the SPER toolbox) to recognise, classify, and track the development of hydroclimatic extremes such as severe rainstorms. The SPER toolbox detects the boundary of an event area and extracts the spatial and physical features of the event, which can be used not only for pattern recognition but also to support AI-based training for labelling and cataloguing patterns in large, gridded, multi-scale environmental datasets. Three illustrative cases are provided, and, as the front end of an AI study, an example of training a convolutional neural network to classify the rainfall extremes of the last century in GB is given. The third strand supports decision making by building both theory-driven and data-driven models that simulate decisions in the context of flood forecasting and early warning, using data collected through laboratory-style experiments based on probabilistic flood forecasts and their consequences. The research presented in this thesis bridges knowledge gaps in the field and provides practical insight for managing future risks arising from hydroclimatic extremes, which is timely given the urgency of climate change and the challenges our societies face.
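    The core non-stationarity idea, distribution parameters that drift with time, can be shown with a minimal toy sketch: treat annual maxima as Gumbel-distributed with a location mu(t) = a + b*t and estimate the trend b by ordinary least squares as a first-order proxy. This is an assumption-laden illustration, not the SRS-GDA framework; a full analysis would maximise the non-stationary likelihood instead.

```python
import numpy as np

def fit_trend_in_maxima(years, annual_max):
    """Least-squares linear trend in annual maxima: a first-order proxy
    for a time-varying Gumbel location mu(t) = a + b*t. A nonzero slope b
    is the simplest signature of non-stationary extremes."""
    b, a = np.polyfit(years, annual_max, 1)
    return a, b  # intercept and slope (units of the maxima per year)

# synthetic example: maxima with a true upward trend of 0.05 per year
rng = np.random.default_rng(0)
years = np.arange(100)
annual_max = 10 + 0.05 * years + rng.gumbel(0.0, 2.0, 100)
a, b = fit_trend_in_maxima(years, annual_max)
```

    Under stationarity b should be statistically indistinguishable from zero; a clearly positive slope motivates the time-varying parameterisation used in the thesis.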

    Meta-KANSEI modeling with Valence-Arousal fMRI dataset of brain

    Background: Traditional KANSEI methodology is an important tool in psychology for capturing concepts and meanings; it mainly focuses on semantic differential methods. Valence-Arousal is regarded as a reflection of the KANSEI adjectives, the core concept in the theory of affective dimensions for brain recognition. Previous studies have found that brain fMRI datasets can contain significant information related to Valence and Arousal. Methods: In this work, a Valence-Arousal-based meta-KANSEI modelling method is proposed to improve the traditional KANSEI representation. Functional Magnetic Resonance Imaging (fMRI) was used to acquire Valence-Arousal response datasets of the brain in the amygdala and the orbitofrontal cortex, respectively. To validate the feasibility of the proposed modelling method, the dataset was reduced in dimension using Kernel Density Estimation (KDE) based segmentation and Mean Shift (MS) clustering. Furthermore, the Affective Norms for English Words (ANEW), associated with the International Affective Picture System (IAPS), were used for comparison and analysis. The datasets from fMRI and ANEW under four KANSEI adjectives (angry, happy, sad, and pleasant) were processed with the Fuzzy C-Means (FCM) algorithm, and a distance defined through similarity computing was applied to the two datasets. Results: The results show that the proposed model is feasible and has better stability, as indicated by the normal distribution plot of the distances. The effectiveness of the proposed experimental methods was higher than that reported in the literature. Conclusions: Mean Shift clustering and a central-points-based meta-KANSEI model, combined with the advantages of a variety of existing intelligent processing methods, are expected to extend KANSEI Engineering (KE) research into the medical imaging field.
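    Mean Shift itself is compact enough to sketch in full: every point iteratively moves to the kernel-weighted average of the data, climbing the kernel density estimate until it settles on a mode, so the modes act as cluster centres. A minimal numpy version (a generic textbook sketch, not the authors' implementation):

```python
import numpy as np

def mean_shift(points, bandwidth=1.0, n_iter=50):
    """Mean Shift with a Gaussian kernel: each point repeatedly moves to
    the kernel-weighted mean of the data, converging to a mode of the
    kernel density estimate. Returns the mode each point settles on."""
    modes = points.astype(float).copy()
    for _ in range(n_iter):
        # pairwise squared distances between current modes and the data
        d2 = ((modes[:, None, :] - points[None, :, :]) ** 2).sum(-1)
        w = np.exp(-d2 / (2 * bandwidth ** 2))
        # kernel-weighted mean: the mean-shift update
        modes = (w[:, :, None] * points[None, :, :]).sum(1) / w.sum(1)[:, None]
    return modes
```

    Unlike FCM, Mean Shift needs no preset number of clusters; the bandwidth alone controls how many modes survive, which is convenient when the number of affective clusters is unknown in advance.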

    The Interplay of Architecture and Correlated Variability in Neuronal Networks

    This much is certain: neurons are coupled, and they exhibit covariations in their output. The extent of each is not settled. The strength of neuronal correlations, in particular, has been hotly debated within the neuroscience community over the past decade, as advancing recording techniques have produced many new, sometimes seemingly conflicting, datasets. The impact of connectivity, and of the correlations it induces, on an animal's ability to perform necessary tasks is even less well understood. Answering these questions requires novel approaches. This work focuses on three somewhat distinct, but inseparably coupled, avenues of research within the broader field of computational neuroscience. First, there is a need for tools that both experimentalists and theorists can apply to understand how networks transform their inputs; such tools let neuroscientists tease apart the structure underlying network activity. The Generalized Thinning and Shift framework, presented in Chapter 4, addresses this need. Next, taking for granted a general understanding of network architecture and some grasp of the behavior of its individual units, we must be able to invert the activity-to-structure relationship and understand how network structure determines dynamics. We achieve this in Chapters 5 through 7, where we apply linear response theory to obtain an explicit approximation of correlations in integrate-and-fire neuronal networks. This approximation reveals the explicit relationship between correlations, structure, and marginal dynamics. Finally, we must strive to understand the functional impact of network dynamics and architecture on the tasks a neural network performs; this need motivates our analysis of a biophysically detailed model of the blow fly visual system in Chapter 8. Our hope is that the work presented here represents significant advances in multiple directions within the field of computational neuroscience.
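    The thinning construction underlying such frameworks can be shown directly: two spike trains built by independently thinning a common mother Poisson process have spike-count correlation exactly equal to the thinning probability p, so p dials the correlation while leaving each train Poisson. A small simulation of this standard construction (not the thesis's Generalized Thinning and Shift framework itself):

```python
import numpy as np

def thinned_pair(rate, T, p, rng):
    """Two correlated Poisson spike counts: draw a common 'mother' train
    with intensity rate/p-free bookkeeping, then keep each mother spike in
    each daughter train independently with probability p. The shared
    mother spikes induce count correlation equal to p."""
    n_mother = rng.poisson(rate * T)
    keep1 = rng.random(n_mother) < p
    keep2 = rng.random(n_mother) < p
    return keep1.sum(), keep2.sum()

rng = np.random.default_rng(0)
counts = np.array([thinned_pair(rate=20.0, T=1.0, p=0.5, rng=rng)
                   for _ in range(4000)])
# empirical spike-count correlation; analytically Corr(N1, N2) = p
corr = np.corrcoef(counts[:, 0], counts[:, 1])[0, 1]
```

    Since Cov(N1, N2) = p^2 Var(N_mother) = p^2 * rate * T and Var(N_i) = p * rate * T, the correlation coefficient is p, which makes thinning a convenient generative model for prescribed pairwise correlations.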