
    Event-based Vision: A Survey

    Event cameras are bio-inspired sensors that differ from conventional frame cameras: instead of capturing images at a fixed rate, they asynchronously measure per-pixel brightness changes and output a stream of events that encode the time, location and sign of the brightness changes. Event cameras offer attractive properties compared to traditional cameras: high temporal resolution (on the order of microseconds), very high dynamic range (140 dB vs. 60 dB), low power consumption, and high pixel bandwidth (on the order of kHz), resulting in reduced motion blur. Hence, event cameras have a large potential for robotics and computer vision in scenarios that are challenging for traditional cameras, such as low latency, high speed, and high dynamic range. However, novel methods are required to process the unconventional output of these sensors in order to unlock their potential. This paper provides a comprehensive overview of the emerging field of event-based vision, with a focus on the applications and the algorithms developed to unlock the outstanding properties of event cameras. We present event cameras from their working principle, the actual sensors that are available and the tasks that they have been used for, from low-level vision (feature detection and tracking, optic flow, etc.) to high-level vision (reconstruction, segmentation, recognition). We also discuss the techniques developed to process events, including learning-based techniques, as well as specialized processors for these novel sensors, such as spiking neural networks. Additionally, we highlight the challenges that remain to be tackled and the opportunities that lie ahead in the search for a more efficient, bio-inspired way for machines to perceive and interact with the world.
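
    As a concrete illustration of the event encoding described above, the following minimal sketch (synthetic data and names are illustrative, not tied to any particular camera SDK) represents each event as a (timestamp, x, y, polarity) record and accumulates a time window of events into a signed per-pixel change image.

    import numpy as np

    # Each event encodes when, where, and in which direction brightness changed:
    # (timestamp in microseconds, pixel x, pixel y, polarity +1/-1).
    events = np.array([
        (1, 10, 12, +1),
        (3, 10, 12, +1),
        (7, 42, 99, -1),
        (9, 10, 12, +1),
    ], dtype=[("t_us", np.int64), ("x", np.int32), ("y", np.int32), ("p", np.int8)])

    def accumulate(events, height, width, t_start, t_end):
        """Sum event polarities per pixel over a time window, giving a signed
        'event frame' that approximates the brightness change in that window."""
        frame = np.zeros((height, width), dtype=np.int32)
        window = events[(events["t_us"] >= t_start) & (events["t_us"] < t_end)]
        for e in window:
            frame[e["y"], e["x"]] += e["p"]
        return frame

    frame = accumulate(events, height=128, width=128, t_start=0, t_end=10)
    print(frame[12, 10], frame[99, 42])  # -> 3 and -1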

    Unsupervised learning for anomaly detection in Australian medical payment data

    Fraudulent or wasteful medical insurance claims made by health care providers are costly for insurers. Typically, OECD healthcare organisations lose 3–8% of total expenditure due to fraud. As Australia's universal public health insurer, Medicare Australia, spends approximately A$34 billion per annum on the Medicare Benefits Schedule (MBS) and Pharmaceutical Benefits Scheme, wasted spending of A$1–2.7 billion could be expected. However, fewer than 1% of claims to Medicare Australia are detected as fraudulent, below international benchmarks. Variation is common in medicine, and health conditions, along with their presentation and treatment, are heterogeneous by nature. Increasing volumes of data and rapidly changing patterns bring challenges which require novel solutions. Machine learning and data mining are becoming commonplace in this field, but no gold standard is yet available. In this project, requirements are developed for real-world application to compliance analytics at the Australian Government Department of Health and Aged Care (DoH), covering: unsupervised learning; problem generalisation; human interpretability; context discovery; and cost prediction. Three novel methods are presented which rank providers by potentially recoverable costs. These methods use association analysis, topic modelling, and sequential pattern mining to provide interpretable, expert-editable models of typical provider claims. Anomalous providers are identified through comparison to the typical models, using metrics based on the costs of excess or upgraded services. Domain knowledge is incorporated in a machine-friendly way in two of the methods through the use of the MBS as an ontology. Validation by subject-matter experts and comparison to existing techniques shows that the methods perform well. The methods are implemented in a software framework which enables rapid prototyping and quality assurance. The code is deployed at the DoH, and further applications as decision-support systems are in progress. The developed requirements will apply to future work in this field.
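
    The ranking idea described above, comparing each provider's claims to an interpretable model of typical claiming and costing the excess, can be sketched in a few lines. The item codes, fees and the hand-written "typical" item mix below are purely illustrative stand-ins for the thesis's association-analysis, topic-model and sequential-pattern models.

    from collections import Counter

    # Toy claim histories: provider -> list of (MBS-style item code, benefit paid).
    claims = {
        "provider_A": [("23", 39.10), ("23", 39.10), ("36", 75.75)],
        "provider_B": [("23", 39.10), ("44", 111.50), ("44", 111.50), ("44", 111.50)],
    }

    # Hand-written stand-in for a learned "typical provider" model:
    # the expected share of each item code in a provider's claims.
    typical_share = {"23": 0.6, "36": 0.3, "44": 0.1}

    def recoverable_cost(provider_claims):
        """Cost of services claimed in excess of the typical item mix."""
        n = len(provider_claims)
        counts = Counter(code for code, _ in provider_claims)
        mean_fee = {code: sum(fee for c, fee in provider_claims if c == code) / counts[code]
                    for code in counts}
        excess = 0.0
        for code, count in counts.items():
            expected = typical_share.get(code, 0.0) * n
            excess += max(0.0, count - expected) * mean_fee[code]
        return excess

    # Rank providers by potentially recoverable cost, highest first.
    ranking = sorted(claims, key=lambda p: recoverable_cost(claims[p]), reverse=True)
    print(ranking)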

    Multimodal sentiment analysis in real-life videos

    This thesis extends the emerging field of multimodal sentiment analysis of real-life videos, taking two components into consideration: the emotion and the emotion's target. The emotion component of media is traditionally represented as a segment-based intensity model of emotion classes. This representation is replaced here by a value- and time-continuous view. Adjacent research fields, such as affective computing, have largely neglected the linguistic information available from automatic transcripts of audio-video material. As is demonstrated here, this text modality is well suited for time- and value-continuous prediction. Moreover, source-specific problems, such as trustworthiness, have been largely unexplored so far. This work examines perceived trustworthiness of the source, and its quantification, in user-generated video data and presents a possible modelling path. Furthermore, the transfer between the continuous and discrete emotion representations is explored in order to summarise the emotional context at a segment level. The other component deals with the target of the emotion, for example, the topic the speaker is addressing. Emotion targets in a video dataset can, as is shown here, be coherently extracted based on automatic transcripts without limiting a priori parameters, such as the expected number of targets. Furthermore, alternatives to purely linguistic approaches for predicting targets, such as knowledge bases and multimodal systems, are investigated. A new dataset is designed for this investigation, and, in conjunction with proposed novel deep neural networks, extensive experiments are conducted to explore the components described above. The developed systems show robust prediction results and demonstrate the strengths of the respective modalities, feature sets, and modelling techniques. Finally, foundations are laid for cross-modal information prediction systems with applications to the correction of corrupted in-the-wild signals from real-life videos.

    The quantification of pressure and saturation changes in clastic reservoirs using 4D seismic data

    The problem of quantifying pressure and saturation changes from 4D seismic data is an area of active research faced with many challenges concerning the non-uniqueness of seismic data inversion, non-repeatability noise in the data, the formulation of the inverse problem, and the use of appropriate constraints. The majority of the inversion methods rely on empirical rock-physics model calibrations linking elastic properties to expected pressure and saturation changes. Model-driven techniques indeed provide a theoretical framework for the practical interpretation of the 4D seismic response, but pressure and saturation separation based on this approach is inconsistent with the observed 4D seismic response and with insights from reservoir engineering. The outcome is a bias in the estimated pressure and saturation changes and, in some cases, leakage between the two. Others have addressed some of this bias by using the causality between induced production and the observed 4D seismic response to formulate a direct, quick and less compute-intensive inversion, characterised as data-driven techniques. However, challenges remain as to the accuracy of the causality link, as defined by the reservoir's sensitivity to production effects, and in defining appropriate constraints to tackle the non-uniqueness of the seismic inversion and uncertainties in the 4D seismic data. The main contributions of this thesis are the enhancement of the data-driven inversion approach by using multiple monitor 4D seismic data to quantify the reservoir's sensitivity to pressure and saturation changes, together with the introduction of engineering-consistent constraints provided by multiple history-matched fluid-flow simulation models. A study using observed 4D seismic data (amplitudes and time shifts) acquired at different monitor times on four producing North Sea clastic fields demonstrates the reliability of the seismic-based method in decoupling the reservoir's sensitivity specific to each field's geological characteristics. A natural extension is to combine multiple monitor 4D seismic data in an inversion scheme that solves for the reservoir's sensitivity to pressure and saturation changes, the pressure and saturation changes themselves, and the uncertainties in the inversion solution. At least two monitor 4D seismic datasets are required to solve for the reservoir's sensitivity, and offset stacks (near, mid, and far) are required to decouple pressure, water and gas saturation changes. The generation and use of multiple geologically- and production-constrained simulation models provided spatial constraints on the solution space, making the inversion scheme robust. Within the inversion, the fit to spatial historical data, i.e. 4D seismic data acquired at different monitor times, is analysed. The added benefit of using multiple monitor data is that it allows for a soft "close-the-loop" between the engineering and 4D seismic domains. One step in the inversion scheme is repeated for as many history-matched simulation models as are generated. Each model provides pressure and saturation input to the inversion to obtain maps of the reservoir's sensitivity. By computing the norm of residuals for each inversion based on each model input, the best model (having the lowest norm of residuals) can be identified, in addition to the use of a history-matching objective.
The inversion scheme thus marks the first step towards a seismic-assisted history matching procedure, suggesting that pressure and saturation inversion is best done within the history-matching process. In addition, an analysis of uncertainties in quantitative 4D seismic data interpretation is performed by developing a seismic modelling method that links the shot timings of a real field's towed-streamer and permanent reservoir monitoring (PRM) acquisitions to the reservoir under production. It is found that pressure and saturation fluctuations that occur during the shooting of monitor acquisitions create a complicated spatio-temporal imprint on the pre-stack data, and introduce errors if 4D seismic data is analysed in the post-stack domain. Pressure and saturation changes as imaged across the offset stacks (near, mid and far offset) are not the same, adding to the problems in separating pressure and saturation changes using offset stacks of 4D seismic data. The approximate modelling reveals that the NRMS errors between offset stacks (up to 7.5%) caused by the intra-survey effects are likely at the limit of 4D seismic measurements using towed streamer technology, but are potentially observable, particularly for PRM technology. Intra-survey effects should thus be considered during 4D survey planning as well as during data processing and analysis. It is recommended that the shot timestamps of the acquisition are used to sort the seismic data immediately after pre-stack migration and before any stacking. The seismic data should also be shot quickly in a consistent pattern to optimise time and fold coverage. It is common to relate the simulation model output to a specific time within the acquisition (start, middle or end of survey), but this study reveals that it is best to take an average of simulation model predictions output at fine time intervals over the entire length of the acquisition, as this is a better temporal comparison to the acquired post-stack 4D seismic data.
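
    One way to picture the inversion described above is as a per-location linear system relating the observed 4D amplitude changes on several offset stacks to pressure and saturation changes through sensitivity coefficients. The sketch below uses entirely synthetic numbers and a deliberately simplified linearisation, not the thesis's actual formulation: it solves the system by least squares and compares candidate history-matched simulation models by the norm of their residuals.

    import numpy as np

    # Assumed linearised model for one map location:
    #   dA(offset stack) ~ S_p*dP + S_w*dSw + S_g*dSg,
    # with different sensitivities per offset stack (near/mid/far).
    rng = np.random.default_rng(0)
    S = np.array([[0.8, -1.2, -2.0],     # near-stack sensitivities to [dP, dSw, dSg]
                  [0.6, -1.0, -1.5],     # mid
                  [0.4, -0.8, -1.1]])    # far

    true_changes = np.array([1.5, 0.2, 0.05])               # dP, dSw, dSg (illustrative)
    observed = S @ true_changes + rng.normal(0, 0.02, 3)    # noisy 4D amplitude changes

    # Candidate history-matched simulation models supply prior pressure/saturation
    # changes; the one whose prediction best fits the observed data is preferred.
    candidates = {"model_1": np.array([1.0, 0.3, 0.00]),
                  "model_2": np.array([1.4, 0.2, 0.10])}
    best = min(candidates, key=lambda m: np.linalg.norm(S @ candidates[m] - observed))

    # Least-squares inversion of the observed changes for [dP, dSw, dSg].
    estimate, *_ = np.linalg.lstsq(S, observed, rcond=None)
    residual = np.linalg.norm(S @ estimate - observed)
    print("best prior model:", best, "| inverted changes:", estimate, "| residual:", residual)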

    Harnessing rare category trinity for complex data

    In the era of big data, we are inundated with the sheer volume of data being collected from various domains. In contrast, it is often the rare occurrences that are crucially important to many high-impact domains with diverse data types. For example, in online transaction platforms, the percentage of fraudulent transactions might be small, but the resultant financial loss could be significant; in social networks, a novel topic is often neglected by the majority of users at the initial stage, but it could burst into an emerging trend afterward; in the Sloan Digital Sky Survey, the vast majority of sky images (e.g., known stars, comets, nebulae, etc.) are of no interest to astronomers, while only 0.001% of the sky images lead to novel scientific discoveries; in worldwide pandemics (e.g., SARS, MERS, COVID-19, etc.), the primary cases might be limited, but the consequences could be catastrophic (e.g., mass mortality and economic recession). Therefore, studying such complex rare categories has profound significance and a long-standing impact in many aspects of modern society, from preventing financial fraud to uncovering hot topics and trends, from supporting scientific research to forecasting pandemics and natural disasters. In this thesis, we propose a generic learning mechanism with trinity modules for complex rare category analysis: (M1) Rare Category Characterization - characterizing the rare patterns with a compact representation; (M2) Rare Category Explanation - interpreting the prediction results and providing relevant clues for the end-users; (M3) Rare Category Generation - producing synthetic rare category examples that resemble the real ones. The key philosophy of our mechanism lies in "all for one and one for all" - each module makes unique contributions to the whole mechanism and thus receives support from its companions. In particular, M1 serves as the de-novo step to discover rare category patterns on complex data; M2 provides a proper lens for the end-users to examine the outputs and understand the learning process; and M3 synthesizes realistic rare category examples for data augmentation to further improve M1 and M2. To enrich the learning mechanism, we develop principled theorems and solutions to characterize, understand, and synthesize rare categories in complex scenarios, ranging from static rare categories to time-evolving rare categories, from attributed data to graph-structured data, from homogeneous data to heterogeneous data, and from low-order connectivity patterns to high-order connectivity patterns. It is worth mentioning that we have also launched one of the first visual analytic systems for dynamic rare category analysis, which integrates our developed techniques and enables users to investigate complex rare categories in practice.
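
    Module M3 above produces synthetic rare-category examples for augmentation. One generic way to do this for attributed (vector) data is interpolation between pairs of real rare-class points, a SMOTE-style heuristic; the sketch below shows only that generic idea, not the generation method developed in the thesis.

    import numpy as np

    def interpolate_rare(rare_examples, n_new, seed=None):
        """Generate synthetic rare-category points by interpolating between
        randomly chosen pairs of real rare examples (SMOTE-style heuristic)."""
        rng = np.random.default_rng(seed)
        rare = np.asarray(rare_examples, dtype=float)
        idx_a = rng.integers(0, len(rare), n_new)
        idx_b = rng.integers(0, len(rare), n_new)
        lam = rng.random((n_new, 1))                 # interpolation weights in [0, 1)
        return rare[idx_a] + lam * (rare[idx_b] - rare[idx_a])

    rare_points = [[0.10, 0.90], [0.20, 1.10], [0.15, 1.00]]   # a handful of rare-class points
    synthetic = interpolate_rare(rare_points, n_new=5, seed=42)
    print(synthetic.shape)   # (5, 2) new points resembling the real rare examples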

    Exploratory Data Analysis of the Large Scale Gas Injection Test (Lasgit)

    This thesis presents an Exploratory Data Analysis (EDA) performed on the dataset arising from the operation of the Large Scale Gas Injection Test (Lasgit). Lasgit is a field-scale experiment located approximately 420 m underground at the Äspö Hard Rock Laboratory (HRL) in Sweden. The experiment is designed to study the impact of gas build-up and subsequent migration through the Engineered Barrier System (EBS) of a KBS-3 concept radioactive waste repository. Investigation of the smaller-scale, or ‘second order’, features of the dataset is the focus of the EDA, with the study of such features intended to contribute to the understanding of the experiment. In order to investigate Lasgit’s substantial dataset of some 26 million datum points, a bespoke computational toolkit, the Non-Uniform Data Analysis Toolkit (NUDAT), designed to expose and quantify difficult-to-observe phenomena in large, non-uniform datasets, has been developed. NUDAT has been designed with capabilities including non-parametric trend detection, frequency-domain analysis, and second order event candidate detection. The various analytical modules developed and presented in this thesis were verified against simulated data possessing prescribed and quantified phenomena, before application to Lasgit’s dataset. The Exploratory Data Analysis of Lasgit’s dataset presented in this thesis reveals and quantifies a number of phenomena, for example: the tendency for spiking to occur within groups of sensor records; estimates for the long-term trends; the temperature profile of the experiment with depth and time, along with the approximate seasonal variation in stress/pore-water pressure; and, in particular, the identification of second order event candidates as small as 0.1% of the macro-scale behaviours in which they reside. A selection of the second order event candidates has been aggregated into second order events using the event candidates’ mutual synchronicities. Interpretation of these events suggests the possibility of small-scale discrete gas flow pathways forming, possibly via a dilatant flow mechanism. The interpreted events’ typical behaviours, in addition to the observed spiking tendency, also support the grouping of sensors by sensor type. The developed toolkit, NUDAT, and its subsequent application to Lasgit’s dataset have enabled an investigation into the small-scale, or ‘second order’, features of the experiment’s results. The analysis presented in this thesis provides insight into Lasgit’s experimental behaviour and, as such, contributes to the understanding of the experiment.
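
    The kind of analysis NUDAT performs on non-uniformly sampled records can be illustrated with a generic recipe: remove a long-term trend, examine the frequency content of the irregularly sampled residual with a Lomb-Scargle periodogram, and flag abrupt small excursions as event candidates. The sketch below uses synthetic data and standard library routines; it is not the NUDAT implementation.

    import numpy as np
    from scipy.signal import lombscargle

    rng = np.random.default_rng(1)

    # Non-uniformly sampled sensor record: slow drift + annual cycle + a small step event.
    t = np.sort(rng.uniform(0, 3 * 365, 2000))                           # days
    y = 0.002 * t + 0.5 * np.sin(2 * np.pi * t / 365) + rng.normal(0, 0.05, t.size)
    y[1200:1205] += 0.4                                                  # small 'second order' event

    # 1) Long-term trend estimate (here simply a linear fit) and detrended residual.
    slope, intercept = np.polyfit(t, y, 1)
    residual = y - (slope * t + intercept)

    # 2) Frequency-domain analysis of the irregularly sampled residual.
    periods = np.linspace(30, 500, 400)                                  # candidate periods, days
    power = lombscargle(t, residual - residual.mean(), 2 * np.pi / periods)

    # 3) Event candidates: abrupt steps where consecutive differences exceed 4 sigma.
    diffs = np.diff(residual)
    candidates = np.flatnonzero(np.abs(diffs) > 4 * diffs.std())
    print("trend (units/day):", slope)
    print("dominant period (days):", periods[power.argmax()])
    print("event candidate sample indices:", candidates)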

    Time- and value-continuous explainable affect estimation in-the-wild

    Today, the relevance of Affective Computing, i.e., of making computers recognise and simulate human emotions, cannot be overstated. All technology giants (from manufacturers of laptops to mobile phones to smart speakers) are in a fierce competition to make their devices understand not only what is being said, but also how it is being said, in order to recognise the user’s emotions. The goals have evolved from predicting the basic emotions (e.g., happy, sad) to the more nuanced affective states (e.g., relaxed, bored) in real time. The databases used in such research have also evolved, from earlier ones featuring acted behaviours to more recent ones featuring spontaneous behaviours. A more powerful recent shift is in-the-wild affect recognition, i.e., taking the research out of the laboratory and into the uncontrolled real world. This thesis discusses, for the very first time, affect recognition for two unique in-the-wild audiovisual databases, GRAS2 and SEWA. GRAS2 is, to date, the only database with time- and value-continuous affect annotations for Labov-effect-free affective behaviours, i.e., without the participant’s awareness of being recorded (which otherwise is known to affect the naturalness of one’s affective behaviour). SEWA features participants from six different cultural backgrounds, conversing using a video-calling platform. Thus, SEWA features in-the-wild recordings further corrupted by unpredictable artifacts, such as network-induced delays, frame-freezing and echoes. The two databases present a unique opportunity to study time- and value-continuous affect estimation that is truly in-the-wild. A novel ‘Evaluator Weighted Estimation’ formulation is proposed to generate a gold-standard sequence from several annotations. An illustration is presented demonstrating that the moving bag-of-words (BoW) representation better preserves the temporal context of the features while remaining more robust against outliers compared to other statistical summaries, e.g., the moving average. A novel, data-independent randomised codebook is proposed for the BoW representation, which is especially useful for cross-corpus model generalisation testing when the feature spaces of the databases differ drastically. Various deep learning models and support vector regressors are used to predict affect dimensions time- and value-continuously. The better generalisability of the models trained on GRAS2, despite the smaller training size, makes a strong case for the collection and use of Labov-effect-free data. A further foundational contribution is the discovery of the missing many-to-many mapping between the mean square error (MSE) and the concordance correlation coefficient (CCC), i.e., between two of the most popular utility functions to date. The newly proposed cost function |MSE_{XY}/σ_{XY}| has been evaluated in experiments aimed at demystifying the inner workings of a well-performing, simple, low-cost neural network effectively utilising the BoW text features. Also proposed herein is the shallowest-possible convolutional neural network (CNN) that uses facial action unit (FAU) features. The CNN exploits sequential context but, unlike RNNs, also inherently allows data- and process-parallelism. Interestingly, for the most part, these white-box AI models have been shown to utilise the provided features in a manner consistent with the human perception of emotion expression.
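
    The MSE-CCC relationship mentioned above follows from the standard definitions: with population moments, MSE_XY = σ_X² + σ_Y² − 2σ_XY + (μ_X − μ_Y)², while CCC = 2σ_XY / (σ_X² + σ_Y² + (μ_X − μ_Y)²), so CCC = 2σ_XY / (MSE_XY + 2σ_XY), and a given MSE corresponds to many CCC values depending on the covariance. The sketch below is a direct reading of these textbook definitions, assuming σ_XY in the proposed cost denotes the covariance between prediction and gold standard; it is not the thesis's code.

    import numpy as np

    def mse(x, y):
        return np.mean((x - y) ** 2)

    def ccc(x, y):
        """Concordance correlation coefficient:
        2*cov(x, y) / (var(x) + var(y) + (mean(x) - mean(y))**2)."""
        cov_xy = np.mean((x - x.mean()) * (y - y.mean()))
        return 2 * cov_xy / (x.var() + y.var() + (x.mean() - y.mean()) ** 2)

    def proposed_cost(x, y):
        """|MSE_XY / sigma_XY|, assuming sigma_XY is the covariance (assumption)."""
        cov_xy = np.mean((x - x.mean()) * (y - y.mean()))
        return abs(mse(x, y) / cov_xy)

    rng = np.random.default_rng(0)
    gold = rng.normal(0, 1, 500)                    # gold-standard affect trace
    pred = 0.8 * gold + rng.normal(0, 0.3, 500)     # a model's prediction

    cov = np.mean((gold - gold.mean()) * (pred - pred.mean()))
    # The two CCC values below coincide, illustrating CCC = 2*cov / (MSE + 2*cov).
    print(ccc(gold, pred), 2 * cov / (mse(gold, pred) + 2 * cov))
    print(proposed_cost(gold, pred))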

    Neural Methods for Effective, Efficient, and Exposure-Aware Information Retrieval

    Neural networks with deep architectures have demonstrated significant performance improvements in computer vision, speech recognition, and natural language processing. The challenges in information retrieval (IR), however, are different from those in these other application areas. A common form of IR involves ranking documents--or short passages--in response to keyword-based queries. Effective IR systems must deal with the query-document vocabulary mismatch problem by modeling relationships between different query and document terms and how they indicate relevance. Models should also consider lexical matches when the query contains rare terms--such as a person's name or a product model number--not seen during training, and should avoid retrieving semantically related but irrelevant results. In many real-life IR tasks, retrieval involves extremely large collections--such as the document index of a commercial Web search engine--containing billions of documents. Efficient IR methods should take advantage of specialized IR data structures, such as the inverted index, to retrieve efficiently from large collections. Given an information need, the IR system also mediates how much exposure an information artifact receives by deciding whether it should be displayed, and where it should be positioned, among other results. Exposure-aware IR systems may optimize for additional objectives, besides relevance, such as parity of exposure for retrieved items and content publishers. In this thesis, we present novel neural architectures and methods motivated by the specific needs and challenges of IR tasks. Comment: PhD thesis, University College London (2020).
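
    The inverted index mentioned above can be illustrated in a few lines: map each term to the set of documents containing it, then score candidates by simple term overlap, which is also where exact lexical matches for rare terms such as product model numbers come from. The toy tokenisation and scoring below are illustrative only and unrelated to the neural models proposed in the thesis.

    from collections import defaultdict

    docs = {
        0: "neural ranking models for web search",
        1: "buy the XPS-13 laptop online",
        2: "deep learning for passage ranking",
    }

    # Inverted index: term -> set of ids of documents containing that term.
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)

    def search(query, k=2):
        """Score documents by the number of matching query terms (toy lexical match)."""
        scores = defaultdict(int)
        for term in query.lower().split():
            for doc_id in index.get(term, set()):
                scores[doc_id] += 1
        return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:k]

    # A rare term such as a product model number is found by exact lexical match,
    # even though no trained model has ever seen it.
    print(search("xps-13 laptop"))    # -> [(1, 2)]
    print(search("passage ranking"))  # -> [(2, 2), (0, 1)]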