29 research outputs found

    CL-MAE: Curriculum-Learned Masked Autoencoders

    Masked image modeling has been demonstrated as a powerful pretext task for generating robust representations that generalize effectively across multiple downstream tasks. Typically, this approach involves randomly masking patches (tokens) in input images, with the masking strategy remaining unchanged during training. In this paper, we propose a curriculum learning approach that updates the masking strategy to continually increase the complexity of the self-supervised reconstruction task. We conjecture that, by gradually increasing the task complexity, the model can learn more sophisticated and transferable representations. To facilitate this, we introduce a novel learnable masking module capable of generating masks of different complexities, and integrate it into masked autoencoders (MAE). Our module is jointly trained with the MAE, while its behavior is adjusted during training: it transitions from a partner of the MAE (optimizing the same reconstruction loss) to an adversary (optimizing the opposite loss), passing through a neutral state. The transition between these behaviors is smooth, regulated by a factor that multiplies the reconstruction loss of the masking module. The resulting training procedure generates an easy-to-hard curriculum. We train our Curriculum-Learned Masked Autoencoder (CL-MAE) on ImageNet and show that it exhibits superior representation learning capabilities compared to MAE. The empirical results on five downstream tasks confirm our conjecture, demonstrating that curriculum learning can be used successfully to self-supervise masked autoencoders.
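
    The abstract states that a single factor multiplying the masking module's reconstruction loss drives the partner-to-adversary transition, but does not give the exact schedule. The Python sketch below is a minimal illustration of that mechanism; the linear schedule and the names curriculum_factor and masking_module_loss are assumptions, not the paper's implementation.

    ```python
    import torch

    def curriculum_factor(step: int, total_steps: int) -> float:
        """Sweep the loss multiplier from +1 (partner: same objective as
        the MAE) through 0 (neutral) to -1 (adversary). A linear schedule
        is assumed here purely for illustration."""
        return 1.0 - 2.0 * step / max(total_steps - 1, 1)

    def masking_module_loss(recon_loss: torch.Tensor, step: int, total_steps: int) -> torch.Tensor:
        # The masking module optimises the MAE reconstruction loss scaled by
        # the curriculum factor; a negative factor flips it into an adversary.
        return curriculum_factor(step, total_steps) * recon_loss
    ```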

    New perspectives and methods for stream learning in the presence of concept drift.

    Applications that generate data in the form of fast streams from non-stationary environments, that is, environments where the underlying phenomena change over time, are becoming increasingly prevalent. In this kind of environment, the probability density function of the data-generating process may change over time, producing a drift. This causes predictive models trained over these stream data to become obsolete and to adapt poorly to the new distribution. Especially in online learning scenarios, there is a pressing need for new algorithms that adapt to this change as fast as possible while maintaining good performance. Examples of these applications include making inferences or predictions based on financial data, energy demand and climate data analysis, web usage or sensor network monitoring, and malware/spam detection, among many others. Online learning and concept drift are two of the hottest topics in the recent literature due to their relevance for the so-called Big Data paradigm, where an increasing number of applications are based on continuously available training data, known as data streams. Thus, learning in non-stationary environments requires adaptive or evolving approaches that can monitor and track the underlying changes, and adapt a model to accommodate them accordingly. In this thesis, I provide a comprehensive review of state-of-the-art approaches, identify the most relevant open challenges in the literature, and focus on addressing three of them by providing innovative perspectives and methods. The thesis gives a complete overview of several related fields and tackles several open challenges identified in the very recent state of the art. Concretely, it presents an innovative way to generate artificial diversity in ensembles, a set of adaptations and improvements that spiking neural networks need in order to be used in online learning scenarios, and finally a drift detector based on this latter algorithm. Together, these approaches constitute an innovative body of work aimed at presenting new perspectives and methods for the field.
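
    The thesis's own detector is built on its spiking-network algorithm, which the abstract does not detail. Purely to illustrate the general idea of detecting drift from an online error rate, here is a minimal DDM-style sketch (after Gama et al., 2004); the class name, thresholds and warmup length are assumptions, not the thesis's method.

    ```python
    class ErrorRateDriftDetector:
        """DDM-style detector: track the online error rate p and its
        standard deviation s, and flag drift when p + s rises well above
        its historical minimum. Illustrative baseline only."""

        def __init__(self, warn_level=2.0, drift_level=3.0, warmup=30):
            self.n, self.p = 0, 1.0
            self.min_p, self.min_s = float("inf"), float("inf")
            self.warn_level, self.drift_level, self.warmup = warn_level, drift_level, warmup

        def update(self, error: bool) -> str:
            self.n += 1
            self.p += (float(error) - self.p) / self.n   # incremental mean of errors
            s = (self.p * (1.0 - self.p) / self.n) ** 0.5
            if self.n < self.warmup:
                return "stable"
            if self.p + s < self.min_p + self.min_s:     # new historical minimum
                self.min_p, self.min_s = self.p, s
            if self.p + s > self.min_p + self.drift_level * self.min_s:
                return "drift"                           # e.g. retrain the model
            if self.p + s > self.min_p + self.warn_level * self.min_s:
                return "warning"                         # e.g. start buffering data
            return "stable"
    ```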

    Coherent Optical OFDM Modem Employing Artificial Neural Networks for Dispersion and Nonlinearity Compensation in a Long-Haul Transmission System

    In order to satisfy the ever-increasing bandwidth demand of broadband services, the optical orthogonal frequency division multiplexing (OOFDM) scheme is being considered as a promising technique for future high-capacity optical networks. The aim of this thesis is to investigate, theoretically, the feasibility of implementing the coherent optical OFDM (CO-OOFDM) technique in long-haul transmission networks. For CO-OOFDM and Fast-OFDM systems, a set of modulation-format-dependent analogue-to-digital converter (ADC) clipping ratios and quantisation bits has been identified; moreover, CO-OOFDM is more resilient to chromatic dispersion (CD) than the bandwidth-efficient Fast-OFDM scheme. For CO-OOFDM systems, numerical simulations are undertaken to investigate the effect of the number of sub-carriers, the cyclic prefix (CP), and ADC parameters such as the sampling speed, the clipping ratio, and the quantisation bits on the system performance over single-mode fibre (SMF) links for data rates up to 80 Gb/s. The use of a large number of sub-carriers is more effective in combating fibre CD than employing a long CP. Moreover, in the presence of fibre nonlinearities, identifying the optimum number of sub-carriers is a crucial factor in determining the modem performance. For signal data rates up to 40 Gb/s, a set of data-rate- and transmission-distance-dependent optimum ADC parameters is identified in this work. These parameters give rise to negligible clipping and quantisation noise; moreover, increasing the ADC sampling speed can increase the dispersion tolerance when transmitting over SMF links. In addition, simulation results show that the use of adaptive modulation schemes improves spectral efficiency, resulting in higher tolerance to CD compared to the case where identical modulation formats are adopted across all sub-carriers. For a given transmission distance, utilising an artificial neural network (ANN) equaliser improves the system bit error rate (BER) performance by 50% and 70% when considering SMF CD alone and nonlinear effects combined with CD, respectively. Moreover, for a fixed BER of 10^-3, utilising the ANN increases the transmission distance by 1.87 times and 2 times for SMF CD and nonlinear effects, respectively. The proposed ANN equaliser combats SMF nonlinearities more efficiently than the previously published Kerr nonlinearity electrical compensation technique, by a factor of 7.
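
    As a rough illustration of the ADC parameters the thesis studies (clipping ratio and quantisation bits), the following Python sketch clips and quantises a toy OFDM baseband waveform. The adc helper, the 64-sub-carrier QPSK example and the parameter values are assumptions for illustration, not the thesis's simulator.

    ```python
    import numpy as np

    def adc(signal: np.ndarray, clip_ratio_db: float, n_bits: int) -> np.ndarray:
        """Hypothetical ADC front end: clip at a level set by the clipping
        ratio (dB relative to the signal RMS), then quantise uniformly
        with n_bits of resolution."""
        rms = np.sqrt(np.mean(signal ** 2))
        clip_level = rms * 10 ** (clip_ratio_db / 20.0)
        clipped = np.clip(signal, -clip_level, clip_level)
        step = 2 * clip_level / 2 ** n_bits   # uniform quantiser step size
        return np.round(clipped / step) * step

    # Toy baseband: IFFT of random QPSK symbols on 64 sub-carriers
    qpsk = (np.random.choice([-1, 1], 64) + 1j * np.random.choice([-1, 1], 64)) / np.sqrt(2)
    waveform = np.fft.ifft(qpsk).real
    sampled = adc(waveform, clip_ratio_db=7.0, n_bits=8)
    ```

    Raising the clipping ratio reduces clipping noise but spreads the quantiser step over a wider range, so the two parameters trade off against each other, which is why the thesis reports jointly optimised values.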

    Design of a High-Speed Architecture for Stabilization of Video Captured Under Non-Uniform Lighting Conditions

    Video captured in shaky conditions suffers from vibrations. A robust algorithm to stabilize the video by compensating for vibrations arising from the physical setting of the camera is presented in this dissertation. A very high-performance hardware architecture on Field Programmable Gate Array (FPGA) technology is also developed for the implementation of the stabilization system. Stabilization of video sequences captured under non-uniform lighting conditions begins with a nonlinear enhancement process. This improves the visibility of the scene captured by physical sensing devices, which have limited dynamic range. This physical limitation causes the saturated region of the image to shadow out the rest of the scene; it is therefore desirable to recover a more uniform scene that eliminates the shadows to a certain extent. Stabilization of video requires the estimation of global motion parameters. By obtaining reliable background motion, the video can be spatially transformed to the reference sequence, thereby eliminating the unintended motion of the camera. A reflectance-illuminance model for video enhancement is used in this research work to improve the visibility and quality of the scene. With fast color space conversion, the computational complexity is reduced to a minimum. The basic video stabilization model is formulated and configured for hardware implementation. Such a model involves evaluation of reliable features for tracking, motion estimation, and affine transformation to map the display coordinates of a stabilized sequence. The multiplications, divisions and exponentiations are replaced by simple arithmetic and logic operations using improved log-domain computations in the hardware modules, as sketched below. On Xilinx's Virtex II 2V8000-5 FPGA platform, the prototype system consumes 59% of the logic slices, 30% of the flip-flops, 34% of the lookup tables, 35% of the embedded RAMs, and two ZBT frame buffers. The system is capable of rendering 180.9 million pixels per second (mpps) and consumes approximately 30.6 watts at 1.5 volts. With 1024×1024 frames, this throughput is equivalent to 172 frames per second (fps). Future work will optimize the performance-resource trade-off to meet the specific needs of applications, and will extend the model to extraction and tracking of moving objects, since the model inherently encapsulates the attributes of spatial distortion and motion prediction to reduce complexity. With these parameters to narrow down the processing range, it is possible to achieve a minimum of 20 fps on desktop computers with Intel Core 2 Duo or Quad Core CPUs and 2 GB of DDR2 memory, without dedicated hardware.
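
    To make the log-domain idea concrete: once operands pass through a logarithm, multiplication, division and exponentiation reduce to addition, subtraction and a single scaling. The minimal floating-point sketch below shows the identities; the dissertation's hardware instead uses fixed-point log/antilog lookup tables, which are not reproduced here.

    ```python
    import math

    # All operands must be positive for the real-valued logarithm to apply.

    def log_mul(a: float, b: float) -> float:
        # Multiplication becomes addition: log(a*b) = log(a) + log(b)
        return math.exp(math.log(a) + math.log(b))

    def log_div(a: float, b: float) -> float:
        # Division becomes subtraction: log(a/b) = log(a) - log(b)
        return math.exp(math.log(a) - math.log(b))

    def log_pow(a: float, k: float) -> float:
        # Exponentiation becomes one multiplication: log(a**k) = k * log(a)
        return math.exp(k * math.log(a))
    ```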

    A real-time data mining technique applied for critical ECG rhythm on handheld device

    Sudden cardiac arrest is often caused by ventricular arrhythmias, and these episodes can lead to death for patients with chronic heart disease. Hence, detection of such arrhythmias is crucial in mobile ECG monitoring. In this research, a systematic study is carried out to investigate the possible limitations preventing the realisation of a real-time ECG arrhythmia data-mining algorithm suitable for application on mobile devices. Based on the findings, a computationally lightweight algorithm is devised and tested. Ventricular tachycardia (VT) is the most common type of ventricular arrhythmia and is also the deadliest. A VT episode is due to a disorder of the regular contractions of the heart: it occurs when the ventricles generate a rapid heartbeat that disrupts the regular physiological cycle. The normal sinus rhythm (NSR) of a regular human heartbeat has its signature PQRST waveform in a regular pattern, whereas the characteristic VT waveform shows short R-R intervals, widened QRS duration and the absence of P-waves. Each type of ECG arrhythmia mentioned above has a unique waveform signature that can be exploited as a feature for an automated ECG analysis application. In order to extract these known ECG waveform features, a time-domain analysis is proposed for feature extraction. Cross-correlation computes a coefficient that quantifies the similarity between two time series; hence, by cross-correlating known ECG waveform templates with an unknown ECG signal, the coefficient can indicate their similarity. In previously published work, a preliminary study was carried out in which the cross-correlation coefficient wave (CCW) technique was introduced for feature extraction. The outcome of this work presents CCW as a promising feature for differentiating between NSR, VT and Vfib signals; moreover, cross-correlation does not require high computational overhead. Next, an automated detection algorithm requires a classification mechanism to make sense of the extracted feature. In a further published study, a fuzzy-set k-NN classifier was introduced for the classification of CCW features extracted from ECG signal segments, using a training set of size 180. The outcome indicates that the computationally lightweight fuzzy k-NN classifier can reliably distinguish NSR from VT signals, but its detection rate for Vfib signals is low. Hence, a modified algorithm known as the fuzzy hybrid classifier is proposed: by implementing an expert-knowledge-based fuzzy inference system for the classification of the ECG signal, the Vfib detection rate was improved. The hybrid fuzzy classifier achieves a 91.1% correct rate, 100% sensitivity and 100% specificity, outperforming the compared classifiers. The proposed detection and classification algorithm achieves high accuracy in analysing ECG signals of NSR, VT and Vfib nature. Moreover, the proposed classifier has been successfully implemented on a smart mobile device, where it performs data mining of the ECG signal with satisfactory results.
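
    As an illustration of the cross-correlation step underlying the CCW feature, the sketch below slides a template over an ECG segment and returns a normalised correlation coefficient per offset. The function name ccw and the normalisation details are assumptions for illustration, not the published implementation.

    ```python
    import numpy as np

    def ccw(template: np.ndarray, ecg: np.ndarray) -> np.ndarray:
        """Slide a known waveform template over an ECG segment and return a
        normalised cross-correlation coefficient at each offset; values
        near 1 indicate a close match with the template morphology."""
        t = (template - template.mean()) / (template.std() + 1e-12)
        n = len(t)
        coeffs = np.empty(len(ecg) - n + 1)
        for i in range(len(coeffs)):
            w = ecg[i:i + n]
            w = (w - w.mean()) / (w.std() + 1e-12)
            coeffs[i] = float(np.dot(t, w)) / n   # Pearson-style coefficient
        return coeffs
    ```

    The per-offset coefficients form the feature wave that a downstream classifier (here, the fuzzy k-NN or hybrid fuzzy classifier) consumes; only dot products and running means are needed, which keeps the cost low on mobile hardware.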

    Artificial intelligence for advanced manufacturing quality

    This thesis addresses the challenge of AI-based image quality control systems applied to the manufacturing industry, aiming to improve this field through the use of advanced techniques for data acquisition and processing in order to obtain robust, reliable and optimal systems. It presents contributions on the use of complex data acquisition techniques, the application and design of specialised neural networks for defect detection, and the integration and validation of these systems in production processes. The work was developed in the context of several applied research projects that provided practical feedback on the usefulness of the proposed computational advances, as well as real-life data for experimental validation.

    3D Medical Image Lossless Compressor Using Deep Learning Approaches

    The ever-increasing importance of accelerated information processing, communication, and storage is a major requirement of the big-data era. With the extensive rise in data availability, easy information acquisition, and growing data rates, efficient handling emerges as a critical challenge. Even with advanced hardware developments and the availability of multiple Graphics Processing Units (GPUs), there is still strong demand to utilise these technologies effectively. Healthcare systems are one of the domains yielding explosive data growth, especially considering that modern scanners annually produce higher-resolution and more densely sampled medical images, with increasing requirements for massive storage capacity. The bottleneck in data transmission and storage would essentially be handled by an effective compression method. Since medical information is critical and plays an influential role in diagnosis accuracy, it is strongly encouraged to guarantee exact reconstruction with no loss in quality, which is the main objective of any lossless compression algorithm. Given the revolutionary impact of Deep Learning (DL) methods in solving many tasks with state-of-the-art results, including data compression, tremendous opportunities for contributions open up. While considerable efforts have been made to address lossy performance using learning-based approaches, less attention has been paid to lossless compression. This PhD thesis investigates and proposes novel learning-based approaches for compressing 3D medical images losslessly.

    Firstly, we formulate the lossless compression task as a supervised sequential prediction problem, whereby a model learns a projection function to predict a target voxel given a sequence of samples from its spatially surrounding voxels. Using such 3D local sampling information efficiently exploits spatial similarities and redundancies in a volumetric medical context. The proposed NN-based data predictor is trained to minimise the differences with the original data values, while the residual errors are encoded using arithmetic coding to allow lossless reconstruction.

    Following this, we explore the effectiveness of Recurrent Neural Networks (RNNs) as 3D predictors for learning the mapping function from the spatial medical domain (16 bit depths). We analyse the generalisability and robustness of Long Short-Term Memory (LSTM) models in capturing the 3D spatial dependencies of a voxel's neighbourhood, while utilising samples taken from various scanning settings. We evaluate our proposed MedZip models in losslessly compressing unseen Computerized Tomography (CT) and Magnetic Resonance Imaging (MRI) modalities, compared to other state-of-the-art lossless compression standards.

    This work also investigates input configurations and sampling schemes for a many-to-one sequence prediction model, specifically for compressing 3D medical images (16 bit depths) losslessly. The main objective is to determine the optimal practice enabling the proposed LSTM model to achieve a high compression ratio and fast encoding-decoding performance. A solution to the problem of non-deterministic environments is also proposed, allowing models to run in parallel without much drop in compression performance. Experimental evaluations against well-known lossless codecs were carried out on datasets acquired by different hospitals, representing different body segments and distinct scanning modalities (i.e. CT and MRI).

    To conclude, we present a novel data-driven sampling scheme utilising weighted gradient scores for training LSTM prediction-based models. The objective is to determine whether some training samples are significantly more informative than others, specifically in medical domains where samples are available on a scale of billions. The effectiveness of models trained with the presented importance sampling scheme was evaluated against alternative strategies such as uniform, Gaussian, and slice-based sampling.
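
    As a sketch of the prediction-plus-residual pipeline described above (predict each voxel from a causal 3D neighbourhood, then entropy-code the residuals), the following Python illustrates the residual computation only. The residual_stream helper and its pluggable predictor argument stand in for the trained LSTM and are assumptions of this sketch; the arithmetic coder is omitted.

    ```python
    import numpy as np

    def residual_stream(volume: np.ndarray, predictor, k: int = 2) -> np.ndarray:
        """For each voxel, predict its value from a causal 3D neighbourhood
        of previously decoded voxels, then emit the integer residual. In a
        real codec the residuals are entropy-coded (e.g. arithmetic coding),
        and adding the predictions back reconstructs the volume exactly."""
        residuals = volume.astype(np.int64).copy()   # boundary voxels kept raw
        d, h, w = volume.shape
        for z in range(k, d):
            for y in range(k, h):
                for x in range(k, w):
                    # Causal context: only voxels the decoder has already seen
                    context = volume[z - k:z, y - k:y, x - k:x].ravel()
                    residuals[z, y, x] = int(volume[z, y, x]) - int(round(predictor(context)))
        return residuals

    # Example with a trivial mean predictor standing in for the trained LSTM:
    # res = residual_stream(np.random.randint(0, 2**16, (8, 8, 8)), lambda c: c.mean())
    ```

    The better the predictor, the more the residuals concentrate around zero, and the fewer bits the entropy coder spends, which is what motivates learning the predictor with an LSTM.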