Picture processing for enhancement and recognition
Recent years have been characterized by an incredible growth in computing power, storage capability, and communication speed and bandwidth, for both desktop platforms and mobile devices. The combination of these factors has led to a new era of multimedia applications: browsing of huge image archives, consultation of online video databases, location-based services and many others. Multimedia is almost everywhere and requires high-quality data, easy retrieval of multimedia content, and increases in network access capacity and bandwidth per user. Meeting all these requirements demands effort across several research areas, ranging from signal processing and image and video analysis to communication protocols.
The research activity carried out during these three years concerns the field of multimedia signal processing, with particular attention to image and video analysis and processing. Two main topics have been addressed: the first is image and video reconstruction/restoration (using super-resolution techniques) in web-based applications for the consumption of multimedia content; the second is image analysis for location-based systems in indoor scenarios.
The first topic concerns image and video processing; in particular, the focus has been on the development of algorithms for super-resolution reconstruction of images and video sequences, in order to ease the consumption of multimedia data over the web. On the one hand, recent years have seen an incredible proliferation and surprising success of user-generated multimedia content, as well as of distributed and collaborative multimedia databases over the web. This has brought serious issues related to their management and maintenance: bandwidth limitations and service costs are important factors when dealing with the consumption of mobile multimedia content. On the other hand, the current multimedia consumer market has been characterized by the advent of cheap but rather high-quality high-definition displays. This trend, however, is only partially supported by the deployment of high-resolution multimedia services; the resulting disparity between content and display formats has to be addressed, and older productions need to be either re-mastered or post-processed in order to be broadcast for HD exploitation. In this scenario, super-resolution reconstruction represents a major solution. Image and video super-resolution techniques allow the original spatial resolution to be restored from low-resolution compressed data. In this way both content and service providers, not to mention the final users, are relieved of the burden of providing and supporting large multimedia data transfers.

The second topic addressed during my PhD research activity is the implementation of an image-based positioning system for an indoor navigator. As modern mobile devices become faster, classical signal processing can be applied to new applications such as location-based services. The exponential growth of wearable devices such as smartphones and PDAs, equipped with embedded motion (accelerometer) and rotation (gyroscope) sensors, Internet connections and high-resolution cameras, makes them ideal for Inertial Navigation System (INS) applications aiming to support the localization and navigation of objects and/or users in indoor environments, where common localization systems such as GPS (Global Positioning System) fail; hence the need for alternative positioning techniques.
A series of intensive tests has been carried out, showing how modern signal processing techniques can be successfully applied in different scenarios, from image and video enhancement up to image recognition for localization purposes, providing low-cost solutions and ensuring real-time performance.
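As a toy illustration of the super-resolution reconstruction discussed above, the sketch below uses iterative back-projection: a high-resolution estimate is refined until its simulated low-resolution version matches the observation. The forward model (block-average downsampling), the nearest-neighbour initialization and all parameter values are illustrative assumptions, not the thesis' actual algorithm.

```python
import numpy as np

def downsample(img, factor):
    """Simulate the low-resolution observation: block-average then subsample."""
    h, w = img.shape
    return img.reshape(h // factor, factor, w // factor, factor).mean(axis=(1, 3))

def upsample(img, factor):
    """Nearest-neighbour upsampling (pixel replication)."""
    return np.repeat(np.repeat(img, factor, axis=0), factor, axis=1)

def super_resolve(low_res, factor=2, iterations=20, step=1.0):
    """Iterative back-projection: refine a high-resolution estimate so that
    its simulated low-resolution version matches the observed input."""
    high = upsample(low_res, factor)            # initial estimate
    for _ in range(iterations):
        simulated = downsample(high, factor)    # forward model
        error = low_res - simulated             # residual in LR space
        high += step * upsample(error, factor)  # back-project the residual
    return high

lr = np.array([[0.0, 1.0], [1.0, 0.0]])
hr = super_resolve(lr, factor=2)
# The reconstruction reproduces the observation when downsampled again.
print(np.allclose(downsample(hr, 2), lr))  # → True
```

Real super-resolution methods use a more faithful forward model (blur, motion, compression) and regularize the inverse problem, but the consistency loop is the same.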
Audio Processing and Loudness Estimation Algorithms with iOS Simulations
The processing power and storage capacity of portable devices have improved considerably over the past decade. This has motivated the implementation of sophisticated audio and other signal processing algorithms on such mobile devices. Of particular interest in this thesis is audio/speech processing based on perceptual criteria. Specifically, estimation of parameters from human auditory models, such as auditory patterns and loudness, involves computationally intensive operations which can strain device resources. Hence, strategies for implementing computationally efficient human auditory models for loudness estimation have been studied in this thesis. Existing algorithms for reducing computations in auditory pattern and loudness estimation have been examined, and improved algorithms have been proposed to overcome the limitations of these methods. In addition, real-time applications such as perceptual loudness estimation and loudness equalization using auditory models have also been implemented. A software implementation of loudness estimation on iOS devices is also reported in this thesis. In addition to the loudness estimation algorithms and software, in this thesis project we also created new illustrations of speech and audio processing concepts for research and education. As a result, a new suite of speech/audio DSP functions was developed and integrated as part of the award-winning educational iOS app "iJDSP". These functions are described in detail in this thesis. Several enhancements in the architecture of the application have also been introduced to provide the supporting framework for speech/audio processing. Frame-by-frame processing and visualization functionalities have been developed to facilitate speech/audio processing. In addition, facilities for easy sound recording, processing and audio rendering have also been developed to provide students, practitioners and researchers with an enriched DSP simulation tool. Simulations and assessments have also been developed for use in classes and in the training of practitioners and students.
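The frame-by-frame structure of such loudness processing can be shown with a minimal sketch. The code below computes a per-frame RMS level in dB relative to full scale; this is a crude stand-in for the auditory-model (excitation-pattern) loudness studied in the thesis, and the frame length and hop size are illustrative choices.

```python
import numpy as np

def frame_levels(x, frame_len=1024, hop=512):
    """Frame-by-frame RMS level in dBFS: a simple stand-in for the
    perceptual loudness computed from an auditory model, but with the
    same frame-wise processing structure."""
    levels = []
    for start in range(0, len(x) - frame_len + 1, hop):
        frame = x[start:start + frame_len]
        rms = np.sqrt(np.mean(frame ** 2))
        levels.append(20.0 * np.log10(max(rms, 1e-12)))
    return np.array(levels)

# A full-scale sine has an RMS of 1/sqrt(2), i.e. about -3 dBFS.
t = np.arange(48000) / 48000.0
tone = np.sin(2 * np.pi * 440.0 * t)
print(round(float(np.mean(frame_levels(tone))), 1))  # → -3.0
```

A perceptual model would replace the RMS with outer/middle-ear filtering, an excitation pattern and a specific-loudness transform, which is exactly where the computational cost the thesis addresses comes from.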
Applications of fuzzy counterpropagation neural networks to non-linear function approximation and background noise elimination
This research develops an adaptive filter which can operate in an unknown environment by means of a learning mechanism suitable for the speech enhancement process. It proposes a novel ANN model which incorporates the fuzzy set approach and which can perform non-linear function approximation; this model is used as the basic structure of an adaptive filter. The learning capability of the ANN is expected to reduce the development time and cost of designing adaptive filters based on the fuzzy set approach, and a combination of both techniques may result in a learnable system that can tackle the vagueness of the changing environment in which the adaptive filter operates. The proposed model, called the Fuzzy Counterpropagation Network (Fuzzy CPN), has fast learning capability and a self-growing structure. It is applied to non-linear function approximation, chaotic time series prediction and background noise elimination.
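The key ideas here, a competitive layer of prototypes, stored outputs, fuzzy membership, and a self-growing structure, can be sketched in a few lines. The Gaussian membership function, the growth threshold and the learning rate below are illustrative assumptions, not the exact Fuzzy CPN formulation of the thesis.

```python
import numpy as np

class GrowingCPN:
    """Sketch of a self-growing counterpropagation-style network for 1-D
    function approximation: a competitive layer of prototypes plus stored
    outputs, read out through fuzzy membership weights."""

    def __init__(self, radius=0.004, lr=0.5):
        self.radius = radius          # grow a new unit beyond this distance
        self.lr = lr
        self.protos, self.outputs = [], []

    def fit_one(self, x, y):
        if self.protos:
            d = np.abs(np.array(self.protos) - x)
            i = int(np.argmin(d))
            if d[i] < self.radius:    # update the winning unit
                self.protos[i] += self.lr * (x - self.protos[i])
                self.outputs[i] += self.lr * (y - self.outputs[i])
                return
        self.protos.append(float(x))  # self-growing structure: add a unit
        self.outputs.append(float(y))

    def predict(self, x):
        d = np.abs(np.array(self.protos) - x)
        mu = np.exp(-(d / self.radius) ** 2)   # fuzzy membership of each unit
        return float(mu @ np.array(self.outputs) / mu.sum())

# Approximate y = sin(2*pi*x) on [0, 1].
net = GrowingCPN()
xs = np.linspace(0.0, 1.0, 200)
for x, y in zip(xs, np.sin(2 * np.pi * xs)):
    net.fit_one(x, y)
err = max(abs(net.predict(x) - np.sin(2 * np.pi * x)) for x in xs)
print(len(net.protos) > 100, err < 0.05)  # → True True
```

The network grows one unit per novel input region, then blends the stored outputs of nearby units at prediction time; the thesis additionally adapts this structure online inside an adaptive filter.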
Employing Information and Communications Technologies in Homes and Cities for the Health and Well-Being of Older People
He X and Sheriff RE (Eds.), Employing ICT in Homes and Cities for the Health and Well-Being of Older People. Workshop Proceedings of ICT4HOP'16, 15-17 Aug 2016, Sichuan University, Chengdu, China. Supported by the British Council, Researcher Links, the Newton Fund and the NSF.
A novel framework for high-quality voice source analysis and synthesis
This thesis was submitted for the degree of Doctor of Philosophy and awarded by Brunel University. The analysis, parameterization and modeling of voice source estimates obtained via inverse filtering of recorded speech are among the most challenging areas of speech processing, owing to the fact that humans produce a wide range of voice source realizations and that voice source estimates commonly contain artifacts due to the non-linear, time-varying source-filter coupling. Currently the most widely adopted representation of the voice source signal is the Liljencrants-Fant (LF) model, developed in 1985. Due to its overly simplistic interpretation of voice source dynamics, the LF model can represent neither the fine temporal structure of glottal flow derivative realizations nor sufficient spectral richness to facilitate truly natural-sounding speech synthesis. In this thesis we introduce Characteristic Glottal Pulse Waveform Parameterization and Modeling (CGPWPM), an entirely novel framework for voice source analysis, parameterization and reconstruction. In a comparative evaluation of CGPWPM and the LF model, we demonstrate that the proposed method preserves higher levels of speaker-dependent information from the voice source estimates and realizes more natural-sounding speech synthesis. In general, we show that CGPWPM-based speech synthesis rates highly on the scale of absolute perceptual acceptability and that speech signals are faithfully reconstructed on a consistent basis, across speakers and genders. We apply CGPWPM to voice quality profiling and to a text-independent voice quality conversion method; the proposed conversion method achieves the desired perceptual effects while the modified speech remains as natural-sounding and intelligible as natural speech. In this thesis we have also developed an optimal wavelet thresholding strategy for voice source signals which suppresses aspiration noise while retaining both the slow and the rapid variations in the voice source estimate.
Review: Deep learning in electron microscopy
Deep learning is transforming most areas of science and technology, including electron microscopy. This review offers a practical perspective aimed at developers with limited familiarity with the field. For context, we review popular applications of deep learning in electron microscopy. Next, we discuss the hardware and software needed to get started with deep learning and to interface with electron microscopes. We then review neural network components, popular architectures, and their optimization. Finally, we discuss future directions of deep learning in electron microscopy.
Proceedings of the Scientific Data Compression Workshop
Continuing advances in space and Earth science require increasing amounts of data to be gathered from spaceborne sensors. NASA expects to launch sensors during the next two decades which will be capable of producing an aggregate of 1500 megabits per second if operated simultaneously. Such high data rates stress all aspects of end-to-end data systems, and technologies and techniques are needed to relieve these stresses. Potential solutions to the massive data rate problem include data editing, greater transmission bandwidths, higher-density and faster media, and data compression. Through four subpanels on Science Payload Operations, Multispectral Imaging, Microwave Remote Sensing, and Science Data Management, recommendations were made for research in data compression and scientific data applications to space platforms.
Image coding employing vector quantisation
The work described in this thesis is concerned with the coding of digitised images employing vector quantisation (VQ). A new VQ-based coding system, named Directional Classified Gain-Shape Vector Quantisation (DCGSVQ), has been developed. It combines vector quantisation with transform coding techniques and exploits various properties of the human visual system (HVS), such as frequency sensitivity, the masking effect, and orientation sensitivity, to produce reconstructed images with good subjective quality at low bit rates (0.48 bit per pixel).
A content classifier, operating in the spatial domain, is employed to classify each image block of 8x8 pixels into one of several classes which represent various image patterns (edges in various directions, monotone areas, complex texture, etc.). Then a classified gain-shape vector quantiser is employed in the cosine domain to encode vectors of AC transform coefficients, while using either a scalar quantiser or a gain-shape vector quantiser to encode the DC coefficients. A new vector configuration strategy for defining AC vectors in the cosine domain has been proposed to better adapt the system to the local statistics of the image blocks. Accordingly, the AC coefficients are first weighted by an equivalent modulation transfer function (MTF) that represents the filtering characteristics of the HVS, and then they are grouped into directional vectors according to their direction in the cosine domain. An optional simple method for feature enhancement, based on inherent properties of the proposed strategy, has also been proposed enabling further image processing at the receiver.
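The gain-shape step can be illustrated in isolation: a vector is encoded as a scalar gain (its norm, which a real coder would scalar-quantise) plus the index of the best-matching unit-norm shape. The four-entry codebook below is an illustrative placeholder, not a DCGSVQ codebook designed with the methods of this thesis.

```python
import numpy as np

def build_shape_codebook():
    """A toy 4-entry shape codebook of unit-norm 4-dimensional vectors.
    (The thesis designs class-specific codebooks with CNNC; these rows
    are illustrative placeholders.)"""
    shapes = np.array([
        [1.0,  1.0,  1.0,  1.0],   # flat / monotone pattern
        [1.0,  1.0, -1.0, -1.0],   # edge-like pattern
        [1.0, -1.0,  1.0, -1.0],   # high-frequency pattern
        [1.0, -1.0, -1.0,  1.0],   # diagonal pattern
    ])
    return shapes / np.linalg.norm(shapes, axis=1, keepdims=True)

def gain_shape_encode(v, shapes):
    """Gain-shape VQ: transmit the gain plus the index of the unit-norm
    shape with the highest correlation to the input vector."""
    gain = np.linalg.norm(v)
    idx = int(np.argmax(shapes @ (v / gain)))
    return gain, idx

def gain_shape_decode(gain, idx, shapes):
    """Reconstruct the vector as gain times the selected shape."""
    return gain * shapes[idx]

shapes = build_shape_codebook()
v = np.array([3.0, 3.0, -3.0, -3.0])   # an edge-like AC vector
gain, idx = gain_shape_encode(v, shapes)
print(idx, np.allclose(gain_shape_decode(gain, idx, shapes), v))  # → 1 True
```

Separating gain from shape is what lets one small shape codebook serve vectors of widely varying energy, which is why the thesis can afford class-specific directional codebooks.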
A new algorithm for designing the various DCGSVQ codebooks has been developed in two steps. First, a general-purpose new algorithm for classified VQ (CVQ) codebook design has been developed as an alternative to empirical methods proposed in the literature. The new algorithm provides a simple and systematic method for codebook design and reduces considerably the total number of mathematical operations during codebook design. We have named this new algorithm Classified Nearest Neighbour Clustering (CNNC). A fast search algorithm has also been developed to reduce further computational efforts during codebook design.
Second, a new optimisation criterion which is more suitable for shape codebook design has been developed and employed within the CNNC algorithm to design classified shape codebooks for the DCGSVQ. We have named this algorithm modified CNNC. The new algorithm designs the various shape codebooks simultaneously, giving the designer full freedom to assign more importance to certain classes of vectors or to certain training vectors. The DCGSVQ system has been shown to outperform the full search VQ, the CVQ, and the transform coding CVQ (TC-CVQ), producing subjectively better coded images with higher signal-to-noise ratio (SNR) figures at various bit rates.
To improve further the perceived quality of coded images, a new postprocessing algorithm that can be applied at the decoder without increasing the bit rate has been developed. The proposed algorithm is based on various characteristics of the signal spectrum and the noise spectrum, and exploits various properties of the HVS. The proposed algorithm is a general-purpose algorithm that can be applied to block-coded images produced by various systems like VQ, transform coding (TC), and Block Truncation Coding (BTC). The algorithm is modular and can be applied in an adaptive way depending on the quality of the block-coded image.
The last theme of this work has been the identification of useful fidelity criteria for image quality assessment. Quality predictors in the form of subjectively weighted error measures were sought such that a smooth functional relationship exists between them and quality ratings made by human viewers. Quality predictors that incorporate simplified models of the HVS have been proposed and tested on a large set of VQ-coded images. Two such predictors have been shown to be better suited for image quality assessment than the commonly used mean square error (MSE) measure.
Low Power Digital Filter Implementation in FPGA
Digital filters suitable for hearing aid applications from a low-power perspective have been developed and implemented in an FPGA in this dissertation.
Hearing aids are primarily meant to improve hearing and speech comprehension. Digital hearing aids score over their analog counterparts because they provide flexible gain while also facilitating feedback reduction and noise elimination. Recent advances in DSP and microelectronics have led to the development of superior digital hearing aids. Many researchers have investigated algorithms suitable for hearing aid applications, which demand low noise, feedback cancellation, echo cancellation, etc.; the toughest challenge, however, is the implementation. Power and area impose additional constraints: the device must consume as little power as possible to support extended battery life, and should be as small as possible for increased portability. In this thesis we investigate digital filter algorithms that are hardware-configurable from a low-power viewpoint.
The suitability of a decimation filter for hearing aid applications is investigated first. In this dissertation the decimation filter is implemented using a 'Distributed Arithmetic' approach. While designing this filter, it was observed that the comb-half-band FIR-FIR design uses less hardware than the comb-FIR-FIR design, and its power consumption is also lower. This filter was implemented on a Virtex-II Pro board from Xilinx, and the resource estimator from System Generator was used to estimate the resources. However, Distributed Arithmetic is highly serial in nature and its latency is high, so the power consumption of this filter implementation was not found to be very low.
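The comb stage of such a decimation cascade can be sketched in a few lines. The boxcar kernel and decimation factor below are illustrative; a real hearing-aid design would follow this with the half-band and FIR clean-up stages and map the arithmetic to Distributed Arithmetic look-up tables in hardware.

```python
import numpy as np

def comb_decimate(x, factor):
    """One comb (moving-average) stage followed by downsampling: the
    first stage of a comb/FIR decimation cascade.  Averaging over the
    decimation factor suppresses the spectral images that would
    otherwise alias when every factor-th sample is kept."""
    kernel = np.ones(factor) / factor                    # boxcar average
    filtered = np.convolve(x, kernel, mode='full')[:len(x)]
    return filtered[::factor]                            # downsample

# A constant (DC) input passes through unchanged after the initial transient.
x = np.ones(64)
y = comb_decimate(x, 4)
print(len(y), np.allclose(y[1:], 1.0))  # → 16 True
```

The boxcar comb needs no multipliers at all (only additions), which is why comb-first cascades are attractive for low-power decimation even before Distributed Arithmetic is applied to the later FIR stages.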
We therefore proceeded to an adaptive hearing aid using a Booth-Wallace tree multiplier. This algorithm was also implemented in an FPGA, and the power consumption of the whole system was calculated using the Xilinx XPower analyser. It was observed that the hearing aid with the Booth-Wallace tree multiplier consumes about 25% less power than the one using a Booth multiplier, so we can conclude that the hearing aid using the Booth-Wallace tree multiplier is comparatively power-efficient.
The above two approaches are purely algorithmic. Next, we combined circuit-level VLSI design with the algorithmic approach for a further possible reduction in power.
A MAC-based FDF-FIR filter (algorithmic level) that uses a dual-edge-triggered (DET) latch (circuit level) is used for the hearing aid device. It is observed that the DET-based MAC FIR filter consumes about 41% less power than the traditional single-edge-triggered (SET) one, and the proposed low-power latch provides a power saving of up to 65% in the FIR filter. This technique consumes less power than previous approaches, which apply low-power techniques only at the algorithmic abstraction level.
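The MAC structure that the DET latch optimizes can be shown as a software sketch: one multiply and one accumulate per tap, per output sample, over a shifting delay line. The coefficients below are an illustrative moving average, not a hearing-aid frequency response.

```python
def fir_mac(samples, coeffs):
    """Direct-form FIR filter written as the explicit multiply-accumulate
    (MAC) loop that the hardware implements: for each output sample, the
    delay line shifts once and every tap contributes one multiply-add."""
    taps = len(coeffs)
    delay = [0.0] * taps              # the tap delay line
    out = []
    for s in samples:
        delay = [s] + delay[:-1]      # shift in the new sample
        acc = 0.0
        for h, d in zip(coeffs, delay):
            acc += h * d              # the MAC operation
        out.append(acc)
    return out

# A 4-tap moving average smooths a unit step.
print(fir_mac([1, 1, 1, 1, 1], [0.25, 0.25, 0.25, 0.25]))
# → [0.25, 0.5, 0.75, 1.0, 1.0]
```

Because the accumulator and delay line are clocked on every tap, the register clocking energy dominates, which is exactly what a dual-edge-triggered latch (capturing on both clock edges at half the clock rate) reduces.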
The DET-based MAC FIR filter was tested for real-time validation and was observed to work correctly for various signals (speech, music, voice with music). The gain of the filter was measured to be 27 dB (maximum), which matches most hearing aid manufacturers' specifications. Hence it can be concluded that an FDF FIR digital filter in conjunction with the low-power latch is a strong candidate for hearing aid applications.