
    Artificial Intelligence for Multimedia Signal Processing

    Artificial intelligence technologies are now actively applied to broadcasting and multimedia processing. A great deal of research has been conducted across fields such as content creation, transmission, and security, and over the past two to three years these efforts have also targeted compression efficiency for image, video, speech, and other data in areas related to MPEG media processing. In addition, media creation, processing, editing, and scenario generation remain important research areas in multimedia processing and engineering. This book collects topics spanning advanced computational intelligence algorithms and technologies for emerging multimedia signal processing, including computer vision, speech/sound/text processing, and content analysis/information mining.

    Wrist-based Phonocardiogram Diagnosis Leveraging Machine Learning

    With the tremendous growth of technology and the fast pace of life, instant access to information has become an everyday necessity, all the more so in emergencies, where every minute counts toward saving lives. mHealth has become the adopted approach for quick diagnosis using mobile devices, but it remains challenging because of the required data quality, high computational load, and high power consumption. The aim of this research is to diagnose heart conditions from phonocardiogram (PCG) analysis using machine learning techniques under limited processing power, so that the method can later be encapsulated in a mobile device. The diagnosis is performed using two techniques: (1) parametric estimation with multivariate classification, specifically a discriminant function, explored at length with different numbers of descriptive features extracted with a wavelet transform (filter bank); and (2) artificial neural networks, specifically pattern recognition, also applied to the filter-bank-decomposed PCG. The first technique achieved 97.33% successful diagnosis on PCG with a 19 dB signal-to-noise ratio, with the signal decomposed into four sub-bands by a second-order filter bank. Each sub-band was described by two features, the signal's mean and covariance, giving eight features in total to describe roughly a one-minute PCG sample. Different filter-bank orders and feature counts are also explored and compared. The second technique yielded 100% successful classification with an 83.3% trust level. The results are assessed, and improvements are recommended and discussed as future work.
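    The pipeline described above (wavelet filter-bank decomposition into four sub-bands, mean and variance per band, discriminant-function classification) can be sketched in a few lines. The wavelet choice ('db2'), segment length, and the synthetic data below are illustrative assumptions, not the thesis's exact setup.

```python
# Hedged sketch: wavelet filter-bank feature extraction for PCG diagnosis,
# roughly following the abstract (4 sub-bands, mean + variance per band,
# discriminant-function classifier). Wavelet, segment length, and data
# are assumptions for illustration only.
import numpy as np
import pywt
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

def pcg_features(signal, wavelet="db2", levels=3):
    """Decompose a PCG segment into levels+1 sub-bands and return the
    mean and variance of each band (8 features for levels=3)."""
    bands = pywt.wavedec(signal, wavelet, level=levels)
    return np.array([stat for band in bands for stat in (band.mean(), band.var())])

# Toy example: two classes of synthetic "PCG" segments (placeholder data).
rng = np.random.default_rng(0)
X = np.vstack([pcg_features(rng.normal(scale=s, size=4000))
               for s in (1.0,) * 50 + (2.0,) * 50])
y = np.array([0] * 50 + [1] * 50)

clf = LinearDiscriminantAnalysis().fit(X, y)
print("training accuracy:", clf.score(X, y))
```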

    Proceedings of the 7th Sound and Music Computing Conference

    Proceedings of the SMC2010 - 7th Sound and Music Computing Conference, July 21st - July 24th 2010

    Signal Processing Using Non-invasive Physiological Sensors

    Non-invasive biomedical sensors monitor physiological parameters from the human body for potential future therapies and healthcare solutions. Today, a critical factor in providing a cost-effective healthcare system is improving patients' quality of life and mobility, which can be achieved by developing non-invasive sensor systems that can be deployed at the point of care, used at home, or integrated into wearable devices for long-term data collection. Another integral part of a cost-effective healthcare system is the signal processing of the data recorded with non-invasive biomedical sensors. In this book, we aimed to attract researchers interested in applying signal processing methods to different biomedical signals, such as the electroencephalogram (EEG), electromyogram (EMG), functional near-infrared spectroscopy (fNIRS), electrocardiogram (ECG), galvanic skin response, pulse oximetry, and photoplethysmogram (PPG). We encouraged new signal processing methods, or novel applications of existing ones to physiological signals, to help healthcare providers make better decisions.

    Deep Transfer Learning for Automatic Speech Recognition: Towards Better Generalization

    Automatic speech recognition (ASR) has recently become an important challenge for deep learning (DL): it requires large-scale training datasets and high computational and storage resources. Moreover, DL techniques, and machine learning (ML) approaches in general, assume that training and testing data come from the same domain, with the same input feature space and data-distribution characteristics. This assumption, however, does not hold in some real-world artificial intelligence (AI) applications. There are also situations where gathering real data is challenging, expensive, or the events of interest occur rarely, so the data requirements of DL models cannot be met. Deep transfer learning (DTL) has been introduced to overcome these issues; it helps develop high-performing models from real datasets that are small or slightly different from, but related to, the training data. This paper presents a comprehensive survey of DTL-based ASR frameworks to shed light on the latest developments and to help academics and professionals understand current challenges. Specifically, after presenting the DTL background, a well-designed taxonomy is adopted to organize the state of the art. A critical analysis is then conducted to identify the limitations and advantages of each framework. A comparative study then highlights current challenges before deriving opportunities for future research.
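    A common DTL pattern the survey covers is reusing a pretrained acoustic encoder and fine-tuning only a small task head on limited target-domain data. The sketch below illustrates that pattern in PyTorch; the encoder is a placeholder module and the dimensions, class count, and synthetic data are assumptions, not any specific model from the paper.

```python
# Hedged sketch of fine-tuning-style deep transfer learning:
# freeze a (nominally pretrained) encoder and train only a new head
# on a small target dataset.
import torch
import torch.nn as nn

class PretrainedEncoder(nn.Module):          # stand-in for a large pretrained ASR encoder
    def __init__(self, n_mels=80, hidden=256):
        super().__init__()
        self.rnn = nn.GRU(n_mels, hidden, batch_first=True)

    def forward(self, x):                    # x: (batch, time, n_mels)
        out, _ = self.rnn(x)
        return out.mean(dim=1)               # utterance-level embedding

encoder = PretrainedEncoder()                # imagine weights loaded from a source domain
for p in encoder.parameters():               # transfer step: freeze source knowledge
    p.requires_grad = False

head = nn.Linear(256, 10)                    # new head for a small 10-class target task
opt = torch.optim.Adam(head.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

# Tiny synthetic target set: 32 utterances, 100 frames of 80-dim features each.
feats = torch.randn(32, 100, 80)
labels = torch.randint(0, 10, (32,))

for _ in range(5):                           # brief fine-tuning loop over the head only
    opt.zero_grad()
    loss = loss_fn(head(encoder(feats)), labels)
    loss.backward()
    opt.step()
print("final loss:", loss.item())
```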

    Proceedings of the second "international Traveling Workshop on Interactions between Sparse models and Technology" (iTWIST'14)

    The implicit objective of the biennial "international Traveling Workshop on Interactions between Sparse models and Technology" (iTWIST) is to foster collaboration between international scientific teams by disseminating ideas through both specific oral/poster presentations and free discussions. For its second edition, the iTWIST workshop took place in the medieval and picturesque town of Namur in Belgium, from Wednesday, August 27th to Friday, August 29th, 2014. The workshop was conveniently located in "The Arsenal" building, within walking distance of both hotels and the town center. iTWIST'14 gathered about 70 international participants and featured 9 invited talks, 10 oral presentations, and 14 posters on the following themes, all related to the theory, application, and generalization of the "sparsity paradigm": Sparsity-driven data sensing and processing; Union of low-dimensional subspaces; Beyond linear and convex inverse problems; Matrix/manifold/graph sensing/processing; Blind inverse problems and dictionary learning; Sparsity and computational neuroscience; Information theory, geometry and randomness; Complexity/accuracy tradeoffs in numerical methods; Sparsity? What's next?; Sparse machine learning and inference. Comment: 69 pages, 24 extended abstracts, iTWIST'14 website: http://sites.google.com/site/itwist1

    Human experience in the natural and built environment : implications for research policy and practice

    22nd IAPS conference. Edited book of abstracts. 427 pp. University of Strathclyde, Sheffield and West of Scotland Publication. ISBN: 978-0-94-764988-3

    Nonlinear and distributed sensory estimation

    Methods to improve sensor performance with regard to nonlinearity, noise, and bandwidth are investigated, and new algorithms are developed. The need for this research stems from the ever-increasing demand for greater precision and improved reliability in sensor measurements. After describing the current state of the art on sensor issues such as nonlinearity and bandwidth, research goals are set to establish a new approach to the use of sensors. The investigation begins with a detailed distortion analysis of nonlinear sensors. The need for efficient distortion compensation is further justified by showing how even a slight deviation from the linearity assumption leads to severe distortion in both the time and frequency domains. It is argued that a suitable distortion compensation technique avoids the need for infinite-bandwidth operation of a nonlinear sensor, which nonlinear distortion would otherwise dictate. Several distortion compensation techniques are developed, and their performance is validated by simulation and experimental results. As with any model-based technique, modeling errors or model uncertainty affect the performance of the proposed scheme; this motivates robust signal reconstruction. A treatment of this problem is given, and a novel technique is developed that uses a nominal model instead of an accurate one and produces results that are robust to model uncertainty. The means to attain a high operating bandwidth are developed by utilizing several low-bandwidth pass-band sensors. Instead of using a single sensor to measure a high-bandwidth signal, there are many advantages to using an array of pass-band sensors. Having shown that sensor arrays are both economical and practical, several multi-sensor fusion schemes are developed to facilitate their implementation. Another aspect of this dissertation is handling outliers in sensor measurements. As faulty sensor data detection is an essential element of multi-sensor network implementation, used to improve system reliability and robustness, several sensor scheduling configurations are derived to identify and remove outliers.
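    The core idea of model-based distortion compensation, inverting a known nominal sensor characteristic to recover the stimulus, can be illustrated briefly. The cubic characteristic, grid-based inversion, and tone input below are assumptions for illustration; they are not the dissertation's actual sensor model or algorithm.

```python
# Hedged sketch of distortion compensation for a nonlinear sensor:
# given a nominal (monotonic) input-output characteristic, invert it by
# table lookup to reconstruct the stimulus.
import numpy as np

def sensor(x):
    """Nominal nonlinear characteristic (monotonic, so it is invertible)."""
    return x + 0.3 * x**3

def compensate(y, grid=np.linspace(-1.5, 1.5, 4001)):
    """Invert the nominal model: for each measured output, look up the
    input that would have produced it (np.interp needs an increasing
    'xp' axis, which the monotonic characteristic guarantees)."""
    return np.interp(y, sensor(grid), grid)

t = np.linspace(0, 1, 2000, endpoint=False)
x = 0.9 * np.sin(2 * np.pi * 5 * t)          # true stimulus: a 5 Hz tone
y = sensor(x)                                # distorted measurement (adds a harmonic)
x_hat = compensate(y)                        # reconstructed stimulus

print("distortion RMS error:  %.4f" % np.sqrt(np.mean((y - x) ** 2)))
print("compensated RMS error: %.4f" % np.sqrt(np.mean((x_hat - x) ** 2)))
```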

    Multiple Media Correlation: Theory and Applications

    This thesis introduces multiple media correlation, a new technology for the automatic alignment of multiple media objects such as text, audio, and video. This research began with the question: what can be learned when multiple multimedia components are analyzed simultaneously? Most ongoing research in computational multimedia has focused on queries, indexing, and retrieval within a single media type: video is compressed and searched independently of audio, and text is indexed without regard to the temporal relationships it may have to other media data. Multiple media correlation provides a framework for locating and exploiting correlations between multiple, potentially heterogeneous, media streams. The goal is computed synchronization, the determination of temporal and spatial alignments that optimize a correlation function and indicate commonality and synchronization between media objects. The model also provides a basis for comparison of media in unrelated domains. There are many real-world applications for this technology, including speaker localization, musical score alignment, and degraded media realignment. Two applications, text-to-speech alignment and parallel text alignment, are described in detail with experimental validation. Text-to-speech alignment computes the alignment between a textual transcript and speech-based audio. The presented solutions are effective for a wide variety of content and are useful not only for retrieval of content but also in support of automatic captioning of movies and video. Parallel text alignment provides a tool for comparing alternative translations of the same document, particularly useful to the classics scholar interested in comparing translation techniques or styles. The results presented in this thesis include (a) new media models that are more useful in analysis applications, (b) a theoretical model for multiple media correlation, (c) two practical application solutions with widespread applicability, and (d) Xtrieve, a multimedia database retrieval system that demonstrates this new technology and its application to information retrieval. This thesis demonstrates that computed alignment of media objects is practical and can provide immediate solutions to many information retrieval and content presentation problems. It also introduces a new area for research in media data analysis.
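    Computed synchronization amounts to finding a monotonic alignment between two feature streams that optimizes a correlation (or cost) function. The sketch below uses classic dynamic-time-warping-style dynamic programming as a stand-in for the thesis's correlation model; the Euclidean local cost and the toy data are assumptions, not the method described in the thesis.

```python
# Hedged sketch of "computed synchronization": a dynamic-programming
# alignment between two feature streams (e.g., transcript-derived vs.
# audio-derived features).
import numpy as np

def align(a, b):
    """Return the minimum-cost monotonic alignment path between feature
    sequences a (length n) and b (length m) via dynamic programming."""
    n, m = len(a), len(b)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = np.linalg.norm(a[i - 1] - b[j - 1])      # local mismatch cost
            cost[i, j] = d + min(cost[i - 1, j], cost[i, j - 1], cost[i - 1, j - 1])
    # Backtrack to recover the synchronization path.
    i, j, path = n, m, []
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        step = np.argmin([cost[i - 1, j - 1], cost[i - 1, j], cost[i, j - 1]])
        i, j = (i - 1, j - 1) if step == 0 else ((i - 1, j) if step == 1 else (i, j - 1))
    return path[::-1]

# Toy example: stream b is a time-warped copy of stream a.
a = np.sin(np.linspace(0, 3, 30))[:, None]
b = np.sin(np.linspace(0, 3, 45))[:, None]
print(align(a, b)[:5])
```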