    Spoken term detection ALBAYZIN 2014 evaluation: overview, systems, results, and discussion

    The electronic version of this article is the complete one and can be found online at: http://dx.doi.org/10.1186/s13636-015-0063-8
    Spoken term detection (STD) aims at retrieving data from a speech repository given a textual representation of the search term. It is currently receiving much interest due to the large volume of multimedia information. STD differs from automatic speech recognition (ASR) in that ASR attempts to transcribe all the terms/words that appear in the speech data, whereas STD focuses on detecting a selected list of search terms within the speech data. This paper presents the systems submitted to the STD ALBAYZIN 2014 evaluation, held as part of the ALBAYZIN 2014 evaluation campaign within the context of the IberSPEECH 2014 conference. This is the first STD evaluation that deals with the Spanish language. The evaluation consists of retrieving the speech files that contain the search terms, indicating their start and end times within the appropriate speech file, along with a score value that reflects the confidence given to the detection of the search term. The evaluation is conducted on a Spanish spontaneous speech database, which comprises a set of talks from workshops and amounts to about 7 h of speech. We present the database, the evaluation metrics, the systems submitted to the evaluation, the results, and a detailed discussion. Four different research groups took part in the evaluation. Evaluation results show reasonable performance for a moderate out-of-vocabulary term rate. This paper compares the systems submitted to the evaluation and provides an in-depth analysis based on several search term properties (term length, in-vocabulary/out-of-vocabulary terms, single-word/multi-word terms, and in-language/foreign terms).
    This work has been partly supported by project CMC-V2 (TEC2012-37585-C02-01) from the Spanish Ministry of Economy and Competitiveness. This research was also funded by the European Regional Development Fund and the Galician Regional Government (GRC2014/024, “Consolidation of Research Units: AtlantTIC Project”, CN2012/160).
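    The evaluation protocol described above (detections carrying a file identifier, start/end times, and a confidence score, matched against reference occurrences) can be sketched as a simple matching routine. This is a hypothetical, minimal illustration, not the official ALBAYZIN scoring tool; the tuple layout and the 0.5 s midpoint tolerance are assumptions.

```python
def score_detections(detections, references, tolerance=0.5):
    """Count hits, false alarms, and misses for one search term.

    detections: list of (file_id, start, end, score) produced by a system.
    references: list of (file_id, start, end) ground-truth occurrences.
    A detection counts as a hit if it is in the right file and its
    midpoint lies within `tolerance` seconds of an unmatched reference
    midpoint; otherwise it is a false alarm.
    """
    matched = set()
    hits = false_alarms = 0
    for file_id, start, end, _score in detections:
        mid = (start + end) / 2.0
        hit = None
        for i, (ref_file, ref_start, ref_end) in enumerate(references):
            if i in matched or ref_file != file_id:
                continue
            if abs(mid - (ref_start + ref_end) / 2.0) <= tolerance:
                hit = i
                break
        if hit is not None:
            matched.add(hit)
            hits += 1
        else:
            false_alarms += 1
    misses = len(references) - len(matched)
    return hits, false_alarms, misses
```

    Actual STD metrics (such as the term-weighted value) are then computed from these hit, false-alarm, and miss counts aggregated over all search terms.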

    Character Recognition

    Character recognition is one of the pattern recognition technologies most widely used in practical applications. This book presents recent advances relevant to character recognition, from technical topics such as image processing, feature extraction, and classification, to new applications including human-computer interfaces. The goal of this book is to provide a reference source for academic research and for professionals working in the character recognition field.

    Deep Neural Networks for Document Processing of Music Score Images

    [EN] There is an increasing interest in the automatic digitization of medieval music documents. Despite efforts in this field, the detection of the different layers of information in these documents still poses difficulties. Deep neural network techniques have reported outstanding results in many areas related to computer vision. Consequently, in this paper, we study the so-called Convolutional Neural Networks (CNN) for performing the automatic document processing of music score images. This process focuses on layering the image into its constituent parts (namely, background, staff lines, music notes, and text) by training a classifier with examples of these parts. A comprehensive experimentation in terms of the configuration of the networks was carried out, which illustrates interesting results regarding both the efficiency and effectiveness of these models. In addition, a cross-manuscript adaptation experiment is presented, in which the networks are evaluated on a different manuscript from the one on which they were trained. The results suggest that the CNN is capable of adapting its knowledge, so starting from a pre-trained CNN reduces (or eliminates) the need for new labeled data.
    This work was supported by the Social Sciences and Humanities Research Council of Canada, and Universidad de Alicante through grant GRE-16-04.
    Calvo-Zaragoza, J.; Castellanos, F.; Vigliensoni, G.; Fujinaga, I. (2018). Deep Neural Networks for Document Processing of Music Score Images. Applied Sciences, 8(5). https://doi.org/10.3390/app8050654
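    The patch-based pixel labeling described above (classifying each pixel into background, staff line, note, or text from the patch centred on it) can be sketched as follows. The `classify_patch` function stands in for the trained CNN; the patch size, replicate padding, and label names are illustrative assumptions, not the authors' implementation.

```python
def classify_pixels(image, classify_patch, patch=3):
    """Label every pixel of a grayscale page image by classifying the
    square patch centred on it. `image` is a list of rows of intensities;
    `classify_patch` maps a patch (list of rows) to a layer label."""
    h, w = len(image), len(image[0])
    r = patch // 2
    labels = [[None] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            rows = []
            for dy in range(-r, r + 1):
                row = []
                for dx in range(-r, r + 1):
                    yy = min(max(y + dy, 0), h - 1)  # replicate-pad borders
                    xx = min(max(x + dx, 0), w - 1)
                    row.append(image[yy][xx])
                rows.append(row)
            labels[y][x] = classify_patch(rows)
    return labels
```

    A trained CNN would replace `classify_patch`; in practice, fully convolutional networks label all pixels in one forward pass rather than patch by patch.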

    Arbitrary Keyword Spotting in Handwritten Documents

    Despite the existence of electronic media in today’s world, a considerable amount of written communication is in paper form, such as books, bank cheques, and contracts. There is an increasing demand for the automation of information extraction, classification, search, and retrieval of documents. The goal of this research is to develop a complete methodology for the spotting of arbitrary keywords in handwritten document images. We propose a top-down approach composed of two major steps: segmentation and decision. In the former, we generate the word hypotheses. In the latter, we decide whether a generated word hypothesis is a specific keyword or not. We carry out the decision step through a two-level classification where, first, we assign an input image to a keyword or non-keyword class, and then transcribe the image if it is classified as a keyword. By reducing the problem from the image domain to the text domain, we not only address the search problem in handwritten documents but also classification and retrieval, without the need for transcription of the whole document image. The main contribution of this thesis is the development of a generalized minimum edit distance for handwritten words and a proof that this distance is equivalent to an Ergodic Hidden Markov Model (EHMM). To the best of our knowledge, this work is the first to present an exact 2D model for the temporal information in handwriting while satisfying practical constraints. Other contributions of this research include: 1) removal of page margins based on corner detection in projection profiles; 2) removal of noise patterns in handwritten images using expectation maximization and fuzzy inference systems; 3) extraction of text lines based on fast Fourier-based steerable filtering; 4) segmentation of characters based on skeletal graphs; and 5) merging of broken characters based on graph partitioning.
    Our experiments with a benchmark database of handwritten English documents and a real-world collection of handwritten French documents indicate that, even without any word/document-level training, our results are comparable with two state-of-the-art word spotting systems for English and French documents.
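    The generalized minimum edit distance mentioned above builds on the classic dynamic-programming edit distance between two strings. A minimal sketch of that base algorithm (standard Levenshtein with configurable costs, not the thesis's EHMM-equivalent generalization) is:

```python
def edit_distance(a, b, sub_cost=1, ins_cost=1, del_cost=1):
    """Minimum cost of turning string a into string b via substitutions,
    insertions, and deletions, by dynamic programming.
    d[i][j] = cost of transforming a[:i] into b[:j]."""
    d = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(1, len(a) + 1):
        d[i][0] = i * del_cost          # delete all of a[:i]
    for j in range(1, len(b) + 1):
        d[0][j] = j * ins_cost          # insert all of b[:j]
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            sub = d[i - 1][j - 1] + (0 if a[i - 1] == b[j - 1] else sub_cost)
            d[i][j] = min(sub, d[i - 1][j] + del_cost, d[i][j - 1] + ins_cost)
    return d[len(a)][len(b)]
```

    The thesis generalizes this scheme from characters to handwritten word images, where the per-operation costs are learned rather than fixed.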

    ALBAYZIN 2018 spoken term detection evaluation: a multi-domain international evaluation in Spanish

    [Abstract] Search on speech (SoS) is a challenging area due to the huge amount of information stored in audio and video repositories. Spoken term detection (STD) is an SoS-related task that aims to retrieve data from a speech repository given a textual representation of a search term (which can include one or more words). This paper presents a multi-domain, internationally open evaluation for STD in Spanish. The evaluation has been designed carefully so that several analyses of the main results can be carried out. The evaluation task aims at retrieving the speech files that contain the terms, providing their start and end times, and a score that reflects the confidence given to the detection. Three different Spanish speech databases that encompass different domains have been employed in the evaluation: the MAVIR database, which comprises a set of talks from workshops; the RTVE database, which includes broadcast news programs; and the COREMAH database, which contains two-person spontaneous speech conversations about different topics. We present the evaluation itself, the three databases, the evaluation metric, the systems submitted to the evaluation, the results, and detailed post-evaluation analyses based on some term properties (within-vocabulary/out-of-vocabulary terms, single-word/multi-word terms, and native/foreign terms). Fusion results of the primary systems submitted to the evaluation are also presented. Three different research groups took part in the evaluation, and 11 different systems were submitted.
    The obtained results suggest that the STD task is still far from solved and that performance is highly sensitive to changes in the data domain.
    This work was supported by the Ministerio de Economía y Competitividad (TIN2015-64282-R, RTI2018-093336-B-C22, TEC2015-65345-P, TEC2015-68172-C2-1-), the Xunta de Galicia (ED431B 2016/035, GPC ED431B 2019/003, GRC 2014/024, ED431G/01, ED431G/04), and the Agrupación estratéxica consolidada (GIU16/68).
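    The fusion of primary systems mentioned above can be illustrated with a simple score-averaging scheme over aligned detections. This is a hypothetical sketch: the tuple layout, the alignment tolerance, and the mean-score rule are assumptions, not the evaluation's actual fusion method.

```python
def fuse_detections(system_outputs, tolerance=0.5):
    """Merge detection lists from several systems for one search term.

    system_outputs: list of per-system detection lists, each detection
    being (file_id, start, end, score) with scores normalized to [0, 1].
    Detections in the same file whose midpoints lie within `tolerance`
    seconds are merged into one cluster; the fused score is the mean
    over all systems, so a system with no detection contributes 0.
    """
    clusters = []  # each entry: [file_id, midpoint, list_of_scores]
    for output in system_outputs:
        for file_id, start, end, score in output:
            mid = (start + end) / 2.0
            for c in clusters:
                if c[0] == file_id and abs(c[1] - mid) <= tolerance:
                    c[2].append(score)
                    break
            else:
                clusters.append([file_id, mid, [score]])
    n = len(system_outputs)
    return [(f, m, sum(s) / n) for f, m, s in clusters]
```

    Averaging over all systems (rather than only the systems that fired) penalizes detections that lack cross-system support, which is one common design choice in late fusion.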

    Analysis of High-dimensional and Left-censored Data with Applications in Lipidomics and Genomics

    Recently, new kinds of high-throughput measurement techniques have emerged, enabling biological research to focus on fundamental building blocks of living organisms such as genes, proteins, and lipids. In sync with this new type of data, referred to as omics data, modern data analysis techniques have also emerged. Much of this research focuses on finding biomarkers for detecting abnormalities in a person's health status, as well as on learning unobservable network structures that represent the functional associations of biological regulatory systems. Omics data have certain specific qualities, such as left-censored observations due to the limitations of the measurement instruments, missing data, non-normal observations, and very large dimensionality, and the interest often lies in the connections between the large number of variables. There are two major aims in this thesis. The first is to provide efficient methodology for dealing with various types of missing or censored omics data that can be used for visualisation and biomarker discovery based on, for example, regularised regression techniques. A maximum likelihood based covariance estimation method for data with censored values is developed, and the algorithms are described in detail. The second major aim is to develop novel approaches for detecting interactions that display functional associations in large-scale observations. For more complicated data connections, a technique based on partial least squares regression is investigated. The technique is applied to network construction as well as to differential network analyses, both on multiply imputed censored data and next-generation sequencing count data.
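    As a point of contrast with the maximum likelihood covariance estimator developed in the thesis, left-censored values are often handled with a simple single-imputation heuristic: observations below the limit of detection (LOD) are replaced by LOD/√2. A minimal sketch follows; encoding censored entries as `None` is an assumption for illustration, and this baseline is not the thesis's method.

```python
import math

def impute_left_censored(values, lod):
    """Replace left-censored observations (below the limit of detection)
    with lod / sqrt(2), a common single-imputation baseline.
    Censored entries may be given as None or as values below lod."""
    fill = lod / math.sqrt(2)
    return [fill if v is None or v < lod else v for v in values]
```

    Single imputation like this understates variance, which is why the thesis pursues multiple imputation and likelihood-based estimation instead.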

    Towards developing a reliable medical device for automated epileptic seizure detection in the ICU

    Abstract. Epilepsy is a prevalent neurological disorder that affects millions of people globally, and its diagnosis typically involves laborious manual inspection of electroencephalography (EEG) data. Automated detection of epileptic seizures in EEG signals could potentially improve diagnostic accuracy and reduce diagnosis time, but special attention must be paid to the number of false alarms in order to reduce unnecessary treatments and costs. This research studies machine learning techniques for EEG seizure detection, investigating the effectiveness of different algorithms for feature extraction, feature selection, pre-processing, classification, and post-processing, with the aim of achieving high sensitivity and a low false alarm rate in a medical device for detecting seizure activity in EEG data. Current state-of-the-art methods that have been validated clinically using large amounts of data are introduced. The study focuses on finding suitable machine learning methods, considering KNN, SVM, decision trees, and random forests, and compares their performance on the task of seizure detection using features introduced in the literature. Using ensemble methods, namely bootstrapping and majority voting, we achieved a sensitivity of 0.80, a false alarm rate of 2.10 per hour (FAR/h), an accuracy of 97.1%, and a specificity of 98.2%. Overall, the findings of this study can be useful for developing more accurate and efficient algorithms for an EEG seizure detection medical device, which can contribute to the early diagnosis and treatment of epilepsy in the intensive care unit for critically ill patients.
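    The majority voting step mentioned above can be sketched as follows. This is a generic illustration (the label encoding and tie-breaking rule are assumptions), not the study's exact pipeline.

```python
from collections import Counter

def majority_vote(predictions):
    """Combine per-classifier predictions by majority vote.

    predictions: list of label lists, one list per classifier, all of
    the same length (one label per EEG segment). Returns one fused
    label per segment; on a tie, the label seen first among the
    classifiers wins (Counter preserves insertion order).
    """
    fused = []
    for segment_preds in zip(*predictions):
        counts = Counter(segment_preds)
        fused.append(counts.most_common(1)[0][0])
    return fused
```

    In a seizure detector, each inner list would be the per-segment output of one base classifier (e.g. KNN, SVM, or a random forest), and the fused sequence would then go through post-processing such as smoothing over consecutive segments.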