84 research outputs found

    Exploring convolutional, recurrent, and hybrid deep neural networks for speech and music detection in a large audio dataset

    Audio signals represent a wide diversity of acoustic events, from background environmental noise to spoken communication. Machine learning models such as neural networks have already been proposed for audio signal modeling, where recurrent structures can take advantage of temporal dependencies. This work aims to study the implementation of several neural network-based systems for speech and music event detection over a collection of 77,937 10-second audio segments (216 h), selected from the Google AudioSet dataset. These segments belong to YouTube videos and have been represented as mel-spectrograms. We propose and compare two approaches. The first is to train two different neural networks, one for speech detection and another for music detection. The second consists of training a single neural network to tackle both tasks at the same time. The studied architectures include fully connected, convolutional and LSTM (long short-term memory) recurrent networks. Comparative results are provided in terms of classification performance and model complexity. We would like to highlight the performance of convolutional architectures, especially in combination with an LSTM stage. The hybrid convolutional-LSTM models achieve the best overall results (85% accuracy) in the three proposed tasks. Furthermore, a distractor analysis of the results has been carried out in order to identify which events in the ontology are the most harmful for the performance of the models, showing some difficult scenarios for the detection of music and speech. This work has been supported by project “DSSL: Redes Profundas y Modelos de Subespacios para Deteccion y Seguimiento de Locutor, Idioma y Enfermedades Degenerativas a partir de la Voz” (TEC2015-68172-C2-1-P), funded by the Ministry of Economy and Competitivity of Spain and FEDE

    Influence of Variety and Storage Time of Fresh Garlic on the Physicochemical and Antioxidant Properties of Black Garlic

    Black garlic is made from fresh garlic by subjecting it to a controlled temperature (~65 °C) and humidity (>85%) for a prolonged period of time. The aim of this study was to assess the differences in the process and in the final product as a result of employing three garlic varieties (Spanish Roja, Chinese Spring and California White), and to check the influence of the storage time of fresh garlic on the quality of the final product by using garlic obtained in two different agricultural seasons, that of the current year (2014) and of the previous one (2013). The results revealed some differences in the parameters analysed during the manufacturing of the black garlic from the three varieties used, and even according to the harvest in question. However, when comparing initial and final values of the samples, a very similar evolution in their acidity, reducing sugars, Brix, pH, polyphenol content, and antioxidant capacity was noted

    VoxCeleb-ESP: preliminary experiments detecting Spanish celebrities from their voices

    This paper presents VoxCeleb-ESP, a collection of pointers and timestamps to YouTube videos facilitating the creation of a novel speaker recognition dataset. VoxCeleb-ESP captures real-world scenarios, incorporating diverse speaking styles, noises, and channel distortions. It includes 160 Spanish celebrities spanning various categories, ensuring a representative distribution across age groups and geographic regions in Spain. We provide two speaker trial lists for speaker identification tasks, each of them with same-video or different-video target trials respectively, accompanied by a cross-lingual evaluation of ResNet pretrained models. Preliminary speaker identification results suggest that the complexity of the detection task in VoxCeleb-ESP is equivalent to that of the original and much larger VoxCeleb in English. VoxCeleb-ESP contributes to the expansion of speaker recognition benchmarks with a comprehensive and diverse dataset for the Spanish language

    Score-based Bayesian network structure learning algorithms for modeling radioisotope levels in nuclear power plant reactors

    Radioactive corrosion products released into the primary coolant loop dominate the final shutdown radiation fields of pressurized water reactors. Thus, reducing the concentration of these corrosion products is a paramount duty in the optimization process of the reactor performance. However, the complexity and uncertainty present in this process make it difficult to predict their evolution in a theoretical way. We propose the application of structural learning of Bayesian networks to discover the complex relations between the corrosion products and the most relevant variables in the primary loop, giving rise to probabilistic models that obtain accurate and reliable predictions of the corrosion products. Our analysis of 5 power plants demonstrates that our approach results in simpler and more reliable models. Additionally, we conclude that the learned structures may represent an interpretable tool for power plant technicians since they reveal useful information that can be directly employed to improve the reactor operation. The authors from the UAM have been supported by the Spanish Ministerio de Ciencia e Innovación, Agencia del Fondo Europeo de Desarrollo Regional (grant reference PID2021-125943OB-I00, MCIN /AEI /10.13039/501100011033/FEDER, UE). The work has been conducted in the context of a signed collaboration agreement between AUDIAS-UAM and ENUSA Industrias Avanzadas S.

    Gaussian Processes for radiation dose prediction in nuclear power plant reactors

    In nuclear power plants, there are high-exposure jobs, like refuelling and maintenance, that require getting close to the reactor between operation cycles. Therefore, reducing radiation dose during these periods is of paramount importance regarding safety regulations. While there are some manipulable variables, like levels of certain corrosion products, that can influence the final level of radiation dose, there is no way to determine it in a principled way. In this work, we propose to use Machine Learning to predict the radiation dose in the reactor at the cycle end based on information available during the cycle operation. In particular, we use a Gaussian Process to model the relation between cobalt radioisotopes (a certain kind of corrosion product) and radiation dose levels. Gaussian Processes acknowledge the uncertainty in their predictions, a desirable property considering the high-risk nature of the present application. We report experiments on real data gathered from five different power plants in Spain. Results show that these models can be used to estimate the future values of radiation dose in a data-driven way. Moreover, there are tools based on these models currently in development for their application in power plants. The authors from the UAM are funded by the Spanish Ministerio de Ciencia, Innovacion y Universidades (MCIU) and Agencia Estatal de Investigacion (AEI), and also by the European Regional Development Fund (FEDER in Spanish, ERDF in English), by project RTI2018-098091-B-I00. The work has been conducted in the context of a signed collaboration agreement between AUDIAS-UAM and ENUSA Industrias Avanzadas S. A

    Physicochemical Characterization and Biological Activities of Black and White Garlic: In Vivo and In Vitro Assays

    White and three types of black garlic (13, 32, and 45 days of aging, named 0C1, 1C2, and 2C1, respectively) were selected to study possible differences in their nutraceutic potential. For this purpose, the garlic samples were physicochemically characterized (Brix, pH, aW, L, polyphenol, and antioxidant capacity), and both in vivo and in vitro assays were carried out. Black garlic samples showed higher polyphenol content and antioxidant capacity than the white ones. The biological assays showed that none of the samples (neither raw nor black garlic) produced toxic effects in the Drosophila melanogaster animal genetic model, nor exerted protective effects against H2O2, with the exception of the 0C1 black garlic. Moreover, only white garlic was genotoxic at the highest concentration. On the other hand, 0C1 black garlic was the most antigenotoxic substance. The in vivo longevity assays showed significant extension of lifespan at some concentrations of white and 0C1 and 1C2 black garlic. The in vitro experiments showed that all of the garlic samples induced a decrease in leukemia cell growth. However, no type of garlic was able to induce proapoptotic internucleosomal DNA fragmentation. Taking into account the physicochemical and biological data, black garlic could be considered a potential functional food and used in the preventive treatment of age-related diseases. In addition, our findings could be relevant for black-garlic-processing agrifood companies, as the economic and time costs can be significantly reduced by shortening aging from 45 to 13 days

    Dual-energy contrast-enhanced digital mammography: initial clinical results of a multireader, multicase study

    Introduction: The purpose of this study was to compare the diagnostic accuracy of dual-energy contrast-enhanced digital mammography (CEDM) as an adjunct to mammography (MX) ± ultrasonography (US) with the diagnostic accuracy of MX ± US alone. Methods: One hundred ten consenting women with 148 breast lesions (84 malignant, 64 benign) underwent two-view dual-energy CEDM in addition to MX and US using a specially modified digital mammography system (Senographe DS, GE Healthcare). Reference standard was histology for 138 lesions and follow-up for 12 lesions. Six radiologists from 4 institutions interpreted the images using high-resolution softcopy workstations. Confidence of presence (5-point scale), probability of cancer (7-point scale), and BI-RADS scores were evaluated for each finding. Sensitivity, specificity and ROC curve areas were estimated for each reader and overall. Visibility of findings on MX ± CEDM and MX ± US was evaluated with a Likert scale. Results: The average per-lesion sensitivity across all readers was significantly higher for MX ± US ± CEDM than for MX ± US (0.78 vs. 0.71 using BI-RADS, p = 0.006). All readers improved their clinical performance, and the average area under the ROC curve was significantly higher for MX ± US ± CEDM than for MX ± US (0.87 vs. 0.83, p = 0.045). Finding visibility was similar or better on MX ± CEDM than MX ± US in 80% of cases. Conclusions: Dual-energy contrast-enhanced digital mammography as an adjunct to MX ± US improves diagnostic accuracy compared to MX ± US alone. Addition of iodinated contrast agent to MX facilitates the visualization of breast lesions

    An evaluation of the variability of tumor-shape definition derived by experienced observers from CT images of supraglottic carcinomas (ACRIN protocol 6658)

    Accurate target definition is considered essential for sophisticated, image-guided radiation therapy; however, relatively little information has been reported that measures our ability to identify the precise shape of targets accurately. We decided to assess the manner in which eight “experts” interpreted the size and shape of tumors based on “real life” contrast-enhanced CT scans

    Accuracy of CT Colonography for Detection of Large Adenomas and Cancers

    Background: Computed tomographic (CT) colonography is a noninvasive option in screening for colorectal cancer. However, its accuracy as a screening tool in asymptomatic adults has not been well defined. Methods: We recruited 2600 asymptomatic study participants, 50 years of age or older, at 15 study centers. CT colonographic images were acquired with the use of standard bowel preparation, stool and fluid tagging, mechanical insufflation, and multidetector-row CT scanners (with 16 or more rows). Radiologists trained in CT colonography reported all lesions measuring 5 mm or more in diameter. Optical colonoscopy and histologic review were performed according to established clinical protocols at each center and served as the reference standard. The primary end point was detection by CT colonography of histologically confirmed large adenomas and adenocarcinomas (10 mm in diameter or larger) that had been detected by colonoscopy; detection of smaller colorectal lesions (6 to 9 mm in diameter) was also evaluated. Results: Complete data were available for 2531 participants (97%). For large adenomas and cancers, the mean (±SE) per-patient estimates of the sensitivity, specificity, positive and negative predictive values, and area under the receiver-operating-characteristic curve for CT colonography were 0.90±0.03, 0.86±0.02, 0.23±0.02, 0.99± Conclusions: In this study of asymptomatic adults, CT colonographic screening identified 90% of subjects with adenomas or cancers measuring 10 mm or more in diameter. These findings augment published data on the role of CT colonography in screening patients with an average risk of colorectal cancer. (ClinicalTrials.gov number, NCT00084929; American College of Radiology Imaging Network [ACRIN] number, 6664.)

    Language Identification in Short Utterances Using Long Short-Term Memory (LSTM) Recurrent Neural Networks

    Zazo R, Lozano-Diez A, Gonzalez-Dominguez J, T. Toledano D, Gonzalez-Rodriguez J (2016) Language Identification in Short Utterances Using Long Short-Term Memory (LSTM) Recurrent Neural Networks. PLoS ONE 11(1): e0146917. doi:10.1371/journal.pone.0146917. Long Short-Term Memory (LSTM) Recurrent Neural Networks (RNNs) have recently outperformed other state-of-the-art approaches, such as i-vector and Deep Neural Networks (DNNs), in automatic Language Identification (LID), particularly when dealing with very short utterances (approximately 3 s). In this contribution we present an open-source, end-to-end LSTM RNN system running on limited computational resources (a single GPU) that outperforms a reference i-vector system on a subset of the NIST Language Recognition Evaluation (8 target languages, 3 s task) by up to 26%. This result is in line with previously published research using proprietary LSTM implementations and huge computational resources, which made those earlier results hard to reproduce. Further, we extend those previous experiments by modeling unseen languages (out-of-set, OOS, modeling), which is crucial in real applications. Results show that an LSTM RNN with OOS modeling is able to detect these languages and generalizes robustly to unseen OOS languages. Finally, we also analyze the effect of even more limited test data (from 2.25 s to 0.1 s), proving that with as little as 0.5 s an accuracy of over 50% can be achieved. This work has been supported by project CMC-V2: Caracterizacion, Modelado y Compensacion de Variabilidad en la Señal de Voz (TEC2012-37585-C02-01), funded by Ministerio de Economia y Competitividad, Spain