70 research outputs found

    Audio‐Visual Speaker Tracking

    Target motion tracking has found applications across interdisciplinary fields, including but not limited to surveillance and security, forensic science, intelligent transportation systems, driving assistance, monitoring of prohibited areas, medical science, robotics, action and expression recognition, individual speaker discrimination in multi‐speaker environments, and video conferencing, within computer vision and signal processing. Among these applications, speaker tracking in enclosed spaces has been gaining relevance due to widespread advances in devices and technologies and the need for seamless real‐time tracking and localization of speakers. However, speaker tracking is challenging in real‐life scenarios because several distinctive issues, such as occlusions and an unknown number of speakers, influence the tracking process. One way to overcome these issues is to use multi‐modal information, which conveys complementary information about the state of the speakers compared to single‐modal tracking. Several approaches to using multi‐modal information have been proposed; they can be classified into two categories, deterministic and stochastic. This chapter aims to provide multimedia researchers with a state‐of‐the‐art overview of tracking methods that combine multiple modalities to accomplish various multimedia analysis tasks, classifying them into categories and listing new and future trends in this field.

    Deep Gated Recurrent Unit for Smartphone-Based Image Captioning

    Expressing the visual content of an image in natural language has gained relevance due to technological and algorithmic advances together with improved computational processing capacity. Many smartphone applications for image captioning have been developed recently, as built-in cameras offer ease of operation and portability, so that an image can be captured whenever and wherever needed. Here, a new image captioning approach based on an encoder-decoder framework with a multi-layer gated recurrent unit is proposed. The Inception-v3 convolutional neural network is employed in the encoder because of its ability to extract more features from small regions. The proposed recurrent neural network-based decoder feeds these features into the multi-layer gated recurrent unit to produce a natural language expression word by word. Experimental evaluations on the MSCOCO dataset demonstrate that the proposed approach consistently outperforms existing approaches across different evaluation metrics. Integrated into our custom-designed Android application, named “VirtualEye+”, the approach has great potential to bring image captioning into daily routines.
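
To make the "multi-layer gated recurrent unit" in the abstract above more concrete, here is a minimal pure-Python sketch of one time step of a stacked GRU. It is not the paper's implementation: the function names, the parameter dictionary layout, the `zero_params` helper, and the update convention h' = (1 - z)·h + z·h̃ (some libraries swap the roles of z and 1 - z) are all assumptions made for illustration.

```python
import math


def sigmoid(v):
    """Logistic function used for the update and reset gates."""
    return 1.0 / (1.0 + math.exp(-v))


def affine(W, x, U, h, b):
    """Return W @ x + U @ h + b for list-of-lists matrices and list vectors."""
    return [
        sum(w * xi for w, xi in zip(w_row, x))
        + sum(u * hi for u, hi in zip(u_row, h))
        + bi
        for w_row, u_row, bi in zip(W, U, b)
    ]


def gru_step(x, h, p):
    """One GRU time step: input x, previous hidden state h, parameters p."""
    z = [sigmoid(v) for v in affine(p["Wz"], x, p["Uz"], h, p["bz"])]  # update gate
    r = [sigmoid(v) for v in affine(p["Wr"], x, p["Ur"], h, p["br"])]  # reset gate
    reset_h = [ri * hi for ri, hi in zip(r, h)]
    # Candidate state computed from the input and the reset-gated history.
    h_tilde = [math.tanh(v) for v in affine(p["Wh"], x, p["Uh"], reset_h, p["bh"])]
    # Blend old state and candidate (assumed convention, see lead-in).
    return [(1.0 - zi) * hi + zi * ci for zi, hi, ci in zip(z, h, h_tilde)]


def stacked_gru_step(x, hiddens, layers):
    """Run one time step through several GRU layers; each layer feeds the next."""
    new_hiddens = []
    layer_input = x
    for h, p in zip(hiddens, layers):
        h_new = gru_step(layer_input, h, p)
        new_hiddens.append(h_new)
        layer_input = h_new
    return new_hiddens


def zero_params(input_size, hidden_size):
    """Hypothetical helper: all-zero parameters, handy for checking the gate arithmetic."""
    def matrix(cols):
        return [[0.0] * cols for _ in range(hidden_size)]

    params = {}
    for gate in ("z", "r", "h"):
        params["W" + gate] = matrix(input_size)
        params["U" + gate] = matrix(hidden_size)
        params["b" + gate] = [0.0] * hidden_size
    return params
```

In a captioning decoder of this general kind, `x` would typically be the embedding of the previously generated word (with the image features supplied as the first input or the initial hidden state), and the top layer's hidden state would be projected onto the vocabulary to choose the next word.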

    From Sophisticated Analysis to Colorimetric Determination: Smartphone Spectrometers and Colorimetry

    Smartphone-based spectrometers and colorimetry have been gaining relevance due to widespread advances in devices with increasing computational power, their relatively low cost, their portable designs with user-friendly interfaces, and their compatibility with data acquisition and processing for “lab-on-a-chip” systems. They find applications in interdisciplinary fields, including but not limited to medical science, water monitoring, agriculture, and chemical and biological sensing. However, designing spectrometers and colorimetric systems is challenging in real-life scenarios because several distinctive issues, such as ambient light conditions and device independence, influence the quantitative evaluation process. Several approaches have been proposed to overcome these challenges and to enhance the performance of smartphone-based colorimetric analysis. This chapter aims to provide researchers with a state-of-the-art overview of smartphone-based spectrometers and colorimetry, covering hardware designs with 3D printers and sensors and software designs with image processing algorithms and smartphone applications. In addition, assay preparation to mimic real-life testing environments and performance metrics for quantitative evaluation of the proposed designs are presented, along with a list of new and future trends in this field.

    Dual Transformer Decoder based Features Fusion Network for Automated Audio Captioning

    Automated audio captioning (AAC) generates textual descriptions of audio content. Existing AAC models achieve good results but use only the high-dimensional representation of the encoder. Because high-dimensional representations carry a large amount of information, methods that rely on them alone tend to learn that information insufficiently. In this paper, a new encoder-decoder model called the Low- and High-Dimensional Feature Fusion (LHDFF) is proposed. LHDFF uses a new PANNs encoder called Residual PANNs (RPANNs) to fuse low- and high-dimensional features. Low-dimensional features contain limited information about specific audio scenes. The fusion of low- and high-dimensional features can improve model performance by repeatedly emphasizing specific audio scene information. To fully exploit the fused features, LHDFF uses a dual transformer decoder structure to generate captions in parallel. Experimental results show that LHDFF outperforms existing audio captioning models. Comment: INTERSPEECH 2023. arXiv admin note: substantial text overlap with arXiv:2210.0503

    The Effect of Adjuvant Radiotherapy on Thyroid Function in Patients with Breast Cancer

    Aim: Although the incidence of breast cancer has increased over the years, there has been a marked decline in breast cancer mortality. The resulting increase in overall survival has raised the importance of the side effects of breast cancer treatment. This study aimed to evaluate the effect of radiotherapy (RT) on thyroid function in patients who received adjuvant RT for breast cancer. Materials and Methods: For 87 patients who received RT for breast cancer in our department between 2009 and 2017, demographic data, tumor stages, treatment modalities, thyroid volumes, pre- and post-treatment TSH, T3, and T4 values, RT doses and volumes, and dose-volume values of the thyroid gland were collected from patient files, the planning system, and the Gazi Üniversitesi Tıp Fakültesi electronic database and evaluated. The data were entered into SPSS 22.0, and the relationship of hypothyroidism with RT dose-volume and thyroid volume was analyzed using chi-square and non-parametric tests. Results: The median age was 50. Of the 87 patients, 85 (98%) were women and 2 (2%) were men. Metastatic lymph nodes were present in 59 patients (59%). In 53 patients (61%), the supraclavicular field (SCF) was within the treatment volume. RT-related hypothyroidism was observed in a total of 12 patients (14%). Lymph node positivity and the resulting SCF irradiation were significantly associated with the development of hypothyroidism (p=0.013 and p=0.003, respectively). In multivariate analysis, only the mean dose (Dmean = 24 Gy) and V30 (30%) values were statistically significant for the development of hypothyroidism (p=0.038 and p=0.044, respectively). The risk of developing hypothyroidism was observed to be higher in patients with V30 values >30%.

    Visually-Aware Audio Captioning With Adaptive Audio-Visual Attention

    Audio captioning aims to generate text descriptions of audio clips. In the real world, many objects produce similar sounds, and accurately recognizing ambiguous sounds is a major challenge for audio captioning. In this work, inspired by inherent human multimodal perception, we propose visually-aware audio captioning, which uses visual information to help describe ambiguous sounding objects. Specifically, we introduce an off-the-shelf visual encoder to extract video features and incorporate the visual features into an audio captioning system. Furthermore, to better exploit complementary audio-visual contexts, we propose an audio-visual attention mechanism that adaptively integrates audio and visual context and removes redundant information in the latent space. Experimental results on AudioCaps, the largest audio captioning dataset, show that our proposed method achieves state-of-the-art results on machine translation metrics. Comment: INTERSPEECH 202

    Minimally invasive repair of pectus excavatum (MIRPE) in adults: is it a proper choice?

    Introduction: The Nuss procedure is suitable for prepubertal and early pubertal patients but can also be used in adult patients. Aim: To determine whether the minimally invasive technique (MIRPE) can also be performed successfully in adults. Material and methods: Between July 2006 and January 2016, 836 patients (744 male, 92 female) underwent correction of pectus excavatum with the MIRPE technique at our institution. The mean age was 16.8 years (2–45 years). There were 236 adult patients (28.2%) (> 18 years): 20 female, 216 male. The mean age among the adult patients was 23.2 years (18–45 years). The recorded data included length of hospital stay, postoperative complications, number of bars used, duration of the surgical procedure and signs of pneumothorax on the postoperative chest X-ray. Results: MIRPE was performed in 236 adult patients. The average operative time was 44.4 min (25–90 min). The median postoperative stay was 4.92 ± 2.81 days (3–21 days) in adults and 4.64 ± 1.58 days (2–13 days) in younger patients. The difference was not statistically significant (p = 0.637). Two or more bars were used in 36 (15.8%) adult patients and in 44 (7.5%) younger patients. The difference was not statistically significant either (p = 0.068). Overall complication rates among the adult patients and younger patients were 26.2% and 11.8%, respectively. The difference was statistically significant (p = 0.007). Conclusions: MIRPE is a feasible procedure that produces good long-term results in the treatment of pectus excavatum in adults.

    Polymer hydrogel-based microneedles for metformin release

    Drug delivery devices ensure the effective delivery of a broad range of therapeutics to millions of patients worldwide on a daily basis [1]. Microneedles are a class of drug delivery device that provide pain-free transdermal delivery with improved patient compliance [2-4]. The release of metformin, a drug used in the treatment of cancer and diabetes, from polymer hydrogel-based microneedle patches was demonstrated in vitro. Tuning the composition of the polymer hydrogels enabled preparation of robust microneedle patches with mechanical properties such that they would penetrate skin (insertion force of a single microneedle to be ca. 40 N). Swelling experiments conducted at 20°C, 35°C and 60°C show temperature-dependent degrees of swelling and kinetics (Fickian diffusion). Drug release from the hydrogel-based microneedles was fitted to various models (e.g., zero order, first order, second order, Korsmeyer-Peppas, Peppas-Sahlin), with the zero-order model giving the best fit. Such microneedles have potential application for transdermal delivery of metformin for the treatment of cancer and diabetes.
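
As a rough illustration of the model fitting mentioned in the abstract above, the following Python sketch fits two of the named release models by least squares: zero order, Q(t) = k0·t, and Korsmeyer-Peppas, Q(t)/Q∞ = k·t^n (fitted on log-log data, and conventionally applied only to roughly the first 60% of release). The data points, function names, and fitting procedure are assumptions made for illustration; none of them come from the paper.

```python
import math


def fit_zero_order(times, released):
    """Least-squares fit of Q(t) = k0 * t (a line through the origin)."""
    numerator = sum(t * q for t, q in zip(times, released))
    denominator = sum(t * t for t in times)
    return numerator / denominator


def fit_korsmeyer_peppas(times, fraction_released):
    """Fit Q(t)/Q_inf = k * t**n by linear regression of log(Q) on log(t)."""
    xs = [math.log(t) for t in times]
    ys = [math.log(f) for f in fraction_released]
    x_mean = sum(xs) / len(xs)
    y_mean = sum(ys) / len(ys)
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    n = sxy / sxx                      # release exponent (slope)
    k = math.exp(y_mean - n * x_mean)  # rate constant (from the intercept)
    return k, n


def r_squared(observed, predicted):
    """Coefficient of determination, used to compare how well models fit."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


# Hypothetical release data: time in hours, cumulative fraction released.
# Generated from an exact zero-order law with k0 = 0.05 per hour.
times = [1.0, 2.0, 4.0, 6.0, 8.0]
fraction = [0.05 * t for t in times]

k0 = fit_zero_order(times, fraction)
k, n = fit_korsmeyer_peppas(times, fraction)
zero_order_r2 = r_squared(fraction, [k0 * t for t in times])
```

On data that follow a zero-order law, the Korsmeyer-Peppas exponent n comes out close to 1, which is one way to see that zero-order release is a special case of that model. With real measurements, one would compare a goodness-of-fit measure such as R² across all candidate models.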

    Poly(2-Hydroxyethyl Methacrylate) Hydrogel-Based Microneedles for Metformin Release

    The release of metformin, a drug used in the treatment of cancer and diabetes, from poly(2-hydroxyethyl methacrylate) (pHEMA) hydrogel-based microneedle patches is demonstrated in vitro. Tuning the composition of the pHEMA hydrogels enables preparation of robust microneedle patches with mechanical properties such that they would penetrate skin (insertion force of a single microneedle to be ≈40 N). Swelling experiments conducted at 20, 35, and 60 °C show temperature-dependent degrees of swelling and diffusion kinetics. Drug release from the pHEMA hydrogel-based microneedles is fitted to various models (e.g., zero order, first order, second order). Such pHEMA microneedles have potential application for transdermal delivery of metformin for the treatment of aging, cancer, diabetes, etc.