
    Segmentation-Based Bounding Box Generation for Omnidirectional Pedestrian Detection

    We propose a segmentation-based bounding box generation method for omnidirectional pedestrian detection that enables detectors to tightly fit bounding boxes to pedestrians without using omnidirectional images for training. Due to their wide angle of view, omnidirectional cameras are more cost-effective than standard cameras and hence suitable for large-scale monitoring. The problem with using omnidirectional cameras for pedestrian detection is that the performance of standard pedestrian detectors is likely to be substantially degraded, because pedestrians' appearance in omnidirectional images may be rotated to any angle. Existing methods mitigate this issue by transforming images during inference; however, the transformation substantially degrades detection accuracy and speed. A recently proposed method obviates the transformation by training detectors with omnidirectional images, which instead incurs huge annotation costs. To avoid both the transformation and the annotation work, we leverage an existing large-scale object detection dataset. We train a detector with rotated images and tightly fitted bounding box annotations generated from the segmentation annotations in the dataset, enabling it to detect pedestrians in omnidirectional images with tightly fitted bounding boxes. We also develop pseudo-fisheye distortion augmentation, which further enhances performance. Extensive analysis shows that our detector successfully fits bounding boxes to pedestrians and achieves a substantial performance improvement. Comment: Preprint submitted to Multimedia Tools and Applications
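    The core idea, regenerating a tight box from the segmentation mask *after* rotating it rather than rotating the original box corners (which would yield a loose enclosing box), can be sketched as follows. This is a minimal illustration using 90° rotations; all function names are ours, not the paper's, and the paper's actual pipeline (arbitrary angles, fisheye distortion) is not shown.

```python
import numpy as np

def tight_bbox(mask):
    """Tight axis-aligned box (x_min, y_min, x_max, y_max) of a binary mask."""
    ys, xs = np.nonzero(mask)
    return xs.min(), ys.min(), xs.max(), ys.max()

def rotated_training_sample(mask, k):
    """Rotate a segmentation mask by k*90 degrees and regenerate a tight box.

    Computing the box from the rotated mask keeps it tight; transforming the
    original box's corners would instead produce a loose enclosing box.
    """
    rotated = np.rot90(mask, k)
    return rotated, tight_bbox(rotated)

# Toy mask: a 3x2 "pedestrian" blob in a 6x6 image.
mask = np.zeros((6, 6), dtype=bool)
mask[1:4, 2:4] = True
rot, box = rotated_training_sample(mask, 1)
```

    The same regeneration step applies to any rotation angle once the mask itself is rotated, which is what lets a detector trained on such samples fit tight boxes to arbitrarily rotated pedestrians.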

    The Diagnosis and Treatment Approach for Oligo-Recurrent and Oligo-Progressive Renal Cell Carcinoma

    One-third of renal cell carcinomas (RCCs) without metastases develop metastatic disease after extirpative surgery for the primary tumor. The majority of metastatic RCC cases, along with treated primary lesions, involve limited lesions, termed “oligo-recurrent” disease. The role of metastasis-directed therapy (MDT), including stereotactic body radiation therapy (SBRT) and metastasectomy, in the treatment of oligo-recurrent RCC has evolved. Although surgical resection of all lesions can alone have curative intent, SBRT is a valuable treatment option, especially for patients concurrently receiving systemic therapy. Contemporary immune checkpoint inhibitor (ICI) combination therapies remain central to the management of metastatic RCC. However, one objective of MDT is to delay the initiation of systemic therapies, thereby sparing patients from potentially unnecessary burdens. Deciding on a treatment approach is more complex for cases showing progression under systemic therapies, known as “oligo-progression”, and the efficacy of MDT may be diminished in these patients compared to those with stable disease. SBRT combined with ICIs is a promising treatment for these cases, because radiation therapy has been shown to affect the tumor microenvironment and areas beyond the irradiated sites; this may enhance the efficacy of ICIs, although their efficacy has only been demonstrated in clinical trials.

    Audio-Visual Speech Recognition Using Lip Information Extracted from Side-Face Images

    This paper proposes an audio-visual speech recognition method using lip information extracted from side-face images, as an attempt to increase noise robustness in mobile environments. Our proposed method assumes that lip images can be captured using a small camera installed in a handset. Two different kinds of lip features, lip-contour geometric features and lip-motion velocity features, are used individually or jointly, in combination with audio features. Phoneme HMMs modeling the audio and visual features are built based on the multistream HMM technique. Experiments conducted using Japanese connected digit speech contaminated with white noise in various SNR conditions show the effectiveness of the proposed method. Recognition accuracy is improved by using the visual information in all SNR conditions. These visual features were confirmed to be effective even when the audio HMM was adapted to noise by the MLLR method.

    Audio-Visual Speech Recognition Using Lip Movement Extracted from Side-Face Images

    This paper proposes an audio-visual speech recognition method using lip movement extracted from side-face images, in an attempt to increase noise robustness in mobile environments. Although most previous bimodal speech recognition methods use frontal face (lip) images, these methods are inconvenient for users, who must hold a device with a camera in front of their face while talking. Our proposed method, which captures lip movement using a small camera installed in a handset, is more natural, easy, and convenient. This method also effectively avoids a decrease in the signal-to-noise ratio (SNR) of the input speech. Visual features are extracted by optical-flow analysis and combined with audio features in the framework of HMM-based recognition. Phone HMMs are built by the multi-stream HMM technique. Experiments conducted using Japanese connected digit speech contaminated with white noise in various SNR conditions show the effectiveness of the proposed method. Recognition accuracy is improved by using the visual information in all SNR conditions, and the best improvement is approximately 6% at 5 dB SNR.
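    Extracting a per-frame lip velocity by optical flow can be sketched with the classic brightness-constancy constraint Ix·u + Iy·v + It = 0, solved by least squares over the lip region. This is a stand-in for the paper's optical-flow analysis, not its exact implementation; function names and the single-global-velocity simplification are ours.

```python
import numpy as np

def lip_velocity(prev, curr):
    """One global (u, v) motion estimate for a lip ROI.

    Solves the Lucas-Kanade brightness-constancy constraint
    Ix*u + Iy*v + It = 0 over all pixels by least squares.
    """
    Ix = np.gradient(prev, axis=1)   # horizontal intensity gradient
    Iy = np.gradient(prev, axis=0)   # vertical intensity gradient
    It = curr - prev                 # temporal difference
    A = np.stack([Ix.ravel(), Iy.ravel()], axis=1)
    b = -It.ravel()
    (u, v), *_ = np.linalg.lstsq(A, b, rcond=None)
    return u, v

def motion_features(frames):
    """Per-frame velocity features, as would be fed to the visual stream
    of a multi-stream HMM alongside audio features."""
    return [lip_velocity(frames[i].astype(float), frames[i + 1].astype(float))
            for i in range(len(frames) - 1)]
```

    A real front end would compute flow per block or per pixel and append derivatives, but the least-squares step above is the essence of the velocity feature.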

    Audio-Visual Speech Recognition Using New Lip Features Extracted from Side-Face Images

    This paper proposes new visual features for audio-visual speech recognition using lip information extracted from side-face images. To increase the noise robustness of speech recognition, we have previously proposed an audio-visual speech recognition method using speaker lip information extracted from side-face images taken by a small camera installed in a mobile device. Our previous method used only lip-movement information, measured by optical-flow analysis, as a visual feature. However, since lip-shape information is also obviously important, this paper attempts to combine lip-shape information with lip-movement information to improve audio-visual speech recognition performance. A combination of the angle between the upper and lower lips (lip angle) and its derivative is extracted as the lip-shape features. The effectiveness of the lip-angle features has been evaluated under various SNR conditions. The proposed features improved recognition accuracy in all SNR conditions in comparison with audio-only recognition results. The best improvement, 8.0% in absolute terms, was obtained at the 5 dB SNR condition. Combining the lip-angle features with our previous features extracted by optical-flow analysis yielded further improvement. These visual features were confirmed to be effective even when the audio HMM used in our method was adapted to noise by the MLLR method.
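    A lip-angle feature of this kind can be sketched as the opening angle at a mouth corner between the upper- and lower-lip contour points, paired with its frame-to-frame derivative. The point names and the corner-based formulation are illustrative assumptions; the paper extracts the angle from side-face lip contours.

```python
import numpy as np

def lip_angle(corner, upper, lower):
    """Opening angle (radians) at the mouth corner between the
    upper- and lower-lip contour points (illustrative geometry)."""
    a = np.asarray(upper, float) - np.asarray(corner, float)
    b = np.asarray(lower, float) - np.asarray(corner, float)
    cos = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
    return float(np.arccos(np.clip(cos, -1.0, 1.0)))

def lip_angle_features(track):
    """Per-frame (angle, delta-angle) pairs, mirroring the paper's use of
    the lip angle and its derivative as shape features."""
    angles = [lip_angle(*pts) for pts in track]
    deltas = np.gradient(angles) if len(angles) > 1 else [0.0]
    return list(zip(angles, deltas))
```

    The (angle, delta) pairs would then be concatenated with the optical-flow velocity features as the visual stream of the multi-stream HMM.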

    First- and Second-Generation Practical Syntheses of Chroman-4-one Derivative: A Key Intermediate for the Preparation of SERT/5-HT<sub>1A</sub> Dual Inhibitors

    Two approaches to the large-scale synthesis of the key intermediate <b>9</b>, a precursor of novel dual inhibitors of the SERT/5-HT<sub>1A</sub> receptor, are described. Each approach features a mild and efficient method for construction of the chroman-4-one scaffold, which can be used with substrates containing base-sensitive functionalities and enables synthesis on a kilogram scale without chromatographic purification. The first-generation synthesis enables quick delivery of a kilogram quantity of the key intermediate <b>9</b> with only one slurry purification step, whereas the highly practical second-generation synthesis is suitable for multikilogram campaigns.