30 research outputs found
Minimization of Back-Electron Transfer Enables the Elusive sp3 C–H Functionalization of Secondary Anilines
Mask-CTC-based Encoder Pre-training for Streaming End-to-End Speech Recognition
Achieving high accuracy with low latency has always been a challenge in
streaming end-to-end automatic speech recognition (ASR) systems. By attending
to more future contexts, a streaming ASR model achieves higher accuracy but
results in larger latency, which hurts the streaming performance. In the
Mask-CTC framework, an encoder network is trained to learn the feature
representation that anticipates long-term contexts, which is desirable for
streaming ASR. Mask-CTC-based encoder pre-training has been shown to be beneficial in
achieving low latency and high accuracy for triggered attention-based ASR.
However, the effectiveness of this method has not been demonstrated for various
model architectures, nor has it been verified that the encoder has the expected
look-ahead capability to reduce latency. This study, therefore, examines the
effectiveness of Mask-CTC-based pre-training for models with different
architectures, such as Transformer-Transducer and contextual block streaming
ASR. We also discuss the effect of the proposed pre-training method on
obtaining accurate output spike timing. Comment: Accepted to EUSIPCO 202
Conversation-oriented ASR with multi-look-ahead CBS architecture
During conversations, humans are capable of inferring the intention of the
speaker at any point of the speech to prepare the following action promptly.
Such ability is also the key for conversational systems to achieve rhythmic and
natural conversation. To perform this, the automatic speech recognition (ASR)
used for transcribing the speech in real-time must achieve high accuracy
without delay. In streaming ASR, high accuracy is assured by attending to
look-ahead frames, which leads to delay increments. To tackle this trade-off
issue, we propose a multiple latency streaming ASR to achieve high accuracy
with zero look-ahead. The proposed system contains two encoders that operate in
parallel, where a primary encoder generates accurate outputs utilizing
look-ahead frames, and the auxiliary encoder recognizes the look-ahead portion
of the primary encoder without look-ahead. The proposed system is constructed
based on contextual block streaming (CBS) architecture, which leverages block
processing and has a high affinity for the multiple latency architecture.
Various methods are also studied for architecting the system, including
shifting the network to perform as different encoders, as well as generating
both encoders' outputs in one encoding pass. Comment: Submitted to ICASSP202
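The block processing with look-ahead described in this abstract can be illustrated with a minimal sketch. It only computes frame index ranges; the function name `split_into_blocks` and the parameters `block_size` and `look_ahead` are hypothetical and are not taken from the paper, and the actual CBS architecture also carries context between blocks, which is omitted here.

```python
def split_into_blocks(num_frames, block_size, look_ahead):
    """Return (start, central_end, ahead_end) frame indices for each block.

    Frames [start, central_end) are the block's own frames; frames
    [central_end, ahead_end) are the look-ahead frames. A primary encoder
    would wait for the look-ahead frames, while an auxiliary encoder could
    emit output for that portion without waiting further (assumed reading
    of the abstract).
    """
    blocks = []
    for start in range(0, num_frames, block_size):
        central_end = min(start + block_size, num_frames)
        ahead_end = min(central_end + look_ahead, num_frames)
        blocks.append((start, central_end, ahead_end))
    return blocks


if __name__ == "__main__":
    for block in split_into_blocks(num_frames=10, block_size=4, look_ahead=2):
        print(block)
```

With a look-ahead of zero, `central_end` and `ahead_end` coincide, which corresponds to the zero look-ahead operating point the abstract aims for.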
Crop Classification and LAI Estimation Using Original and Resolution-Reduced Images from Two Consumer-Grade Cameras
Consumer-grade cameras have been increasingly used for remote sensing applications in recent years. However, the performance of this type of camera has not been systematically tested and well documented in the literature. The objective of this research was to evaluate the performance of original and resolution-reduced images taken from two consumer-grade cameras, an RGB camera and a modified near-infrared (NIR) camera, for crop identification and leaf area index (LAI) estimation. Airborne RGB and NIR images taken over a 6.5-square-km cropping area were mosaicked and aligned to create a four-band mosaic with a spatial resolution of 0.4 m. The spatial resolution of the mosaic was then reduced to 1, 2, 4, 10, 15 and 30 m for comparison. Six supervised classifiers were applied to the RGB images and the four-band images for crop identification, and 10 vegetation indices (VIs) derived from the images were related to ground-measured LAI. Accuracy assessment showed that maximum likelihood applied to the 0.4-m images achieved an overall accuracy of 83.3% for the RGB image and 90.4% for the four-band image. Regression analysis showed that the 10 VIs explained 58.7% to 83.1% of the variability in LAI. Moreover, spatial resolutions of 0.4, 1, 2 and 4 m achieved better results for both crop identification and LAI prediction than the coarser spatial resolutions of 10, 15 and 30 m. The results from this study indicate that imagery from consumer-grade cameras can be a useful data source for crop identification and canopy cover estimation.
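The two image operations mentioned in this abstract, reducing spatial resolution and deriving a vegetation index, can be sketched as follows. This is an illustration under stated assumptions: the abstract does not say how the resolution was reduced or which 10 VIs were used, so block averaging and NDVI are stand-ins chosen here, and the function names are hypothetical.

```python
import numpy as np


def reduce_resolution(band, factor):
    """Average non-overlapping factor x factor pixel blocks of a 2-D band.

    Assumption: block averaging with an integer factor. Going from 0.4 m
    to 2 m would use factor=5; non-integer ratios such as 0.4 m to 1 m
    would need a resampling method not shown here.
    """
    rows = (band.shape[0] // factor) * factor
    cols = (band.shape[1] // factor) * factor
    trimmed = band[:rows, :cols].astype(float)  # drop incomplete edge blocks
    blocks = trimmed.reshape(rows // factor, factor, cols // factor, factor)
    return blocks.mean(axis=(1, 3))


def ndvi(nir, red, eps=1e-9):
    """Normalized difference vegetation index, (NIR - red) / (NIR + red)."""
    nir = nir.astype(float)
    red = red.astype(float)
    return (nir - red) / (nir + red + eps)  # eps avoids division by zero


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    nir = rng.uniform(0.2, 0.8, size=(10, 10))  # synthetic reflectance values
    red = rng.uniform(0.05, 0.3, size=(10, 10))
    coarse = ndvi(reduce_resolution(nir, 5), reduce_resolution(red, 5))
    print(coarse.shape)
```

Relating an index like this to ground-measured LAI would then be a regression step, which is left out because the abstract does not specify the model form.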
Registration for Optical Multimodal Remote Sensing Images Based on FAST Detection, Window Selection, and Histogram Specification
In recent years, digital frame cameras have been increasingly used for remote sensing applications. However, it is always a challenge to align or register images captured with different cameras or different imaging sensor units. In this research, a novel registration method was proposed. Coarse registration was first applied to approximately align the sensed and reference images. Window selection was then used to reduce the search space, and histogram specification was applied to optimize the grayscale similarity between the images. After comparisons with other commonly used detectors, the fast corner detector, FAST (Features from Accelerated Segment Test), was selected to extract the feature points. The matching point pairs were then detected between the images, the outliers were eliminated, and geometric transformation was performed. The appropriate window size was searched and set to one-tenth of the image width. Images acquired by a two-camera system, a camera with five imaging sensors, and a camera with replaceable filters, mounted on a manned aircraft, an unmanned aerial vehicle, and a ground-based platform, respectively, were used to evaluate the performance of the proposed method. The image analysis results showed that, through the appropriate window selection and histogram specification, the number of correctly matched point pairs had increased by 11.30 times, and that the correct matching rate had increased by 36%, compared with the results based on FAST alone. The root mean square error (RMSE) in the x and y directions was generally within 0.5 pixels. In comparison with binary robust invariant scalable keypoints (BRISK), curvature scale space (CSS), Harris, speeded-up robust features (SURF), and the commercial software ERDAS and ENVI, this method resulted in larger numbers of correct matching pairs and smaller, more consistent RMSE. Furthermore, it was not necessary to choose any tie control points manually before registration. The results from this study indicate that the proposed method can be effective for registering optical multimodal remote sensing images that have been captured with different imaging sensors.
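The histogram specification step named in this abstract can be sketched with a generic CDF-matching routine. This is a textbook formulation written for illustration, not the authors' implementation; the function name `match_histogram` is hypothetical, and the window selection and FAST matching steps are not shown.

```python
import numpy as np


def match_histogram(sensed, reference):
    """Remap gray levels of `sensed` so its CDF follows that of `reference`."""
    s_values, s_index, s_counts = np.unique(
        sensed.ravel(), return_inverse=True, return_counts=True
    )
    r_values, r_counts = np.unique(reference.ravel(), return_counts=True)

    # Empirical cumulative distributions of both images.
    s_cdf = np.cumsum(s_counts) / sensed.size
    r_cdf = np.cumsum(r_counts) / reference.size

    # For each sensed gray level, pick the reference level at the same quantile.
    mapped = np.interp(s_cdf, r_cdf, r_values)
    return mapped[s_index].reshape(sensed.shape)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    sensed = rng.integers(0, 100, size=(64, 64))       # darker synthetic image
    reference = rng.integers(100, 256, size=(64, 64))  # brighter synthetic image
    matched = match_histogram(sensed, reference)
    print(matched.min(), matched.max())
```

Making the grayscale distributions of the sensed and reference images similar in this way is what lets an intensity-based corner detector respond to comparable structures in both images, which is the role the abstract assigns to this step.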
Streaming Automatic Speech Recognition with Low Latency and High Accuracy
Waseda University, Master of Engineering, master's thesis