Unified Segment-to-Segment Framework for Simultaneous Sequence Generation
Simultaneous sequence generation is a pivotal task for real-time scenarios,
such as streaming speech recognition, simultaneous machine translation and
simultaneous speech translation, where the target sequence is generated while
receiving the source sequence. The crux of achieving high-quality generation
with low latency lies in identifying the optimal moments for generating,
accomplished by learning a mapping between the source and target sequences.
However, existing methods often rely on task-specific heuristics for different
sequence types, limiting the model's capacity to adaptively learn the
source-target mapping and hindering the exploration of multi-task learning for
various simultaneous tasks. In this paper, we propose a unified
segment-to-segment framework (Seg2Seg) for simultaneous sequence generation,
which learns the mapping in an adaptive and unified manner. During the process
of simultaneous generation, the model alternates between waiting for a source
segment and generating a target segment, making the segment serve as the
natural bridge between the source and target. To accomplish this, Seg2Seg
introduces a latent segment as the pivot between source and target and explores
all potential source-target mappings via the proposed expectation training,
thereby learning the optimal moments for generating. Experiments on multiple
simultaneous generation tasks demonstrate that Seg2Seg achieves
state-of-the-art performance and exhibits better generality across various
tasks.
Comment: Accepted at NeurIPS 2023
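For intuition, here is a minimal sketch of the expectation idea: source states are softly assigned to latent segments, and each segment pivot is the expectation of source states under that assignment. The function name, tensor shapes, and softmax parameterization are illustrative assumptions, not Seg2Seg's actual implementation.

```python
import torch

def latent_segment_pivot(src_states, assign_logits):
    """Expected latent-segment representations: marginalize source states
    over soft frame-to-segment assignments (sketch only)."""
    assign = torch.softmax(assign_logits, dim=-1)                    # (T, K)
    # Normalize per segment so each pivot is a weighted average of frames.
    weights = assign / assign.sum(dim=0, keepdim=True).clamp_min(1e-8)
    return weights.transpose(0, 1) @ src_states                      # (K, D)

T, D, K = 12, 256, 4
pivots = latent_segment_pivot(torch.randn(T, D), torch.randn(T, K))
print(pivots.shape)  # torch.Size([4, 256])
```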
End-to-End Simultaneous Speech Translation with Differentiable Segmentation
End-to-end simultaneous speech translation (SimulST) outputs translation
while receiving the streaming speech inputs (a.k.a. streaming speech
translation), and hence needs to segment the speech inputs and then translate
based on the current received speech. However, segmenting the speech inputs at
unfavorable moments can disrupt the acoustic integrity and adversely affect the
performance of the translation model. Therefore, learning to segment the speech
inputs at those moments that are beneficial for the translation model to
produce high-quality translation is the key to SimulST. Existing SimulST
methods, using either fixed-length segmentation or an external segmentation
model, always separate segmentation from the underlying translation model;
this gap results in segmentation outcomes that are not necessarily
beneficial for the translation process. In this paper, we propose
Differentiable Segmentation (DiSeg) for SimulST to directly learn segmentation
from the underlying translation model. DiSeg makes hard segmentation
differentiable through the proposed expectation training, enabling it to be
jointly trained with the translation model and thereby learn
translation-beneficial segmentation. Experimental results demonstrate that
DiSeg achieves state-of-the-art performance and exhibits superior segmentation
capability.
Comment: Accepted at ACL 2023 findings
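To see how a hard segmentation decision can be relaxed in expectation, consider the sketch below: each frame emits a Bernoulli probability of ending a segment, so the expected segment count admits gradients and can be constrained jointly with a translation loss. The target count and squared penalty are placeholders, not DiSeg's actual objective.

```python
import torch

def soft_segment_count(seg_logits):
    """Relax hard segmentation: sigmoid gives each frame a probability of
    ending a segment, so the expected number of segments is differentiable."""
    p = torch.sigmoid(seg_logits)   # (T,) per-frame segment-end probability
    return p, p.sum()               # soft boundaries and expected count

seg_logits = torch.randn(20, requires_grad=True)
p, n_seg = soft_segment_count(seg_logits)
# A count constraint can now be trained jointly with the translation loss,
# since gradients flow through the expectation rather than a hard decision.
loss = (n_seg - 5.0) ** 2
loss.backward()
```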
Simultaneous Machine Translation with Tailored Reference
Simultaneous machine translation (SiMT) generates the translation while still
reading the source sentence. However, existing SiMT models are typically
trained with the same reference, disregarding the varying amounts of source
information available at different latency levels. Training the model with the
ground truth at low latency may introduce forced anticipations, whereas using a
reference consistent with the source word order at high latency results in
performance degradation. Consequently, it is crucial to train the SiMT model
with an appropriate reference that avoids forced anticipations during training
while maintaining high quality. In this paper, we propose a novel method that
provides a tailored reference for SiMT models trained at different latency
levels by rephrasing the ground truth. Specifically, we introduce the tailor,
induced by reinforcement learning, to modify the ground truth into the tailored
reference. The
SiMT model is trained with the tailored reference and jointly optimized with
the tailor to enhance performance. Importantly, our method is applicable to a
wide range of current SiMT approaches. Experiments on three translation tasks
demonstrate that our method achieves state-of-the-art performance in both fixed
and adaptive policies.
Comment: Accepted to EMNLP 2023; 15 pages, 8 figures
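As a rough illustration of inducing a rewriting module with reinforcement learning, the sketch below applies a REINFORCE-style update in which a sampled rewrite is rewarded for quality and penalized for forcing anticipation. The reward shaping, shapes, and names are hypothetical, not the paper's exact formulation.

```python
import torch

def tailor_reinforce_loss(log_probs, quality, anticipation_penalty):
    """REINFORCE surrogate: maximize expected reward, where the reward trades
    translation quality against forced anticipation (hypothetical shaping)."""
    reward = quality - anticipation_penalty   # scalar reward for this sample
    return -(reward * log_probs.sum())        # ascend the expected reward

logits = torch.randn(8, 100, requires_grad=True)   # tailor's token scores
sampled = torch.randint(0, 100, (8,))              # one sampled tailored reference
log_probs = torch.log_softmax(logits, dim=-1)[torch.arange(8), sampled]
loss = tailor_reinforce_loss(log_probs, quality=0.8, anticipation_penalty=0.3)
loss.backward()
```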
Learning Optimal Policy for Simultaneous Machine Translation via Binary Search
Simultaneous machine translation (SiMT) starts to output translation while
reading the source sentence and needs a precise policy to decide when to output
the generated translation. Therefore, the policy determines the number of
source tokens read during the translation of each target token. However, it is
difficult to learn a precise translation policy to achieve good latency-quality
trade-offs, because there is no gold policy corresponding to parallel
sentences to serve as explicit supervision. In this paper, we present a new
method for
constructing the optimal policy online via binary search. By employing explicit
supervision, our approach enables the SiMT model to learn the optimal policy,
which can guide the model in completing the translation during inference.
Experiments on four translation tasks show that our method can exceed strong
baselines across all latency scenarios.
Comment: Accepted to ACL 2023. 14 pages, 5 figures
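The core idea lends itself to a compact sketch: binary-search the smallest source prefix whose score for the current target token clears a threshold, assuming the score is monotone non-decreasing in the prefix length. The scorer and threshold here are toy stand-ins; the paper constructs its supervision from the model's own translation probabilities.

```python
def optimal_read_count(score, num_src, threshold):
    """Binary search for the smallest prefix length k with score(k) >= threshold,
    assuming score is monotone non-decreasing in k (a sketch simplification)."""
    lo, hi = 1, num_src
    while lo < hi:
        mid = (lo + hi) // 2
        if score(mid) >= threshold:
            hi = mid        # a shorter prefix may still suffice
        else:
            lo = mid + 1    # must read more source tokens
    return lo

# Toy monotone scorer: confidence grows with the number of tokens read.
print(optimal_read_count(lambda k: k / 10.0, num_src=10, threshold=0.7))  # 7
```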
SiLLM: Large Language Models for Simultaneous Machine Translation
Simultaneous Machine Translation (SiMT) generates translations while reading
the source sentence, necessitating a policy to determine the optimal timing for
reading and generating words. Despite the remarkable performance achieved by
Large Language Models (LLMs) across various NLP tasks, existing SiMT methods
predominantly focus on conventional transformers, employing a single model to
concurrently determine the policy and generate the translations. However, given
the complexity of SiMT, it is challenging to effectively address both tasks
with a single model. Therefore, there is a need to decouple the SiMT task into
policy-decision and translation sub-tasks. We propose SiLLM, which delegates
the two sub-tasks to separate agents, thereby incorporating the LLM into SiMT.
The policy-decision agent is managed by a conventional SiMT model, responsible
for determining the translation policy. The translation agent, leveraging the
capabilities of the LLM, generates the translation using the partial source
sentence. The two agents collaborate to accomplish SiMT. To facilitate the
application of token-level policies determined by conventional SiMT models to
the LLM, we propose a word-level policy adapted for the LLM. Experiments on two
datasets demonstrate that, with a small amount of data for fine-tuning the LLM,
SiLLM attains state-of-the-art performance.
Comment: 13 pages, 6 tables, 7 figures
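One plausible reading of the word-level adaptation is sketched below: WRITE actions pass through, while a word-level READ fires only once every subword of a source word has been read, given a subword-to-word index map. The helper names and action encoding are assumptions; SiLLM's actual adapter may differ.

```python
def complete_words(word_ids, k):
    """Number of fully-read source words after reading the first k subwords;
    word_ids maps each subword index to the index of its word."""
    if k >= len(word_ids):
        return word_ids[-1] + 1 if word_ids else 0
    # word_ids[k] is the word of the next unread subword; every word with a
    # smaller index has been read in full.
    return word_ids[k]

def token_to_word_policy(token_actions, word_ids):
    """Lift a token-level READ/WRITE trace to word level: WRITEs pass through,
    and a word-level READ is emitted whenever another whole word completes."""
    k, prev_words, out = 0, 0, []
    for act in token_actions:
        if act == "READ":
            k += 1
            now = complete_words(word_ids, k)
            out.extend(["READ"] * (now - prev_words))
            prev_words = now
        else:
            out.append("WRITE")
    return out

# Source "unbelievable news" tokenized as [un, believ, able, news]:
print(token_to_word_policy(
    ["READ", "READ", "WRITE", "READ", "READ", "WRITE"],
    word_ids=[0, 0, 0, 1]))
# -> ['WRITE', 'READ', 'READ', 'WRITE']
```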
Non-autoregressive Streaming Transformer for Simultaneous Translation
Simultaneous machine translation (SiMT) models are trained to strike a
balance between latency and translation quality. However, training these models
to achieve high quality while maintaining low latency often leads to a tendency
for aggressive anticipation. We argue that this issue stems from the
autoregressive architecture upon which most existing SiMT models are built. To
address this issue, we propose the non-autoregressive streaming Transformer
(NAST) which comprises a unidirectional encoder and a non-autoregressive
decoder with intra-chunk parallelism. We enable NAST to generate the blank
token or repetitive tokens to adjust its READ/WRITE strategy flexibly, and
train it to maximize the non-monotonic latent alignment with an alignment-based
latency loss. Experiments on various SiMT benchmarks demonstrate that NAST
outperforms previous strong autoregressive SiMT baselines.
Comment: EMNLP 2023 main conference; source code is available at
https://github.com/ictnlp/NAST
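The blank/repeat mechanism behaves like CTC decoding: consecutive repeats are merged and blank tokens dropped, so emitting a blank effectively defers a WRITE. Below is a minimal reconstruction of that collapse rule; the blank symbol and token strings are placeholders.

```python
def ctc_collapse(tokens, blank="<blank>"):
    """CTC-style post-processing: merge consecutive repeats, drop blanks.
    Repeats separated by a blank remain distinct, as in standard CTC."""
    out, prev = [], None
    for tok in tokens:
        if tok != prev and tok != blank:
            out.append(tok)
        prev = tok
    return out

print(ctc_collapse(["<blank>", "Das", "Das", "<blank>", "ist", "gut", "<blank>"]))
# -> ['Das', 'ist', 'gut']
```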
Neuroform stent-assisted coil embolization: A new treatment strategy for complex intracranial aneurysms with midterm results
Objective: To present detailed results of our experience using Neuroform stent-assisted coil embolization to treat complex cerebral aneurysms over a 3-year period, emphasizing the technical difficulties and procedure-related complications, and to evaluate midterm results. Methods: Patients who underwent Neuroform stent-assisted coil embolization were registered in a database. We assessed patients' history, aneurysm morphology, indications for stenting, technical details of the procedures, complications, and midterm follow-up data. Results: This study included twenty-six patients with 39 aneurysms. A total of 32 of the 39 aneurysms were treated by Neuroform stent-assisted coil embolization (SAC). Three aneurysms were stented without coiling, 2 aneurysms were coiled without stenting, and 2 aneurysms were surgically clipped. The indications for use included broad-necked aneurysms (n = 28), giant or large aneurysms (n = 6), and fusiform aneurysms (n = 5). Of the 32 aneurysms treated by Neuroform SAC, we achieved complete (100%) or near-complete (> 95%) occlusion in 27 aneurysms and partial (< 95%) occlusion in 5 aneurysms. Follow-up angiographic data were available for 22 of the 32 aneurysms treated by Neuroform SAC (68.7%) (average follow-up, 12 mo; range, 4–24 mo), demonstrating recanalization in 3 aneurysms (13.6%) and stable occlusion in 19 aneurysms (86.4%). No delayed progressive embolization or in-stent stenosis was observed. Conclusion: The Neuroform microstent system has led to a significant evolution in the endovascular treatment of complex intracranial aneurysms. Our results and midterm follow-up show that Neuroform stent-assisted coil embolization is a safe and effective technique for the treatment of complex cerebral aneurysms. Although clinically significant complications are uncommon and the evaluation at midterm follow-up is encouraging, further studies are needed to assess the long-term stability and durability of the stent.
A Hierarchical, HMM-based Automatic Evaluation of OCR Accuracy for a Digital Library of Books
A number of projects are creating searchable digital libraries of printed books. These include the Million Book Project, the Google Book project and similar efforts from Yahoo and Microsoft. Content-based online book retrieval usually requires first converting printed text into machine-readable (e.g. ASCII) text using an optical character recognition (OCR) engine and then doing full-text search on the results. Many of these books are old and there are a variety of processing steps that are required to create an end-to-end system. Changing any step (including the scanning process) can affect OCR performance and hence a good automatic statistical evaluation of OCR performance on book-length material is needed. Evaluating OCR performance on the entire book is non-trivial. The only easily obtainable ground truth (the Gutenberg e-texts) must be automatically aligned with the OCR output over the entire length of a book. This may be viewed as equivalent to the problem of aligning two large (easily a million long) sequences. The problem is further complicated by OCR errors as well as the possibility of large chunks of missing material in one of the sequences. We propose a Hidden Markov Model (HMM) based hierarchical alignment algorithm to align OCR output and the ground truth for books. We believe this is the first work to automatically align a whole book without using any book structure information. The alignment process works by breaking up the problem of aligning two long sequences into the problem of aligning many smaller subsequences. This can be rapidly and effectively done. Experimental results show that our hierarchical alignment approach works very well even if the OCR output has a high recognition error rate. Finally, we evaluate the performance of a commercial OCR engine over a large dataset of books based on the alignment results.
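The divide-and-conquer step can be illustrated without the HMM machinery: anchor on a long common block, split both sequences around it, and recurse until the pieces are small enough to align exactly. The sketch below uses difflib as a stand-in matcher and hypothetical size thresholds; the paper's actual alignment is HMM-based.

```python
from difflib import SequenceMatcher

def hierarchical_align(ocr, truth, min_len=64):
    """Anchor-and-split alignment: find the longest common block, treat it as
    an anchor, and recurse on the material to its left and right; short pieces
    are returned as directly-alignable pairs (sketch, not the HMM method)."""
    if len(ocr) <= min_len or len(truth) <= min_len:
        return [(ocr, truth)]                  # small enough to align exactly
    sm = SequenceMatcher(None, ocr, truth, autojunk=False)
    m = sm.find_longest_match(0, len(ocr), 0, len(truth))
    if m.size < 8:                             # no reliable anchor found
        return [(ocr, truth)]
    return (hierarchical_align(ocr[:m.a], truth[:m.b], min_len)
            + [(ocr[m.a:m.a + m.size], truth[m.b:m.b + m.size])]
            + hierarchical_align(ocr[m.a + m.size:], truth[m.b + m.size:], min_len))

pairs = hierarchical_align("the quick brvwn fox jumps over a lazy dog" * 4,
                           "the quick brown fox jumps over the lazy dog" * 4)
print(len(pairs))  # one long alignment job split into several small ones
```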
Statistical models for text query-based image retrieval
Image indexing and retrieval has been an active research area for more than a decade. Although many accomplishments have been made in this domain, it is still a challenging problem and far from being solved. Traditional content-based approaches make use of queries based on image examples or image attributes like color and texture, and images are retrieved according to the similarity of each target image to the query image. However, image-query-based retrieval systems do not really capture the semantics or meanings of images well. Furthermore, image queries are difficult and inconvenient for most users to form. To capture the semantics of images, libraries and other organizations have manually annotated each image with keywords and captions, and then searched those annotations using text retrieval engines. The disadvantage of this approach is the huge cost of annotating a large number of images and the inconsistency of annotations by different people. In this work, we focus on general image and historical handwritten document retrieval based on textual queries. We explore statistical model-based techniques that allow us to retrieve general images and historical handwritten document images with text queries. These techniques are (i) image retrieval based on automatic annotation, (ii) direct retrieval based on computing the posterior of an image given a text query, and (iii) handwritten document image recognition. We compare the performance of these approaches on several general image and historical handwritten document collections. The main contributions of this work include (i) two probabilistic generative models for annotation-based retrieval, (ii) a direct retrieval model for general images, and (iii) a thorough investigation of machine learning models for handwritten document recognition. Our experimental results and retrieval systems show that our proposed approaches may be applied to practical text-query-based retrieval systems on large image data sets.
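Technique (ii), direct retrieval, can be pictured as ranking images by the posterior of the query under each image's word distribution: roughly the sum of log P(w | image) plus a log prior. The dictionary layout, smoothing floor, and omitted prior below are illustrative assumptions, not the thesis's models.

```python
import math

def rank_by_posterior(query_words, image_word_probs, prior=None):
    """Direct-retrieval sketch: score each image by log P(query | image)
    (+ log prior if given), with a small floor for unseen query words."""
    scores = {}
    for img, pw in image_word_probs.items():
        s = math.log(prior[img]) if prior else 0.0
        for w in query_words:
            s += math.log(pw.get(w, 1e-6))   # floor unseen words
        scores[img] = s
    return sorted(scores, key=scores.get, reverse=True)

probs = {"img1": {"tiger": 0.3, "grass": 0.2},
         "img2": {"ocean": 0.4, "sky": 0.3}}
print(rank_by_posterior(["tiger", "grass"], probs))  # -> ['img1', 'img2']
```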
Postface
Jointly organized by the Chinese Society for the Study of the History of France, the Maison des sciences de l'homme, the Université Paris-I Panthéon-Sorbonne, and the Institute for Research on International Relations and Regional Development of East China Normal University (ECNU), the Autumn University already has six years of history behind it. Each edition of the university presents the latest research, at the highest level, on the history and ...