A non-Gaussian continuous state space model for asset degradation
The degradation model plays an essential role in asset life prediction and condition-based maintenance. Various degradation models have been proposed. Among them, the state space model has the ability to combine degradation data and failure event data, and is also an effective approach to handling multiple observations and missing data. In the state space degradation model, the deterioration process of an asset is represented by a system state process which is revealed through a sequence of observations. Current research largely assumes that the underlying system development process is discrete in time or states. Although some models have been developed to consider continuous time and space, these state space models are based on the Wiener process with its Gaussian assumption. This paper proposes a Gamma-based state space degradation model in order to remove the Gaussian assumption. Both condition monitoring observations and failure events are considered in the model so as to improve the accuracy of asset life prediction. A simulation study is carried out to illustrate the application procedure of the proposed model.
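The gamma process behind such a model produces monotone, non-Gaussian degradation increments, unlike a Wiener process. A minimal simulation sketch, with illustrative parameter values rather than the paper's calibrated model:

```python
import random

def simulate_gamma_degradation(shape_rate=0.5, scale=2.0, dt=1.0,
                               threshold=50.0, max_steps=1000, seed=42):
    """Simulate one gamma-process degradation path.

    Each increment over an interval dt is Gamma(shape_rate * dt, scale),
    so the path is non-decreasing -- a key contrast with Wiener-process
    models, whose Gaussian increments can go negative. All parameter
    values here are assumptions for illustration only.
    """
    rng = random.Random(seed)
    level, t = 0.0, 0
    path = [level]
    while level < threshold and t < max_steps:
        level += rng.gammavariate(shape_rate * dt, scale)
        path.append(level)
        t += 1
    return path, t  # t approximates the first-passage (failure) time

path, failure_time = simulate_gamma_degradation()
print(failure_time)
```

Repeating the simulation over many seeds yields an empirical first-passage-time distribution, which is the quantity of interest for asset life prediction.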
Bridging SMT and TM with translation recommendation
We propose a translation recommendation framework to integrate Statistical Machine Translation (SMT) output with Translation Memory (TM) systems. The framework recommends SMT outputs to a TM user when it predicts that SMT outputs are more suitable for post-editing than the hits provided by the TM. We describe an implementation of this framework using an SVM binary classifier. We exploit methods to fine-tune the classifier and investigate a variety of features of different types. We rely on automatic MT evaluation
metrics to approximate human judgements in our experiments. Experimental results show that our system can achieve 0.85 precision at 0.89 recall, excluding exact matches. Furthermore, it is possible for the end-user to achieve a desired balance between precision and recall by adjusting confidence levels.
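The recommendation step can be sketched with a linear SVM over hypothetical per-segment features; the feature set, training values, and threshold below are illustrative stand-ins, not the paper's engineered features:

```python
# Hypothetical features per segment: [SMT confidence, source LM score,
# TM fuzzy match score]. Label 1 = recommend the SMT output for
# post-editing, 0 = keep the TM hit. All values are illustrative.
from sklearn.svm import SVC

X_train = [
    [0.90, 0.80, 0.30],  # confident SMT, weak fuzzy match -> prefer SMT
    [0.20, 0.30, 0.95],  # weak SMT, near-exact TM match   -> prefer TM
    [0.85, 0.70, 0.40],
    [0.30, 0.25, 0.90],
]
y_train = [1, 0, 1, 0]

clf = SVC(kernel="linear").fit(X_train, y_train)

def recommend_smt(features, threshold=0.0):
    # Raising the threshold trades recall for precision, mirroring the
    # confidence-level tuning described in the abstract.
    return clf.decision_function([features])[0] >= threshold

print(recommend_smt([0.88, 0.75, 0.35]))
```

Using the margin (`decision_function`) rather than a hard class prediction is what makes the precision/recall balance tunable by the end-user.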
Integrating N-best SMT outputs into a TM system
In this paper, we propose a novel framework to enrich Translation Memory (TM) systems with Statistical Machine Translation (SMT) outputs using ranking. In order to offer human translators multiple choices, instead of using only the top SMT output and the top TM hit, we merge the N-best output from the SMT system and the k-best hits with the highest fuzzy match scores from the TM system. The merged list is then ranked according to prospective post-editing effort and provided to the translators to aid their work. Experiments show that our ranked output achieves 0.8747 precision at top 1 and 0.8134 precision at top 5. Our framework facilitates a tight integration between SMT and TM, where full advantage is taken of the TM while high-quality SMT output is availed of to improve the productivity of human translators.
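The merge-and-rank step can be sketched as follows; the candidate strings and effort scores are hypothetical stand-ins for what a trained ranking model would predict:

```python
# Merge SMT N-best and TM k-best candidates, then rank by predicted
# post-editing effort (lower = less editing). The effort scores here are
# hypothetical; in the paper a trained ranker supplies them.
def merge_and_rank(smt_nbest, tm_kbest, effort):
    candidates = [("SMT", h) for h in smt_nbest] + [("TM", h) for h in tm_kbest]
    return sorted(candidates, key=lambda c: effort[c[1]])

smt_nbest = ["the contract takes effect today",
             "the contract take effect today"]
tm_kbest = ["the agreement takes effect immediately"]
effort = {
    "the contract takes effect today": 0.10,
    "the contract take effect today": 0.45,
    "the agreement takes effect immediately": 0.30,
}

ranked = merge_and_rank(smt_nbest, tm_kbest, effort)
print(ranked[0])  # the candidate predicted to need the least editing
```

Because candidates from both systems compete in one list, a strong TM hit can outrank a weak SMT hypothesis and vice versa, which is the point of the tight integration.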
Improving the post-editing experience using translation recommendation: a user study
We report findings from a user study with professional post-editors using a translation recommendation framework (He et al., 2010) to integrate Statistical Machine Translation (SMT) output with Translation Memory (TM) systems. The framework recommends SMT outputs to a TM user when it predicts that SMT outputs are more suitable for post-editing than the hits provided by the TM. We analyze the effectiveness of the model as well as the reaction of potential users. Based on the performance statistics and the users' comments, we find that translation recommendation can reduce the workload of professional post-editors and improve the acceptance of MT in the localization industry.
Transferring Procedural Knowledge across Commonsense Tasks
Stories about everyday situations are an essential part of human
communication, motivating the need to develop AI agents that can reliably
understand these stories. Despite the long list of supervised methods for story
completion and procedural understanding, current AI has no mechanisms to
automatically track and explain procedures in unseen stories. To bridge this
gap, we study the ability of AI models to transfer procedural knowledge to
novel narrative tasks in a transparent manner. We design LEAP: a comprehensive
framework that integrates state-of-the-art modeling architectures, training
regimes, and augmentation strategies based on both natural and synthetic
stories. To address the lack of densely annotated training data, we devise a
robust automatic labeler based on few-shot prompting to enhance the augmented
data. Our experiments with in- and out-of-domain tasks reveal insights into the
interplay of different architectures, training regimes, and augmentation
strategies. LEAP's labeler has a clear positive impact on out-of-domain
datasets, while the resulting dense annotation provides native explainability.
LIDER: An Efficient High-dimensional Learned Index for Large-scale Dense Passage Retrieval
Many recent approaches to passage retrieval use dense embeddings
generated by deep neural models, a paradigm called "dense passage retrieval". The
state-of-the-art end-to-end dense passage retrieval systems normally deploy a
deep neural model followed by an approximate nearest neighbor (ANN) search
module. The model generates embeddings of the corpus and queries, which are
then indexed and searched by the high-performance ANN module. With the
increasing data scale, the ANN module unavoidably becomes the bottleneck on
efficiency. An alternative is the learned index, which achieves significantly
high search efficiency by learning the data distribution and predicting the
target data location. However, most existing learned indexes are designed for
low-dimensional data and are not suitable for dense passage retrieval with
high-dimensional dense embeddings. In this paper, we propose LIDER, an
efficient high-dimensional Learned Index for large-scale DEnse passage
Retrieval. LIDER has a clustering-based hierarchical architecture formed by two
layers of core models. As the basic unit of LIDER to index and search data, a
core model includes an adapted recursive model index (RMI) and a dimension
reduction component which consists of an extended SortingKeys-LSH (SK-LSH) and
a key re-scaling module. The dimension reduction component reduces the
high-dimensional dense embeddings to one-dimensional keys and sorts them in a
specific order, which the RMI then uses to make fast predictions.
Experiments show that LIDER achieves higher search speed with high retrieval
quality compared to state-of-the-art ANN indexes on passage retrieval
tasks; e.g., on large-scale data it achieves 1.2x the search speed and
significantly higher retrieval quality than the fastest baseline in our
evaluation. Furthermore, LIDER offers a better speed-quality trade-off.
Comment: Accepted by VLDB 202
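The two core ideas, reducing embeddings to sortable one-dimensional keys and predicting a key's sorted position with a learned model, can be sketched as below. This is a simplified stand-in under stated assumptions: a single random projection instead of the extended SK-LSH component, and linear least squares instead of a recursive model index (RMI):

```python
import random

random.seed(0)
DIM, N = 16, 1000

# Toy "embeddings" plus one random projection direction; the projection
# stands in for the dimension reduction component that maps vectors to
# sortable 1-D keys.
data = [[random.gauss(0, 1) for _ in range(DIM)] for _ in range(N)]
proj = [random.gauss(0, 1) for _ in range(DIM)]

def key(vec):
    return sum(a * b for a, b in zip(vec, proj))

keyed = sorted((key(v), i) for i, v in enumerate(data))
keys = [k for k, _ in keyed]

# Learned-index stand-in: fit position ~ a*key + b by least squares over
# the sorted (key, position) pairs, instead of an RMI.
mean_k = sum(keys) / N
mean_p = (N - 1) / 2
a = (sum((k - mean_k) * (p - mean_p) for p, k in enumerate(keys))
     / sum((k - mean_k) ** 2 for k in keys))
b = mean_p - a * mean_k

def lookup(k, window=150):
    # Predict the sorted position, then scan a small local window around
    # the prediction to find the nearest stored key.
    center = min(max(int(a * k + b), 0), N - 1)
    lo, hi = max(0, center - window), min(N, center + window)
    return min(range(lo, hi), key=lambda i: abs(keys[i] - k))

pos = lookup(key(data[123]))
print(keyed[pos][1])  # index of the vector whose key is nearest the query
```

The speed-quality trade-off shows up directly in the `window` parameter: a smaller window means less scanning per query but risks missing the true neighbor when the model's position prediction is off.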