Search CORE

2,972 research outputs found

Lattice score based data cleaning for phrase-based statistical machine translation

Author: Carson-Berndsen Julie
Jiang Jie
Way Andy
Publication venue: European Association for Machine Translation
Publication date: 01/05/2010
Field of study

Statistical machine translation relies heavily on parallel corpora to train its models for translation tasks. While more and more bilingual corpora are readily available, the quality of the sentence pairs should be taken into consideration. This paper presents a novel lattice score-based data cleaning method to select proper sentence pairs from the ones extracted from a bilingual corpus by the sentence alignment methods. The proposed method is carried out as follows: firstly, an initial phrasebased model is trained on the full sentencealigned corpus; then for each of the sentence pairs in the corpus, word alignments are used to create anchor pairs and sourceside lattices; thirdly, based on the translation model, target-side phrase networks are expanded on the lattices and Viterbi searching is used to find approximated decoding results; finally, BLEU score thresholds are used to filter out the low-score sentence pairs for the data cleaning purpose. Our experiments on the FBIS corpus showed improvements of BLEU score from 23.78 to 24.02 in Chinese-English

CiteSeerX

Irish Universities

DCU Online Research Access Service

Recent Developments in Smart Healthcare

Author: Tie Qiu (Ed.)
Wenbing Zhao (Ed.)
Xiong Luo (Ed.)
Publication venue: MDPI - Multidisciplinary Digital Publishing Institute
Publication date: 01/01/2018
Field of study

Medicine is undergoing a sector-wide transformation thanks to the advances in computing and networking technologies. Healthcare is changing from reactive and hospital-centered to preventive and personalized, from disease focused to well-being centered. In essence, the healthcare systems, as well as fundamental medicine research, are becoming smarter. We anticipate significant improvements in areas ranging from molecular genomics and proteomics to decision support for healthcare professionals through big data analytics, to support behavior changes through technology-enabled self-management, and social and motivational support. Furthermore, with smart technologies, healthcare delivery could also be made more efficient, higher quality, and lower cost. In this special issue, we received a total 45 submissions and accepted 19 outstanding papers that roughly span across several interesting topics on smart healthcare, including public health, health information technology (Health IT), and smart medicine

Directory of Open Access Books (DOAB)

i, Poet: Automatic Chinese Poetry Composition through a Generative Summarization Framework under Constrained Optimization

Author: Jiang Han
Lapata Mirella
Li Xiaoming
Lin Shou-De
Lv Xueqiang
Yan Rui
Publication venue
Publication date: 01/01/2013
Field of study

Part of the long lasting cultural heritage of China is the classical ancient Chinese poems which follow strict formats and complicated linguistic rules. Automatic Chinese poetry composition by programs is considered as a challenging problem in computational linguistics and requires high Artificial Intelligence assistance, and has not been well addressed. In this paper, we formulate the poetry composition task as an optimization problem based on a generative summarization framework under several constraints. Given the user specified writing intents, the system retrieves cand idate terms out of a large poem corpus, and then orders these terms to fit into poetry formats, satisfying tonal and rhythm requirements. The optimization process under constraints is conducted via iterative term substitutions till convergence, and outputs the subset with the highest utility as the generated poem. For experiments, we perform generation on large datasets of 61,960 classic poems from Tang and Song Dynasty of China. A comprehensive evaluation, using both human judgments and ROUGE scores, has demonstrated the effectiveness of our proposed approach.EI

Edinburgh Research Explorer

Spoken content retrieval: A survey of techniques and technologies

Author: Ani Nenkova
C A. Nenkova
K. Mckeown
Kathleen Mckeown
Publication venue: 'Now Publishers'
Publication date: 01/01/2012
Field of study

Speech media, that is, digital audio and video containing spoken content, has blossomed in recent years. Large collections are accruing on the Internet as well as in private and enterprise settings. This growth has motivated extensive research on techniques and technologies that facilitate reliable indexing and retrieval. Spoken content retrieval (SCR) requires the combination of audio and speech processing technologies with methods from information retrieval (IR). SCR research initially investigated planned speech structured in document-like units, but has subsequently shifted focus to more informal spoken content produced spontaneously, outside of the studio and in conversational settings. This survey provides an overview of the field of SCR encompassing component technologies, the relationship of SCR to text IR and automatic speech recognition and user interaction issues. It is aimed at researchers with backgrounds in speech technology or IR who are seeking deeper insight on how these fields are integrated to support research and development, thus addressing the core challenges of SCR

CiteSeerX

Crossref

Irish Universities

DCU Online Research Access Service

CLiFF Notes: Research In Natural Language Processing at the University of Pennsylvania

Author: Graduate Students Faculty &
Publication venue: ScholarlyCommons
Publication date: 01/12/1990
Field of study

CLIFF is the Computational Linguists\u27 Feedback Forum. We are a group of students and faculty who gather once a week to hear a presentation and discuss work currently in progress. The \u27feedback\u27 in the group\u27s name is important: we are interested in sharing ideas, in discussing ongoing research, and in bringing together work done by the students and faculty in Computer Science and other departments. However, there are only so many presentations which we can have in a year. We felt that it would be beneficial to have a report which would have, in one place, short descriptions of the work in Natural Language Processing at the University of Pennsylvania. This report then, is a collection of abstracts from both faculty and graduate students, in Computer Science, Psychology and Linguistics. We want to stress the close ties between these groups, as one of the things that we pride ourselves on here at Penn is the communication among different departments and the inter-departmental work. Rather than try to summarize the varied work currently underway at Penn, we suggest reading the abstracts to see how the students and faculty themselves describe their work. The report illustrates the diversity of interests among the researchers here, as well as explaining the areas of common interest. In addition, since it was our intent to put together a document that would be useful both inside and outside of the university, we hope that this report will explain to everyone some of what we are about

ScholarlyCommons@Penn

Utilising a modern quality function deployment process in ship modularisation

Author: Shaik Imaad
Publication venue
Publication date: 17/06/2019
Field of study

Rising passenger numbers in the leisure cruise industry has resulted in cruise ships sailing in full capacity. This trend is estimated to grow further by 42% until 2027 causing the cruise companies to order around 120 new ships to be delivered by 2027. The increasing order numbers have put most of the shipyards around the world that build cruise ships to run in full capacity. However, to increase their productivity while staying competitive, the shipyards have to take a different approach to build ships. One such approach is building modularly designed ships. This thesis study forms a part of the case company’s efforts to explore the modular ship design process by utilising a tailor-made Quality Function Deployment approach. Specifically, the study looks into finding a robust approach of generating product requirements by incorporating customers’ desires and wishes with the help of the QFD process recommended by the newly standardised ISO 16355 series of standards. Study of the modular design process, the case company’s internal design process along with the QFD approach recommended by the ISO standard, the author creates a draft QFD process, which is tested out in the shipyard along with the technical experts to get insights on the draft approach. The study also analyses the suggestions to the classical QFD by the ISO documents and recommends the better alternative since not many case studies have been made using the ISO recommended QFD approach. The feedback obtained along with the observations made helped to create a robust tailored QFD approach for the case company to incorporate in their modular product development efforts. Further, the author recommends solutions to eliminate or reduce the impact of the challenges the case company might face while implementing the recommended QFD approach in the new modular design process

Aaltodoc Publication Archive

Modélisation des comportements de recherche basé sur les interactions des utilisateurs

Author: Lugo Martinez Luis Eduardo
Publication venue
Publication date: 15/12/2021
Field of study

Les utilisateurs de systèmes d'information divisent normalement les tâches en une séquence de plusieurs étapes pour les résoudre. En particulier, les utilisateurs divisent les tâches de recherche en séquences de requêtes, en interagissant avec les systèmes de recherche pour mener à bien le processus de recherche d'informations. Les interactions des utilisateurs sont enregistrées dans des journaux de requêtes, ce qui permet de développer des modèles pour apprendre automatiquement les comportements de recherche à partir des interactions des utilisateurs avec les systèmes de recherche. Ces modèles sont à la base de multiples applications d'assistance aux utilisateurs qui aident les systèmes de recherche à être plus interactifs, faciles à utiliser, et cohérents. Par conséquent, nous proposons les contributions suivantes : un modèle neuronale pour apprendre à détecter les limites des tâches de recherche dans les journaux de requête ; une architecture de regroupement profond récurrent qui apprend simultanément les représentations de requête et regroupe les requêtes en tâches de recherche ; un modèle non supervisé et indépendant d'utilisateur pour l'identification des tâches de recherche prenant en charge les requêtes dans seize langues ; et un modèle de tâche de recherche multilingue, une approche non supervisée qui modélise simultanément l'intention de recherche de l'utilisateur et les tâches de recherche. Les modèles proposés améliorent les méthodes existantes de modélisation, en tenant compte de la confidentialité des utilisateurs, des réponses en temps réel et de l'accessibilité linguistique. Le respect de la vie privée de l'utilisateur est une préoccupation majeure, tandis que des réponses rapides sont essentielles pour les systèmes de recherche qui interagissent avec les utilisateurs en temps réel, en particulier dans la recherche par conversation. Dans le même temps, l'accessibilité linguistique est essentielle pour aider les utilisateurs du monde entier, qui interagissent avec les systèmes de recherche dans de nombreuses langues. Les contributions proposées peuvent bénéficier à de nombreuses applications d'assistance aux utilisateurs, en aidant ces derniers à mieux résoudre leurs tâches de recherche lorsqu'ils accèdent aux systèmes de recherche pour répondre à leurs besoins d'information.Users of information systems normally divide tasks in a sequence of multiple steps to solve them. In particular, users divide search tasks into sequences of queries, interacting with search systems to carry out the information seeking process. User interactions are registered on search query logs, enabling the development of models to automatically learn search patterns from the users' interactions with search systems. These models underpin multiple user assisting applications that help search systems to be more interactive, user-friendly, and coherent. User assisting applications include query suggestion, the ranking of search results based on tasks, query reformulation analysis, e-commerce applications, retrieval of advertisement, query-term prediction, mapping of queries to search tasks, and so on. Consequently, we propose the following contributions: a neural model for learning to detect search task boundaries in query logs; a recurrent deep clustering architecture that simultaneously learns query representations through self-training, and cluster queries into groups of search tasks; Multilingual Graph-Based Clustering, an unsupervised, user-agnostic model for search task identification supporting queries in sixteen languages; and Language-agnostic Search Task Model, an unsupervised approach that simultaneously models user search intent and search tasks. Proposed models improve on existing methods for modeling user interactions, taking into account user privacy, realtime response times, and language accessibility. User privacy is a major concern in Ethics for intelligent systems, while fast responses are critical for search systems interacting with users in realtime, particularly in conversational search. At the same time, language accessibility is essential to assist users worldwide, who interact with search systems in many languages. The proposed contributions can benefit many user assisting applications, helping users to better solve their search tasks when accessing search systems to fulfill their information needs

Thèses en Ligne

Scientific Publications of the University of Toulouse II Le Mirail

Thèses en ligne de l'Université Toulouse III - Paul Sabatier

Recommended from our members

Multimodal News Summarization, Tracking and Annotation Incorporating Tensor Analysis of Memes

Author: Tsai Chun-Yu
Publication venue: 'Columbia University Libraries/Information Services'
Publication date: 01/01/2017
Field of study

We demonstrate four novel multimodal methods for efficient video summarization and comprehensive cross-cultural news video understanding. First, For video quick browsing, we demonstrate a multimedia event recounting system. Based on nine people-oriented design principles, it summarizes YouTube-like videos into short visual segments (812sec) and textual words (less than 10 terms). In the 2013 Trecvid Multimedia Event Recounting competition, this system placed first in recognition time efficiency, while remaining above average in description accuracy. Secondly, we demonstrate the summarization of large amounts of online international news videos. In order to understand an international event such as Ebola virus, AirAsia Flight 8501 and Zika virus comprehensively, we present a novel and efficient constrained tensor factorization algorithm that first represents a video archive of multimedia news stories concerning a news event as a sparse tensor of order 4. The dimensions correspond to extracted visual memes, verbal tags, time periods, and cultures. The iterative algorithm approximately but accurately extracts coherent quad-clusters, each of which represents a significant summary of an important independent aspect of the news event. We give examples of quad-clusters extracted from tensors with at least 108 entries derived from international news coverage. We show the method is fast, can be tuned to give preferences to any subset of its four dimensions, and exceeds three existing methods in performance. Thirdly, noting that the co-occurrence of visual memes and tags in our summarization result is sparse, we show how to model cross-cultural visual meme influence based on normalized PageRank, which more accurately captures the rates at which visual memes are reposted in a specified time period in a specified culture. Lastly, we establish the correspondences of videos and text descriptions in different cultures by reliable visual cues, detect culture-specific tags for visual memes and then annotate videos in a cultural settings. Starting with any video with less text or no text in one culture (say, US), we select candidate annotations in the text of another culture (say, China) to annotate US video. Through analyzing the similarity of images annotated by those candidates, we can derive a set of proper tags from the viewpoints of another culture (China). We illustrate cultural-based annotation examples by segments of international news. We evaluate the generated tags by cross-cultural tag frequency, tag precision, and user studies

Columbia University Academic Commons