Search CORE

488 research outputs found

Knowledge Expansion of a Statistical Machine Translation System using Morphological Resources

Author: EHRMANN MAUD
TURCHI MARCO
Publication venue: Centro de Innovación y Desarrollo Tecnológico en Cómputo, Instituto Politécnico Nacional, Mexico
Publication date: 09/08/2011
Field of study

The translation capability of a Phrase-Based Statistical Machine Translation (PBSMT) system depends mostly on its parallel training data, and phrases that do not appear in that data are not translated correctly. This paper describes a method that efficiently expands the existing knowledge of a PBSMT system using external morphological resources instead of additional parallel data. A set of new phrase associations is added to the translation and reordering models; each corresponds to a morphological variation of the source phrase, the target phrase, or both phrases of an existing association. New associations are generated using a string similarity score based on morphosyntactic information. We tested our approach on En-Fr and Fr-En translation, and the results showed improved performance in terms of automatic scores (BLEU and Meteor) and a reduction in out-of-vocabulary (OOV) words. We believe that our knowledge expansion framework is generic and could be used to add different types of information to the model.
JRC.G.2-Global security and crisis management
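To make the expansion idea concrete, here is a minimal sketch, assuming a toy En-Fr phrase table and a hypothetical lexicon that maps each surface form to a lemma and a number feature. It reinflects both sides of an existing phrase pair from singular to plural and discounts the new pair's score by the similarity between the old and new source phrases. The lexicon, the `reinflect` and `expand` helpers, the threshold and the scoring rule are illustrative assumptions, not the paper's implementation; a plain character-level ratio from Python's `difflib` stands in for the paper's morphosyntactic similarity score.

```python
from difflib import SequenceMatcher

# Hypothetical toy lexicon: surface form -> (lemma, number feature).
# Words missing from the lexicon (e.g. "black") are left unchanged.
LEXICON = {
    "cat": ("cat", "Sg"), "cats": ("cat", "Pl"),
    "chat": ("chat", "Sg"), "chats": ("chat", "Pl"),
    "noir": ("noir", "Sg"), "noirs": ("noir", "Pl"),
}


def reinflect(phrase, old, new):
    """Swap every word carrying feature `old` for its same-lemma form with feature `new`.

    Returns None when some word has no such form in the lexicon.
    """
    out = []
    for word in phrase.split():
        lemma, feat = LEXICON.get(word, (word, None))
        if feat == old:
            forms = [f for f, (lem, tag) in LEXICON.items() if lem == lemma and tag == new]
            if not forms:
                return None
            word = forms[0]
        out.append(word)
    return " ".join(out)


def similarity(a, b):
    """Character-level similarity in [0, 1]; a stand-in for a morphosyntactic score."""
    return SequenceMatcher(None, a, b).ratio()


def expand(table, old="Sg", new="Pl", threshold=0.6):
    """Return a copy of `table` plus reinflected variants of its phrase pairs.

    `table` maps (source, target) -> score. A new pair inherits the original
    score scaled by the source-side similarity, if that similarity reaches
    `threshold` and the pair is not already known.
    """
    expanded = dict(table)
    for (src, tgt), score in table.items():
        new_src, new_tgt = reinflect(src, old, new), reinflect(tgt, old, new)
        if new_src is None or new_tgt is None:
            continue  # no morphological variant available
        pair = (new_src, new_tgt)
        if pair == (src, tgt) or pair in expanded:
            continue  # nothing changed, or the association already exists
        sim = similarity(src, new_src)
        if sim >= threshold:
            expanded[pair] = score * sim
    return expanded
```

For example, `expand({("black cat", "chat noir"): 0.8})` keeps the original pair and adds `("black cats", "chats noirs")` with a score of 0.8 × 18/19 (about 0.758), since "black cat" and "black cats" share 9 of their 19 combined characters as a matching block.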

JRC Publications Repository

Ramex-Forum: a tool for displaying and analysing complex sequential patterns of financial products

Author: Cavique Luís
Marques Nuno C.
Tiple Pedro
Publication venue: 'Wiley'
Publication date: 01/01/2016
Field of study

Financial data provide valuable, up-to-date knowledge of the world economy. However, they come in extremely large volumes and diverse formats, and are constantly updated at high speed. The Ramex-Forum algorithm is designed to guide financial experts in finding new and relevant information. We present a sensitivity analysis and new visualizations using an improved version of the Ramex-Forum algorithm. The proposed algorithm is applied to two case studies: the petroleum production chain and the risk analysis of European financial institutions. Different combinations of parameters and new ways of visualizing data are used. The results highlight the value of Ramex-Forum for analysing relevant relationships in price variations in financial markets.

Crossref

Repositório Aberto da Universidade Aberta

Summarization from Medical Documents: A Survey

Author: Alfred
Barzilay
Becher
Busemann
Cios Krzysztof
Dalianis
DeJong
Ebadollahi
Edmundson
Elhadad
Endres-Niggemeyer
Futrelle
Gaizauskas
Hersh
Johnson
Kan
Karkaletsis
Klavans
Lenci
Luhn
Mani
Mann
Marcu
McKeown
Merlino
Oepen
Paice
Panagiotis Stamatopoulos
Pierrakos
Radev
Reiter
Saggion
Salton
Sparck-Jones
Stergos Afantenos
Vangelis Karkaletsis
Woodall
Xenarios
Xingquan
Zabih
Zechner
Publication venue: 'Elsevier BV'
Publication date: 13/04/2005
Field of study

Objective: The aim of this paper is to survey recent work in medical document summarization. Background: During the last decade, document summarization has received increasing attention from the AI research community. More recently, it has also attracted the interest of the medical research community, due to the enormous growth of information available to physicians and medical researchers through the large and growing number of published journals, conference proceedings, medical sites and portals on the World Wide Web, electronic medical records, etc. Methodology: This survey first gives a general background on document summarization, presenting the factors that summarization depends upon, discussing evaluation issues and briefly describing the various types of summarization techniques. It then examines the characteristics of the medical domain through the different types of medical documents. Finally, it presents and discusses the summarization techniques used so far in the medical domain, referring to the corresponding systems and their characteristics. Discussion and conclusions: The paper thoroughly discusses promising paths for future research in medical document summarization. It focuses mainly on scaling to large collections of documents in various languages and from different media, on personalization, on portability to new sub-domains, and on the integration of summarization technology into practical applications.
Comment: 21 pages, 4 tables
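As a point of reference for the "various types of summarization techniques" the survey describes, here is a minimal sketch of one classic extractive approach, frequency-based sentence scoring in the spirit of Luhn's early work: each sentence is scored by the average frequency of its non-stopword words across the document, and the top-scoring sentences are returned in document order. The tiny stopword list, the naive tokenization and the averaging rule are simplifying assumptions; Luhn's original method scores clusters of significant words instead, and medical summarizers typically add domain knowledge on top.

```python
import re
from collections import Counter

# Deliberately tiny stopword list (assumption); real systems use much larger ones.
STOPWORDS = {"the", "a", "an", "of", "and", "in", "to", "is", "are",
             "for", "with", "on", "by", "was", "were"}


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def content_words(sentence):
    """Lower-cased alphabetic tokens that are not stopwords."""
    return [w for w in re.findall(r"[a-z]+", sentence.lower()) if w not in STOPWORDS]


def summarize(text, n=2):
    """Return the n highest-scoring sentences, kept in their original order."""
    sentences = split_sentences(text)
    freq = Counter(w for s in sentences for w in content_words(s))

    def score(sentence):
        # Average document frequency of the sentence's content words.
        words = content_words(sentence)
        return sum(freq[w] for w in words) / len(words) if words else 0.0

    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
    return [sentences[i] for i in sorted(ranked[:n])]
```

On the three-sentence text "Insulin lowers blood glucose. Patients with diabetes need insulin. The weather was sunny.", the two insulin sentences score 1.25 each (the repeated word "insulin" counts twice) and are selected, while the off-topic weather sentence scores 1.0 and is dropped.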

arXiv.org e-Print Archive

Crossref

Semantification of text through summarisation

Author: Joshi Monika
Publication venue
Publication date: 01/03/2019
Field of study

Ulster University's Research Portal

Learning to Correct Noisy Labels for Fine-Grained Entity Typing via Co-Prediction Prompt Tuning

Author: He Yongquan
Lin Yang
Tang Minghao
Xu Hongbo
Xu Yongxiu
Zhang Wenyuan
Publication venue
Publication date: 23/10/2023
Field of study

Fine-grained entity typing (FET) is an essential task in natural language processing that aims to assign semantic types to entities in text. However, FET faces a major challenge known as the noisy-label problem: current methods rely on estimating the noise distribution to identify noisy labels, but they are confused by deviations across diverse noise distributions. To address this limitation, we introduce Co-Prediction Prompt Tuning for noise correction in FET, which leverages multiple prediction results to identify and correct noisy labels. Specifically, we integrate prediction results to recall annotated labels and use a differentiated margin to identify inaccurate labels. Moreover, we design an optimization objective concerning divergent co-predictions during fine-tuning, ensuring that the model captures sufficient information and remains robust in noise identification. Experimental results on three widely used FET datasets demonstrate that our noise correction approach significantly enhances the quality of various types of training samples, including those annotated using distant supervision, ChatGPT, and crowdsourcing.
Comment: Accepted by Findings of EMNLP 2023, 11 pages
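The abstract does not spell out how the margin is applied. The following is only a generic illustration of margin-based label checking over several predictions, not the paper's Co-Prediction Prompt Tuning: type probabilities from multiple prediction runs are averaged, an annotated type is kept when its averaged probability lies within a margin of the best-scoring type, and the remaining annotated types are flagged as likely noise. The run format, the `margin` value and the fallback to the top-scoring type are hypothetical choices.

```python
def average_predictions(runs):
    """Average each type's probability over several prediction runs.

    `runs` is a list of dicts mapping type name -> probability; all runs are
    assumed to cover the same set of types.
    """
    return {t: sum(run[t] for run in runs) / len(runs) for t in runs[0]}


def check_labels(annotated, runs, margin=0.3):
    """Split the annotated types of one mention into (kept, flagged) sets.

    A type is kept when its averaged probability is within `margin` of the
    best-scoring type. If every annotated type is flagged, the best-scoring
    type is proposed instead, as a simple correction.
    """
    avg = average_predictions(runs)
    best = max(avg, key=avg.get)
    kept = {t for t in annotated if avg[best] - avg.get(t, 0.0) <= margin}
    flagged = set(annotated) - kept
    if not kept:
        kept = {best}
    return kept, flagged
```

With two runs that both favour `/person` (averaged 0.85) over `/location` (averaged 0.15), a mention annotated as both keeps `/person` and flags `/location`; a mention annotated only as `/location` has that label flagged and `/person` proposed in its place.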

arXiv.org e-Print Archive

RFID Technology in Intelligent Tracking Systems in Construction Waste Logistics Using Optimisation Techniques

Author: ATKINS Anthony
YU Hongnian
ZHANG Lizong
Publication venue: Fourth International Conference on Software, Knowledge,
Publication date: 01/08/2010
Field of study

Construction waste disposal is an urgent issue for environmental protection. This paper proposes a waste management system and illustrates its work process using plasterboard waste as an example: plasterboard produces a hazardous gas when landfilled with household waste, and its recycling rate in the UK is less than 10%. The proposed system integrates RFID technology, rule-based reasoning, Ant Colony Optimization and knowledge technology to audit and track plasterboard waste, guide operational staff, arrange vehicles and plan schedules, and it also provides evidence to verify disposal. It relies on RFID equipment to collect logistical data and uses digital imaging equipment to provide further evidence; the reasoning core in the third layer is responsible for generating schedules, route plans and guidance, and the last layer delivers the results to users. The paper first introduces the current plasterboard disposal situation and identifies the logistical problem that is now the main barrier to a higher recycling rate, then discusses the proposed system in terms of both its system-level structure and its process structure. Finally, an example scenario illustrates how the system is used.
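The abstract names Ant Colony Optimization for route planning without giving details. As a hedged illustration, here is a minimal, generic ACO that orders collection sites into a single closed route starting and ending at a depot (node 0). The distance matrix, the parameter values (`alpha`, `beta`, evaporation rate `rho`, deposit constant `q`) and the single-vehicle simplification are assumptions; the paper's system also arranges vehicles and schedules, which this sketch does not model.

```python
import math
import random


def tour_length(tour, dist):
    """Length of a closed tour that returns to its first node."""
    return sum(dist[tour[i]][tour[(i + 1) % len(tour)]] for i in range(len(tour)))


def aco_route(dist, n_ants=10, n_iter=50, alpha=1.0, beta=2.0, rho=0.5, q=1.0, seed=0):
    """Basic Ant System for a closed route from depot 0; distinct sites need dist > 0."""
    rng = random.Random(seed)
    n = len(dist)
    tau = [[1.0] * n for _ in range(n)]  # pheromone on each edge
    best_tour, best_len = None, math.inf
    for _ in range(n_iter):
        tours = []
        for _ in range(n_ants):
            tour, unvisited = [0], set(range(1, n))
            while unvisited:
                i = tour[-1]
                cand = sorted(unvisited)
                # Attractiveness = pheromone^alpha * (1 / distance)^beta.
                weights = [tau[i][j] ** alpha * (1.0 / dist[i][j]) ** beta for j in cand]
                j = rng.choices(cand, weights=weights)[0]
                tour.append(j)
                unvisited.remove(j)
            tours.append((tour, tour_length(tour, dist)))
        # Evaporate pheromone on every edge.
        for i in range(n):
            for j in range(n):
                tau[i][j] *= 1.0 - rho
        # Each ant deposits pheromone inversely proportional to its tour length.
        for tour, length in tours:
            for k in range(n):
                a, b = tour[k], tour[(k + 1) % n]
                tau[a][b] += q / length
                tau[b][a] += q / length
            if length < best_len:
                best_tour, best_len = tour, length
    return best_tour, best_len
```

On four sites at the corners of a unit square, the sketch finds the perimeter route of length 4 rather than one of the crossing routes of length 2 + 2√2.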

STORE - Staffordshire Online Repository

Personalization platform for multimodal ubiquitous computing applications

Author: Santos Pedro Emanuel Albuquerque e Baptista dos
Publication venue: Faculdade de Ciências e Tecnologia
Publication date: 01/01/2013
Field of study

Dissertation presented for obtaining a Master's degree in Informatics Engineering. We currently live surrounded by a myriad of computing devices running multiple applications. In general, the user experience in each of these scenarios is not adapted to each user's specific needs, and there is no personalization or integration across scenarios. Moreover, developers usually lack the right tools to handle this in a standard and generic way. A personalization platform may provide those tools. Such a platform should be readily available to any developer and must therefore be accessible over the Internet. With advances in IT infrastructure, it is now possible to develop reliable and scalable services running on abstract, virtualized platforms. These are some of the advantages of cloud computing, which offers a utility computing model in which customers dynamically allocate the resources they need and are charged accordingly. This work focuses on the creation of a cloud-based personalization platform built on a previously developed generic user modeling framework. It provides user profiling and context-awareness tools to third-party developers. A public display-based application was also developed. It provides useful information to students, teachers and others on a university campus as they are detected by Bluetooth scanning. It uses the personalization platform as the basis for selecting the most relevant information in each situation, while a mobile application was developed to serve as an input mechanism. A user study was conducted to assess the usefulness of the application and to validate some design choices. The results were mostly positive.

Repositório da Universidade Nova de Lisboa

Harnessing big data to inform tourism destination management organizations

Author: Fonseca João Pedro Martins Ribeiro da
Publication venue
Publication date: 25/01/2019
Field of study

Dissertation presented as the partial requirement for obtaining a Master's degree in Information Management, specialization in Knowledge Management and Business Intelligence. In the last few years, Portugal has witnessed rapid growth in tourism, which has had positive effects in many respects, especially economic ones. However, it also leads to a number of challenges, all of them difficult to quantify: tourist congestion, loss of city identity, degradation of heritage, etc. It is important to ensure that the foundations and tools required to understand and efficiently manage tourism flows exist at both city and country level. This thesis studies the potential of Big Data to inform destination management organizations. To do so, three sources of Big Data are discussed: telecom, social media and Airbnb data. This is done through the demonstration and analysis of a set of visualizations and tools, as well as a discussion of applications and recommendations for challenges that have been identified in the market. The study begins with a background section that analyzes both global and local trends in tourism, as well as the factors that affect tourism and its consequences. To analyze the growth of tourism in Portugal and provide prototypes of important tools for data-driven tourism policy making, Airbnb and telecom data are analyzed using a network science approach to visualize country-wide tourist circulation, and a model to retrieve and analyze social media data is presented. To allow comparison with the results of the Airbnb analysis, data on the Portuguese hotel industry are used as control data.
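The thesis analyzes Airbnb and telecom data with a network science approach to visualize tourist circulation. One plausible first step, sketched here under assumptions, is to turn each visitor's time-ordered sequence of regions into a weighted directed graph whose edge weights count moves between regions; weighted in-degree then gives a simple measure of how much inbound tourist flow each region attracts. The input format (a dict of per-visitor region sequences) and the function names are hypothetical, not the thesis's actual pipeline.

```python
from collections import Counter, defaultdict


def movement_network(trajectories):
    """Count directed moves between consecutive, distinct regions.

    `trajectories` maps a visitor id to that visitor's regions in time order;
    repeated stays in the same region are not counted as moves.
    """
    edges = Counter()
    for regions in trajectories.values():
        for a, b in zip(regions, regions[1:]):
            if a != b:
                edges[(a, b)] += 1
    return edges


def weighted_in_degree(edges):
    """Total number of arrivals into each region."""
    degree = defaultdict(int)
    for (_, b), weight in edges.items():
        degree[b] += weight
    return dict(degree)
```

For three visitors travelling Lisboa→Porto→Porto→Braga, Lisboa→Porto and Faro→Lisboa, the edge Lisboa→Porto gets weight 2 and Porto receives the most arrivals (2); the resulting edge list can be handed to any graph-drawing tool for the country-wide circulation map.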

Repositório da Universidade Nova de Lisboa

A Data-driven Methodology Towards Mobility- and Traffic-related Big Spatiotemporal Data Frameworks

Author: Figueiras Paulo Alves
Publication venue
Publication date: 01/12/2021
Field of study

The human population is increasing at unprecedented rates, particularly in urban areas. This increase, along with the rise of a more economically empowered middle class, brings new and complex challenges to the mobility of people within urban areas. To tackle these challenges, transportation and mobility authorities and operators are trying to adopt innovative Big Data-driven mobility- and traffic-related solutions. Such solutions will support decision-making processes that aim to ease the load on an already overloaded transport infrastructure. The information collected from day-to-day mobility and traffic can help mitigate some of these challenges in urban areas. Road infrastructure and traffic management operators (RITMOs) face several limitations in effectively extracting value from the exponentially growing volumes of mobility- and traffic-related Big Spatiotemporal Data (MobiTrafficBD) being acquired and gathered. Research on Big Data, Spatiotemporal Data and especially MobiTrafficBD is scattered, and the existing literature does not offer a concrete, common methodological approach to set up, configure, deploy and use a complete Big Data-based framework that manages the lifecycle of mobility-related spatiotemporal data, mainly geo-referenced time series (GRTS) and spatiotemporal events (ST Events), extracts value from it, and supports the decision-making processes of RITMOs. This doctoral thesis proposes a data-driven, prescriptive methodological approach towards the design, development and deployment of MobiTrafficBD Frameworks focused on GRTS and ST Events.
Besides a thorough literature review on Spatiotemporal Data, Big Data and the merging of these two fields through MobiTrafficBD, the methodological approach comprises a set of general characteristics, technical requirements, logical components, data flows and technological infrastructure models, as well as guidelines and best practices that aim to guide researchers, practitioners and stakeholders, such as RITMOs, through the design, development and deployment phases of any MobiTrafficBD Framework. This work is intended as a supporting methodological guide, based on widely used Reference Architectures and guidelines for Big Data but enriched with the inherent characteristics and concerns of Big Spatiotemporal Data, such as GRTS and ST Events. The proposed methodology was evaluated and demonstrated in various real-world use cases that deployed MobiTrafficBD-based Data Management, Processing, Analytics and Visualisation methods, tools and technologies, under the umbrella of several research projects funded by the European Commission and the Portuguese Government.
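To illustrate the two data types the methodology centres on, here is a minimal sketch of a geo-referenced time series (GRTS) and a spatiotemporal event (ST Event) as plain Python dataclasses, plus two typical queries: which events lie near a sensor and overlap a time window, and the mean sensor value in that window. The field names, the haversine distance on a spherical Earth (radius 6371 km) and the query functions are illustrative assumptions, not components defined in the thesis.

```python
from dataclasses import dataclass, field
from datetime import datetime
from math import asin, cos, radians, sin, sqrt


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km, assuming a spherical Earth of radius 6371 km."""
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    h = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * 6371.0 * asin(sqrt(h))


@dataclass
class GeoTimeSeries:
    """A fixed sensor location with time-stamped measurements (GRTS)."""
    sensor_id: str
    lat: float
    lon: float
    points: list[tuple[datetime, float]] = field(default_factory=list)


@dataclass
class STEvent:
    """Something that happens at a place over a time interval (ST Event)."""
    kind: str
    lat: float
    lon: float
    start: datetime
    end: datetime


def events_affecting(series, events, radius_km, since, until):
    """Events within `radius_km` of the sensor whose interval overlaps [since, until)."""
    return [e for e in events
            if haversine_km(series.lat, series.lon, e.lat, e.lon) <= radius_km
            and e.start < until and e.end > since]


def window_mean(series, since, until):
    """Mean measurement in [since, until), or None if the window is empty."""
    values = [v for t, v in series.points if since <= t < until]
    return sum(values) / len(values) if values else None
```

Keeping events and series as separate types mirrors the GRTS/ST Event split in the text; in a real framework these queries would run inside a spatiotemporal database or stream processor rather than over in-memory lists.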

Repositório da Universidade Nova de Lisboa

Student Support Program Outputs, Outcomes and Impacts Report

Author
Publication venue: DigitalCommons@University of Nebraska - Lincoln
Publication date: 01/10/2019
Field of study

The Robert B. Daugherty Water for Food Global Institute (DWFI) initiated its Postdoctoral and Student Support Programs in 2014. The following details their achievements. Round One: the institute first provided undergraduate, graduate student and postdoctoral support to faculty selected following a call for proposals in 2014. Support was awarded for two postdocs, five graduate students, and two projects with undergraduate students. As of FY19, a small amount of support continues for Francisco Munoz-Arriola's program. Outputs include presentations, grants and publications. The other faculty who have received support are: Vijendra Boken, UNK Geography & Earth Science; Carrick Detweiler, UNL Computer Science & Engineering; Trenton Franz, UNL School of Natural Resources; Patricio Grassini, UNL Agronomy & Horticulture; Alan Kolok, UNMC College of Public Health and UNO Biology; Robert Oglesby, UNL Earth & Atmospheric Sciences; Julie Shaffer, UNK Biology; and Harkamal Walia, UNL Agronomy & Horticulture. Outcomes include significant leveraging of RBDF resources to implement the Platte Basin Timelapse project and to achieve changes in knowledge, action and, ultimately, water and food security.