6 research outputs found

Who, What, When, Where, Why? Comparing Multiple Approaches to the Cross-Lingual 5W Task

Author: Coyne Robert Eric
Diab Mona T.
Grishman Ralph
Hakkani-Tür Dilek
Harper Mary
Ji Heng
Ma Wei Yun
McKeown Kathleen
Meyers Adam
Parton Kristen
Rosenthal Sara
Sun Ang
Tur Gokhan
Xu Wei
Yaman Sibel
Publication venue: 'Columbia University Libraries/Information Services'
Publication date: 01/01/2009

Cross-lingual tasks are especially difficult due to the compounding effect of errors in language processing and errors in machine translation (MT). In this paper, we present an error analysis of a new cross-lingual task: the 5W task, a sentence-level understanding task which seeks to return the English 5W's (Who, What, When, Where and Why) corresponding to a Chinese sentence. We analyze systems that we developed, identifying specific problems in language processing and MT that cause errors. The best cross-lingual 5W system was still 19% worse than the best monolingual 5W system, which shows that MT significantly degrades sentence-level understanding. Neither source-language nor target-language analysis was able to circumvent problems in MT, although each approach had advantages relative to the other. A detailed error analysis across multiple systems suggests directions for future research on the problem.

Columbia University Academic Commons

Interpolated PLSI for Learning Plausible Verb Arguments

Author: Calvo Hiram
Inui Kentaro
Matsumoto Yuji
Publication venue: City University of Hong Kong
Publication date: 01/01/2009

PACLIC 23 / City University of Hong Kong / 3-5 December 200

Waseda University Repository

The five Ws (and one H) of super-hydrophobic surfaces in medicine

Author: Gentile F.
Coluccio M.L.
Limongi T.
Perozziello G.
Candeloro P.
Di Fabrizio E.
Publication venue: 'MDPI AG'
Publication date: 06/11/2020

Super-hydrophobic surfaces (SHSs) are bio-inspired, artificial microfabricated interfaces, in which a pattern of cylindrical micropillars is modified to incorporate details at the nanoscale. For those systems, the integration of different scales translates into superior properties, including the ability to manipulate biological solutions. The five Ws, also called the five Ws and one H or the six Ws (6W), are questions whose answers are considered basic in information gathering. They constitute a formula for getting the complete story on a subject. According to the principle of the six Ws, a report can only be considered complete if it answers these questions starting with an interrogative word: who, why, what, where, when, how. Each question should have a factual answer. In what follows, SHSs and some of the most promising applications thereof are reviewed following the scheme of the 6W. We will show how these surfaces can be integrated into bio-photonic devices for the identification and detection of a single molecule. We will describe how SHSs and nanoporous silicon matrices can be combined to yield devices with the capability of harvesting small molecules, where the cut-off size can be adequately controlled. We will describe how this concept is utilized for obtaining a direct TEM image of a DNA molecule. © 2014 by the authors; licensee MDPI, Basel, Switzerland.

PORTO@iris (Publications Open Repository TOrino - Politecnico di Torino)

A DATA DRIVEN APPROACH TO IDENTIFY JOURNALISTIC 5WS FROM TEXT DOCUMENTS

Author: Sunkara Venkata Krishna Mohan
Publication venue: DigitalCommons@University of Nebraska - Lincoln
Publication date: 01/06/2019

Textual understanding is the process of automatically extracting accurate high-quality information from text. The amount of textual data available from different sources such as news, blogs and social media is growing exponentially. These data encode significant latent information which, if extracted accurately, can be valuable in a variety of applications such as medical report analyses, news understanding and societal studies. Natural language processing techniques are often employed to develop customized algorithms to extract such latent information from text. Journalistic 5Ws refer to the basic information in news articles that describes an event and include where, when, who, what and why. Extracting them accurately may facilitate better understanding of many social processes including social unrest, human rights violations, propaganda spread, and population migration. Furthermore, the 5Ws information can be combined with socio-economic and demographic data to analyze the state and trajectory of these processes. In this thesis, a data driven pipeline has been developed to extract the 5Ws from text using syntactic and semantic cues in the text. First, a classifier is developed to identify articles specifically related to social unrest. The classifier has been trained with a dataset of over 80K news articles. We then use NLP algorithms to generate a set of candidates for the 5Ws. Then, a series of algorithms to extract the 5Ws are developed. These heuristic-based algorithms leverage specific words and parts-of-speech customized for individual Ws to compute their scores. The heuristics are based on the syntactic structure of the document as well as syntactic and semantic representations of individual words and sentences. These scores are then combined and ranked to obtain the best answers to Journalistic 5Ws. The classification accuracy of the algorithms is validated using a manually annotated dataset of news articles.
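
The scoring-and-ranking step described in this abstract could look roughly like the sketch below. It is only an illustration under stated assumptions: the Candidate class, the CUE_WORDS lists, the two heuristics (a cue-word match and a sentence-position score) and the weights are all hypothetical and are not taken from the thesis, which does not give these details here.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    text: str          # candidate phrase
    cue: str           # word immediately before the phrase, lowercased
    sentence_idx: int  # 0-based index of the sentence the phrase came from


# Hypothetical cue words per question (illustrative only, not from the thesis).
CUE_WORDS = {
    "where": {"in", "at", "near"},
    "when": {"on", "during", "since"},
    "why": {"because", "over", "against"},
}


def score(candidate, question, cue_weight=1.0, position_weight=0.5):
    """Combine a cue-word score and a position score into one number."""
    cue_score = 1.0 if candidate.cue in CUE_WORDS[question] else 0.0
    # Earlier sentences score higher: 1 for the first sentence, then 1/2, 1/3, ...
    position_score = 1.0 / (1 + candidate.sentence_idx)
    return cue_weight * cue_score + position_weight * position_score


def best_answer(candidates, question):
    """Return the text of the highest-scoring candidate, or None if there are none."""
    if not candidates:
        return None
    return max(candidates, key=lambda c: score(c, question)).text


candidates = [
    Candidate("Cairo", "in", 2),
    Candidate("Friday", "on", 0),
    Candidate("the government", "against", 1),
]
print(best_answer(candidates, "where"))
```

With these assumed weights a cue-word match outweighs sentence position, so a candidate from a later sentence can still win if it follows a matching cue.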

Grounding event references in news

Author: Altena R.
Geerlings W.A.
Klingeren B. van
Lange W.C.M. de
Werf T.S.
Publication venue: School of Information Technologies
Publication date: 01/01/2000

Events are frequently discussed in natural language, and their accurate identification is central to language understanding. Yet they are diverse and complex in ontology and reference; computational processing hence proves challenging. News provides a shared basis for communication by reporting events. We perform several studies into news event reference. One annotation study characterises each news report in terms of its update and topic events, but finds that topic is better considered through explicit references to background events. In this context, we propose the event linking task which, analogous to named entity linking or disambiguation, models the grounding of references to notable events. It defines the disambiguation of an event reference as a link to the archival article that first reports it. When two references are linked to the same article, they need not be references to the same event. Event linking hopes to provide an intuitive approximation to coreference, erring on the side of over-generation in contrast with the literature. The task is also distinguished in considering event references from multiple perspectives over time. We diagnostically evaluate the task by first linking references to past, newsworthy events in news and opinion pieces to an archive of the Sydney Morning Herald. The intensive annotation results in only a small corpus of 229 distinct links. However, we observe that a number of hyperlinks targeting online news correspond to event links. We thus acquire two large corpora of hyperlinks at very low cost. From these we learn weights for temporal and term overlap features in a retrieval system. These noisy data lead to significant performance gains over a bag-of-words baseline. While our initial system can accurately predict many event links, most will require deep linguistic processing for their disambiguation.
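
A retrieval scorer that combines term overlap with a temporal feature, as this abstract mentions, might be sketched as follows. This is a simplified illustration, not the system from the thesis: the function names, the Jaccard overlap measure, the decay formula and the fixed weights w_term and w_time are all assumptions (the abstract says the weights are learned from hyperlink data, which is not reproduced here).

```python
from datetime import date


def term_overlap(a, b):
    """Jaccard overlap between the lowercase word sets of two strings."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    if not sa or not sb:
        return 0.0
    return len(sa & sb) / len(sa | sb)


def temporal_score(ref_date, article_date):
    """0 for articles published after the reference; otherwise decays with the gap."""
    gap_days = (ref_date - article_date).days
    if gap_days < 0:
        return 0.0
    return 1.0 / (1.0 + gap_days / 365.0)


def link(reference, ref_date, archive, w_term=1.0, w_time=0.2):
    """Return the id of the best-scoring archive article, or None for an empty archive.

    archive maps an article id to a (text, publication_date) pair.
    The weights are fixed, assumed values standing in for learned ones.
    """
    best_id, best_score = None, float("-inf")
    for article_id, (text, pub_date) in archive.items():
        s = (w_term * term_overlap(reference, text)
             + w_time * temporal_score(ref_date, pub_date))
        if s > best_score:
            best_id, best_score = article_id, s
    return best_id


# Made-up archive entries for illustration only.
archive = {
    "a1": ("bushfires destroy homes near sydney", date(2009, 2, 8)),
    "a2": ("election results announced in canberra", date(2010, 8, 22)),
}
print(link("the sydney bushfires", date(2011, 1, 1), archive))
```

The sketch ranks articles by similarity only; it does not try to find the article that first reported the event, which is how the abstract defines a correct link.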

Proceedings - University of Groningen

University of Groningen

ARTS repository - University of Groningen

Sydney eScholarship

Radboud Repository

Dissertations of the University of Groningen