
    Interpolated PLSI for Learning Plausible Verb Arguments

    PACLIC 23 / City University of Hong Kong / 3-5 December 200

    The five Ws (and one H) of super-hydrophobic surfaces in medicine

    Super-hydrophobic surfaces (SHSs) are bio-inspired, artificial microfabricated interfaces in which a pattern of cylindrical micropillars is modified to incorporate details at the nanoscale. For those systems, the integration of different scales translates into superior properties, including the ability to manipulate biological solutions. The five Ws, also known as the five Ws and one H or the six Ws (6W), are questions whose answers are considered basic in information-gathering. They constitute a formula for getting the complete story on a subject. According to the principle of the six Ws, a report can only be considered complete if it answers these questions, each starting with an interrogative word: who, why, what, where, when, how. Each question should have a factual answer. In what follows, SHSs and some of their most promising applications are reviewed following the scheme of the 6W. We will show how these surfaces can be integrated into bio-photonic devices for the identification and detection of a single molecule. We will describe how SHSs and nanoporous silicon matrices can be combined to yield devices capable of harvesting small molecules, where the cut-off size can be adequately controlled. We will describe how this concept is utilized for obtaining a direct TEM image of a DNA molecule. © 2014 by the authors; licensee MDPI, Basel, Switzerland. Gentile, F.; Coluccio, M. L.; Limongi, T.; Perozziello, G.; Candeloro, P.; Di Fabrizio, E.

    A DATA DRIVEN APPROACH TO IDENTIFY JOURNALISTIC 5WS FROM TEXT DOCUMENTS

    Textual understanding is the process of automatically extracting accurate, high-quality information from text. The amount of textual data available from sources such as news, blogs and social media is growing exponentially. These data encode significant latent information which, if extracted accurately, can be valuable in a variety of applications such as medical report analyses, news understanding and societal studies. Natural language processing techniques are often employed to develop customized algorithms that extract such latent information from text. Journalistic 5Ws refer to the basic information in news articles that describes an event: where, when, who, what and why. Extracting them accurately may facilitate better understanding of many social processes, including social unrest, human rights violations, propaganda spread, and population migration. Furthermore, the 5Ws information can be combined with socio-economic and demographic data to analyze the state and trajectory of these processes. In this thesis, a data-driven pipeline has been developed to extract the 5Ws from text using syntactic and semantic cues in the text. First, a classifier is developed to identify articles specifically related to social unrest. The classifier has been trained with a dataset of over 80K news articles. We then use NLP algorithms to generate a set of candidates for the 5Ws. Next, a series of algorithms is developed to extract the 5Ws. These heuristic-based algorithms leverage specific words and parts of speech, customized for individual Ws, to compute candidate scores. The heuristics are based on the syntactic structure of the document as well as syntactic and semantic representations of individual words and sentences. These scores are then combined and ranked to obtain the best answers to the Journalistic 5Ws. The classification accuracy of the algorithms is validated using a manually annotated dataset of news articles.
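
    The final step of the abstract above (per-W heuristic scores that are combined and ranked) can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the `Candidate` fields, the two heuristics (`position_score`, `tag_score`) and the equal weights are all hypothetical stand-ins chosen only to show the combine-and-rank shape.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    text: str
    position: int  # sentence index in the article (0 = lead sentence); hypothetical field
    tag: str       # coarse label such as "PERSON", "LOCATION", "DATE"; hypothetical field


def position_score(c: Candidate) -> float:
    # Assumed heuristic: candidates in earlier sentences score higher.
    return 1.0 / (1 + c.position)


def tag_score(c: Candidate, wanted: set) -> float:
    # Assumed heuristic: 1.0 if the candidate's label suits this W, else 0.0.
    return 1.0 if c.tag in wanted else 0.0


def rank(cands, wanted, weights=(0.5, 0.5)):
    # Combine the heuristic scores linearly and sort best-first.
    w_pos, w_tag = weights
    scored = [
        (w_pos * position_score(c) + w_tag * tag_score(c, wanted), c)
        for c in cands
    ]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


candidates = [
    Candidate("Monday", 0, "DATE"),
    Candidate("Cairo", 1, "LOCATION"),
    Candidate("protesters", 0, "PERSON"),
]
where_ranking = rank(candidates, wanted={"LOCATION"})
best_where = where_ranking[0][1].text
```

    Here the "where" question prefers location-tagged candidates, so a location in the second sentence can outrank non-locations in the lead; a real system would use one heuristic set per W.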

    Grounding event references in news

    Events are frequently discussed in natural language, and their accurate identification is central to language understanding. Yet they are diverse and complex in ontology and reference; computational processing hence proves challenging. News provides a shared basis for communication by reporting events. We perform several studies into news event reference. One annotation study characterises each news report in terms of its update and topic events, but finds that topic is better considered through explicit references to background events. In this context, we propose the event linking task which, analogous to named entity linking or disambiguation, models the grounding of references to notable events. It defines the disambiguation of an event reference as a link to the archival article that first reports it. When two references are linked to the same article, they need not be references to the same event. Event linking hopes to provide an intuitive approximation to coreference, erring on the side of over-generation in contrast with the literature. The task is also distinguished in considering event references from multiple perspectives over time. We diagnostically evaluate the task by first linking references to past, newsworthy events in news and opinion pieces to an archive of the Sydney Morning Herald. The intensive annotation results in only a small corpus of 229 distinct links. However, we observe that a number of hyperlinks targeting online news correspond to event links. We thus acquire two large corpora of hyperlinks at very low cost. From these we learn weights for temporal and term overlap features in a retrieval system. These noisy data lead to significant performance gains over a bag-of-words baseline. While our initial system can accurately predict many event links, most will require deep linguistic processing for their disambiguation.
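
    As a rough illustration of scoring archive articles with weighted term-overlap and temporal features, the sketch below ranks candidate link targets for one event reference. Everything in it is an assumption made for illustration: the Jaccard overlap, the decay-with-age temporal feature, the dictionary layout, and the hand-set weights (the abstract describes learning the weights from hyperlink data, which is not reproduced here).

```python
from datetime import date


def term_overlap(ref_tokens, art_tokens):
    # Jaccard overlap between the reference's terms and the article's terms.
    a, b = set(ref_tokens), set(art_tokens)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def temporal_feature(ref_date, art_date, scale_days=30.0):
    # Assumed feature: the target article should not postdate the reference,
    # and articles closer in time to the reference score higher.
    gap = (ref_date - art_date).days
    if gap < 0:
        return 0.0
    return 1.0 / (1.0 + gap / scale_days)


def link_score(ref, art, w_term=0.7, w_time=0.3):
    # Hand-set weights stand in for weights that would be learned from data.
    return (w_term * term_overlap(ref["tokens"], art["tokens"])
            + w_time * temporal_feature(ref["date"], art["date"]))


def best_link(ref, archive):
    # Return the highest-scoring archive article for this reference.
    return max(archive, key=lambda art: link_score(ref, art))


reference = {"tokens": ["bushfire", "sydney", "2001"], "date": date(2005, 1, 10)}
archive = [
    {"id": "a", "tokens": ["bushfire", "sydney", "homes"], "date": date(2001, 12, 26)},
    {"id": "b", "tokens": ["election", "sydney"], "date": date(2004, 12, 1)},
]
chosen = best_link(reference, archive)
```

    With these toy weights, term overlap dominates, so article "a" is chosen despite being older; changing the weights shifts that trade-off.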