24,281 research outputs found
A Robust Linguistic Platform for Efficient and Domain specific Web Content Analysis
Web semantic access in specific domains calls for specialized search engines
with enhanced semantic querying and indexing capacities, which pertain both to
information retrieval (IR) and to information extraction (IE). A rich
linguistic analysis is required either to identify the relevant semantic units
to index and weight them according to linguistic specific statistical
distribution, or as the basis of an information extraction process. Recent
developments make Natural Language Processing (NLP) techniques reliable enough
to process large collections of documents and to enrich them with semantic
annotations. This paper focuses on the design and the development of a text
processing platform, Ogmios, which has been developed in the ALVIS project. The
Ogmios platform exploits existing NLP modules and resources, which may be tuned
to specific domains and produces linguistically annotated documents. We show
how the three constraints of genericity, domain semantic awareness and
performance can be handled all together
Towards robust and reliable multimedia analysis through semantic integration of services
Thanks to ubiquitous Web connectivity and portable multimedia devices, it has never been so easy to produce and distribute new multimedia resources such as videos, photos, and audio. This ever-increasing production leads to an information overload for consumers, which calls for efficient multimedia retrieval techniques. Multimedia resources can be efficiently retrieved using their metadata, but the multimedia analysis methods that can automatically generate this metadata are currently not reliable enough for highly diverse multimedia content. A reliable and automatic method for analyzing general multimedia content is needed. We introduce a domain-agnostic framework that annotates multimedia resources using currently available multimedia analysis methods. By using a three-step reasoning cycle, this framework can assess and improve the quality of multimedia analysis results, by consecutively (1) combining analysis results effectively, (2) predicting which results might need improvement, and (3) invoking compatible analysis methods to retrieve new results. By using semantic descriptions for the Web services that wrap the multimedia analysis methods, compatible services can be automatically selected. By using additional semantic reasoning on these semantic descriptions, the different services can be repurposed across different use cases. We evaluated this problem-agnostic framework in the context of video face detection, and showed that it is capable of providing the best analysis results regardless of the input video. The proposed methodology can serve as a basis to build a generic multimedia annotation platform, which returns reliable results for diverse multimedia analysis problems. This allows for better metadata generation, and improves the efficient retrieval of multimedia resources
Natural language processing
Beginning with the basic issues of NLP, this chapter aims to chart the major research activities in this area since the last ARIST Chapter in 1996 (Haas, 1996), including: (i) natural language text processing systems - text summarization, information extraction, information retrieval, etc., including domain-specific applications; (ii) natural language interfaces; (iii) NLP in the context of www and digital libraries ; and (iv) evaluation of NLP systems
Challenges for chemical information extraction and text retrieval
Item does not contain fulltex
Holaaa!! Writin like u talk is kewl but kinda hard 4 NLP
We present work in progress aiming to build tools for the normalization of User-Generated Content (UGC). As we will see, the task requires the revisiting of the initial steps of NLP processing, since UGC (micro-blog, blog, and, generally, Web 2.0 user texts) presents a number of non-standard communicative and linguistic characteristics, and is in fact much closer to oral and colloquial language than to edited text. We present and characterize a corpus of UGC text in Spanish from three different sources: Twitter, consumer reviews and blogs. We motivate the need for UGC text normalization by analyzing the problems found when processing this type of text through a conventional language processing pipeline, particularly in the tasks of lemmatization and morphosyntactic tagging, and finally we propose a strategy for automatically normalizing UGC using a selector of correct forms on top of a pre-existing spell-checker.Postprint (published version
- …