230 research outputs found

    Information extraction from multimedia web documents: an open-source platform and testbed

    No full text
    The LivingKnowledge project aimed to enhance the current state of the art in search, retrieval and knowledge management on the web by advancing the use of sentiment and opinion analysis within multimedia applications. To achieve this aim, a diverse set of novel and complementary analysis techniques have been integrated into a single, but extensible software platform on which such applications can be built. The platform combines state-of-the-art techniques for extracting facts, opinions and sentiment from multimedia documents, and unlike earlier platforms, it exploits both visual and textual techniques to support multimedia information retrieval. Foreseeing the usefulness of this software in the wider community, the platform has been made generally available as an open-source project. This paper describes the platform design, gives an overview of the analysis algorithms integrated into the system and describes two applications that utilise the system for multimedia information retrieval

    CHORUS Deliverable 2.1: State of the Art on Multimedia Search Engines

    Get PDF
    Based on the information provided by European projects and national initiatives related to multimedia search as well as domains experts that participated in the CHORUS Think-thanks and workshops, this document reports on the state of the art related to multimedia content search from, a technical, and socio-economic perspective. The technical perspective includes an up to date view on content based indexing and retrieval technologies, multimedia search in the context of mobile devices and peer-to-peer networks, and an overview of current evaluation and benchmark inititiatives to measure the performance of multimedia search engines. From a socio-economic perspective we inventorize the impact and legal consequences of these technical advances and point out future directions of research

    Generating queries from user-selected text

    Full text link
    People browsing the web or reading a document may see text passages that describe a topic of interest, and want to know more about it by searching. Manually formulating a query from that text can be difficult, however, and an effec-tive search is not guaranteed. In this paper, to address this scenario, we propose a learning-based approach which gener-ates effective queries from the content of an arbitrary user-selected text passage. Specifically, the approach extracts and selects representative chunks (noun phrases or named entities) from the content (a text passage) using a rich set of features. We carry out experiments showing that the se-lected chunks can be effectively used to generate queries both in a TREC environment, where weights and query structure can be directly incorporated, and with a “black-box ” web search engine, where query structure is more limited

    Bridging the gap within text-data analytics: a computer environment for data analysis in linguistic research

    Get PDF
    Since computer technology became widespread available at universities during the last quarter of the twentieth century, language researchers have been successfully employing software to analyse usage patterns in corpora. However, although there has been a proliferation of software for different disciplines within text-data analytics, e.g. corpus linguistics, statistics, natural language processing and text mining, this article demonstrates that any computer environment intended to support advanced linguistic research more effectively should be grounded on a user-centred approach to holistically integrate cross-disciplinary methods and techniques in a linguist-friendly manner. To this end, I examine not only the tasks that are derived from linguists' needs and goals but also the technologies that appropriately deal with the properties of linguistic data. This research results in the implementation of DAMIEN, an online workbench designed to conduct linguistic experiments on corpora
    corecore