
    Written labels in primary school science diagrams: linguistic structures and discourse relations

    Communication is by nature multimodal: it draws on various forms (modes) of communication, such as spoken language, written language, and illustrations, to create meaning. Multimodality research studies communicative situations that rely on such modes and their combinations. One form of multimodality very common in everyday life is the diagram, which can convey highly complex concepts by combining visual expressive resources (such as illustrations or photographs), written language, and diagrammatic elements such as lines and arrows. The primary aim of my thesis is to establish whether the linguistic structures of written labels (that is, textual elements) in diagrams can inform the decomposition of visual expressive resources. Put simply, I seek to determine whether these visual elements can be divided more accurately into finer-grained units according to linguistic patterns in their accompanying textual elements. To answer this main research question, I pose three sub-questions: first, whether certain diagram types (macro-structures), such as tables, cycles, or cross-sections, co-occur with specific linguistic patterns; second, whether the different rhetorical functions found in diagrams also employ different structures in their written labels; and third, whether these functions are signaled by other means alongside written language. Answering these questions can help in designing future multimodal corpora and their annotation schemata, improving annotation accuracy and the possibilities for processing them. The theoretical framework of this thesis synthesizes multimodality theory, discourse studies, and diagram research. I approach diagrams from the perspective of multimodality, treating them as discursive artefacts.
This perspective is enabled by the diagrammatic mode, which describes how discourse semantics functions in diagrams and why their interpretation is dynamic: each element, or combination of elements, can in turn contextualize, or form part of, other elements and their combinations at a different scale. I also discuss the discourse-semantic concepts of coherence and cohesion as they relate to multimodal artefacts: different elements, even non-linguistic ones, can combine to create semantically meaningful connections between the constituents of such an artefact. To exemplify this, I apply Rhetorical Structure Theory (RST), which seeks to formalize how units of discourse are interconnected and work towards a shared communicative goal. RST uses rhetorical relations such as ELABORATION and IDENTIFICATION to describe how units and their combinations relate to other parts of a text (or other communicative whole). The data consist of two interrelated, complementary multimodal corpora: AI2D and AI2D-RST. AI2D is a collection of primary-school science textbook diagrams, created for question answering and annotated for blobs (visual expressive resources), labels, and diagrammatic elements; it also contains the textual content of each diagram. AI2D-RST covers a subset of the diagrams in AI2D and adds annotation layers for macro-structures, visual connectivity, and RST, describing the rhetorical relation of each element in the diagram. Computationally, I locate every rhetorical relation in AI2D-RST that contains a label, record the relation type and the type of diagram it appears in, and fetch the label's textual content from AI2D. I then process each label with spaCy, a natural language processing library, to extract linguistic features such as phrase types, part-of-speech patterns, and average word counts.
The output records, for each label, its rhetorical relation, the macro-structure containing it (if any), and the linguistic structures described above. The results show some differences in how distinct rhetorical relations and macro-groups use language: for example, cycles contain the most verb phrases and the highest word counts, indicating that written language is used to explain processes to viewers. As linguistic patterns differ across these classes and are contextualized by surrounding diagrammatic elements, approaching diagrams from a discursive standpoint may benefit future empirical multimodality research and help design annotation schemata that are more intuitive for annotators. With larger datasets and further research, precise rule sets combining linguistic structures and layout information may be developed to increase the accuracy of probability-based computational analysis of diagrams.

    Information retrieval and text mining technologies for chemistry

    Efficient access to chemical information contained in scientific literature, patents, technical reports, or the web is a pressing need shared by researchers and patent attorneys across chemical disciplines. Retrieval of important chemical information in most cases starts with finding documents relevant to a particular chemical compound or family. Targeted retrieval of chemical documents is closely connected to the automatic recognition of chemical entities in text, which commonly involves extracting the complete list of chemicals mentioned in a document, including any associated information. In this Review, we provide a comprehensive, in-depth description of fundamental concepts, technical implementations, and current technologies for meeting these information demands. A strong focus is placed on community challenges that evaluate system performance, in particular the CHEMDNER and CHEMDNER patents tasks of BioCreative IV and V, respectively. Given the growing interest in building automatically annotated chemical knowledge bases that integrate chemical information and biological data, we also present cheminformatics approaches for mapping extracted chemical names to chemical structures and annotating them, together with text mining applications that link chemistry with biological information. Finally, future trends and current challenges are highlighted as a proposed roadmap for research in this emerging field.
A.V. and M.K. acknowledge funding from the European Community’s Horizon 2020 Program (project reference: 654021 - OpenMinted). M.K. additionally acknowledges the Encomienda MINETAD-CNIO as part of the Plan for the Advancement of Language Technology. O.R. and J.O. thank the Foundation for Applied Medical Research (FIMA), University of Navarra (Pamplona, Spain).
This work was partially funded by the Consellería de Cultura, Educación e Ordenación Universitaria (Xunta de Galicia) and FEDER (European Union), and by the Portuguese Foundation for Science and Technology (FCT) under the scope of the strategic funding of the UID/BIO/04469/2013 unit and COMPETE 2020 (POCI-01-0145-FEDER-006684). We thank Iñigo García-Yoldi for useful feedback and discussions during the preparation of the manuscript.
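To make the chemical entity recognition step in the review abstract concrete, here is the simplest possible baseline: case-insensitive, longest-match dictionary lookup with character offsets. It is only a sketch under stated assumptions. The toy lexicon and the tokenizer regex are invented here, this is not one of the systems evaluated in the CHEMDNER tasks (which the abstract does not describe), and it does not cover mapping names to structures.

```python
import re

# Assumed tokenizer: runs of letters/digits, optionally joined by "-" or ","
# so that names such as "2,4-dinitrophenol" stay a single token.
TOKEN = re.compile(r"[A-Za-z0-9]+(?:[-,][A-Za-z0-9]+)*")


def tokenize(text):
    """Return (lower-cased token, start offset, end offset) triples."""
    return [(m.group(0).lower(), m.start(), m.end()) for m in TOKEN.finditer(text)]


def build_lexicon(names):
    """Map token tuples of each (hypothetical) dictionary name to its canonical form."""
    lexicon = {}
    for name in names:
        key = tuple(tok for tok, _, _ in tokenize(name))
        if key:
            lexicon[key] = name
    return lexicon


def find_chemicals(text, lexicon):
    """Greedy longest-match lookup; returns (surface text, start, end, canonical name)."""
    tokens = tokenize(text)
    max_len = max((len(k) for k in lexicon), default=0)
    mentions = []
    i = 0
    while i < len(tokens):
        # Try the longest candidate span first so "sodium chloride" beats "sodium".
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            key = tuple(tok for tok, _, _ in tokens[i:i + n])
            if key in lexicon:
                start, end = tokens[i][1], tokens[i + n - 1][2]
                mentions.append((text[start:end], start, end, lexicon[key]))
                i += n
                break
        else:
            i += 1  # no dictionary entry starts at this token
    return mentions


lexicon = build_lexicon(["sodium chloride", "sodium", "aspirin", "2,4-dinitrophenol"])
print(find_chemicals("Aspirin, sodium chloride and 2,4-dinitrophenol were tested.", lexicon))
```

Dictionary lookup is easy to audit but misses unseen names and spelling variants, which is part of why the challenge-based evaluations the review focuses on matter.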

    CHORUS Deliverable 2.2: Second report - identification of multi-disciplinary key issues for gap analysis toward EU multimedia search engines roadmap

    After addressing the state of the art during the first year of CHORUS and establishing the existing landscape in multimedia search engines, we identified and analyzed gaps within the European research effort during our second year. In this period we focused on three directions: technological issues, user-centred issues and use cases, and socio-economic and legal aspects. These were assessed through two central studies: first, a concerted vision of the functional breakdown of a generic multimedia search engine, and second, representative use-case descriptions with a related discussion of the requirements for technological challenges. Both studies were carried out in cooperation and consultation with the community at large through EC concertation meetings (multimedia search engines cluster), several meetings with our Think-Tank, presentations at international conferences, and surveys addressed to EU project coordinators and national initiative coordinators. Based on the feedback obtained, we identified two types of gaps: core technological gaps that involve research challenges, and “enablers”, which are not necessarily technical research challenges but have an impact on innovation progress. New socio-economic trends are also presented, as well as emerging legal challenges.