3 research outputs found

    A corpus of images and text in online news

    In recent years, several datasets have been released that include images and text, spurring new methods that combine natural language processing and computer vision. However, there is a need for datasets of images in their natural textual context. The ION corpus contains 300K news articles published between August 2014 and August 2015 in five online newspapers from two countries. The one-year coverage over multiple publishers ensures a broad scope in terms of topics, image quality and editorial viewpoints. The corpus consists of JSON-LD files with the following data about each article: the original URL of the article on the news publisher’s website, the date of publication, the headline of the article, the URL of the image displayed with the article (if any), and the caption of that image. Neither the article text nor the images themselves are included in the corpus. Instead, the images are distributed as high-dimensional feature vectors extracted from a Convolutional Neural Network, anticipating their use in computer vision tasks. The article text is represented as a list of automatically generated entity and topic annotations in the form of Wikipedia/DBpedia pages. This facilitates the selection of subsets of the corpus for separate analysis or evaluation.
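
    The subset selection described in this abstract can be illustrated with a minimal sketch. The record layout below is an assumption made for illustration only: the key names (`url`, `datePublished`, `headline`, `image`, `annotations`), the sample values, and the helper functions are hypothetical and are not taken from the corpus's actual JSON-LD vocabulary.

```python
import json

# Hypothetical records: key names and values are illustrative assumptions,
# not the documented schema of the ION corpus.
SAMPLE = """
[
  {"url": "http://example.org/news/1",
   "datePublished": "2014-09-01",
   "headline": "Example headline one",
   "image": {"url": "http://example.org/img/1.jpg", "caption": "Example caption"},
   "annotations": ["http://dbpedia.org/resource/Amsterdam"]},
  {"url": "http://example.org/news/2",
   "datePublished": "2015-01-15",
   "headline": "Example headline two",
   "image": null,
   "annotations": ["http://dbpedia.org/resource/Football"]}
]
"""


def select_by_annotation(records, resource_uri):
    """Return the articles annotated with the given Wikipedia/DBpedia resource."""
    return [r for r in records if resource_uri in r.get("annotations", [])]


def with_image(records):
    """Return the articles that have an image (the image is optional per article)."""
    return [r for r in records if r.get("image")]


records = json.loads(SAMPLE)
amsterdam = select_by_annotation(records, "http://dbpedia.org/resource/Amsterdam")
illustrated = with_image(records)
```

    Filtering on the annotation list rather than on article text reflects the abstract's point that the text itself is not distributed; only the entity and topic annotations are available for selecting subsets.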

    Modeling Context with an Architecture Viewpoint

    The context of a software system comprises the knowledge that architects need to have about the environment in which a system is expected to operate. Contextual knowledge, however, is often unknown or overlooked. As a result, software architects design systems based on largely unfounded assumptions, which can lead to system failures. To address this problem, this paper presents a Context Description Viewpoint that captures context in software architecture. The viewpoint is based on the results of a literature review that analyzed the state of the art in context, its elements, and modeling techniques. We evaluated and revised the viewpoint using two case studies based on real-world projects. The case studies showed that the viewpoint is expressive enough to capture context. For software architects, it represents a reusable work product for designing software systems that helps them identify, capture, and analyze contextual knowledge.

    The ION corpus

    Dataset published with: Hollink, L., Bedjeti, A., van Harmelen, M., & Elliott, D. (2016). A corpus of images and text in online news. In Proceedings of the International Conference on Language Resources and Evaluation 2016 (LREC 10) (pp. 1377–1382).