39 research outputs found

    Object-Centric Unsupervised Image Captioning

    Full text link
    Image captioning is a longstanding problem at the intersection of computer vision and natural language processing. To date, researchers have produced impressive state-of-the-art performance in the age of deep learning. Most of these state-of-the-art methods, however, require a large volume of annotated image-caption pairs to train their models. Given an image dataset of interest, a practitioner needs to annotate a caption for each image in the training set, and this process must be repeated for each newly collected image dataset. In this paper, we explore the task of unsupervised image captioning, which uses unpaired images and texts to train the model, so that the texts can come from different sources than the images. A main school of research on this topic that has been shown to be effective is to construct pairs from the images and texts in the training set according to their overlap of objects. Unlike in the supervised setting, however, these constructed pairings are not guaranteed to have a fully overlapping set of objects. Our work overcomes this by harvesting objects corresponding to a given sentence from the training set, even if they do not belong to the same image. When used as input to a transformer, such a mixture of objects enables larger, if not full, object coverage, and when supervised by the corresponding sentence, produces results that outperform the current state-of-the-art unsupervised methods by a significant margin. Building upon this finding, we further show that (1) additional information on relationships between objects and attributes of objects also helps boost performance; and (2) our method extends well to non-English image captioning, which usually suffers from a scarcer level of annotation. Our findings are supported by strong empirical results. Our code is available at https://github.com/zihangm/obj-centric-unsup-caption. Comment: ECCV 202
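    The object-harvesting idea described in the abstract can be sketched as follows. This is a minimal illustration under assumed data structures; the function names and the toy detection data are hypothetical, not the paper's implementation:

```python
from collections import defaultdict

def build_object_index(images):
    """Map each detected object label to the set of images containing it.
    `images` is a dict: image_id -> set of detected object labels."""
    index = defaultdict(set)
    for image_id, labels in images.items():
        for label in labels:
            index[label].add(image_id)
    return index

def harvest_objects(sentence_objects, index):
    """For the objects mentioned in a sentence, collect a matching object
    from *any* training image, not just a single paired image, so the
    harvested set can cover (nearly) all mentioned objects."""
    harvested = []
    for label in sentence_objects:
        for image_id in index.get(label, ()):
            # take the object from the first image that contains it
            harvested.append((label, image_id))
            break
    return harvested

# toy "detections": image_id -> object labels found in that image
images = {
    "img1": {"dog", "frisbee"},
    "img2": {"dog", "grass", "tree"},
    "img3": {"frisbee", "person"},
}
index = build_object_index(images)
# no single image contains all three objects, but the harvested
# mixture covers every object mentioned in the sentence
pairs = harvest_objects({"person", "dog", "frisbee"}, index)
```

The harvested object mixture would then be fed to the transformer, supervised by the sentence it was harvested for.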

    Text-image synergy for multimodal retrieval and annotation

    Get PDF
    Text and images are the two most common data modalities found on the Internet. Understanding the synergy between text and images, that is, seamlessly analyzing information from these modalities, may be trivial for humans but is challenging for software systems. In this dissertation we study problems where deciphering text-image synergy is crucial for finding solutions. We propose methods and ideas that establish semantic connections between text and images in multimodal content, and empirically show their effectiveness in four interconnected problems: Image Retrieval, Image Tag Refinement, Image-Text Alignment, and Image Captioning. Our promising results and observations open up interesting scope for future research involving text-image data understanding. In more detail, the four problems are:
    • Image Retrieval. Whether images are found via text-based search queries depends strongly on whether the text near an image matches the query. Images without textual context, or with thematically fitting context but no direct keyword match to the query, often cannot be found. As a remedy, we propose combining three kinds of information: visual information (in the form of automatically generated image descriptions), textual information (keywords from previous search queries), and commonsense knowledge.
    • Image Tag Refinement. Object recognition by computer vision frequently produces misdetections and incoherences, yet correctly identifying image content is an important prerequisite for retrieving images via textual queries. To reduce these errors, we propose incorporating commonsense knowledge: additional image annotations that common sense deems thematically fitting help avoid many erroneous and incoherent detections.
    • Image-Text Alignment. On web pages combining text and images (news sites, blog posts, social media articles), images are usually placed at semantically meaningful positions in the flow of the text. We exploit this to propose a framework that selects relevant images and associates them with the fitting passages of a text.
    • Image Captioning. Images that accompany multimodal content to improve the readability of a text typically carry captions that fit the context of the surrounding text. Whereas captions are usually generated from the image alone, we propose to incorporate this context as well, and introduce context-aware image caption generation.

    Geographic information extraction from texts

    Get PDF
    A large volume of unstructured texts, containing valuable geographic information, is available online. This information – provided implicitly or explicitly – is useful not only for scientific studies (e.g., spatial humanities) but also for many practical applications (e.g., geographic information retrieval). Although much progress has been achieved in geographic information extraction from texts, there are still unsolved challenges and issues, ranging from methods, systems, and data to applications and privacy. Therefore, this workshop will provide a timely opportunity to discuss recent advances, new ideas, and concepts, but also to identify research gaps in geographic information extraction.

    A Survey on Semantic Processing Techniques

    Full text link
    Semantic processing is a fundamental research domain in computational linguistics. In the era of powerful pre-trained language models and large language models, the advancement of research in this domain appears to be decelerating. However, the study of semantics is multi-dimensional in linguistics, and the research depth and breadth of computational semantic processing can be greatly improved with new technologies. In this survey, we analyze five semantic processing tasks, namely word sense disambiguation, anaphora resolution, named entity recognition, concept extraction, and subjectivity detection. We study relevant theoretical research in these fields, advanced methods, and downstream applications. We connect the surveyed tasks with downstream applications because this may inspire future scholars to fuse these low-level semantic processing tasks with high-level natural language processing tasks. The review of theoretical research may also inspire new tasks and technologies in the semantic processing domain. Finally, we compare the different semantic processing techniques and summarize their technical trends, application trends, and future directions. Comment: Published in Information Fusion, Volume 101, 2024, 101988, ISSN 1566-2535. The equal-contribution mark is missing in the published version due to publication policies; please contact Prof. Erik Cambria for details.

    Proceedings of the 1st Doctoral Consortium at the European Conference on Artificial Intelligence (DC-ECAI 2020)

    Get PDF
    1st Doctoral Consortium at the European Conference on Artificial Intelligence (DC-ECAI 2020), 29-30 August 2020, Santiago de Compostela, Spain. The DC-ECAI 2020 provides a unique opportunity for PhD students who are close to finishing their doctoral research to interact with experienced researchers in the field. Senior members of the community are assigned as mentors for each group of students based on the students' research topics or similarity of research interests. The DC-ECAI 2020, which is held virtually this year, allows students from all over the world to present and discuss their ongoing research and career plans with their mentor, to network with other participants, and to receive training and mentoring about career planning and career options.

    Model driven design and data integration in semantic web information systems

    Get PDF
    The Web is quickly evolving in many ways. It has evolved from a Web of documents into a Web of applications, in which a growing number of designers offer new and interactive Web applications to people all over the world. However, application design and implementation remain complex, error-prone and laborious. In parallel, there is also an evolution from a Web of documents into a Web of 'knowledge', as a growing number of data owners share their data sources with a growing audience. This brings potential new applications for these data sources, including scenarios in which these datasets are reused and integrated with other existing and new data sources. However, the heterogeneity of these data sources in syntax, semantics and structure represents a great challenge for application designers. The Semantic Web is a collection of standards and technologies that offer solutions for at least the syntactic and some of the structural issues. It offers semantic freedom and flexibility, but this leaves the issue of semantic interoperability. In this thesis we present Hera-S, an evolution of the Model Driven Web Engineering (MDWE) method Hera. MDWE methods allow designers to create data-centric applications using models instead of programming. Hera-S especially targets Semantic Web sources and provides a flexible method for designing personalized adaptive Web applications. Hera-S defines several models that together define the target Web application. Moreover, we implemented a framework called Hydragen, which is able to execute the Hera-S models to run the desired Web application. Hera-S' core is the Application Model (AM), in which the main logic of the application is defined, i.e. the groups of data elements that form logical units or subunits, the personalization conditions, and the relationships between the units. Hera-S also uses a so-called Domain Model (DM) that describes the content and its structure.
    However, this DM is not Hera-S specific; instead, any Semantic Web source representation can serve as the DM, as long as its content can be queried with the standardized Semantic Web query language SPARQL. The same holds for the User Model (UM). The UM can be used for personalization conditions, but also as a source of user-related content if necessary. In fact, the difference between DM and UM is conceptual, as their implementation within Hydragen is the same. Hera-S also defines a Presentation Model (PM), which defines presentation details of elements such as order and style. To help designers build their Web applications we introduce a toolset, Hera Studio, which allows the different models to be built graphically. Hera Studio also provides additional functionality such as model checking and deployment of the models in Hydragen. Both Hera-S and its implementation Hydragen are designed to be flexible regarding the use of models. To achieve this, Hydragen is a stateless engine that queries the models for relevant information at every page request. This allows the models and data to be changed in the datastore at runtime. We show that one way to exploit this flexibility is by applying aspect-orientation to the AM. Aspect-orientation allows us to dynamically inject functionality that pervades the entire application. Another way to exploit Hera-S' flexibility is in reusing specialized components, e.g. for presentation generation. We present a configuration of Hydragen in which we replace our native presentation generation functionality with the AMACONT engine. AMACONT provides more extensive multi-level presentation generation and adaptation capabilities, as well as aspect-orientation and a form of semantics-based adaptation. Hera-S was designed to allow the (re-)use of any (Semantic) Web data source. It even opens up the possibility of data integration at the back end, by using an extensible storage layer in our database of choice, Sesame.
    However, even though data integration is theoretically possible, much of the actual integration issue remains. As this is a recurring issue in many domains, and a broader challenge than Hera-S design alone, we decided to look at it in isolation. We present a framework called Relco, which provides a language to express data transformation operations as well as a collection of techniques that can be used to (semi-)automatically find relationships between concepts in different ontologies. This is done with a combination of syntactic, semantic and collaborative techniques, which together provide strong clues as to which concepts are most likely related. To demonstrate the applicability of Relco we explore five application scenarios in different domains for which data integration is a central aspect. The first is a cultural heritage portal, Explorer, for which data from several data sources was integrated and made available via a map view, a timeline and a graph view. Explorer also allows users to provide metadata for objects via a tagging mechanism. Another application is SenSee, an electronic TV guide and recommender: TV-guide data was integrated and enriched with semantically structured data from several sources, and recommendations are computed by exploiting the underlying semantic structure. ViTa was a project in which several techniques for tagging and searching educational videos were evaluated, including scenarios in which user tags are related to an ontology, or to other tags, using the Relco framework. The MobiLife project targeted the facilitation of a new generation of mobile applications that use context-based personalization. This can be done using a context-based user profiling platform that can also be used for user-model data exchange between mobile applications using technologies like Relco.
    The final application scenario comes from the GRAPPLE project, which targeted the integration of adaptive technology into current learning management systems. A large part of this integration is achieved by a user modeling component framework in which any application can store user model information, and which can also be used for the exchange of user model data.
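    The syntactic part of Relco's concept matching can be illustrated with a minimal sketch. The function names, the similarity measure (Python's difflib) and the threshold are assumptions for illustration; the actual framework combines syntactic, semantic and collaborative clues:

```python
from difflib import SequenceMatcher

def syntactic_similarity(a: str, b: str) -> float:
    """Similarity in [0, 1] between two normalized concept labels."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def match_concepts(source, target, threshold=0.7):
    """Propose (source, target, score) candidate mappings between two
    ontologies' concept labels whose similarity exceeds the threshold."""
    candidates = []
    for s in source:
        for t in target:
            score = syntactic_similarity(s, t)
            if score >= threshold:
                candidates.append((s, t, round(score, 2)))
    # strongest clues first; a human or further techniques would confirm
    return sorted(candidates, key=lambda c: -c[2])

# toy concept labels from two hypothetical cultural-heritage ontologies
matches = match_concepts(["Painting", "Sculptor"], ["painting", "sculpture"])
```

In a full pipeline, such syntactic candidates would be filtered and reinforced by semantic (ontology structure) and collaborative (user feedback) evidence.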

    I-centric User Interaction

    Get PDF
    The vision of I-centric Communications means taking an unrestricted look at human communication behavior and adapting the activities of communication systems to it. This vision defines a user-centered approach for the realization of services and applications. It requires first analyzing user demands in order to design suitable systems and services. Instead of just providing technology-focused solutions without any adaptation to individuals, an I-centric system should provide services that hide technical details and consider the individual's preferences as well as the individual's environment. Following the vision of I-centric Communications, this thesis introduces an approach to realize I-centric User Interaction. This approach enhances and completes the vision by providing advanced user interaction capabilities. It answers the question of whether it is possible to realize a communication system that supports the interaction between users and services without restriction to specific user interface technologies, and in a personalized as well as ambient-aware manner. Such enhanced user interaction will lead to higher acceptance and increased usage of services. On the one hand, the user interaction shall support different kinds of user interface technologies, enabling Device Independence and ubiquitous access to the services; according to their current context and intended action, users can select the preferred and suitable way of interaction. On the other hand, the user interaction shall be adapted to the user's preferences and to the user's environment.
Accordingly, this work discusses these different areas of concern, identifies necessary functions, and provides suitable solutions for each. First, the thesis introduces and analyses the vision of I-centric Communications with special regard to the aspect of user interaction. Based on the identified requirements and areas of concern, an approach to realize I-centric User Interaction was developed. The approach, presented in this thesis, specifies a Service Adaptation Framework and individual models for Personalization, for Ambient Awareness, and for Generic User Interaction focusing on the respective areas of concern. Finally, the thesis illustrates the results from the prototypical implementation of the presented approach, which has been pursued in several projects in parallel. These results demonstrate the applicability of the developed concepts and the fulfillment of the vision of I-centric User Interaction. The work in the area of I-centric Communications was carried out in cooperation of the Department for Open Communication Systems (OKS) at the Technical University Berlin (TUB) and the Fraunhofer Institute FOKUS. The vision and the reference model for I-centric Communications, introduced in this thesis, are results of this cooperation. The main research directions for the cooperation between TUB and FOKUS have been a general model for I-centric services, the service platform for I-centric services, and an approach for the interaction of users with I-centric services. This thesis focuses on an approach for I-centric User Interaction. The general aspects of I-centric services as defined by the vision are out of scope of this thesis. Nevertheless, these aspects have been analyzed by Stefan Arbanowski, researcher at Fraunhofer FOKUS, in a second PhD thesis in parallel. 
    The results of this work have been contributed to different national and international projects (BMBF LiveFutura, BMBF PI-AVIda, BMBF VHE-UD, IST WSI, IST WWRI), standardization bodies (OMG, WWRF), conference papers, and journals, introducing the vision of I-centric Communications to a larger audience and exploiting parts of the developed I-centric systems.

    B!SON: A Tool for Open Access Journal Recommendation

    Get PDF
    Finding a suitable open access journal in which to publish scientific work is a complex task: researchers have to navigate a constantly growing number of journals, institutional agreements with publishers, funders' conditions and the risk of predatory publishers. To help with these challenges, we introduce a web-based journal recommendation system called B!SON. It is developed based on a systematic requirements analysis, built on open data, gives publisher-independent recommendations and works across domains. It suggests open access journals based on the title, abstract and references provided by the user. The recommendation quality has been evaluated using a large test set of 10,000 articles. Development by two German scientific libraries ensures the longevity of the project.
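    The core idea of ranking journals by textual similarity to a manuscript can be sketched as follows. B!SON's actual pipeline is more elaborate (it also uses references and open data sources); this is only a hypothetical bag-of-words illustration with toy data:

```python
import math
from collections import Counter

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

def recommend(manuscript: str, journals: dict, top_k: int = 3):
    """Rank journals by similarity between the manuscript's title+abstract
    and a text profile of each journal's published articles."""
    query = Counter(manuscript.lower().split())
    scored = [(name, cosine(query, Counter(text.lower().split())))
              for name, text in journals.items()]
    return sorted(scored, key=lambda s: -s[1])[:top_k]

# hypothetical journal text profiles
journals = {
    "Journal of Machine Learning": "neural networks deep learning models training",
    "Journal of Botany": "plant species leaf growth photosynthesis",
}
ranking = recommend("deep learning for plant disease detection", journals)
```

A production system would replace the raw bag of words with TF-IDF or learned embeddings, but the ranking principle is the same.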

    Low-Resource Unsupervised NMT: Diagnosing the Problem and Providing a Linguistically Motivated Solution

    Get PDF
    Unsupervised Machine Translation has been advancing our ability to translate without parallel data, but state-of-the-art methods assume an abundance of monolingual data. This paper investigates the scenario where monolingual data is limited as well, finding that current unsupervised methods suffer in performance under this stricter setting. We find that the performance loss originates from the poor quality of the pretrained monolingual embeddings, and we propose using linguistic information in the embedding training scheme. To support this, we look at two linguistic features that may help improve alignment quality: dependency information and sub-word information. Using dependency-based embeddings results in a complementary word representation which offers a boost in performance of around 1.5 BLEU points compared to standard word2vec when monolingual data is limited to 1 million sentences per language. We also find that the inclusion of sub-word information is crucial to improving the quality of the embeddings.
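    The contrast between standard window-based contexts and dependency-based contexts can be sketched as follows. This is a toy illustration with a hand-annotated parse; in practice a dependency parser supplies the edges, and the (word, context) pairs feed an embedding trainer such as word2vec:

```python
def window_contexts(tokens, window=2):
    """Standard word2vec-style contexts: neighbors within a linear window."""
    contexts = []
    for i, w in enumerate(tokens):
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                contexts.append((w, tokens[j]))
    return contexts

def dependency_contexts(edges):
    """Dependency-based contexts: each word is paired with its syntactic
    neighbors, labeled with the dependency relation (inverse relations
    are marked so the direction of the edge is preserved)."""
    contexts = []
    for head, rel, dep in edges:
        contexts.append((head, f"{dep}/{rel}"))
        contexts.append((dep, f"{head}/{rel}-1"))  # inverse direction
    return contexts

# toy parse of "scientist discovers star" (hand-annotated edges)
tokens = ["scientist", "discovers", "star"]
edges = [("discovers", "nsubj", "scientist"),
         ("discovers", "dobj", "star")]
win_ctx = window_contexts(tokens)
dep_ctx = dependency_contexts(edges)
```

Because dependency contexts tie a word to its syntactic neighbors rather than to whatever happens to be nearby, the resulting embeddings capture complementary, more functional similarity, which is the property the paper exploits when monolingual data is scarce.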

    Proceedings of the Seventh Italian Conference on Computational Linguistics CLiC-it 2020

    Get PDF
    On behalf of the Program Committee, a very warm welcome to the Seventh Italian Conference on Computational Linguistics (CLiC-it 2020). This edition of the conference is held in Bologna and organised by the University of Bologna. The CLiC-it conference series is an initiative of the Italian Association for Computational Linguistics (AILC) which, after six years of activity, has clearly established itself as the premier national forum for research and development in the fields of Computational Linguistics and Natural Language Processing, where leading researchers and practitioners from academia and industry meet to share their research results, experiences, and challenges.