35 research outputs found
Confluence of Vision and Natural Language Processing for Cross-media Semantic Relations Extraction
In this dissertation, we focus on extracting and understanding semantically meaningful relationships between data items of various modalities, especially relations between images and natural language. We explore ideas and techniques to integrate such cross-media semantic relations for machine understanding of large heterogeneous datasets made available through the expansion of the World Wide Web. The datasets collected from social media websites, news media outlets and blogging platforms usually contain multiple modalities of data. Intelligent systems are needed to automatically make sense of these datasets and present them in such a way that humans can find the relevant pieces of information or get a summary of the available material. Such systems have to process multiple modalities of data such as images, text, linguistic features, and structured data in reference to each other. For example, image and video search and retrieval engines are required to understand the relations between visual and textual data so that they can provide relevant answers in the form of images and videos to the users' queries presented in the form of text. We emphasize the automatic extraction of semantic topics or concepts from the data available in any form such as images, free-flowing text or metadata. These semantic concepts/topics become the basis of semantic relations across heterogeneous data types, e.g., visual and textual data. A classic problem involving image-text relations is the automatic generation of textual descriptions of images. This problem is the main focus of our work. In many cases, a large amount of text is associated with images. Deep exploration of the linguistic features of such text is required to fully utilize the semantic information encoded in it. A news dataset involving images and news articles is an example of this scenario.
We devise frameworks for automatic news image description generation based on the semantic relations of images, as well as semantic understanding of the linguistic features of the news articles.
Extraction of Semantic Relations from Wikipedia Text Corpus
This paper proposes an algorithm for automatic extraction of semantic relations using a rule-based approach. The authors suggest identifying certain verbs (predicates) between the subject and object of an expression to obtain a sequence of semantic relations in the designed text corpus of Wikipedia articles. Synsets from WordNet are applied to extract semantic relations between concepts and their synonyms from the text corpus.
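The subject-verb-object rule can be illustrated with a toy sketch. The POS tag names and the pre-tagged input are assumptions, not the paper's actual pipeline, which additionally consults WordNet synsets to group synonymous concepts:

```python
# Toy sketch of the rule-based idea: scan POS-tagged sentences for a
# NOUN-VERB-NOUN pattern and emit (subject, predicate, object) triples.
def extract_triples(tagged_tokens):
    """tagged_tokens: list of (word, pos) pairs for one sentence."""
    triples = []
    for i in range(1, len(tagged_tokens) - 1):
        (subj, s_tag), (verb, v_tag), (obj, o_tag) = tagged_tokens[i - 1:i + 2]
        if s_tag == "NOUN" and v_tag == "VERB" and o_tag == "NOUN":
            triples.append((subj, verb, obj))
    return triples
```

A real system would run a tagger or dependency parser first and then filter predicates against a curated verb list.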
Fighting with the Sparsity of Synonymy Dictionaries
Graph-based synset induction methods, such as MaxMax and Watset, induce synsets by performing a global clustering of a synonymy graph. However, such methods are sensitive to the structure of the input synonymy graph: sparseness of the input dictionary can substantially reduce the quality of the extracted synsets. In this paper, we propose two different approaches designed to alleviate the incompleteness of the input dictionaries. The first one performs a pre-processing of the graph by adding missing edges, while the second one performs a post-processing by merging similar synset clusters. We evaluate these approaches on two datasets for the Russian language and discuss their impact on the performance of synset induction methods. Finally, we perform an extensive error analysis of each approach and discuss prominent alternative methods for coping with the problem of the sparsity of the synonymy dictionaries.
In Proceedings of the 6th Conference on Analysis of Images, Social Networks, and Texts (AIST'2017): Springer Lecture Notes in Computer Science (LNCS)
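The second (post-processing) approach can be sketched as greedily merging synset clusters whose word overlap is high. The Jaccard measure, the threshold, and the single-pass greedy strategy are illustrative assumptions, not the paper's exact procedure:

```python
# Minimal sketch of post-processing by merging similar synset clusters:
# absorb each incoming cluster into the first existing cluster whose
# Jaccard overlap meets the threshold, otherwise keep it separate.
def jaccard(a, b):
    return len(a & b) / len(a | b)

def merge_similar_synsets(synsets, threshold=0.3):
    merged = []
    for synset in synsets:
        current = set(synset)
        for existing in merged:
            if jaccard(current, existing) >= threshold:
                existing |= current  # merge the similar cluster
                break
        else:
            merged.append(current)
    return merged
```

In a sparse dictionary, near-duplicate clusters like {"car", "auto"} and {"auto", "automobile"} arise from missing synonymy edges; merging them recovers the intended synset.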
A Linked Open Data Approach for Sentiment Lexicon Adaptation
Social media platforms have recently become a gold mine for organisations to monitor their reputation by extracting and analysing the sentiment of the posts generated about them, their markets, and competitors. Among the approaches to analyse sentiment from social media, approaches based on sentiment lexicons (sets of words with associated sentiment scores) have gained popularity since they do not rely on training data, as opposed to Machine Learning approaches. However, sentiment lexicons consider a static sentiment score for each word without taking into consideration the different contexts in which the word is used (e.g., "great problem" vs. "great smile"). Additionally, new words constantly emerge from dynamic and rapidly changing social media environments that may not be covered by the lexicons. In this paper we propose a lexicon adaptation approach that makes use of semantic relations extracted from DBpedia to better understand the various contextual scenarios in which words are used. We evaluate our approach on three different Twitter datasets and show that using semantic information to adapt the lexicon improves sentiment computation by 3.7% in average accuracy, and by 2.6% in average F1 measure.
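The context problem the abstract raises ("great problem" vs. "great smile") can be sketched as blending a word's prior lexicon score with the scores of its neighbours. The blending weight and plain-average rule are illustrative assumptions; the paper derives the contextual signal from DBpedia semantic relations:

```python
# Simplified sketch of context-driven score adaptation: interpolate a
# word's static lexicon score with the mean score of its in-context
# neighbours that are also in the lexicon.
def adapt_score(lexicon, word, context_words, alpha=0.5):
    context = [lexicon[w] for w in context_words if w in lexicon]
    prior = lexicon.get(word, 0.0)
    if not context:
        return prior
    return (1 - alpha) * prior + alpha * sum(context) / len(context)
```

With a toy lexicon {"great": 0.8, "problem": -0.6, "smile": 0.6}, the score of "great" is pulled down next to "problem" and stays positive next to "smile", which is exactly the static-score failure the paper targets.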
Query expansion using medical information extraction for improving information retrieval in French medical domain
Many users' queries contain references to named entities, and this is particularly true in the medical field. Doctors express their information needs using medical entities, as these are elements rich with information that helps to better target the relevant documents. At the same time, many resources have been recognized as large containers of medical entities and the relationships between them, such as clinical reports, which are medical texts written by doctors. In this paper, we present a query expansion method that uses medical entities and their semantic relations in the query context, based on an external resource in OWL. The goal of this method is to evaluate the effectiveness of an information retrieval system in supporting doctors to easily access relevant information. Experiments on a collection of real clinical reports show that our approach yields interesting improvements in precision, recall and MAP in medical information retrieval.
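The expansion step can be sketched as appending the terms that an external knowledge resource links to each query entity. The relation map stands in for the OWL resource, and the entity names are hypothetical:

```python
# Bare-bones sketch of entity-based query expansion: for every query term
# that is a known medical entity, append its semantically related terms,
# preserving order and avoiding duplicates.
def expand_query(query_terms, relations):
    expanded = list(query_terms)
    for term in query_terms:
        for related in relations.get(term, ()):
            if related not in expanded:
                expanded.append(related)
    return expanded
```

The expanded term list is then handed to the retrieval engine in place of the original query, which is what lets documents that mention only related entities still match.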
An Architecture for Data and Knowledge Acquisition for the Semantic Web: the AGROVOC Use Case
We are surrounded by ever-growing volumes of unstructured and weakly-structured information, and for a human being, domain expert or not, it is nearly impossible to read, understand and categorize such information in a fair amount of time. Moreover, different user categories have different expectations: final users need easy-to-use tools and services for specific tasks, knowledge engineers require robust tools for knowledge acquisition, knowledge categorization and semantic resource development, while developers of semantic applications demand flexible frameworks for fast, easy and standardized development of complex applications. This work is an experience report on the use of the CODA framework for rapid prototyping and deployment of knowledge acquisition systems for RDF. The system integrates independent NLP tools and custom libraries complying with UIMA standards. For our experiment, a document set was processed to populate the AGROVOC thesaurus with two new relationships.
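The final population step, turning acquired concept relationships into RDF, can be sketched as serialising triples in N-Triples form. The URIs and the relation name below are hypothetical stand-ins, not actual AGROVOC identifiers:

```python
# Tiny sketch of emitting an acquired relationship as an N-Triples line,
# the line-oriented RDF serialisation a triple store can ingest directly.
def to_ntriple(subject_uri, predicate_uri, object_uri):
    return f"<{subject_uri}> <{predicate_uri}> <{object_uri}> ."
```

In practice the pipeline described in the abstract would produce many such statements per processed document and load them into the thesaurus's RDF store in bulk.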
Sentiment Lexicon Adaptation with Context and Semantics for the Social Web
Sentiment analysis over social streams offers governments and organisations a fast and effective way to monitor the public's feelings towards policies, brands, business, etc. General-purpose sentiment lexicons have been used to compute sentiment from social streams, since they are simple and effective. They calculate the overall sentiment of texts by using a general collection of words with predetermined sentiment orientation and strength. However, a word's sentiment often varies with the contexts in which it appears, and new words might be encountered that are not covered by the lexicon, particularly in social media environments where content emerges and changes rapidly and constantly. In this paper, we propose a lexicon adaptation approach that uses contextual as well as semantic information extracted from DBpedia to update the words' weighted sentiment orientations and to add new words to the lexicon. We evaluate our approach on three different Twitter datasets, and show that enriching the lexicon with contextual and semantic information improves sentiment computation by 3.4% in average accuracy, and by 2.8% in average F1 measure.
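The second problem this abstract raises, words missing from the lexicon entirely, can be sketched as assigning a new word a score inherited from semantically related words that are already covered. The averaging rule is an illustrative assumption; the paper derives the related words from DBpedia:

```python
# Minimal sketch of the lexicon-extension step: give an out-of-lexicon
# word the mean score of its semantically related, already-scored words.
# Existing entries are left untouched.
def add_new_word(lexicon, word, related_words):
    known = [lexicon[w] for w in related_words if w in lexicon]
    if word not in lexicon and known:
        lexicon[word] = sum(known) / len(known)
    return lexicon
```

This is the kind of rule that lets a lexicon keep pace with fast-moving social media vocabulary without retraining anything.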
Manual Evaluation of Results of Automatic Hyper-, Hyponymic Verbal Pairs Extraction
The aim of the study is manual verification of the results of automatic extraction of hyper-, hyponymic pairs from dictionary definitions, and assessment of the level of expert agreement on the truth of the established hyponyms and hypernyms. Typical disagreements between experts were identified and their causes analyzed. The expert assessment of the truth of hyper-, hyponymic pairs showed a high level of agreement, which motivates further development of methods for semantic relation extraction based on dictionary definitions.
The reported study was funded by RFBR according to research project No. 18-302-00129.
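One common way to quantify the kind of expert agreement this study assesses is Cohen's kappa over two annotators' true/false judgements of the extracted pairs. The study does not specify its agreement metric, so kappa here is an assumption:

```python
# Short sketch of Cohen's kappa for two annotators: observed agreement
# corrected for the agreement expected by chance from each annotator's
# label distribution.
def cohen_kappa(ann_a, ann_b):
    n = len(ann_a)
    observed = sum(a == b for a, b in zip(ann_a, ann_b)) / n
    labels = set(ann_a) | set(ann_b)
    expected = sum((ann_a.count(l) / n) * (ann_b.count(l) / n) for l in labels)
    if expected == 1.0:  # both annotators constant and identical
        return 1.0
    return (observed - expected) / (1 - expected)
```

Values near 1 indicate agreement well beyond chance; values near 0 mean the experts agree no more often than random labelling would, which is why raw percent agreement alone can overstate consensus.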