13 research outputs found

    Semantic Parsing for Question Answering over Knowledge Graphs

    In this paper, we introduce a novel method with graph-to-segment mapping for question answering over knowledge graphs, which helps in understanding question utterances. The method centers on semantic parsing, a key approach for interpreting such utterances. The challenges lie in comprehending implicit entities, relationships, and complex constraints such as time, ordinality, and aggregation within questions, contextualized by the knowledge graph. Our framework combines rule-based and neural techniques to parse and construct highly accurate and comprehensive semantic segment sequences. These sequences form semantic query graphs that represent the question utterances. We treat question semantic parsing as a sequence generation task, using an encoder-decoder neural network to transform natural language questions into semantic segments. Moreover, to improve the parsing of implicit entities and relations, we incorporate a graph neural network that leverages the context of the knowledge graph to enrich question representations. Experimental evaluations on two datasets demonstrate the effectiveness and superior performance of our model in semantic parsing for question answering. Comment: arXiv admin note: text overlap with arXiv:2401.0296
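The abstract describes decoding a question into a sequence of semantic segments that together form a semantic query graph. The sketch below illustrates only that final assembly step, under assumptions of our own: each segment is a (head, relation, tail) edge, terms beginning with `?` are variables, and the graph is serialized as a SPARQL SELECT query. The `Segment` format, the `COUNT` aggregation hook, and the DBpedia-style identifiers in the usage are hypothetical; neither the encoder-decoder nor the graph neural network is modeled.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set

@dataclass(frozen=True)
class Segment:
    """Hypothetical semantic segment: one (head, relation, tail) edge.
    Terms starting with '?' are variables; others are KG identifiers."""
    head: str
    relation: str
    tail: str

@dataclass
class QueryGraph:
    edges: List[Segment] = field(default_factory=list)
    answer_var: str = "?x"
    aggregate: Optional[str] = None  # e.g. "COUNT"; None means a plain SELECT

    def nodes(self) -> Set[str]:
        return {t for e in self.edges for t in (e.head, e.tail)}

    def to_sparql(self) -> str:
        target = self.answer_var
        if self.aggregate:
            target = f"({self.aggregate}(DISTINCT {self.answer_var}) AS ?agg)"
        patterns = " ".join(f"{e.head} {e.relation} {e.tail} ." for e in self.edges)
        return f"SELECT {target} WHERE {{ {patterns} }}"

def segments_to_graph(segments: List[Segment], answer_var: str = "?x",
                      aggregate: Optional[str] = None) -> QueryGraph:
    """Assemble a decoded segment sequence into a query graph, dropping repeated segments."""
    graph = QueryGraph(answer_var=answer_var, aggregate=aggregate)
    for seg in segments:
        if seg not in graph.edges:
            graph.edges.append(seg)
    return graph
```

For example, the segments `?x dbo:director dbr:Christopher_Nolan` and `?x rdf:type dbo:Film`, assembled with `aggregate="COUNT"`, yield a query that counts the matching films. Keeping segments as plain edges makes the graph structure explicit, so constraints such as aggregation are added at serialization time rather than inside each segment.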

    Improvements to GeoQA, a Question Answering system for Geospatial Questions

    We study GeoQA, a recently proposed question answering system and the first template-based question answering system over linked geospatial data. We improve the system by exploiting the schema information of the knowledge bases it uses, by adding templates for more complex queries, and by improving the natural language processing module so that it recognizes the corresponding patterns. The work is also an attempt to collect, study, and compare other question answering systems, such as QUINT, TEMPO, and NEQA, as well as the Qanary methodology and the Frankenstein framework for building question answering systems.
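Template-based question answering, as described above, matches a question against a fixed set of patterns and fills the matching query skeleton. The following is a minimal sketch under stated assumptions, not GeoQA's actual templates: the two regular-expression templates, the `CLASS_LEXICON` and `PLACE_LEXICON` lookups, and the `osm:`/`gadm:` identifiers are all hypothetical, and prefix declarations are omitted. Only `geo:hasGeometry`, `geo:asWKT`, and `geof:sfWithin` are standard GeoSPARQL vocabulary.

```python
import re
from typing import Optional

# Hypothetical lexicons; a real system would resolve phrases against the knowledge graph.
CLASS_LEXICON = {"parks": "osm:Park", "schools": "osm:School"}
PLACE_LEXICON = {"dublin": "gadm:Dublin", "athens": "gadm:Athens"}

# Each hypothetical template pairs a question pattern with the SELECT clause it produces.
TEMPLATES = [
    (re.compile(r"^how many (?P<cls>\w+) are (?:there )?in (?P<place>\w+)\??$", re.I),
     "SELECT (COUNT(DISTINCT ?x) AS ?n)"),
    (re.compile(r"^which (?P<cls>\w+) are in (?P<place>\w+)\??$", re.I),
     "SELECT DISTINCT ?x"),
]

def within_pattern(cls_iri: str, place_iri: str) -> str:
    """GeoSPARQL graph pattern: instances of cls_iri whose geometry lies within place_iri."""
    return (f"?x a {cls_iri} ; geo:hasGeometry/geo:asWKT ?g1 . "
            f"{place_iri} geo:hasGeometry/geo:asWKT ?g2 . "
            f"FILTER(geof:sfWithin(?g1, ?g2))")

def question_to_query(question: str) -> Optional[str]:
    """Fill the first template whose pattern and lexicon lookups both succeed."""
    for pattern, select in TEMPLATES:
        m = pattern.match(question.strip())
        if not m:
            continue
        cls_iri = CLASS_LEXICON.get(m["cls"].lower())
        place_iri = PLACE_LEXICON.get(m["place"].lower())
        if cls_iri and place_iri:
            return f"{select} WHERE {{ {within_pattern(cls_iri, place_iri)} }}"
    return None
```

Adding a template for a more complex question type means adding one more pattern/skeleton pair, which is why extending the template set and the schema-aware lookups are natural points of improvement in such a system.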

    Geospatial Question Answering on the YAGO2geo Knowledge Graph

    In recent years there have been many attempts to develop systems that can process natural language questions and return meaningful answers, making information available to everyone and not only to people who can write database queries. Such systems can be designed for different types of questions, ranging from facts about historical figures to questions about scientific problems. In this thesis, we work with geospatial questions. We take an existing geospatial natural language question answering system (GeoQA), which currently uses the DBpedia, GADM (Database of Global Administrative Areas), and OSM (OpenStreetMap) knowledge graphs, and modify it to use the YAGO2geo knowledge graph, which has been extended with OpenStreetMap, Ordnance Survey, and GADM data.
The purpose of this change is to achieve more accurate results by using the geospatial information in OpenStreetMap and Ordnance Survey and the very large number of classes included in the YAGO2 knowledge graph.

    A grammar for interpreting geo-analytical questions as concept transformations

    Geographic Question Answering (GeoQA) systems can automatically answer questions phrased in natural language. This may enable data analysts to make use of geographic information without requiring GIS skills. However, going beyond the retrieval of existing geographic facts about particular places remains a challenge: current systems usually cannot handle geo-analytical questions that require GIS analysis procedures to arrive at answers. To enable geo-analytical QA, GeoQA systems need to interpret questions in terms of a transformation that can be implemented in a GIS workflow. To this end, we propose a novel approach to question parsing that interprets questions in terms of core concepts of spatial information and their functional roles in a context-free grammar. The core concepts help model spatial information in questions independently of implementation formats, and their functional roles indicate how concepts are transformed and used in a workflow. Using our parser, geo-analytical questions can be converted into expressions of concept transformations corresponding to abstract GIS workflows. We developed our approach on a corpus of 309 GIS-related questions and tested it on an independent set of 134 test questions that include workflows. The evaluation results show high precision and recall on a gold standard of concept transformations.
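To make the idea of parsing questions with a context-free grammar over concept-tagged tokens concrete, here is a minimal recursive-descent sketch. It is not the authors' grammar: the `LEXICON`, the tag set (a few concept types such as `field`, `object`, and `network`, plus functional markers), the default operation name `select`, and the output notation are hypothetical simplifications chosen for illustration.

```python
from typing import List, Tuple

# Hypothetical lexicon: chunked phrases -> concept types or functional markers.
LEXICON = {
    "what": "WH", "which": "WH",
    "average": "MEASURE", "total": "MEASURE",
    "noise level": "field", "elevation": "field",
    "neighborhood": "object", "building": "object", "amsterdam": "object",
    "road network": "network",
    "for each": "GROUP", "in": "EXTENT", "within": "EXTENT",
}

def tokenize(question: str) -> List[Tuple[str, str]]:
    """Greedy longest-match chunking of a question into (phrase, tag) pairs."""
    words = question.lower().rstrip("?").split()
    tokens, i = [], 0
    while i < len(words):
        for span in (2, 1):  # try two-word phrases before single words
            phrase = " ".join(words[i:i + span])
            if i + span <= len(words) and phrase in LEXICON:
                tokens.append((phrase, LEXICON[phrase]))
                i += span
                break
        else:
            i += 1  # skip words the toy grammar ignores ("is", "the", ...)
    return tokens

class Parser:
    """Recursive descent for the toy grammar:
         Q       -> WH [MEASURE] CONCEPT [GROUP CONCEPT] [EXTENT CONCEPT]
         CONCEPT -> field | object | network
    """
    CONCEPTS = {"field", "object", "network"}

    def __init__(self, tokens: List[Tuple[str, str]]):
        self.tokens, self.pos = tokens, 0

    def peek(self):
        return self.tokens[self.pos][1] if self.pos < len(self.tokens) else None

    def expect(self, tags):
        if self.peek() not in tags:
            raise SyntaxError(f"expected {sorted(tags)} at {self.pos}, got {self.peek()}")
        word, tag = self.tokens[self.pos]
        self.pos += 1
        return word, tag

    def concept(self) -> str:
        word, tag = self.expect(self.CONCEPTS)
        return f"{tag}:{word}"

    def question(self) -> str:
        self.expect({"WH"})
        op = self.expect({"MEASURE"})[0] if self.peek() == "MEASURE" else "select"
        args = [self.concept()]
        for marker, key in (("GROUP", "by"), ("EXTENT", "extent")):
            if self.peek() == marker:
                self.pos += 1
                args.append(f"{key}={self.concept()}")
        if self.peek() is not None:
            raise SyntaxError(f"unexpected token {self.tokens[self.pos]}")
        return f"{op}({', '.join(args)})"

def parse(question: str) -> str:
    return Parser(tokenize(question)).question()
```

Under these assumptions, "What is the average noise level for each neighborhood in Amsterdam?" parses to `average(field:noise level, by=object:neighborhood, extent=object:amsterdam)`, an expression that names which concept is aggregated, over which objects, and within which extent, independent of how a GIS would store them.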

    Geospatial Question Answering Web Application

    Question answering over knowledge graphs has been studied extensively in recent years. A central aspect of such systems is an interface through which natural language questions can be posed and answered. These systems generate queries and retrieve data from knowledge bases, usually in the form of URIs. It is therefore important to present this information appropriately so that any user can make sense of the answers. We have developed an interface for GeoQA, a question answering engine over linked geospatial data.
The interface was designed with all types of users in mind. Using it, a common user can pose a question in natural language and get the answer without knowing anything about the underlying infrastructure. An expert user, on the other hand, can analyze the QA engine and inspect the output of each of its modules. In addition, users can select the datasets over which they want to run the QA engine, as well as different components to complete different tasks. In this way, the interface makes the geospatial question answering engine GeoQA accessible to all users, from the inquisitive scientist to the common layman.
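The interface behavior described above (a plain answer for common users, per-module output for experts, and user-selected datasets and components) can be sketched as a pipeline of interchangeable modules. This is a hypothetical illustration, not GeoQA's code: the module names `recognize`, `query`, and `render`, the toy `DATASETS`, and the `example.org` URIs are invented, and the `render` step shows one simple way to turn URIs into readable labels.

```python
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical toy datasets the user can choose between: place name -> answer URIs.
DATASETS: Dict[str, Dict[str, List[str]]] = {
    "toy-osm": {"Athens": ["http://example.org/osm/Syntagma_Square"]},
    "toy-gadm": {"Athens": ["http://example.org/gadm/Attica"]},
}

def uri_to_label(uri: str) -> str:
    """Make a URI readable: keep the last path/fragment segment, underscores -> spaces."""
    tail = uri.rstrip("/").rsplit("/", 1)[-1].rsplit("#", 1)[-1]
    return tail.replace("_", " ")

def recognize_place(state: dict) -> dict:
    places = DATASETS[state["dataset"]]
    state["place"] = next((p for p in places if p in state["question"]), None)
    return state

def run_query(state: dict) -> dict:
    state["uris"] = DATASETS[state["dataset"]].get(state["place"], [])
    return state

def render_answer(state: dict) -> dict:
    state["answer"] = [uri_to_label(u) for u in state["uris"]]
    return state

MODULES: Dict[str, Callable[[dict], dict]] = {
    "recognize": recognize_place, "query": run_query, "render": render_answer,
}

def ask(question: str, dataset: str,
        components: Sequence[str] = ("recognize", "query", "render"),
        expert: bool = False):
    """Run the selected components in order; expert mode also returns each module's output."""
    state = {"question": question, "dataset": dataset}
    trace: List[Tuple[str, dict]] = []
    for name in components:
        state = MODULES[name](dict(state))  # copy, so earlier trace entries stay unchanged
        trace.append((name, state))
    answer = state.get("answer", state.get("uris"))
    return (answer, trace) if expert else answer
```

A common user would call `ask("Which squares are in Athens?", "toy-osm")` and see `["Syntagma Square"]`, while `expert=True` additionally returns a snapshot of the state after every module. Passing modules as a named registry is what lets the interface offer component selection without changing the pipeline code.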

    RS5M: A Large Scale Vision-Language Dataset for Remote Sensing Vision-Language Foundation Model

    Pre-trained vision-language foundation models trained on extensive image-text paired data have demonstrated unprecedented image-text association capabilities, achieving remarkable results across various downstream tasks. A critical challenge is how to use existing large-scale pre-trained VLMs, which are trained on common objects, to perform domain-specific transfer for domain-related downstream tasks. In this paper, we propose a new framework that includes a Domain Foundation Model (DFM), bridging the gap between the General Foundation Model (GFM) and domain-specific downstream tasks. Moreover, we present RS5M, an image-text paired dataset in the field of remote sensing (RS) that contains 5 million RS images with English descriptions. The dataset was obtained by filtering publicly available image-text paired datasets and by captioning label-only RS datasets with a pre-trained VLM. Together, these constitute the first large-scale RS image-text paired dataset. Additionally, we tried several Parameter-Efficient Fine-Tuning methods on RS5M to implement the DFM. Experimental results show that the proposed dataset is highly effective for various tasks, improving upon the baseline by 8% to 16% in zero-shot classification tasks and obtaining good results in both vision-language retrieval and semantic localization tasks. The dataset is available at https://github.com/om-ai-lab/RS5M. Comment: RS5M dataset v
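Zero-shot classification with a vision-language model is commonly done CLIP-style: each class name is turned into a text prompt, both image and prompts are embedded, and the class whose prompt embedding is most similar to the image embedding wins. The sketch below assumes that protocol; the abstract does not specify the exact evaluation setup. The embeddings are hard-coded toy vectors standing in for encoder outputs, and the temperature value is illustrative.

```python
import math
from typing import Dict, List, Tuple

def l2_normalize(v: List[float]) -> List[float]:
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def zero_shot_classify(image_emb: List[float],
                       prompt_embs: Dict[str, List[float]],
                       temperature: float = 0.01) -> Tuple[str, Dict[str, float]]:
    """CLIP-style zero-shot classification: cosine similarity between the image
    embedding and one text-prompt embedding per class, then a softmax."""
    img = l2_normalize(image_emb)
    logits = {
        label: sum(a * b for a, b in zip(img, l2_normalize(t))) / temperature
        for label, t in prompt_embs.items()
    }
    top = max(logits.values())  # subtract the max for a numerically stable softmax
    exps = {label: math.exp(z - top) for label, z in logits.items()}
    total = sum(exps.values())
    probs = {label: e / total for label, e in exps.items()}
    return max(probs, key=probs.get), probs
```

In a real setup the `prompt_embs` would come from the text encoder applied to prompts such as "a satellite image of an airport" (a hypothetical template), so no labeled RS training images are needed at classification time.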

    Development of Machine Learning Based Geographic Analysis Workflow Transduction Technique for Geographic Questions with Various Sentence Type

    Master's thesis, Seoul National University Graduate School, College of Engineering, Department of Civil and Environmental Engineering, February 2023. Advisor: 유기윤.
    Despite advances in question answering (QA), which derives succinct and clear answers to questions from documents, there is still a lack of systems that answer questions related to geospatial information, whose volume grows by around 20% annually. The research field that emerged to address this problem is called geographic QA. Geo-analytical QA, a subfield of geographic QA, studies how to convert geographic questions into geospatial analysis workflows and how to find suitable tools and data to perform those workflows. To perform realistic geo-analytical QA, questions with various sentence types must be converted into geospatial analysis workflows. However, this is difficult with the method proposed in previous research, because it is a rule-based approach that fits only a limited set of sentence types. Therefore, this study proposes a method to convert geospatial questions with various sentence types into geospatial analysis workflows. In addition, because understanding geospatial operators is important for performing geospatial analysis, the derived workflows were designed to include the geospatial operators in order, according to the analysis intention. In this study, sentence classification techniques were applied to convert geospatial questions into analysis workflows. Using sentence classification requires selecting a corpus, labeling the corpus to create a dataset, embedding the questions in the corpus to turn the dataset into input for classification models, and training the classification models. The GeoAnQu corpus, which is known to require a wide variety of geospatial analysis workflows, was selected as the target corpus and analyzed to derive its own analysis workflows, and a unique number was then assigned to each workflow.
Based on these numbers, the questions in the GeoAnQu corpus were labeled to build a dataset, and paraphrasing was performed to generate various sentence types and increase the data size. Sentence embeddings were then computed using GloVe (Global Vectors), BERT (Bidirectional Encoder Representations from Transformers), RoBERTa (Robustly Optimized BERT Pretraining Approach), and SBERT (Sentence-BERT), and each set of embeddings was used to train a random forest and a linear support vector machine (SVM). The model trained on SBERT sentence embeddings with a linear SVM showed the highest performance, and it was able to convert geospatial questions with various sentence types into geospatial analysis workflows. Finally, the limitations of the results were analyzed and directions for future research were presented.
Contents:
1. Introduction
   1.1 Research background and objectives
   1.2 Related work
      1.2.1 GeoKBQA
      1.2.2 Geo-analytical QA
      1.2.3 Geographic question corpora
      1.2.4 Classification systems for geospatial operations
      1.2.5 Implications and summary
   1.3 Research scope and methods
2. Methodology
   2.1 Dataset construction
      2.1.1 Selecting a geographic question corpus and deriving geospatial analysis workflows
      2.1.2 Corpus labeling
      2.1.3 Paraphrasing
   2.2 Sentence embedding language models
      2.2.1 GloVe
      2.2.2 BERT
      2.2.3 RoBERTa
      2.2.4 SBERT
   2.3 Classification model training
      2.3.1 SVM
      2.3.2 Random forest
   2.4 Evaluation methods
      2.4.1 Comparison with the algorithm of previous research
      2.4.2 Evaluation metrics
3. Experiments and analysis of results
   3.1 Experimental environment
   3.2 Dataset construction results
      3.2.1 Deriving geospatial analysis workflows
      3.2.2 Corpus labeling and paraphrasing
   3.3 Model configuration and training
      3.3.1 Sentence embedding
      3.3.2 Classification model training
   3.4 Analysis of experimental results
      3.4.1 Results of applying the algorithm of previous research
      3.4.2 Model performance comparison
4. Conclusion
References
Abstract
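The thesis frames workflow derivation as sentence classification: each question is embedded and a linear SVM predicts the number of its analysis workflow. The sketch below shows that setup with two plainly named substitutions, because SBERT needs downloaded model weights: a binary bag-of-words with a small stop-word list stands in for the SBERT embedding, and the linear SVM is trained one-vs-rest with Pegasos-style subgradient descent on the hinge loss (no bias term) instead of a library solver. The toy questions and workflow numbers are invented; in practice one might use the `sentence-transformers` package with scikit-learn's `LinearSVC`.

```python
from collections import defaultdict
from typing import Dict, List

# Tiny stop-word list (assumption): removes question words shared by every class.
STOPWORDS = {"how", "many", "what", "which", "is", "are", "the", "of", "in",
             "each", "per", "from", "to", "between", "and"}

def featurize(question: str) -> Dict[str, float]:
    """Binary bag-of-words; a stand-in for the SBERT sentence embedding."""
    words = question.lower().rstrip("?").split()
    return {w: 1.0 for w in words if w not in STOPWORDS}

def dot(w: Dict[str, float], x: Dict[str, float]) -> float:
    return sum(w.get(k, 0.0) * v for k, v in x.items())

def train_binary_svm(X, y, lam=0.01, epochs=20) -> Dict[str, float]:
    """Linear SVM via Pegasos-style subgradient steps on the L2-regularized
    hinge loss (no bias term). Samples are visited in a fixed order, so
    training is deterministic."""
    w: Dict[str, float] = defaultdict(float)
    t = 0
    for _ in range(epochs):
        for x, label in zip(X, y):  # label is +1 or -1
            t += 1
            eta = 1.0 / (lam * t)
            violated = label * dot(w, x) < 1.0  # is the hinge loss active?
            for k in w:
                w[k] *= 1.0 - eta * lam  # shrink: gradient of the regularizer
            if violated:
                for k, v in x.items():
                    w[k] += eta * label * v
    return dict(w)

def train_one_vs_rest(questions: List[str], labels: List[int]) -> Dict[int, Dict[str, float]]:
    X = [featurize(q) for q in questions]
    return {c: train_binary_svm(X, [1 if lab == c else -1 for lab in labels])
            for c in sorted(set(labels))}

def predict(models: Dict[int, Dict[str, float]], question: str) -> int:
    x = featurize(question)
    return max(models, key=lambda c: dot(models[c], x))

# Hypothetical toy data: questions labeled with invented workflow numbers.
QUESTIONS = [
    "How many schools are within 500 meters of the river?",
    "How many hospitals lie within 1 km of the highway?",
    "What is the average temperature in each district?",
    "What is the mean elevation per municipality?",
    "What is the shortest route from the station to the airport?",
    "Which is the fastest path between the port and downtown?",
]
WORKFLOW_IDS = [1, 1, 2, 2, 3, 3]
models = train_one_vs_rest(QUESTIONS, WORKFLOW_IDS)
```

With these models, `predict(models, "What is the average elevation in each municipality?")` returns workflow 2. Because each workflow number is a class label, the classifier's output directly selects an ordered sequence of geospatial operators, which is the property the thesis relies on.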

    Spatial and Temporal information in the Semantic Web

    The main purpose of this thesis is to enhance the Semantic Web with geospatial and temporal information by extending the YAGO knowledge graph with such information. The thesis consists of three parts. The first part concerns the conversion of OpenStreetMap data into RDF triples. OpenStreetMap is a collaborative project, developed by a community of volunteers, that produces a freely editable map of the whole world. Its data are useful, and often a prerequisite, for many applications, so making them available as RDF triples is important. The second part concerns the conversion of big geospatial data into RDF triples. Here, an ETL utility is extended to run on top of Spark, which parallelizes the conversion and significantly reduces execution time.
The third part concerns extending the YAGO knowledge base with temporal and geospatial information about the former administrative divisions of Greece.
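Converting OpenStreetMap data into RDF and parallelizing that conversion, as described above, can be sketched as a per-record mapping applied independently to data partitions. This is a minimal illustration under stated assumptions, not the thesis's ETL tool: the `example.org` namespace and the tag-to-class rule are hypothetical, the RDF, RDFS, and GeoSPARQL IRIs are the standard vocabularies, and a thread pool merely illustrates the partition-and-map structure that Spark would distribute across executors (for example via `rdd.flatMap(node_to_triples)`).

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import chain
from typing import Dict, List

BASE = "http://example.org/osm/"  # hypothetical namespace for converted OSM data
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"
GEO = "http://www.opengis.net/ont/geosparql#"

def literal(text: str) -> str:
    """Quote a string as an N-Triples literal, escaping backslashes, quotes and newlines."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return f'"{escaped}"'

def node_to_triples(node: Dict) -> List[str]:
    """Map one OSM node (id, lat, lon, tags) to N-Triples lines."""
    subj = f"<{BASE}node/{node['id']}>"
    geom = f"<{BASE}node/{node['id']}/geometry>"
    tags = node.get("tags", {})
    # Hypothetical class rule: use the 'amenity' or 'place' tag value if present.
    cls = next((tags[k] for k in ("amenity", "place") if k in tags), "feature")
    triples = [
        f"{subj} <{RDF_TYPE}> <{BASE}ontology/{cls}> .",
        f"{subj} <{GEO}hasGeometry> {geom} .",
        # WKT points are written in longitude-latitude order.
        f'{geom} <{GEO}asWKT> "POINT({node["lon"]} {node["lat"]})"^^<{GEO}wktLiteral> .',
    ]
    if "name" in tags:
        triples.append(f"{subj} <{RDFS_LABEL}> {literal(tags['name'])} .")
    return triples

def convert_partition(nodes: List[Dict]) -> List[str]:
    return [t for n in nodes for t in node_to_triples(n)]

def convert(nodes: List[Dict], partitions: int = 4) -> List[str]:
    """Split nodes round-robin into partitions and convert each one independently."""
    chunks = [nodes[i::partitions] for i in range(partitions)]
    with ThreadPoolExecutor(max_workers=partitions) as pool:
        return list(chain.from_iterable(pool.map(convert_partition, chunks)))
```

Because each node is converted without looking at any other node, the work splits cleanly into partitions; this independence is what makes a Spark-based version scale, while the output order across partitions is not meaningful for an RDF graph.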