A Hybrid Question Answering System based on Ontology and Topic Modeling
A Question Answering (QA) system is an application that provides accurate answers in response to natural language questions. However, QA systems built on a knowledge-based approach have a notable weakness: they require pre-defining numerous triple patterns in order to handle different question types. The ultimate goal of this paper is to propose an automated QA system using a hybrid approach, a combination of the knowledge-based and text-based approaches. Our approach requires only two SPARQL queries to retrieve candidate answers from the ontology, without defining any question patterns, and then uses a topic model to select the most related candidates as the answers. We also investigate and evaluate different language models (unigram and bigram). Our results show that the proposed QA system outperforms the random baseline and solves up to 44 out of 80 questions with a Mean Reciprocal Rank (MRR) of 38.73% using bigram LDA.
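The retrieve-then-rank pipeline this abstract describes can be sketched as follows. This is a simplified illustration only: the bigram-overlap scorer below stands in for the paper's actual LDA topic model, and the example questions and candidates are hypothetical.

```python
from collections import Counter

def bigrams(text):
    # Count word bigrams in a lowercased text.
    toks = text.lower().split()
    return Counter(zip(toks, toks[1:]))

def rank_candidates(question, candidates):
    # Score each candidate answer by bigram overlap with the question
    # (a stand-in for the paper's bigram-LDA topic similarity).
    q = bigrams(question)
    scored = [(sum((q & bigrams(c)).values()), c) for c in candidates]
    return [c for _, c in sorted(scored, key=lambda x: -x[0])]

def mrr(ranked_lists, gold):
    # Mean Reciprocal Rank: average of 1/rank of the first correct answer.
    total = 0.0
    for ranked, g in zip(ranked_lists, gold):
        for i, c in enumerate(ranked, 1):
            if c == g:
                total += 1.0 / i
                break
    return total / len(gold)
```

The MRR definition here is standard; a system that always ranks the gold answer first scores 1.0, and one that ranks it second scores 0.5.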
Coordinated Reasoning for Cross-Lingual Knowledge Graph Alignment
Existing entity alignment methods mainly vary on the choices of encoding the
knowledge graph, but they typically use the same decoding method, which
independently chooses the local optimal match for each source entity. This
decoding method may not only cause the "many-to-one" problem but also neglect
the coordinated nature of this task, that is, each alignment decision may
highly correlate to the other decisions. In this paper, we introduce two
coordinated reasoning methods, i.e., the Easy-to-Hard decoding strategy and
joint entity alignment algorithm. Specifically, the Easy-to-Hard strategy first
retrieves the model-confident alignments from the predicted results and then
incorporates them as additional knowledge to resolve the remaining
model-uncertain alignments. To achieve this, we further propose an enhanced
alignment model that is built on the current state-of-the-art baseline. In
addition, to address the many-to-one problem, we propose to jointly predict
entity alignments so that the one-to-one constraint can be naturally
incorporated into the alignment prediction. Experimental results show that our
model achieves the state-of-the-art performance and our reasoning methods can
also significantly improve existing baselines.
Comment: in AAAI 202
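The "many-to-one" problem and its joint-prediction remedy can be illustrated with a toy sketch. The scores below are hypothetical, and the brute-force search is purely for illustration; a real system would use an assignment solver such as the Hungarian algorithm to enforce the one-to-one constraint.

```python
from itertools import permutations

def greedy_align(scores):
    # Independent local decoding: each source picks its best target.
    return [max(range(len(row)), key=lambda j: row[j]) for row in scores]

def joint_align(scores):
    # Joint decoding: pick the one-to-one assignment maximizing total score.
    # Brute force over permutations; real systems use the Hungarian algorithm.
    n = len(scores)
    best, best_perm = float("-inf"), None
    for perm in permutations(range(n)):
        total = sum(scores[i][perm[i]] for i in range(n))
        if total > best:
            best, best_perm = total, perm
    return list(best_perm)

# Hypothetical confidences: both source entities prefer target 0.
scores = [[0.9, 0.8],
          [0.7, 0.1]]
# greedy_align maps both sources to target 0 (many-to-one);
# joint_align assigns source 0 -> target 1 and source 1 -> target 0.
```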
Towards Democratizing Data Science with Natural Language Interfaces
Data science has the potential to reshape many sectors of modern society. This potential can be fully realized only when data science is democratized, rather than centralized in a small group of expert data scientists. However, as data becomes more massive and heterogeneous, a widening gap between human users and data stands in stark contrast to the growing demand for data science: every type of data requires extensive specialized training, whether to learn a specific query language or a data analytics software package. Towards the democratization of data science, in this dissertation we systematically investigate a promising research direction, the natural language interface, to bridge the gap between users and data and make it easier for less technically proficient users to access the data analytics power needed for on-demand problem solving and decision making.

One of the largest obstacles keeping general users from their data is the proficiency required in the formal languages (e.g., SQL) that machines use. By automatically parsing natural language commands from users into formal languages, natural language interfaces can thus play a critical role in democratizing data science. However, a pressing question that has so far been largely left unanswered is: how do we bootstrap a natural language interface for a new domain? The high cost of data collection and the data-hungry nature of mainstream neural network models significantly limit the wide application of natural language interfaces.

The main technical contribution of this dissertation is a systematic framework for bootstrapping natural language interfaces for new domains. Specifically, the proposed framework consists of three complementary methods: (1) collecting data at low cost via crowdsourcing, (2) leveraging existing NLI data from other domains via transfer learning, and (3) letting a bootstrapped model interact with real users so that it can refine itself over time. Combining the three methods forms a closed data loop for bootstrapping and refining natural language interfaces for any domain.

The methodologies and frameworks developed in this dissertation thus pave the path for building data science platforms that everyone can use to process, query, and analyze their data without extensive specialized training. With such AI-powered platforms, users can stay focused on high-level thinking and decision making instead of being overwhelmed by low-level implementation and programming details --- ``\emph{Let machines understand human thinking. Don't let humans think like machines}.''
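The core task of a natural language interface, parsing a user's command into a formal language such as SQL, can be sketched in miniature. The rule-based patterns below are purely illustrative and hypothetical; the dissertation's actual approach uses learned semantic parsers bootstrapped via crowdsourcing, transfer learning, and user interaction, not hand-written rules.

```python
import re

# Toy rule-based NL-to-SQL parser. Each pattern maps a question shape to
# a SQL template; table names are captured directly from the question.
PATTERNS = [
    (re.compile(r"how many (\w+)", re.I),
     lambda m: f"SELECT COUNT(*) FROM {m.group(1)}"),
    (re.compile(r"show (?:me )?all (\w+)", re.I),
     lambda m: f"SELECT * FROM {m.group(1)}"),
]

def parse(question):
    # Return the SQL for the first matching pattern, or None if unparseable.
    for pat, build in PATTERNS:
        m = pat.search(question)
        if m:
            return build(m)
    return None
```

A learned parser generalizes where such rules fail, which is exactly why the data-collection bootstrap the dissertation proposes matters: each new domain needs examples, not new rules.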