935 research outputs found
Using Crowdsourcing for Fine-Grained Entity Type Completion in Knowledge Bases
Recent years have witnessed the proliferation of large-scale Knowledge Bases (KBs). However, many entities in KBs have incomplete type information, and some are totally untyped. Even worse, fine-grained types (e.g., BasketballPlayer) containing rich semantic meanings are more likely to be incomplete, as they are more difficult to be obtained. Existing machine-based algorithms use predicates (e.g., birthPlace) of entities to infer their missing types, and they have limitations that the predicates may be insufficient to infer fine-grained types. In this paper, we utilize crowdsourcing to solve the problem, and address the challenge of controlling crowdsourcing cost. To this end, we propose a hybrid machine-crowdsourcing approach for fine-grained entity type completion. It firstly determines the types of some “representative” entities via crowdsourcing and then infers the types for remaining entities based on the crowdsourcing results. To support this approach, we first propose an embedding-based influence for type inference which considers not only the distance between entity embeddings but also the distances between entity and type embeddings. Second, we propose a new difficulty model for entity selection which can better capture the uncertainty of the machine algorithm when identifying the entity types. We demonstrate the effectiveness of our approach through experiments on real crowdsourcing platforms. The results show that our method outperforms the state-of-the-art algorithms by improving the effectiveness of fine-grained type completion at affordable crowdsourcing cost.Peer reviewe
Recommended from our members
Towards Democratizing Data Science with Natural Language Interfaces
Data science has the potential to reshape many sectors of the modern society. This potential can be realized to its maximum only when data science becomes democratized, instead of centralized in a small group of expert data scientists. However, with data becoming more massive and heterogeneous, standing in stark contrast to the spreading demand of data science is the growing gap between human users and data: Every type of data requires extensive specialized training, either to learn a specific query language or a data analytics software. Towards the democratization of data science, in this dissertation we systematically investigate a promising research direction, natural language interface, to bridge the gap between users and data, and make it easier for users who are less technically proficient to access the data analytics power needed for on-demand problem solving and decision making.One of the largest obstacles for general users to access data is the proficiency requirement on formal languages (e.g., SQL) that machines use. Automatically parsing natural language commands from users into formal languages, natural language interfaces can thus play a critical role in democratizing data science. However, a pressing question that is largely left unanswered so far is, how to bootstrap a natural language interface for a new domain? The high cost of data collection and the data-hungry nature of the mainstream neural network models are significantly limiting the wide application of natural language interfaces. The main technical contribution of this dissertation is a systematic framework for bootstrapping natural language interfaces for new domains. Specifically, the proposed framework consists of three complimentary methods: (1) Collecting data at a low cost via crowdsourcing, (2) leveraging existing NLI data from other domains via transfer learning, and (3) letting a bootstrapped model to interact with real users so that it can refine itself over time. Combining the three methods forms a closed data loop for bootstrapping and refining natural language interfaces for any domain.The developed methodologies and frameworks in this dissertation hence pave the path for building data science platforms that everyone can use to process, query, and analyze their data without extensive specialized training. With such AI-powered platforms, users can stay focused on high-level thinking and decision making, instead of overwhelmed by low-level implementation and programming details --- ``\emph{Let machines understand human thinking. Don't let humans think like machines}.'
Zero-Shot Learning with Common Sense Knowledge Graphs
Zero-shot learning relies on semantic class representations such as
hand-engineered attributes or learned embeddings to predict classes without any
labeled examples. We propose to learn class representations from common sense
knowledge graphs. Common sense knowledge graphs are an untapped source of
explicit high-level knowledge that requires little human effort to apply to a
range of tasks. To capture the knowledge in the graph, we introduce ZSL-KG, a
general-purpose framework with a novel transformer graph convolutional network
(TrGCN) for generating class representations. Our proposed TrGCN architecture
computes non-linear combinations of the node neighbourhood and shows
improvements on zero-shot learning tasks in language and vision. Our results
show ZSL-KG outperforms the best performing graph-based zero-shot learning
framework by an average of 2.1 accuracy points with improvements as high as 3.4
accuracy points. Our ablation study on ZSL-KG with alternate graph neural
networks shows that our TrGCN adds up to 1.2 accuracy points improvement on
these tasks
- …