1 research outputs found

    Building a Lightweight Semantic Model for Unsupervised Information Extraction on Short Listings

    No full text
    Short listings such as classified ads or product listings abound on the web. If a computer can reliably extract information from them, it will greatly benefit a variety of applications. Short listings are, however, challenging to process due to their informal styles. In this paper, we present an unsupervised information extraction system for short listings. Given a corpus of listings, the system builds a semantic model that represents typical objects and their attributes in the domain of the corpus, and then uses the model to extract information. Two key features in the system are a semantic parser that extracts objects and their attributes and a listing-focused clustering module that helps group together extracted tokens of same type. Our evaluation shows that the semantic model learned by these two modules is effective across multiple domains.
    corecore