
    Weaving Entities into Relations: From Page Retrieval to Relation Mining on the Web

    With its sheer amount of information, the Web is clearly an important frontier for data mining. While Web mining must start with content on the Web, there is no effective "search-based" mechanism to help sift through the information on the Web. Our goal is to provide such an online search-based facility for supporting query primitives, upon which Web mining applications can be built. As a first step, this paper aims at entity-relation discovery, or E-R discovery, as a useful function for weaving scattered entities on the Web into coherent relations. To begin with, as our proposal, we formalize the concept of E-R discovery. Further, to realize E-R discovery, as our main thesis, we abstract tuple ranking, the essential challenge of E-R discovery, as pattern-based cooccurrence analysis. Finally, as our key insight, we observe that such relation mining shares the same core functions as traditional page-retrieval systems, which enables us to build the new E-R discovery upon today's search engines, almost for free. We report our system prototype and testbed, WISDM-ER, with a real Web corpus. Our case studies have demonstrated high promise, achieving 83%-91% accuracy for real benchmark queries, and thus the real possibility of enabling ad-hoc Web mining tasks with online E-R discovery.

    A Brief History of Web Crawlers

    Web crawlers visit internet applications, collect data, and learn about new web pages from visited pages. Web crawlers have a long and interesting history. Early web crawlers collected statistics about the web. In addition to collecting statistics about the web and indexing the applications for search engines, modern crawlers can be used to perform accessibility and vulnerability checks on the application. The quick expansion of the web and the complexity added to web applications have made the process of crawling a very challenging one. Throughout the history of web crawling, many researchers and industrial groups have addressed different issues and challenges that web crawlers face. Different solutions have been proposed to reduce the time and cost of crawling. Performing an exhaustive crawl is a challenging question. Additionally, capturing the model of a modern web application and extracting data from it automatically is another open question. What follows is a brief history of the different techniques and algorithms used from the early days of crawling up to recent days. We introduce criteria to evaluate the relative performance of web crawlers. Based on these criteria, we plot the evolution of web crawlers and compare their performance.

    Look before you Hop: Conversational Question Answering over Knowledge Graphs Using Judicious Context Expansion

    Fact-centric information needs are rarely one-shot; users typically ask follow-up questions to explore a topic. In such a conversational setting, the user's inputs are often incomplete, with entities or predicates left out, and contain ungrammatical phrases. This poses a huge challenge to question answering (QA) systems that typically rely on cues in full-fledged interrogative sentences. As a solution, we develop CONVEX: an unsupervised method that can answer incomplete questions over a knowledge graph (KG) by maintaining conversation context using entities and predicates seen so far and automatically inferring missing or ambiguous pieces for follow-up questions. The core of our method is a graph exploration algorithm that judiciously expands a frontier to find candidate answers for the current question. To evaluate CONVEX, we release ConvQuestions, a crowdsourced benchmark with 11,200 distinct conversations from five different domains. We show that CONVEX: (i) adds conversational support to any stand-alone QA system, and (ii) outperforms state-of-the-art baselines and question completion strategies.

    Social media as a data gathering tool for international business qualitative research: opportunities and challenges

    Lusophone African (LA) multinational enterprises (MNEs) are becoming a significant pan-African and global economic force regarding their international presence and influence. However, given the extreme poverty and lack of development in their home markets, many LA enterprises seeking to internationalize lack resources and legitimacy in international markets. Compared to higher income emerging markets, Lusophone enterprises in Africa face more significant challenges in their internationalization efforts. Concomitantly, conducting significant international business (IB) research in these markets to understand these MNEs' internationalization strategies can be a very daunting task. The fast-growing rise of social media on the Internet, however, provides an opportunity for IB researchers to examine new phenomena in these markets in innovative ways. Unfortunately, for various reasons, qualitative researchers in IB have not fully embraced this opportunity. This article studies the use of social media in qualitative research in the field of IB. It offers an illustrative case based on qualitative research on internationalization modes of LAMNEs conducted by the authors in Angola and Mozambique using social media to identify and qualify the population sample, as well as to interact with subjects and collect data. It discusses some of the challenges of using social media in those regions of Africa and suggests how scholars can design their studies to capitalize on social media and corresponding data as a tool for qualitative research.
This article underscores the potential opportunities and challenges inherent in the use of social media in IB-oriented qualitative research, providing recommendations on how qualitative IB researchers can design their studies to capitalize on data generated by social media. https://doi.org/10.1080/15475778.2019.1634406

    Multi-GPU Graph Analytics

    We present a single-node, multi-GPU programmable graph processing library that allows programmers to easily extend single-GPU graph algorithms to achieve scalable performance on large graphs with billions of edges. Directly using the single-GPU implementations, our design only requires programmers to specify a few algorithm-dependent concerns, hiding most multi-GPU related implementation details. We analyze the theoretical and practical limits to scalability in the context of varying graph primitives and datasets. We describe several optimizations, such as direction-optimizing traversal and a just-enough memory allocation scheme, for better performance and smaller memory consumption. Compared to previous work, we achieve best-of-class performance across operations and datasets, including excellent strong and weak scalability on most primitives as we increase the number of GPUs in the system. Comment: 12 pages. Final version submitted to IPDPS 201