3,726 research outputs found

    A Probabilistic Generative Model for Latent Business Networks Mining

    The structural embeddedness theory posits that a company’s embeddedness in a business network impacts its competitive performance. This highlights the theoretical and practical value of business network mining and analysis. Because latent business relationships may exist and business networks continuously evolve over time, a manual approach to the discovery and analysis of business networks is ineffective. Though much research has been devoted to social network discovery and analysis, relatively little has been conducted on business network discovery. Guided by the design science research methodology, the main contribution of our research is the design and development of a novel probabilistic generative model for latent business relationship mining. The proposed method can effectively and efficiently discover evolving latent business networks over time. Our experimental results confirm that the proposed method outperforms the well-known vector space model based latent business relationship mining method by 28% in terms of AUC value.

    Scraping the Social? Issues in live social research

    What makes scraping methodologically interesting for social and cultural research? This paper seeks to contribute to debates about digital social research by exploring how a ‘medium-specific’ technique for online data capture may be rendered analytically productive for social research. As a device that is currently being imported into social research, scraping has the capacity to restructure social research in at least two ways. Firstly, as a technique that is not native to social research, scraping risks introducing ‘alien’ methodological assumptions into social research (such as a preoccupation with freshness). Secondly, to scrape is to risk importing into our inquiry categories that are prevalent in the social practices enabled by the media: scraping makes available already formatted data for social research. Scraped data, and online social data more generally, tend to come with ‘external’ analytics already built in. This circumstance is often approached as a ‘problem’ with online data capture, but we propose it may be turned into a virtue, insofar as data formats that have currency in the areas under scrutiny may serve as a source of social data themselves. Scraping, we propose, makes it possible to render traffic between the object and process of social research analytically productive. It enables a form of ‘real-time’ social research, in which the formats and life cycles of online data may lend structure to the analytic objects and findings of social research. By way of a conclusion, we demonstrate this point in an exercise of online issue profiling, and more particularly, by relying on Twitter to profile the issue of ‘austerity’. Here we distinguish between two forms of real-time research, those dedicated to monitoring live content (which terms are current?) and those concerned with analysing the liveliness of issues (which topics are happening?).

    Knowledge will Propel Machine Understanding of Content: Extrapolating from Current Examples

    Machine Learning has been a big success story during the AI resurgence. One particular standout success relates to learning from a massive amount of data. In spite of early assertions of the unreasonable effectiveness of data, there is increasing recognition of the value of utilizing knowledge whenever it is available or can be created purposefully. In this paper, we discuss the indispensable role of knowledge for deeper understanding of content where (i) large amounts of training data are unavailable, (ii) the objects to be recognized are complex (e.g., implicit entities and highly subjective content), and (iii) applications need to use complementary or related data in multiple modalities/media. What brings us to the cusp of rapid progress is our ability to (a) create relevant and reliable knowledge and (b) carefully exploit knowledge to enhance ML/NLP techniques. Using diverse examples, we seek to foretell unprecedented progress in our ability for deeper understanding and exploitation of multimodal data and continued incorporation of knowledge in learning techniques. Comment: Pre-print of the paper accepted at 2017 IEEE/WIC/ACM International Conference on Web Intelligence (WI). arXiv admin note: substantial text overlap with arXiv:1610.0770

    From Query to Usable Code: An Analysis of Stack Overflow Code Snippets

    Enriched by natural language texts, Stack Overflow code snippets are an invaluable code-centric knowledge base of small units of source code. Besides being useful for software developers, these annotated snippets can potentially serve as the basis for automated tools that provide working code solutions to specific natural language queries. With the goal of developing automated tools with the Stack Overflow snippets and surrounding text, this paper investigates the following questions: (1) How usable are the Stack Overflow code snippets? and (2) When using text search engines for matching on the natural language questions and answers around the snippets, what percentage of the top results contain usable code snippets? A total of 3M code snippets are analyzed across four languages: C#, Java, JavaScript, and Python. Python and JavaScript proved to be the languages for which the most code snippets are usable. Conversely, Java and C# proved to be the languages with the lowest usability rate. Further qualitative analysis on usable Python snippets shows the characteristics of the answers that solve the original question. Finally, we use Google search to investigate the alignment of usability and the natural language annotations around code snippets, and explore how to make snippets in Stack Overflow an adequate base for future automatic program generation. Comment: 13th IEEE/ACM International Conference on Mining Software Repositories, 11 page

    Theory and Practice of Data Citation

    Citations are the cornerstone of knowledge propagation and the primary means of assessing the quality of research, as well as directing investments in science. Science is increasingly becoming "data-intensive", where large volumes of data are collected and analyzed to discover complex patterns through simulations and experiments, and most scientific reference works have been replaced by online curated datasets. Yet, given a dataset, there is no quantitative, consistent and established way of knowing how it has been used over time, who contributed to its curation, what results have been yielded or what value it has. The development of a theory and practice of data citation is fundamental for considering data as first-class research objects with the same relevance and centrality as traditional scientific products. Many works in recent years have discussed data citation from different viewpoints: illustrating why data citation is needed, defining the principles and outlining recommendations for data citation systems, and providing computational methods for addressing specific issues of data citation. The current panorama is many-faceted, and an overall view that brings together diverse aspects of this topic is still missing. Therefore, this paper aims to describe the lay of the land for data citation, both from the theoretical (the why and what) and the practical (the how) angle. Comment: 24 pages, 2 tables, pre-print accepted in Journal of the Association for Information Science and Technology (JASIST), 201