5 research outputs found

    Automatic classification of software related microblogs

    Get PDF
    Abstract—Millions of people, including those in the soft-ware engineering communities have turned to microblogging services, such as Twitter, as a means to quickly disseminate information. A number of past studies by Treude et al., Storey, and Yuan et al. have shown that a wealth of interesting information is stored in these microblogs. However, microblogs also contain a large amount of noisy content that are less relevant to software developers in engineering software systems. In this work, we perform a preliminary study to investigate the feasibility of automatic classification of microblogs into two categories: relevant and irrelevant to engineering software systems. We extract features from the textual content of the microblogs and the titles of any URLs mentioned in the mi-croblogs. These features are then used to learn a discriminative model used in classifying relevant and irrelevant microblogs. We show that our trained model can achieve a promising classification performance. I

    What does software engineering community microblog about?

    Get PDF
    Abstract—Microblogging is a new trend to communicate and to disseminate information. One microblog post could potentially reach millions of users. Millions of microblogs are generated on a daily basis on popular sites such as Twitter. The popularity of microblogging among programmers, software engineers, and software users has also led to their use of microblogs to communicate software engineering issues apart from using emails and other traditional communication channels. Understanding how millions of users use microblogs in software engineering related activities would shed light on ways we could leverage the fast evolving microblogging content to aid software development efforts. In this work, we perform a preliminary study on what software engineering community microblogs about. We analyze the content of microblogs from Twitter and categorize the types of microblogs that are posted. We investigate the relative popularity of each category of mi-croblogs. We also investigate what kinds of microblogs that are diffused more widely in the Twitter network. Our experiments show that microblogs commonly contain job openings, news, questions and answers, or links to download new tools and code. We find that microblogs concerning the real-world events are more widely diffused in the Twitter network. I

    Tag Recommendation in Software Information Sites

    Get PDF
    Abstract—Nowadays, software engineers use a variety of online media to search and become informed of new and interesting technologies, and to learn from and help one another. We refer to these kinds of online media which help software engineers im-prove their performance in software development, maintenance and test processes as software information sites. It is common to see tags in software information sites and many sites allow users to tag various objects with their own words. Users increasingly use tags to describe the most important features of their posted contents or projects. In this paper, we propose TagCombine, an automatic tag recommendation method which analyzes objects in software in-formation sites. TagCombine has 3 different components: 1. multi-label ranking component which considers tag recommendation as a multi-label learning problem; 2. similarity based rankin

    Blogs as Infrastructure for Scholarly Communication.

    Full text link
    This project systematically analyzes digital humanities blogs as an infrastructure for scholarly communication. This exploratory research maps the discourses of a scholarly community to understand the infrastructural dynamics of blogs and the Open Web. The text contents of 106,804 individual blog posts from a corpus of 396 blogs were analyzed using a mix of computational and qualitative methods. Analysis uses an experimental methodology (trace ethnography) combined with unsupervised machine learning (topic modeling), to perform an interpretive analysis at scale. Methodological findings show topic modeling can be integrated with qualitative and interpretive analysis. Special attention must be paid to data fitness, or the shape and re-shaping practices involved with preparing data for machine learning algorithms. Quantitative analysis of computationally generated topics indicates that while the community writes about diverse subject matter, individual scholars focus their attention on only a couple of topics. Four categories of informal scholarly communication emerged from the qualitative analysis: quasi-academic, para-academic, meta-academic, and extra-academic. The quasi and para-academic categories represent discourse with scholarly value within the digital humanities community, but do not necessarily have an obvious path into formal publication and preservation. A conceptual model, the (in)visible college, is introduced for situating scholarly communication on blogs and the Open Web. An (in)visible college is a kind of scholarly communication that is informal, yet visible at scale. This combination of factors opens up a new space for the study of scholarly communities and communication. While (in)invisible colleges are programmatically observable, care must be taken with any effort to count and measure knowledge work in these spaces. This is the first systematic, data driven analysis of the digital humanities and lays the groundwork for subsequent social studies of digital humanities.PhDInformationUniversity of Michigan, Horace H. Rackham School of Graduate Studieshttp://deepblue.lib.umich.edu/bitstream/2027.42/111592/1/mcburton_1.pd
    corecore