18,111 research outputs found
Predicting Good Configurations for GitHub and Stack Overflow Topic Models
Software repositories contain large amounts of textual data, ranging from
source code comments and issue descriptions to questions, answers, and comments
on Stack Overflow. To make sense of this textual data, topic modelling is
frequently used as a text-mining tool for the discovery of hidden semantic
structures in text bodies. Latent Dirichlet allocation (LDA) is a commonly used
topic model that aims to explain the structure of a corpus by grouping texts.
LDA requires multiple parameters to work well, and there are only rough and
sometimes conflicting guidelines available on how these parameters should be
set. In this paper, we contribute (i) a broad study of parameters to arrive at
good local optima for GitHub and Stack Overflow text corpora, (ii) an
a-posteriori characterisation of text corpora related to eight programming
languages, and (iii) an analysis of corpus feature importance via per-corpus
LDA configuration. We find that (1) popular rules of thumb for topic modelling
parameter configuration are not applicable to the corpora used in our
experiments, (2) corpora sampled from GitHub and Stack Overflow have different
characteristics and require different configurations to achieve good model fit,
and (3) we can predict good configurations for unseen corpora reliably. These
findings support researchers and practitioners in efficiently determining
suitable configurations for topic modelling when analysing textual data
contained in software repositories.Comment: to appear as full paper at MSR 2019, the 16th International
Conference on Mining Software Repositorie
Constructing experimental indicators for Open Access documents
The ongoing paradigm change in the scholarly publication system ('science is
turning to e-science') makes it necessary to construct alternative evaluation
criteria/metrics which appropriately take into account the unique
characteristics of electronic publications and other research output in digital
formats. Today, major parts of scholarly Open Access (OA) publications and the
self-archiving area are not well covered in the traditional citation and
indexing databases. The growing share and importance of freely accessible
research output demands new approaches/metrics for measuring and for evaluating
of these new types of scientific publications. In this paper we propose a
simple quantitative method which establishes indicators by measuring the
access/download pattern of OA documents and other web entities of a single web
server. The experimental indicators (search engine, backlink and direct access
indicator) are constructed based on standard local web usage data. This new
type of web-based indicator is developed to model the specific demand for
better study/evaluation of the accessibility, visibility and interlinking of
open accessible documents. We conclude that e-science will need new stable
e-indicators.Comment: 9 pages, 3 figure
Web Mining Functions in an Academic Search Application
This paper deals with Web mining and the different categories of Web mining like content, structure and usage mining. The application of Web mining in an academic search application has been discussed. The paper concludes with open problems related to Web mining. The present work can be a useful input to Web users, Web Administrators in a university environment.Database, HITS, IR, NLP, Web mining
Is Stack Overflow Overflowing With Questions and Tags
Programming question and answer (Q & A) websites, such as Quora, Stack
Overflow, and Yahoo! Answer etc. helps us to understand the programming
concepts easily and quickly in a way that has been tested and applied by many
software developers. Stack Overflow is one of the most frequently used
programming Q\&A website where the questions and answers posted are presently
analyzed manually, which requires a huge amount of time and resource. To save
the effort, we present a topic modeling based technique to analyze the words of
the original texts to discover the themes that run through them. We also
propose a method to automate the process of reviewing the quality of questions
on Stack Overflow dataset in order to avoid ballooning the stack overflow with
insignificant questions. The proposed method also recommends the appropriate
tags for the new post, which averts the creation of unnecessary tags on Stack
Overflow.Comment: 11 pages, 7 figures, 3 tables Presented at Third International
Symposium on Women in Computing and Informatics (WCI-2015
APIs and Your Privacy
Application programming interfaces, or APIs, have been the topic of much recent discussion. Newsworthy events, including those involving Facebook’s API and Cambridge Analytica obtaining information about millions of Facebook users, have highlighted the technical capabilities of APIs for prominent websites and mobile applications. At the same time, media coverage of ways that APIs have been misused has sparked concern for potential privacy invasions and other issues of public policy. This paper seeks to educate consumers on how APIs work and how they are used within popular websites and mobile apps to gather, share, and utilize data.
APIs are used in mobile games, search engines, social media platforms, news and shopping websites, video and music streaming services, dating apps, and mobile payment systems. If a third-party company, like an app developer or advertiser, would like to gain access to your information through a website you visit or a mobile app or online service you use, what data might they obtain about you through APIs and how? This report analyzes 11 prominent online services to observe general trends and provide you an overview of the role APIs play in collecting and distributing information about consumers. For example, how might your data be gathered and shared when using your Facebook account login to sign up for Venmo or to access the Tinder dating app? How might advertisers use Pandora’s API when you are streaming music?
After explaining what APIs are and how they work, this report categorizes and characterizes different kinds of APIs that companies offer to web and app developers. Services may offer content-focused APIs, feature APIs, unofficial APIs, and analytics APIs that developers of other apps and websites may access and use in different ways. Likewise, advertisers can use APIs to target a desired subset of a service’s users and possibly extract user data. This report explains how websites and apps can create user profiles based on your online behavior and generate revenue from advertiser-access to their APIs. The report concludes with observations on how various companies and platforms connecting through APIs may be able to learn information about you and aggregate it with your personal data from other sources when you are browsing the internet or using different apps on your smartphone or tablet. While the paper does not make policy recommendations, it demonstrates the importance of approaching consumer privacy from a broad perspective that includes first parties and third parties, and that considers the integral role of APIs in today’s online ecosystem
- …