    Identification-method research for open-source software ecosystems

    In recent years, open-source software (OSS) development has grown, with many developers around the world working on different OSS projects. A variety of open-source software ecosystems have emerged, for instance GitHub, StackOverflow, and SourceForge. GitHub, one of the most prominent social-programming and code-hosting sites, has gathered numerous open-source projects and developers on a single virtual collaboration platform. Since GitHub is itself a large open-source community, it hosts collections of software projects that are developed together and coevolve. The central challenge is identifying the relationships between these projects, i.e., project relevance. Software-ecosystem identification is the basis of other studies of ecosystems, so extracting useful information from GitHub and identifying software ecosystems is particularly important; it is also a research area in symmetry. In this paper, a Topic-based Project Knowledge Metrics Framework (TPKMF) is proposed. By collecting a multisource dataset of an open-source ecosystem, project-relevance analysis of the open-source software is carried out as the basis of software-ecosystem identification. We then use our Spectral Clustering algorithm based on Core Project (CP-SC) to identify software-ecosystem projects and, from these, software ecosystems. We verify that most software ecosystems contain a core software project with which most other projects are associated. Furthermore, we analyze the characteristics of the ecosystems and find that interactive information has a greater impact on project relevance. Finally, we summarize the Topic-based Project Knowledge Metrics Framework.
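
    The CP-SC algorithm is only named in this abstract, not specified; the following is a minimal sketch of the general idea under stated assumptions: spectral clustering over a precomputed project-relevance matrix, with each cluster's core project taken to be its most strongly connected member. The relevance values, project names, and core-selection heuristic are all illustrative, not the authors'.

    ```python
    # Minimal sketch (not the paper's CP-SC implementation): spectral clustering
    # over a hypothetical project-relevance matrix, then choosing each cluster's
    # "core" project as the member most strongly related to the rest.
    import numpy as np
    from sklearn.cluster import SpectralClustering

    # Hypothetical symmetric relevance matrix for six projects (values in [0, 1]).
    relevance = np.array([
        [1.0, 0.9, 0.8, 0.1, 0.0, 0.1],
        [0.9, 1.0, 0.7, 0.0, 0.1, 0.0],
        [0.8, 0.7, 1.0, 0.1, 0.0, 0.1],
        [0.1, 0.0, 0.1, 1.0, 0.8, 0.9],
        [0.0, 0.1, 0.0, 0.8, 1.0, 0.7],
        [0.1, 0.0, 0.1, 0.9, 0.7, 1.0],
    ])
    projects = ["p0", "p1", "p2", "p3", "p4", "p5"]

    labels = SpectralClustering(
        n_clusters=2, affinity="precomputed", random_state=0
    ).fit_predict(relevance)

    for k in sorted(set(labels)):
        members = [i for i, lab in enumerate(labels) if lab == k]
        # The member with the highest total relevance to its own cluster is
        # treated as the ecosystem's core project.
        core = max(members, key=lambda i: relevance[i, members].sum())
        print(f"ecosystem {k}: {[projects[i] for i in members]} core={projects[core]}")
    ```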

    Users and Assessors in the Context of INEX: Are Relevance Dimensions Relevant?

    The main aspects of XML retrieval are identified by analysing and comparing the following two behaviours: the behaviour of the assessor when judging the relevance of returned document components, and the behaviour of users when interacting with components of XML documents. We argue that the two INEX relevance dimensions, Exhaustivity and Specificity, are not orthogonal; indeed, an empirical analysis of each dimension reveals that the grades of the two dimensions are correlated with each other. By analysing the level of agreement between the assessor and the users, we aim to identify the best units of retrieval. The results of our analysis show that the highest level of agreement is on highly relevant and on non-relevant document components, suggesting that only the end points of the INEX 10-point relevance scale are perceived in the same way by both the assessor and the users. We propose a new definition of relevance for XML retrieval and argue that its corresponding relevance scale would be a better choice for INEX.
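
    The correlation claim lends itself to a small check; below is a minimal sketch, with made-up grade pairs rather than actual INEX judgments, of how one might test whether Exhaustivity and Specificity grades move together.

    ```python
    # Minimal sketch (illustrative grades, not INEX data): testing whether the
    # Exhaustivity and Specificity grades assigned to document components are
    # correlated rather than orthogonal.
    from scipy.stats import spearmanr

    # Hypothetical (exhaustivity, specificity) grades on a 0-3 scale,
    # one pair per judged document component.
    exhaustivity = [0, 0, 1, 1, 2, 2, 3, 3, 3, 2]
    specificity = [0, 1, 1, 2, 1, 2, 2, 3, 3, 3]

    rho, p = spearmanr(exhaustivity, specificity)
    # A high rank correlation would support the abstract's argument that the
    # two dimensions are not orthogonal.
    print(f"Spearman rho = {rho:.2f}, p = {p:.4f}")
    ```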

    Digitometric Services for Open Archives Environments

    We describe “digitometric” services and tools that add value to open-access eprint archives using the Open Archives Initiative (OAI) Protocol for Metadata Harvesting. Celestial is an OAI cache and gateway tool. Citebase Search enhances OAI-harvested metadata with linked references harvested from the full-text to provide a web service for citation navigation and research impact analysis. Digitometrics builds on data harvested using OAI to provide advanced visualisation and hypertext navigation for the research community. Together these services provide a modular, distributed architecture for building a “semantic web” for the research literature.
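
    All three services sit on top of the OAI Protocol for Metadata Harvesting; as a point of reference, here is a minimal sketch of a ListRecords harvest in Python. The repository endpoint is a placeholder, not one of the services named above.

    ```python
    # Minimal sketch of an OAI-PMH ListRecords request, the harvesting protocol
    # these services build on. The endpoint URL is a placeholder.
    import urllib.request
    import xml.etree.ElementTree as ET

    OAI = "{http://www.openarchives.org/OAI/2.0/}"
    DC = "{http://purl.org/dc/elements/1.1/}"

    url = ("https://example.org/oai"  # placeholder repository endpoint
           "?verb=ListRecords&metadataPrefix=oai_dc")
    with urllib.request.urlopen(url) as response:
        tree = ET.parse(response)

    # Print the Dublin Core title of every harvested record.
    for record in tree.iter(f"{OAI}record"):
        for title in record.iter(f"{DC}title"):
            print(title.text)
    ```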

    The Royal Birth of 2013: Analysing and Visualising Public Sentiment in the UK Using Twitter

    Analysis of information retrieved from microblogging services such as Twitter can provide valuable insight into public sentiment in a geographic region. This insight can be enriched by visualising information in its geographic context. Two underlying approaches to sentiment analysis are dictionary-based and machine learning. The former is popular for public sentiment analysis, while the latter has found limited use for aggregating public sentiment from Twitter data. The research presented in this paper aims to extend the machine learning approach to aggregating public sentiment. To this end, a framework for analysing and visualising public sentiment from a Twitter corpus is developed. A dictionary-based approach and a machine learning approach are implemented within the framework and compared using one UK case study, namely the royal birth of 2013. The case study validates the feasibility of the framework for analysis and rapid visualisation. One observation is that there is good correlation between the results produced by the popular dictionary-based approach and the machine learning approach when large volumes of tweets are analysed. However, for rapid analysis to be possible, faster methods need to be developed using big data techniques and parallel methods. (Comment: http://www.blessonv.com/research/publicsentiment/; 9 pages; submitted to IEEE BigData 2013: Workshop on Big Humanities, October 2013.)
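
    For concreteness, the two approaches named above can be sketched as follows; the lexicon, training tweets, and labels are tiny made-up examples, not the paper's framework.

    ```python
    # Minimal sketch of the two approaches compared in the paper: a dictionary
    # (lexicon) score versus a trained classifier. All data here is made up.
    positive = {"great", "joy", "love", "happy"}
    negative = {"sad", "awful", "angry", "hate"}

    def dictionary_score(tweet: str) -> int:
        """Count positive words minus negative words."""
        words = tweet.lower().split()
        return sum(w in positive for w in words) - sum(w in negative for w in words)

    print(dictionary_score("great joy at the royal birth"))  # positive score

    # Machine-learning counterpart: a bag-of-words Naive Bayes classifier.
    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.naive_bayes import MultinomialNB
    from sklearn.pipeline import make_pipeline

    train_tweets = ["what a happy day", "love this news",
                    "awful weather", "so sad today"]
    train_labels = ["pos", "pos", "neg", "neg"]

    model = make_pipeline(CountVectorizer(), MultinomialNB())
    model.fit(train_tweets, train_labels)
    print(model.predict(["happy news today"]))  # -> ['pos']
    ```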

    The Statistical Analysis of Star Clusters

    We review a range of statistical methods for analyzing the structures of star clusters, and derive a new measure, Q, which both quantifies, and distinguishes between, a (relatively smooth) large-scale radial density gradient and multi-scale (fractal) sub-clustering. Q is derived from the normalised correlation length and the normalised edge length of the minimal spanning tree for each cluster.
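
    A minimal sketch of computing a Q-type measure follows; the normalisation choices (cluster radius and a circular cluster area) follow common conventions and are assumptions here, not details taken from the abstract.

    ```python
    # Minimal sketch of a Q-type measure: the ratio of the normalised mean
    # edge length of the minimal spanning tree to the normalised correlation
    # length (mean pairwise separation). Normalisation choices are assumptions.
    import numpy as np
    from scipy.sparse.csgraph import minimum_spanning_tree
    from scipy.spatial.distance import pdist, squareform

    def q_measure(points: np.ndarray) -> float:
        n = len(points)
        mst = minimum_spanning_tree(squareform(pdist(points)))
        mean_edge = mst.sum() / (n - 1)

        centre = points.mean(axis=0)
        radius = np.max(np.linalg.norm(points - centre, axis=1))
        area = np.pi * radius**2  # assume a circular cluster area

        m_bar = mean_edge / (np.sqrt(n * area) / (n - 1))  # normalised MST edge length
        s_bar = pdist(points).mean() / radius              # normalised correlation length
        return m_bar / s_bar

    rng = np.random.default_rng(0)
    # Uniform 2D scatter (no gradient, no sub-clustering) gives Q around 0.7.
    print(q_measure(rng.uniform(size=(200, 2))))
    ```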

    A methodology for analysing and evaluating narratives in annual reports: a comprehensive descriptive profile and metrics for disclosure quality attributes

    There is a consensus that the business reporting model needs to expand to serve the changing information needs of the market and provide the information required for enhanced corporate transparency and accountability. Worldwide, regulators view narrative disclosures as the key to achieving the desired step-change in the quality of corporate reporting. In recent years, accounting researchers have increasingly focused their efforts on investigating disclosure, and it is now recognised that there is an urgent need to develop disclosure metrics to facilitate research into voluntary disclosure and quality [Core, J. E. (2001). A review of the empirical disclosure literature. Journal of Accounting and Economics, 31(3), 441–456]. This paper responds to this call and contributes in two principal ways. First, the paper introduces to the academic literature a comprehensive four-dimensional framework for the holistic content analysis of accounting narratives and presents a computer-assisted methodology for implementing this framework. This procedure provides a rich descriptive profile of a company's narrative disclosures based on the coding of topic and three type attributes. Second, the paper explores the complex concept of quality, and the problematic nature of quality measurement. It makes a preliminary attempt to identify some of the attributes of quality (such as relative amount of disclosure and topic spread), suggests observable proxies for these, and offers a tentative summary measure of disclosure quality.
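
    To make the proxies concrete, here is a minimal sketch, with an invented coding scheme and weights (not the paper's), of deriving relative amount of disclosure and topic spread from coded narrative text units, then combining them into a tentative summary score.

    ```python
    # Minimal sketch (invented coding scheme and weights): two quality proxies
    # named in the abstract - relative amount of disclosure and topic spread -
    # computed from coded narrative text units, then combined.
    from collections import Counter

    # Hypothetical coded text units from one annual report: (topic, word count).
    coded_units = [
        ("strategy", 120), ("strategy", 80), ("environment", 60),
        ("risk", 150), ("risk", 40), ("employees", 30),
    ]

    words_by_topic = Counter()
    for topic, words in coded_units:
        words_by_topic[topic] += words
    total_words = sum(words_by_topic.values())

    BENCHMARK_WORDS = 10_000  # assumed benchmark narrative length
    TOPIC_SCHEME_SIZE = 8     # assumed number of topics in the coding scheme

    amount = min(total_words / BENCHMARK_WORDS, 1.0)  # relative amount of disclosure
    spread = len(words_by_topic) / TOPIC_SCHEME_SIZE  # topic spread

    quality = 0.5 * amount + 0.5 * spread  # tentative equal-weight summary measure
    print(f"amount={amount:.2f} spread={spread:.2f} quality={quality:.2f}")
    ```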