Search CORE

60,200 research outputs found

Data mining and fusion

Author: Addis M. J.
Choi F.
Taylor S. J.
Upstill C.
Watkins E. R.
Publication venue: s.n.
Publication date: 01/04/2006
Field of study

Mapping Big Data into Knowledge Space with Cognitive Cyber-Infrastructure

Author: Zhuge Hai
Publication venue
Publication date: 18/07/2015
Field of study

Big data research has attracted great attention in science, technology, industry and society. It is developing with the evolving scientific paradigm, the fourth industrial revolution, and the transformational innovation of technologies. However, its nature and fundamental challenge have not been recognized, and its own methodology has not been formed. This paper explores and answers the following questions: What is big data? What are the basic methods for representing, managing and analyzing big data? What is the relationship between big data and knowledge? Can we find a mapping from big data into knowledge space? What kind of infrastructure is required to support not only big data management and analysis but also knowledge discovery, sharing and management? What is the relationship between big data and science paradigm? What is the nature and fundamental challenge of big data computing? A multi-dimensional perspective is presented toward a methodology of big data computing.Comment: 59 page

arXiv.org e-Print Archive

CiteSeerX

git2net - Mining Time-Stamped Co-Editing Networks from Large git Repositories

Author: Gote Christoph
Scholtes Ingo
Schweitzer Frank
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 25/03/2019
Field of study

Data from software repositories have become an important foundation for the empirical study of software engineering processes. A recurring theme in the repository mining literature is the inference of developer networks capturing e.g. collaboration, coordination, or communication from the commit history of projects. Most of the studied networks are based on the co-authorship of software artefacts defined at the level of files, modules, or packages. While this approach has led to insights into the social aspects of software development, it neglects detailed information on code changes and code ownership, e.g. which exact lines of code have been authored by which developers, that is contained in the commit log of software projects. Addressing this issue, we introduce git2net, a scalable python software that facilitates the extraction of fine-grained co-editing networks in large git repositories. It uses text mining techniques to analyse the detailed history of textual modifications within files. This information allows us to construct directed, weighted, and time-stamped networks, where a link signifies that one developer has edited a block of source code originally written by another developer. Our tool is applied in case studies of an Open Source and a commercial software project. We argue that it opens up a massive new source of high-resolution data on human collaboration patterns.Comment: MSR 2019, 12 pages, 10 figure

arXiv.org e-Print Archive

ZORA

Ontology-based knowledge representation of experiment metadata in biological data mining

Author: Burke Squires
Carl Dahlke
Hagler Herb
Herb Hagler
Jamie Lee
Jeff Wiser
Jennifer Cai
Karp David
Megan Kong
Patrick Dunn
Richard Scheuermann
Smith Barry
Yu Qian
Publication venue
Publication date: 01/01/2009
Field of study

According to the PubMed resource from the U.S. National Library of Medicine, over 750,000 scientific articles have been published in the ~5000 biomedical journals worldwide in the year 2007 alone. The vast majority of these publications include results from hypothesis-driven experimentation in overlapping biomedical research domains. Unfortunately, the sheer volume of information being generated by the biomedical research enterprise has made it virtually impossible for investigators to stay aware of the latest findings in their domain of interest, let alone to be able to assimilate and mine data from related investigations for purposes of meta-analysis. While computers have the potential for assisting investigators in the extraction, management and analysis of these data, information contained in the traditional journal publication is still largely unstructured, free-text descriptions of study design, experimental application and results interpretation, making it difficult for computers to gain access to the content of what is being conveyed without significant manual intervention. In order to circumvent these roadblocks and make the most of the output from the biomedical research enterprise, a variety of related standards in knowledge representation are being developed, proposed and adopted in the biomedical community. In this chapter, we will explore the current status of efforts to develop minimum information standards for the representation of a biomedical experiment, ontologies composed of shared vocabularies assembled into subsumption hierarchical structures, and extensible relational data models that link the information components together in a machine-readable and human-useable framework for data mining purposes

PhilPapers

Nanoinformatics 2010 Program

Author: Baker Nathan A
Chaka Anne
Cohen Yoram
Colvin Vicki
Fritts Martin
Geraci Charles L.
Hoover Mark D
Ku Sharon
Kulinowski Kristen M
Lippell Phil
Luo James
McLennan Michael
Morse Jeffrey
Ostraat Michele L
Rajan Krishna
Reznik-Zellen Rebecca
Schad Peter
Tuominen Mark T.
Publication venue
Publication date: 01/11/2010
Field of study

InterNano Nanomanufacturing Repository