Bayesian Non-Exhaustive Classification A Case Study: Online Name Disambiguation using Temporal Record Streams

Authors: Bunescu R.; Chen P.-Y.; Davis A.; de Carvalho A. P.; Dundar M.; Lee D. D.; Michaud D. J.; Sethuraman J.; Zhang B.
Publication date: 01/09/2016

The name entity disambiguation task aims to partition the records of multiple real-life persons so that each partition contains records pertaining to a unique person. Most existing solutions for this task operate in batch mode, where all records to be disambiguated are initially available to the algorithm. However, more realistic settings require that name disambiguation be performed in an online fashion and that the method be able to identify records of new ambiguous entities that have no preexisting records. In this work, we propose a Bayesian non-exhaustive classification framework for solving the online name disambiguation task. Our proposed method uses a Dirichlet process prior with a Normal × Normal × Inverse Wishart data model, which enables identification of new ambiguous entities who have no records in the training data. For online classification, we use a one-sweep Gibbs sampler, which is very efficient and effective. As a case study, we consider bibliographic data in a temporal stream format and disambiguate authors by partitioning their papers into homogeneous groups. Our experimental results demonstrate that the proposed method is better than existing methods for performing the online name disambiguation task. Comment: to appear in CIKM 201
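The idea of a Dirichlet process prior combined with a single online pass can be illustrated with a much simpler model than the one in the abstract. The sketch below is not the authors' method: it assumes spherical Gaussian classes with a known variance in place of the Normal × Normal × Inverse Wishart model, and the class name `OnlineDPClassifier`, its parameters, and the toy stream are all hypothetical. Each arriving record is assigned once, either to an existing class (weight proportional to class size times predictive density) or to a brand-new class (weight proportional to the concentration `alpha`).

```python
import math
import random


class OnlineDPClassifier:
    """One-pass Dirichlet-process classifier (simplified, illustrative)."""

    def __init__(self, alpha=1.0, sigma2=1.0, mu0=0.0, tau2=100.0, seed=0):
        self.alpha = alpha    # DP concentration: propensity to open a new class
        self.sigma2 = sigma2  # assumed known within-class variance per dimension
        self.mu0 = mu0        # prior mean of each class-mean coordinate
        self.tau2 = tau2      # prior variance of each class-mean coordinate
        self.counts = []      # number of records per class
        self.sums = []        # per-class coordinate sums
        self.rng = random.Random(seed)

    def _log_predictive(self, x, n, sums):
        # Conjugate normal-normal update of the class mean after n records,
        # followed by the Gaussian posterior predictive density of x.
        prec = 1.0 / self.tau2 + n / self.sigma2
        var = self.sigma2 + 1.0 / prec
        logp = 0.0
        for xi, si in zip(x, sums):
            mean = (self.mu0 / self.tau2 + si / self.sigma2) / prec
            logp += -0.5 * math.log(2 * math.pi * var) - (xi - mean) ** 2 / (2 * var)
        return logp

    def observe(self, x, greedy=False):
        """Assign record x to a class index, opening a new class if needed."""
        logw = [math.log(n) + self._log_predictive(x, n, s)
                for n, s in zip(self.counts, self.sums)]
        # The last entry is the "new class" option, weighted by alpha.
        logw.append(math.log(self.alpha)
                    + self._log_predictive(x, 0, [0.0] * len(x)))
        if greedy:
            k = max(range(len(logw)), key=logw.__getitem__)
        else:
            top = max(logw)
            weights = [math.exp(v - top) for v in logw]
            k = self.rng.choices(range(len(logw)), weights=weights)[0]
        if k == len(self.counts):
            self.counts.append(0)
            self.sums.append([0.0] * len(x))
        self.counts[k] += 1
        self.sums[k] = [a + b for a, b in zip(self.sums[k], x)]
        return k


# Hypothetical 2-D feature vectors arriving as a stream.
clf = OnlineDPClassifier()
stream = [(0.0, 0.1), (0.2, -0.1), (10.0, 10.1), (9.9, 10.2), (0.1, 0.0)]
labels = [clf.observe(x, greedy=True) for x in stream]
print(labels, clf.counts)
```

With `greedy=False` the assignment is drawn from the conditional distribution, as a single Gibbs sweep would do; `greedy=True` takes the most probable class and is used here only to keep the demonstration deterministic.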

arXiv.org e-Print Archive

Crossref

IUPUIScholarWorks

Data quality: Some comments on the NASA software defect datasets

Authors: Mair C; Shepperd M; Song Q; Sun Z
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/09/2013

Background: Self-evidently, empirical analyses rely upon the quality of their data. Likewise, replications rely upon accurate reporting and upon using the same, rather than similar, versions of datasets. In recent years, there has been much interest in using machine learners to classify software modules into defect-prone and not defect-prone categories. The publicly available NASA datasets have been extensively used as part of this research. Objective: This short note investigates the extent to which published analyses based on the NASA defect datasets are meaningful and comparable. Method: We analyze the five studies published in the IEEE Transactions on Software Engineering since 2007 that have utilized these datasets and compare the two versions of the datasets currently in use. Results: We find important differences between the two versions of the datasets, implausible values in one dataset, and generally insufficient detail documented on dataset preprocessing. Conclusions: It is recommended that researchers 1) indicate the provenance of the datasets they use, 2) report any preprocessing in sufficient detail to enable meaningful replication, and 3) invest effort in understanding the data prior to applying machine learners.

Crossref

UAL Research Online

Brunel University Research Archive

A Meta-analytical Comparison of Naive Bayes and Random Forest for Software Defect Prediction

Authors: Awais Ch Muhammad; Dlamini Gcinizwe; Gu Wei; Kholmatova Zamira; Succi Giancarlo
Publication date: 27/06/2023

Is there a statistical difference between Naive Bayes and Random Forest in terms of recall, f-measure, and precision for predicting software defects? We answer this question using a systematic literature review and meta-analysis. We conducted the systematic literature review by establishing criteria to search for and choose papers, resulting in five studies. Then, using the meta-data and forest plots of the five chosen papers, we conducted a meta-analysis to compare the two models. The results show no statistically significant evidence that Naive Bayes performs differently from Random Forest in terms of recall, f-measure, and precision. Comment: 11 pages, 8 figures, Conference Paper
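As a rough illustration of how a meta-analysis pools per-study comparisons, the sketch below computes an inverse-variance, fixed-effect estimate with a 95% confidence interval. The abstract does not state which pooling model the authors used, so the fixed-effect choice is an assumption, and the function name `pool_fixed_effect` and all numbers are hypothetical rather than taken from the five studies.

```python
import math


def pool_fixed_effect(effects, std_errors, z=1.96):
    """Inverse-variance fixed-effect pooling of per-study effect sizes.

    effects:    per-study differences (e.g. recall of Naive Bayes minus
                recall of Random Forest)
    std_errors: standard error of each difference
    Returns the pooled effect and its confidence interval (95% for z=1.96).
    """
    weights = [1.0 / se ** 2 for se in std_errors]  # precise studies count more
    total = sum(weights)
    pooled = sum(w * e for w, e in zip(weights, effects)) / total
    pooled_se = math.sqrt(1.0 / total)
    return pooled, (pooled - z * pooled_se, pooled + z * pooled_se)


# Hypothetical recall differences from five studies (not real data).
effects = [0.02, -0.03, 0.01, -0.01, 0.00]
std_errors = [0.04, 0.05, 0.03, 0.06, 0.04]

pooled, (low, high) = pool_fixed_effect(effects, std_errors)
no_significant_difference = low <= 0.0 <= high
print(f"pooled={pooled:.4f}, CI=({low:.4f}, {high:.4f}), "
      f"interval contains zero: {no_significant_difference}")
```

When the interval contains zero, the pooled data give no evidence of a difference at that confidence level, which is the kind of conclusion the abstract reports.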

arXiv.org e-Print Archive

DOME: recommendations for supervised machine learning validation in biology

Authors: Broin P. O.; Capella-Gutierrez S.; Capriotti E.; Casadio R.; Cirillo D.; Del Angel V. D.; Del Conte A.; Dimopoulos A. C.; Dopazo J.; Fariselli P.; Fernandez J. M.; Fishman D.; Garcia-Gasulla D.; Harrow J.; Huber F.; Kreshuk A.; Lenaerts T.; Martelli P. L.; Navarro A.; Pinero J.; Piovesan D.; Pollastri G.; Psomopoulos F. E.; Reczko M.; Ronzano F.; Salgado D.; Satagopam V.; Savojardo C.; Spiwok V.; Tangaro M. A.; Tartari G.; Titma T.; Tosatto S. C. E.; Valencia A.; Walsh I.; Zambelli F.
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2021

Institutional Research Information System University of Turin

Learning design – making practice explicit

Author: Conole Grainne
Publication date: 28/06/2010

New technologies have immense potential for learning, but the sheer variety possible also creates challenges for learners in terms of navigating through an increasingly complex digital landscape and for teachers in terms of how to design and support learning interventions. How can learners and teachers make informed decisions about what technologies to use in the design and support of learning activities? This presentation will consider this question and present a new methodology for design – 'learning design', which aims to shift the creation and support of learning from what has traditionally been an implicit, belief-based practice to one that is explicit and design-based. Learning design research at the Open University, UK has included the development of a set of conceptual design views, a tool for visualising designs (CompendiumLD), and a social networking site for sharing and discussing learning and teaching ideas and designs (Cloudworks). An overview of this work will be provided, along with a discussion of the perceived benefits of this new approach to educational design.

Open Research Online (The Open University)

Knowledge Cartography: Software tools and mapping techniques

Publication venue: 'Springer Fachmedien Wiesbaden GmbH'
Publication date: 15/09/2008

Knowledge Cartography is the discipline of mapping intellectual landscapes. The focus of this book is on the process by which manually crafting interactive, hypertextual maps clarifies one’s own understanding, as well as communicating it. The authors see mapping software as a set of visual tools for reading and writing in a networked age. In an information ocean, the primary challenge is to find meaningful patterns around which we can weave plausible narratives. Maps of concepts, discussions and arguments make the connections between ideas tangible and disputable. With 17 chapters from the leading researchers and practitioners, the reader will find the current state of the art in the field. Part 1 focuses on educational applications in schools and universities, before Part 2 turns to applications in professional communities.

Open Research Online (The Open University)