On Using Active Learning and Self-Training when Mining Performance Discussions on Stack Overflow
Abundant data is the key to successful machine learning. However, supervised
learning requires annotated data that are often hard to obtain. In a
classification task with limited resources, Active Learning (AL) promises to
guide annotators to examples that bring the most value for a classifier. AL can
be successfully combined with self-training, i.e., extending a training set
with the unlabelled examples for which a classifier is the most certain. We
report our experiences on using AL in a systematic manner to train an SVM
classifier for Stack Overflow posts discussing performance of software
components. We show that the training examples deemed as the most valuable to
the classifier are also the most difficult for humans to annotate. Despite
carefully evolved annotation criteria, we report low inter-rater agreement, but
we also propose mitigation strategies. Finally, based on one annotator's work,
we show that self-training can improve the classification accuracy. We conclude
the paper by discussing implications for future text miners aspiring to use AL
and self-training.
Comment: Preprint of paper accepted for the Proc. of the 21st International Conference on Evaluation and Assessment in Software Engineering, 201
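The self-training idea described above — extending the training set with the unlabelled examples the classifier is most certain about — can be sketched as follows. This is a minimal illustration using scikit-learn on synthetic data, not the paper's actual Stack Overflow setup; the 0.95 confidence threshold and the number of rounds are arbitrary assumptions.

```python
# Minimal self-training sketch: an SVM is retrained after absorbing the
# unlabelled examples it classifies with highest confidence.
# Synthetic data stands in for annotated Stack Overflow posts.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.svm import SVC

X, y = make_classification(n_samples=400, n_features=20, random_state=0)
X_lab, y_lab = X[:50], y[:50]      # small labelled seed set
X_unlab = X[50:]                   # pool of unlabelled examples

clf = SVC(probability=True, random_state=0).fit(X_lab, y_lab)
for _ in range(3):                 # a few self-training rounds
    proba = clf.predict_proba(X_unlab)
    confident = proba.max(axis=1) > 0.95   # assumed threshold
    if not confident.any():
        break
    # Pseudo-label the confident examples and fold them into the training set.
    X_lab = np.vstack([X_lab, X_unlab[confident]])
    y_lab = np.concatenate([y_lab, proba[confident].argmax(axis=1)])
    X_unlab = X_unlab[~confident]
    clf = SVC(probability=True, random_state=0).fit(X_lab, y_lab)
```

In an AL setting, the examples the classifier is *least* certain about would instead be routed to human annotators, so the two strategies consume opposite ends of the same confidence ranking.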
KULTUR: showcasing art through institutional repositories
Showcasing work has always been at the heart of the arts community, whether it be through an exhibition, site-specific installation or performance. Representation of the original work has also been important, and the use of print-based options like exhibition catalogues is now complemented by websites and multimedia-friendly services like Flickr, YouTube, and Vimeo. These services also provide options for sharing born-digital material. For those working in higher education there is a need to profile both the personal and the institutional aspects of creative outputs. The KULTUR project created a model for arts-based institutional repositories, and it is hoped that this approach will be useful for other arts institutions.
LODE: Linking Digital Humanities Content to the Web of Data
Numerous digital humanities projects maintain their data collections in the
form of text, images, and metadata. While data may be stored in many formats,
from plain text to XML to relational databases, the use of the resource
description framework (RDF) as a standardized representation has gained
considerable traction during the last five years. Almost every digital
humanities meeting has at least one session concerned with the topic of digital
humanities, RDF, and linked data. While most existing work in linked data has
focused on improving algorithms for entity matching, the aim of the
LinkedHumanities project is to build digital humanities tools that work "out of
the box," enabling their use by humanities scholars, computer scientists,
librarians, and information scientists alike. With this paper, we report on the
Linked Open Data Enhancer (LODE) framework developed as part of the
LinkedHumanities project. With LODE, we enable non-technical users to enrich a
local RDF repository with high-quality data from the Linked Open Data cloud.
LODE links and enhances the local RDF repository without compromising the
quality of the data. In particular, LODE supports the user in the enhancement
and linking process by providing intuitive user-interfaces and by suggesting
high-quality linking candidates using tailored matching algorithms. We hope
that the LODE framework will be useful to digital humanities scholars
complementing other digital humanities tools.
Joining up health and bioinformatics: e-science meets e-health
CLEF (Co-operative Clinical e-Science Framework) is an MRC-sponsored project in the e-Science programme that aims to establish methodologies and a technical infrastructure for the next generation of integrated clinical and bioscience research. It is developing methods for managing and using pseudonymised repositories of long-term patient histories, which can be linked to genetic and genomic information or used to support patient care. CLEF concentrates on removing key barriers to managing such repositories: ethical issues, information capture, integration of disparate sources into coherent "chronicles" of events, user-oriented mechanisms for querying and displaying the information, and compiling the required knowledge resources. This paper describes the overall information flow and technical approach designed to meet these aims within a Grid framework.
Assigning Creative Commons Licenses to Research Metadata: Issues and Cases
This paper discusses the problem of lack of clear licensing and transparency
of usage terms and conditions for research metadata. Making research data
connected, discoverable and reusable are the key enablers of the new data
revolution in research. We discuss how the lack of transparency hinders
discovery of research data and makes it disconnected from publications and
other trusted research outcomes. In addition, we discuss the application of
Creative Commons licenses for research metadata, and provide some examples of
the applicability of this approach to internationally known data
infrastructures.
Comment: 9 pages. Submitted to the 29th International Conference on Legal Knowledge and Information Systems (JURIX 2016), Nice (France), 14-16 December 201
Mining Threat Intelligence about Open-Source Projects and Libraries from Code Repository Issues and Bug Reports
Open-source projects and libraries are widely used in software development while also bearing multiple security vulnerabilities. This use of third-party ecosystems creates a new kind of attack surface for a product in development. An intelligent attacker can attack a product by exploiting one of the vulnerabilities present in linked projects and libraries.
In this paper, we mine threat intelligence about open source projects and
libraries from bugs and issues reported on public code repositories. We also
track library and project dependencies for installed software on a client
machine. We represent and store this threat intelligence, along with the
software dependencies in a security knowledge graph. Security analysts and
developers can then query and receive alerts from the knowledge graph if any
threat intelligence is found about linked libraries and projects, utilized in
their products.
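The alerting step described above — following a product's dependency edges in the knowledge graph and flagging any library with associated threat intelligence — can be sketched as a toy in-memory graph. The product, library, and issue names below are hypothetical, and a real deployment would query an actual knowledge-graph store rather than Python dicts.

```python
# Toy sketch of the knowledge-graph alert query: dependency edges are
# followed from a product, and any linked library carrying threat
# intelligence mined from repository issues produces an alert.
depends_on = {"my-product": ["libfoo", "libbar"]}   # hypothetical dependencies
threat_intel = {                                    # hypothetical mined reports
    "libfoo": ["issue #123: buffer overflow reported in parser"],
}

def alerts_for(product):
    """Return (library, report) pairs for the product's dependencies."""
    found = []
    for lib in depends_on.get(product, []):
        for report in threat_intel.get(lib, []):
            found.append((lib, report))
    return found
```

In the paper's setting, both dicts correspond to edges in a single security knowledge graph, so the same traversal also works transitively over dependencies of dependencies.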
The role of institutional repositories in addressing higher education challenges
Over the last decade, Higher Education around the world has faced a number of challenges. Challenges such as adopting new technologies, improving the quality of learning and teaching, widening participation, student retention, curriculum design/alignment, student employability, funding, and the need to improve governance are discussed extensively in the literature. To operate effectively and to survive in this era of globalization, Higher Education institutions need to respond to those challenges efficiently. This paper proposes ways in which institutional data repositories can be utilized to address the challenges identified in the literature. We also discuss which repositories can be shared across institutions and which need not be shared in order to address those challenges. Finally, the paper discusses the barriers to sharing Higher Education repositories and how those barriers can be addressed.
Economics and Engineering for Preserving Digital Content
Progress towards practical long-term preservation seems to be stalled. Preservationists cannot afford specially developed technology, but must exploit what is created for the marketplace.
Economic and technical facts suggest that most preservation work should be shifted from repository institutions to information producers and consumers. Prior publications describe solutions for all known conceptual challenges of preserving a single digital object, but do not deal with software development or scaling to large collections. Much of the document handling software needed is available. It has, however, not yet been selected, adapted, integrated, or deployed for digital preservation. The daily tools of both information producers and information consumers can be extended to embed preservation packaging without unduly burdening these users.
We describe a practical strategy for detailed design and implementation. Document handling is intrinsically complicated because of human sensitivity to communication nuances. Our engineering section therefore starts by discussing how project managers can master the many pertinent details.
Characterizing the Landscape of Musical Data on the Web: State of the Art and Challenges
Musical data can be analysed, combined, transformed and exploited for diverse purposes. However, despite the proliferation of digital libraries and repositories for music, and of supporting infrastructures and tools, such uses of musical data remain scarce. As an initial step to help fill this gap, we present a survey of the landscape of musical data on the Web, available as a Linked Open Dataset: the musoW dataset of catalogued musical resources. We present the dataset and the methodology and criteria for its creation and assessment. We map the identified dimensions and parameters to existing Linked Data vocabularies, present insights gained from SPARQL queries, and identify significant relations between resource features. We present a thematic analysis of the original research questions associated with surveyed resources and identify the extent to which the collected resources are Linked Data-ready.
Semantic Integration of Cervical Cancer Data Repositories to Facilitate Multicenter Association Studies: The ASSIST Approach
The current work addresses the unification of Electronic Health Records related to cervical cancer into a single medical knowledge source, in the context of the EU-funded ASSIST research project. The project aims to facilitate research on cervical precancer and cancer through a system that virtually unifies multiple patient record repositories, physically located in different medical centers/hospitals, thus increasing flexibility by allowing the formation of study groups “on demand” and by recycling patient records in new studies. To this end, ASSIST uses semantic technologies to translate all medical entities (such as patient examination results, history, habits, genetic profile) and represent them in a common form, encoded in the ASSIST Cervical Cancer Ontology. The current paper presents the knowledge elicitation approach followed towards the definition and representation of the disease’s medical concepts and rules that constitute the basis for the ASSIST Cervical Cancer Ontology. The proposed approach constitutes a paradigm for semantic integration of heterogeneous clinical data that may be applicable to other biomedical application domains.