Search CORE

34,607 research outputs found

OpenML: networked science in machine learning

Author: Bischl Bernd
Torgo Luis
van Rijn Jan N.
Vanschoren Joaquin
Publication venue: Association for Computing Machinery (ACM)
Publication date: 01/08/2014

Many sciences have made significant breakthroughs by adopting online tools that help organize, structure and mine information that is too detailed to be printed in journals. In this paper, we introduce OpenML, a place for machine learning researchers to share and organize data in fine detail, so that they can work more effectively, be more visible, and collaborate with others to tackle harder problems. We discuss how OpenML relates to other examples of networked science and what benefits it brings for machine learning research, individual scientists, as well as students and practitioners. Comment: 12 pages, 10 figures.

arXiv.org e-Print Archive

CiteSeerX

CASP-DM: Context Aware Standard Process for Data Mining

Author: Contreras-Ochando Lidia
Ferri Cèsar
Flach Peter
Hernández-Orallo José
Kull Meelis
Lachiche Nicolas
Martínez-Plumed Fernando
Ramírez-Quintana María José
Publication date: 19/09/2017

We propose an extension of the Cross Industry Standard Process for Data Mining (CRISP-DM) which addresses specific challenges of machine learning and data mining for handling context and model reuse. This new general context-aware process model is mapped to the CRISP-DM reference model, proposing some new or enhanced outputs.

arXiv.org e-Print Archive

Explore Bristol Research

Workflow Patterns for Business Process Modeling

Author: Lochpe Cirano
Reichert Manfred
Thom Lucineia Heloisa
Publication venue: Tapir Academic Press
Publication date: 01/01/2007

Because of their reuse advantages, workflow patterns (e.g., control flow patterns, data patterns, resource patterns) are increasingly attracting the interest of both researchers and vendors. Frequently, business process or workflow models can be assembled out of a set of recurrent process fragments (or recurrent business functions), each of them having generic semantics that can be described as a pattern. To the best of our knowledge, there has so far been no (empirical) work evidencing the existence of such recurrent patterns in real workflow applications. Thus, in this paper we examine the frequency with which certain patterns occur in practice. Furthermore, we investigate the completeness of workflow patterns (based on recurrent functions) with respect to their ability to capture a large variety of business processes.

CiteSeerX

DBIS EPub

University of Twente Research Information

Nanoinformatics: developing new computing applications for nanomedicine

Author: Alberto Anguita
Alejandro Pazos
Antoine Geissbuhler
B Smith
BY Kim
C Kulikowski
C Rosse
CA Kulikowski
Casimir Kulikowski
Cristian Munteanu
D Dela Iglesia
David Perez-Rey
DG Thomas
Diana De la Iglesia
ED Green
F Martin-Sanchez
Fernando Gonzalez-Nilo
Fernando Martin-Sanchez
Ferran Sanz
George Potamias
Guillermo De la Calle
Guillermo Lopez-Campos
H Berman
IS Kohane
Isabel Hermosilla
Jose Crespo
Jose Maria Barreiro
Josipa Kern
Joyce A. Mitchell
Julio C. Facelli
K Jain
Luciano Milanesi
M Gerstein
M Viceconti
Martin Fritts
Miguel Garcia-Remesal
N Gordon
NA Baker
Nathan Baker
Norbert Graf
P Kiberstis
Paula Otero
Peter Ghazal
Pierre Grangeat
Rada Hussein
Raul E. Cachau
RB Altman
S Bewick
Sabine Koch
SI O’Donoghue
Sonia E. Benitez
V Maojo
Vassilis Moustakis
Victor Maojo
Victoria Lopez-Alonso
Yannick Legre
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2012

Nanoinformatics has recently emerged to address the need for computing applications at the nano level. In this regard, the authors have participated in various initiatives to identify its concepts, foundations and challenges. While nanomaterials open up the possibility of developing new devices in many industrial and scientific areas, they also offer breakthrough perspectives for the prevention, diagnosis and treatment of diseases. In this paper, we analyze the different aspects of nanoinformatics and suggest five research topics to help catalyze new research and development in the area, particularly focused on nanomedicine. We also encompass the use of informatics to further the biological and clinical applications of basic research in nanoscience and nanotechnology, and the related concept of an extended "nanotype" to coalesce information related to nanoparticles. We suggest how nanoinformatics could accelerate developments in nanomedicine, similarly to what happened with the Human Genome and other -omics projects, on issues like exchanging modeling and simulation methods and tools, linking toxicity information to clinical and personal databases or developing new approaches for scientific ontologies, among many others.

Repositorio da Universidade da Coruña

Crossref

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

Online Research @ Cardiff

Springer - Publisher Connector

DSpace Universidad de Talca

PubMed Central

Edinburgh Research Explorer

Archivo Digital UPM

Archive ouverte UNIGE

Source File Set Search for Clone-and-Own Reuse Analysis

Author: Inoue Katsuro
Ishio Takashi
Ito Kaoru
Sakaguchi Yusuke
Publication date: 01/01/2017

The clone-and-own approach is a natural way of reusing source code for software developers. To assess how known bugs and security vulnerabilities of a cloned component affect an application, developers and security analysts need to identify the original version of the component and understand how the cloned component differs from the original one. Although developers may record the original version information in a version control system and/or directory names, such information is often either unavailable or incomplete. In this research, we propose a code search method that takes as input a set of source files and extracts all the components including similar files from a software ecosystem (i.e., a collection of existing versions of software packages). Our method employs an efficient file similarity computation using the b-bit minwise hashing technique. We use an aggregated file similarity for ranking components. To evaluate the effectiveness of this tool, we analyzed 75 cloned components in Firefox and Android source code. The tool took about two hours to report the original components from 10 million files in Debian GNU/Linux packages. Recall of the top-five components in the extracted lists is 0.907, while recall of a baseline using SHA-1 file hash is 0.773, according to the ground truth recorded in the source code repositories. Comment: 14th International Conference on Mining Software Repositories.
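The abstract above names b-bit minwise hashing as the basis of its file similarity computation. The following is a minimal sketch of that general technique, not the authors' tool: the tokenizer, the trigram shingle size, the use of seeded BLAKE2b as a hash family, and all function names are assumptions made for illustration. The estimator uses the common simplification that two unrelated b-bit values collide with probability 1/2^b, which is reasonable only when the shingle sets are small relative to the hash space. The component-ranking step (aggregated file similarity) is not sketched here.

```python
import hashlib
import re


def token_ngrams(text, n=3):
    """Split source text into tokens and return the set of token n-grams.

    The tokenizer and n=3 are illustrative assumptions, not taken from the paper.
    """
    tokens = re.findall(r"\w+|[^\w\s]", text)
    if not tokens:
        return set()
    if len(tokens) < n:
        return {tuple(tokens)}
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def _seeded_hash(seed, item):
    """64-bit hash of one n-gram; each seed acts as a separate hash function."""
    data = f"{seed}:{' '.join(item)}".encode("utf-8")
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "big")


def bbit_signature(items, k=256, b=1):
    """Keep only the lowest b bits of each of k min-hash values."""
    if not items:
        raise ValueError("cannot build a signature for an empty set")
    mask = (1 << b) - 1
    return [min(_seeded_hash(seed, it) for it in items) & mask
            for seed in range(k)]


def estimate_jaccard(sig_a, sig_b, b=1):
    """Estimate Jaccard similarity from two b-bit signatures.

    Subtracts the chance-collision rate 1/2**b (simplified estimator,
    assuming sparse sets) and clamps the result at zero.
    """
    if len(sig_a) != len(sig_b) or not sig_a:
        raise ValueError("signatures must be non-empty and of equal length")
    match_rate = sum(x == y for x, y in zip(sig_a, sig_b)) / len(sig_a)
    chance = 1 / (1 << b)
    return max(0.0, (match_rate - chance) / (1 - chance))
```

To compare two files under these assumptions, build one signature per file with the same `k` and `b`, then pass both to `estimate_jaccard`. Smaller `b` shrinks each signature but raises the variance of the estimate, so a larger `k` is usually needed to compensate.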

arXiv.org e-Print Archive

NAIST Academic Repository

Crossref

Chemical information matters: an e-Research perspective on information and data sharing in the chemical sciences

Author: Bird Colin
Frey Jeremy G.
Publication venue: Royal Society of Chemistry (RSC)
Publication date: 01/01/2013

Recently, a number of organisations have called for open access to scientific information and especially to the data obtained from publicly funded research, among which the Royal Society report and the European Commission press release are particularly notable. It has long been accepted that building research on the foundations laid by other scientists is both effective and efficient. Regrettably, some disciplines, chemistry being one, have been slow to recognise the value of sharing and have thus been reluctant to curate their data and information in preparation for exchanging it. The very significant increases in both the volume and the complexity of the datasets produced have encouraged the expansion of e-Research, and stimulated the development of methodologies for managing, organising, and analysing "big data". We review the evolution of cheminformatics, the amalgam of chemistry, computer science, and information technology, and assess the wider e-Science and e-Research perspective. Chemical information does matter, as do matters of communicating data and collaborating with data. For chemistry, unique identifiers, structure representations, and property descriptors are essential to the activities of sharing and exchange. Open science entails the sharing of more than mere facts: for example, the publication of negative outcomes can facilitate better understanding of which synthetic routes to choose, an aspiration of the Dial-a-Molecule Grand Challenge. The protagonists of open notebook science go even further and exchange their thoughts and plans. We consider the concepts of preservation, curation, provenance, discovery, and access in the context of the research lifecycle, and then focus on the role of metadata, particularly the ontologies on which the emerging chemical Semantic Web will depend. Among our conclusions, we present our choice of the "grand challenges" for the preservation and sharing of chemical information.

Southampton (e-Prints Soton)