    Defect Prediction on the Hardware Repository - A Case Study on the OpenRISC1000 Project

    Software defect prediction is one of the most active research topics in the area of mining software engineering data. Software engineering data sources such as code repositories and bug databases contain rich information about software development history. Mining these data can guide software developers in future development activities and help managers improve the development process. Computer engineering has evolved rapidly since 1972, and modern chip design now superficially looks very much like software design. Hence, the main objective of this thesis is to check whether software defect prediction techniques can be applied to hardware repositories. In this thesis, we have applied various data mining methods (e.g., linear regression, logistic regression, random forests, and entropy) to predict the post-release bugs of OpenRISC 1000 projects. We have conducted two types of studies: classification (predicting buggy and non-buggy files) and ranking (predicting the buggiest files). In particular, the classification studies show promising results, with an average precision and recall of up to 74% and 70% for projects written in Verilog and close to 100% for projects written in C.
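
    The precision and recall figures quoted above can be illustrated with a minimal sketch. The labels below are hypothetical and are not taken from the OpenRISC 1000 study; the function name `precision_recall` is likewise an assumption made for illustration. A label of 1 stands for a buggy file and 0 for a non-buggy file.

```python
def precision_recall(actual, predicted):
    """Return (precision, recall) for binary labels, where 1 means 'buggy'."""
    pairs = list(zip(actual, predicted))
    true_pos = sum(1 for a, p in pairs if a == 1 and p == 1)
    false_pos = sum(1 for a, p in pairs if a == 0 and p == 1)
    false_neg = sum(1 for a, p in pairs if a == 1 and p == 0)
    # Guard against division by zero when nothing is predicted or nothing is buggy.
    precision = true_pos / (true_pos + false_pos) if (true_pos + false_pos) else 0.0
    recall = true_pos / (true_pos + false_neg) if (true_pos + false_neg) else 0.0
    return precision, recall


# Hypothetical ground truth and classifier output for eight files.
actual = [1, 1, 1, 0, 0, 0, 1, 0]
predicted = [1, 1, 0, 0, 1, 0, 1, 0]

precision, recall = precision_recall(actual, predicted)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

    Precision is the share of files flagged as buggy that really are buggy, and recall is the share of buggy files that were flagged.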

    IMPLEMENTATION OF DATA MINING ON SALES USING THE APRIORI ALGORITHM

    With the development of information technology, data mining has been applied in various fields, such as business and trade. The purpose of this study is to find a comprehensive way to organise unstructured sales data so that it can provide information on product stock at PT Selatan Indobatam Mandiri, and to find out how the company can manage transaction data easily so that it can be used as company information, by implementing data mining using the Apriori algorithm via Tanagra. Tanagra is free software for academic and research use. It covers many data mining techniques, from data analysis to statistical learning, and from machine learning to databases.
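
    The Apriori algorithm named above finds itemsets that occur together in at least a chosen fraction of transactions. The following is a minimal plain-Python sketch of that idea, not Tanagra's implementation; the basket contents, the function name `apriori` and the support threshold are assumptions made for illustration.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {itemset: support} for every itemset meeting min_support."""
    transactions = [frozenset(t) for t in transactions]
    total = len(transactions)
    # Level 1 candidates: every single item seen in any transaction.
    candidates = {frozenset([item]) for t in transactions for item in t}
    frequent = {}
    size = 1
    while candidates:
        # Count how many transactions contain each candidate itemset.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        level = {c: n / total for c, n in counts.items() if n / total >= min_support}
        frequent.update(level)
        # Join step: combine frequent itemsets into candidates one item larger.
        size += 1
        joined = {a | b for a in level for b in level if len(a | b) == size}
        # Prune step: every (size - 1)-subset of a candidate must itself be frequent.
        candidates = {
            c for c in joined
            if all(frozenset(s) in level for s in combinations(c, size - 1))
        }
    return frequent


# Hypothetical sales transactions.
baskets = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk", "butter"},
]

frequent = apriori(baskets, min_support=0.5)
for itemset, support in sorted(frequent.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), support)
```

    The prune step is what distinguishes Apriori from brute-force counting: a larger itemset is only counted if all of its smaller subsets were already frequent.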

    ASTRONOMICAL PLATE ARCHIVES AND SUPERMASSIVE BLACK HOLE BINARIES

    The recent extensive digitisation of astronomical photographic plate archives, the development of new dedicated software and the use of powerful computers have for the first time enabled effective data mining in extensive plate databases, with wide applications in various fields of contemporary astrophysics. As an example, analyses of supermassive binary black holes (binary blazars) require very long time intervals (50 years and more), which cannot be provided by other data sources. Examples of data obtained from data mining in plate archives are presented and briefly discussed.

    BAGEL2: mining for bacteriocins in genomic data

    Mining bacterial genomes for bacteriocins is a challenging task due to the substantial structure and sequence diversity, and generally small sizes, of these antimicrobial peptides. Major progress in the research of antimicrobial peptides and the ever-increasing quantities of genomic data, varying from (un)finished genomes to meta-genomic data, led us to develop the significantly improved genome mining software BAGEL2, as a follow-up of our previous BAGEL software. BAGEL2 identifies putative bacteriocins on the basis of conserved domains, physical properties and the presence of biosynthesis, transport and immunity genes in their genomic context. The software supports parameter-free, class-specific mining and has high-throughput capabilities. Besides building an expert validated bacteriocin database, we describe the development of novel Hidden Markov Models (HMMs) and the interpretation of combinations of HMMs via simple decision rules for prediction of bacteriocin (sub-)classes. Furthermore, the genetic context is automatically annotated based on (combinations of) PFAM domains and databases of known context genes. The scoring system was fine-tuned using expert knowledge on data derived from screening all bacterial genomes currently available at the NCBI. BAGEL2 is freely accessible at http://bagel2.molgenrug.nl

    Brief Introduction of Data Mining and Data Warehousing

    Over the past two decades there has been a huge increase in the amount of data being stored in databases, as well as in the number of database applications in business and the scientific domain. This explosion in the amount of electronically stored data was accelerated by the success of the relational model for storing data and by the development and maturing of data retrieval and manipulation technologies. While technology for storing the data developed fast to keep up with demand, little attention was paid to developing software for analysing the data until recently, when companies realized that hidden within these masses of data was a resource that was being ignored. Database Management Systems used to manage these data sets at present only allow the user to access information explicitly present in the databases, i.e. the data. Contained implicitly within this data is knowledge about a number of aspects of their business, waiting to be harnessed and used for more effective business decision support. This extraction of knowledge from large data sets is called Data Mining or Knowledge Discovery in Databases, and is defined as the non-trivial extraction of implicit, previously unknown and potentially useful information from data. The obvious benefit of Data Mining has resulted in a lot of resources being directed towards its development.

    The Digital Puglia Project: An Active Digital Library of Remote Sensing Data

    The growing need for software infrastructure able to create, maintain and ease the evolution of scientific data promotes the development of digital libraries that provide the user with fast and reliable access to data. In a world that is rapidly changing, the standard view of a digital library as a data repository specialized to a community of users and provided with some search tools is no longer tenable. To be effective, a digital library should be an active digital library, meaning that users can process available data not just to retrieve a particular piece of information, but to infer new knowledge about the data at hand. Digital Puglia is a new project, conceived to emphasize not only retrieval of data to the client's workstation, but also customized processing of the data. Such processing tasks may include data mining, filtering and knowledge discovery in huge databases, compute-intensive image processing (such as principal component analysis, supervised classification, or pattern matching) and on-demand computing sessions. We describe the issues, the requirements and the underlying technologies of the Digital Puglia Project, whose final goal is to build a high-performance, distributed and active digital library of remote sensing data.

    Applying data mining to software development projects : a case study

    One of the main challenges that project managers face while building a software development project (SDP) is to optimise the values of the parameters that measure the viability of the final process. This task, which was not easy at the beginning, was helped by the appearance of dynamic models and simulation environments. The application of data mining techniques to the management of SDPs is not an uncommon phenomenon, as in any other productive process that generates information in the form of input data and output variables. In this paper, we present and analyze the results obtained from a tool, developed by the authors, based on a Knowledge Discovery in Databases (KDD) technique. One of the most important contributions of these techniques to the software engineering field is the possibility of improving the management process of an SDP. The purpose is to provide accurate decision rules in order to help the project manager take decisions during development.

    TLAD 2011 Proceedings: 9th International Workshop on Teaching, Learning and Assessment of Databases (TLAD)

    This is the ninth in the series of highly successful international workshops on the Teaching, Learning and Assessment of Databases (TLAD 2011), which once again is held as a workshop of BNCOD 2011 - the 28th British National Conference on Databases. TLAD 2011 is held on the 11th July at Manchester University, just before BNCOD, and hopes to be just as successful as its predecessors. The teaching of databases is central to all Computing Science, Software Engineering, Information Systems and Information Technology courses, and this year, the workshop aims to continue the tradition of bringing together both database teachers and researchers, in order to share good learning, teaching and assessment practice and experience, and further the growing community amongst database academics. As well as attracting academics from the UK community, the workshop has also been successful in attracting academics from the wider international community, through serving on the programme committee, and attending and presenting papers. Due to the healthy number of high quality submissions this year, the workshop will present eight peer reviewed papers. Of these, six will be presented as full papers and two as short papers. These papers cover a number of themes, including: the teaching of data mining and data warehousing, databases and the cloud, and novel uses of technology in teaching and assessment. It is expected that these papers will stimulate discussion at the workshop itself and beyond. This year, the focus on providing a forum for discussion is enhanced through a panel discussion on assessment in database modules, with David Nelson (of the University of Sunderland), Al Monger (of Southampton Solent University) and Charles Boisvert (of Sheffield Hallam University) as the expert panel.
