    Improving Database Quality through Eliminating Duplicate Records

    Redundant or duplicate data are the most troublesome problem in database management and applications. Approximate field matching is the key to resolving this problem: it identifies semantically equivalent string values that have syntactically different representations. This paper considers token-based solutions and proposes a general field matching framework that generalizes the field matching problem across domains. By introducing the concept of String Matching Points (SMP) into string comparison, the approach improves string matching accuracy and efficiency compared with other commonly applied field matching algorithms. The paper discusses how field matching algorithms can be developed from this general framework. The framework and the corresponding algorithm are tested on a public data set, the NASA publication abstract database. The approach can also be applied to similar problems in other databases.
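
    The abstract does not define String Matching Points, so the following is not the paper's SMP algorithm. It is a minimal sketch of a generic token-based field matcher, assuming that two tokens match when one is a prefix (abbreviation) of the other, and scoring a pair of field values with a Dice-style ratio of matched tokens. All function names are hypothetical.

```python
import re
from typing import List


def tokenize(value: str) -> List[str]:
    """Lower-case a field value and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", value.lower())


def tokens_match(a: str, b: str) -> bool:
    """Assumed token rule: equal tokens, or one token is a prefix (abbreviation) of the other."""
    return a.startswith(b) or b.startswith(a)


def field_similarity(x: str, y: str) -> float:
    """Dice-style score: 2 * matched tokens / total tokens, with greedy one-to-one matching."""
    tx, ty = tokenize(x), tokenize(y)
    if not tx or not ty:
        return 0.0
    unmatched = list(ty)
    matches = 0
    for t in tx:
        for i, u in enumerate(unmatched):
            if tokens_match(t, u):
                matches += 1
                del unmatched[i]  # each token of y may be used only once
                break
    return 2 * matches / (len(tx) + len(ty))


print(field_similarity("J. Smith", "John Smith"))                               # 1.0
print(field_similarity("NASA Ames Research Center", "Ames Research Ctr, NASA"))  # 0.75
```

    The second pair shows the limit of a pure prefix rule: "ctr" is not a prefix of "center", so that token pair is missed. Token order is ignored, which is the main advantage of token-based over character-sequence comparison for fields such as names and affiliations.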

    Data Publication in the Open Access Initiative

    The ‘Berlin Declaration’ was published in 2003 as a guideline for policy makers on promoting the Internet as a functional instrument for a global scientific knowledge base. Because knowledge is derived from data, the principles of the ‘Berlin Declaration’ should apply to data as well. Today, access to scientific data is hampered by structural deficits in the publication process. Data publication needs to give authors an incentive to publish data through long-term repositories. It also requires an adequate licence model that protects the author's intellectual property rights while allowing the scientific community to reuse the data.

    Development of a Prototype for Critical Disease Predictions using Data Mining

    The goal of this paper is to present a breast cancer prediction prototype, together with heart disease prediction, using data mining techniques. The data used in the study were retrieved from Public-Use Data, which is available online, and comprised 699 breast cancer records and 909 heart disease records. The decision tree algorithms C4.5 and C5.0 were applied to the data for prediction and mining, and the results of both algorithms on both data sets were compared. The paper also outlines the significance of evidence-based medicine, a novel and innovative approach to healthcare decision making [5]. Clinical decisions must be supported by, and based on, scientific evidence so that they are sound and effective. The paper also describes the importance of data mining in modern healthcare.
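
    C4.5 chooses the attribute to split on by gain ratio: the information gain of the split divided by its split information, which penalises attributes with many distinct values. The following is a minimal sketch of that criterion for categorical attributes only; C4.5's handling of continuous attributes, missing values and pruning, and C5.0's further refinements, are omitted. The attribute and label values in the usage lines are hypothetical, not data from the study.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy, in bits, of a sequence of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def gain_ratio(feature_values, labels):
    """C4.5-style gain ratio of splitting `labels` on one categorical attribute."""
    n = len(labels)
    # Group the class labels by the attribute value they co-occur with.
    partitions = {}
    for value, label in zip(feature_values, labels):
        partitions.setdefault(value, []).append(label)
    # Information gain = entropy before the split - weighted entropy after it.
    remainder = sum(len(p) / n * entropy(p) for p in partitions.values())
    gain = entropy(labels) - remainder
    # Split information: entropy of the partition sizes themselves.
    split_info = -sum(len(p) / n * log2(len(p) / n) for p in partitions.values())
    return gain / split_info if split_info > 0 else 0.0


tumour_size = ["large", "large", "small", "small"]        # hypothetical attribute
diagnosis = ["malignant", "malignant", "benign", "benign"]
print(gain_ratio(tumour_size, diagnosis))  # 1.0
```

    At each node the tree builder would evaluate this score for every candidate attribute and split on the highest one.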

    Discussing the Role of Classification Algorithms in Clinical Predictions with help of Case Studies

    This paper discusses the important role of classification algorithms in clinical prediction. It presents two case studies, one on breast cancer prediction and the other on heart disease prediction, using classification data mining techniques. Both case studies use freely accessible public data from the Internet, consisting of 909 heart disease records and 699 breast cancer records. Two well-known decision tree algorithms, C4.5 and C5.0, were used to derive prediction rules, and these rules were used to improve the quality of an open-source Pathology Management System based on Care2x. The performance of the two algorithms is also compared. The paper further discusses the importance of open-source software in healthcare and how a pathology management system can adopt Evidence-Based Medicine (EBM). EBM is a new and important approach that can greatly improve decision making in healthcare; its task is to prevent, diagnose and treat diseases using medical evidence [5]. Clinical decisions must be based on scientific evidence that demonstrates effectiveness. This paper extends our previous work, ‘A Prototype of Cancer/Heart Disease Prediction Model Using Data Mining’.
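
    The abstract states that the decision trees were turned into prediction rules. A minimal sketch of the basic idea, assuming a hypothetical binary tree structure, reads each root-to-leaf path as one IF-THEN rule. The attribute names and thresholds are invented for illustration and are not results from the study; C4.5's own rule generator additionally simplifies the rules by dropping unnecessary conditions, which is not shown.

```python
from dataclasses import dataclass
from typing import List, Optional, Union


@dataclass
class Leaf:
    label: str  # predicted class at this leaf


@dataclass
class Node:
    attribute: str
    threshold: float
    left: "Tree"   # branch taken when attribute <= threshold
    right: "Tree"  # branch taken when attribute > threshold


Tree = Union[Leaf, Node]


def extract_rules(tree: Tree, conditions: Optional[List[str]] = None) -> List[str]:
    """Turn every root-to-leaf path into one IF ... THEN ... rule."""
    conditions = conditions or []
    if isinstance(tree, Leaf):
        premise = " AND ".join(conditions) if conditions else "TRUE"
        return [f"IF {premise} THEN {tree.label}"]
    return (
        extract_rules(tree.left, conditions + [f"{tree.attribute} <= {tree.threshold}"])
        + extract_rules(tree.right, conditions + [f"{tree.attribute} > {tree.threshold}"])
    )


# Hypothetical tree; attribute names and thresholds are illustrative only.
tree = Node("feature_a", 2.0, Leaf("benign"),
            Node("feature_b", 5.0, Leaf("benign"), Leaf("malignant")))
for rule in extract_rules(tree):
    print(rule)
```

    Rules in this form are independent of the tree data structure, which is what makes them easy to embed in another system such as a pathology management application.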

    Case-Based-Reasoning System for Feature Selection and Diagnosing Disease; Case Study: Asthma

    Asthma is a chronic inflammatory disease of the airways, and the reasons for the reported increase in its prevalence are still unclear. The aim of this research is to design a case-based reasoning (CBR) model that assists physicians in diagnosing the type of disease and the required therapy. To design the system, the disease variables were first identified and given to patients as a questionnaire; after the relevant data had been gathered, the CBR algorithm was applied to them to diagnose asthma. The system was tested on 325 asthmatic and non-asthmatic adult cases and achieved eighty percent accuracy. The results were promising. Because the factors of the disease differ between countries, the study was also performed to determine risk factors for asthma in Iranian society; the results showed that the most important variables of asthma in Iran are the symptoms hyperresponsiveness, frequency of cough, and cough. Key words: data mining, case based reasoning, asthma, diagnosis
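
    The abstract does not say how the CBR system measures similarity between cases. A common choice for the retrieval step is weighted nearest-neighbour matching, sketched minimally below under that assumption: questionnaire answers are assumed to be scaled to [0, 1], and the feature weights and stored cases are hypothetical, not values from the study. Only retrieval is shown, not the reuse, revision and retention steps of a full CBR cycle.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Case:
    features: Dict[str, float]  # questionnaire answers scaled to [0, 1]
    diagnosis: str


def similarity(a: Dict[str, float], b: Dict[str, float], weights: Dict[str, float]) -> float:
    """Weighted average of per-feature similarities 1 - |a_f - b_f|."""
    total = sum(weights.values())
    return sum(w * (1 - abs(a[f] - b[f])) for f, w in weights.items()) / total


def retrieve(query: Dict[str, float], case_base: List[Case], weights: Dict[str, float]) -> Case:
    """Return the stored case most similar to the query (1-nearest neighbour)."""
    return max(case_base, key=lambda case: similarity(query, case.features, weights))


# Hypothetical weights and cases, for illustration only.
weights = {"hyperresponsiveness": 3.0, "cough_frequency": 2.0, "cough": 1.0}
case_base = [
    Case({"hyperresponsiveness": 1.0, "cough_frequency": 0.8, "cough": 1.0}, "asthma"),
    Case({"hyperresponsiveness": 0.0, "cough_frequency": 0.2, "cough": 1.0}, "no asthma"),
]
query = {"hyperresponsiveness": 1.0, "cough_frequency": 0.6, "cough": 1.0}
print(retrieve(query, case_base, weights).diagnosis)  # asthma
```

    Feature weights are where findings such as the relative importance of individual symptoms would enter a system of this kind.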

    Assigning DOI Names to Social and Economic Data: Services of the Registration Agency da|ra

    GESIS Leibniz Institute for the Social Sciences and ZBW Leibniz Information Centre for Economics, in cooperation with DataCite, the international initiative to improve access to research data, operate a DOI registration service for social and economic data. This infrastructure creates an important prerequisite for the persistent identification, preservation, localization and, ultimately, reliable citability of research data from the social and economic sciences. To test the technical and organizational solutions for assigning DOI names, GESIS carried out a pilot project in 2010 for registering social science data. By now, 5,200 studies have been registered and more than 7,000 metadata records have been added to the information system. This article describes the technical and organizational implementation of the registration agency da|ra and shows how, during the project's establishment phase, the existing DOI registration system can also be used for economic research data from 2012 onward. Keywords: research data, data citation, persistent identifier, DOI names
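
    As background on what a registration agency such as da|ra assigns: a DOI name consists of a prefix, which begins with "10." followed by a registrant code, and a suffix chosen by the registrant, separated by the first "/"; it resolves through the https://doi.org/ proxy. The following is a minimal sketch of splitting and resolving a DOI name under those rules only. The example DOI 10.1234/study-0001 is hypothetical and not a da|ra identifier, and the check does not cover every rule of the DOI Handbook.

```python
from typing import NamedTuple
from urllib.parse import quote


class DOI(NamedTuple):
    prefix: str  # "10." followed by a registrant code
    suffix: str  # identifier string chosen by the registrant


def parse_doi(doi: str) -> DOI:
    """Split a DOI name at its first '/' and check the basic prefix rule."""
    prefix, sep, suffix = doi.strip().partition("/")
    if not sep or not suffix:
        raise ValueError(f"missing '/' or suffix in {doi!r}")
    if not prefix.startswith("10.") or len(prefix) <= 3:
        raise ValueError(f"prefix must be '10.' plus a registrant code: {prefix!r}")
    return DOI(prefix, suffix)


def resolver_url(doi: str) -> str:
    """Build a URL for the doi.org proxy, percent-encoding unsafe characters."""
    d = parse_doi(doi)
    return "https://doi.org/" + quote(f"{d.prefix}/{d.suffix}")


print(resolver_url("10.1234/study-0001"))  # https://doi.org/10.1234/study-0001
```

    Because the resolver URL is derived from the name, a citation only needs to carry the DOI itself; the landing page it points to can move without breaking the citation.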

    Improving Quality of Lead Data of Customer Relationship Management in M-Commerce

    Abstract: With the growing trend toward mobile computing, technologies and applications increasingly focus on mobile e-commerce (m-commerce) and the mobile Web. As the mobile commerce market grows, customer relationship management (CRM) is one of the major applications: it incorporates the current marketing standard of relationship management (RM) and supports acquiring customers, understanding their requirements, and maintaining long-term relationships with them. Extending CRM applications to mobile devices is increasingly becoming a major goal for organizations seeking to optimize their workforce. To obtain the desired outcome, the quality of the acquired data has a major impact on an organization's productivity. Data quality therefore forms the basis for major decisions at an organization's operational and strategic levels, which in turn leads to positive organizational growth. To tackle this issue, we propose a novel quality model for customer lead data. Several activities are defined to analyse the system's performance with respect to time, data quality, and reliability. The framework still requires a feasible implementation module, planned for the near future, to improve the quality of CRM lead data in m-commerce.
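
    The abstract does not specify the proposed quality model. As a minimal sketch of how lead data quality could be scored, the following assumes two common data-quality dimensions, completeness and validity, combined with fixed weights. The lead schema, the format rules and the weights are hypothetical, and the time and reliability aspects mentioned in the abstract are not modelled.

```python
import re
from typing import Dict, Optional

Lead = Dict[str, Optional[str]]

# Hypothetical lead schema and format rules, for illustration only.
REQUIRED_FIELDS = ("name", "phone", "email")
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")
PHONE_RE = re.compile(r"^\+?[0-9][0-9 \-]{6,}$")


def completeness(lead: Lead) -> float:
    """Share of required fields that are present and non-blank."""
    filled = sum(1 for f in REQUIRED_FIELDS if (lead.get(f) or "").strip())
    return filled / len(REQUIRED_FIELDS)


def validity(lead: Lead) -> float:
    """Share of format checks (email, phone) that the lead passes."""
    checks = [
        EMAIL_RE.match(lead.get("email") or "") is not None,
        PHONE_RE.match(lead.get("phone") or "") is not None,
    ]
    return sum(checks) / len(checks)


def quality_score(lead: Lead, w_complete: float = 0.5, w_valid: float = 0.5) -> float:
    """Weighted combination of the two dimensions, in [0, 1] when the weights sum to 1."""
    return w_complete * completeness(lead) + w_valid * validity(lead)


good = {"name": "A. Customer", "phone": "+44 20 7946 0000", "email": "a.customer@example.com"}
poor = {"name": "B", "phone": "", "email": "not-an-email"}
print(quality_score(good))  # 1.0
print(quality_score(poor))
```

    Scoring each lead at capture time, for example on the mobile device, would let low-quality leads be flagged before they reach downstream CRM processes.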

    An investigation into the problems of ineffective control of invasive plants in selected areas of South Africa : a case study of Campuloclinium macrocephalum (pompom weed)

    Invasion of natural environments by invasive plants is a global concern. In South Africa, and in Gauteng Province in particular, the invasion of natural land by plants that originated in other countries has been an entrenched problem. These invasive plants threaten biodiversity because of their rapid and wide dispersal, spreading into neighbouring provinces such as Mpumalanga, Limpopo, North West and the Free State. Pompom weed is aggressive, difficult to control, and can spread by both wind and water. This research project investigates problems associated with the ineffective control of invasive plants in general and pompom weed in particular. State organs, non-governmental organisations (NGOs) and farming communities were identified as the relevant respondents. Three hundred (300) validated questionnaires were distributed to these stakeholders, and 286 adequately completed questionnaires were returned. These were analysed and the data interpreted. The results showed that a lack of coordination and teamwork among all stakeholders is responsible for the ineffective control of invasive plants in the country. Biological control was recommended for controlling and eradicating the invasive plants.
    Environmental Sciences, M.A. (Environmental Management)