200 research outputs found

    GIHAT: An Efficient Prediction Technique for Measure for Diabetes Mellitus

    The medical service industry is a steadily developing field that continually produces vast amounts of information. The modernization of the field is directly tied to this growth. The resulting data sets are partly structured but mostly unstructured. This data must be processed with the utmost care to derive complete, usable patterns for qualitative and predictive analyses. Once processed and put to use, these enormous data records can become very unpredictable. Diabetes is a lifelong disease marked by elevated levels of sugar in the blood. It is the second leading cause of blindness and renal disease worldwide. Type 2 diabetes mellitus (S2DM) is a serious and costly metabolic illness that is a growing concern among people. S2DM is associated with various comorbid conditions that can lead to negative patient outcomes. Comorbid chronic pain is very common in S2DM because of the presence of diabetic neuropathy and musculoskeletal conditions that are associated with prolonged hyperglycemia. Using the General Integrated High Availability Transaction (GIHAT) algorithm, this paper concentrates on the causes, types, and factors influencing diabetes mellitus (DM), preventive measures, and treatment of diabetes, other than those directly associated with diabetic patients' structured and unstructured data sets. The algorithm is implemented in the R programming language, which is used for statistical analysis, and provides accurate results compared with existing algorithms

    A data mining algorithm for determination of influential factors on the hospitalization of patients subject to chronic obstructive pulmonary disease

    Background: The present study develops a data mining algorithm for finding the factors that influence the hospitalization of patients with chronic obstructive pulmonary disease. Materials and Methods: This is a descriptive analytical study conducted cross-sectionally in 2017 on a research community of 150 people with disease symptoms referred to clinics and hospitals across Tehran (Iran). The participants were surveyed with a self-designed questionnaire that included questions on lifestyle and family information. The sampling was simple intuitive from previously published studies. The modeling of the data was based on the CRISP method. The C5 decision tree algorithm was used, and the data were analyzed with RapidMiner software. Results: The common symptoms of the patients were found to be shortness of breath, cough, chest pain, sputum, continuous cold, and cyanogens. In addition, family history, smoking, and exposure to allergic agents were other influential factors on the disease. After completion of the study, the results were reviewed with experts in the field. Conclusion: Data mining can be applied to extract knowledge from the gathered data and to determine the factors affecting patient conditions. Accordingly, this model can successfully predict the disease status of any patient from their symptoms

    Review on Heart Disease Prediction System using Data Mining Techniques

    Data mining is the computer-based process of analyzing enormous sets of data and then extracting the meaning of the data. Data mining tools predict future trends, allowing businesses to make proactive, knowledge-driven decisions. They can answer business questions that were traditionally too time-consuming to resolve. The huge amounts of data generated for the prediction of heart disease are too complex and voluminous to be processed and analyzed by traditional methods. Data mining provides the methodology and technology to transform these mounds of data into useful information for decision making. Using data mining techniques, the disease can be predicted in less time and with more accuracy. In this paper we survey different papers in which one or more data mining algorithms are used for the prediction of heart disease. The result obtained using neural networks is nearly 100% in one paper [10] and in [6], which indicates that prediction using data mining algorithms gives efficient results. Applying data mining techniques to heart disease treatment data can provide performance as reliable as that achieved in diagnosing heart disease

    Prediksi Website Pemancing Informasi Penting Phising Menggunakan Support Vector Machine (SVM)

    The development of information and communication technologies, especially the Internet, has had an impact on all sectors of human life, the banking and financial sectors being no exception. Besides the positive impact of making it easier for customers to carry out transactions anytime and anywhere over the Internet, without being limited by space and time, it also creates great potential for irresponsible parties to steal critical data and information. One such technique is phishing, so methods for detecting phishing sites require serious attention. In this study the authors try to give an overview of the most accurate methods for detecting phishing websites by comparing three methods: Support Vector Machine, Naïve Bayes, and Decision Tree. The study uses public datasets from the UCI Machine Learning Repository (www.uci.edu), optimized with feature selection and processed using the RapidMiner program. The results show that Decision Tree has an accuracy rate of 91.84%, Naïve Bayes 74.07%, and Support Vector Machine 92.34%. The Support Vector Machine method therefore has the highest accuracy. Keywords: Decision Tree, Naïve Bayes, Phishing, Support Vector Machine

    Predictive Analytics on Product Sales at Heva Inc. Using K – Means Method

    Prediction is the process of estimating what is most likely to happen in the future based on previous and current knowledge, with the goal of minimizing error. Prediction allows people to recognize and then solve difficulties that are occurring or are expected to arise. This study began with preparation, literature review, data collection, and knowledge discovery in databases (KDD). One of the processes is data mining using the K – Means method, which is critical for obtaining the research's results and conclusions. This research also uses the RapidMiner application to compare its results with those obtained by Python coding. Using 4 clusters, products were categorized into 4 labels: very good products, good products, bad products, and very bad products. The research resulted in 11 products in the bad product category, 12 products in the good product category, 10 products in the very good category, and 18 products in the very good product category. The very good product label was further clarified with visualization to show the best time to restock each recommended product
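
As a rough illustration of the clustering step this abstract describes, the sketch below runs a plain one-dimensional K-Means with four clusters and maps the clusters, ordered by centroid, to the four product labels. The sales figures, the names `kmeans_1d` and `label_products`, and the centroid seeding are assumptions made for illustration; the abstract does not give the study's actual features, data, or implementation.

```python
LABELS = ["very bad", "bad", "good", "very good"]


def kmeans_1d(values, k, iterations=100):
    """Plain Lloyd's algorithm on scalar values; returns (centroids, assignments)."""
    ordered = sorted(values)
    # Assumption: seed the centroids at evenly spaced positions of the sorted data.
    centroids = [ordered[(2 * i + 1) * len(ordered) // (2 * k)] for i in range(k)]
    assignments = []
    for _ in range(iterations):
        # Assignment step: each value joins its nearest centroid.
        new_assignments = [
            min(range(k), key=lambda c: abs(v - centroids[c])) for v in values
        ]
        if new_assignments == assignments:
            break  # converged: no value changed cluster
        assignments = new_assignments
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [v for v, a in zip(values, assignments) if a == c]
            if members:
                centroids[c] = sum(members) / len(members)
    return centroids, assignments


def label_products(sales):
    """Cluster products by sales volume and name the clusters from worst to best."""
    names = list(sales)
    centroids, assignments = kmeans_1d([sales[n] for n in names], k=len(LABELS))
    # Rank clusters by centroid so the lowest-selling cluster becomes "very bad".
    by_centroid = sorted(range(len(centroids)), key=lambda c: centroids[c])
    rank = {cluster: r for r, cluster in enumerate(by_centroid)}
    return {n: LABELS[rank[a]] for n, a in zip(names, assignments)}


# Hypothetical units sold per product (not data from the study).
example_sales = {"A": 3, "B": 5, "C": 4, "D": 20, "E": 22, "F": 25,
                 "G": 51, "H": 55, "I": 49, "J": 90, "K": 95, "L": 88}
example_labels = label_products(example_sales)
```

Ranking the clusters by centroid is one simple way to turn anonymous cluster indices into ordered labels; a study with several features per product would need a different ordering rule.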

    A Multiple Classifier Approach to Improving Classification Accuracy Using Big Data Analytics Tool

    At the heart of analytics is data. Data analytics has become an indispensable part of intelligent decision making in the current digital scenario. Applications today generate a large amount of data. Along with this data deluge, the data analytics field has seen the arrival of a large number of open source tools and software to expedite large-scale analytics. The data science community has numerous tools available for storing, processing and analysing data. This research paper makes use of KNIME, one of the popular tools for big data analytics, to perform an investigative study of the key classification algorithms in machine learning. The comparative study shows that classification accuracy can be enhanced by using a combination of the learning techniques, and it proposes an ensemble technique on publicly available datasets

    Automatic Identification of Interestingness in Biomedical Literature

    This thesis presents research on automatically identifying interestingness in a graph of semantic predications. Interestingness represents a subjective quality of information that represents its value in meeting a user's known or unknown retrieval needs. The perception of information as interesting requires a level of utility for the user as well as a balance between significant novelty and sufficient familiarity. It can also be influenced by additional factors such as unexpectedness or serendipity with recent experiences. The ability to identify interesting information facilitates the development of user-centered retrieval, especially in information semantic summarization and iterative, step-wise searching such as in discovery browsing systems. Ultimately, this allows biomedical researchers to more quickly identify information of greatest potential interest to them, whether expected or, perhaps more importantly, unexpected. Current discovery browsing systems use iterative information retrieval to discover new knowledge - a process that requires finding relevant co-occurring topics and relationships through consistent human involvement to identify interesting concepts. Although interestingness is subjective, this thesis identifies computable quantities in semantic data that correlate to interestingness in user searches. We compare several statistical and rule-based models correlating graph data extracted from semantic predications with concept interestingness as demonstrated in PubMed queries. Semantic predications represent scientific assertions extracted from all of the biomedical literature contained in the MEDLINE database. They are of the form subject-predicate-object. Predications can easily be represented as graphs, where subjects and objects are nodes and predicates form edges. A graph of predications represents the assertions made in the citations from which the predications were extracted.
    This thesis uses graph metrics to identify features from the predication graph for model generation. These features are based on degree centrality (connectedness) of the seed concept node and surrounding nodes; they are also based on frequency of occurrence measures of the edges between the seed concept and surrounding nodes as well as between the nodes surrounding the seed concept and the neighbors of those nodes. A PubMed query log is used for training and testing models for interestingness. This log contains a set of user searches over a 24-hour period, and we make the assumption that co-occurrence of concepts with the seed concept in searches demonstrates interestingness of that concept with regard to the seed concept. Graph generation begins by the selection of a set of all predications containing the seed concept from the Semantic Medline database (our training dataset uses Alzheimer's disease as the seed concept). The graph is built with the seed concept as the central node. Additional nodes are added for each concept that occurs with the seed concept in the initial predications and an edge is created for each instance of a predication containing the two concepts. The edges are labeled with the specific predicate in the predication. This graph is extended to include additional nodes within two leaps from the seed concept. The concepts in the PubMed query logs are normalized to UMLS concepts or Entrez Gene symbols using MetaMap. Token-based and user-based counts are collected for each co-occurring term. These measures are combined to create a weighted score which is used to determine three potential thresholds of interestingness based on deviation from the mean score. The concepts that are included in both the graph and the normalized log data are identified for use in model training and testing
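
The graph construction described in this abstract can be sketched roughly as follows. The sketch covers only the first step (predications that contain the seed concept), a normalised degree count, and an edge-frequency count. The function names and the example triples are hypothetical and are not taken from the thesis or from Semantic Medline; the extension to nodes further from the seed and the weighted interestingness score are not shown.

```python
from collections import defaultdict


def build_seed_graph(predications, seed):
    """Collect edges around `seed` from (subject, predicate, object) triples.

    Returns a dict mapping each unordered node pair (a frozenset) to a list of
    predicate labels, one entry per predication instance.
    """
    edges = defaultdict(list)
    for subject, predicate, obj in predications:
        # Keep only predications that mention the seed concept.
        if seed in (subject, obj) and subject != obj:
            edges[frozenset((subject, obj))].append(predicate)
    return edges


def degree_centrality(edges):
    """Number of distinct neighbours per node, normalised by (n - 1)."""
    neighbours = defaultdict(set)
    for pair in edges:
        a, b = tuple(pair)
        neighbours[a].add(b)
        neighbours[b].add(a)
    n = len(neighbours)
    if n < 2:
        return {}
    return {node: len(adj) / (n - 1) for node, adj in neighbours.items()}


def edge_frequency(edges, a, b):
    """How many predication instances link concepts `a` and `b`."""
    return len(edges.get(frozenset((a, b)), []))


# Illustrative triples only (assumed, not real extracted predications).
SEED = "Alzheimer's disease"
example_predications = [
    ("Donepezil", "TREATS", SEED),
    ("Donepezil", "TREATS", SEED),
    ("APOE", "ASSOCIATED_WITH", SEED),
    (SEED, "COEXISTS_WITH", "Dementia"),
    ("APOE", "INTERACTS_WITH", "Cholesterol"),  # no seed concept, so skipped here
]
example_graph = build_seed_graph(example_predications, SEED)
example_centrality = degree_centrality(example_graph)
```

Storing one predicate label per predication instance keeps both pieces of information the abstract mentions: the edge labels and the frequency of occurrence between two concepts.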