
    A framework for exploration and cleaning of environmental data : Tehran air quality data experience

    Management and cleaning of large environmental monitoring datasets is a specific challenge. In this article, the authors present a novel framework for exploring and cleaning large datasets; as a case study, the method was applied to air quality data of Tehran, Iran, from 1996 to 2013. The framework consists of data acquisition [here, data on particulate matter with aerodynamic diameter ≤10 µm (PM10)], development of databases, initial descriptive analyses, removal of inconsistent data outside a plausibility range (PR), and detection of missing-data patterns. Additionally, we developed a novel tool, the spatiotemporal screening tool (SST), which considers both the spatial and temporal nature of the data in the outlier-detection process. We also evaluated the effect of dust storms in the outlier-detection phase. The raw mean concentration of PM10 before implementation of the algorithms was 88.96 µg/m3 for 1996-2013 in Tehran. After implementing the algorithms, 5.7% of data points in total were recognized as unacceptable outliers, of which 69% were detected by the SST and 1% via the dust storm algorithm; in addition, 29% of the unacceptable outlier values were outside the PR. The mean concentration of PM10 after implementation of the algorithms was 88.41 µg/m3, while the standard deviation decreased markedly, from 90.86 µg/m3 to 61.64 µg/m3. There was no distinguishable pattern in the missing data by hour, day, month, or year. We developed a novel framework for cleaning large environmental monitoring datasets that can identify hidden patterns, and we present a complete picture of PM10 in Tehran from 1996 to 2013. Finally, we propose implementation of our framework on large spatiotemporal databases, especially in developing countries.
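The two-stage cleaning the abstract describes (a fixed plausibility range followed by statistical outlier screening) can be sketched as below. This is a minimal illustration under stated assumptions: the PM10 range and the z-score rule are invented stand-ins, not the paper's actual PR bounds or SST algorithm.

```python
# Sketch: plausibility-range filter followed by a simple outlier screen.
# The 0-1000 ug/m3 range and the z-score cutoff are illustrative
# assumptions, not the framework's actual PR or SST logic.
from statistics import mean, stdev

PLAUSIBLE_PM10 = (0.0, 1000.0)  # assumed plausibility range, ug/m3

def within_plausibility(value, low=PLAUSIBLE_PM10[0], high=PLAUSIBLE_PM10[1]):
    """Stage 1: drop values outside the assumed physical range."""
    return low <= value <= high

def flag_outliers(series, z_cut=3.0):
    """Stage 2: flag points whose z-score against the series exceeds z_cut."""
    m, s = mean(series), stdev(series)
    if s == 0:
        return [False] * len(series)
    return [abs(x - m) / s > z_cut for x in series]

readings = [85.0, 92.1, 88.4, 1450.0, 90.2, -5.0, 87.3]  # fabricated sample
plausible = [x for x in readings if within_plausibility(x)]
flags = flag_outliers(plausible)
```

The real SST additionally compares each station against its spatial neighbours at the same timestamp; this sketch keeps only the temporal half of that idea.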

    Applying data mining techniques to determine important parameters in chronic kidney disease and the relations of these parameters to each other.

    Introduction: Chronic kidney disease (CKD) includes a wide range of pathophysiological processes observed along with abnormal kidney function and a progressive decrease in glomerular filtration rate (GFR). By definition, the decreased GFR must have been present for at least three months. CKD eventually results in end-stage kidney disease. Different factors play a role in this process, and finding the relations between the effective parameters can help prevent or slow the progression of the disease. A great deal of data is continually collected in patients' medical records, and this huge array of data can be considered a valuable source for analysis, exploration, and discovery of information. Objectives: Using data mining techniques, the present study aims to specify the effective parameters and to determine their relations with each other in Iranian patients with CKD. Material and Methods: The study population includes 31996 patients with CKD. First, all of the data were registered in a database. Then data mining tools were used to find hidden rules and relationships between parameters in the collected data. Results: After data cleaning based on the CRISP-DM (Cross Industry Standard Process for Data Mining) methodology and running mining algorithms on the data in the database, the relationships between the effective parameters were specified. Conclusion: This study applied data mining methods to the factors affecting patients with CKD.
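"Finding hidden rules and relationships between parameters" usually means association-rule style measures such as rule confidence. A minimal sketch of that measure follows; the feature names and patient records are invented for illustration and are not the study's data or its actual algorithm.

```python
# Sketch: rule confidence P(consequent | antecedent) over patient records
# represented as sets of features. Records and feature names are made up.
def confidence(records, antecedent, consequent):
    """Fraction of records containing `antecedent` that also contain `consequent`."""
    with_a = [r for r in records if antecedent <= r]  # set-containment check
    if not with_a:
        return 0.0
    return sum(1 for r in with_a if consequent <= r) / len(with_a)

patients = [
    {"hypertension", "low_gfr"},
    {"hypertension", "diabetes", "low_gfr"},
    {"diabetes"},
    {"hypertension"},
]
conf = confidence(patients, {"hypertension"}, {"low_gfr"})
```

A full CRISP-DM pipeline would wrap this in data understanding, cleaning, and evaluation phases; the measure above is only the modelling kernel.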

    Establishment of an integrative multi-omics expression database, CKDdb, in the context of chronic kidney disease (CKD)

    Complex human traits such as chronic kidney disease (CKD) are a major health and financial burden in modern societies. Currently, the onset and progression of CKD at the molecular level are still not fully understood. Meanwhile, the prolific use of high-throughput omic technologies in disease biomarker discovery studies has yielded a vast amount of disjointed data that cannot be easily collated. Therefore, we aimed to develop a molecule-centric database featuring CKD-related experiments from the available literature. We established the Chronic Kidney Disease database CKDdb, an integrated and clustered information resource covering multi-omic studies (microRNAs, genomics, peptidomics, proteomics and metabolomics) of CKD and related disorders, built through literature data mining and manual curation. CKDdb contains differential expression data for 49395 molecule entries (redundant), of which 16885 are unique molecules (non-redundant), from 377 manually curated studies in 230 publications. The database was intentionally built to allow disease pathway analysis through a systems approach, integrating all existing information to yield biological meaning, and therefore has the potential to unravel the key molecular events that modulate CKD pathogenesis and deepen our understanding of them.

    Development of a New Model for Population Prediction in Anambra State, Nigeria

    This paper was motivated by the desire to develop a new model for the National Population Commission in Anambra State for better prediction. Population demographic analyses are currently calculated with specified formulae using non-free software, which makes it difficult for researchers to access data; census figures are also calculated without any mining of patterns or trends in existing demographic information. These are the gaps this paper covers. The system is a desktop application based on the states and behaviours of objects. It was approached with the Object-Oriented Analysis and Design methodology and implemented with Visual Basic .NET. The developed model accepts given probabilistic demographic information to make predictions. A data-mining warehouse was also developed, using an SQLite database, to support population-distribution decisions and to find hidden patterns and relationships. The desktop application requires only data as input, not formulae. The system provides a multi-line input interface which allows users to enter five (5) or more observations; these observations are employed by the new model to predict future values of demographic information. The software uses voice instruction to tell users how to use it. The result of this paper is a functional predictive model of demographic information for Anambra State.
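The abstract does not specify the prediction model, only that it projects future values from five or more observations. As one plausible stand-in, a geometric-growth projection from a short census series can be sketched as follows; the growth rule and the population figures are assumptions, not the paper's actual model or data.

```python
# Sketch: project future population from >= 5 observations using the
# average period-over-period growth ratio. Figures below are invented;
# this is NOT the paper's (unspecified) model.
def project(observations, steps):
    """Return `steps` future values extrapolated by mean growth ratio."""
    ratios = [b / a for a, b in zip(observations, observations[1:])]
    r = sum(ratios) / len(ratios)      # average growth per period
    value = observations[-1]
    out = []
    for _ in range(steps):
        value *= r
        out.append(round(value))
    return out

history = [4_055_000, 4_178_000, 4_301_000, 4_429_000, 4_560_000]  # made up
forecast = project(history, 2)
```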

    The cultural, ethnic and linguistic classification of populations and neighbourhoods using personal names

    There are growing needs to understand the nature and detailed composition of ethnic groups in today's increasingly multicultural societies. Ethnicity classifications are often hotly contested, but still greater problems arise from the quality and availability of classifications, with knock-on consequences for our ability meaningfully to subdivide populations. Name analysis and classification has been proposed as one efficient method of achieving such subdivisions in the absence of ethnicity data, and may be especially pertinent to public health and demographic applications. However, previous approaches to name analysis have been designed to identify one or a small number of ethnic minorities, and not complete populations. This working paper presents a new methodology to classify the UK population and neighbourhoods into groups of common origin using surnames and forenames. It proposes a new ontology of ethnicity that combines some of its multidimensional facets: language, religion, geographical region, and culture. It uses data collected at very fine temporal and spatial scales, and made available, subject to safeguards, at the level of the individual. Such individuals are classified into 185 independently assigned categories of Cultural, Ethnic and Linguistic (CEL) groups, based on the probable origins of names. We include a justification of the need for classifying ethnicity, a proposed CEL taxonomy, a description of how the CEL classification was built and applied, a preliminary external validation, and some examples of current and potential applications.
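At its core, assigning a CEL group from a name is a lookup from name to probable origin. The toy sketch below shows only that lookup shape; the table, labels, and fallback are invented examples and bear no relation to the actual 185-category CEL taxonomy or its probabilistic assignment.

```python
# Toy sketch: surname -> origin-label lookup with a fallback.
# Table and labels are fabricated; the real CEL classification is
# probabilistic and combines surnames with forenames.
ORIGIN_BY_SURNAME = {
    "murphy": "Irish",
    "patel": "Indian",
    "nguyen": "Vietnamese",
}

def classify(surname, table=ORIGIN_BY_SURNAME, default="Unclassified"):
    """Return the origin label for a surname, or a default if unknown."""
    return table.get(surname.strip().lower(), default)

labels = [classify(n) for n in ["Murphy", "Nguyen", "Smithson"]]
```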

    Smart Health Predicting System Using Data Mining

    This paper gives an overview of data mining techniques and their applications in the medical and educational aspects of clinical prediction. In medical and health care areas, due to regulations and the availability of computers, a large amount of data is becoming available. Such a large amount of data cannot be processed by humans quickly enough to make diagnoses and treatment schedules. A major objective is to evaluate data mining techniques in clinical and health care applications to support accurate decisions. The paper also gives a detailed discussion of medical data mining techniques which can improve various aspects of clinical prediction. Data mining is a powerful technology of high interest in the computing world. It is a subfield of computer science that uses existing data in different databases to produce new research and results; it draws on machine learning and database management to extract new patterns from large datasets, along with the knowledge associated with those patterns. The actual task is to extract data by automatic or semi-automatic means. Techniques used in data mining include clustering, forecasting, path analysis, and predictive analysis. It may often happen that you or someone you know needs a doctor's help immediately, but none is available for some reason. The Health Prediction system is an end-user support and online consultation project. Here we propose a system that allows users to get instant guidance on their health issues through an intelligent online health care system. The system is fed with various symptoms and the diseases or illnesses associated with those symptoms. It allows users to share their symptoms, then processes those symptoms to check for the various illnesses that could be associated with them, using intelligent data mining techniques to guess the most likely illness. If the user's symptoms do not exactly match any disease in the database, the system informs the user of the type of disease or disorder it judges the symptoms are most likely associated with.
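The matching step described above (symptoms in, best-guess illness out, with a fallback when nothing matches exactly) can be sketched with a simple set-overlap score. The disease table, symptom names, and Jaccard cutoff are illustrative assumptions, not the system's actual knowledge base or algorithm.

```python
# Sketch: score each known disease by symptom overlap (Jaccard
# similarity) and return the best match, or None below a cutoff.
# The disease table is a made-up illustration.
DISEASES = {
    "flu": {"fever", "cough", "fatigue"},
    "migraine": {"headache", "nausea", "light_sensitivity"},
}

def best_match(symptoms, table=DISEASES, cutoff=0.3):
    """Return the disease whose symptom set best overlaps `symptoms`."""
    def jaccard(a, b):
        return len(a & b) / len(a | b)
    name, score = max(((d, jaccard(symptoms, s)) for d, s in table.items()),
                      key=lambda t: t[1])
    return name if score >= cutoff else None

guess = best_match({"fever", "cough"})
```

A production system would rank several candidates and weight symptoms by specificity rather than treating them all equally.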

    Optimising Health Emergency Resource Management from Multi-Model Databases

    The health care sector is one of the most sensitive sectors in our society, and it is believed that the application of specific and detailed database creation and design techniques can improve the quality of patient care; in this sense, better management of emergency resources should be achieved. Developing a methodology to manage and integrate data from multiple sources into a centralised database that supports a high-quality emergency health service is a challenge. The high level of interrelation between all of the variables related to patient care makes it possible to analyse and make the right strategic decisions about the type of care that will be needed in the future, efficiently managing the resources involved in such care. An optimised database was designed that integrated and related all aspects that directly and indirectly affected the emergency care provided in the province of Jaén (city of Jaén, Andalusia, Spain) over the last eight years. Health, social, economic, environmental, and geographical information related to each of these emergency services was stored and related. Linear and nonlinear regression algorithms were used: a support vector machine (SVM) with linear kernel, a generalized linear model (GLM), and a nonlinear SVM with Gaussian kernel. Predictive models of emergency demand were generated with a success rate of over 90%.
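As a flavor of the regression step, the sketch below fits an ordinary least-squares line to a toy demand series. This is a deliberately simplified stand-in for the SVM and GLM models the study actually used, and the month/call figures are invented, not the Jaén data.

```python
# Sketch: ordinary least-squares line fit as a stand-in for the study's
# SVM/GLM demand models. All numbers below are fabricated.
def fit_line(xs, ys):
    """Return (slope, intercept) minimising squared error."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return slope, my - slope * mx

# e.g. month index vs. monthly emergency calls (illustrative only)
months = [1, 2, 3, 4, 5, 6]
calls = [310, 325, 342, 355, 371, 388]
slope, intercept = fit_line(months, calls)
predicted_july = slope * 7 + intercept
```

A Gaussian-kernel SVM would capture the nonlinear seasonal structure a straight line cannot; the linear fit is shown only because it is self-contained.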