12 research outputs found

    Application of machine learning for hematological diagnosis

    Quick and accurate medical diagnosis is crucial for the successful treatment of a disease. Using machine learning algorithms, we have built two models to predict a hematologic disease, based on laboratory blood test results. In one predictive model, we used all available blood test parameters and in the other a reduced set, which is usually measured upon patient admittance. Both models produced good results, with a prediction accuracy of 0.88 and 0.86, when considering the list of five most probable diseases, and 0.59 and 0.57, when considering only the most probable disease. Models did not differ significantly from each other, which indicates that a reduced set of parameters contains a relevant fingerprint of a disease, expanding the utility of the model for general practitioner's use and indicating that there is more information in the blood test results than physicians recognize. In the clinical test we showed that the accuracy of our predictive models was on a par with the ability of hematology specialists. Our study is the first to show that a machine learning predictive model based on blood tests alone can be successfully applied to predict hematologic diseases and could open up unprecedented possibilities in medical diagnosis. Comment: 15 pages, 6 figures
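The two accuracy figures in this abstract (most probable disease versus list of five most probable diseases) correspond to what is usually called top-1 and top-k accuracy. The sketch below shows one way to compute them. The function name, the disease labels, and the ranked lists are hypothetical illustrations, not the authors' code or data.

```python
def top_k_accuracy(ranked_predictions, true_labels, k):
    """Fraction of cases whose true label is among the k highest-ranked predictions."""
    hits = sum(
        1
        for ranked, truth in zip(ranked_predictions, true_labels)
        if truth in ranked[:k]
    )
    return hits / len(true_labels)


# Hypothetical ranked model outputs (most probable first) for four patients.
ranked = [
    ["anemia", "leukemia", "lymphoma"],
    ["lymphoma", "anemia", "leukemia"],
    ["leukemia", "lymphoma", "anemia"],
    ["anemia", "lymphoma", "leukemia"],
]
truth = ["anemia", "anemia", "anemia", "leukemia"]

top1 = top_k_accuracy(ranked, truth, k=1)  # only the most probable disease counts
top2 = top_k_accuracy(ranked, truth, k=2)  # a hit anywhere in the top two counts
print(top1, top2)
```

Top-k accuracy can never be lower than top-1 accuracy on the same predictions, which is consistent with the gap between the 0.88/0.86 and 0.59/0.57 figures reported above.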

    Automated prediction of mastitis infection patterns in dairy herds using machine learning

    © 2020, The Author(s). Mastitis in dairy cattle is extremely costly both in economic and welfare terms and is one of the most significant drivers of antimicrobial usage in dairy cattle. A critical step in the prevention of mastitis is the diagnosis of the predominant route of transmission of pathogens as either contagious (CONT) or environmental (ENV), with environmental being further subdivided as transmission during either the nonlactating “dry” period (EDP) or lactating period (EL). Using data from 1000 farms, random forest algorithms were able to replicate the complex herd level diagnoses made by specialist veterinary clinicians with a high degree of accuracy. An accuracy of 98%, positive predictive value (PPV) of 86% and negative predictive value (NPV) of 99% were achieved for the diagnosis of CONT vs ENV (with CONT as a “positive” diagnosis), and an accuracy of 78%, PPV of 76% and NPV of 81% for the diagnosis of EDP vs EL (with EDP as a “positive” diagnosis). An accurate, automated mastitis diagnosis tool has great potential to aid non-specialist veterinary clinicians to make a rapid herd level diagnosis and promptly implement appropriate control measures for an extremely damaging disease in terms of animal health, productivity, welfare and antimicrobial use.
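Accuracy, PPV and NPV as used in this abstract all follow from the four cells of a binary confusion matrix. A minimal sketch is given below. The counts are made up for illustration and are not the study's data, and the function name is an assumption.

```python
def diagnostic_metrics(tp, fp, tn, fn):
    """Accuracy, positive predictive value and negative predictive value
    from confusion-matrix counts (tp = true positives, and so on)."""
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total,  # share of all diagnoses that were correct
        "ppv": tp / (tp + fp),          # share of "positive" calls that were right
        "npv": tn / (tn + fn),          # share of "negative" calls that were right
    }


# Hypothetical counts for 100 herds, with the rarer class treated as "positive".
m = diagnostic_metrics(tp=8, fp=2, tn=85, fn=5)
print(m)
```

When the positive class is rare, accuracy and NPV can be high while PPV is noticeably lower, which is the pattern visible in the CONT vs ENV figures above.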

    A Hybrid Random Forest based Support Vector Machine Classification Supplemented by Boosting

    This paper presents an approach to classify remotely sensed data using a hybrid classifier. Random forest, support vector machine and boosting methods are used to build the hybrid classifier. The central idea is to subdivide the input data set into smaller subsets and classify the individual subsets. The individual subset classification is done using a support vector machine classifier. Boosting is used at each subset to evaluate the learning by using a weight factor for every data item in the data set. The weight factor is updated based on classification accuracy. The final outcome for the complete data set is then computed by applying a majority voting mechanism to the individual subset classification outcomes.
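The final majority-voting step described in this abstract can be sketched as follows. This is an illustration only: it assumes that each subset's classifier produces a label for every sample, which the abstract does not state explicitly, and it leaves out the support vector machine training and the boosting weights. All names and labels are hypothetical.

```python
from collections import Counter


def majority_vote(labels):
    """Return the label that occurs most often among the votes."""
    return Counter(labels).most_common(1)[0][0]


def combine_subset_predictions(per_subset_predictions):
    """per_subset_predictions[i][j] is the label that classifier i gives sample j.
    Returns one voted label per sample."""
    return [majority_vote(votes) for votes in zip(*per_subset_predictions)]


# Hypothetical labels from three subset classifiers for four pixels.
predictions = [
    ["water", "forest", "urban", "forest"],
    ["water", "urban", "urban", "forest"],
    ["forest", "forest", "urban", "water"],
]
final = combine_subset_predictions(predictions)
print(final)
```

Using an odd number of voters, as here, avoids ties in the two-way case. With more classes a tie-breaking rule would still be needed.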

    Non-invasive multi-modal human identification system combining ECG, GSR, and airflow biosignals

    A huge amount of data can be collected through a wide variety of sensor technologies. Data mining techniques are often useful for the analysis of gathered data. This paper studies the use of three wearable sensors that monitor the electrocardiogram, airflow, and galvanic skin response of a subject with the purpose of designing an efficient multi-modal human identification system. The proposed system, based on the rotation forest ensemble algorithm, offers a high accuracy (99.6 % true acceptance rate and just 0.1 % false positive rate). For its evaluation, the proposed system was tested against the characteristics commonly demanded in a biometric system, including universality, uniqueness, permanence, and acceptance. Finally, a proof-of-concept implementation of the system is demonstrated on a smartphone and its performance is evaluated in terms of processing speed and power consumption. The identification of a sample is extremely efficient, taking around 200 ms and consuming just a few millijoules. It is thus feasible to use the proposed system on a regular smartphone for user identification. This work was supported by MINECO grant TIN2013-46469-R (SPINY: Security and Privacy in the Internet of You) and CAM grant S2013/ICE-3095 (CIBERDINE: Cybersecurity, Data, and Risks).

    Predictive analytics applied to firefighter response, a practical approach

    Time is a crucial factor for the outcome of emergencies, especially those that involve human lives. This paper looks at Lisbon’s firefighter occurrences and presents a model, based on city characteristics and climatic data, to predict whether there will be an occurrence at a certain location, according to the weather forecasts. In this study three algorithms were considered: Logistic Regression, Decision Tree and Random Forest. Measured by the AUC, the best-performing model was a random forest with random under-sampling, at 0.68. This model was well adjusted across the city and showed that precipitation and size of the subsection are the most relevant features in predicting firefighter occurrences. The work presented here has clear implications for firefighters’ decision-making regarding vehicle allocation, as they can now make an informed decision considering the predicted occurrences.

    A Comparative Analysis of Machine Learning Models for Banking News Extraction by Multiclass Classification With Imbalanced Datasets of Financial News: Challenges and Solutions

    Online portals provide an enormous amount of news articles every day. Over the years, numerous studies have concluded that news events have a significant impact on forecasting and interpreting the movement of stock prices. The creation of a framework for storing news articles and collecting information for specific domains is an important and untested problem for the Indian stock market. When online news portals produce financial news articles about many subjects simultaneously, finding news articles that are important to the specific domain is nontrivial. A critical component of the aforementioned system should, therefore, include one module for extracting and storing news articles, and another module for classifying these text documents into a specific domain(s). In the current study, we have performed extensive experiments to classify the financial news articles into four predefined classes: Banking, Non-Banking, Governmental, and Global. The idea of multi-class classification was to extract the Banking news and its most correlated news articles from the pool of financial news articles scraped from various web news portals. The news articles divided into the mentioned classes were imbalanced. Imbalanced data is a major difficulty for most classifier learning algorithms. However, as recent works suggest, class imbalances are not in themselves a problem, and degradation in performance is often correlated with certain variables relevant to data distribution, such as the existence of noisy and ambiguous instances in the adjacent class boundaries. A variety of solutions for addressing data imbalances have been proposed recently, including over-sampling, down-sampling, and ensemble approaches. We have presented the various challenges that occur with data imbalances in multiclass classification and solutions for dealing with these challenges.
The paper also compares the performance of various machine learning models on imbalanced data and on data balanced using sampling and ensemble techniques. The results show that the Random Forest classifier, with data balanced using the over-sampling technique SMOTE, performs best in terms of precision, recall, F-1, and accuracy. Among the ensemble classifiers, the Balanced Bagging classifier showed results similar to those of the Random Forest classifier with SMOTE. The Random Forest classifier's accuracy, however, was 100%, whereas it was 99% with the Balanced Bagging classifier.

    Optimization of firefighter response with predictive analytics : practical application to Lisbon, Portugal

    Dissertation presented as the partial requirement for obtaining a Master's degree in Information Management, specialization in Knowledge Management and Business Intelligence. Time is a crucial factor for the outcome of emergencies, especially those that involve human lives. This paper looks at Lisbon’s firefighter occurrences and presents a model, based on city characteristics and climatic data, to predict whether there will be an occurrence at a certain location, according to the weather forecasts. In this study three algorithms were considered, Logistic Regression, Decision Tree and Random Forest, as well as four techniques to balance the data (random over-sampling, SMOTE, random under-sampling and Near Miss), which were compared to the baseline, the imbalanced data. Measured by the AUC, the best-performing model was a random forest with random under-sampling, at 0.68. This model was well adjusted across the city and showed that precipitation and size of the subsection are the most relevant features in predicting firefighter occurrences. The work presented here has clear implications for firefighters’ decision-making regarding vehicle allocation, as they can now make an informed decision considering the predicted occurrences.
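Two ingredients named in this abstract, random under-sampling and the AUC, can be sketched with the standard library alone. This is an illustration under stated assumptions rather than the dissertation's pipeline: the function names, the toy rows, and the scores are hypothetical, and no random forest is trained here.

```python
import random


def random_under_sample(rows, labels, seed=0):
    """Randomly drop rows from the larger classes until every class
    has as many rows as the smallest class."""
    rng = random.Random(seed)
    by_class = {}
    for row, label in zip(rows, labels):
        by_class.setdefault(label, []).append(row)
    n_min = min(len(members) for members in by_class.values())
    out_rows, out_labels = [], []
    for label, members in by_class.items():
        for row in rng.sample(members, n_min):
            out_rows.append(row)
            out_labels.append(label)
    return out_rows, out_labels


def auc(scores, labels):
    """AUC as the probability that a randomly chosen positive (label 1)
    scores higher than a randomly chosen negative (label 0); ties count half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical imbalanced data: eight "no occurrence" rows, two "occurrence" rows.
rows = list(range(10))
labels = [0] * 8 + [1] * 2
balanced_rows, balanced_labels = random_under_sample(rows, labels)

# Hypothetical model scores for four held-out locations.
example_auc = auc([0.9, 0.4, 0.6, 0.8], [1, 0, 1, 0])
print(len(balanced_rows), example_auc)
```

Under-sampling is normally applied to the training split only, so that the AUC is measured on data with the original class balance. An AUC of 0.5 corresponds to random ranking, which gives context for the 0.68 reported above.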

    Rangeland degradation assessment using remote sensing and vegetation species.

    Thesis (Ph.D.)-University of KwaZulu-Natal, Pietermaritzburg, 2011. The degradation of rangeland grass is currently one of the most serious environmental problems in South Africa. Increaser and decreaser grass species have been used as indicators to evaluate rangeland condition. Therefore, classifying these species and monitoring their relative abundance is an important step for sustainable rangelands management. Traditional methods (e.g. wheel point technique) have been used in classifying increaser and decreaser species over small geographic areas. These methods are regarded as being costly and time-consuming, because grasslands usually cover large expanses that are situated in isolated and inaccessible areas. In this regard, remote sensing techniques offer a practical and economical means for quantifying rangeland degradation over large areas. Remote sensing is capable of providing rapid, relatively inexpensive, and near-real-time data that could be used for classifying and monitoring species. This study advocates the development of techniques based on remote sensing to classify four dominant increaser species associated with rangeland degradation namely: Hyparrhenia hirta, Eragrostis curvula, Sporobolus africanus and Aristida diffusa in Okhombe communal rangeland, KwaZulu-Natal, South Africa. To our knowledge, no attempt has yet been made to discriminate and characterize the landscape using these species as indicators of the different levels of rangeland degradation using remote sensing. The first part of the thesis reviewed the problem of rangeland degradation in South Africa, the use of remote sensing (multispectral and hyperspectral) and their challenges and opportunities in mapping rangeland degradation using different indicators. The concept of decreaser and increaser species and how it can be used to map rangeland degradation was discussed.
The second part of this study focused on exploring the relationship between vegetation species (increaser and decreaser species) and different levels of rangeland degradation. Results showed that there is a significant relationship between the abundance and distribution of different vegetation species and rangeland condition. The third part of the study aimed to investigate the potential use of hyperspectral remote sensing in discriminating between four increaser species using the raw field spectroscopy data and discriminant analysis as a classifier. The results indicate that the spectroscopic approach used in this study has a strong potential to discriminate among increaser species. These positive results prompted the need to scale up the method to airborne remote sensing data characteristics for the purpose of possible mapping of rangeland species as indicators of degradation. We investigated whether canopy reflectance spectra resampled to AISA Eagle resolution and random forest as a classification algorithm could discriminate between four increaser species. Results showed that hyperspectral data assessed with the random forest algorithm has the potential to accurately discriminate species with best overall accuracy. Knowledge of reduced key wavelength regions and spectral band combinations for successful discrimination of increaser species was obtained. These wavelengths were evaluated using the new WorldView imagery containing unique and strategically positioned band settings. The study demonstrated the potential of WorldView-2 bands in classifying grass at species level with an overall accuracy of 82%, which is only 5% less than the overall accuracy achieved by AISA Eagle hyperspectral data. Overall, the study has demonstrated the potential of remote sensing techniques to classify different increaser species representing levels of rangeland degradation.
In this regard, we expect that the results of this study can be used to support an up-to-date monitoring system for sustainable rangeland management.

    Remote sensing of impervious surface area and its interaction with land surface temperature variability in Pretoria, South Africa

    Includes summary for chapter 1-5. Pretoria, City of Tshwane (COT), Gauteng Province, South Africa is one of the cities that continues to experience rapid urban sprawl as a result of population growth and various land use, leading to the change of natural vegetation lands into impervious surface area (ISA). These are associated with transportation (paved roads, streets, highways, parking lots and sidewalks) and cemented buildings and rooftops, made of completely or partly impermeable artificial materials (e.g., asphalt, concrete, and brick). These landscapes influence the micro-climate (e.g., land surface temperature, LST) of Pretoria City as evidenced by the recent heat waves characterized by high temperature. Therefore, understanding ISA changes will provide information for city planning and environmental management. Conventionally, deriving ISA information has been dependent on field surveys and manual digitizing from hard copy maps, which is laborious and time-consuming. Remote sensing provides an avenue for deriving spatially explicit and timely ISA information. Numerous methods have been developed to estimate and retrieve ISA and LST from satellite imagery. There are limited studies focusing on the extraction of ISA and its relationship with LST variability across major cities in Africa.
The objectives of the study were: (i) to explore suitable spectral indices to improve the delineation of built-up impervious surface areas from very high resolution multispectral data (e.g., WorldView-2), (ii) to examine exposed rooftop impervious surface area based on different colours, and their interplay with surface temperature variability, (iii) to determine if the spatio-temporal built-up ISA distribution pattern in relation to elevation influences urban heat island (UHI) extent using an optimal analytical scale and (iv) to assess the spatio-temporal change characteristics of ISA expansion using the corresponding surface temperature (LST) at selected administrative subplace units (i.e., local region scale). The study objectives were investigated using remote sensing data such as WorldView-2 (a very high-resolution multispectral sensor), medium resolution Landsat-5 Thematic Mapper (TM) and Landsat-8 OLI (Operational Land Imager) and TIRS (Thermal Infrared Sensor) at multiple scales. The ISA mapping methods used in this study can be grouped into two major categories: (i) the classification-based approach consisting of an object-based multi-class classification with overall accuracy ~90.4% and a multitemporal pixel-based binary classification. The latter yielded an area under the receiver operating characteristic curve (AUROC) = 0.8572 for 1995, AUROC = 0.8709 for 2005, AUROC = 0.8949 for 2015. (ii) the spectral index-based approach such as a new built-up extraction index (NBEI) derived in this study which yielded a high AUROC = ~0.82 compared to Built-up Area Index (BAI) (AUROC = ~0.73), Built-up spectral index (BSI) (AUROC = ~0.78), Red edge / Green Index (RGI) (AUROC = ~0.71) and WorldView-Built-up Index (WV-BI) (AUROC = ~0.67). The multitemporal built-up Index (BUI) also estimated ISA with AUROC = 0.8487 for 1993, AUROC = 0.8302 for 2003, AUROC = 0.8790 for 2013.
This indicates that all the methods employed mapped ISA with high predictive accuracy from remote sensing data. Furthermore, the single-channel algorithm (SCA) was employed to retrieve LST from the thermal infrared (TIR) band of the Landsat images. The LST overall retrieval error for the entire study generally was quite low (overall root mean square error, RMSE ≤ ~1.48 °C), which signifies that the Landsat TIR used provided good results for further analysis. In conclusion, the study showed the potential of multispectral remote sensing data to quantify ISA and evaluate its interaction with surface temperature variability despite the complex urban landscape in Pretoria. Also, using impervious surface LST as a complementary metric in this research helped to reveal urban heat island distribution and improve understanding of the spatio-temporal developing trend of urban expansion at a local spatial scale. Rapid urbanization because of population growth has led to the conversion of natural lands into large man-made landscapes which affects the micro-climate. Rooftop reflectivity, material, colour, slope, height, aspect, elevation are factors that potentially contribute to temperature variability. Therefore, strategically designed rooftop impervious surfaces have the potential to translate into significant energy, long-term cost savings, and health benefits. In this experimental study, we used the semi-automated Environment for Visualizing Images (ENVI) Feature Extraction that uses an object-based image analysis approach to classify rooftop based on colours from WorldView-2 (WV-2) image with overall accuracy ~90.4% and kappa coefficient ~0.87 respectively. The daytime retrieved surface temperatures were derived from 15m pan-sharpened Landsat 8 TIRS with a range of ~14.6 °C to ~65 °C (retrieval error = 0.38 °C) for the same month covering Lynwood Ridge, a residential area in Pretoria.
Thereafter, the relationship between the rooftops and surface temperature (LST) was examined using multivariate statistical analysis. The results of this research reveal that the interaction between the applicable rooftop explanatory features (i.e., reflectance, texture measures and topographical properties) can explain over 22.10% of the variation in daytime rooftop surface temperatures. Furthermore, analysis of spatial distribution between mean daytime surface temperature and the residential rooftop indicated that the red, brown and green roof surfaces show lower LST values due to high reflectivity, high emissivity and low heat capacity during the daytime. The study concludes that in any study related to the spatial distribution of rooftop impervious surface area surface temperature, effect of various explanatory variables must be considered. The results of this experimental study serve as a useful approach for further application in urban planning and sustainable development. Evaluating changes in built-up impervious surface area (ISA) to understand the urban heat island (UHI) extent is valuable for governments in major cities in developing countries experiencing rapid urbanization and industrialization. This work aims at assessing built-up ISA spatio-temporal change and its influence on land surface temperature (LST) variability in the context of urban sprawl. Landsat-5 Thematic Mapper (TM) and Landsat-8 OLI (Operational Land Imager) and TIRS (Thermal Infrared Sensor) were used to quantify ISA using built-up Index (BUI) and spatio-temporal dynamics from 1993-2013. Thereafter using a suitable analytical sampling scale that represents the estimated ISA-LST, we examined its distribution in relation to elevation using the Shuttle Radar Topography Mission (SRTM) and also created Getis-Ord Gi* statistics hotspot maps to display the UHI extent.
The BUI ISA extraction results show a high predictive accuracy with area under the receiver operating characteristic curve, AUROC = 0.8487 for 1993, AUROC = 0.8302 for 2003, AUROC = 0.8790 for 2013. The ISA spatio-temporal changes over ten-year intervals revealed a 14% total growth rate during the study period. Based on a suitable analytical scale (90x90) for the hexagon polygon grid, the majority of ISA distribution across the years was at an elevation range of between >1200m – 1600m. Also, Getis-Ord Gi* statistics hotspot maps revealed that hotspot regions expanded through time with a total growth rate of 19% and coldspot regions decreased by 3%. Our findings can represent useful information for policymakers by providing a scientific basis for sustainable urban planning and management. Over the years, rapid urban growth has led to the conversion of natural lands into large man-made landscapes due to enhanced political and economic growth. This study assessed the spatio-temporal change characteristics of impervious surface area (ISA) expansion using its surface temperature (LST) at selected administrative subplace units (i.e., local region scale). ISA was estimated for 1995, 2005 and 2015 from Landsat-5 Thematic Mapper (TM) and Landsat-8 OLI (Operational Land Imager) and TIRS (Thermal Infrared Sensor) images using a Random Forest (RF) algorithm. The spatio-temporal trends of ISA were assessed using an optimal analytical scale to aggregate ISA LST coupled with weighted standard deviational ellipse (SDE) method. The ISA was quantified with high predictive accuracy (i.e., AUROC = 0.8572 for 1995, AUROC = 0.8709 for 2005, AUROC = 0.8949 for 2015) using RF classifier. More than 70% of the selected administrative subplaces in Pretoria experienced an increase in growth rate (415.59%) between 1995 and 2015. LST computations from the Landsat TIRS bands yielded good results (RMSE = ~1.44 °C, 1.40 °C, ~0.86 °C) for 1995, 2005 and 2015 respectively.
Based on the hexagon polygon grid (90x90), the aggregated ISA surface temperature weighted SDE analysis results indicated ISA expansion in different directions at the selected administrative subplace units. Our findings can represent useful information for policymakers in evaluating urban development trends in Pretoria, City of Tshwane (COT). Globally, the unprecedented increase in population in many cities has led to rapid changes in urban landscape, which requires timely assessments and monitoring. Accurate determination of built-up information is vital for urban planning and environmental management. Often, the determination of the built-up area information has been dependent on field surveys, which is laborious and time-consuming. Remote sensing data is the only option for deriving spatially explicit and timely built-up area information. There are few spectral indices for built-up areas, and they are often not accurate, as they are specific to impervious material, age, colour, and thickness, especially using higher resolution images. The objective of this study is to test the utility of a new built-up extraction index (NBEI) using WorldView-2 to improve built-up material mapping irrespective of material type, age and colour. The new index was derived from spectral bands such as Green, Red edge, NIR1 and NIR2 bands that profoundly explain the variation in built-up areas on WorldView-2 image (WV-2). The result showed that NBEI improves the extraction of built-up areas with high accuracy (area under the receiver operating characteristic curve, AUROC = ~0.82) compared to the existing indices such as Built-up Area Index (BAI) (AUROC = ~0.73), Built-up spectral index (BSI) (AUROC = ~0.78), Red edge / Green Index (RGI) (AUROC = ~0.71) and WorldView-Built-up Index (WV-BI) (AUROC = ~0.67). The study demonstrated that the new built-up index could extract built-up areas using high-resolution images.
The performance of NBEI could be attributed to the fact that it is not material specific, and would be necessary for urban area mapping. Environmental Sciences. D. Phil. (Environmental Sciences)