3 research outputs found

    Machine learning as prediction tool

    Machine learning is becoming an increasingly prevalent tool in most scientific disciplines. This text explains machine learning and its basic concepts through a classification of machine learning systems and explanations of the most commonly used algorithms. It also shows how to build a machine learning system for classification using a K-nearest neighbors classifier, logistic regression, and linear and RBF kernel support vector machines. Finally, it explains how to interpret the results of the most commonly used methods for evaluating machine learning systems.
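
    As a rough illustration of the kind of classifier this abstract mentions, the sketch below implements K-nearest neighbors by majority vote and scores it with plain accuracy. It is not taken from the paper: the function names `knn_predict` and `accuracy`, the toy data, and the choice of Euclidean distance with `k=3` are all assumptions made for the example.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, x, k=3):
    """Predict the label of x by majority vote among its k nearest training points."""
    # Rank the training points by Euclidean distance to x and keep the k closest.
    neighbours = sorted(
        zip(train_X, train_y),
        key=lambda pair: math.dist(pair[0], x),
    )[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical toy data: two well-separated clusters in the plane.
train_X = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (4.0, 4.2), (4.1, 3.9), (3.8, 4.0)]
train_y = ["a", "a", "a", "b", "b", "b"]
test_X = [(1.1, 1.0), (4.0, 4.0)]
test_y = ["a", "b"]

predictions = [knn_predict(train_X, train_y, x) for x in test_X]
print(predictions, accuracy(test_y, predictions))
```

    On real data the features would normally be scaled first, and `k` chosen by validation on held-out data rather than fixed in advance.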

    Learning structured medical information from social media

    Our goal is to summarise and aggregate information from social media regarding the symptoms of a disease, the drugs used, and the treatment effects, both positive and negative. To achieve this, we first apply a supervised machine learning method to automatically extract medical concepts from natural language text. In an environment such as social media, where new data is continuously streamed, we need a methodology that allows us to continuously train with the new data. To attain such incremental re-training, we develop a semi-supervised methodology that is capable of learning new concepts from a small set of labelled data together with a much larger set of unlabelled data. The semi-supervised methodology deploys a conditional random field (CRF) as the baseline training algorithm for extracting medical concepts. The methodology iteratively adds high-confidence sentences to the training set, and adds terms to existing dictionaries to be used as features with the baseline model for further classification. Our empirical results show that the baseline CRF performs strongly across a range of different dictionary and training sizes; when the baseline is built with the full training data, the F1 score is between 84% and 90%. Moreover, we show that the semi-supervised method produces a mild but significant improvement over the baseline. We also discuss the significance of the potential improvement of the semi-supervised methodology and find that it is significantly more accurate than the underlying baseline model in most cases.
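
    The iterative augmentation described in this abstract follows the general pattern of self-training, which can be sketched as below. This is only an illustration under strong assumptions, not the authors' method: a simple token-count dictionary stands in for the CRF, labels are assigned per sentence rather than per token, and the names `train`, `predict`, `self_train`, the threshold of 0.9, and the example sentences are all hypothetical.

```python
from collections import Counter, defaultdict


def train(labelled):
    """Build a term dictionary mapping token -> label counts (stand-in for the CRF)."""
    counts = defaultdict(Counter)
    for tokens, label in labelled:
        for tok in tokens:
            counts[tok][label] += 1
    return counts


def predict(model, tokens):
    """Return (label, confidence); confidence is the winning label's share of votes."""
    votes = Counter()
    for tok in tokens:
        if tok in model:
            votes.update(model[tok])
    if not votes:
        return None, 0.0  # no known tokens: nothing to go on
    label, n = votes.most_common(1)[0]
    return label, n / sum(votes.values())


def self_train(labelled, unlabelled, threshold=0.9, max_rounds=5):
    """Repeatedly move high-confidence sentences from the pool into the training set."""
    labelled = list(labelled)
    pool = list(unlabelled)
    for _ in range(max_rounds):
        model = train(labelled)
        confident, rest = [], []
        for tokens in pool:
            label, conf = predict(model, tokens)
            if conf >= threshold:
                confident.append((tokens, label))
            else:
                rest.append(tokens)
        if not confident:
            break  # nothing new was learned in this round
        labelled.extend(confident)  # new tokens enter the dictionary on retraining
        pool = rest
    return train(labelled), pool


# Hypothetical toy data.
labelled = [(["took", "ibuprofen"], "drug"), (["severe", "headache"], "symptom")]
unlabelled = [
    ["took", "paracetamol"],
    ["severe", "nausea"],
    ["paracetamol", "helped"],
    ["weather", "nice"],
]

model, remaining = self_train(labelled, unlabelled)
print(predict(model, ["paracetamol"]), remaining)
```

    In this toy run, "paracetamol" is learned in the first round and then lets the sentence "paracetamol helped" be labelled in the second, which mirrors how dictionary growth can feed later rounds. The vote-share confidence used here is crude; a probabilistic model such as a CRF would supply a better-calibrated score.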

    Extracting health information from social media

    Social media platforms with large user bases, such as Twitter, Reddit, and online health forums, contain a wealth of health-related information. Despite the advances achieved in natural language processing (NLP), extracting actionable health information from social media remains challenging. This thesis proposes a set of methodologies for extracting medical concepts and health information related to drugs, symptoms, and side-effects from social media. We first develop a rule-based relationship extraction system that uses a set of dictionaries and linguistic rules to extract structured information from patients’ posts on online health forums. We then automate the concept extraction process via (i) a supervised algorithm trained on a small labelled dataset, and (ii) an iterative semi-supervised algorithm capable of learning new sentences and concepts. We test our machine-learning pipeline on a COVID-19 case study that involves patient-authored social media posts. We develop a novel triage and diagnostic approach to extract symptoms, severity, and prevalence of the disease rather than to provide any actionable decisions at the individual level. Finally, we extend our approach by investigating the potential benefit of incorporating dictionary information into a neural network architecture for natural language processing.