Predicting Pancreatic Cancer Using Support Vector Machine
This report presents an approach to predicting pancreatic cancer using the Support Vector Machine (SVM) classification algorithm. The research objective of this project is to predict pancreatic cancer from genomic data alone, clinical data alone, and a combination of genomic and clinical data. We used real genomic data with 22,763 samples and 154 features per sample. We also created synthetic clinical data with 400 samples and 7 features per sample in order to measure prediction accuracy on clinical data alone. To validate the hypothesis, we combined the synthetic clinical data with a subset of features from the real genomic data. With genomic data alone, we observed prediction accuracy, precision, and recall of 80.77%, 20%, and 4%, respectively. With synthetic clinical data alone, prediction accuracy, precision, and recall were 93.33%, 95%, and 30%. For the combination of real genomic and synthetic clinical data, prediction accuracy, precision, and recall were 90.83%, 10%, and 5%. Combining real genomic and synthetic clinical data decreased accuracy relative to clinical data alone, since the genomic data is only weakly correlated with the outcome. Thus we conclude that combining genomic and clinical data does not improve pancreatic cancer prediction accuracy. A dataset with more significant genomic features might help to predict pancreatic cancer more accurately.
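The abstract reports accuracy alongside precision and recall, and the gap between them (80.77% accuracy but 4% recall on genomic data alone) is typical of class imbalance. The following is a minimal sketch, not the authors' code: a linear SVM trained by full-batch subgradient descent on the L2-regularized hinge loss, plus a helper that derives the three metrics from the confusion matrix. The function names, the +1/-1 label convention, and the hyperparameters (`lam`, `lr`, `epochs`) are illustrative assumptions; the report does not state its kernel or settings.

```python
from typing import List, Sequence, Tuple


def train_linear_svm(
    X: Sequence[Sequence[float]],
    y: Sequence[int],
    lam: float = 0.01,
    lr: float = 0.1,
    epochs: int = 200,
) -> Tuple[List[float], float]:
    """Fit w, b by full-batch subgradient descent on
    (lam / 2) * ||w||^2 + mean(max(0, 1 - y_i * (w . x_i + b))).
    Labels are assumed to be +1 (cancer) or -1 (no cancer)."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w = [lam * wj for wj in w]  # gradient of the L2 penalty
        grad_b = 0.0
        for xi, yi in zip(X, y):
            score = sum(wj * xj for wj, xj in zip(w, xi)) + b
            if yi * score < 1:  # hinge term is active for this sample
                for j in range(d):
                    grad_w[j] -= yi * xi[j] / n
                grad_b -= yi / n
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def predict(w: List[float], b: float, X: Sequence[Sequence[float]]) -> List[int]:
    """Return +1 where w . x + b >= 0, else -1."""
    return [1 if sum(wj * xj for wj, xj in zip(w, xi)) + b >= 0 else -1 for xi in X]


def classification_metrics(
    y_true: Sequence[int], y_pred: Sequence[int]
) -> Tuple[float, float, float]:
    """Accuracy, precision and recall with +1 as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == -1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == -1)
    tn = sum(1 for t, p in pairs if t == -1 and p == -1)
    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0  # no positive predictions
    recall = tp / (tp + fn) if tp + fn else 0.0  # no positive samples
    return accuracy, precision, recall
```

For instance, a classifier that labels every sample negative on a dataset with 5 positives out of 100 scores 0.95 accuracy with 0.0 precision and 0.0 recall, which is why the three figures should be read together.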
Health Effects Associated With Electronic Cigarette Use: Automated Mining of Online Forums.
BACKGROUND: Our previous infodemiological study was performed by manually mining health-effect data associated with electronic cigarettes (ECs) from online forums. Manual mining is time-consuming and limits the number of posts that can be retrieved. OBJECTIVE: Our goal in this study was to automatically extract and analyze a large number (>41,000) of online forum posts, from 2008 to 2015, related to the health effects associated with EC use. METHODS: Data were annotated with medical concepts from the Unified Medical Language System using a modified version of the MetaMap tool. Of over 1.4 million posts, 41,216 were used to analyze symptoms (undiagnosed conditions) and disorders (physician-diagnosed terminology) associated with EC use. Each post was also assigned a sentiment (positive, negative, or neutral). RESULTS: Symptom and disorder data were categorized into 12 organ systems or anatomical regions. Most posts on symptoms and disorders expressed negative sentiment, and the affected systems were similar across all years. Health effects were reported most often in the neurological, mouth and throat, and respiratory systems. The most frequently reported symptoms and disorders were headache (n=939), coughing (n=852), malaise (n=468), asthma (n=916), dehydration (n=803), and pharyngitis (n=565). In addition, users often reported linked symptoms (e.g., coughing and headache). CONCLUSIONS: Online forums are a valuable repository of data that can be used to identify positive and negative health effects associated with EC use. By automating the extraction of online information, we obtained more data than in our prior study, identified new symptoms and disorders associated with EC use, determined which systems are most frequently adversely affected, identified the specific symptoms and disorders most commonly reported, and tracked health effects over 7 years.
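The categorization and linked-symptom steps in the RESULTS can be sketched as simple counting over per-post concept lists. This is a hypothetical illustration, not the study's pipeline: it assumes the MetaMap annotation step has already reduced each post to a list of concept names, and the `CONCEPT_TO_SYSTEM` table and `summarize_posts` function are made-up stand-ins for the study's 12 organ-system categories.

```python
from collections import Counter
from itertools import combinations
from typing import Iterable, List, Tuple

# Hypothetical lookup standing in for the study's 12 organ-system categories.
CONCEPT_TO_SYSTEM = {
    "headache": "neurological",
    "coughing": "respiratory",
    "asthma": "respiratory",
    "pharyngitis": "mouth and throat",
}


def summarize_posts(posts: Iterable[List[str]]) -> Tuple[Counter, Counter, Counter]:
    """Count concepts, organ systems and co-occurring concept pairs.

    Each post is a list of concept names (assumed to come from an earlier
    annotation step). A concept, system or pair is counted at most once per
    post, so every count is a number of posts."""
    concepts, systems, pairs = Counter(), Counter(), Counter()
    for post in posts:
        unique = sorted(set(post))  # sorted so each pair has a stable order
        concepts.update(unique)
        systems.update({CONCEPT_TO_SYSTEM.get(c, "other") for c in unique})
        pairs.update(combinations(unique, 2))  # linked symptoms in one post
    return concepts, systems, pairs
```

Counting per post rather than per mention keeps a single post that repeats "coughing" from inflating its frequency; `pairs.most_common()` then surfaces the most often linked symptoms.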
A framework of hybrid recommender system for personalized clinical prescription
© 2015 IEEE. General practitioners face a great challenge in clinical prescription owing to the growing number of new drugs and their complex functions across different diseases. A personalized recommender system can help practitioners discover the mass of medical knowledge hidden in historical medical records and so deal with the information overload problem in prescription. To support practitioners' decision making in prescription, this paper proposes a framework for a hybrid recommender system that integrates an artificial neural network and case-based reasoning. Three issues are considered in this framework: (1) defining a patient's needs from his or her symptoms, (2) mining features from the free text of medical records, and (3) analyzing the temporal efficiency of drugs. The proposed recommender system is expected to help general practitioners improve their efficiency and reduce the risk of errors in daily clinical consultations with patients.
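Only the case-based reasoning half of the proposed hybrid lends itself to a short example. The sketch below is not the paper's framework: it retrieves the k past cases whose symptom sets are most similar to a new patient's (Jaccard similarity, an assumed choice) and ranks the drugs prescribed in those cases by summed similarity. The case base, drug names, and the `recommend` function are hypothetical; the neural-network component and the temporal drug-efficiency analysis are not modelled.

```python
from collections import defaultdict
from typing import Dict, FrozenSet, List, Sequence, Tuple

Case = Tuple[FrozenSet[str], str]  # (symptoms recorded, drug prescribed)


def jaccard(a: FrozenSet[str], b: FrozenSet[str]) -> float:
    """Overlap between two symptom sets, from 0 (disjoint) to 1 (identical)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def recommend(
    query: FrozenSet[str], case_base: Sequence[Case], k: int = 3
) -> List[Tuple[str, float]]:
    """Retrieve the k most similar past cases and rank their drugs
    by summed similarity, highest first."""
    neighbours = sorted(
        case_base, key=lambda case: jaccard(query, case[0]), reverse=True
    )[:k]
    scores: Dict[str, float] = defaultdict(float)
    for symptoms, drug in neighbours:
        similarity = jaccard(query, symptoms)
        if similarity > 0:  # ignore cases that share no symptoms
            scores[drug] += similarity
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)
```

Summing similarities lets a drug that was prescribed in several close cases outrank one that appears in a single, slightly closer case; a weighted vote like this is one common CBR reuse strategy, not necessarily the one the authors chose.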
Annotating patient clinical records with syntactic chunks and named entities: the Harvey corpus
The free text notes typed by physicians during patient consultations contain valuable information for the study of disease and treatment. These notes are difficult for existing natural language analysis tools to process, since they are highly telegraphic (omitting many words) and contain many spelling mistakes, inconsistent punctuation, and non-standard word order. To support information extraction and classification tasks over such text, we describe a de-identified corpus of free text notes, a shallow syntactic and named entity annotation scheme for this kind of text, and an approach to training domain specialists with no linguistic background to annotate the text. Finally, we present a statistical chunking system for such clinical text with a stable learning rate and good accuracy, indicating that the manual annotation is consistent and that the annotation scheme is tractable for machine learning.
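Statistical chunkers are commonly trained on tokens labelled with B-/I-/O tags (beginning of, inside, or outside a chunk). Assuming that encoding (the abstract does not specify the Harvey corpus's tag format), the helper below turns a predicted tag sequence back into labelled chunks. The tag names `SYMPTOM`, `DURATION`, and `DRUG`, the `bio_to_chunks` function, and the example note are invented for illustration.

```python
from typing import List, Optional, Sequence, Tuple


def bio_to_chunks(tokens: Sequence[str], tags: Sequence[str]) -> List[Tuple[str, str]]:
    """Group tokens into (label, text) chunks from B-X / I-X / O tags.
    An I-X tag that does not continue an X chunk starts a new chunk."""
    chunks: List[Tuple[str, str]] = []
    label: Optional[str] = None
    words: List[str] = []
    for token, tag in zip(tokens, tags):
        if tag.startswith("B-") or (tag.startswith("I-") and tag[2:] != label):
            if label is not None:  # close the chunk in progress
                chunks.append((label, " ".join(words)))
            label, words = tag[2:], [token]
        elif tag.startswith("I-"):  # continue the current chunk
            words.append(token)
        else:  # "O": token lies outside any chunk
            if label is not None:
                chunks.append((label, " ".join(words)))
            label, words = None, []
    if label is not None:  # flush a chunk that runs to the end
        chunks.append((label, " ".join(words)))
    return chunks
```

On a telegraphic note such as `c/o sore throat 3 days rx amoxicillin` tagged `O B-SYMPTOM I-SYMPTOM B-DURATION I-DURATION O B-DRUG`, this yields the chunks `("SYMPTOM", "sore throat")`, `("DURATION", "3 days")`, and `("DRUG", "amoxicillin")`.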
Comparing machine learning clustering with latent class analysis on cancer symptoms' data
Symptom cluster research is a major topic in cancer symptom science. Despite the several statistical and clinical approaches in this domain, there is no consensus on which method performs best. Identifying a generally accepted analytical method is important in order to be able to use and process all the available data. In this paper we report a secondary analysis of cancer symptom data in which we compare the performance of five Machine Learning (ML) clustering algorithms. Based on how well they separate specific subsets of symptom measurements, we select the best of them and compare its performance with the Latent Class Analysis (LCA) method. This analysis is part of an ongoing study to identify suitable Machine Learning algorithms for analysing and predicting cancer symptoms during cancer treatment.
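The abstract does not name the five clustering algorithms it compared, so the following only illustrates one widely used candidate, k-means (Lloyd's algorithm), applied to per-patient symptom-severity vectors. The deterministic initialization (the first k patients seed the centroids), the `kmeans` function name, and the data are assumptions; Latent Class Analysis is a separate model-based method and is not implemented here.

```python
from typing import List, Sequence, Tuple


def squared_distance(p: Sequence[float], q: Sequence[float]) -> float:
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(
    points: Sequence[Sequence[float]], k: int, max_iter: int = 100
) -> Tuple[List[int], List[List[float]]]:
    """Lloyd's algorithm; the first k points seed the centroids."""
    centroids = [list(p) for p in points[:k]]
    labels: List[int] = []
    for _ in range(max_iter):
        # Assignment step: each patient joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments unchanged: converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:  # keep an empty cluster's centroid where it was
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids
```

With hypothetical rows of (fatigue, pain, nausea) severity on a 0 to 10 scale, `[1, 1, 0], [8, 9, 7], [0, 2, 1], [2, 1, 1], [9, 8, 8], [8, 8, 9]` and `k=2`, the low- and high-burden patients separate into labels `[0, 1, 0, 0, 1, 1]`. Comparing algorithms, as the study does, would then mean scoring such partitions against known symptom subsets.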
- …