27,811 research outputs found

    Mining Frequency of Drug Side Effects Over a Large Twitter Dataset Using Apache Spark

    Get PDF
    Despite clinical trials by pharmaceutical companies as well as current FDA reporting systems, there are still drug side effects that have not been caught. To find a larger sample of reports, a possible way is to mine online social media. With its current widespread use, social media such as Twitter has given rise to massive amounts of data, which can be used as reports for drug side effects. To process these large datasets, Apache Spark has become popular for fast, distributed batch processing. In this work, we have improved on previous pipelines in sentimental analysis-based mining, processing, and extracting tweets with drug-caused side effects. We have also added a new ensemble classifier using a combination of sentiment analysis features to increase the accuracy of identifying drug-caused side effects. In addition, the frequency count for the side effects is also provided. Furthermore, we have also implemented the same pipeline in Apache Spark to improve the speed of processing of tweets by 2.5 times, as well as to support the process of large tweet datasets. As the frequency count of drug side effects opens a wide door for further analysis, we present a preliminary study on this issue, including the side effects of simultaneously using two drugs, and the potential danger of using less-common combination of drugs. We believe the pipeline design and the results present in this work would have great implication on studying drug side effects and on big data analysis in general

    A survey on utilization of data mining approaches for dermatological (skin) diseases prediction

    Get PDF
    Due to recent technology advances, large volumes of medical data is obtained. These data contain valuable information. Therefore data mining techniques can be used to extract useful patterns. This paper is intended to introduce data mining and its various techniques and a survey of the available literature on medical data mining. We emphasize mainly on the application of data mining on skin diseases. A categorization has been provided based on the different data mining techniques. The utility of the various data mining methodologies is highlighted. Generally association mining is suitable for extracting rules. It has been used especially in cancer diagnosis. Classification is a robust method in medical mining. In this paper, we have summarized the different uses of classification in dermatology. It is one of the most important methods for diagnosis of erythemato-squamous diseases. There are different methods like Neural Networks, Genetic Algorithms and fuzzy classifiaction in this topic. Clustering is a useful method in medical images mining. The purpose of clustering techniques is to find a structure for the given data by finding similarities between data according to data characteristics. Clustering has some applications in dermatology. Besides introducing different mining methods, we have investigated some challenges which exist in mining skin data

    Exploring Privacy Preservation in Outsourced K-Nearest Neighbors with Multiple Data Owners

    Full text link
    The k-nearest neighbors (k-NN) algorithm is a popular and effective classification algorithm. Due to its large storage and computational requirements, it is suitable for cloud outsourcing. However, k-NN is often run on sensitive data such as medical records, user images, or personal information. It is important to protect the privacy of data in an outsourced k-NN system. Prior works have all assumed the data owners (who submit data to the outsourced k-NN system) are a single trusted party. However, we observe that in many practical scenarios, there may be multiple mutually distrusting data owners. In this work, we present the first framing and exploration of privacy preservation in an outsourced k-NN system with multiple data owners. We consider the various threat models introduced by this modification. We discover that under a particularly practical threat model that covers numerous scenarios, there exists a set of adaptive attacks that breach the data privacy of any exact k-NN system. The vulnerability is a result of the mathematical properties of k-NN and its output. Thus, we propose a privacy-preserving alternative system supporting kernel density estimation using a Gaussian kernel, a classification algorithm from the same family as k-NN. In many applications, this similar algorithm serves as a good substitute for k-NN. We additionally investigate solutions for other threat models, often through extensions on prior single data owner systems

    Automation of motor dexterity assessment

    Get PDF
    Motor dexterity assessment is regularly performed in rehabilitation wards to establish patient status and automatization for such routinary task is sought. A system for automatizing the assessment of motor dexterity based on the Fugl-Meyer scale and with loose restrictions on sensing technologies is presented. The system consists of two main elements: 1) A data representation that abstracts the low level information obtained from a variety of sensors, into a highly separable low dimensionality encoding employing t-distributed Stochastic Neighbourhood Embedding, and, 2) central to this communication, a multi-label classifier that boosts classification rates by exploiting the fact that the classes corresponding to the individual exercises are naturally organized as a network. Depending on the targeted therapeutic movement class labels i.e. exercises scores, are highly correlated-patients who perform well in one, tends to perform well in related exercises-; and critically no node can be used as proxy of others - an exercise does not encode the information of other exercises. Over data from a cohort of 20 patients, the novel classifier outperforms classical Naive Bayes, random forest and variants of support vector machines (ANOVA: p <; 0.001). The novel multi-label classification strategy fulfills an automatic system for motor dexterity assessment, with implications for lessening therapist's workloads, reducing healthcare costs and providing support for home-based virtual rehabilitation and telerehabilitation alternatives
    • …
    corecore