2,572 research outputs found

    Detecting (Un)Important Content for Single-Document News Summarization

    We present a robust approach for detecting intrinsic sentence importance in news, trained on two corpora of document-summary pairs. When used for single-document summarization, our approach, combined with the "beginning of document" heuristic, outperforms a state-of-the-art summarizer and the beginning-of-article baseline in both automatic and manual evaluations. These results represent an important advance because, in the absence of cross-document repetition, single-document summarizers for news have not been able to consistently outperform the strong beginning-of-article baseline. (Comment: Accepted by EACL 201)

    Intelligent Biosignal Processing in Wearable and Implantable Sensors

    This reprint provides a collection of papers illustrating the state of the art in smart processing of data from wearable, implantable, or portable sensors. Each paper presents the design, the databases used, the methodological background, the obtained results, and their interpretation for biomedical applications. Revealing examples are brain–machine interfaces for medical rehabilitation, the evaluation of sympathetic nerve activity, a novel automated diagnostic tool based on ECG data to diagnose COVID-19, machine learning-based hypertension risk assessment by means of photoplethysmography and electrocardiography signals, Parkinsonian gait assessment using machine learning tools, a thorough analysis of compressive sensing of ECG signals, the development of a nanotechnology application for decoding vagus-nerve activity, the detection of liver dysfunction using a wearable electronic nose system, prosthetic hand control using surface electromyography, epileptic seizure detection using a CNN, and premature ventricular contraction detection using deep metric learning. Thus, this reprint presents significant clinical applications as well as valuable new research issues, providing current illustrations of this new field of research by addressing the promises, challenges, and hurdles associated with the synergy of biosignal processing and AI through 16 pertinent studies. Covering a wide range of research and application areas, this book is an excellent resource for researchers, physicians, academics, and PhD or master's students working on (bio)signal and image processing, AI, biomaterials, biomechanics, and biotechnology with applications in medicine.

    Machine Learning Models for High-dimensional Biomedical Data

    Recent technological advances enable the collection of complex, heterogeneous, and high-dimensional data in biomedical domains. The increasing availability of high-dimensional biomedical data creates the need for new machine learning models for effective data analysis and knowledge discovery. This dissertation introduces several unsupervised and supervised methods to help understand the data, discover patterns, and improve decision making. All of the proposed methods generalize to other industrial fields. The first topic focuses on data clustering, often the first step in analyzing a dataset without label information. Clustering high-dimensional data with mixed categorical and numeric attributes remains a challenging yet important task; a clustering algorithm based on tree ensembles, CRAFTER, is proposed to tackle this task in a scalable manner. The second part develops data representation methods for genome sequencing data, a special type of high-dimensional data in the biomedical domain. The proposed representation, Bag-of-Segments, summarizes the key characteristics of a genome sequence into a small number of features with good interpretability. The third part introduces an end-to-end deep neural network model, GCRNN, for time series classification with emphasis on both accuracy and interpretability. GCRNN contains a convolutional network component to extract high-level features and a recurrent network component to model temporal characteristics; a feed-forward fully connected network with sparse group lasso regularization generates the final classification and provides good interpretability. The last topic centers on dimensionality reduction methods for time series data. A good dimensionality reduction method is important for the storage, decision making, and pattern visualization of time series data. The CRNN autoencoder is proposed to achieve not only low reconstruction error but also discriminative features; a variational version of this autoencoder has great potential for applications such as anomaly detection and process control. (Doctoral Dissertation, Industrial Engineering, 201)
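The abstract mentions sparse group lasso regularization but does not spell out the penalty. A minimal sketch, assuming the standard formulation (an L1 term on individual weights plus a group-wise L2 term), looks as follows; the function name and the weighting by group size are conventional choices, not taken from the dissertation:

```python
import numpy as np

def sparse_group_lasso_penalty(w, groups, lam1=0.1, lam2=0.1):
    """Assumed standard sparse group lasso penalty:
    lam1 * ||w||_1  +  lam2 * sum_g sqrt(p_g) * ||w_g||_2,
    where p_g is the number of weights in group g."""
    l1 = np.abs(w).sum()                              # element-wise sparsity
    l2 = sum(np.sqrt(len(idx)) * np.linalg.norm(w[idx])  # group-wise sparsity
             for idx in groups)
    return lam1 * l1 + lam2 * l2
```

The L1 part zeroes individual weights while the group part zeroes whole feature groups at once, which is the mechanism behind the interpretability claimed above.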

    A Study on Epileptic EEG Classification in the Time-Frequency Domain: Focusing on Feature Extraction Based on the Root Mean Square

    Epilepsy affects over 50 million people worldwide. Epileptic seizure is a general term covering a broad classification depending on the cause of occurrence. When Parvez et al. applied the instantaneous bandwidth feature B2AM and the time-averaged bandwidth feature B2FM to classify interictal and ictal signals on the Freiburg database, accuracy dropped to 77.90% for the frontal lobe and 80.20% for the temporal lobe, compared with the 98.50% classification accuracy achieved on the Bonn dataset with the same features for classifying ictal against interictal. We found several reasons for these low results. First, Parvez et al. computed the features from the first IMF of the EMD, which is mostly noise-induced. Second, they used the same SVM kernel parameters as Bajaj et al., which must have been optimized on a different dataset. Most importantly, two signals s1 and s2 can have the same instantaneous bandwidth. The motivation of this dissertation is therefore to address the drawback of the instantaneous bandwidth feature with new features, with the objective of achieving comparable classification accuracy. In this work, we classified ictal from healthy non-seizure interictal signals successfully, first by using the RMS frequency and another feature from the Hilbert marginal spectrum, and then with the ratio of its parameters. The RMS frequency is the square root of the sum of the square bandwidth and the square of the center frequency; its parameter ratio is the ratio of the square of the center frequency to the square bandwidth. We also used the dominant frequency and its parameter ratio for the same purpose. The dominant frequency has the same physical relevance as the RMS frequency but differs by definition: it is the square root of the sum of the square of the instantaneous bandwidth and the square of the instantaneous frequency.
The third feature exploits the equivalence of the RMS frequency and the dominant frequency (DF) to define the root mean instantaneous frequency square (RMIFS) as the square root of the sum of the square of the time-averaged bandwidth and the square of the center frequency. These features are average measures that show good discriminative power in classifying ictal from interictal signals using an SVM. The features fr and fd also have the advantage of overcoming the drawbacks of the square bandwidth and the instantaneous bandwidth. The RMS frequency used in this work is different from generic root mean square analysis. We used an adaptive thresholding algorithm to address the issue of false positives; it increased specificity by an average of 5.9%, consequently increasing accuracy. We then applied morphological component analysis (MCA) with the fractional contribution of the dominant frequency and the remaining features, such as the bandwidth parameter's contribution and the RMIFS frequency, its parameters, and their ratio. With the results from the proposed features, we validated our claim of overcoming the drawbacks of the instantaneous bandwidth and the square bandwidth. (Kyushu Institute of Technology doctoral dissertation; degree number: 生工博甲第323号; degree conferred: June 28, 2018. Contents: 1 Introduction | 2 Empirical Mode Decomposition | 3 Root Mean Square Frequency | 4 Root Mean Instantaneous Frequency Square | 5 Morphological Component Analysis | 6 Conclusion. Kyushu Institute of Technology, 2018)
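In the notation assumed here (sigma squared for the square bandwidth, f_c for the center frequency, B for the instantaneous bandwidth, f_i for the instantaneous frequency, and B-bar for the time-averaged bandwidth; symbol choices are ours, not the dissertation's), the three verbal definitions above read compactly as:

```latex
f_r = \sqrt{\sigma^2 + f_c^2}, \qquad
f_d = \sqrt{B^2 + f_i^2}, \qquad
f_{\mathrm{RMIFS}} = \sqrt{\bar{B}^2 + f_c^2}
```

The parameter ratio used alongside f_r is then f_c^2 / \sigma^2, the ratio of the squared center frequency to the square bandwidth.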

    Detection of Epileptic Seizures on EEG Signals Using ANFIS Classifier, Autoencoders and Fuzzy Entropies

    Epileptic seizures are among the most crucial neurological disorders, and their early diagnosis helps clinicians provide accurate treatment for patients. Electroencephalogram (EEG) signals are widely used for epileptic seizure detection and provide specialists with substantial information about the functioning of the brain. In this paper, a novel diagnostic procedure using fuzzy theory and deep learning techniques is introduced. The proposed method is evaluated on the Bonn University dataset with six classification combinations and also on the Freiburg dataset. The tunable-Q wavelet transform (TQWT) is employed to decompose the EEG signals into different sub-bands. In the feature extraction step, 13 different fuzzy entropies are calculated from the TQWT sub-bands, and their computational complexities are reported to help researchers choose the best set for various tasks. An autoencoder (AE) with six layers is then employed for dimensionality reduction. Finally, the standard adaptive neuro-fuzzy inference system (ANFIS), together with its variants based on the grasshopper optimization algorithm (ANFIS-GOA), particle swarm optimization (ANFIS-PSO), and breeding swarm optimization (ANFIS-BS), is used for classification. Using the proposed method, ANFIS-BS obtained an accuracy of 99.7
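The abstract does not define the 13 fuzzy entropies it computes. As a hedged illustration of the family, a minimal sketch of the classic fuzzy entropy (FuzzyEn) measure is given below; the parameter names (embedding dimension m, tolerance r, fuzzy power n) and the simplification of using N - dim templates at both scales are our assumptions, not the paper's exact recipe:

```python
import numpy as np

def fuzzy_entropy(x, m=2, r=0.2, n=2):
    """Fuzzy entropy (FuzzyEn) of a 1-D signal.
    m: embedding dimension, r: tolerance (often 0.2 * std of x), n: fuzzy power."""
    x = np.asarray(x, dtype=float)
    N = len(x)

    def phi(dim):
        # Zero-mean templates of length `dim` (baseline removed per template)
        templates = np.array([x[i:i + dim] - x[i:i + dim].mean()
                              for i in range(N - dim)])
        # Chebyshev distance between every pair of templates
        d = np.max(np.abs(templates[:, None, :] - templates[None, :, :]), axis=2)
        # Exponential (fuzzy) similarity; exclude self-matches on the diagonal
        sim = np.exp(-(d ** n) / r)
        np.fill_diagonal(sim, 0.0)
        return sim.sum() / ((N - dim) * (N - dim - 1))

    return np.log(phi(m)) - np.log(phi(m + 1))
```

Regular signals yield small values and irregular signals large ones, which is why such entropies discriminate seizure from non-seizure sub-bands.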

    GEMINI: A Generic Multi-Modal Natural Interface Framework for Videogames

    In recent years videogame companies have recognized the role of player engagement as a major factor in user experience and enjoyment. This encouraged greater investment in new types of game controllers such as the WiiMote, Rock Band instruments, and the Kinect. However, the native software of these controllers was not originally designed to be used in other game applications. This work addresses the issue by building a middleware framework that maps body poses or voice commands to actions in any game. This not only affords a more natural and customized user experience but also defines an interoperable virtual controller. In this version of the framework, body poses and voice commands are recognized through the Kinect's built-in cameras and microphones, respectively. The acquired data are then translated into the native interaction scheme in real time using a lightweight method based on spatial restrictions. The system is also prepared to use Nintendo's WiiMote as an auxiliary and unobtrusive gamepad for physically or verbally impractical commands. System validation was performed by analyzing the performance of certain tasks and examining user reports. Both confirmed this approach as a practical and alluring alternative to a game's native interaction scheme. In sum, this framework provides a game-controlling tool that is totally customizable and very flexible, thus expanding the market of game consumers. (Comment: WorldCIST'13 International Conference)
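The paper describes its pose recognition only as "spatial restrictions". A toy illustration of that idea, with hypothetical joint names and thresholds of our own choosing, is a box test on one joint's position relative to a reference joint:

```python
from dataclasses import dataclass

@dataclass
class Joint:
    """A tracked skeleton joint in camera coordinates (hypothetical layout)."""
    x: float
    y: float
    z: float

def arm_extended(hand: Joint, shoulder: Joint,
                 dx=(0.2, 0.6), dy=(-0.1, 0.1)) -> bool:
    """True when the hand lies inside a box beside the shoulder,
    i.e. a hypothetical 'arm extended sideways' pose."""
    rx = hand.x - shoulder.x
    ry = hand.y - shoulder.y
    return dx[0] <= rx <= dx[1] and dy[0] <= ry <= dy[1]
```

A recognized pose would then be translated into a native keypress or gamepad event, which is what makes the virtual controller interoperable across games.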

    From narrative descriptions to MedDRA: automagically encoding adverse drug reactions

    The collection of narrative spontaneous reports is an irreplaceable source for the prompt detection of suspected adverse drug reactions (ADRs). In this task, qualified domain experts manually review a huge number of narrative descriptions and then encode the texts according to the MedDRA standard terminology. The manual annotation of narrative documents with medical terminology is a subtle and expensive task, since the number of reports is growing day by day. Natural Language Processing (NLP) applications can support the work of people responsible for pharmacovigilance. Our objective is to develop NLP algorithms and tools for the detection of ADR clinical terminology. Efficient applications can concretely improve the quality of the experts' revisions: NLP software can quickly analyze narrative texts and offer an encoding (i.e., a list of MedDRA terms) that the expert then revises and validates. MagiCoder, an NLP algorithm, is proposed for the automatic encoding of free-text descriptions into MedDRA terms. The MagiCoder procedure is efficient in terms of computational complexity. We tested MagiCoder through several experiments. In the first, we tested it on a large dataset of about 4500 manually revised reports, performing an automated comparison between the human and MagiCoder encodings. We also tested MagiCoder on a set of about 1800 reports, manually revised ex novo by domain experts, who also compared the automatic solutions with the gold reference standard. We further provide two initial experiments with reports written in English, giving first evidence of the robustness of MagiCoder with respect to a change of language. For the current base version of MagiCoder, we measured an average recall and precision of and , respectively. From a practical point of view, MagiCoder reduces the time required for encoding ADR reports. Pharmacologists have only to review and validate the MedDRA terms proposed by the application, instead of choosing the right terms among the 70K low-level terms of MedDRA. Such an improvement in the efficiency of pharmacologists' work also has a relevant impact on the quality of the subsequent data analysis. We developed MagiCoder for the Italian pharmacovigilance language; however, our proposal is based on a general approach, independent of both the language considered and the term dictionary.
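The MagiCoder algorithm itself is not specified in the abstract. Purely as an illustration of this kind of terminology encoding, a naive encoder can greedily match the longest dictionary phrase at each position of the normalized text; the dictionary entries and codes below are made-up placeholders, not MedDRA content and not MagiCoder's actual procedure:

```python
def encode_terms(text, dictionary):
    """Greedy longest-match lookup of dictionary phrases in free text.
    `dictionary` maps a lowercase phrase to a terminology code."""
    words = text.lower().split()
    found, i = [], 0
    while i < len(words):
        match = None
        # Try the longest candidate phrase starting at position i first
        for j in range(len(words), i, -1):
            phrase = " ".join(words[i:j])
            if phrase in dictionary:
                match = dictionary[phrase]
                i = j            # consume the matched words
                break
        if match is not None:
            found.append(match)
        else:
            i += 1               # no phrase starts here; advance one word
    return found
```

The expert's job then shrinks to validating the proposed codes rather than searching the full terminology.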

    Transfer Learning of EEG for Analysis of Interictal Epileptiform Discharges


    Mining Frequency of Drug Side Effects Over a Large Twitter Dataset Using Apache Spark

    Despite clinical trials by pharmaceutical companies and current FDA reporting systems, there are still drug side effects that have not been caught. To find a larger sample of reports, one possibility is to mine online social media. With its current widespread use, social media such as Twitter has given rise to massive amounts of data, which can be used as reports of drug side effects. To process these large datasets, Apache Spark has become popular for fast, distributed batch processing. In this work, we have improved on previous pipelines for sentiment analysis-based mining, processing, and extraction of tweets reporting drug-caused side effects. We have also added a new ensemble classifier using a combination of sentiment analysis features to increase the accuracy of identifying drug-caused side effects, and we provide frequency counts for the side effects. Furthermore, we have implemented the same pipeline in Apache Spark, improving the speed of tweet processing by 2.5 times and supporting the processing of large tweet datasets. As the frequency counts of drug side effects open a wide door for further analysis, we present a preliminary study on this issue, including the side effects of simultaneously using two drugs and the potential danger of using less-common combinations of drugs. We believe the pipeline design and the results presented in this work have great implications for studying drug side effects and for big data analysis in general.
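The paper's Spark pipeline is not reproduced in the abstract. As a local, single-machine sketch with a made-up lexicon, the frequency-count step reduces to scanning each tweet for known side-effect terms; in Spark the same logic would be expressed as a flatMap over tweets followed by a reduceByKey (or a DataFrame groupBy/count):

```python
from collections import Counter

def side_effect_frequencies(tweets, lexicon):
    """Count how many tweets mention each known side-effect term.
    `lexicon` is an iterable of lowercase side-effect phrases."""
    counts = Counter()
    for tweet in tweets:
        text = tweet.lower()
        for term in lexicon:
            if term in text:     # simple substring match; real pipelines
                counts[term] += 1  # would use tokenization + classification
    return counts
```

Counting co-occurring drug names per tweet with the same pattern is what enables the two-drug-combination analysis mentioned above.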