189,657 research outputs found

    A comparison of machine learning techniques for detection of drug target articles

    Important progress in treating diseases has been possible thanks to the identification of drug targets. Drug targets are the molecular structures whose abnormal activity, associated with a disease, can be modified by drugs, improving the health of patients. The pharmaceutical industry needs to prioritize their identification and validation in order to reduce the long and costly drug development times. In the last two decades, our knowledge about drugs, their mechanisms of action and drug targets has rapidly increased. Nevertheless, most of this knowledge is hidden in millions of medical articles and textbooks. Extracting knowledge from this large amount of unstructured information is a laborious job, even for human experts. Identifying drug target articles, a crucial first step toward the automatic extraction of information from texts, is the aim of this paper. Several machine learning techniques have been compared in order to obtain a satisfactory classifier for detecting drug target articles using semantic information from biomedical resources such as the Unified Medical Language System. The best result was achieved by a Fuzzy Lattice Reasoning classifier, which reaches a ROC area of 98%. This research is supported by Projects TIN2007-67407-C03-01, S-0505/TIC-0267 and MICINN project TEXT-ENTERPRISE 2.0 TIN2009-13391-C04-03 (Plan I+D+i), as well as by the Juan de la Cierva program of the MICINN of Spain.
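For readers unfamiliar with the ROC area measure quoted in the abstract above, the following is a minimal sketch of how such a value can be computed from classifier scores. It uses the standard pairwise-ranking identity (the fraction of positive/negative pairs ranked correctly, ties counting half). The function name `roc_auc` and the toy labels and scores are illustrative assumptions and are not taken from the paper.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via the pairwise-ranking identity.

    labels: iterable of 0/1 class labels (1 = positive, e.g. a drug target article)
    scores: iterable of classifier scores, higher meaning "more likely positive"
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    # Each (positive, negative) pair scores 1 if ranked correctly, 0.5 on a tie.
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: three positive and three negative articles.
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.1]
auc = roc_auc(labels, scores)
```

A value of 1.0 means every positive example is scored above every negative one, and 0.5 corresponds to random ranking.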

    Comparison of machine learning approaches for classification of invoices

    Machine learning has become one of the leading sciences governing the modern world. Various disciplines, specifically neural networks, have recently gained a lot of attention due to their widespread applications. With recent advances in technology, the resulting big data has increased the need for larger means of storage, analysis and utilization. This not only implies the efficient use of available techniques but also suggests a surge in the development of new algorithms and techniques. In this project, three different machine learning approaches were implemented using the open source library Keras on TensorFlow as a proof of concept for the task of intelligent invoice automation. The performance of these approaches was analysed on invoice data from two customers, with two target attributes per customer. The behaviour of neural network hyper-parameters was investigated empirically using matplotlib and TensorBoard. In the first approach, the standard way of implementing a predictive algorithm using a neural network was followed. Moreover, the hyper-parameter search space was fine-tuned, and the resulting model was studied by grid search on those hyper-parameters. This hyper-parameter strategy was followed in the next two approaches as well. In the second approach, not only was a further improvement in prediction accuracy achieved, but the dependency between the two target attributes was also determined using multi-task learning. In the third approach, the use of continual learning on invoices for postings was analysed. This investigation, which involves the comparison of varied machine learning approaches, has broad significance in validating the currently available algorithms for handling such data and suggests means for improvement as well. It holds great prospects, including but not limited to future implementation of such approaches in the domain of finance towards improved customer experience, fraud detection and easier assessment of assets.

    Exploring machine learning methods to automatically identify students in need of assistance

    Copyright 2015 ACM. Methods for automatically identifying students in need of assistance have been studied for decades. Initially, the work was based on somewhat static factors such as students' educational background and results from various questionnaires, while more recently, constantly accumulating data such as progress with course assignments and behavior in lectures has gained attention. We contribute to this work with results on early detection of students in need of assistance, and provide a starting point for using machine learning techniques on naturally accumulating programming process data. By combining source code snapshot data recorded from students' programming process with machine learning methods, we are able to detect high- and low-performing students with high accuracy as early as the end of the first week of an introductory programming course. A comparison of our results with the prominent methods for predicting students' performance using source code snapshot data is also provided. This early information on students' performance is beneficial from multiple viewpoints. Instructors can target their guidance to struggling students early on, and provide more challenging assignments for high-performing students. Moreover, students who perform poorly in the introductory programming course, but who nevertheless pass, can be monitored more closely in their future studies.

    One-Class Classification: Taxonomy of Study and Review of Techniques

    One-class classification (OCC) algorithms aim to build classification models when the negative class is either absent, poorly sampled or not well defined. This unique situation constrains the learning of efficient classifiers, because the class boundary must be defined using knowledge of the positive class alone. The OCC problem has been considered and applied under many research themes, such as outlier/novelty detection and concept learning. In this paper we present a unified view of the general problem of OCC by presenting a taxonomy of study for OCC problems, which is based on the availability of training data, the algorithms used and the application domains. We further delve into each of the categories of the proposed taxonomy and present a comprehensive literature review of the OCC algorithms, techniques and methodologies with a focus on their significance, limitations and applications. We conclude our paper by discussing some open research problems in the field of OCC and present our vision for future research. Comment: 24 pages + 11 pages of references, 8 figures
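As a rough illustration of the one-class setting described in the abstract above, the sketch below fits a decision boundary using positive examples only and then accepts or rejects new points. It is a deliberately simple Gaussian-threshold rule chosen for brevity, not one of the techniques reviewed in the paper; the class name `GaussianOneClass`, the threshold `k` and the sample values are all hypothetical.

```python
from statistics import mean, stdev


class GaussianOneClass:
    """Toy one-class classifier: accept points within k standard deviations
    of the mean of the positive training samples (an illustrative assumption,
    not a method taken from the reviewed literature)."""

    def __init__(self, k=3.0):
        self.k = k
        self.mu = None
        self.sigma = None

    def fit(self, positives):
        # Only positive-class samples are available at training time.
        self.mu = mean(positives)
        self.sigma = stdev(positives)
        return self

    def predict(self, x):
        # True means "looks like the positive class"; False means outlier.
        return abs(x - self.mu) <= self.k * self.sigma


# Hypothetical one-dimensional measurements from the positive class only.
model = GaussianOneClass(k=3.0).fit([9.8, 10.1, 10.0, 9.9, 10.2])
inlier = model.predict(10.3)
outlier = model.predict(12.0)
```

The key point is that `fit` never sees a negative example: the boundary is derived entirely from the positive class, which is what distinguishes OCC from ordinary binary classification.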

    Neural Machine Translation Inspired Binary Code Similarity Comparison beyond Function Pairs

    Binary code analysis allows analyzing binary code without having access to the corresponding source code. A binary, after disassembly, is expressed in an assembly language. This inspires us to approach binary analysis by leveraging ideas and techniques from Natural Language Processing (NLP), a rich area focused on processing text of various natural languages. We notice that binary code analysis and NLP share many analogous topics, such as semantics extraction, summarization, and classification. This work utilizes these ideas to address two important code similarity comparison problems: (I) given a pair of basic blocks for different instruction set architectures (ISAs), determining whether their semantics are similar or not; and (II) given a piece of code of interest, determining if it is contained in another piece of assembly code for a different ISA. The solutions to these two problems have many applications, such as cross-architecture vulnerability discovery and code plagiarism detection. We implement a prototype system INNEREYE and perform a comprehensive evaluation. A comparison between our approach and existing approaches to Problem I shows that our system outperforms them in terms of accuracy, efficiency and scalability. The case studies utilizing the system demonstrate that our solution to Problem II is effective. Moreover, this research showcases how to apply ideas and techniques from NLP to large-scale binary code analysis. Comment: Accepted by Network and Distributed Systems Security (NDSS) Symposium 201

    Micro-Doppler Based Human-Robot Classification Using Ensemble and Deep Learning Approaches

    Radar sensors can be used for analyzing the induced frequency shifts due to micro-motions in both range and velocity dimensions, identified as micro-Doppler (μ-D) and micro-Range (μ-R), respectively. Different moving targets will have unique μ-D and μ-R signatures that can be used for target classification. Such classification can be used in numerous fields, such as gait recognition, safety and surveillance. In this paper, a 25 GHz FMCW Single-Input Single-Output (SISO) radar is used in industrial safety for real-time human-robot identification. Due to the real-time constraint, joint Range-Doppler (R-D) maps are directly analyzed for our classification problem. Furthermore, a comparison between the conventional classical learning approaches with handcrafted extracted features, ensemble classifiers and deep learning approaches is presented. For ensemble classifiers, restructured range and velocity profiles are passed directly to ensemble trees, such as gradient boosting and random forest, without feature extraction. Finally, a Deep Convolutional Neural Network (DCNN) is used and raw R-D images are directly fed into the constructed network. The DCNN shows a superior performance of 99% accuracy in distinguishing humans from robots on a single R-D map. Comment: 6 pages, accepted in IEEE Radar Conference 201