    Systematic literature review (SLR) automation: a systematic literature review

    Context: A systematic literature review (SLR) is a methodology used to find and aggregate all relevant studies about a specific research question or topic of interest. Most SLR processes are conducted manually. Automating these processes can reduce the workload and time required of human reviewers. Method: We use SLR as a methodology to survey the literature on the technologies used to automate SLR processes. Result: From the collected data, we found many studies that automate the study selection process, but no evidence of automation of the planning and reporting processes. Most authors use machine learning classifiers to automate study selection. Our survey also found processes similar to SLR processes for which automatic techniques already exist. Conclusion: Based on these results, we conclude that more research is needed on the planning, reporting, data extraction and synthesis processes of SLR.

    A comparison of machine learning techniques for detection of drug target articles

    Important progress in treating diseases has been possible thanks to the identification of drug targets. Drug targets are the molecular structures whose abnormal activity, associated with a disease, can be modified by drugs, improving the health of patients. The pharmaceutical industry needs to prioritize their identification and validation in order to reduce long and costly drug development times. In the last two decades, our knowledge about drugs, their mechanisms of action and drug targets has rapidly increased. Nevertheless, most of this knowledge is hidden in millions of medical articles and textbooks. Extracting knowledge from this large amount of unstructured information is a laborious job, even for human experts. The identification of drug target articles, a crucial first step toward the automatic extraction of information from texts, is the aim of this paper. Several machine learning techniques have been compared in order to obtain a satisfactory classifier for detecting drug target articles using semantic information from biomedical resources such as the Unified Medical Language System. The best result was achieved by a Fuzzy Lattice Reasoning classifier, which reaches a ROC area of 98%. This research paper is supported by Projects TIN2007-67407-C03-01, S-0505/TIC-0267 and MICINN project TEXT-ENTERPRISE 2.0 TIN2009-13391-C04-03 (Plan I + D + i), as well as by the Juan de la Cierva program of the MICINN of Spain.

    TATPred: a Bayesian method for the identification of twin arginine translocation pathway signal sequences

    The twin arginine translocation (TAT) system ferries folded proteins across the bacterial membrane. Proteins are directed into this system by the TAT signal peptide present at the amino terminus of the precursor protein, which contains the twin arginine residues that give the system its name. There are currently only two computational methods for the prediction of TAT-translocated proteins from sequence. Both methods have limitations that make the creation of a new algorithm for TAT-translocated protein prediction desirable. We have developed TATPred, a new sequence-model method, based on a Naïve Bayesian network, for the prediction of TAT signal peptides. In this approach, a comprehensive range of models was tested to identify the most reliable and robust predictor. The best model comprised 12 residues: the three residues prior to the twin arginines, the twin arginines themselves, and the seven residues that follow them. We found a prediction sensitivity of 0.979 and a specificity of 0.942.
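The 12-residue window described in the abstract above (three residues, the twin arginines, then seven residues) can be illustrated with a minimal Python sketch. This is not the TATPred implementation: the function name `twin_arginine_window`, the example sequence, and the choice to anchor on the first "RR" occurrence are all assumptions made for illustration only.

```python
def twin_arginine_window(sequence, before=3, after=7):
    """Return the residues around the first 'RR' motif, or None.

    Hypothetical helper: the window is `before` residues, the two
    arginines, and `after` residues (3 + 2 + 7 = 12 by default).
    Anchoring on the first 'RR' is a simplifying assumption.
    """
    pos = sequence.find("RR")
    if pos == -1:
        return None  # no twin arginine motif present
    start = pos - before
    end = pos + 2 + after
    if start < 0 or end > len(sequence):
        return None  # not enough flanking residues for a full window
    return sequence[start:end]


# Made-up sequence used purely to exercise the function.
example = "MKTASRRDFLKGAAAL"
window = twin_arginine_window(example)
print(window, len(window))
```

A window like this would be the input to whatever sequence model is then trained; the abstract does not say how TATPred locates or scores the motif, so that step is left out here.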

    Using Kinect to classify Parkinson’s disease stages related to severity of gait impairment

    Published: 10 December 2018. Parkinson’s Disease (PD) is a chronic neurodegenerative disease associated with motor problems such as gait impairment. Different systems based on 3D cameras, accelerometers or gyroscopes have been used in related works in order to study gait disturbances in PD. Kinect® has also been used to build these kinds of systems, but contradictory results have been reported: some works conclude that Kinect does not provide an accurate method of measuring gait kinematics variables, but others, on the contrary, report good accuracy results. This research work was funded by the Spanish Ministry of Economy and Competitiveness (grant FEDER/TIN2016-78011-C4-2-R). The funding bodies had no role in the design or conclusions of this study.

    Performance Analysis of Naive Bayes Variation Method in Spice Image Classification Using Histogram of Gradient Oriented (HOG) Feature Extraction

    Indonesia has a great natural wealth of spices. The diversity of spices is an inseparable aspect of Indonesian history. Spices and seasonings are biological resources that have long played an important role in human life. Many Indonesian spices have almost the same color and shape. The purpose of this study was to analyze the performance of Naïve Bayes variants in classifying spices using Histogram of Oriented Gradients (HOG) feature extraction. Across three tests of the four Naïve Bayes variants evaluated in this study, the Gaussian Naïve Bayes method performed best when classifying 5 types of spices, with an accuracy of 0.946, a precision of 0.95, a recall of 0.945, an F1 score of 0.947, an F-beta score of 0.946, and a Jaccard score of 0.90, whereas the Complement Naïve Bayes method performed worst. From the results of this study it can be concluded that combining HOG feature extraction with a Naïve Bayes variant yields high classification performance for spices. To obtain more accurate classification results, consider using other methods and other feature extraction.
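As a rough consistency check on the figures quoted above, the F-beta score can be computed from precision and recall. The sketch below uses the standard definition; the function name `f_beta` is hypothetical, and the abstract does not state which beta or which class-averaging scheme was used, so the reported values need not match this calculation exactly.

```python
def f_beta(precision, recall, beta=1.0):
    """Weighted harmonic mean of precision and recall.

    beta = 1 gives the usual F1 score. Returns 0.0 when both
    inputs are zero to avoid dividing by zero.
    """
    if precision == 0 and recall == 0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Precision and recall as reported for Gaussian Naive Bayes in the abstract.
f1 = f_beta(0.95, 0.945)
print(round(f1, 3))
```

With the reported precision of 0.95 and recall of 0.945, this gives an F1 of about 0.947, in line with the value in the abstract.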

    Machine Learning and Integrative Analysis of Biomedical Big Data.

    Recent developments in high-throughput technologies have accelerated the accumulation of massive amounts of omics data from multiple sources: genome, epigenome, transcriptome, proteome, metabolome, etc. Traditionally, data from each source (e.g., genome) are analyzed in isolation using statistical and machine learning (ML) methods. Integrative analysis of multi-omics and clinical data is key to new biomedical discoveries and advancements in precision medicine. However, data integration poses new computational challenges and exacerbates those associated with single-omics studies. Specialized computational approaches are required to effectively and efficiently perform integrative analysis of biomedical data acquired from diverse modalities. In this review, we discuss state-of-the-art ML-based approaches for tackling five specific computational challenges associated with integrative analysis: curse of dimensionality, data heterogeneity, missing data, class imbalance and scalability issues.

    Ligand-based virtual screening using binary kernel discrimination

    This paper discusses the use of a machine-learning technique called binary kernel discrimination (BKD) for virtual screening in drug- and pesticide-discovery programmes. BKD is compared with several other ligand-based tools for virtual screening in databases of 2D structures represented by fragment bit-strings, and is shown to provide an effective, and reasonably efficient, way of prioritising compounds for biological screening.

    A voting approach to identify a small number of highly predictive genes using multiple classifiers

    Background: Microarray gene expression profiling has provided extensive datasets that can describe characteristics of cancer patients. An important challenge for this type of data is the discovery of gene sets which can be used as the basis of developing a clinical predictor for cancer. It is desirable that such gene sets be compact, give accurate predictions across many classifiers, be biologically relevant and have good biological process coverage. Results: By using a new type of multiple classifier voting approach, we have identified gene sets that can predict breast cancer prognosis accurately, for a range of classification algorithms. Unlike a wrapper approach, our method is not specialised towards a single classification technique. Experimental analysis demonstrates higher prediction accuracies for our sets of genes compared to previous work in the area. Moreover, our sets of genes are generally more compact than those previously proposed. Taking a biological viewpoint, from the literature, most of the genes in our sets are known to be strongly related to cancer. Conclusion: We show that it is possible to obtain superior classification accuracy with our approach and obtain a compact gene set that is also biologically relevant and has good coverage of different biological processes.

    Doctor of Philosophy

    Nanoinformatics is a relatively young field of study that is important due to its implications in the field of nanomedicine, specifically toward the development of nanoparticle drug delivery systems. As more structural, biochemical, and physicochemical data become available regarding nanoparticles, the knowledge gained from using nanoinformatics methods will grow. While there are challenges with nanoparticle data, including heterogeneity of data and complexity of the particles, nanoinformatics will be at the forefront of processing these data and will aid in the design of nanoparticles for biomedical applications. In this dissertation, a review of data mining and machine learning studies performed in the field of nanomedicine is presented. Next, the use of natural language processing methods to extract numeric values of biomedical property terms of poly(amido amine) (PAMAM) dendrimers from nanomedicine literature is demonstrated, along with successful extraction results. Following this, an implementation of data mining techniques for developing predictive models of the cytotoxicity of PAMAM dendrimers from their chemical and structural properties is presented, along with its results. Finally, a method and its results for using molecular dynamics simulations to test the ability of EDTA, as a gold standard, and generation 3.5 (G3.5) PAMAM dendrimers to chelate calcium