42 research outputs found

    Evolutionary multi-objective training set selection of data instances and augmentations for vocal detection

    © Springer Nature Switzerland AG 2019. The size of publicly available music data sets has grown significantly in recent years, which allows training better classification models. However, training on large data sets is time-intensive and cumbersome, and some training instances might be unrepresentative and thus hurt classification performance regardless of the model used. On the other hand, it is often beneficial to extend the original training data with augmentations, but only if they are carefully chosen. Therefore, identifying a “smart” selection of training instances should improve performance. In this paper, we introduce a novel, multi-objective framework for training set selection with the goal of simultaneously minimising the number of training instances and the classification error. Experimentally, we apply our method to vocal activity detection on a multi-track database extended with various audio augmentations for accompaniment and vocals. Results show that our approach is very effective at reducing classification error on a separate validation set, and that the resulting training set selections either reduce classification error or require only a small fraction of training instances for comparable performance.
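The two objectives named above can be made concrete with Pareto dominance: one candidate selection dominates another if it uses no more training instances and has no higher validation error, and is strictly better in at least one of the two. The Python sketch below only filters already-scored candidates down to the non-dominated set; it does not reproduce the paper's evolutionary search, and the (size, error) pairs are hypothetical values chosen for illustration.

```python
from typing import List, Tuple

# Each candidate selection is summarised as (number_of_training_instances, validation_error).
Candidate = Tuple[int, float]


def dominates(a: Candidate, b: Candidate) -> bool:
    """True if a is no worse than b in both objectives and strictly better in at least one."""
    return a[0] <= b[0] and a[1] <= b[1] and (a[0] < b[0] or a[1] < b[1])


def pareto_front(candidates: List[Candidate]) -> List[Candidate]:
    """Return the non-dominated candidates, sorted by training-set size."""
    front = [c for c in candidates
             if not any(dominates(other, c) for other in candidates)]
    return sorted(front)


if __name__ == "__main__":
    # Hypothetical (size, error) pairs, for illustration only.
    scored = [(1000, 0.12), (400, 0.13), (400, 0.15), (150, 0.20), (900, 0.14)]
    print(pareto_front(scored))  # [(150, 0.2), (400, 0.13), (1000, 0.12)]
```

Here (400, 0.15) and (900, 0.14) are dropped because (400, 0.13) is at least as small and strictly more accurate. An evolutionary multi-objective optimiser would typically generate new candidate subsets repeatedly and keep such a front as its population.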

    Feature engineering and a proposed decision-support system for systematic reviewers of medical evidence

    Objectives: Evidence-based medicine depends on the timely synthesis of research findings. An important source of synthesized evidence resides in systematic reviews. However, a bottleneck in review production involves dual screening of citations with titles and abstracts to find eligible studies. For this research, we tested the effect of various kinds of textual information (features) on the performance of a machine learning classifier. Based on our findings, we propose an automated system to reduce screening burden, as well as offer quality assurance. Methods: We built a database of citations from 5 systematic reviews that varied with respect to domain, topic, and sponsor. Consensus judgments regarding eligibility were inferred from published reports. We extracted 5 feature sets from citations: alphabetic, alphanumeric+, indexing, features mapped to concepts in systematic reviews, and topic models. To simulate a two-person team, we divided the data into random halves. We optimized the parameters of a Bayesian classifier, then trained and tested models on alternate data halves. Overall, we conducted 50 independent tests. Results: All tests of summary performance (mean F3) surpassed the corresponding baseline, P<0.0001. The ranks for mean F3, precision, and classification error were statistically different across feature sets averaged over reviews; P-values for Friedman's test were .045, .002, and .002, respectively. Differences in ranks for mean recall were not statistically significant. Alphanumeric+ features were associated with the best performance; mean reduction in screening burden for this feature type ranged from 88% to 98% for the second pass through citations and from 38% to 48% overall. Conclusions: A computer-assisted decision support system based on our methods could substantially reduce the burden of screening citations for systematic review teams and solo reviewers.
Additionally, such a system could deliver quality assurance both by confirming concordant decisions and by naming studies associated with discordant decisions for further consideration. © 2014 Bekhuis et al
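The abstract reports performance as mean F3 without defining it. Assuming the standard F-beta measure, F_beta = (1 + β²)·P·R / (β²·P + R), setting β = 3 weights recall more heavily than precision, presumably because missing an eligible study during screening is costlier than reading an extra abstract. A minimal Python sketch follows; the helper names are hypothetical, and how the paper averaged F3 across its 50 tests is not described here.

```python
from typing import Tuple


def precision_recall(tp: int, fp: int, fn: int) -> Tuple[float, float]:
    """Precision and recall from confusion-matrix counts (0.0 when undefined)."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def f_beta(precision: float, recall: float, beta: float = 3.0) -> float:
    """Weighted harmonic mean of precision and recall; beta > 1 favours recall."""
    if precision == 0.0 and recall == 0.0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


if __name__ == "__main__":
    p, r = precision_recall(tp=9, fp=9, fn=1)  # p = 0.5, r = 0.9
    print(round(f_beta(p, r), 3))              # F3
    print(round(f_beta(p, r, beta=1.0), 3))   # F1, for comparison
```

With P = 0.5 and R = 0.9, F3 = 4.5 / 5.4 ≈ 0.833 while F1 ≈ 0.643, so a recall-oriented classifier scores noticeably better under F3.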

    Pattern Recognition Software and Techniques for Biological Image Analysis

    The increasing prevalence of automated image acquisition systems is enabling new types of microscopy experiments that generate large image datasets. However, there is a perceived lack of the robust image analysis systems required to process these diverse datasets. Most automated image analysis systems are tailored to specific types of microscopy, contrast methods, probes, and even cell types. This imposes significant constraints on experimental design, limiting their application to the narrow set of imaging methods for which they were designed. One approach to addressing these limitations is pattern recognition, which was originally developed for remote sensing and is increasingly being applied to the biology domain. This approach relies on training a computer to recognize patterns in images rather than developing algorithms or tuning parameters for specific image processing tasks. The generality of this approach promises to enable data mining in extensive image repositories and to provide objective and quantitative imaging assays for routine use. Here, we provide a brief overview of the technologies behind pattern recognition and its use in computer vision for biological and biomedical imaging. We list available software tools that can be used by biologists and suggest practical experimental considerations to make the best use of pattern recognition techniques for imaging assays.

    Identification of Novel Functional Inhibitors of Acid Sphingomyelinase

    We describe a hitherto unknown feature of 27 small drug-like molecules, namely functional inhibition of acid sphingomyelinase (ASM). These entities, named FIASMAs (Functional Inhibitors of Acid SphingoMyelinAse), can therefore potentially be used to treat diseases associated with enhanced activity of ASM, such as Alzheimer's disease, major depression, radiation- and chemotherapy-induced apoptosis, and endotoxic shock syndrome. Residual activity of ASM measured at a drug concentration of 10 µM shows a bimodal distribution; thus the tested drugs can be classified into two groups with lower and higher inhibitory activity. All FIASMAs share distinct physicochemical properties: they are lipophilic and weakly basic. Hierarchical clustering of Tanimoto coefficients revealed that FIASMAs occur among drugs of various chemical scaffolds. Moreover, FIASMAs violate Lipinski's Rule-of-Five more frequently than compounds without effect on ASM. Inhibition of ASM appears to be associated with good permeability across the blood-brain barrier. In the present investigation, we developed a novel structure-property-activity relationship by using a random forest-based binary classification learner. Virtual screening revealed that only six out of 768 (0.78%) compounds of natural products functionally inhibit ASM, whereas this inhibitory activity occurs in 135 out of 2028 (6.66%) drugs licensed for medical use in humans.
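Two of the descriptors mentioned above are easy to state in code. The Tanimoto coefficient of two binary fingerprints is the number of bits set in both divided by the number set in either, and 1 minus it is a common distance for hierarchical clustering. Lipinski's Rule of Five flags molecular weight above 500 Da, logP above 5, more than 5 hydrogen-bond donors, or more than 10 hydrogen-bond acceptors. The Python sketch below assumes fingerprints are given as sets of "on" bit positions and that descriptor values were computed elsewhere; it is not the authors' pipeline, and the function names are hypothetical.

```python
from typing import Set


def tanimoto(a: Set[int], b: Set[int]) -> float:
    """Tanimoto coefficient of two fingerprints given as sets of 'on' bit positions."""
    union = len(a | b)
    if union == 0:
        return 1.0  # assumed convention: two empty fingerprints count as identical
    return len(a & b) / union


def tanimoto_distance(a: Set[int], b: Set[int]) -> float:
    """1 - Tanimoto, usable as a dissimilarity for hierarchical clustering."""
    return 1.0 - tanimoto(a, b)


def lipinski_violations(mol_weight: float, log_p: float,
                        h_donors: int, h_acceptors: int) -> int:
    """Count how many of the four Rule-of-Five criteria a compound violates."""
    return sum([
        mol_weight > 500,   # molecular weight in daltons
        log_p > 5,          # octanol-water partition coefficient
        h_donors > 5,       # hydrogen-bond donors
        h_acceptors > 10,   # hydrogen-bond acceptors
    ])


if __name__ == "__main__":
    print(tanimoto({1, 2, 3, 4}, {3, 4, 5}))          # 2 shared bits / 5 set bits
    print(lipinski_violations(550.0, 5.8, 2, 7))      # hypothetical compound
```

For instance, tanimoto({1, 2, 3, 4}, {3, 4, 5}) returns 0.4, and the hypothetical compound with a molecular weight of 550 and a logP of 5.8 but few donors and acceptors violates two criteria.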

    Similarity Clustering of Music Files According to User Preference

    Segment and combine approach for non-parametric time-series classification

    This paper presents a novel, generic, scalable, autonomous, and flexible supervised learning algorithm for the classification of multivariate and variable-length time series. The essential ingredients of the algorithm are randomization, segmentation of time series, decision tree ensemble-based learning of subseries classifiers, combination of subseries classifications by voting, and cross-validation-based temporal resolution adaptation. Experiments are carried out with this method on 10 synthetic and real-world datasets. They highlight the good behavior of the algorithm on a wide variety of problems. Our results are also highly competitive with existing approaches from the literature.
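The segment-and-combine idea can be sketched as follows: draw random fixed-length subseries from a time series, label each with a subseries classifier, and return the majority vote. In the Python sketch below, the subseries classifier is a hypothetical threshold rule standing in for the decision tree ensemble the paper trains, the series is univariate, and the window length is fixed by hand, whereas the abstract describes choosing the temporal resolution by cross-validation.

```python
import random
from collections import Counter
from typing import Callable, List, Sequence

Series = Sequence[float]


def random_subseries(series: Series, length: int, count: int,
                     rng: random.Random) -> List[List[float]]:
    """Draw `count` random contiguous windows of `length` points from one series."""
    if length > len(series):
        raise ValueError("window longer than series")
    starts = [rng.randrange(len(series) - length + 1) for _ in range(count)]
    return [list(series[s:s + length]) for s in starts]


def classify_by_voting(series: Series,
                       subseries_classifier: Callable[[List[float]], str],
                       length: int, count: int, seed: int = 0) -> str:
    """Label every random window, then return the majority label."""
    rng = random.Random(seed)
    votes = Counter(subseries_classifier(w)
                    for w in random_subseries(series, length, count, rng))
    return votes.most_common(1)[0][0]


def toy_window_classifier(window: List[float]) -> str:
    """Hypothetical placeholder for a learned subseries classifier."""
    return "high" if sum(window) / len(window) > 0.5 else "low"


if __name__ == "__main__":
    print(classify_by_voting([0.9] * 50, toy_window_classifier, length=10, count=25))
```

Because every window of a constant series has the same mean, the call above returns "high". With real data, the fraction of agreeing votes could also serve as a rough confidence score.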

    Applications of Knowledge Discovery

    A Visual Programming Approach to Big Data Analytics
