Applying machine learning to identify autistic adults using imitation: An exploratory study
Autism spectrum condition (ASC) is primarily diagnosed by behavioural symptoms
including social, sensory and motor aspects. Although stereotyped, repetitive motor
movements are considered during diagnosis, quantitative measures that identify
kinematic characteristics in the movement patterns of autistic individuals are poorly
studied, preventing advances in understanding the aetiology of motor impairment, or
whether a wider range of motor characteristics could be used for diagnosis. The aim of
this study was to investigate whether data-driven machine learning based methods
could be used to address some fundamental problems with regard to identifying
discriminative test conditions and kinematic parameters to classify between ASC and
neurotypical controls. Data were drawn from a previous task in which 16 ASC participants
and 14 age- and IQ-matched controls observed then imitated a series of hand movements. Forty
kinematic parameters extracted from eight imitation conditions were analysed using
machine learning based methods. Two optimal imitation conditions and nine most
significant kinematic parameters were identified and compared with some standard
attribute evaluators. To our knowledge, this is the first attempt to apply machine
learning to kinematic movement parameters measured during imitation of hand
movements to investigate the identification of ASC. Although based on a small sample,
the work demonstrates the feasibility of applying machine learning methods to analyse
high-dimensional data and suggests the potential of machine learning for identifying
kinematic biomarkers that could contribute to the diagnostic classification of autism.
Amplifying Pathological Detection in EEG Signaling Pathways through Cross-Dataset Transfer Learning
Pathology diagnosis based on EEG signals and decoding brain activity holds
immense importance in understanding neurological disorders. With the
advancement of artificial intelligence methods and machine learning techniques,
the potential for accurate data-driven diagnoses and effective treatments has
grown significantly. However, applying machine learning algorithms to
real-world datasets presents diverse challenges at multiple levels. The
scarcity of labelled data, especially in low-data regime scenarios with limited
availability of real patient cohorts due to high costs of recruitment,
underscores the vital deployment of scaling and transfer learning techniques.
In this study, we explore a real-world pathology classification task to
highlight the effectiveness of data and model scaling and cross-dataset
knowledge transfer. As such, we observe varying performance improvements
through data scaling, indicating the need for careful evaluation and labelling.
Additionally, we identify the challenges of possible negative transfer and
emphasize the significance of some key components to overcome distribution
shifts and potential spurious correlations and achieve positive transfer. We
see improvement in the performance of the target model on the target (NMT)
datasets by using knowledge from the source dataset (TUAB) when only a small
amount of labelled data was available. Our findings indicate that a small and
generic model (e.g. ShallowNet) performs well on a single dataset; however, a
larger model (e.g. TCN) performs better on transfer and learning from a larger
and more diverse dataset.
Validation strategies for target prediction methods
Computational methods for target prediction, based on molecular similarity and network-based approaches, machine learning, docking and others, have evolved as valuable and powerful tools to aid the challenging task of mode-of-action identification for bioactive small molecules such as drugs and drug-like compounds. Critical to discerning the scope and limitations of a target prediction method is understanding how its performance was evaluated and reported. Ideally, large-scale prospective experiments are conducted to validate the performance of a model; however, this expensive and time-consuming endeavor is often not feasible. Therefore, to estimate the predictive power of a method, statistical validation based on retrospective knowledge is commonly used. There are multiple statistical validation techniques that vary in rigor. In this review we discuss the validation strategies employed, highlighting the usefulness and constraints of the validation schemes and metrics used to measure and describe performance. We address the limitations of measuring only generalized performance, given that the underlying bioactivity and structural data are biased towards certain small-molecule scaffolds and target families, and suggest additional aspects of performance to consider in order to produce more detailed and realistic estimates of predictive power. Finally, we describe the validation strategies that were employed by some of the most thoroughly validated and accessible target prediction methods.
A novel deep mining model for effective knowledge discovery from omics data
Knowledge discovery from omics data has become a common goal of current approaches to personalised cancer medicine and understanding cancer genotype and phenotype. However, high-throughput biomedical datasets are characterised by high dimensionality and relatively small sample sizes with small signal-to-noise ratios. Extracting and interpreting relevant knowledge from such complex datasets therefore remains a significant challenge for the fields of machine learning and data mining. In this paper, we exploit recent advances in deep learning to mitigate these limitations by automatically capturing enough of the meaningful abstractions latent within the available biological samples. Our proposed deep feature learning model is based on a set of non-linear sparse Auto-Encoders that are deliberately constructed in an under-complete manner to detect a small proportion of molecules that can recover a large proportion of the variation underlying the data. However, since multiple projections are applied to the input signals, it is hard to interpret which phenotypes were responsible for deriving such predictions. Therefore, we also introduce a novel weight interpretation technique that helps to deconstruct the internal state of such deep learning models to reveal key determinants underlying their latent representations. The outcomes of our experiment provide strong evidence that the proposed deep mining model is able to discover robust biomarkers that are positively and negatively associated with cancers of interest. Since our deep mining model is problem-independent and data-driven, it provides further potential for this research to extend beyond its cognate disciplines.
Semi-supervised prediction of protein subcellular localization using abstraction augmented Markov models
Background: Determination of protein subcellular localization plays an important role in understanding protein function. Knowledge of the subcellular localization is also essential for genome annotation and drug discovery. Supervised machine learning methods for predicting the localization of a protein in a cell rely on the availability of large amounts of labeled data. However, because of the high cost and effort involved in labeling the data, the amount of labeled data is quite small compared to the amount of unlabeled data. Hence, there is a growing interest in developing semi-supervised methods for predicting protein subcellular localization from large amounts of unlabeled data together with small amounts of labeled data. Results: In this paper, we present an Abstraction Augmented Markov Model (AAMM) based approach to the semi-supervised protein subcellular localization prediction problem. We investigate the effectiveness of AAMMs in exploiting unlabeled data. We compare semi-supervised AAMMs with: (i) Markov models (MMs), which do not take advantage of unlabeled data; (ii) expectation maximization (EM) based; and (iii) co-training based approaches to semi-supervised training of MMs, both of which make use of unlabeled data. Conclusions: The results of our experiments on three protein subcellular localization data sets show that semi-supervised AAMMs: (i) can effectively exploit unlabeled data; (ii) are more accurate than both the MMs and the EM based semi-supervised MMs; and (iii) are comparable in performance to, and in some cases outperform, the co-training based semi-supervised MMs.
Unifying an Introduction to Artificial Intelligence Course through Machine Learning Laboratory Experiences
This paper presents work on a collaborative project funded by the National Science Foundation that incorporates machine learning as a unifying theme to teach fundamental concepts typically covered in introductory Artificial Intelligence courses. The project involves the development of an adaptable framework for the presentation of core AI topics. This is accomplished through the development, implementation, and testing of a suite of adaptable, hands-on laboratory projects that can be closely integrated into the AI course. Through the design and implementation of learning systems that enhance commonly-deployed applications, our model acknowledges that intelligent systems are best taught through their application to challenging problems. The goals of the project are to (1) enhance the student learning experience in the AI course, (2) increase student interest and motivation to learn AI by providing a framework for the presentation of the major AI topics that emphasizes the strong connection between AI and computer science and engineering, and (3) highlight the bridge that machine learning provides between AI technology and modern software engineering.
Knowledge will Propel Machine Understanding of Content: Extrapolating from Current Examples
Machine Learning has been a big success story during the AI resurgence. One
particular standout success relates to learning from a massive amount of data.
In spite of early assertions of the unreasonable effectiveness of data, there
is increasing recognition of the value of utilizing knowledge whenever it is available or
can be created purposefully. In this paper, we discuss the indispensable role
of knowledge for deeper understanding of content where (i) large amounts of
training data are unavailable, (ii) the objects to be recognized are complex,
(e.g., implicit entities and highly subjective content), and (iii) applications
need to use complementary or related data in multiple modalities/media. What
brings us to the cusp of rapid progress is our ability to (a) create relevant
and reliable knowledge and (b) carefully exploit knowledge to enhance ML/NLP
techniques. Using diverse examples, we seek to foretell unprecedented progress
in our ability for deeper understanding and exploitation of multimodal data and
continued incorporation of knowledge in learning techniques.
Comment: Pre-print of the paper accepted at 2017 IEEE/WIC/ACM International
Conference on Web Intelligence (WI). arXiv admin note: substantial text
overlap with arXiv:1610.0770
Mapping Big Data into Knowledge Space with Cognitive Cyber-Infrastructure
Big data research has attracted great attention in science, technology,
industry and society. It is developing with the evolving scientific paradigm,
the fourth industrial revolution, and the transformational innovation of
technologies. However, its nature and fundamental challenge have not been
recognized, and its own methodology has not been formed. This paper explores
and answers the following questions: What is big data? What are the basic
methods for representing, managing and analyzing big data? What is the
relationship between big data and knowledge? Can we find a mapping from big
data into knowledge space? What kind of infrastructure is required to support
not only big data management and analysis but also knowledge discovery, sharing
and management? What is the relationship between big data and science paradigm?
What is the nature and fundamental challenge of big data computing? A
multi-dimensional perspective is presented toward a methodology of big data
computing.
Comment: 59 page