
    Machine Learning and Integrative Analysis of Biomedical Big Data.

    Recent developments in high-throughput technologies have accelerated the accumulation of massive amounts of omics data from multiple sources: genome, epigenome, transcriptome, proteome, metabolome, etc. Traditionally, data from each source (e.g., genome) are analyzed in isolation using statistical and machine learning (ML) methods. Integrative analysis of multi-omics and clinical data is key to new biomedical discoveries and advancements in precision medicine. However, data integration poses new computational challenges and exacerbates those associated with single-omics studies. Specialized computational approaches are required to effectively and efficiently perform integrative analysis of biomedical data acquired from diverse modalities. In this review, we discuss state-of-the-art ML-based approaches for tackling five specific computational challenges associated with integrative analysis: the curse of dimensionality, data heterogeneity, missing data, class imbalance, and scalability issues.
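
    To make the challenges concrete, the sketch below shows a minimal early-fusion pipeline over two synthetic omics blocks. It is an illustration under assumed toy data, not a method from the review: imputation addresses missing data, per-block scaling addresses heterogeneity, PCA addresses dimensionality, and class weighting addresses imbalance.

```python
# Minimal early-fusion sketch (hypothetical toy data, not the review's method):
# impute missing values per omics block, scale each block to offset
# heterogeneity, reduce dimensionality, and weight classes against imbalance.
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(0)
n = 200
genome = rng.normal(size=(n, 1000))                   # high-dimensional block
proteome = rng.normal(size=(n, 300))
proteome[rng.random(proteome.shape) < 0.1] = np.nan   # simulate missing data
y = rng.integers(0, 2, size=n)                        # labels (imbalanced in practice)

def prepare(block):
    """Impute and standardize one omics modality independently."""
    block = SimpleImputer(strategy="mean").fit_transform(block)
    return StandardScaler().fit_transform(block)

X = np.hstack([prepare(genome), prepare(proteome)])   # early fusion
clf = make_pipeline(PCA(n_components=50),             # curse of dimensionality
                    LogisticRegression(class_weight="balanced", max_iter=1000))
clf.fit(X, y)
```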

    A Survey on the Project in title

    In this paper we present a survey of the work done in the project “Unsupervised Adaptive P300 BCI in the framework of chaotic theory and stochastic theory”. We summarize the following papers: (Mohammed J. Alhaddad, 2011), (Mohammed J. Alhaddad & Kamel M., 2012), (Mohammed J. Alhaddad, Kamel, & Al-Otaibi, 2013), (Mohammed J. Alhaddad, Kamel, & Bakheet, 2013), (Mohammed J. Alhaddad, Kamel, & Al-Otaibi, 2014), (Mohammed J. Alhaddad, Kamel, & Bakheet, 2014), (Mohammed J. Alhaddad, Kamel, & Kadah, 2014), (Mohammed J. Alhaddad, Kamel, Makary, Hargas, & Kadah, 2014), and (Mohammed J. Alhaddad, Mohammed, Kamel, & Hagras, 2015). We developed a new pre-processing method for denoising P300-based brain-computer interface data that allows better performance with a lower number of channels and blocks. The new denoising technique is based on a modified version of spectral subtraction denoising and works on each temporal signal channel independently, thus offering seamless integration with existing pre-processing and allowing low channel counts to be used. We also developed a novel approach for brain-computer interface data that requires no prior training. The proposed approach is based on an interval type-2 fuzzy logic based classifier, which is able to handle the users' uncertainties to produce better prediction accuracies than other competing classifiers such as BLDA or RFLDA. In addition, the generated type-2 fuzzy classifier is learnt from data via genetic algorithms to produce a small number of rules with a rule length of only one antecedent, to maximize transparency and interpretability for the normal clinician. We also employ a feature selection system based on ensemble neural network recursive feature selection, which is able to find the effective time instances within the effective sensors in relation to a given P300 event. The basic principle of this new class of techniques is that the trial with the true activation signal within each block has to be different from the rest of the trials within that block. Hence, a measure that is sensitive to this dissimilarity can be used to make a decision based on a single block without any prior training. The new methods were verified in various experiments performed on standard data sets and on real data sets obtained from subject experiments performed in the BCI lab at King Abdulaziz University. The results were compared to the classification results of the same data using previous methods. Enhanced performance in different experiments, as quantitatively assessed using classification block accuracy as well as bit rate estimates, was confirmed. It will be shown that the produced type-2 fuzzy logic based classifier learns simple rules that are easy to understand, explaining the events in question. In addition, the produced type-2 fuzzy logic classifier gives better accuracies than BLDA or RFLDA on various human subjects on the standard and real-world data sets.
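
    The per-channel spectral subtraction idea can be sketched as follows. This is a minimal textbook formulation under assumed toy signals; the authors' modified variant is not reproduced here. A noise magnitude spectrum is estimated from a noise-only segment and subtracted from each channel's spectrum independently.

```python
# Minimal spectral subtraction sketch for one EEG channel (hypothetical
# parameters and toy data; the paper's modified variant differs in detail).
import numpy as np

def spectral_subtract(signal, noise_segment):
    """Subtract an estimated noise magnitude spectrum from one channel.

    signal: 1-D array, one temporal channel
    noise_segment: 1-D array assumed to contain noise only (e.g., pre-stimulus)
    """
    n = len(signal)
    spec = np.fft.rfft(signal)
    noise_mag = np.abs(np.fft.rfft(noise_segment, n=n))  # noise estimate
    mag = np.maximum(np.abs(spec) - noise_mag, 0.0)      # rectify negative bins
    phase = np.angle(spec)                               # keep the noisy phase
    return np.fft.irfft(mag * np.exp(1j * phase), n=n)

# Each channel is denoised independently, so the step drops into an
# existing per-channel pre-processing chain.
rng = np.random.default_rng(1)
channel = np.sin(np.linspace(0, 8 * np.pi, 512)) + 0.5 * rng.normal(size=512)
baseline = 0.5 * rng.normal(size=512)
clean = spectral_subtract(channel, baseline)
```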

    Fuzzy Natural Logic in IFSA-EUSFLAT 2021

    The present book contains five papers accepted and published in the Special Issue “Fuzzy Natural Logic in IFSA-EUSFLAT 2021” of the journal Mathematics (MDPI). These papers are extended versions of contributions presented at the conference “The 19th World Congress of the International Fuzzy Systems Association and the 12th Conference of the European Society for Fuzzy Logic and Technology, jointly with the AGOP, IJCRS, and FQAS conferences”, which took place in Bratislava (Slovakia) from September 19 to September 24, 2021. Fuzzy Natural Logic (FNL) is a system of mathematical fuzzy logic theories that enables us to model natural language terms and rules while accounting for their inherent vagueness, and allows us to reason and argue using the tools developed within them. FNL includes, among others, the theory of evaluative linguistic expressions (e.g., small, very large, etc.), the theory of fuzzy and intermediate quantifiers (e.g., most, few, many, etc.), and the theory of fuzzy/linguistic IF–THEN rules and logical inference. The papers in this Special Issue use the various aspects and concepts of FNL mentioned above and apply them to a wide range of problems, both theoretically and practically oriented. This book will be of interest to researchers working in the areas of fuzzy logic, applied linguistics, generalized quantifiers, and their applications.
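
    As a toy illustration of the fuzzy-quantifier idea (an assumed trapezoidal membership shape chosen for this sketch, not FNL's actual construction), a quantifier such as "most" can be evaluated as a graded truth degree over the proportion of elements satisfying a predicate:

```python
# Toy sketch: the fuzzy quantifier "most" as a trapezoidal membership
# function over a proportion. The breakpoints (0.5, 0.8) are assumptions
# for illustration, not FNL's formal definition of intermediate quantifiers.
def most(proportion, lo=0.5, hi=0.8):
    """Degree to which a proportion counts as 'most' (0 below lo, 1 above hi)."""
    if proportion <= lo:
        return 0.0
    if proportion >= hi:
        return 1.0
    return (proportion - lo) / (hi - lo)

ages = [25, 31, 42, 57, 63, 70, 74]
is_over_40 = [a > 40 for a in ages]
truth = most(sum(is_over_40) / len(ages))   # "most of them are over 40"
print(f"truth degree: {truth:.2f}")
```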

    Online Tensor Methods for Learning Latent Variable Models

    We introduce an online tensor decomposition based approach for two latent variable modeling problems, namely (1) community detection, in which we learn the latent communities that the social actors in social networks belong to, and (2) topic modeling, in which we infer hidden topics of text articles. We consider decomposition of moment tensors using stochastic gradient descent (SGD). We optimize the multilinear operations within SGD and avoid forming the tensors directly, saving computational and storage costs. We present optimized algorithms for two platforms. Our GPU-based implementation exploits the parallelism of SIMD architectures to allow for maximum speed-up through careful optimization of storage and data transfer, whereas our CPU-based implementation uses efficient sparse matrix computations and is suitable for large sparse datasets. For the community detection problem, we demonstrate accuracy and computational efficiency on Facebook, Yelp, and DBLP datasets, and for the topic modeling problem, we also demonstrate good performance on the New York Times dataset. We compare our results to state-of-the-art algorithms such as the variational method, and report a gain in accuracy and a gain of several orders of magnitude in execution time.
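
    The core trick of never materializing the moment tensor can be sketched with a power-iteration view (a simplified single-component sketch assuming whitened data; the paper's SGD updates and deflation steps are not shown). It relies on the identity T(I, a, a) = E[(xᵀa)² x] for the third moment T = E[x ⊗ x ⊗ x], so each update needs only matrix-vector products:

```python
# Sketch: tensor power iteration against the empirical third-moment tensor
# T = E[x ⊗ x ⊗ x] without ever forming T, via T(I, a, a) = E[(xᵀa)² x].
# Whitened data and a single component are assumed for brevity.
import numpy as np

def implicit_power_iteration(X, n_iters=50, seed=0):
    """Recover one robust eigenvector of the empirical third-moment tensor.

    X: (n_samples, d) array, assumed whitened.
    """
    rng = np.random.default_rng(seed)
    a = rng.normal(size=X.shape[1])
    a /= np.linalg.norm(a)
    for _ in range(n_iters):
        proj = X @ a                       # (xᵀa) for every sample
        a = (proj ** 2) @ X / len(X)       # E[(xᵀa)² x]; T is never formed
        a /= np.linalg.norm(a)
    return a

X = np.random.default_rng(2).normal(size=(10_000, 20))
component = implicit_power_iteration(X)
```

    Replacing the full-data average with mini-batch averages gives a stochastic variant, which is what makes platform-specific optimization of these multilinear products worthwhile.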

    Multi-class twitter data categorization and geocoding with a novel computing framework

    This study details the progress in transportation data analysis with a novel computing framework in keeping with the continuous evolution of computing technology. The computing framework combines a Labeled Latent Dirichlet Allocation (L-LDA)-incorporated Support Vector Machine (SVM) classifier with a supporting computing strategy on publicly available Twitter data to determine transportation-related events and provide reliable information to travelers. The analytical approach includes analyzing tweets using text classification and geocoding locations based on string similarity. A case study conducted for New York City and its surrounding areas demonstrates the feasibility of the analytical approach. Approximately 700,010 tweets collected over one week are analyzed to extract relevant transportation-related information. The SVM classifier achieves over 85% accuracy in identifying transportation-related tweets from structured data. To further categorize the transportation-related tweets into sub-classes (incident, congestion, construction, special events, and other events), three supervised classifiers are used: L-LDA, SVM, and L-LDA-incorporated SVM. Findings from this study demonstrate that the analytical framework, which uses the L-LDA-incorporated SVM, can classify roadway transportation-related data from Twitter with over 98.3% accuracy, significantly higher than the accuracies achieved by standalone L-LDA and SVM.
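
    A bare-bones version of the tweet classification step might look like the following. This is a hedged sketch with made-up example tweets; the study's L-LDA topic features and the geocoding stage are not reproduced.

```python
# Minimal sketch of the SVM text-classification step (toy tweets; the
# L-LDA features the study folds into the SVM are not shown here).
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC
from sklearn.pipeline import make_pipeline

tweets = [
    "Accident blocking two lanes on I-95 northbound",
    "Heavy congestion near the Lincoln Tunnel this morning",
    "Road construction on 5th Ave until Friday",
    "Street fair closes Broadway between 34th and 42nd",
]
labels = ["incident", "congestion", "construction", "special event"]

clf = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LinearSVC())
clf.fit(tweets, labels)
print(clf.predict(["Crash reported on the George Washington Bridge"]))
```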

    Word Representation with Salient Features


    Protein Superfamily Classification using Computational Intelligence Techniques

    The problem of protein superfamily classification is a challenging research area in Bioinformatics and has its major application in drug discovery. If a newly discovered protein that is responsible for a new disease gets correctly classified to its superfamily, then the task of the drug analyst becomes much easier. The analyst can perform molecular docking to find the correct relative orientation of a ligand for the protein. The ligand database can be searched for all possible orientations and conformations of the protein belonging to that superfamily paired with the ligand. Thus, the search space is reduced enormously, as the protein-ligand pair is searched within a particular protein superfamily. Correct classification of proteins is therefore a very important task, as it guides analysts in discovering appropriate drugs. In this thesis, Neural Networks (NN), Multiobjective Genetic Algorithm (MOGA), and Support Vector Machine (SVM) are applied to perform the classification task. Adaptive MultiObjective Genetic Algorithm (AMOGA), a variation of MOGA, is implemented for the structure optimization of the Radial Basis Function Network (RBFN). The modification to MOGA is based on two key controlling parameters, the probability of crossover and the probability of mutation. These values are varied adaptively based upon the performance of the algorithm, i.e., upon the percentage of the total population present in the best non-domination level. Finding the number of hidden centers remains a critical issue in the design of an RBFN. The most optimal RBF network with good generalization ability can be derived from the Pareto optimal set; every solution of the Pareto optimal set gives information regarding the specific samples to be chosen as hidden centers as well as the update weight matrix connecting the hidden and output layers. Principal Component Analysis (PCA) has been used for dimension reduction and significant feature extraction from the long feature vectors of amino acid sequences. In the two-stage approach for protein superfamily classification, feature extraction is carried out in the first stage and the design of the classifier is proposed in the second stage, with the overall objective of maximizing the performance accuracy of the classifier. In the feature extraction phase, a Genetic Algorithm (GA) based wrapper approach is used to select a few eigenvectors from the PCA space, which are encoded as binary strings in the chromosome. Using PCA-NSGA-II (non-dominated sorting GA), the non-dominated solutions obtained from the Pareto front solve the trade-off problem by compromising between the number of eigenvectors selected and the accuracy obtained by the classifier. In the second stage, the Recursive Orthogonal Least Squares Algorithm (ROLSA) is used for training the RBFN. ROLSA selects the optimal number of hidden centers.
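
    For orientation, a plain RBFN fit (centers chosen by k-means, output weights by least squares) looks like this. It is a generic baseline sketch on toy data, not the AMOGA/ROLSA procedure the thesis proposes; those methods replace exactly the center-selection and weight-update steps shown here.

```python
# Generic RBFN baseline sketch: k-means picks hidden centers, a Gaussian
# design matrix is built, and output weights come from least squares.
# NOT the thesis's AMOGA/ROLSA method, just the vanilla setup it improves on.
import numpy as np
from sklearn.cluster import KMeans

def fit_rbfn(X, y, n_centers=10, gamma=1.0):
    centers = KMeans(n_clusters=n_centers, n_init=10).fit(X).cluster_centers_
    # Design matrix: phi[i, j] = exp(-gamma * ||x_i - c_j||^2)
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    phi = np.exp(-gamma * d2)
    weights, *_ = np.linalg.lstsq(phi, y, rcond=None)
    return centers, weights

def predict_rbfn(X, centers, weights, gamma=1.0):
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-gamma * d2) @ weights

rng = np.random.default_rng(3)
X = rng.normal(size=(200, 5))                 # e.g., PCA-reduced features
y = (X[:, 0] + X[:, 1] > 0).astype(float)     # toy binary superfamily label
centers, weights = fit_rbfn(X, y)
preds = predict_rbfn(X, centers, weights) > 0.5
```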

    Towards a more efficient and cost-sensitive extreme learning machine: A state-of-the-art review of recent trend

    In spite of the prominence of the extreme learning machine (ELM) model, as well as its excellent features, such as minimal intervention for learning and model tuning, simplicity of implementation, and high learning speed, which make it a fascinating alternative method for Artificial Intelligence, including Big Data Analytics, it is still limited in certain aspects. These aspects must be treated to achieve an effective and cost-sensitive model. This review discusses the major drawbacks of ELM, which include difficulty in determining the hidden layer structure, prediction instability and imbalanced data distributions, poor capability of sample structure preserving (SSP), and difficulty in accommodating lateral inhibition by direct random feature mapping. Other drawbacks include multi-graph complexity, global memory size limitations under one-by-one or chunk-by-chunk (a block of data) learning, and challenges with big data. The recent trend proposed by experts for each drawback is discussed in detail towards achieving an effective and cost-sensitive model.
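
    For reference, the basic ELM recipe that these extensions build on fits in a few lines (a standard textbook formulation on assumed toy data, not any of the reviewed variants): random untrained input weights, a fixed nonlinear hidden layer, and output weights solved in closed form by pseudo-inverse, which is the source of the model's high learning speed.

```python
# Basic ELM sketch: random, untrained hidden layer; output weights solved
# in closed form via the Moore-Penrose pseudo-inverse. Toy data assumed.
import numpy as np

def train_elm(X, y, n_hidden=64, seed=0):
    rng = np.random.default_rng(seed)
    W = rng.normal(size=(X.shape[1], n_hidden))   # random input weights
    b = rng.normal(size=n_hidden)                 # random biases
    H = np.tanh(X @ W + b)                        # hidden layer output
    beta = np.linalg.pinv(H) @ y                  # closed-form output weights
    return W, b, beta

def predict_elm(X, W, b, beta):
    return np.tanh(X @ W + b) @ beta

rng = np.random.default_rng(4)
X = rng.normal(size=(500, 10))
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=500)
W, b, beta = train_elm(X, y)
y_hat = predict_elm(X, W, b, beta)
```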