262 research outputs found

    A model-based approach to selection of tag SNPs

    BACKGROUND: Single Nucleotide Polymorphisms (SNPs) are the most common type of polymorphism found in the human genome. Effective genetic association studies require the identification of sets of tag SNPs that capture as much haplotype information as possible. Tag SNP selection is analogous to the problem of data compression in information theory. In Shannon's framework, the optimal tag set maximizes the entropy of the tag SNPs subject to a constraint on the number of SNPs. This approach requires an appropriate probabilistic model. Compared to simple measures of Linkage Disequilibrium (LD), a good model of haplotype sequences can more accurately account for LD structure. It also provides machinery for predicting tagged SNPs, and thereby for assessing the performance of tag sets through their ability to predict larger SNP sets. RESULTS: Here, we compute the description code-lengths of SNP data under an array of models, and we develop tag SNP selection methods based on these models and the strategy of entropy maximization. Using data sets from the HapMap and ENCODE projects, we show that the hidden Markov model introduced by Li and Stephens outperforms the other models in several respects: description code-length of SNP data, information content of tag sets, and prediction of tagged SNPs. This is the first use of this model in the context of tag SNP selection. CONCLUSION: Our study provides strong evidence that the tag sets selected by our best method, based on the Li and Stephens model, outperform those chosen by several existing methods. The results also suggest that information content evaluated with a good model is a more sensitive measure of the quality of a tag set than the correct prediction rate of tagged SNPs. Moreover, we show that haplotype phase uncertainty has an almost negligible impact on the ability of good tag sets to predict tagged SNPs. This justifies selecting tag SNPs on the basis of haplotype informativeness, even though genotyping studies do not directly assess haplotypes. Software implementing our approach is available.
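The entropy-maximization strategy described above can be sketched with a greedy selector over an empirical haplotype distribution. This is a rough illustration only, not the paper's Li and Stephens HMM-based method, and the toy data and function names are hypothetical:

```python
import math
from collections import Counter

def joint_entropy(haplotypes, snp_idx):
    """Empirical Shannon entropy (bits) of the haplotype patterns
    restricted to the SNP columns in snp_idx."""
    n = len(haplotypes)
    patterns = Counter(tuple(h[i] for i in snp_idx) for h in haplotypes)
    return -sum((c / n) * math.log2(c / n) for c in patterns.values())

def greedy_tag_selection(haplotypes, k):
    """Greedily pick k tag SNPs, each step adding the SNP that most
    increases the joint entropy of the selected set."""
    n_snps = len(haplotypes[0])
    chosen = []
    for _ in range(k):
        best = max((i for i in range(n_snps) if i not in chosen),
                   key=lambda i: joint_entropy(haplotypes, chosen + [i]))
        chosen.append(best)
    return sorted(chosen)

# Toy phased haplotypes: rows are haplotypes, columns are SNPs (0/1 alleles).
haps = [(0, 0, 0, 1),
        (0, 0, 1, 1),
        (1, 1, 0, 0),
        (1, 1, 1, 0)]
tags = greedy_tag_selection(haps, 2)
```

With this toy data the greedy step skips SNP 1 (fully redundant with SNP 0) and picks SNPs 0 and 2, which together distinguish all four haplotypes.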

    Human Being Emotion in Cognitive Intelligent Robotic Control Pt I: Quantum / Soft Computing Approach

    Abstract. The article consists of two parts. Part I shows the possibility of using quantum/soft computing optimizers of knowledge bases (QSCOptKB™) as a toolkit for implementing quantum deep machine learning technology in the search for solutions to intelligent cognitive control tasks, applying a cognitive helmet as a neurointerface. In particular, the aim of this part is to demonstrate the possibility of classifying the mental states of a human operator online, with knowledge extraction from electroencephalograms based on the SCOptKB™ and QCOptKB™ toolkits. The application of soft computing technologies to identify objective indicators of the psychophysiological state of an examined person is described. The role and necessity of applying intelligent information technologies based on computational intelligence toolkits to the task of objectively estimating the general psychophysical state of a human operator are shown. The developed information technology is examined with special examples, difficult in diagnostic practice, of emotion-state estimation in children with autism spectrum disorder (ASD) and in patients with dementia, and it forms the background for designing knowledge bases for service robots. The application of cognitive intelligent control to the navigation of an autonomous robot for obstacle avoidance is demonstrated.

    Clustering heterogeneous categorical data using enhanced mini batch K-means with entropy distance measure

    Clustering methods in data mining aim to group a set of patterns based on their similarity. In survey data, heterogeneous information arises from various types of data scales, such as nominal, ordinal, binary, and Likert scales. Failing to treat heterogeneous data properly leads to loss of information and poor decision-making. Although many similarity measures have been established, solutions for heterogeneous data in clustering are still lacking. The recent entropy distance measure appears to give good results for heterogeneous categorical data, but it requires extensive experiments and evaluation. This article presents a framework for heterogeneous categorical data using mini batch k-means with an entropy measure (MBKEM), and investigates the effectiveness of this similarity measure when clustering heterogeneous categorical data. Secondary data from a public survey were used. The findings demonstrate that the proposed framework improves clustering quality: MBKEM outperformed other clustering algorithms with an accuracy of 0.88, v-measure (VM) of 0.82, adjusted Rand index (ARI) of 0.87, and Fowlkes-Mallows index (FMI) of 0.94. The average minimum elapsed time for cluster generation was 0.26 s. In the future, the proposed solution could improve the quality of clustering for heterogeneous categorical data problems in many domains.
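An entropy-weighted distance inside a mini-batch k-means-style loop can be sketched as follows. The abstract does not specify MBKEM's exact distance or update rules, so this is a simplified k-modes-style stand-in under one plausible assumption: attribute mismatches are weighted by each attribute's entropy, and centers are updated to per-attribute modes:

```python
import math
import random
from collections import Counter

def attribute_entropies(data):
    """Shannon entropy (bits) of each categorical attribute in the data."""
    n, m = len(data), len(data[0])
    ents = []
    for j in range(m):
        counts = Counter(row[j] for row in data)
        ents.append(-sum((c / n) * math.log2(c / n) for c in counts.values()))
    return ents

def entropy_distance(x, center, weights):
    """Mismatch distance with entropy-weighted attributes (illustrative)."""
    return sum(w for xj, cj, w in zip(x, center, weights) if xj != cj)

def mini_batch_kmodes(data, k, batch_size=4, iters=20, seed=0):
    """Mini-batch clustering of categorical rows with the distance above."""
    rng = random.Random(seed)
    weights = attribute_entropies(data)
    centers = rng.sample(data, k)
    for _ in range(iters):
        batch = [rng.choice(data) for _ in range(batch_size)]
        # Assign batch points, then move each center to the per-attribute
        # mode of its assigned points (the categorical analogue of the mean).
        assign = {i: [] for i in range(k)}
        for x in batch:
            i = min(range(k), key=lambda c: entropy_distance(x, centers[c], weights))
            assign[i].append(x)
        for i, members in assign.items():
            if members:
                centers[i] = tuple(Counter(col).most_common(1)[0][0]
                                   for col in zip(*members))
    return centers, weights

# Hypothetical survey rows: (gender, Likert answer, binary answer).
data = [("m", "agree", "yes"), ("m", "agree", "yes"),
        ("f", "disagree", "no"), ("f", "disagree", "no")]
centers, weights = mini_batch_kmodes(data, k=2)
```

The mini-batch update keeps each iteration cheap on large surveys, which is the main appeal of the mini-batch variant over full k-means passes.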

    A new approach of top-down induction of decision trees for knowledge discovery

    Top-down induction of decision trees is the most popular technique for classification in the field of data mining and knowledge discovery. Quinlan developed the basic decision-tree induction algorithm, ID3 (1984), and extended it to C4.5 (1993). There has been much research on single-attribute decision-making nodes (so-called first-order decisions) in decision trees. Murphy and Pazzani (1991) addressed multiple-attribute conditions at decision-making nodes, showing that higher-order decision-making generates smaller decision trees with better accuracy. However, enumerating combinations of multiple-attribute decisions is NP-complete. We develop a new algorithm for second-order decision-tree induction (SODI) for nominal attributes. The induction rules of first-order decision trees are combined by 'AND' logic only, whereas those of SODI consist of 'AND', 'OR', and 'OTHERWISE' logic. It generates more accurate results and smaller decision trees than first-order decision tree induction. Quinlan used information gain via the VC-dimension (Vapnik-Chervonenkis; Vapnik, 1995) for clustering the experimental values of each numerical attribute. However, many researchers have noted the weaknesses of VC-dimension analysis. Bennett (1997) applies support vector machines (SVM) to decision tree induction. We suggest a heuristic algorithm (SVMM; SVM for Multi-category) that combines a TDIDT scheme with SVM, and this thesis also addresses how to solve multiclass classification problems. Our final goal for this thesis is IDSS (Induction of Decision Trees using SODI and SVMM): we address how to combine SODI and SVMM in the construction of top-down decision tree induction so as to minimize the generalized penalty cost.
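The first-order split criterion that SODI builds on, Quinlan's entropy-based information gain, can be sketched as follows. The toy data and names are hypothetical, and SODI's second-order AND/OR/OTHERWISE rule combination is not shown:

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def info_gain(rows, labels, attr):
    """Information gain of a first-order split on attribute index attr."""
    by_value = {}
    for row, y in zip(rows, labels):
        by_value.setdefault(row[attr], []).append(y)
    n = len(rows)
    remainder = sum(len(ys) / n * entropy(ys) for ys in by_value.values())
    return entropy(labels) - remainder

# Toy data: attributes are (outlook, windy); labels say whether to play.
rows = [("sunny", "yes"), ("sunny", "no"), ("rain", "yes"), ("rain", "no")]
labels = ["no", "no", "yes", "yes"]
best_attr = max(range(len(rows[0])), key=lambda a: info_gain(rows, labels, a))
```

Here splitting on attribute 0 (outlook) separates the classes perfectly, so ID3-style induction would place it at the root; a second-order method like SODI would instead consider combinations of two such attribute tests at one node.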

    Estimating evolutionary dynamics of cleavage site peptides among H5HA avian influenza employing mathematical information theory approaches

    Estimating the evolutionary conservation of cleavage site peptides among the HA proteins of all strains facilitates vaccine development against pandemic influenza. Conserved epitopes may be useful for diagnosing animals infected with the influenza virus and for preventing its spread to other regions [1]. In the preliminary stage of this study, in silico analysis of hemagglutinin was applied to predict the potential cleavage sites of each strain using SigCleave [2] and the SignalP 3.0 server [3]. The second stage of the study focused on analyzing the structure of the connecting peptides of hemagglutinin cleavage sites based on the available experimental data. Our results reveal a higher frequency of basic amino acids, essential for processing by the cellular protease, among pathogenic strains compared with non- or low-pathogenic strains. In addition, two complementary methods for identifying conserved amino acids were applied: a statistical entropy-based method, possibly the most sensitive tool for estimating the diversity of peptides [5], and relative entropy estimation. Both methods demonstrate that the connecting peptide of the HA cleavage site of AIV in the United States has been highly conserved over long periods of time. Entropy values help select the sequences with the highest potential for mutation across a broad spectrum of the avian population. Position 340 among our group of strains, with an entropy value of 0.877928, carries the most information, whereas highly conserved positions are those with entropy values close to zero.
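The per-position statistical entropy used above to flag conserved sites can be sketched as follows. The peptide alignment below is hypothetical, not the study's H5 HA data:

```python
import math
from collections import Counter

def positional_entropy(sequences):
    """Shannon entropy (bits) at each column of an alignment of
    equal-length peptide sequences; 0 means fully conserved."""
    n = len(sequences)
    return [-sum((c / n) * math.log2(c / n) for c in Counter(col).values())
            for col in zip(*sequences)]

# Hypothetical aligned cleavage-site peptides (basic-residue-rich motif).
peptides = ["RRRKKR", "RRRKKR", "RRTKKR", "RRGKKR"]
ents = positional_entropy(peptides)
```

In this toy alignment only position 2 varies (R/T/G), so it is the only column with nonzero entropy; in the study's framing, such high-entropy positions are the ones with the greatest potential for mutation.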

    Demographics imputation in marketing sector by means of machine learning

    Internship Report presented as the partial requirement for obtaining a Master's degree in Data Science and Advanced Analytics, specialization in Data Science.
    The goal of this project is to develop a predictive model to impute missing values in data collected through surveys (demographics data) and to evaluate its performance. Currently there are two issues: demographics data for each user are either incomplete or missing entirely. The current POC is an attempt to exploit the capabilities of machine learning to impute missing demographics data. Data cleaning, normalization, and feature selection were performed prior to applying sampling techniques and training several machine learning models. The following machine learning models were trained and tested: Random Forest and Gradient Boosting. Afterwards, metrics appropriate for the current business purposes were selected and the models’ performance was evaluated. The results for the targets ‘Ethnicity’, ‘Hispanic’, and ‘Household income’ are not within the acceptable range and therefore cannot be used in production at the moment. The metrics obtained with the default hyperparameters indicate that both models demonstrate similar results for the ‘Hispanic’ and ‘Ethnicity’ response variables. The ‘Household income’ variable has the poorest results, preventing it from being predicted with adequate accuracy. The current POC suggests that accurate prediction of demographic variables is a complex task accompanied by certain challenges: weak relationships between demographic variables and purchase behavior, purchase location, and neighborhood demographic characteristics; unreliable data; and a sparse feature set. Further investigation of feature selection and the incorporation of other data sources into the training data should be considered.
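The train-on-complete-rows / predict-on-missing-rows flow described above can be sketched as follows. This is a pure-Python nearest-neighbour majority-vote stand-in for the report's Random Forest and Gradient Boosting models, and the data and column names are hypothetical:

```python
from collections import Counter

def impute_categorical(rows, target_idx, k=3):
    """Fill missing values (None) in column target_idx by majority vote
    among the k nearest complete rows, measured by Hamming distance on
    the remaining features. A simple stand-in for a trained classifier,
    showing the same train-on-complete / predict-on-missing flow."""
    def feats(r):
        return tuple(v for i, v in enumerate(r) if i != target_idx)

    def hamming(a, b):
        return sum(x != y for x, y in zip(a, b))

    complete = [r for r in rows if r[target_idx] is not None]
    filled = []
    for r in rows:
        if r[target_idx] is not None:
            filled.append(list(r))
            continue
        nearest = sorted(complete, key=lambda c: hamming(feats(c), feats(r)))[:k]
        vote = Counter(c[target_idx] for c in nearest).most_common(1)[0][0]
        fixed = list(r)
        fixed[target_idx] = vote
        filled.append(fixed)
    return filled

# Hypothetical survey rows: (region, segment, household income bracket).
rows = [["u", "a", "inc1"],
        ["v", "b", "inc2"],
        ["v", "b", "inc2"],
        ["v", "b", None]]
filled = impute_categorical(rows, target_idx=2)
```

As in the report, the quality of such imputation hinges on how strongly the observed features relate to the target demographic; with weak relationships, no model choice rescues the accuracy.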