
    A two-tiered 2D visual tool for assessing classifier performance

    In this article, a new kind of 2D tool is proposed, namely ⟨φ, δ⟩ diagrams, which highlight most of the information deemed relevant for classifier building and assessment. In particular, accuracy, bias, and break-even points are immediately evident in them. These diagrams come in two forms: the first represents the phenomenon under investigation in a space that does not take into account the imbalance between negative and positive samples; the second, a generalization of the first, visualizes the relevant information in a space that also accounts for the imbalance. By a specific design choice, all properties found in the first space also hold in the second. The combined use of φ and δ can give important information to researchers involved in building intelligent systems, in particular for classifier performance assessment and feature ranking/selection.
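    As a rough illustration of how such a diagram might be populated, the sketch below maps confusion-matrix counts to a pair of 2D coordinates. The abstract does not give the definitions of φ and δ, so the linear combinations of sensitivity and specificity used here are purely hypothetical placeholders, not the paper's formulation.

```python
# Illustrative sketch of plotting classifiers in a 2D performance space.
# The exact definitions of phi and delta come from the paper; here we
# assume (hypothetically) phi = specificity - sensitivity and
# delta = sensitivity + specificity - 1 as a plausible linear remapping.

def rates(tp, fn, tn, fp):
    """Sensitivity (true positive rate) and specificity (true negative rate)."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return sensitivity, specificity

def phi_delta(tp, fn, tn, fp):
    sens, spec = rates(tp, fn, tn, fp)
    phi = spec - sens            # hypothetical definition
    delta = sens + spec - 1.0    # hypothetical, Youden-like definition
    return phi, delta

if __name__ == "__main__":
    # Two hypothetical classifiers evaluated on the same test set,
    # given as (tp, fn, tn, fp) confusion-matrix counts.
    for name, cm in {"clf_A": (80, 20, 70, 30), "clf_B": (90, 10, 50, 50)}.items():
        phi, delta = phi_delta(*cm)
        print(f"{name}: phi={phi:+.2f}, delta={delta:+.2f}")
```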

    A cDNA Microarray Gene Expression Data Classifier for Clinical Diagnostics Based on Graph Theory

    Despite great advances in discovering cancer molecular profiles, the proper application of microarray technology to routine clinical diagnostics is still a challenge. Current practices in the classification of microarray data show two main limitations: the reliability of the training data sets used to build the classifiers, and the classifiers' performance, especially when the sample to be classified does not belong to any of the available classes. In this case, state-of-the-art algorithms usually produce a high rate of false positives that, in real diagnostic applications, is unacceptable. To address this problem, this paper presents a new cDNA microarray data classification algorithm based on graph theory that is able to overcome most of the limitations of known classification methodologies. The classifier works by analyzing gene expression data organized in an innovative data structure based on graphs, where vertices correspond to genes and edges to gene expression relationships. To demonstrate the novelty of the proposed approach, the authors present an experimental performance comparison between the proposed classifier and several state-of-the-art classification algorithms.
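    A minimal sketch of the kind of graph structure described, assuming (hypothetically) that edges link gene pairs whose expression profiles are strongly correlated across samples; the paper defines its own gene expression relationships, so the Pearson-correlation threshold below is purely illustrative.

```python
import numpy as np

# Illustrative construction of a gene expression graph: vertices are genes,
# edges link genes whose expression profiles are strongly correlated across
# samples. The edge criterion (|Pearson correlation| above a threshold) is
# an assumption for illustration, not the paper's definition.

def build_gene_graph(expression, gene_names, threshold=0.8):
    """expression: (n_genes, n_samples) array of expression levels."""
    corr = np.corrcoef(expression)          # gene-by-gene correlation matrix
    graph = {name: set() for name in gene_names}
    n = len(gene_names)
    for i in range(n):
        for j in range(i + 1, n):
            if abs(corr[i, j]) >= threshold:
                graph[gene_names[i]].add(gene_names[j])
                graph[gene_names[j]].add(gene_names[i])
    return graph

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    expr = rng.normal(size=(5, 20))                  # 5 toy genes, 20 samples
    expr[1] = expr[0] + 0.1 * rng.normal(size=20)    # make genes 0 and 1 co-express
    g = build_gene_graph(expr, [f"gene{i}" for i in range(5)])
    print(g)
```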

    Accountable, Explainable Artificial Intelligence Incorporation Framework for a Real-Time Affective State Assessment Module

    The rapid growth of artificial intelligence (AI) and machine learning (ML) solutions has seen them adopted across various industries. However, concern over ‘black-box’ approaches has led to increasing demand for high accuracy, transparency, accountability, and explainability in AI/ML approaches. This work contributes an accountable, explainable AI (AXAI) framework for delineating and assessing AI systems. The framework has been incorporated into the development of a real-time, multimodal affective state assessment system.

    Development of soft computing and applications in agricultural and biological engineering

    Soft computing is a set of “inexact” computing techniques able to model and analyze very complex problems for which more conventional methods have not produced cost-effective, analytical, or complete solutions. Soft computing has been extensively studied and applied over the last three decades in scientific research and engineering computing. In agricultural and biological engineering, researchers and engineers have developed methods based on fuzzy logic, artificial neural networks, genetic algorithms, decision trees, and support vector machines to study soil and water regimes related to crop growth, analyze the operation of food processing, and support decision-making in precision farming. This paper reviews the development of soft computing techniques and presents their applications in agricultural and biological engineering, especially in the soil and water context for crop management and decision support in precision agriculture. The future development and application of soft computing in agricultural and biological engineering are also discussed.

    Neural networks in multiphase reactors data mining: feature selection, prior knowledge, and model design

    Artificial neural networks (ANNs) have recently gained enormous popularity in many engineering fields, not only for their appealing “learning ability” but also for their versatility and superior performance with respect to classical approaches. Without assuming a particular equational form, ANNs mimic complex nonlinear relationships that might exist between an input feature vector x and a dependent (output) variable y. In the context of multiphase reactors, the potential of neural networks is high, as modeling by resolution of first-principle equations to forecast the sought key hydrodynamic and transfer characteristics is intractable. The general-purpose applicability of neural networks in regression and classification, however, poses some subsidiary difficulties that can make their use inappropriate for certain modeling problems. Some of these problems are common to any empirical modeling technique, including the feature selection step, in which one has to decide which subset xs ⊂ x should constitute the inputs (regressors) of the model. Other weaknesses specific to neural networks are overfitting, model design ambiguity (architecture and parameter identification), and the lack of interpretability of the resulting models.
    This work addresses three issues in the application of neural networks: i) feature selection, ii) prior knowledge matching within the models (to answer, to some extent, the overfitting and interpretability issues), and iii) model design. Feature selection was conducted with genetic algorithms (yet another companion from the artificial intelligence area), which allowed identification of good combinations of dimensionless inputs for regression ANNs, or with sequential methods in a classification context. The types of a priori knowledge the resulting ANN models were required to match were monotonicity and/or concavity in regression, and class connectivity and unequal misclassification costs in classification. Although the purpose of the study was rather methodological, some of the resulting ANN models may be considered contributions per se. These models, direct proofs of the underlying methodologies, are useful for predicting liquid hold-up and pressure drop in counter-current packed beds and flow regime type in trickle beds.
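    To make the genetic-algorithm feature selection step concrete, here is a minimal sketch in which bitstring chromosomes mark the input columns fed to a small regression network, and fitness is a cross-validated score. scikit-learn's MLPRegressor stands in for the thesis's ANN, and the population size, rates, and toy data are all illustrative choices, not the thesis settings.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.model_selection import cross_val_score

# Minimal genetic-algorithm feature selection for a regression ANN.
# Each individual is a boolean mask over the input columns; fitness is
# the cross-validated R^2 of a small network trained on those columns.

rng = np.random.default_rng(0)

def fitness(mask, X, y):
    if not mask.any():
        return -np.inf                       # empty feature sets are invalid
    model = MLPRegressor(hidden_layer_sizes=(8,), max_iter=500, random_state=0)
    return cross_val_score(model, X[:, mask], y, cv=3).mean()

def evolve(X, y, pop_size=12, generations=10, p_mut=0.1):
    n = X.shape[1]
    pop = rng.integers(0, 2, size=(pop_size, n)).astype(bool)
    for _ in range(generations):
        scores = np.array([fitness(ind, X, y) for ind in pop])
        order = np.argsort(scores)[::-1]
        parents = pop[order[: pop_size // 2]]           # truncation selection
        children = []
        for _ in range(pop_size - len(parents)):
            a, b = parents[rng.integers(len(parents), size=2)]
            cut = rng.integers(1, n)
            child = np.concatenate([a[:cut], b[cut:]])  # one-point crossover
            child ^= rng.random(n) < p_mut              # bit-flip mutation
            children.append(child)
        pop = np.vstack([parents, children])
    scores = np.array([fitness(ind, X, y) for ind in pop])
    return pop[np.argmax(scores)]

if __name__ == "__main__":
    X = rng.normal(size=(120, 6))
    y = 2 * X[:, 0] - X[:, 2] + 0.1 * rng.normal(size=120)  # only 2 features matter
    print("selected features:", np.flatnonzero(evolve(X, y)))
```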

    Illumination tolerance in facial recognition

    In this research work, five different preprocessing techniques were tested with two different classifiers to find the best preprocessor + classifier combination for building an illumination-tolerant face recognition system. A face recognition system is thus proposed based on illumination normalization techniques and a linear subspace model, using two distance metrics on three challenging yet interesting databases: the CAS-PEAL database, the Extended Yale B database, and the AT&T database. The research takes the form of experimentation and analysis in which five illumination normalization techniques were compared and analyzed using two different distance metrics. The performance and execution times of the various techniques were recorded and measured for accuracy and efficiency. The illumination normalization techniques were Gamma Intensity Correction (GIC), Discrete Cosine Transform (DCT), Histogram Remapping using the Normal distribution (HRN), Histogram Remapping using the Log-normal distribution (HRL), and Anisotropic Smoothing (AS). The linear subspace models were Principal Component Analysis (PCA) and Linear Discriminant Analysis (LDA). The two distance metrics were the Euclidean and cosine distances. The results showed that for databases with both illumination (shadow) and lighting (over-exposure) variations, like the CAS-PEAL database, histogram remapping with the normal distribution produced excellent results when the cosine distance was used as the classifier, with a 65% recognition rate at 15.8 ms/img. For databases with pure illumination variation, like the Extended Yale B database, Gamma Intensity Correction (GIC) combined with the Euclidean distance metric gave the most accurate result, with 95.4% recognition accuracy at 1 ms/img. The experiments further showed that the cosine distance produces more accurate results than the Euclidean distance metric, although the Euclidean distance was faster in all the experiments conducted.
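    A compressed sketch of one such pipeline: a gamma correction as the illumination normalization step, PCA as the linear subspace model, and both distance metrics for matching. The gamma value, the toy data, and the simple power-law correction are illustrative stand-ins; the thesis evaluates the full GIC formulation on real face databases.

```python
import numpy as np

# Illustrative pipeline: gamma intensity correction as preprocessing, PCA as
# the linear subspace model, and Euclidean vs. cosine distance for matching.
# gamma=0.5 and the random "faces" below are illustrative only.

def gamma_correct(img, gamma=0.5):
    img = np.asarray(img, dtype=np.float64) / 255.0
    return img ** gamma                      # brightens shadowed regions

def pca_fit(X, n_components):
    mean = X.mean(axis=0)
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, Vt[:n_components]

def project(X, mean, components):
    return (X - mean) @ components.T

def euclidean(a, b):
    return np.linalg.norm(a - b)

def cosine_dist(a, b):
    return 1 - (a @ b) / (np.linalg.norm(a) * np.linalg.norm(b))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    gallery = rng.integers(0, 256, size=(10, 32 * 32))   # 10 flattened "faces"
    probe = gallery[3] * 0.6                              # darker copy of face 3
    X = np.array([gamma_correct(f) for f in gallery])
    mean, comps = pca_fit(X, n_components=5)
    G = project(X, mean, comps)
    p = project(gamma_correct(probe), mean, comps)
    print("euclidean match:", np.argmin([euclidean(p, g) for g in G]))
    print("cosine match:   ", np.argmin([cosine_dist(p, g) for g in G]))
```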

    A Novel Malware Target Recognition Architecture for Enhanced Cyberspace Situation Awareness

    The rapid transition of critical business processes to computer networks potentially exposes organizations to digital theft or corruption by advanced competitors. One tool used for these tasks is malware, because it circumvents legitimate authentication mechanisms. Malware is an epidemic problem for organizations of all types. This research proposes and evaluates a novel Malware Target Recognition (MaTR) architecture for malware detection and identification of propagation methods and payloads to enhance situation awareness in tactical scenarios, using non-instruction-based, static heuristic features. MaTR achieves a 99.92% detection accuracy on known malware with false positive and false negative rates of 8.73e-4 and 8.03e-4, respectively. MaTR outperforms leading static heuristic methods with a statistically significant 1% improvement in detection accuracy and 85% and 94% reductions in false positive and false negative rates, respectively. Against a set of publicly unknown malware, MaTR's detection accuracy is 98.56%, a 65% performance improvement over the combined effectiveness of three commercial antivirus products.
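    To illustrate what non-instruction-based static heuristics can look like in practice, the sketch below derives a few file-level features and feeds them to a decision-tree ensemble. The features (size, byte entropy, printable-byte ratio), the toy corpus, and the random-forest choice are hypothetical stand-ins; MaTR's actual feature set and classifier design are specified in the research.

```python
import math
from collections import Counter

import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Illustrative sketch of non-instruction-based static heuristics feeding a
# decision-tree ensemble. The features below are hypothetical stand-ins,
# not MaTR's feature set.

def byte_entropy(data: bytes) -> float:
    """Shannon entropy of the byte distribution, in bits."""
    counts = Counter(data)
    total = len(data)
    return -sum(c / total * math.log2(c / total) for c in counts.values())

def static_features(data: bytes):
    printable = sum(32 <= b < 127 for b in data)
    return [len(data), byte_entropy(data), printable / len(data)]

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Toy corpus: "benign" files are text-like, "malware" files high-entropy.
    benign = [rng.integers(97, 123, size=200, dtype=np.uint8).tobytes()
              for _ in range(50)]
    malware = [rng.integers(0, 256, size=200, dtype=np.uint8).tobytes()
               for _ in range(50)]
    X = np.array([static_features(f) for f in benign + malware])
    y = np.array([0] * 50 + [1] * 50)
    clf = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)
    sample = rng.integers(0, 256, size=200, dtype=np.uint8).tobytes()
    print("predicted label:", clf.predict([static_features(sample)])[0])
```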