
    Unsupervised Graph-based Rank Aggregation for Improved Retrieval

    This paper presents a robust and comprehensive graph-based rank aggregation approach for combining the results of isolated ranker models in retrieval tasks. The method follows an unsupervised scheme that is independent of how the isolated ranks are formulated. Our approach can combine arbitrary models defined in terms of different ranking criteria, such as those based on textual, image or hybrid content representations. We reformulate the ad-hoc retrieval problem as document retrieval based on fusion graphs, which we propose as a new unified representation model capable of merging multiple ranks and automatically expressing inter-relationships of retrieval results. By doing so, we claim that the retrieval system can benefit from learning the manifold structure of datasets, leading to more effective results. Another contribution is that our graph-based aggregation formulation, unlike existing approaches, encapsulates contextual information encoded from multiple ranks, which can be used directly for ranking without further computations or post-processing steps over the graphs. Based on the graphs, a novel similarity retrieval score is formulated using an efficient computation of minimum common subgraphs. A further benefit over existing approaches is the absence of hyperparameters. A comprehensive experimental evaluation was conducted on diverse well-known public datasets composed of textual, image, and multimodal documents. The experiments demonstrate that our method reaches top performance, yielding better effectiveness scores than state-of-the-art baseline methods and large gains over the rankers being fused, thus demonstrating that the proposal successfully represents queries through a unified graph-based model of rank fusion.
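    The fusion-graph idea can be illustrated with a toy sketch. The snippet below is a hypothetical simplification, not the paper's formulation: it builds a small graph whose edge weights accumulate reciprocal-rank evidence whenever two documents co-occur near the top of the same ranker, and compares two such graphs by the weight of their shared edges as a crude stand-in for the minimum-common-subgraph similarity. The function names, the cut-off k and the example rankings are all illustrative.

```python
from collections import defaultdict

def fusion_graph(ranked_lists, k=10):
    """Toy fusion graph: nodes are documents; edge weights accumulate
    reciprocal-rank evidence whenever two documents co-occur in the
    top-k results of the same ranker."""
    g = defaultdict(float)
    for ranking in ranked_lists:
        top = ranking[:k]
        for i, a in enumerate(top):
            for j, b in enumerate(top):
                if a != b:
                    g[(a, b)] += 1.0 / (i + 1) + 1.0 / (j + 1)
    return g

def graph_similarity(g1, g2):
    """Score two fusion graphs by the weight shared on common edges,
    a rough proxy for a minimum-common-subgraph similarity."""
    return sum(min(g1[e], g2[e]) for e in set(g1) & set(g2))

# Fuse a text ranker and an image ranker for a query, then compare the
# query's fusion graph against a stored document fusion graph.
q_graph = fusion_graph([["d3", "d1", "d7"], ["d1", "d3", "d9"]])
d_graph = fusion_graph([["d1", "d3", "d2"], ["d3", "d1", "d7"]])
print(graph_similarity(q_graph, d_graph))
```

    Note that the cut-off k above exists only to keep the toy example small; the paper's formulation is hyperparameter-free and encodes richer inter-relationships among retrieved items.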

    Evolutionary Computation and QSAR Research

    The successful high-throughput screening of molecule libraries for a specific biological property is one of the main improvements in drug discovery. Virtual molecular filtering and screening rely greatly on quantitative structure-activity relationship (QSAR) analysis, a mathematical model that correlates the activity of a molecule with molecular descriptors. QSAR models have the potential to reduce the costly failure of drug candidates in advanced (clinical) stages by filtering combinatorial libraries, eliminating candidates with predicted toxic effects or poor pharmacokinetic profiles, and reducing the number of experiments. To obtain a predictive and reliable QSAR model, scientists use methods from various fields such as molecular modeling, pattern recognition, machine learning or artificial intelligence. QSAR modeling relies on three main steps: codification of the molecular structure into molecular descriptors, selection of the variables relevant to the analyzed activity, and search for the optimal mathematical model that correlates the molecular descriptors with a specific activity. Since a variety of techniques from statistics and artificial intelligence can aid the variable selection and model building steps, this review focuses on the evolutionary computation methods supporting these tasks. Thus, the review explains the basics of genetic algorithms and genetic programming as evolutionary computation approaches, selection methods for high-dimensional data in QSAR, methods for building QSAR models, current evolutionary feature selection methods and their applications in QSAR, and future trends in joint or multi-task feature selection methods.
    Instituto de Salud Carlos III, PIO52048; Instituto de Salud Carlos III, RD07/0067/0005; Ministerio de Industria, Comercio y Turismo, TSI-020110-2009-53; Galicia. Consellería de Economía e Industria, 10SIN105004P
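    As a rough illustration of the evolutionary feature selection discussed in the review, the sketch below evolves binary descriptor masks with tournament selection, one-point crossover and bit-flip mutation, scoring each mask by the cross-validated fit of a ridge model. The synthetic descriptors and activities, the population size, the mutation rate and the choice of ridge regression are assumptions made for the example, not settings taken from the review.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(120, 40))                          # hypothetical molecular descriptors
y = X[:, :5].sum(axis=1) + 0.1 * rng.normal(size=120)   # toy activity depending on 5 descriptors

def fitness(mask):
    """Cross-validated R^2 of a ridge model restricted to the selected descriptors."""
    if mask.sum() == 0:
        return -np.inf
    return cross_val_score(Ridge(), X[:, mask.astype(bool)], y, cv=5).mean()

pop = rng.integers(0, 2, size=(30, X.shape[1]))         # population of descriptor masks
for generation in range(25):
    scores = np.array([fitness(ind) for ind in pop])
    # tournament selection: each slot is won by the best of three random individuals
    parents = pop[[max(rng.choice(len(pop), 3), key=lambda i: scores[i])
                   for _ in range(len(pop))]]
    # one-point crossover between consecutive parents, then bit-flip mutation
    children = parents.copy()
    for a, b in zip(children[::2], children[1::2]):
        cut = rng.integers(1, X.shape[1])
        a[cut:], b[cut:] = b[cut:].copy(), a[cut:].copy()
    children ^= (rng.random(children.shape) < 0.02).astype(children.dtype)
    pop = children

best = pop[np.argmax([fitness(ind) for ind in pop])]
print("selected descriptors:", np.flatnonzero(best))
```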

    On Robust Face Recognition via Sparse Encoding: the Good, the Bad, and the Ugly

    In the field of face recognition, Sparse Representation (SR) has received considerable attention during the past few years. Most of the relevant literature focuses on holistic descriptors in closed-set identification applications. The underlying assumption in SR-based methods is that each class in the gallery has sufficient samples and the query lies on the subspace spanned by the gallery of the same class. Unfortunately, such an assumption is easily violated in the more challenging face verification scenario, where an algorithm is required to determine whether two faces (where one or both have not been seen before) belong to the same person. In this paper, we first discuss why previous attempts with SR might not be applicable to verification problems. We then propose an alternative approach to face verification via SR. Specifically, we propose to use explicit SR encoding on local image patches rather than the entire face. The obtained sparse signals are pooled via averaging to form multiple region descriptors, which are then concatenated to form an overall face descriptor. Due to the deliberate loss of spatial relations within each region (caused by averaging), the resulting descriptor is robust to misalignment and various image deformations. Within the proposed framework, we evaluate several SR encoding techniques: l1-minimisation, Sparse Autoencoder Neural Network (SANN), and an implicit probabilistic technique based on Gaussian Mixture Models. Thorough experiments on the AR, FERET, exYaleB, BANCA and ChokePoint datasets show that the proposed local SR approach obtains considerably better and more robust performance than several previous state-of-the-art holistic SR methods, in both verification and closed-set identification problems. The experiments also show that l1-minimisation based encoding has a considerably higher computational cost than the other techniques, but leads to higher recognition rates.
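    A minimal sketch of the local sparse-encoding pipeline described above (patch extraction, sparse coding, per-region average pooling, concatenation) is given below. The patch geometry, dictionary size, number of pooling regions and the use of scikit-learn's MiniBatchDictionaryLearning with lasso_lars encoding are assumptions made for illustration; the SANN and GMM-based encodings evaluated in the paper are not reproduced here.

```python
import numpy as np
from sklearn.decomposition import MiniBatchDictionaryLearning

def extract_patches(img, size=8, step=4):
    """Collect flattened, mean-subtracted square patches from a 2-D image."""
    h, w = img.shape
    patches = [img[r:r + size, c:c + size].ravel()
               for r in range(0, h - size + 1, step)
               for c in range(0, w - size + 1, step)]
    p = np.asarray(patches, dtype=float)
    return p - p.mean(axis=1, keepdims=True)

def face_descriptor(img, coder, regions=4):
    """Sparse-encode local patches, average-pool the codes within each
    region, and concatenate the pooled vectors into one face descriptor."""
    codes = coder.transform(extract_patches(img))
    bands = np.array_split(codes, regions, axis=0)   # crude horizontal regions
    return np.concatenate([np.abs(b).mean(axis=0) for b in bands])

# Learn a small patch dictionary on random data (a stand-in for face images).
rng = np.random.default_rng(0)
coder = MiniBatchDictionaryLearning(n_components=32, transform_algorithm="lasso_lars",
                                    transform_alpha=0.1, random_state=0)
coder.fit(extract_patches(rng.random((64, 64))))

d1 = face_descriptor(rng.random((64, 64)), coder)
d2 = face_descriptor(rng.random((64, 64)), coder)
cosine = d1 @ d2 / (np.linalg.norm(d1) * np.linalg.norm(d2))
print("descriptor length:", d1.size, "cosine similarity:", round(float(cosine), 3))
```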

    Artificial intelligence for crystal structure prediction

    Predicting the ground-state and metastable crystal structures of materials from just knowing their composition is a formidable challenge in computational materials discovery. Recent studies published by the group of M. Scheffler have investigated how the relative stability of compounds between two crystal-structure types can be predicted from the properties of their atomic constituents within the framework of symbolic regression. By using a novel compressed-sensing-based method, the sure independence screening and sparsifying operator (SISSO), the descriptor that best captured the structural stability was identified from billions of candidates. A descriptor is a vector of analytical formulas built from simple physical quantities. In the first part of the thesis, a multi-task-learning extension of SISSO (MT-SISSO) is introduced that enables the treatment of the structural stability of compounds among multiple structure types. We show how the multi-task method, which identifies a single descriptor for all structure types, enables the prediction of a well-defined structural stability and, therefore, the design of a crystal-structure map. Moreover, we present how MT-SISSO determines accurate, predictive models even when trained with largely incomplete databases.
    A different artificial-intelligence approach proposed for tackling the crystal-structure prediction challenge is based on approximating the Born-Oppenheimer potential-energy surface (PES). In particular, Gaussian Approximation Potentials, which are typically composed of a combination of two-, three-, and many-body potentials and fitted to elemental systems, have attracted attention in recent years. First examples published by the group of G. Csanyi have demonstrated how the ground-state and metastable phases could be correctly identified for Si, C, P, and B by exploring the PES predicted by such machine-learning potentials (ML potentials). However, the ML potentials introduced so far show limited transferability, i.e. their accuracy rapidly decreases in regions of the PES that are distant from the training data. As a consequence, these ML potentials are usually fitted to large training databases. Moreover, such training data needs to be constructed for every new material (more precisely, tuple of species types) that was not in the initial training database. In particular, the chemical-species information does not enter the ML potentials in the form of a variable. The second part of the thesis introduces a neural-network-based scheme to make ML potentials, specifically two- and three-body potentials, explicitly chemical-species-type dependent. We call the models chemical transferable potentials (CTP). The methodology enables the prediction of materials not included in the training data. As a showcase example, we consider a set of binary materials. The thesis tackles two challenges at the same time: a) predicting the PES of a material not contained in the training data and b) constructing robust models from a limited set of crystal structures. In particular, our tests examine to what extent ML potentials trained on such sparse data allow an accurate prediction of regions of the PES that are far from the training data (in structural space) but are sampled in a global crystal-structure search.
    In both constrained structure searches among a set of considered crystal-structure prototypes and an unbiased global structure search, we find that missing data in those regions does not prevent our models from identifying the ground-state phases of materials, even when the materials are not in the training data. Moreover, we compare our method to two state-of-the-art ML methods that, similarly to CTP, are capable of predicting the potential energies of materials not included in the training data: the extension of the smooth overlap of atomic positions by an alchemical similarity kernel (ASOAP), introduced in the group of M. Ceriotti, and the crystal graph convolutional neural networks (CGCNN), introduced in the group of J. C. Grossman. In the literature so far, ASOAP and CGCNN have been benchmarked on single-point energy calculations but have not been investigated in combination with global, unbiased structure-search scenarios. We include ASOAP and CGCNN in our structure-search tests. Our analysis reveals that, unlike CTP, these two approaches learn unphysical shapes of the PES in regions that surround the training data and are typically sampled in a structure-search application. This shortcoming is particularly evident in the unbiased global-search scenario.
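    To convey the flavour of the descriptor-identification step in the first part of the thesis, the sketch below performs a toy screening of analytical candidate features by correlation, followed by a sparse (Lasso) fit on synthetic data. It is not the SISSO or MT-SISSO implementation: the primary features, the operator set, the screening size and the synthetic stability target are all assumptions, and the real method searches billions of candidates with a dedicated sparsifying operator.

```python
import numpy as np
from itertools import combinations
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
# Hypothetical primary features of the atomic constituents (e.g. radii, electronegativities).
names = ["rA", "rB", "enA", "enB"]
P = rng.random((60, 4)) + 0.5
stability = (P[:, 0] - P[:, 1]) / (P[:, 2] + P[:, 3]) + 0.05 * rng.normal(size=60)  # toy target

# Build a small pool of candidate analytical features from pairwise operations.
candidates, labels = [], []
for i, j in combinations(range(4), 2):
    candidates += [P[:, i] + P[:, j], P[:, i] - P[:, j], P[:, i] * P[:, j], P[:, i] / P[:, j]]
    labels += [f"{names[i]}+{names[j]}", f"{names[i]}-{names[j]}",
               f"{names[i]}*{names[j]}", f"{names[i]}/{names[j]}"]
F = np.column_stack(candidates)

# Screening: keep the candidates most correlated with the target ...
corr = np.abs([np.corrcoef(f, stability)[0, 1] for f in F.T])
keep = np.argsort(corr)[-8:]
# ... then a sparse fit selects the final low-dimensional descriptor.
model = Lasso(alpha=1e-3, max_iter=10000).fit(F[:, keep], stability)
for idx, w in zip(keep, model.coef_):
    if abs(w) > 1e-3:
        print(f"{labels[idx]:>8s}  weight = {w:+.3f}")
```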

    CAESAR models for developmental toxicity

    Background: The new REACH legislation requires assessment of a large number of chemicals on the European market for several endpoints. Developmental toxicity is one of the most difficult endpoints to assess, on account of the complexity, length and cost of the experiments. Following the encouragement of QSAR (in silico) methods provided in REACH itself, the CAESAR project has developed several models.
    Results: Two QSAR models for developmental toxicity have been developed, using different statistical/mathematical methods. Both models performed well. The first makes a classification based on a random forest algorithm, while the second is based on an adaptive fuzzy partition algorithm. The first model has been implemented and inserted into the CAESAR on-line application, a Java-based piece of software that allows anyone to use the models freely.
    Conclusions: The CAESAR QSAR models have been developed with the aim of minimizing false negatives in order to make them more usable for REACH. The CAESAR on-line application ensures that both industry and regulators can easily access and use the developmental toxicity model (as well as the models for the other four endpoints).
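    The random-forest classifier and the stated aim of minimizing false negatives can be sketched as follows. This is a generic illustration on synthetic descriptors, not the CAESAR model: the class weights and the lowered decision threshold are assumed values chosen to show how false negatives can be traded against false positives.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import confusion_matrix

rng = np.random.default_rng(0)
X = rng.normal(size=(400, 20))                                                    # hypothetical descriptors
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.5, size=400) > 0).astype(int)   # 1 = toxicant

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)

# Weight the toxic class more heavily and lower the decision threshold; both
# choices trade additional false positives for fewer false negatives.
clf = RandomForestClassifier(n_estimators=200, class_weight={0: 1, 1: 3},
                             random_state=0).fit(X_tr, y_tr)
pred = (clf.predict_proba(X_te)[:, 1] >= 0.3).astype(int)
tn, fp, fn, tp = confusion_matrix(y_te, pred).ravel()
print(f"false negatives: {fn}, false positives: {fp}")
```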

    Evaluation of CNN architectures for gait recognition based on optical flow maps

    This work targets people identification in video based on the way they walk (i.e. gait) by using deep learning architectures. We explore the use of convolutional neural networks (CNN) for learning high-level descriptors from low-level motion features (i.e. optical flow components). The low number of training samples for each subject and the use of a test set containing subjects different from the training ones make the search for a good CNN architecture a challenging task.
    Universidad de Málaga. Campus de Excelencia Internacional Andalucía Tec
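    A small CNN over stacked optical-flow maps might look like the sketch below. The layer sizes, the 25-frame (50-channel) flow stacks, the 60x60 resolution and the 128-dimensional gait embedding are assumptions made for illustration, not the architectures evaluated in this work.

```python
import torch
from torch import nn

class GaitCNN(nn.Module):
    """Small CNN over stacked optical-flow maps (x/y components per frame)."""
    def __init__(self, in_channels=50, n_subjects=100):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(in_channels, 64, kernel_size=7, stride=2), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(64, 128, kernel_size=5), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(128, 256, kernel_size=3), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.embedding = nn.Linear(256, 128)      # gait descriptor used for identification
        self.classifier = nn.Linear(128, n_subjects)

    def forward(self, x):
        z = self.embedding(self.features(x).flatten(1))
        return z, self.classifier(z)

# A batch of 25-frame optical-flow volumes (2 channels per frame) at 60x60 resolution.
flow = torch.randn(8, 50, 60, 60)
descriptor, logits = GaitCNN()(flow)
print(descriptor.shape, logits.shape)
```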

    Advanced Feature Learning and Representation in Image Processing for Anomaly Detection

    Techniques for improving the information quality present in imagery for feature extraction are proposed in this thesis. Specifically, two methods are presented: soft feature extraction and improved Evolution-COnstructed (iECO) features. Soft features extract image-space knowledge by performing a per-pixel weighting based on an importance map. Through soft features, one is able to extract features relevant to identifying a given object versus its background. Next, the iECO features framework is presented. The iECO features framework uses evolutionary computation algorithms to learn an optimal series of image transforms, specific to a given feature descriptor, to best extract discriminative information. That is, a composition of image transforms is learned from training data to give a given feature descriptor the best opportunity to extract its information for the application at hand. The proposed techniques are applied to an automatic explosive hazard detection application and significant results are achieved.
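    The iECO idea of composing image transforms for a given descriptor can be illustrated with a toy (1+1) evolutionary search. The transform pool, the coarse histogram descriptor, the Fisher-like fitness and the synthetic target/background patches below are all assumptions made for the sketch; the actual framework evolves transform sequences tailored to real feature descriptors on explosive-hazard imagery.

```python
import numpy as np
from scipy import ndimage

rng = np.random.default_rng(0)

# Candidate image transforms the evolutionary search can compose.
TRANSFORMS = [
    lambda im: ndimage.gaussian_filter(im, 1.5),
    lambda im: ndimage.median_filter(im, 3),
    lambda im: np.hypot(ndimage.sobel(im, 0), ndimage.sobel(im, 1)),
    lambda im: np.clip(im, np.percentile(im, 5), np.percentile(im, 95)),
]

def apply_chain(chain, im):
    for t in chain:
        im = TRANSFORMS[t](im)
    return im

def descriptor(im):
    """Stand-in feature descriptor: a coarse normalized intensity histogram."""
    return np.histogram(im, bins=16, range=(im.min(), im.max() + 1e-9))[0] / im.size

def fitness(chain, targets, backgrounds):
    """Fisher-like separation of the two classes after applying the transform chain."""
    da = np.array([descriptor(apply_chain(chain, im)) for im in targets])
    db = np.array([descriptor(apply_chain(chain, im)) for im in backgrounds])
    return np.linalg.norm(da.mean(0) - db.mean(0)) / (da.std(0).mean() + db.std(0).mean() + 1e-9)

# Toy "target" vs "background" patches.
targets = [rng.normal(0.6, 0.2, (32, 32)) for _ in range(10)]
backgrounds = [rng.normal(0.4, 0.2, (32, 32)) for _ in range(10)]

# Tiny (1+1) evolutionary search over transform chains of length 3.
best = list(rng.integers(0, len(TRANSFORMS), 3))
best_fit = fitness(best, targets, backgrounds)
for _ in range(40):
    cand = best.copy()
    cand[rng.integers(0, 3)] = int(rng.integers(0, len(TRANSFORMS)))
    f = fitness(cand, targets, backgrounds)
    if f > best_fit:
        best, best_fit = cand, f
print("best transform chain:", best, "fitness:", round(float(best_fit), 3))
```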