Unsupervised Visual and Textual Information Fusion in Multimedia Retrieval - A Graph-based Point of View
Multimedia collections are more than ever growing in size and diversity.
Effective multimedia retrieval systems are thus critical to access these
datasets from the end-user perspective and in a scalable way. We are interested
in repositories of image/text multimedia objects and we study multimodal
information fusion techniques in the context of content-based multimedia
information retrieval. We focus on graph-based methods, which have proven to
provide state-of-the-art performance. We particularly examine two such
methods: cross-media similarities and random-walk-based scores. From a
theoretical viewpoint, we propose a unifying graph-based framework which
encompasses the two aforementioned approaches. Our proposal allows us to
highlight the core features one should consider when using a graph-based
technique for the combination of visual and textual information. We compare
cross-media and random-walk-based results using three different real-world
datasets. From a practical standpoint, our extended empirical analysis allows
us to provide insights and guidelines about the use of graph-based methods
for multimodal information fusion in content-based multimedia information
retrieval.
Comment: An extended version of the paper: Visual and Textual Information
Fusion in Multimedia Retrieval using Semantic Filtering and Graph based
Methods, by J. Ah-Pine, G. Csurka and S. Clinchant, submitted to ACM
Transactions on Information Systems
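The two families of methods the abstract names can be sketched concretely. Below is a minimal, illustrative combination of a textual and a visual similarity matrix over the same objects: cross-media scores diffuse one modality's similarities through the other, and a random walk mixes both into stationary scores. The mixing weight `alpha` and the matrices are our own toy assumptions, not the paper's exact formulation.

```python
import numpy as np

def row_normalize(S):
    """Turn a non-negative similarity matrix into a row-stochastic transition matrix."""
    return S / S.sum(axis=1, keepdims=True)

def cross_media_similarity(S_text, S_visual):
    """Cross-media scores: diffuse textual similarities through visual ones."""
    return row_normalize(S_text) @ row_normalize(S_visual)

def random_walk_scores(S_text, S_visual, alpha=0.5, iters=50):
    """Stationary scores of a walk over a graph mixing both modalities
    with weight alpha (an assumed, tunable parameter)."""
    P = row_normalize(alpha * S_text + (1 - alpha) * S_visual)
    pi = np.full(P.shape[0], 1.0 / P.shape[0])
    for _ in range(iters):
        pi = pi @ P  # power iteration toward the stationary distribution
    return pi

# Toy example: 3 multimedia objects with hand-made pairwise similarities.
S_t = np.array([[1.0, 0.8, 0.1], [0.8, 1.0, 0.2], [0.1, 0.2, 1.0]])
S_v = np.array([[1.0, 0.2, 0.7], [0.2, 1.0, 0.3], [0.7, 0.3, 1.0]])
C = cross_media_similarity(S_t, S_v)
pi = random_walk_scores(S_t, S_v)
```

Both outputs are row-stochastic or normalized by construction, which is the property the graph-based framework exploits when comparing the two fusion schemes.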
Overview of the Relational Analysis approach in Data-Mining and Multi-criteria Decision Making
In this chapter we introduce a general framework called the Relational Analysis (RA) approach and its related contributions and applications in the fields of data analysis, data mining and multi-criteria decision making. This approach was initiated by J.F. Marcotorchino and P. Michaud at the end of the 1970s and has generated many research activities. The aspects of this framework that we focus on here are theoretical: we aim to recall the background and basics of this framework, the unifying results and the modeling contributions it has made possible. The main tasks that we are interested in are the ranking aggregation problem, the clustering problem and the block seriation problem. These are combinatorial problems, and the computational aspects of such tasks in the context of the RA methodology will not be covered. However, the list of references given throughout this chapter includes numerous articles that the interested reader can consult to this end.
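Of the tasks listed, ranking aggregation is the easiest to illustrate: the relational coding turns each ranking into pairwise preferences, and a consensus maximizes agreement with them. The sketch below uses that pairwise coding with a simple greedy ordering; the greedy step is our own simplification, not the exact combinatorial optimum the Marcotorchino–Michaud formulation solves.

```python
from itertools import combinations

def pairwise_agreement(rankings, items):
    """C[a][b] = number of input rankings that place a before b."""
    C = {a: {b: 0 for b in items} for a in items}
    for r in rankings:
        pos = {x: i for i, x in enumerate(r)}
        for a, b in combinations(items, 2):
            if pos[a] < pos[b]:
                C[a][b] += 1
            else:
                C[b][a] += 1
    return C

def greedy_consensus(rankings, items):
    """Order items by total pairwise wins (a heuristic stand-in for the
    exact integer-programming solution used in the RA approach)."""
    C = pairwise_agreement(rankings, items)
    return sorted(items, key=lambda a: -sum(C[a].values()))

rankings = [["a", "b", "c"], ["a", "c", "b"], ["b", "a", "c"]]
consensus = greedy_consensus(rankings, ["a", "b", "c"])  # -> ['a', 'b', 'c']
```

Here "a" precedes "b" in two of three rankings and precedes "c" in all three, so it wins the consensus ordering.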
Hypergraph Modelization of a Syntactically Annotated English Wikipedia Dump
Wikipedia, the well-known internet encyclopedia, is nowadays a widely used source of information. To leverage its rich information, already-parsed versions of Wikipedia have been proposed. We present an annotated dump of the English Wikipedia. This dump draws upon previously released parsed Wikipedia dumps, yet heads in a different direction: we focus more on the syntactic characteristics of words. Aside from the classical Part-of-Speech (PoS) tags and dependency parsing relations, we provide the full constituent parse branch for each word in a succinct way. Additionally, we propose a hypergraph network representation of the extracted linguistic information. The proposed model aims to take advantage of the information stored within our parsed Wikipedia dump. We hope that by releasing these resources, researchers from the concerned communities will have a ready-to-experiment Wikipedia corpus with which to compare and distribute their work. We make public our parsed Wikipedia dump as well as the tool (and its source code) used to perform the parse. The hypergraph network and its related metadata are also distributed.
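A hypergraph over annotated tokens can be sketched as follows: each hyperedge groups the token occurrences sharing one linguistic attribute (PoS tag, dependency label, constituent branch). The field names below are illustrative, not the released dump's actual schema.

```python
from collections import defaultdict

# Hypothetical annotated tokens: PoS tag, dependency label, constituent branch.
tokens = [
    {"id": 0, "form": "Wikipedia", "pos": "NNP", "dep": "nsubj", "branch": "S/NP"},
    {"id": 1, "form": "is",        "pos": "VBZ", "dep": "cop",   "branch": "S/VP"},
    {"id": 2, "form": "free",      "pos": "JJ",  "dep": "root",  "branch": "S/VP/ADJP"},
]

def build_hypergraph(tokens, attributes=("pos", "dep", "branch")):
    """Map each (attribute, value) hyperedge to the set of token ids it contains."""
    edges = defaultdict(set)
    for t in tokens:
        for attr in attributes:
            edges[(attr, t[attr])].add(t["id"])
    return edges

H = build_hypergraph(tokens)
```

Queries then reduce to hyperedge lookups, e.g. all tokens tagged `NNP` or all tokens on a given constituent branch.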
Kernel Hierarchical Agglomerative Clustering with an Application to Textual Data
The Lance–Williams formula unifies several hierarchical agglomerative clustering (HAC) methods. In this article, we assume the data are represented in a Euclidean space and derive a new expression of this formula using cosine similarities instead of squared Euclidean distances. Our approach has the following advantages. On the one hand, it naturally extends classical HAC methods to kernel functions. On the other hand, it allows applying thresholding methods that sparsify the similarity matrix in order to improve the complexity of HAC. Applying our approach to text clustering tasks shows that scalability improves in both memory and processing time, while the quality of the results is preserved or even improved.
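The idea of driving HAC by cosine similarities rather than squared Euclidean distances can be sketched with a small average-linkage example; the kernelized variant would simply replace the dot products below by kernel evaluations. This is an illustration, not the article's exact Lance–Williams recurrence.

```python
import numpy as np

def cosine_sim(X):
    """Pairwise cosine similarity matrix of the rows of X."""
    Xn = X / np.linalg.norm(X, axis=1, keepdims=True)
    return Xn @ Xn.T

def hac_cosine(X, n_clusters):
    """Average-linkage agglomerative clustering on cosine similarities."""
    clusters = [[i] for i in range(len(X))]
    S = cosine_sim(X)
    while len(clusters) > n_clusters:
        best, pair = -np.inf, None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # average similarity between the two candidate clusters
                sim = S[np.ix_(clusters[a], clusters[b])].mean()
                if sim > best:
                    best, pair = sim, (a, b)
        a, b = pair
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return clusters

# Two obvious groups of directions in the plane.
X = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]])
parts = hac_cosine(X, 2)  # -> [[0, 1], [2, 3]]
```

Sparsifying `S` (zeroing small entries, as the article's thresholding step does) would let the pair search skip most cluster pairs, which is where the scalability gain comes from.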
Backprojection for Training Feedforward Neural Networks in the Input and Feature Spaces
After the tremendous development of neural networks trained by
backpropagation, it is a good time to develop other algorithms for training
neural networks to gain more insight into networks. In this paper, we propose
a new algorithm for training feedforward neural networks which is notably
faster than backpropagation. This method is based on projection and
reconstruction where, at every layer, the projected data and reconstructed
labels are forced to be similar and the weights are tuned accordingly, layer
by layer. The proposed algorithm can be used in both the input and feature
spaces, named backprojection and kernel backprojection, respectively. This
algorithm offers insight into networks from a projection-based perspective.
Experiments on synthetic datasets show the effectiveness of the proposed
method.
Comment: Accepted (to appear) in International Conference on Image Analysis
and Recognition (ICIAR) 2020, Springer
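A hedged sketch of the layer-by-layer idea as we read it: at every layer the weights are tuned so the projected data matches a target derived from the labels, with no end-to-end backpropagation. The closed-form ridge update and the random label lift below are our simplification for illustration, not the authors' actual backprojection procedure.

```python
import numpy as np

def relu(Z):
    return np.maximum(Z, 0.0)

def train_layerwise(X, Y, widths, ridge=1e-3):
    """Fit each layer's weights in closed form against a label-derived target,
    then pass the activations forward; returns weights and final activations."""
    H, weights = X, []
    for d in widths:
        rng = np.random.default_rng(0)
        # target: labels lifted to this layer's width via a fixed random map
        T = Y @ rng.standard_normal((Y.shape[1], d))
        # ridge-regularized least squares: H @ W ~= T
        W = np.linalg.solve(H.T @ H + ridge * np.eye(H.shape[1]), H.T @ T)
        H = relu(H @ W)
        weights.append(W)
    return weights, H

X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])
Y = np.array([[0.0], [1.0], [1.0], [1.0]])  # OR function as a toy task
weights, H = train_layerwise(X, Y, widths=[4, 1])
```

Each layer is fitted once, in order, which is what makes such schemes cheaper per pass than backpropagating gradients through the whole stack.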
Principal Component Analysis Using Structural Similarity Index for Images
Despite the advances of deep learning in specific tasks using images, the
principled assessment of image fidelity and similarity is still a critical
ability to develop. As it has been shown that Mean Squared Error (MSE) is
insufficient for this task, other measures have been developed with one of the
most effective being Structural Similarity Index (SSIM). Such measures can be
used for subspace learning but existing methods in machine learning, such as
Principal Component Analysis (PCA), are based on Euclidean distance or MSE and
thus cannot properly capture the structural features of images. In this paper,
we define an image structure subspace which discriminates different types of
image distortions. We propose Image Structural Component Analysis (ISCA) and
also kernel ISCA by using SSIM, rather than Euclidean distance, in the
formulation of PCA. This paper provides a bridge between image quality
assessment and manifold learning, opening a broad new area for future
research.
Comment: Paper for the methods named "Image Structural Component Analysis
(ISCA)" and "Kernel Image Structural Component Analysis (Kernel ISCA)"
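The SSIM measure that ISCA builds on, computed globally over two small blocks, can be sketched as below. The constants follow the usual SSIM defaults; this is the underlying similarity measure only, not the paper's subspace formulation.

```python
import numpy as np

def ssim(x, y, L=1.0, k1=0.01, k2=0.03):
    """Global SSIM between two equally sized blocks with dynamic range L."""
    c1, c2 = (k1 * L) ** 2, (k2 * L) ** 2
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()
    cxy = ((x - mx) * (y - my)).mean()  # covariance between the blocks
    return ((2 * mx * my + c1) * (2 * cxy + c2)) / \
           ((mx ** 2 + my ** 2 + c1) * (vx + vy + c2))

a = np.linspace(0, 1, 64).reshape(8, 8)
s_same = ssim(a, a)        # identical blocks -> 1.0
s_inv = ssim(a, 1.0 - a)   # inverted block -> much lower, despite equal MSE symmetry
```

Unlike MSE, the covariance term makes SSIM sensitive to structural inversion, which is exactly the property the image structure subspace is meant to capture.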
Weighted Fisher Discriminant Analysis in the Input and Feature Spaces
Fisher Discriminant Analysis (FDA) is a subspace learning method which
minimizes and maximizes the intra- and inter-class scatters of data,
respectively. Although in FDA all pairs of classes are treated the same way,
some classes are closer than others. Weighted FDA assigns weights to
the pairs of classes to address this shortcoming of FDA. In this paper, we
propose a cosine-weighted FDA as well as an automatically weighted FDA in which
weights are found automatically. We also propose a weighted FDA in the feature
space to establish a weighted kernel FDA for both existing and newly proposed
weights. Our experiments on the ORL face recognition dataset show the
effectiveness of the proposed weighting schemes.
Comment: Accepted (to appear) in International Conference on Image Analysis
and Recognition (ICIAR) 2020, Springer
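A weighted between-class scatter of the kind the abstract describes can be sketched as follows: each class pair (i, j) contributes with a weight w_ij. The inverse-squared-distance weighting used in the example is our illustrative choice, not the paper's cosine or automatic scheme.

```python
import numpy as np

def weighted_between_scatter(means, counts, weights):
    """S_B = sum over i<j of w_ij * n_i * n_j * (m_i - m_j)(m_i - m_j)^T."""
    d = means.shape[1]
    S = np.zeros((d, d))
    for i in range(len(means)):
        for j in range(i + 1, len(means)):
            diff = (means[i] - means[j]).reshape(-1, 1)
            S += weights[i, j] * counts[i] * counts[j] * diff @ diff.T
    return S

means = np.array([[0.0, 0.0], [2.0, 0.0], [0.0, 3.0]])
counts = np.array([10, 10, 10])
# Example weighting: down-weight class pairs that are already far apart,
# so the projection focuses on separating the close (confusable) pairs.
dists = np.linalg.norm(means[:, None] - means[None, :], axis=2)
W = 1.0 / (dists ** 2 + 1.0)
S_B = weighted_between_scatter(means, counts, W)
```

The within-class scatter is unchanged, so the usual generalized eigenproblem of FDA applies with this S_B substituted in.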
Evaluating the Viscoelastic Properties of Tissue from Laser Speckle Fluctuations
Most pathological conditions, such as atherosclerosis, cancer, and neurodegenerative and orthopedic disorders, are accompanied by alterations in tissue viscoelasticity. Laser Speckle Rheology (LSR) is a novel optical technology that offers invaluable potential for mechanical assessment of tissue in situ. In LSR, the specimen is illuminated with coherent light and the time constant of speckle fluctuations, τ, is measured using a high-speed camera. Prior work indicates that τ is closely correlated with tissue microstructure and composition. Here, we investigate the relationship between LSR measurements of τ and sample mechanical properties defined by the viscoelastic modulus, G*. Phantom and tissue samples over a broad range of viscoelastic properties are evaluated using LSR and conventional mechanical testing. Results demonstrate a strong correlation between τ and |G*| for both phantom (r = 0.79, p < 0.0001) and tissue (r = 0.88, p < 0.0001) specimens, establishing the unique capability of LSR in characterizing tissue viscoelasticity.
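A speckle decorrelation time constant τ of the kind measured here is typically estimated by fitting a decay model to the intensity autocorrelation g2(t). The single-exponential model and the log-linear fit below are our simplifying assumptions for illustration, not the study's actual analysis pipeline.

```python
import numpy as np

def fit_tau(t, g2):
    """Fit g2(t) = 1 + beta * exp(-2 t / tau) by a log-linear least-squares
    fit on g2 - 1; returns the estimated tau."""
    y = np.log(g2 - 1.0)
    slope, _intercept = np.polyfit(t, y, 1)  # y = ln(beta) - (2 / tau) * t
    return -2.0 / slope

# Synthetic autocorrelation with a known time constant of 10 ms.
t = np.linspace(0.0, 0.05, 50)
tau_true = 0.01
g2 = 1.0 + 0.8 * np.exp(-2.0 * t / tau_true)
tau_hat = fit_tau(t, g2)  # recovers ~0.01
```

Stiffer samples (larger |G*|) decorrelate more slowly and yield larger τ, which is the correlation the study quantifies.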
Population-based trends in mortality, morbidity and treatment for very preterm and very low birth weight infants over 12 years
BACKGROUND: Over the last two decades, improvements in medical care have been associated with a significant increase in the number and better outcomes of very preterm (VP, < 32 completed gestational weeks) and very low birth weight (VLBW, < 1500 g) infants. Only a few publications analyse changes in their short-term outcome in a geographically defined area over more than 10 years. We therefore aimed to investigate the net change in VP and VLBW infants leaving the hospital without major complications. METHODS: Our population-based observational cohort study used the Minimal Neonatal Data Set, a database maintained by the Swiss Society of Neonatology including information on all VP and VLBW infants. Perinatal characteristics, mortality and morbidity rates and survival free of major complications were analysed and their temporal trends evaluated. RESULTS: In 1996, 2000, 2004, and 2008, a total of 3090 infants were enrolled in the Network Database. Over the same period, the rate of VP and VLBW neonates increased significantly from 0.87% in 1996 to 1.10% in 2008 (p < 0.001). Overall mortality remained stable at 13%, but survival free of major complications increased from 66.9% to 71.7% (p < 0.01). The percentage of infants receiving a full course of antenatal corticosteroids increased from 67.7% in 1996 to 91.4% in 2008 (p < 0.001). Surfactant was given more frequently (24.8% in 1996 compared to 40.1% in 2008, p < 0.001) and the frequency of mechanical ventilation remained stable at about 43%. However, the use of CPAP therapy increased considerably, from 43% to 73.2% (p < 0.001). Some typical neonatal pathologies such as bronchopulmonary dysplasia, necrotising enterocolitis and intraventricular haemorrhage decreased significantly (p ≤ 0.02) whereas others such as patent ductus arteriosus and respiratory distress syndrome increased (p < 0.001). CONCLUSIONS: Over the 12-year observation period, the number of VP and VLBW infants increased significantly. An unchanged overall mortality rate and an increase in survivors free of major complications resulted in a considerable net gain in infants with potentially good outcomes.
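Rate comparisons of the kind reported here (for example, survival free of major complications rising from 66.9% to 71.7%) are typically checked with a two-proportion z-test. The per-year cohort sizes below are hypothetical round numbers for illustration, not the study's actual counts.

```python
import math

def two_proportion_z(x1, n1, x2, n2):
    """z statistic for the difference of two proportions under a pooled
    estimate of the common proportion."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    return (p2 - p1) / se

# Hypothetical cohorts of ~770 infants per comparison year.
z = two_proportion_z(int(0.669 * 770), 770, int(0.717 * 770), 770)
# z above 1.96 corresponds to two-sided significance at the 0.05 level
```

With cohorts of this size the observed difference clears the 5% threshold, consistent with the p < 0.01 the study reports on its larger pooled sample.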
Clustering cliques for graph-based summarization of the biomedical research literature
BACKGROUND: Graph-based notions are increasingly used in biomedical data mining and knowledge discovery tasks. In this paper, we present a clique-clustering method to automatically summarize graphs of semantic predications produced from PubMed citations (titles and abstracts). RESULTS: SemRep is used to extract semantic predications from the citations returned by a PubMed search. Cliques were identified from frequently occurring predications with highly connected arguments, filtered by degree centrality. Themes contained in the summary were identified with a hierarchical clustering algorithm based on common arguments shared among cliques. The validity of the clusters in the summaries produced was compared to a Silhouette-generated baseline for cohesion, separation and overall validity. The theme labels were also compared to a reference standard produced with major MeSH headings. CONCLUSIONS: For 11 topics in the testing data set, the overall validity of clusters from the system summary was 10% better than the baseline (43% versus 33%). Compared to the reference standard from MeSH headings, recall, precision and F-score were 0.64, 0.65, and 0.65, respectively.
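The clique-finding step of such a pipeline can be sketched on toy predications: each (subject, relation, object) triple becomes a node, predications sharing an argument are linked, and maximal cliques form candidate themes. The brute-force clique search below is a tiny stand-in for a real algorithm, and the predications are invented examples, not SemRep output.

```python
from itertools import combinations

# Hypothetical semantic predications (subject, relation, object).
preds = [
    ("aspirin", "TREATS", "headache"),
    ("aspirin", "TREATS", "fever"),
    ("ibuprofen", "TREATS", "headache"),
    ("statin", "TREATS", "hyperlipidemia"),
]

def share_argument(p, q):
    """Two predications are linked when they share a subject or object."""
    return bool({p[0], p[2]} & {q[0], q[2]})

def maximal_cliques(n, adj):
    """Brute force over subsets, largest first; keep a subset if it is a
    clique and is not contained in an already-kept larger clique."""
    cliques = []
    for r in range(n, 0, -1):
        for sub in combinations(range(n), r):
            if all(adj[a][b] for a, b in combinations(sub, 2)) and \
               not any(set(sub) < c for c in cliques):
                cliques.append(set(sub))
    return cliques

adj = [[share_argument(p, q) for q in preds] for p in preds]
cliques = maximal_cliques(len(preds), adj)  # -> [{0, 1}, {0, 2}, {3}]
```

The hierarchical clustering step would then merge cliques sharing arguments (here, the two aspirin/headache cliques) into one theme.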