6,960 research outputs found

    Effective retrieval and new indexing method for case based reasoning: Application in chemical process design

    Get PDF
    In this paper we try to improve the retrieval step for case based reasoning for preliminary design. This improvement deals with three major parts of our CBR system. First, in the preliminary design step, some uncertainties like imprecise or unknown values remain in the description of the problem, because they need a deeper analysis to be withdrawn. To deal with this issue, the faced problem description is soften with the fuzzy sets theory. Features are described with a central value, a percentage of imprecision and a relation with respect to the central value. These additional data allow us to build a domain of possible values for each attributes. With this representation, the calculation of the similarity function is impacted, thus the characteristic function is used to calculate the local similarity between two features. Second, we focus our attention on the main goal of the retrieve step in CBR to find relevant cases for adaptation. In this second part, we discuss the assumption of similarity to find the more appropriated case. We put in highlight that in some situations this classical similarity must be improved with further knowledge to facilitate case adaptation. To avoid failure during the adaptation step, we implement a method that couples similarity measurement with adaptability one, in order to approximate the cases utility more accurately. The latter gives deeper information for the reusing of cases. In a last part, we present a generic indexing technique for the base, and a new algorithm for the research of relevant cases in the memory. The sphere indexing algorithm is a domain independent index that has performances equivalent to the decision tree ones. But its main strength is that it puts the current problem in the center of the research area avoiding boundaries issues. All these points are discussed and exemplified through the preliminary design of a chemical engineering unit operation

    Two Approaches for Text Segmentation in Web Images

    No full text
    There is a significant need to recognise the text in images on web pages, both for effective indexing and for presentation by non-visual means (e.g., audio). This paper presents and compares two novel methods for the segmentation of characters for subsequent extraction and recognition. The novelty of both approaches is the combination of (different in each case) topological features of characters with an anthropocentric perspective of colour perception— in preference to RGB space analysis. Both approaches enable the extraction of text in complex situations such as in the presence of varying colour and texture (characters and background)

    Two Approaches for Text Segmentation in Web Images

    Get PDF
    There is a significant need to recognise the text in images on web pages, both for effective indexing and for presentation by non-visual means (e.g., audio). This paper presents and compares two novel methods for the segmentation of characters for subsequent extraction and recognition. The novelty of both approaches is the combination of (different in each case) topological features of characters with an anthropocentric perspective of colour perception— in preference to RGB space analysis. Both approaches enable the extraction of text in complex situations such as in the presence of varying colour and texture (characters and background)

    Improving average ranking precision in user searches for biomedical research datasets

    Full text link
    Availability of research datasets is keystone for health and life science study reproducibility and scientific progress. Due to the heterogeneity and complexity of these data, a main challenge to be overcome by research data management systems is to provide users with the best answers for their search queries. In the context of the 2016 bioCADDIE Dataset Retrieval Challenge, we investigate a novel ranking pipeline to improve the search of datasets used in biomedical experiments. Our system comprises a query expansion model based on word embeddings, a similarity measure algorithm that takes into consideration the relevance of the query terms, and a dataset categorisation method that boosts the rank of datasets matching query constraints. The system was evaluated using a corpus with 800k datasets and 21 annotated user queries. Our system provides competitive results when compared to the other challenge participants. In the official run, it achieved the highest infAP among the participants, being +22.3% higher than the median infAP of the participant's best submissions. Overall, it is ranked at top 2 if an aggregated metric using the best official measures per participant is considered. The query expansion method showed positive impact on the system's performance increasing our baseline up to +5.0% and +3.4% for the infAP and infNDCG metrics, respectively. Our similarity measure algorithm seems to be robust, in particular compared to Divergence From Randomness framework, having smaller performance variations under different training conditions. Finally, the result categorization did not have significant impact on the system's performance. We believe that our solution could be used to enhance biomedical dataset management systems. In particular, the use of data driven query expansion methods could be an alternative to the complexity of biomedical terminologies

    Visual Information Retrieval in Endoscopic Video Archives

    Get PDF
    In endoscopic procedures, surgeons work with live video streams from the inside of their subjects. A main source for documentation of procedures are still frames from the video, identified and taken during the surgery. However, with growing demands and technical means, the streams are saved to storage servers and the surgeons need to retrieve parts of the videos on demand. In this submission we present a demo application allowing for video retrieval based on visual features and late fusion, which allows surgeons to re-find shots taken during the procedure.Comment: Paper accepted at the IEEE/ACM 13th International Workshop on Content-Based Multimedia Indexing (CBMI) in Prague (Czech Republic) between 10 and 12 June 201

    Semantic distillation: a method for clustering objects by their contextual specificity

    Full text link
    Techniques for data-mining, latent semantic analysis, contextual search of databases, etc. have long ago been developed by computer scientists working on information retrieval (IR). Experimental scientists, from all disciplines, having to analyse large collections of raw experimental data (astronomical, physical, biological, etc.) have developed powerful methods for their statistical analysis and for clustering, categorising, and classifying objects. Finally, physicists have developed a theory of quantum measurement, unifying the logical, algebraic, and probabilistic aspects of queries into a single formalism. The purpose of this paper is twofold: first to show that when formulated at an abstract level, problems from IR, from statistical data analysis, and from physical measurement theories are very similar and hence can profitably be cross-fertilised, and, secondly, to propose a novel method of fuzzy hierarchical clustering, termed \textit{semantic distillation} -- strongly inspired from the theory of quantum measurement --, we developed to analyse raw data coming from various types of experiments on DNA arrays. We illustrate the method by analysing DNA arrays experiments and clustering the genes of the array according to their specificity.Comment: Accepted for publication in Studies in Computational Intelligence, Springer-Verla
    • 

    corecore