    Context-aware person identification in personal photo collections

    Identifying the people in photos is an important need for users of photo management systems. We present MediAssist, one such system which facilitates browsing, searching and semi-automatic annotation of personal photos, using analysis of both image content and the context in which the photo is captured. This semi-automatic annotation includes annotation of the identity of people in photos. In this paper, we focus on such person annotation, and propose person identification techniques based on a combination of context and content. We propose language modelling and nearest neighbor approaches to context-based person identification, in addition to novel face color and image color content-based features (used alongside face recognition and body patch features). We conduct a comprehensive empirical study of these techniques using the real private photo collections of a number of users, and show that combining context- and content-based analysis improves performance over content or context alone

    Hierarchical Classification as an Aid to Browsing

    An approach to browsing large chemical reaction databases is presented. The method builds on earlier work in which unsupervised hierarchical classification was used to extract generalizations of reaction classes from reaction databases for use in reaction knowledge bases. Classification is based on both semantic and topological features, and it supports the creation of deep hierarchies in which succeeding levels represent increasing degrees of abstraction. A hierarchy lets the user quickly locate interesting items or classes of items by traversing a tree rather than sequentially scanning a hit list. In addition, the depth of the resulting hierarchy is determined interactively by the user.
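
The browsing idea in this abstract (descend a class tree instead of scanning a flat hit list, with the visible depth chosen by the user) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the `Node` type, the function names, and the toy reaction classes are hypothetical and are not taken from the paper, which does not describe its data structures.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """One class in a hypothetical reaction hierarchy."""
    label: str
    items: list = field(default_factory=list)      # hits stored at leaf classes
    children: list = field(default_factory=list)   # more specific subclasses


def leaves_under(node):
    """Collect every hit stored below a node, in left-to-right order."""
    if not node.children:
        return list(node.items)
    hits = []
    for child in node.children:
        hits.extend(leaves_under(child))
    return hits


def browse(root, path):
    """Follow a sequence of child labels from the root; return the node reached."""
    node = root
    for label in path:
        node = next(c for c in node.children if c.label == label)
    return node


def cut_at_depth(node, depth):
    """Labels visible when the hierarchy is truncated at a user-chosen depth."""
    if depth == 0 or not node.children:
        return [node.label]
    labels = []
    for child in node.children:
        labels.extend(cut_at_depth(child, depth - 1))
    return labels


# Toy hierarchy (hypothetical classes and hit identifiers).
root = Node("all reactions", children=[
    Node("oxidation", children=[
        Node("alcohol oxidation", items=["r1", "r2"]),
        Node("alkene epoxidation", items=["r3"]),
    ]),
    Node("reduction", children=[
        Node("ketone reduction", items=["r4"]),
    ]),
])
```

Calling `browse(root, ["oxidation"])` and then `leaves_under` on the result reaches the three oxidation hits in one step, while `cut_at_depth(root, 1)` shows only the two most abstract classes.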

    Terminology server for improved resource discovery: analysis of model and functions

    This paper considers the potential to improve distributed information retrieval via a terminologies server. The restriction upon effective resource discovery caused by the use of disparate terminologies across services and collections is outlined, before considering a DDC spine based approach involving inter-scheme mapping as a possible solution. The developing HILT model is discussed alongside other existing models and alternative approaches to solving the terminologies problem. Results from the current HILT pilot are presented to illustrate functionality and suggestions are made for further research and development

    Abstracts

    CellFinder: a cell data repository

    CellFinder (http://www.cellfinder.org) is a comprehensive one-stop resource for molecular data characterizing mammalian cells in different tissues and at different developmental stages. It is built from carefully selected data sets stemming from other curated databases and the biomedical literature. To date, CellFinder describes 3394 cell types and 50 951 cell lines. The database currently contains 3055 microscopic and anatomical images, 205 whole-genome expression profiles of 194 cell/tissue types from RNA-seq and microarrays and 553 905 protein expressions for 535 cells/tissues. Text mining of a corpus of >2000 publications followed by manual curation confirmed expression information on ∼900 proteins and genes. CellFinder's data model can seamlessly represent entities from single cells to the organ level, incorporate mappings between homologous entities in different species and describe processes of cell development and differentiation. Its ontological backbone currently consists of 204 741 ontology terms incorporated from 10 different ontologies unified under the novel CELDA ontology. CellFinder's web portal allows searching, browsing and comparing the stored data, interactive construction of developmental trees and navigating the partonomic hierarchy of cells and tissues through a unique body browser designed for life scientists and clinicians.

    Sensitivity of Semantic Signatures in Text Mining

    The rapid development of the Internet and the ability to store data relatively inexpensively have contributed to an information explosion that did not exist a few years ago. A few keystrokes in a search engine on any given subject now return more web pages than ever before. Because the amount of available data is so overwhelming, extracting relevant information from it remains a challenge. Since 80% of the available data stored worldwide is text, we need advanced techniques to process this textual data and extract useful information. Text mining is one such process for addressing the information explosion problem; it employs techniques such as natural language processing, information retrieval, machine learning algorithms and knowledge management. In text mining, the text under study undergoes a transformation in which its essential attributes are derived. The attributes that form interesting patterns are chosen, and machine learning algorithms are used to find similar patterns in desired corpora. Finally, the resulting texts are evaluated and interpreted. In this thesis we develop a new framework for the text mining process. An investigator chooses target content from training files, which is captured in semantic signatures. Semantic signatures characterize the target content, derived from training files, that we are looking for in testing files (whose content is unknown). The semantic signatures work as attributes to fetch and/or categorize the target content from a test corpus. A proof-of-concept software package, consisting of tools that aid an investigator in mining text data, is developed using Visual Studio, C# and the .NET Framework. Choosing keywords plays a major role in designing semantic signatures; careful selection of keywords leads to a more accurate analysis, especially in English, which is sensitive to semantics. Notably, the same word can carry a different meaning when it appears in a different context. We have incorporated stemming within the framework, and its effectiveness is demonstrated using a large corpus. We have conducted experiments to demonstrate the sensitivity of semantic signatures to subtle content differences between closely related documents. These experiments show that the newly developed framework identifies subtle semantic differences to a substantial degree.
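
As a rough illustration of how stemmed keywords could act as a signature for scoring unseen text, here is a minimal Python sketch. It is not the thesis's implementation (which the abstract says was built in C# on .NET): the function names, the overlap score, and the crude suffix-stripping stemmer are all assumptions introduced here, and the stemmer is a deliberately simple stand-in rather than a real algorithm such as Porter's.

```python
import re

# Suffixes tried in order; a crude stand-in for a real stemmer (assumption).
SUFFIXES = ("ing", "ed", "es", "s")


def crude_stem(word):
    """Strip the first matching suffix, keeping at least three characters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def tokens(text):
    """Lowercase the text, split it into alphabetic words, and stem each one."""
    return [crude_stem(w) for w in re.findall(r"[a-z]+", text.lower())]


def build_signature(keywords):
    """A signature here is simply the set of stemmed keywords."""
    return {crude_stem(k.lower()) for k in keywords}


def score(signature, text):
    """Fraction of signature terms that occur in the text after stemming."""
    if not signature:
        return 0.0
    present = signature & set(tokens(text))
    return len(present) / len(signature)


# Hypothetical keywords chosen by an investigator from training files.
signature = build_signature(["mining", "signatures", "stemming"])
doc_a = "Stemmed signatures help when mining text"
doc_b = "Mining text for patterns"
```

With these inputs, `score(signature, doc_a)` is 1.0 because "Stemmed", "signatures" and "mining" reduce to the same stems as the keywords, while `doc_b` matches only one of the three terms. A real system would need a proper stemmer and some handling of context, since identical stems can still differ in meaning.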

    High-level Thesaurus (HILT) Phase III [Project] : Final Report

    An evaluation stage of the HILT Phase III pilot M2M demonstrator was to be undertaken following completion of the main development work (November/December 2006). The aim was to determine whether the pilot demonstrator operates as specified in the requirements document and, hence, whether it correctly delivers the functionality needed to meet the five use cases (devised during the preceding feasibility study). Outcomes will be used to inform the system refinement process, due to occur in January 2007. Six SOAP functions were designed to meet the functionality required by each of the use cases, either singly or in combination, and the working pilot is best tested by examining whether each part of the system architecture (see Figure 1) operates as specified in the requirements document when any given one of the functions is called. This report documents the use cases being addressed, the nature of the functions designed to meet the use cases, how each part of the system is required to operate when a function is called, methodologies determined to assess the satisfactory performance of functions, and associated results. It is not the intention of this evaluation to study the quality of mappings or retrieval performance. Results presented will enable the identification of issues or errors within the system as it is currently implemented (or requirements as currently specified) and any additional requirements for development beyond Phase III will be noted

    Bioinformatics-based assessment of the relevance of candidate genes for mutation discovery

    Bioinformatics resources provide a wide range of tools that can be applied in different areas of mutation screening. The enormous and constantly increasing amount of genomic data obtained in plant-oriented molecular studies requires the development of efficient techniques for its processing. This chapter focuses mainly on the application of different tools and resources to facilitate Targeting-Induced Local Lesions in Genomes (TILLING) analysis. TILLING is a reverse genetics technique that applies traditional mutagenesis to create DNA libraries of mutagenised individuals, which are then subjected to high-throughput screening for the identification of mutations. Bioinformatics tools have been shown to be useful in supporting the selection of candidate genes for mutation screening. The availability of bioinformatics software and experimental data repositories enables multi-database mining: existing raw experimental data (genomics-related information, expression data, annotated ontologies) can be interpreted in a new biological context. This may help in selecting the candidate gene for mutation discovery that controls the target phenotype. Mutation screening using a TILLING strategy requires prior knowledge of the full genomic sequence of the gene of interest. Different bioinformatics tools can facilitate obtaining it, depending on whether a fully sequenced genome of the particular species is available. Specific tools can also be useful for identifying possible gene paralogs, which may mask the effect of the mutated gene. Bioinformatics resources can also support the selection of the gene fragments most prone to acquiring a deleterious nucleotide change. Finally, tools are available for designing oligonucleotide primers to amplify a gene fragment for mutation screening.