6,012 research outputs found
Neurocognitive Informatics Manifesto.
Informatics studies all aspects of the structure of natural and artificial information systems. Theoretical and abstract approaches to information have made great advances, but human information processing is still unmatched in many areas, including information management, representation and understanding. Neurocognitive informatics is a new, emerging field that should help to improve the matching of artificial and natural systems, and inspire better computational algorithms to solve problems that are still beyond the reach of machines. In this position paper examples of neurocognitive inspirations and promising directions in this area are given
Designing Semantic Kernels as Implicit Superconcept Expansions
Recently, there has been an increased interest in the exploitation of background knowledge in the context of text mining tasks, especially text classification. At the same time, kernel-based learning algorithms like Support Vector Machines have become a dominant paradigm in the text mining community. Amongst other reasons, this is also due to their capability to achieve more accurate learning results by replacing standard linear kernel (bag-of-words) with customized kernel functions which incorporate additional apriori knowledge. In this paper we propose a new approach to the design of ‘semantic smoothing kernels’ by means of an implicit superconcept expansion using well-known measures of term similarity. The experimental evaluation on two different datasets indicates that our approach consistently improves performance in situations where (i) training data is scarce or (ii) the bag-ofwords representation is too sparse to build stable models when using the linear kernel
Learning Deep Visual Object Models From Noisy Web Data: How to Make it Work
Deep networks thrive when trained on large scale data collections. This has
given ImageNet a central role in the development of deep architectures for
visual object classification. However, ImageNet was created during a specific
period in time, and as such it is prone to aging, as well as dataset bias
issues. Moving beyond fixed training datasets will lead to more robust visual
systems, especially when deployed on robots in new environments which must
train on the objects they encounter there. To make this possible, it is
important to break free from the need for manual annotators. Recent work has
begun to investigate how to use the massive amount of images available on the
Web in place of manual image annotations. We contribute to this research thread
with two findings: (1) a study correlating a given level of noisily labels to
the expected drop in accuracy, for two deep architectures, on two different
types of noise, that clearly identifies GoogLeNet as a suitable architecture
for learning from Web data; (2) a recipe for the creation of Web datasets with
minimal noise and maximum visual variability, based on a visual and natural
language processing concept expansion strategy. By combining these two results,
we obtain a method for learning powerful deep object models automatically from
the Web. We confirm the effectiveness of our approach through object
categorization experiments using our Web-derived version of ImageNet on a
popular robot vision benchmark database, and on a lifelong object discovery
task on a mobile robot.Comment: 8 pages, 7 figures, 3 table
CREATING A BIOMEDICAL ONTOLOGY INDEXED SEARCH ENGINE TO IMPROVE THE SEMANTIC RELEVANCE OF RETREIVED MEDICAL TEXT
Medical Subject Headings (MeSH) is a controlled vocabulary used by the National Library of Medicine to index medical articles, abstracts, and journals contained within the MEDLINE database. Although MeSH imposes uniformity and consistency in the indexing process, it has been proven that using MeSH indices only result in a small increase in precision over free-text indexing. Moreover, studies have shown that the use of controlled vocabularies in the indexing process is not an effective method to increase semantic relevance in information retrieval. To address the need for semantic relevance, we present an ontology-based information retrieval system for the MEDLINE collection that result in a 37.5% increase in precision when compared to free-text indexing systems. The presented system focuses on the ontology to: provide an alternative to text-representation for medical articles, finding relationships among co-occurring terms in abstracts, and to index terms that appear in text as well as discovered relationships. The presented system is then compared to existing MeSH and Free-Text information retrieval systems. This dissertation provides a proof-of-concept for an online retrieval system capable of providing increased semantic relevance when searching through medical abstracts in MEDLINE
- …