Video browsing interfaces and applications: a review
We present a comprehensive review of the state of the art in video browsing and retrieval systems, with special emphasis on interfaces and applications. There has been a significant increase in activity (e.g., storage, retrieval, and sharing) employing video data in the past decade, both for personal and professional use. The ever-growing amount of video content available for human consumption and the inherent characteristics of video data, which, if presented in its raw format, is rather unwieldy and costly, have become driving forces for the development of more effective solutions to present video contents and allow rich user interaction. As a result, there are many contemporary research efforts toward developing better video browsing solutions, which we summarize. We review more than 40 different video browsing and retrieval interfaces and classify them into three groups: applications that use video-player-like interaction, video retrieval applications, and browsing solutions based on video surrogates. For each category, we present a summary of existing work, highlight the technical aspects of each solution, and compare them against each other.
How to improve robustness in Kohonen maps and display additional information in Factorial Analysis: application to text mining
This article is an extended version of a paper presented at the WSOM'2012
conference [1]. We present a combination of factorial projections, the SOM
algorithm and graph techniques applied to a text mining problem. The corpus
contains 8 medieval manuscripts which were used to teach arithmetic techniques
to merchants. Among the techniques for Data Analysis, those used for
Lexicometry (such as Factorial Analysis) highlight the discrepancies between
manuscripts. The reason for this is that they focus on the deviation from the
independence between words and manuscripts. Still, we also want to discover and
characterize the common vocabulary among the whole corpus. Using the properties
of stochastic Kohonen maps, which define neighborhood between inputs in a
non-deterministic way, we highlight the words which seem to play a special role
in the vocabulary. We call them fickle and use them to improve both Kohonen map
robustness and significance of FCA visualization. Finally, we use graph
algorithms to exploit this fickleness for classification of words.
Concept discovery innovations in law enforcement: a perspective.
In the past decades, the amount of information available to law enforcement agencies has increased significantly. Most of this information is in textual form; however, analyses have mainly focused on structured data. In this paper, we give an overview of the concept discovery projects at the Amsterdam-Amstelland police, where Formal Concept Analysis (FCA) is being used as a text mining instrument. FCA is combined with statistical techniques such as Hidden Markov Models (HMM) and Emergent Self Organizing Maps (ESOM). The combination of this concept discovery and refinement technique with statistical techniques for analyzing high-dimensional data not only resulted in new insights but often in actual improvements of the investigation procedures.
Keywords: Formal concept analysis; Intelligence led policing; Knowledge discovery
From Keyword Search to Exploration: How Result Visualization Aids Discovery on the Web
A key to the Web's success is the power of search. The elegant way in which search results are returned is usually remarkably effective. However, for exploratory search in which users need to learn, discover, and understand novel or complex topics, there is substantial room for improvement. Human-computer interaction researchers and web browser designers have developed novel strategies to improve Web search by enabling users to conveniently visualize, manipulate, and organize their Web search results. This monograph offers fresh ways to think about search-related cognitive processes and describes innovative design approaches to browsers and related tools. For instance, while keyword search presents users with results for specific information (e.g., what is the capital of Peru), other methods may let users see and explore the contexts of their requests for information (related or previous work, conflicting information), or the properties that associate groups of information assets (group legal decisions by lead attorney). We also consider both the traditional and novel ways in which these strategies have been evaluated. From our review of cognitive processes, browser design, and evaluations, we reflect on the future opportunities and new paradigms for exploring and interacting with Web search results.
Text mining with the WEBSOM
The emerging field of text mining applies methods from data mining and exploratory data analysis to analyzing text collections and to conveying information to the user in an intuitive manner. Visual, map-like displays provide a powerful and fast medium for portraying information about large collections of text. Relationships between text items and collections, such as similarity, clusters, gaps and outliers can be communicated naturally using spatial relationships, shading, and colors.
In the WEBSOM method the self-organizing map (SOM) algorithm is used to automatically organize very large and high-dimensional collections of text documents onto two-dimensional map displays. The map forms a document landscape where similar documents appear close to each other at points of the regular map grid. The landscape can be labeled with automatically identified descriptive words that convey properties of each area and also act as landmarks during exploration. With the help of an HTML-based interactive tool the ordered landscape can be used in browsing the document collection and in performing searches on the map.
An organized map offers an overview of an unknown document collection, helping the user familiarize herself with the domain. Map displays that are already familiar can be used as visual frames of reference for conveying properties of unknown text items. Static, thematically arranged document landscapes provide meaningful backgrounds for dynamic visualizations of, for example, time-related properties of the data. Search results can be visualized in the context of related documents.
Experiments on document collections of various sizes, text types, and languages show that the WEBSOM method is scalable and generally applicable. Preliminary results in a text retrieval experiment indicate that even when the additional value provided by the visualization is disregarded, the document maps perform at least comparably with more conventional retrieval methods.
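The idea of organizing documents onto a two-dimensional map grid can be illustrated with a minimal sketch of the basic SOM update rule. This is not the WEBSOM implementation: the four toy document vectors, the 3x3 grid, the linear decay schedules, and the function names `train_som` and `best_matching_unit` are all assumptions chosen for illustration.

```python
import math
import random


def best_matching_unit(weights, vector):
    """Return the grid coordinate whose weight vector is closest to `vector`."""
    return min(
        weights,
        key=lambda node: sum((w - x) ** 2 for w, x in zip(weights[node], vector)),
    )


def train_som(vectors, rows, cols, epochs=40, lr0=0.5, seed=0):
    """Train a tiny SOM; returns a dict mapping (row, col) -> weight vector."""
    rng = random.Random(seed)
    dim = len(vectors[0])
    sigma0 = max(rows, cols) / 2.0  # initial neighbourhood radius (assumed)
    weights = {
        (r, c): [rng.random() for _ in range(dim)]
        for r in range(rows)
        for c in range(cols)
    }
    total = epochs * len(vectors)
    step = 0
    for _ in range(epochs):
        order = list(vectors)
        rng.shuffle(order)
        for vector in order:
            frac = step / total
            lr = lr0 * (1.0 - frac)                  # learning rate decays linearly
            sigma = max(sigma0 * (1.0 - frac), 0.5)  # radius shrinks, floored at 0.5
            bmu = best_matching_unit(weights, vector)
            for node, w in weights.items():
                grid_d2 = (node[0] - bmu[0]) ** 2 + (node[1] - bmu[1]) ** 2
                h = math.exp(-grid_d2 / (2.0 * sigma * sigma))
                # Pull each unit toward the input, weighted by grid proximity to the BMU.
                for i in range(dim):
                    w[i] += lr * h * (vector[i] - w[i])
            step += 1
    return weights


# Hypothetical term-weight vectors for four documents (two topics).
docs = {
    "som_paper": [1.0, 1.0, 0.0, 0.0],
    "som_thesis": [1.0, 0.5, 0.0, 0.0],
    "astro_survey": [0.0, 0.0, 1.0, 1.0],
    "astro_catalog": [0.0, 0.0, 0.5, 1.0],
}
som = train_som(list(docs.values()), rows=3, cols=3)
positions = {name: best_matching_unit(som, vec) for name, vec in docs.items()}
print(positions)
```

Each document ends up at the grid cell of its best-matching unit. On a trained map, documents with similar vectors tend to land on the same or neighbouring cells, which is the property the document landscape relies on.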
Towards improving WEBSOM with multi-word expressions
Dissertation submitted for the degree of Master in Computer Engineering.
Large quantities of free-text documents are usually rich in information and cover several topics. However, since their dimension is very large, searching and filtering data is an exhaustive task. A large text collection covers a set of topics where each topic is associated with a group of documents. This thesis presents a method for building a document map about the core contents covered in the collection.
WEBSOM is an approach that combines document encoding methods and Self-Organising Maps (SOM) to generate a document map. However, this methodology has a weakness in the document encoding method because it uses single words to characterise documents.
Single words tend to be ambiguous and semantically vague, so some documents can be incorrectly related. This thesis proposes a new document encoding method to improve the WEBSOM approach by using multi-word expressions (MWEs) to describe documents. Previous research and ongoing experiments encourage us to use MWEs to characterise documents because these are semantically more accurate than single words and more descriptive.
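To make the contrast between single-word and MWE-based encoding concrete, here is a minimal sketch. It assumes a hand-picked MWE list and greedy longest-match tokenization; the abstract does not say how the thesis extracts or weights MWEs, so the function name `encode_with_mwes` and the example expressions are hypothetical.

```python
from collections import Counter


def encode_with_mwes(text, mwes):
    """Count features in `text`, treating any known MWE as a single feature."""
    tokens = text.lower().split()
    max_len = max((len(m) for m in mwes), default=1)
    features = []
    i = 0
    while i < len(tokens):
        # Try the longest candidate expression first (greedy longest match).
        for n in range(max_len, 1, -1):
            candidate = tuple(tokens[i:i + n])
            if len(candidate) == n and candidate in mwes:
                features.append(" ".join(candidate))
                i += n
                break
        else:
            # No MWE starts here, so fall back to the single word.
            features.append(tokens[i])
            i += 1
    return Counter(features)


# Hypothetical MWE inventory, stored as token tuples.
known_mwes = {("neural", "network"), ("self", "organising", "map")}
counts = encode_with_mwes("A self organising map is a neural network", known_mwes)
print(counts)
```

With this encoding, "neural network" becomes one feature instead of contributing to the separate, more ambiguous features "neural" and "network".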
Search and Discovery Tools for Astronomical On-line Resources and Services
A growing number of astronomical resources and data or information services
are made available through the Internet. However, valuable information is
frequently hidden in a deluge of irrelevant or out-of-date documents. At
a first level, compilations of astronomical resources provide help for
selecting relevant sites. Combining yellow-page services and meta-databases of
active pointers may be an efficient solution to the data retrieval problem.
Responses generated by submission of queries to a set of heterogeneous
resources are difficult to merge or cross-match, because different data
providers generally use different data formats: new endeavors are under way to
tackle this problem. We review the technical challenges involved in trying to
provide general search and discovery tools, and to integrate them through upper
level interfaces.
Comment: 7 pages, 2 Postscript figures; to be published in A&A