Spoken content retrieval: A survey of techniques and technologies
Speech media, that is, digital audio and video containing spoken content, has blossomed in recent years. Large collections are accruing on the Internet as well as in private and enterprise settings. This growth has motivated extensive research on techniques and technologies that facilitate reliable indexing and retrieval. Spoken content retrieval (SCR) requires the combination of audio and speech processing technologies with methods from information retrieval (IR). SCR research initially investigated planned speech structured in document-like units, but has subsequently shifted focus to more informal spoken content produced spontaneously, outside of the studio and in conversational settings. This survey provides an overview of the field of SCR encompassing component technologies, the relationship of SCR to text IR and automatic speech recognition, and user interaction issues. It is aimed at researchers with backgrounds in speech technology or IR who are seeking deeper insight into how these fields are integrated to support research and development, thus addressing the core challenges of SCR.
A Hybrid Content-Based Retrieval Approach For Video Data
Increasing use of multimedia data makes it crucial to develop intelligent search mechanisms for retrieving multimedia data by content. Traditional text-based methods clearly do not suffice to describe the rich content of images, voice or video. Digital video requires the incorporation of temporal information for any effective content-based retrieval scheme. We present a novel technique which integrates object motion and temporal relationship information in order to characterize the events for subsequent search for similar clips. We propose a hybrid mechanism based on object motion trails similarity match and interval-based temporal modeling that leads to a unique framework for spatio-temporal content-based access in digital video. We implemented the proposed methods and demonstrated that high-level query formulation can be achieved for the aforementioned purpose. Development of such technology will enable true multimedia search engines that will accomplish what current Internet search engines like Infoseek or Excite do today for textual data.
Constructing an Interaction Behavior Model for Web Image Search
User interaction behavior is a valuable source of implicit relevance
feedback. In Web image search a different type of search result presentation is
used than in general Web search, which leads to different interaction
mechanisms and user behavior. For example, image search results are
self-contained, so that users do not need to click the results to view the
landing page as in general Web search, which generates sparse click data. Also,
two-dimensional result placement instead of a linear result list makes browsing
behaviors more complex. Thus, it is hard to apply standard user behavior models
(e.g., click models) developed for general Web search to Web image search.
In this paper, we conduct a comprehensive image search user behavior analysis
using data from a lab-based user study as well as data from a commercial search
log. We then propose a novel interaction behavior model, called grid-based user
browsing model (GUBM), whose design is motivated by observations from our data
analysis. GUBM can both capture users' interaction behavior, including cursor
hovering, and alleviate position bias. The advantages of GUBM are two-fold: (1)
It is based on an unsupervised learning method and does not need manually
annotated data for training. (2) It is based on user interaction features on
search engine result pages (SERPs) and is easily transferable to other
scenarios that have a grid-based interface such as video search engines. We
conduct extensive experiments to test the performance of our model using a
large-scale commercial image search log. Experimental results show that in
terms of behavior prediction (perplexity), and topical relevance and image
quality (normalized discounted cumulative gain (NDCG)), GUBM outperforms
state-of-the-art baseline models as well as the original ranking. We make the
implementation of GUBM and related datasets publicly available for future
studies.
Feature Extraction and Duplicate Detection for Text Mining: A Survey
Text mining, also known as Intelligent Text Analysis, is an important research area. It is very difficult to focus on the most appropriate information due to the high dimensionality of data. Feature extraction is one of the important techniques in data reduction for discovering the most important features. Processing massive amounts of data stored in an unstructured form is a challenging task. Several pre-processing methods and algorithms are needed to extract useful features from huge amounts of data. The survey covers different text summarization, classification, and clustering methods for discovering useful features, and also covers the discovery of query facets, which are multiple groups of words or phrases that explain and summarize the content covered by a query, thereby reducing the time taken by the user. When dealing with a collection of text documents, it is also very important to filter out duplicate data. Once duplicates are deleted, it is recommended to replace the removed duplicates. Hence we also review the literature on duplicate detection and data fusion (removing and replacing duplicates). The survey presents existing text mining techniques for extracting relevant features, detecting duplicates, and replacing the duplicate data, so as to deliver fine-grained knowledge to the user.