
    Consistent Text Categorization using Data Augmentation in e-Commerce

    The categorization of massive e-Commerce data is a crucial, well-studied task, which is prevalent in industrial settings. In this work, we aim to improve an existing product categorization model that is already in use by a major web company, serving multiple applications. At its core, the product categorization model is a text classification model that takes a product title as an input and outputs the most suitable category out of thousands of available candidates. Upon closer inspection, we found inconsistencies in the labeling of similar items. For example, minor modifications of the product title pertaining to colors or measurements substantially changed the model's output. This phenomenon can negatively affect downstream recommendation or search applications, leading to a sub-optimal user experience. To address this issue, we propose a new framework for consistent text categorization. Our goal is to improve the model's consistency while maintaining its production-level performance. We use a semi-supervised approach for data augmentation and present two different methods for utilizing unlabeled samples. One method relies directly on existing catalogs, while the other uses a generative model. We compare the pros and cons of each approach and present our experimental results.

    Semantic Categorization Of Online Video

    As the number of internet users increases day by day, so does the number of users of video-sharing sites. Video sharing is becoming more and more popular in e-learning, but current well-known websites like YouTube are not structured for the purpose of providing educational videos to preschool and high school students. There is a need to build a more educationally focused video site whose content is more structured and easy to use, that supports both direct search and browsing, and that follows a particular curriculum for preschool and high school students. This report discusses issues such as the categorization and search interface of these sites and proposes alternatives to the existing ones. In this project, I have built an educational website for preschool, high school, and college level students, concentrating on improved categorization and an improved search interface. This report provides a detailed description of my system and the results of a comparison between my site and YouTube.

    Topic Detection and Tracking in Personal Search History

    This thesis describes a system for tracking and detecting topics in personal search history. In particular, we developed a time tracking tool that helps users analyze their time and discover their activity patterns. The system allows a user to specify topics of interest to monitor with a keyword description. The system then keeps track of the log and the time spent on each document and produces a time graph showing how much time has been spent on each monitored topic. The system can also detect new topics and potentially recommend relevant information about them to the user. This work has been integrated with the UCAIR Toolbar, a client-side agent. Considering the limited resources on the client side, we designed an efficient incremental algorithm for topic tracking and detection. Various unsupervised learning approaches have been considered to improve the accuracy of categorizing the user log into appropriate categories. Experiments show that our tool is effective in categorizing documents into existing categories and detecting new useful categories. Moreover, the quality of categorization improves over time as more log data becomes available.

    ImageSieve: Exploratory search of museum archives with named entity-based faceted browsing

    Over the last few years, faceted search has emerged as an attractive alternative to the traditional "text box" search and has become one of the standard ways of interaction on many e-commerce sites. However, these applications of faceted search are limited to domains where the objects of interest have already been classified along several independent dimensions, such as price, year, or brand. While automatic approaches to generating faceted search interfaces have been proposed, it is not yet clear to what extent the automatically produced interfaces will be useful to real users, and whether their quality can match or surpass that of their manually produced predecessors. The goal of this paper is to introduce an exploratory search interface called ImageSieve, which shares many features with traditional faceted browsing, but can function without the use of traditional faceted metadata. ImageSieve uses automatically extracted and classified named entities, which play important roles in many domains (such as news collections, image archives, etc.). We describe one specific application of ImageSieve for image search. Here, named entities extracted from the descriptions of the retrieved images are used to organize a faceted browsing interface, which then helps users to make sense of and further explore the retrieved images. The results of a user study of ImageSieve demonstrate that a faceted search system based on named entities can help users explore large collections and find relevant information more effectively.

    Brain Categorization: Learning, Attention, and Consciousness

    How do humans and animals learn to recognize objects and events? Two classical views are that exemplars or prototypes are learned. A hybrid view is that a mixture, called rule-plus-exceptions, is learned. None of these models learn their categories. A distributed ARTMAP neural network with self-supervised learning incrementally learns categories that match human learning data on a class of thirty diagnostic experiments called the 5-4 category structure. Key predictions of ART models have received behavioral, neurophysiological, and anatomical support. The ART prediction about what goes wrong during amnesic learning has also been supported: A lesion in its orienting system causes a low vigilance parameter. Funding: Air Force Office of Scientific Research (F49620-01-1-0397, F49620-01-1-0423); Defense Advanced Research Projects Agency and the Office of Naval Research (N00014-01-1-0624); the National Geospatial Intelligence Agency (NMA 201-01-1-2016); National Science Foundation (EIA-01-30851, IIS-97-20333, SBE-0354378); Office of Naval Research (N00014-95-1-0657, N00014-01-1-0624)

    Towards the Automatic Classification of Documents in User-generated Classifications

    There is a huge amount of information scattered across the World Wide Web. As information flows at high speed on the WWW, there is a need to organize it properly so that a user can access it easily. Previously, the organization of information was generally done manually, by matching the document contents to some pre-defined categories. There are two approaches to this text-based categorization: manual and automatic. In the manual approach, a human expert performs the classification task; in the automatic approach, supervised classifiers are used to classify resources. In supervised classification, manual interaction is required to create training data before the automatic classification task takes place. In our new approach, we propose the automatic classification of documents through semantic keywords and the generation of formulas from these keywords. Thus we can reduce this human participation by combining the knowledge of a given classification with the knowledge extracted from the data. The main focus of this PhD thesis, supervised by Prof. Fausto Giunchiglia, is the automatic classification of documents into user-generated classifications. The key benefits foreseen from this automatic document classification relate not only to search engines but also to many other fields, such as document organization, text filtering, and semantic index management.