12,839 research outputs found

    The contribution of data mining to information science

    The information explosion is a serious challenge for current information institutions. Data mining, the search for valuable information in large volumes of data, is one way to meet this challenge. In the past several years, data mining has made a significant contribution to the field of information science. This paper examines the impact of data mining by reviewing existing applications, including personalized environments, electronic commerce, and search engines. For each of these three types of application, we discuss how data mining can enhance its functions. The paper is intended to give the reader an overview of the state-of-the-art research associated with these applications. Furthermore, we identify the limitations of current work and raise several directions for future research.

    Semantic Categorization Of Online Video

    As the number of internet users grows day by day, so does the number of users of video-sharing sites. Video sharing is becoming more and more popular in e-learning, but popular websites such as YouTube are not well structured for providing educational videos to preschool and high school students. There is a need for a more educationally focused video site whose content is more structured and easy to use, that supports both direct search and browsing, and that follows a particular curriculum for preschool and high school students. This report discusses issues such as the categorization and search interface of these sites and proposes alternatives to existing ones. In this project, I have built an educational website for preschool, high school, and college-level students, concentrating on improved categorization and an improved search interface. This report provides a detailed description of my system and the results of a comparison between my site and YouTube.

    Discovery Is Never By Chance: Designing for (Un)Serendipity

    Serendipity has a long tradition in the history of science as having played a key role in many significant discoveries. Computer scientists, valuing the role of serendipity in discovery, have attempted to design systems that encourage serendipity. However, that research has focused primarily on only one aspect of serendipity: that of chance encounters. In reality, for serendipity to be valuable, chance encounters must be synthesized into insight. In this paper we show, through a formal consideration of serendipity and analysis of how various systems have seized on attributes of interpreting serendipity, that there is a richer space for design to support serendipitous creativity, innovation and discovery than has been tapped to date. We discuss how ideas might be encoded to be shared or discovered by ‘association-hunting’ agents. We propose considering not only the inventor’s role in perceiving serendipity, but also how that inventor’s perception may be enhanced to increase the opportunity for serendipity. We explore the role of environment and how we can better enable serendipitous discoveries to find a home more readily and immediately.

    Hybrid Query Expansion on Ontology Graph in Biomedical Information Retrieval

    Nowadays, biomedical researchers publish thousands of papers and journals every day. Searching through biomedical literature to keep up with the state of the art is a task of increasing difficulty for many individual researchers. The continuously increasing amount of biomedical text data has resulted in high demand for an efficient and effective biomedical information retrieval (BIR) system. Though many existing information retrieval techniques can be directly applied in BIR, BIR distinguishes itself by its extensive use of biomedical terms and abbreviations, which are highly ambiguous. First of all, we studied a fundamental yet simpler problem: word semantic similarity. We proposed a novel semantic word similarity algorithm and related tools called Weighted Edge Similarity Tools (WEST). WEST was motivated by our discovery that humans are more sensitive to semantic differences due to categorization than to those due to generalization/specification. Unlike most existing methods, which model the semantic similarity of words based on either the depth of their Lowest Common Ancestor (LCA) or the traversal distance between the word pair in WordNet, WEST considers the joint contribution of the weighted distance between two words and the weighted depth of their LCA in WordNet. Experiments show that the weighted-edge-based word similarity method achieves 83.5% accuracy with respect to human judgments. The query expansion problem can be viewed as selecting the top k words that have the maximum accumulated similarity to a given word set. It has proved to be an effective method in BIR and has been studied for over two decades. However, most previous research focuses on only one controlled vocabulary: MeSH. In addition, early studies found that applying an ontology won't necessarily improve search performance.
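The abstract does not give WEST's weighting scheme, so it is not reproduced here. As a point of reference only, the sketch below computes the two unweighted ingredients the abstract names, the depth of the LCA and the traversal distance between a word pair, and combines them with the well-known Wu-Palmer measure rather than WEST. The taxonomy and all function names are hypothetical; a real system would query WordNet instead.

```python
# Toy is-a taxonomy (hypothetical, NOT WordNet): child -> parent.
PARENT = {
    "entity": None,
    "animal": "entity",
    "plant": "entity",
    "dog": "animal",
    "cat": "animal",
    "oak": "plant",
}


def ancestors(node):
    """Return the path from node up to the root, node first."""
    path = []
    while node is not None:
        path.append(node)
        node = PARENT[node]
    return path


def depth(node):
    """Depth counted in nodes, so the root has depth 1."""
    return len(ancestors(node))


def lowest_common_ancestor(a, b):
    """First ancestor of a (walking upward) that is also an ancestor of b."""
    seen = set(ancestors(b))
    for node in ancestors(a):
        if node in seen:
            return node
    raise ValueError("nodes do not share a root")


def path_length(a, b):
    """Number of edges on the path a -> LCA -> b (unweighted)."""
    lca = lowest_common_ancestor(a, b)
    return (depth(a) - depth(lca)) + (depth(b) - depth(lca))


def wu_palmer(a, b):
    """Wu-Palmer similarity: 2 * depth(LCA) / (depth(a) + depth(b))."""
    lca = lowest_common_ancestor(a, b)
    return 2.0 * depth(lca) / (depth(a) + depth(b))


if __name__ == "__main__":
    for pair in [("dog", "cat"), ("dog", "oak")]:
        print(pair, path_length(*pair), round(wu_palmer(*pair), 3))
```

A weighted variant in the spirit of the abstract would replace the unit edge cost in `path_length` and the unit depth increments with per-edge weights; how WEST chooses those weights is not stated in this text.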
    In this dissertation, we propose a novel graph-based query expansion approach that takes advantage of global information from multiple controlled vocabularies by building a biomedical ontology graph from selected vocabularies in the Metathesaurus. We apply the Personalized PageRank algorithm to the ontology graph to rank and identify top terms that are highly relevant to the original user query yet not present in that query. Those new terms are reordered by a weighting scheme that prioritizes specialized concepts. We multiply the final selected terms by a scaling factor to prevent query drift and append them to the original query in the search. Experiments show that our approach achieves a 17.7% improvement in 11-point average precision and recall over Lucene's default indexing and searching strategy, and a 24.8% improvement on average over all the other strategies. Furthermore, we observe that expanding with specialized concepts rather than generalized concepts can substantially improve recall-precision performance. In addition, we have successfully carried WEST over from the underlying WordNet graph to the biomedical ontology graph constructed from multiple controlled vocabularies in the Metathesaurus. Experiments indicate that WEST further improves recall-precision performance. Finally, we have developed a Graph-based Biomedical Search Engine (G-Bean) for retrieving and visualizing information from the literature using our proposed query expansion algorithm. G-Bean accepts any medical-related user query and expands it to search the MEDLINE database.
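The expansion pipeline described above (rank graph terms with Personalized PageRank seeded at the query, keep the top terms not already in the query, down-weight them, append them) can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the dissertation's implementation: the graph, the term names, the damping factor of 0.85, the scale of 0.5, and every function name are hypothetical, and the reordering step that prioritizes specialized concepts is omitted.

```python
from collections import defaultdict

# Hypothetical toy ontology graph (undirected edges between terms).
EDGES = [
    ("heart attack", "myocardial infarction"),
    ("myocardial infarction", "coronary thrombosis"),
    ("myocardial infarction", "heart disease"),
    ("heart disease", "cardiovascular disease"),
    ("cardiovascular disease", "disease"),
    ("disease", "influenza"),
]


def build_graph(edges):
    """Adjacency sets for an undirected graph."""
    graph = defaultdict(set)
    for u, v in edges:
        graph[u].add(v)
        graph[v].add(u)
    return graph


def personalized_pagerank(graph, seeds, damping=0.85, iterations=100):
    """Power iteration where the random walk restarts only at the seed terms.

    Assumes every seed is a node of the graph and every node has a neighbor.
    """
    nodes = list(graph)
    restart = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    rank = dict(restart)
    for _ in range(iterations):
        new_rank = {n: (1.0 - damping) * restart[n] for n in nodes}
        for n in nodes:
            share = damping * rank[n] / len(graph[n])
            for neighbor in graph[n]:
                new_rank[neighbor] += share
        rank = new_rank
    return rank


def expand_query(graph, query_terms, k=2, scale=0.5):
    """Return {term: weight}: original terms at 1.0, top-k new terms at `scale`."""
    rank = personalized_pagerank(graph, set(query_terms))
    candidates = [(t, s) for t, s in rank.items() if t not in query_terms]
    candidates.sort(key=lambda pair: pair[1], reverse=True)
    weighted = {t: 1.0 for t in query_terms}
    for term, _score in candidates[:k]:
        weighted[term] = scale  # down-weighted to limit query drift
    return weighted


if __name__ == "__main__":
    print(expand_query(build_graph(EDGES), ["heart attack"]))
```

Restarting the walk only at the query terms is what makes the ranking query-specific: terms far from the seeds in the graph, such as `influenza` here, receive little probability mass and are not selected.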

    An Integrated Information Retrieval Framework for Managing the Digital Web Ecosystem

    The information explosion makes it challenging to explore the digital Web ecosystem with web search tools and retrieve relevant information and knowledge. The existing tools are not integrated, and search results are not well managed. In this article, we describe effective information retrieval services for users and agents in various digital ecosystem scenarios. A novel integrated information retrieval framework (IIRF) is proposed, which employs Web search technologies and traditional database searching techniques to provide comprehensive, dynamic, personalized, and organization-oriented information retrieval services, ranging from the Internet and intranets to the personal desktop. Experiments demonstrate improvements in the search process: the average precision of Web search results at the standard 11 recall levels rises from 41.7% for a comparable system to 65.2% with the framework, a precision improvement of 23.5 percentage points. A comparison among search engines shows a similar trend, with satisfactory search results.
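For readers unfamiliar with the metric, average precision at the standard 11 recall levels is usually computed by interpolation: at each recall level 0.0, 0.1, ..., 1.0, take the highest precision observed at that recall or beyond, then average the 11 values. The sketch below shows that standard definition on a made-up ranking; it is not the article's evaluation code, and the function name and input format are assumptions.

```python
def eleven_point_average_precision(ranked_relevance, total_relevant):
    """11-point interpolated average precision for one ranked result list.

    ranked_relevance: list of booleans, best-ranked result first.
    total_relevant:   number of relevant documents in the whole collection.
    """
    if total_relevant <= 0:
        raise ValueError("total_relevant must be positive")

    # (recall, precision) after each rank position.
    points = []
    hits = 0
    for rank, is_relevant in enumerate(ranked_relevance, start=1):
        if is_relevant:
            hits += 1
        points.append((hits / total_relevant, hits / rank))

    # Interpolated precision at level r = max precision at any recall >= r.
    levels = [i / 10 for i in range(11)]
    interpolated = []
    for level in levels:
        candidates = [p for r, p in points if r >= level]
        interpolated.append(max(candidates, default=0.0))
    return sum(interpolated) / len(levels)


if __name__ == "__main__":
    # Hypothetical ranking: relevant results at ranks 1 and 3, two relevant in total.
    print(eleven_point_average_precision([True, False, True, False], 2))
```

On this reading, the reported figures compare two such averages, 41.7% and 65.2%, whose difference is 23.5 percentage points.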

    Precise Image Exploration With Cluster Analysis

    With the rise of digital multimedia, the number of images a user must sort through to find one that closely matches their needs and preferences has become more and more unmanageable. Even when searching for a narrow topic, it can be nearly impossible to find an image that meets a specific preference by going through all the possible images. To combat this growing problem, we describe an exploration system built on deep neural networks that empowers users to sort through all the possible images by quickly narrowing them down to their preferred images. By design, our exploration system avoids the need to match the user’s query directly to a small group of images, so it can serve users images that would traditionally be too difficult to group together and match to a query. We propose to use deep metric learning and clustering to group the images, which, as we will show, handles two problems that hold back traditional neural networks in this setting: unseen image groups and shifting definitions.
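The grouping step can be illustrated once embeddings exist. The sketch below assumes the deep metric learning model has already mapped each image to a vector in which similar images lie close together; that model is not shown, and the 2-D vectors here are hand-written stand-ins. The abstract does not name a clustering algorithm, so plain k-means is used as an assumption, and all image and function names are hypothetical.

```python
# Hypothetical embeddings, standing in for the output of a learned metric model.
EMBEDDINGS = {
    "beach_1": (0.90, 0.10),
    "forest_1": (0.10, 0.90),
    "beach_2": (0.80, 0.20),
    "forest_2": (0.20, 0.80),
    "forest_3": (0.15, 0.95),
}


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return tuple(sum(v[i] for v in vectors) / n for i in range(len(vectors[0])))


def nearest(vector, centroids):
    """Index of the centroid closest to vector."""
    return min(range(len(centroids)), key=lambda c: squared_distance(vector, centroids[c]))


def kmeans(vectors, k, iterations=20):
    """Plain k-means; the first k vectors serve as deterministic initial centroids."""
    centroids = [tuple(v) for v in vectors[:k]]
    assignment = [0] * len(vectors)
    for _ in range(iterations):
        assignment = [nearest(v, centroids) for v in vectors]
        for c in range(k):
            members = [v for v, a in zip(vectors, assignment) if a == c]
            if members:  # keep the old centroid if a cluster empties
                centroids[c] = mean(members)
    return assignment, centroids


def images_like(example_vector, names, assignment, centroids):
    """Narrow down: return the images in the cluster nearest to an example the user liked."""
    cluster = nearest(example_vector, centroids)
    return [name for name, a in zip(names, assignment) if a == cluster]


if __name__ == "__main__":
    names = list(EMBEDDINGS)
    assignment, centroids = kmeans([EMBEDDINGS[n] for n in names], k=2)
    print(images_like((0.85, 0.15), names, assignment, centroids))
```

Because grouping happens in embedding space rather than through fixed class labels, an image from a group never seen in training can still be embedded and assigned to, or form, a cluster.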