364,922 research outputs found

    Local search engine with global content based on domain specific knowledge

    In the growing need for information, we have come to rely on search engines. The use of large-scale search engines, such as Google, is as common as surfing the World Wide Web. We are impressed with the capabilities of these search engines, but there is still a need for improvement. A common problem with searching is the ambiguity of words. Their meaning often depends on the context in which they are used or varies across specific domains. To resolve this, we propose a domain-specific search engine that is globally oriented. We intend to provide content classification according to the target domain concepts, access to privileged information, personalization and custom ranking functions. Domain-specific concepts have been formalized in the form of an ontology. The paper describes our approach to a centralized search service for domain-specific content. The approach uses automated indexing for various content sources that can be found in the form of a relational database, web service, web portal or page, various document formats and other structured or unstructured data. The gathered data is tagged with various approaches and classified against the domain classification. The indexed data is accessible through a highly optimized and personalized search service.
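As a rough illustration of classifying content against domain concepts to resolve word ambiguity, the sketch below tags a text with the concepts whose vocabulary it overlaps. The `ONTOLOGY` dictionary, its terms, and the function names `tag_document` and `classify` are hypothetical and are not taken from the paper, which does not specify its tagging method.

```python
import re

# Hypothetical toy ontology: each domain concept maps to a set of indicator terms.
# Note that "bank" is ambiguous and belongs to both concepts.
ONTOLOGY = {
    "finance": {"bank", "loan", "interest"},
    "geography": {"river", "bank", "shore"},
}

def tag_document(text, ontology=ONTOLOGY):
    """Return {concept: number of matching terms} for concepts found in the text."""
    tokens = set(re.findall(r"[a-z]+", text.lower()))
    scores = {concept: len(tokens & terms) for concept, terms in ontology.items()}
    return {concept: score for concept, score in scores.items() if score > 0}

def classify(text, ontology=ONTOLOGY):
    """Return the best-matching concept, or None if no concept term occurs."""
    tags = tag_document(text, ontology)
    if not tags:
        return None
    return max(tags, key=tags.get)

print(classify("The bank raised the loan interest rate"))
print(classify("They walked along the river bank to the shore"))
```

In this toy setting the surrounding terms decide which sense of "bank" is meant; a real system would presumably draw its concepts and terms from the formal ontology the abstract mentions.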

    Subjectivity Detection through Socio-Linguistic Features

    Social media platforms have opened new dimensions within the information retrieval domain, leading to a novel concept known as Social Information Retrieval. We argue that the concept of Social Information Retrieval can be extended by augmenting the huge amount of content on the traditional Web with the ever-growing, rich Social Web content to increase the information richness of today’s search engines. This paper proposes a subjectivity detection framework that can lead towards a proposed emotion-aware search engine interface. Our proposed method differs from previous subjectivity analysis approaches in that it is the first method that takes into account social features of social media platforms for the subjectivity classification task. Through experimental evaluations, we observe the accuracy of the proposed method to be 86.21%, which demonstrates a promising outcome for large-scale application of our proposed subjectivity analysis technique.

    Untangling the Application of Text-mining Methods in Information Systems Domain

    The advent of digitalization has brought a massive proliferation of unstructured data, producing vast repositories of textual data from various sources, such as Web sites, academic publications, news articles, blog posts, e-mail, corporate communication platforms, reports, and social media feeds. This proliferation, coupled with the upsurge in mobile and Web technologies alongside ever-improving connectivity, has led to various digital platforms and applications rapidly achieving mass-market penetration. With the production of textual and other forms of unstructured data certain to continue at unprecedented rates for the foreseeable future, this availability on a massive scale presents both opportunities and challenges that researchers and practitioners must address. The ability to utilize text data on a large scale not only provides better coverage in terms of sample size but also opens opportunities to build a deeper understanding of phenomena that otherwise are simply unobservable, "hidden in the noise." However, as the world races towards high-volume production, distribution, and consumption of digital text, information systems (IS) researchers are proving slow to start reaping the potential of analyzing textual data. There is an urgent need for methods and techniques that can meet the challenge of analyzing vast bodies of textual data. In an effort to demonstrate potential applications of text-mining methods in information systems research, the dissertation presents essays that address the use of large-scale text-based datasets in literature analysis and studies of system-specific behavioral outcomes. The first essay deals with identifying the research themes presented in a large body of publications on cloud computing, and the second essay demonstrates the machine-based classification of papers in leading information-systems journals. 
Of the behavior-focused pieces, the third essay utilizes user-generated content to illustrate system-driven viewing outcomes in the context of binge watching of television shows, and the final essay examines a large volume of content connected with a business-to-business Web portal, reporting on a study of browsing-device-linked differences in interest in marketing material. In addition to the individual essays, the dissertation contributes to the scholarly discussion of text-mining research issues in three important ways. Firstly, it presents a conceptual framework that aids in revealing the fundamentals of text-mining research in terms of two dimensions: research objective and level of text analysis. Secondly, the four essays provide concrete demonstrations of various suitable applications of text-mining. Finally, the dissertation examines the implications of the work, highlighting specific issues and challenges pertaining to text-mining research. The findings and implications of this work should benefit IS researchers and practitioners striving to exploit large volumes of textual data.

    HIERARCHICAL LEARNING OF DISCRIMINATIVE FEATURES AND CLASSIFIERS FOR LARGE-SCALE VISUAL RECOGNITION

    Enabling computers to recognize objects present in images has been a long-standing but tremendously challenging problem in the field of computer vision for decades. Beyond the difficulties resulting from huge appearance variations, large-scale visual recognition poses unprecedented challenges when the number of visual categories being considered reaches the thousands and the number of images increases to millions. This dissertation contributes to addressing a number of the challenging issues in large-scale visual recognition. First, we develop an automatic image-text alignment method to collect massive amounts of labeled images from the Web for training visual concept classifiers. Specifically, we first crawl a large number of cross-media Web pages containing Web images and their auxiliary texts, and then segment them into a collection of image-text pairs. We then show that near-duplicate image clustering according to visual similarity can significantly reduce the uncertainty on the relatedness of Web images’ semantics to their auxiliary text terms or phrases. Finally, we empirically demonstrate that a random walk over a newly proposed phrase correlation network can help to achieve more precise image-text alignment by refining the relevance scores between Web images and their auxiliary text terms. Second, we propose a visual tree model to reduce the computational complexity of a large-scale visual recognition system by hierarchically organizing and learning the classifiers for a large number of visual categories in a tree structure. Compared to previous tree models, such as the label tree, our visual tree model does not require training a huge number of classifiers in advance, which is computationally expensive. Nevertheless, we experimentally show that the proposed visual tree achieves results that are comparable to or even better than those of other tree models in terms of recognition accuracy and efficiency. 
Third, we present a joint dictionary learning (JDL) algorithm which exploits the inter-category visual correlations to learn more discriminative dictionaries for image content representation. Given a group of visually correlated categories, JDL simultaneously learns one common dictionary and multiple category-specific dictionaries to explicitly separate the shared visual atoms from the category-specific ones. We accordingly develop three classification schemes to make full use of the dictionaries learned by JDL for visual content representation in the task of image categorization. Experiments on two image data sets which respectively contain 17 and 1,000 categories demonstrate the effectiveness of the proposed algorithm. In the last part of the dissertation, we develop a novel data-driven algorithm to quantitatively characterize the semantic gaps of different visual concepts for learning complexity estimation and inference model selection. The semantic gaps are estimated directly in the visual feature space since the visual feature space is the common space for concept classifier training and automatic concept detection. We show that the quantitative characterization of the semantic gaps helps to automatically select more effective inference models for classifier training, which further improves the recognition accuracy rates.
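The first contribution of this abstract refines image-to-phrase relevance scores by a random walk over a phrase correlation network. The abstract gives no formulation, so the following is only a minimal sketch of one common variant, a random walk with restart. The function name `refine_relevance`, the restart weight `alpha`, the iteration count, and the toy correlation matrix are all assumptions made for illustration and should not be read as the dissertation's actual algorithm.

```python
def refine_relevance(initial_scores, correlation, alpha=0.85, iterations=50):
    """Refine per-phrase relevance scores with a random walk with restart.

    initial_scores: non-negative relevance of each phrase to one image.
    correlation:    square matrix; correlation[i][j] is the (assumed)
                    correlation strength between phrase i and phrase j.
    alpha:          weight of the walk step versus the initial scores.
    """
    n = len(initial_scores)
    total = sum(initial_scores)
    start = [s / total for s in initial_scores]  # normalise to sum to 1

    # Column sums turn the correlation matrix into transition probabilities.
    col_sums = [sum(correlation[i][j] for i in range(n)) or 1.0 for j in range(n)]

    scores = list(start)
    for _ in range(iterations):
        scores = [
            alpha * sum(correlation[i][j] / col_sums[j] * scores[j] for j in range(n))
            + (1 - alpha) * start[i]
            for i in range(n)
        ]
    return scores

# Toy example: three phrases, where phrase 1 is correlated with both others.
correlation = [
    [0.0, 1.0, 0.0],
    [1.0, 0.0, 1.0],
    [0.0, 1.0, 0.0],
]
refined = refine_relevance([0.6, 0.3, 0.1], correlation)
print(refined)
```

Each iteration lets a phrase inherit relevance from its correlated neighbours while the `(1 - alpha)` term keeps the result anchored to the initial scores. Because every column of the toy matrix is non-zero, the refined scores still sum to 1.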

    LSHTC: A Benchmark for Large-Scale Text Classification

    LSHTC is a series of challenges that aims to assess the performance of classification systems in large-scale classification with a large number of classes (up to hundreds of thousands). This paper describes the datasets that have been released along with the LSHTC series. The paper details the construction of the datasets and the design of the tracks, as well as the evaluation measures that we implemented and a quick overview of the results. All of these datasets are available online, and runs may still be submitted on the online server of the challenges.