1,381 research outputs found

    Web Mining Functions in an Academic Search Application

    Get PDF
    This paper deals with Web mining and the different categories of Web mining like content, structure and usage mining. The application of Web mining in an academic search application has been discussed. The paper concludes with open problems related to Web mining. The present work can be a useful input to Web users, Web Administrators in a university environment.Database, HITS, IR, NLP, Web mining

    A Graph-structured Dataset for Wikipedia Research

    Get PDF
    Wikipedia is a rich and invaluable source of information. Its central place on the Web makes it a particularly interesting object of study for scientists. Researchers from different domains used various complex datasets related to Wikipedia to study language, social behavior, knowledge organization, and network theory. While being a scientific treasure, the large size of the dataset hinders pre-processing and may be a challenging obstacle for potential new studies. This issue is particularly acute in scientific domains where researchers may not be technically and data processing savvy. On one hand, the size of Wikipedia dumps is large. It makes the parsing and extraction of relevant information cumbersome. On the other hand, the API is straightforward to use but restricted to a relatively small number of requests. The middle ground is at the mesoscopic scale when researchers need a subset of Wikipedia ranging from thousands to hundreds of thousands of pages but there exists no efficient solution at this scale. In this work, we propose an efficient data structure to make requests and access subnetworks of Wikipedia pages and categories. We provide convenient tools for accessing and filtering viewership statistics or "pagecounts" of Wikipedia web pages. The dataset organization leverages principles of graph databases that allows rapid and intuitive access to subgraphs of Wikipedia articles and categories. The dataset and deployment guidelines are available on the LTS2 website \url{https://lts2.epfl.ch/Datasets/Wikipedia/}

    WEB MINING IN E-COMMERCE

    Get PDF
    Recently, the web is becoming an important part of people’s life. The web is a very good place to run successful businesses. Selling products or services online plays an important role in the success of businesses that have a physical presence, like a reE-Commerce, Data mining, Web mining

    Local Ranking Problem on the BrowseGraph

    Full text link
    The "Local Ranking Problem" (LRP) is related to the computation of a centrality-like rank on a local graph, where the scores of the nodes could significantly differ from the ones computed on the global graph. Previous work has studied LRP on the hyperlink graph but never on the BrowseGraph, namely a graph where nodes are webpages and edges are browsing transitions. Recently, this graph has received more and more attention in many different tasks such as ranking, prediction and recommendation. However, a web-server has only the browsing traffic performed on its pages (local BrowseGraph) and, as a consequence, the local computation can lead to estimation errors, which hinders the increasing number of applications in the state of the art. Also, although the divergence between the local and global ranks has been measured, the possibility of estimating such divergence using only local knowledge has been mainly overlooked. These aspects are of great interest for online service providers who want to: (i) gauge their ability to correctly assess the importance of their resources only based on their local knowledge, and (ii) take into account real user browsing fluxes that better capture the actual user interest than the static hyperlink network. We study the LRP problem on a BrowseGraph from a large news provider, considering as subgraphs the aggregations of browsing traces of users coming from different domains. We show that the distance between rankings can be accurately predicted based only on structural information of the local graph, being able to achieve an average rank correlation as high as 0.8

    Marketing Website and Extranet Site

    Get PDF
    This is a hybrid project. As a new Information Systems consultancy the need for a website to market to potential clients was evident. The decision to add an extranet site was made after specific needs for Ace Hardware were made evident to Yar’s founders. Ace Hardware Corporation is a cooperative with over 4,400 members in over 100 nations worldwide. Ace corporate rolled out a B2B pilot program in the fall of 2013 to help Ace independently owned retail stores learn how to sell to other businesses. One of YAR’s founders is currently working with Ace corporate as a program consultant for the B2B group along with four other consultants nationally. The stores have focused on the homeowner and Do It Yourself (DIY) customers for almost 90 years and see the potential in Business to Business sales. However, the stores are not prepared and need training, tools, processes, alignment, and counsel to be successful. Ace corporate provides much of the IS infrastructure to the stores through Epicor’s ERP suite and other programs. In order for the stores to be successful they need to become competent in account management in sales and customer relationship management and a key tool for that is a Customer Relationship Management (CRM) tool or package. SalesForce and Microsoft’s Dynamics CRM are two of the most popular today. Epicor does offer CRM capabilities in their ERP suite but that is not available to the retail stores and would be extremely cost prohibitive if it was. Before the stores commit to larger and more robust CRM offerings they need a basic tool to help them learn and get the process moving. The extranet was initiated to provide a simple, cloud based CRM tool that the stores could utilize as they begin to build the B2B segment in their stores. Because it is cloud based the stores can all participate with the same tool instead of incurring the cost of having to individually find and purchase off the shelf CRM software or have it custom built by a local developer. They will have to purchase cloud storage, administration fees, and maintenance costs but all those will be a fraction of buying Epicor’s offering and much less than off the shelf or local developer options. The stores from the Houston retailer group that are participating are encouraged at the prospect of finally having a CRM solution even if it is in trial stages and are beginning to engage their B2B program more assertively. As the extranet is accepted integration with Epicor will have to occur at a later date in order to feed the CRM tool with the stores POS data currently being gathered so as to prevent double entry labor

    Using web mining in e-commerce applications

    Get PDF
    Nowadays, the web is an important part of our daily life. The web is now the best medium of doing business. Large companies rethink their business strategy using the web to improve business. Business carried on the Web offers the opportunity to potential customers or partners where their products and specific business can be found. Business presence through a company web site has several advantages as it breaks the barrier of time and space compared with the existence of a physical office. To differentiate through the Internet economy, winning companies have realized that e-commerce transactions is more than just buying / selling, appropriate strategies are key to improve competitive power. One effective technique used for this purpose is data mining. Data mining is the process of extracting interesting knowledge from data. Web mining is the use of data mining techniques to extract information from web data. This article presents the three components of web mining: web usage mining, web structure mining and web content mining.e-commerce, web mining, web content mining, web structure mining, web usage mining

    A COLLABORATIVE FILTERING APPROACH TO PREDICT WEB PAGES OF INTEREST FROMNAVIGATION PATTERNS OF PAST USERS WITHIN AN ACADEMIC WEBSITE

    Get PDF
    This dissertation is a simulation study of factors and techniques involved in designing hyperlink recommender systems that recommend to users, web pages that past users with similar navigation behaviors found interesting. The methodology involves identification of pertinent factors or techniques, and for each one, addresses the following questions: (a) room for improvement; (b) better approach, if any; and (c) performance characteristics of the technique in environments that hyperlink recommender systems operate in. The following four problems are addressed:Web Page Classification. A new metric (PageRank × Inverse Links-to-Word count ratio) is proposed for classifying web pages as content or navigation, to help in the discovery of user navigation behaviors from web user access logs. Results of a small user study suggest that this metric leads to desirable results.Data Mining. A new apriori algorithm for mining association rules from large databases is proposed. The new algorithm addresses the problem of scaling of the classical apriori algorithm by eliminating an expensive joinstep, and applying the apriori property to every row of the database. In this study, association rules show the correlation relationships between user navigation behaviors and web pages they find interesting. The new algorithm has better space complexity than the classical one, and better time efficiency under some conditionsand comparable time efficiency under other conditions.Prediction Models for User Interests. We demonstrate that association rules that show the correlation relationships between user navigation patterns and web pages they find interesting can be transformed intocollaborative filtering data. We investigate collaborative filtering prediction models based on two approaches for computing prediction scores: using simple averages and weighted averages. Our findings suggest that theweighted averages scheme more accurately computes predictions of user interests than the simple averages scheme does.Clustering. Clustering techniques are frequently applied in the design of personalization systems. We studied the performance of the CLARANS clustering algorithm in high dimensional space in relation to the PAM and CLARA clustering algorithms. While CLARA had the best time performance, CLARANS resulted in clusterswith the lowest intra-cluster dissimilarities, and so was most effective in this regard
    • 

    corecore