4,009 research outputs found

    Data Mining Techniques for Mining Query Logs in Web Search Engines

    The Web is the biggest repository of documents humans have ever built, and it grows in size every day. Users rely on Web search engines (WSEs) to find information on the Web. By submitting a textual query expressing their information need, WSE users obtain a list of documents that are highly relevant to the query. Moreover, WSEs store huge amounts of user activity in query logs. Query log mining is the set of techniques aimed at extracting valuable knowledge from query logs. This knowledge represents one of the most used ways of enhancing the users' search experience. The primary focus of this work is on introducing data mining techniques for mining query logs in Web search engines and showing how search engine applications may benefit from this mining.

    Latitude, longitude, and beyond:mining mobile objects' behavior

    Rapid advancements in Micro-Electro-Mechanical Systems (MEMS) and wireless communications have resulted in a surge in data generation. Mobility data is one of the various forms of data that are ubiquitously collected by different location-sensing devices. Extensive knowledge about the behavior of humans and wildlife is buried in raw mobility data. This knowledge can be used to realize numerous viable applications, ranging from wildlife movement analysis to location-based recommendation systems, urban planning, and disaster relief. In this thesis, we therefore focus on providing data analytics for understanding the behavior and interaction of mobile entities (humans and animals). To this end, the main research question to be addressed is: How can behaviors and interactions of mobile entities be determined from mobility data acquired by (mobile) wireless sensor nodes in an accurate and efficient manner? To answer this question, both application requirements and technological constraints are considered in this thesis. On the one hand, application requirements call for accurate data analytics to uncover hidden information about the individual behavior and social interaction of mobile entities, and to deal with the uncertainties in mobility data. Technological constraints, on the other hand, require these data analytics to be efficient in terms of energy consumption and to have a low memory footprint and low processing complexity.

    Framework based on complex networks to model and mine patient pathways

    The automatic discovery of a model to represent the history of encounters of a group of patients with the healthcare system, the so-called "pathway of patients", is a new field of research that supports clinical and organisational decisions to improve the quality and efficiency of the treatment provided. The pathways of patients with chronic conditions tend to vary significantly from one person to another, have repetitive tasks, and demand the analysis of multiple perspectives (interventions, diagnoses, medical specialities, among others) influencing the results. Therefore, modelling and mining those pathways is still a challenging task. In this work, we propose a framework comprising: (i) a pathway model based on a multi-aspect graph, (ii) a novel dissimilarity measurement to compare pathways taking the elapsed time into account, and (iii) a mining method based on traditional centrality measures to discover the most relevant steps of the pathways. We evaluated the framework using case studies of pregnancy and diabetes, which revealed its usefulness in finding clusters of similar pathways, representing them in an easy-to-interpret way, and highlighting the most significant patterns according to multiple perspectives. Comment: 35 pages, 11 figures, 2 appendices

    Combination of web usage, content and structure information for diverse web mining applications in the tourism context and the context of users with disabilities

    188 p. This PhD focuses on the application of machine learning techniques for behaviour modelling in different types of websites. Using data mining techniques two aspects which are problematic and difficult to solve have been addressed: getting the system to dynamically adapt to possible changes of user preferences, and to try to extract the information necessary to ensure the adaptation in a transparent manner for the users, without infringing on their privacy. The work in question combines information of different nature such as usage information, content information and website structure and uses appropriate web mining techniques to extract as much knowledge as possible from the websites. The extracted knowledge is used for different purposes such as adapting websites to the users through proposals of interesting links, so that the users can get the relevant information more easily and comfortably; for discovering interests or needs of users accessing the website and to inform the service providers about it; or detecting problems during navigation. Systems have been successfully generated for two completely different fields: the field of tourism, working with the website of bidasoa turismo (www.bidasoaturismo.com) and, the field of disabled people, working with discapnet website (www.discapnet.com) from ONCE/Tecnosite foundation.

    Multidimensional process discovery


    Recommending Best Products from E-commerce Purchase History and User Click Behavior Data

    In E-commerce collaborative filtering recommendation systems, the main input, the user-item rating matrix, is binary purchase data showing only which items a user has purchased recently. This matrix is usually sparse and does not provide much information about customer purchase history or product clickstream behavior (e.g., clicks, basket placement, and purchase), which could improve product recommendation accuracy. Existing E-commerce recommendation systems that use clickstream data include those referred to in this thesis as Kim05Rec, Kim11Rec, and Chen13Rec. Kim05Rec forms a decision tree on click behavior attributes such as search type and visit times, discovers the likelihood of a user putting products into the basket, and uses this information to enrich the user-item rating matrix. If a user clicked a product, Kim11Rec finds the products associated with it in three stages (click, basket, and purchase), uses the lift values from these stages to calculate a score, and then uses the score to make recommendations. Chen13Rec measures the similarity of users based on their category click patterns, such as click sequences, click times, and visit duration; it can then use the similarity to enhance the collaborative filtering algorithm. However, the similarity between click sequences in sessions can, to some extent, be applied to purchases; in particular, for sessions without purchases, it can be used to predict purchases for the users of those sessions. The existing systems have integrated neither this similarity nor the historical purchases, which show more than whether or not a user has purchased a product before. In this thesis, we propose HPCRec (Historical Purchase with Clickstream based Recommendation System) to enrich the ratings matrix in both quantity and quality.
    HPCRec first forms a normalized rating matrix with higher-quality ratings from historical purchases, then mines the consequential bond between clicks and purchases using weighted frequencies, where the weights are similarities between sessions; integrating this information improves rating quantity. The experimental results show that HPCRec is more accurate than these existing methods and is also capable of handling infrequent cases that the existing methods cannot.
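
The lift value that the abstract mentions for Kim11Rec is a standard association measure: lift(A, B) = P(A and B) / (P(A) * P(B)), where values above 1 suggest the two items co-occur more often than independence would predict. The following is only a minimal sketch of that measure, assuming each session is a list of product IDs; the function name `lift_scores` and the toy sessions are hypothetical, and this is not the three-stage scoring used by Kim11Rec or the HPCRec method itself.

```python
from collections import Counter
from itertools import combinations


def lift_scores(sessions):
    """Return {(a, b): lift} for every item pair that co-occurs in a session.

    Pairs are stored as alphabetically sorted tuples. Each session counts
    an item at most once, so probabilities are fractions of sessions.
    """
    n = len(sessions)
    item_counts = Counter()
    pair_counts = Counter()
    for session in sessions:
        items = set(session)  # ignore repeated clicks within a session
        item_counts.update(items)
        pair_counts.update(combinations(sorted(items), 2))
    return {
        (a, b): (count / n) / ((item_counts[a] / n) * (item_counts[b] / n))
        for (a, b), count in pair_counts.items()
    }


# Hypothetical toy data: four sessions of clicked product IDs.
sessions = [
    ["phone", "case"],
    ["phone", "case", "charger"],
    ["phone", "charger"],
    ["laptop", "charger"],
]
scores = lift_scores(sessions)
for pair, value in sorted(scores.items()):
    print(pair, round(value, 3))
```

In this toy data the pair ("case", "phone") has a lift above 1, while ("charger", "phone") falls below 1, so a lift-based recommender would favor suggesting the case to someone who clicked the phone.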