    Re-mining item associations: methodology and a case study in apparel retailing

    Association mining is the conventional data mining technique for analyzing market basket data, and it reveals the positive and negative associations between items. Although pricing and time information are an integral part of transaction data, they have not been integrated into market basket analysis in earlier studies. This paper proposes a new approach to mine price, time and domain related attributes through re-mining of association mining results. The underlying factors behind positive and negative relationships can be characterized and described through this second data mining stage. The applicability of the methodology is demonstrated through the analysis of data coming from a large apparel retail chain, and its algorithmic complexity is analyzed in comparison to the existing techniques.
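
To make the first stage concrete, here is a minimal sketch of how positive and negative associations between item pairs are commonly scored with support and lift. It is not the paper's re-mining method; the function name `pair_associations` and the sample baskets are hypothetical and chosen only for illustration.

```python
from collections import Counter
from itertools import combinations


def pair_associations(baskets):
    """Return support and lift for every item pair that co-occurs at least once."""
    n = len(baskets)
    item_counts = Counter()
    pair_counts = Counter()
    for basket in baskets:
        items = sorted(set(basket))  # de-duplicate and fix the pair ordering
        item_counts.update(items)
        pair_counts.update(combinations(items, 2))

    results = {}
    for (a, b), count in pair_counts.items():
        support = count / n
        # Lift compares observed co-occurrence with what independence would predict.
        lift = support / ((item_counts[a] / n) * (item_counts[b] / n))
        results[(a, b)] = {"support": support, "lift": lift}
    return results


# Hypothetical transactions, not data from the paper.
baskets = [
    ["shirt", "tie"],
    ["shirt", "tie", "belt"],
    ["shirt", "belt"],
    ["jeans", "belt"],
]
assoc = pair_associations(baskets)
```

A lift above 1 suggests a positive association and a lift below 1 a negative one. Pairs that never co-occur do not appear in the result at all; a fuller treatment would report them with a lift of 0.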

    Unapparent information revelation for counterterrorism: Visualizing associations using a hybrid graph-based approach

    Unapparent Information Revelation refers to the text mining task of revealing interesting information in a document collection other than that which is explicitly stated. It focuses on detecting possible links between concepts across multiple text documents by generating a graph that matches the evidence trail found in the documents. A Concept Chain Graph is a statistical technique for finding links among snippets of information where each small piece, taken singly, appears to be unconnected. In terms of algorithm performance, Latent Semantic Indexing and the Contextual Network Graph are found to be comparable to the Concept Chain Graph. This paper reviews these three similarly grounded approaches and discusses these aspects. The Concept Chain Graph is proposed as being suited to extracting interesting relations among concepts that co-occur within text collections, owing to its ability to construct a directed graph representing the evidence trail. It is the baseline study for our hybrid Concept Chain Graph approach.

    The contribution of data mining to information science

    The information explosion is a serious challenge for current information institutions. Data mining, which is the search for valuable information in large volumes of data, is one of the solutions to this challenge. In the past several years, data mining has made a significant contribution to the field of information science. This paper examines the impact of data mining by reviewing existing applications, including personalized environments, electronic commerce, and search engines. For these three types of application, the paper discusses how data mining can enhance their functions. The reader is expected to get an overview of the state-of-the-art research associated with these applications. Furthermore, we identify the limitations of current work and raise several directions for future research.

    Processing count queries over event streams at multiple time granularities

    Management and analysis of streaming data has become crucial with its applications in web, sensor data, network traffic data, and stock market. Data streams consist of mostly numeric data, but what is more interesting is the events derived from the numerical data that need to be monitored. The events obtained from streaming data form event streams. Event streams have similar properties to data streams, i.e., they are seen only once in a fixed order as a continuous stream. Events appearing in the event stream have time stamps associated with them in a certain time granularity, such as second, minute, or hour. One type of frequently asked query over event streams is the count query, i.e., the frequency of an event occurrence over time. Count queries can be answered over event streams easily; however, users may ask queries over different time granularities as well. For example, a broker may ask how many times a stock increased in the same time frame, where the time frames specified could be hour, day, or both. This is crucial especially in the case of event streams where only a window of an event stream is available at a certain time instead of the whole stream. In this paper, we propose a technique for predicting the frequencies of event occurrences in event streams at multiple time granularities. The proposed approximation method efficiently estimates the count of events with a high accuracy in an event stream at any time granularity by examining the distance distributions of event occurrences. The proposed method has been implemented and tested on different real data sets, and the results obtained are presented to show its effectiveness.
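
As a point of reference for what a count query at multiple granularities returns, the following sketch computes exact counts by truncating each time stamp to the requested granularity. It is an exact baseline that assumes the whole stream is available, not the approximation method proposed in the paper; `count_events`, `FORMATS`, and the sample time stamps are hypothetical.

```python
from collections import Counter
from datetime import datetime

# Truncation patterns for the granularities used in this sketch (an assumption).
FORMATS = {"hour": "%Y-%m-%d %H", "day": "%Y-%m-%d"}


def count_events(timestamps, granularity):
    """Count event occurrences per time bucket at the given granularity."""
    fmt = FORMATS[granularity]
    return Counter(ts.strftime(fmt) for ts in timestamps)


# Hypothetical occurrences of one event type, e.g. "stock price increased".
events = [
    datetime(2024, 1, 1, 9, 5),
    datetime(2024, 1, 1, 9, 40),
    datetime(2024, 1, 1, 14, 0),
    datetime(2024, 1, 2, 9, 15),
]
by_hour = count_events(events, "hour")
by_day = count_events(events, "day")
```

The same events yield different answers at each granularity: two occurrences in the 09:00 hour of the first day, but three occurrences for that day as a whole. When only a window of the stream is visible, such exact counts are unavailable, which is the gap an estimation technique addresses.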

    Web Mining for Social Network Analysis: A Review, Direction and Future Vision

    Although the web is rich in data, gathering this data and making sense of it is extremely difficult due to its unorganised nature. Existing data mining techniques can therefore be applied to extract information from web data. The knowledge thus extracted can also be used for the analysis of social networks and online communities. This paper gives a brief insight into Web Mining and Link Analysis as used in Social Network Analysis and presents algorithms such as HITS, PageRank, SALSA, PHITS, CLEVER and InDegree, which provide measures for identifying online communities over social networks. The most common of these algorithms are PageRank and HITS. PageRank measures the importance of a page efficiently, and in less time, with the help of inlinks, while HITS uses both inlinks and outlinks to measure the importance of a web page and is sensitive to the user query. Various extensions to these algorithms also exist to refine query-based search results. This opens many doors for future research to find undiscovered knowledge of existing online communities over various social networks. Keywords: Web Structure Mining, Link Analysis, Link Mining, Online Community Mining
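
The idea that PageRank scores a page from its inlinks can be sketched with a short power iteration. This is a textbook-style illustration, not code from the reviewed paper; the damping factor of 0.85, the iteration count, and the three-page link graph are assumptions made for the example.

```python
def pagerank(links, damping=0.85, iterations=100):
    """Power-iteration PageRank over a dict mapping page -> list of outlinked pages."""
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page receives the same small baseline share.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page passes its rank evenly to the pages it links to.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outlinks spreads its rank over all pages.
                share = damping * rank[page] / n
                for p in pages:
                    new_rank[p] += share
        rank = new_rank
    return rank


# Hypothetical link graph: A and B both link to C, and C links back to A.
links = {"A": ["C"], "B": ["C"], "C": ["A"]}
ranks = pagerank(links)
```

In this graph C ends up with the highest score because it has two inlinks, A comes second through its single inlink from C, and B, which has no inlinks, keeps only the baseline share. HITS differs in that it maintains two scores per page (hub and authority), computed from outlinks and inlinks respectively.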

    Knowledge Discovery Through Large-Scale Literature-Mining of Biological Text-Data

    The aim of this study is to develop a scalable and efficient literature-mining framework for knowledge discovery in the field of medical and biological sciences. Using this scalable framework, a customized disease-disease interaction network can be constructed. Features of the proposed network that differentiate it from existing networks are its 1) flexibility in the level of abstraction, 2) broad coverage, and 3) domain specificity. Empirical results for two neurological diseases have shown the utility of the proposed framework. The second goal of this study is to design and implement a bottom-up information retrieval approach to facilitate literature-mining in the specialized field of medical genetics. Experimental results are currently being corroborated.