1,109 research outputs found

    Investigation of Heterogeneous Approach to Fact Invention of Web Users’ Web Access Behaviour

    The World Wide Web consists of a huge volume of different types of data. Web mining is a field of data mining concerned with the many web services and the large number of web users, and web user mining is in turn a field of web mining. Information about users’ web access is collected in several ways; the most common is the web log file. Other techniques include browser agents, user authentication, web reviews, web ratings, web rankings, and tracking cookies. Users find it difficult to retrieve the information they need in time because the huge volume of structured and unstructured information increases the complexity of the web. Web usage mining is therefore important for purposes such as organizing websites, business and maintenance services, website personalization, and reducing network bandwidth. This paper provides an analysis of web usage mining techniques.
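    As the abstract notes, the web log file is the most common source of users’ web access information. A minimal sketch of parsing one record, assuming the widely used Common Log Format (actual field layouts vary by server configuration):

```python
import re

# Common Log Format: host ident authuser [time] "request" status size
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\d+|-)'
)

def parse_log_line(line):
    """Return a dict of fields from one Common Log Format record, or None."""
    m = LOG_PATTERN.match(line)
    return m.groupdict() if m else None

sample = ('127.0.0.1 - - [10/Oct/2023:13:55:36 +0000] '
          '"GET /index.html HTTP/1.1" 200 2326')
record = parse_log_line(sample)
print(record["host"], record["status"])  # 127.0.0.1 200
```

    A real web usage mining pipeline would apply such parsing line by line before grouping requests into user sessions and mining access patterns.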

    NEW METHODS FOR MINING SEQUENTIAL AND TIME SERIES DATA

    Data mining is the process of extracting knowledge from large amounts of data. It covers a variety of techniques aimed at discovering diverse types of patterns according to the requirements of the domain, including association rule mining, classification, cluster analysis, and outlier detection. The availability of applications that produce massive amounts of spatial, spatio-temporal (ST), and time series data (TSD) is the rationale for developing specialized techniques to mine such data. In spatial data mining, the spatial co-location rule problem differs from the association rule problem, since there is no natural notion of transactions in spatial datasets embedded in continuous geographic space. We have therefore proposed an efficient algorithm (GridClique) to mine interesting spatial co-location patterns (maximal cliques). These patterns are used as the raw transactions for an association rule mining technique to discover complex co-location rules. Our proposal includes certain types of complex relationships, especially negative relationships, in the patterns; these relationships can be obtained only from the maximal clique patterns, which have not been exploited before. Our approach is applied to a well-known astronomy dataset obtained from the Sloan Digital Sky Survey (SDSS). ST data are continuously collected and made accessible in the public domain. We present an approach to mining and querying large ST data with the aim of finding interesting patterns and understanding the underlying process of data generation. An important class of queries is based on the flock pattern: a flock is a large subset of objects moving along paths close to each other for a predefined time.
    One approach to processing a “flock query” is to map ST data into high-dimensional space and reduce the query to a sequence of standard range queries that can be answered using a spatial indexing structure; however, the performance of spatial indexing structures deteriorates rapidly in high-dimensional space. This thesis sets out a preprocessing strategy that uses a random projection to reduce the dimensionality of the transformed space. We use probabilistic arguments to prove the accuracy of the projection and present experimental results showing that the curse of dimensionality can be managed in an ST setting by combining random projections with traditional data structures. In time series data mining, we devised a new space-efficient algorithm (SparseDTW) to compute the dynamic time warping (DTW) distance between two time series that always yields the optimal result, in contrast to other approaches, which typically sacrifice optimality to attain space efficiency. The main idea behind our approach is to dynamically exploit any similarity and/or correlation between the time series: the more similar the time series, the less space is required to compute the DTW between them. Other techniques for speeding up DTW impose a priori constraints and do not exploit similarity characteristics that may be present in the data. Our experiments demonstrate that SparseDTW outperforms these approaches. By applying the SparseDTW algorithm, we discover an interesting “pairs trading” pattern in a large stock-market dataset of daily index prices from the Australian Stock Exchange (ASX) from 1980 to 2002.
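    The abstract describes SparseDTW only by its space-saving idea, so the sketch below shows the standard full-matrix dynamic-programming DTW that such algorithms improve upon: each cell accumulates the local cost plus the cheapest of the three admissible predecessor alignments.

```python
def dtw_distance(a, b):
    """Standard O(len(a) * len(b)) dynamic-programming DTW distance."""
    n, m = len(a), len(b)
    INF = float("inf")
    # D[i][j] = cost of best warping path aligning a[:i] with b[:j]
    D = [[INF] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            D[i][j] = cost + min(D[i - 1][j],      # insertion
                                 D[i][j - 1],      # deletion
                                 D[i - 1][j - 1])  # match
    return D[n][m]

print(dtw_distance([1, 2, 3], [1, 2, 2, 3]))  # 0.0 (perfect warped alignment)
```

    According to the abstract, SparseDTW computes the same optimal value while using less space the more similar the two series are, whereas constraint-based speed-ups (e.g. fixed warping bands) may sacrifice optimality.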

    A framework for trend mining with application to medical data

    This thesis presents research conducted in the field of knowledge discovery: an integrated trend-mining framework and SOMA, its application to diabetic retinopathy data. Trend mining is the process of identifying and analysing trends in the variation of the support of association/classification rules extracted from longitudinal datasets. The integrated framework covers all major processes from data preparation to the extraction of knowledge. At the pre-processing stage, data are cleaned, transformed if necessary, and sorted into time-stamped datasets using logic rules. The time-stamped datasets then pass through the main processing stage, in which an association rule mining (ARM) matrix algorithm is applied to identify frequent rules with acceptable confidence. Mathematical conditions are applied to classify the sequences of support values into trends. Afterwards, interestingness criteria are applied to obtain interesting knowledge, and a visualization technique is proposed that maps how objects move from one time stamp to the next. A validation and verification framework (external and internal) is described that aims to ensure that the results at the intermediate stages are correct and that the framework as a whole can yield results that demonstrate causality. To evaluate the thesis, SOMA was developed. The dataset is itself of interest, as it is very noisy (in common with other similar medical datasets) and does not feature a clear association between specific time stamps and subsets of the data. The Royal Liverpool University Hospital has been a major centre for retinopathy research since 1991; retinopathy is a generic term for damage to the retina of the eye, which can, in the long term, lead to visual loss.
    Diabetic retinopathy is used to evaluate the framework and to determine whether SOMA can extract knowledge that is already known to medical experts. The results show that these datasets can be used to extract knowledge demonstrating causal links between diabetic retinopathy and patient characteristics such as age at diagnosis, type of diabetes, and duration of diabetes.
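    The framework’s exact mathematical conditions are not given in the abstract. Purely as an illustration, a sequence of rule-support values across time stamps might be labelled with a simple trend like this (the tolerance and the label set are hypothetical, not the thesis’s own):

```python
def classify_trend(supports, tol=0.0):
    """Label a sequence of rule-support values with a simple trend.

    Illustrative only: compares consecutive support values; a real
    trend-mining framework applies richer mathematical conditions.
    """
    diffs = [b - a for a, b in zip(supports, supports[1:])]
    if all(d > tol for d in diffs):
        return "increasing"
    if all(d < -tol for d in diffs):
        return "decreasing"
    if all(abs(d) <= tol for d in diffs):
        return "constant"
    return "fluctuating"

print(classify_trend([0.10, 0.15, 0.22]))  # increasing
print(classify_trend([0.30, 0.30, 0.30]))  # constant
```

    Interestingness criteria would then filter these labelled trends, e.g. keeping only rules whose support rises steadily across all time stamps.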

    Generating and Justifying Design Theory

    This paper applies Simon’s (1996) sciences of the artificial to elaborate a set of structures and processes for developing design theory. Goals, kernel theory, and artifacts inform an inter-related prototyping cycle of design, evaluation, and appropriation/generation that produces strategic design theory. The paper identifies DSR project types to provide signposts for starting and ending the cycle, artifact and evaluation iteration to facilitate the process and provide a chain of evidence, a simplified format for representing design theory iterations, and stopping rules to end the cycle. We use a detailed example to illustrate the ideas, discuss related work, and identify limitations and future research opportunities.

    Mastering the Spatio-Temporal Knowledge Discovery Process

    The thesis addresses a topic of great importance: a framework for mining positioning data collected by personal mobile devices. Its main contribution is a theoretical and practical framework for managing the complex knowledge discovery process on mobility data, which requires integrating very different aspects of the process together with their assumptions and requirements. The result is a homogeneous system that exposes the power of all its components with the flexibility of a database, including a new way to use ontologies for automatic reasoning on trajectory data. Two extensions were devised, developed, and integrated into the system to confirm its extensibility: an innovative way to reconstruct trajectories that accounts for the uncertainty of the path followed, and a location prediction algorithm called WhereNext. Another important contribution of the thesis is experimentation on a real case study of mobility data analysis, which demonstrated the usefulness of the system for a mobility manager provided with a knowledge discovery framework.

    Trade marketing analytics in consumer goods industry

    Project work presented as a partial requirement for a Master's degree in Information Management, specialization in Information Systems and Technologies Management. We address the transparency of trade spends in the consumer goods industry and propose a set of business performance indicators that follow the Pareto (80/20) rule, a popular concept in optimization problem solving. The discovery of power laws in the behaviour of travelling salespeople, the buying patterns of customers, the popularity of products, and market demand fluctuations leads to better-informed decisions among all those involved in planning, execution, and post-promotion evaluation. The practical result of our work is a prototype implementation of the proposed measures. The most remarkable finding is the consistency of a travelling salesperson's journey between customer locations. Loyalty to a brand, or brand market power (whatever forces field sales representatives to put at least one product of the market player of interest into nearly every market basket), fits a small-world model; this behaviour not only varies from person to person but also persists after reassignment to a different territory. For the industrialization stage of this project, we outline key design considerations for an information system capable of handling a real-time workload scalable to petabytes. We built our analyses for collaborative, integrated planning processes that require the joint effort of a multidisciplinary team. Field tests demonstrate how insights from data can trigger business transformation, which is why we end with a recommendation for system integrators to include knowledge discovery in information system deployment projects.
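    The Pareto (80/20) indicators mentioned above can be sketched as the share of total trade value contributed by the top fraction of customers (the revenue figures below are invented for the example, not from the project):

```python
def pareto_share(values, top_fraction=0.2):
    """Share of the total contributed by the top `top_fraction` of items."""
    if not values:
        return 0.0
    ordered = sorted(values, reverse=True)
    k = max(1, round(len(ordered) * top_fraction))  # size of the "vital few"
    return sum(ordered[:k]) / sum(ordered)

# Hypothetical per-customer revenues for one territory
revenues = [500, 300, 50, 40, 30, 25, 20, 15, 12, 8]
print(round(pareto_share(revenues), 2))  # 0.8 -> top 20% drive 80% of revenue
```

    An indicator far below 0.8 would suggest trade spend is spread thinly across many small accounts rather than concentrated where the power law predicts.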

    C3S2E-2008-2016-FinalPrograms

    This document records the final programs of each of the nine meetings of the C* Conference on Computer Science & Software Engineering (C3S2E), which were organized in various locations on three continents. The papers published during these years (2008-2016) are accessible from the ACM digital library.

    Process Mining Handbook

    This is an open access book comprising all the courses given as part of the First Summer School on Process Mining, PMSS 2022, which was held in Aachen, Germany, during July 4-8, 2022. The volume contains 17 chapters organized into the following topical sections: introduction; process discovery; conformance checking; data preprocessing; process enhancement and monitoring; assorted process mining topics; industrial perspective and applications; and closing.

    A Machine Learning Enhanced Scheme for Intelligent Network Management

    Versatile networking services strongly influence daily life, while their number and diversity make network systems highly complex. Network scale and complexity grow with increasing infrastructure, networking functions, network slices, and the evolution of the underlying architecture. The conventional approach to maintaining such a large and complex platform is manual administration, which makes effective and insightful management troublesome. A feasible and promising alternative is to extract insightful information from the large volumes of network data produced. The goal of this thesis is to use learning-based algorithms from the machine learning community to discover valuable knowledge in substantial network data, directly promoting intelligent management and maintenance. The thesis focuses on two schemes: network anomaly detection with root cause localization, and critical traffic resource control and optimization. First, abundant network data carry informative messages, but their heterogeneity and complexity make diagnosis challenging. For unstructured logs, abstract, formatted log templates are extracted to regularize log records. An in-depth analysis framework based on heterogeneous data is proposed to detect the occurrence of faults and anomalies; it employs representation learning to map unstructured data into numerical features and fuses the extracted features for network anomaly and fault detection, using word2vec-based embedding technologies for semantic expression. However, fault and anomaly detection only reveals that an event has occurred without identifying its cause, so fault localization is introduced to narrow down the source of systemic anomalies.
    The extracted features are formed into an anomaly degree, coupled with an importance-ranking method that highlights the locations of anomalies in network systems; two ranking modes, instantiated by PageRank and by operation errors, jointly highlight the locations of latent issues. Beyond fault and anomaly detection, network traffic engineering manages communication and computation resources to optimize the efficiency of data transfer. Especially when network traffic is constrained by communication conditions, a proactive path-planning scheme supports efficient traffic control actions. A learning-based traffic planning algorithm is therefore proposed, based on a sequence-to-sequence model, to discover reasonable hidden paths from abundant traffic history over a Software Defined Network architecture. Finally, traffic engineering based purely on empirical data is likely to yield stale, sub-optimal solutions, or even worse outcomes, so a resilient mechanism is required to adapt network flows to a dynamic environment based on context. Thus, a reinforcement learning-based scheme is put forward for dynamic data forwarding that considers network resource status and shows a promising performance improvement. In conclusion, the proposed anomaly processing framework strengthens analysis and diagnosis for network system administrators through combined fault detection and root cause localization, and the learning-based traffic engineering supports network flow management from experience data, pointing to a promising direction for flexible traffic adjustment in ever-changing environments.
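    The abstract names PageRank as one of the two ranking modes for localizing anomalies. A minimal power-iteration sketch over a dependency graph between components (the graph, node names, and damping parameter are illustrative assumptions, not from the thesis):

```python
def pagerank(adj, damping=0.85, iters=100):
    """Power-iteration PageRank over an adjacency dict {node: [out-neighbours]}."""
    nodes = list(adj)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iters):
        nxt = {v: (1 - damping) / n for v in nodes}
        for v, outs in adj.items():
            if outs:
                share = damping * rank[v] / len(outs)
                for w in outs:
                    nxt[w] += share
            else:  # dangling node: spread its rank uniformly
                for w in nodes:
                    nxt[w] += damping * rank[v] / n
        rank = nxt
    return rank

# Hypothetical dependency graph between three network components:
# the database feeds the app server, which exchanges traffic with the LB.
graph = {"db": ["app"], "app": ["lb"], "lb": ["app"]}
ranks = pagerank(graph)
print(max(ranks, key=ranks.get))  # app
```

    In a fault-localization setting, the ranking would typically be combined with per-node anomaly degrees so that highly ranked, highly anomalous components surface first.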

    Transformational Leadership for Frontline Leaders

    Abstract. Problem. Transformational leadership (TL) represents the gold standard of leadership styles in contemporary healthcare organizations. The transformational leader’s ability to motivate, influence, stimulate, inspire, and attend to followers’ individual needs is an antecedent to job satisfaction, quality, and patient safety. The project aimed to improve TL constructs among frontline leaders (managers and assistant nurse managers). Based on the results of a needs assessment, these frontline leaders were provided an opportunity to improve their TL style. Context. Leadership development is a strategic priority for a medium-sized medical center in a healthcare system in Northern California. Frontline leaders within patient care services (PCS) for this medical center strive to achieve job satisfaction and improve patient outcomes. Ten frontline leaders volunteered to participate in this evidence-based TL development program. Interventions. The TL development program included didactic education on TL theory, inspirational motivation, idealized influence, and emotional intelligence (Bradberry & Greaves, 2009) during three four-hour sessions scheduled between February 10 and August 5, 2020. The pedagogy involved lectures, reflective practice, team coaching, action learning concepts, and adult learning principles. Measures. The impact of the TL development program was appraised using a pre-post assessment with a modified Multifactor Leadership Questionnaire (MLQ™) 5X (Avolio & Bass, 2004). The MLQ™ 5X is a valid and reliable instrument that measures overall TL and five constructs: idealized influence attributes (IIA), idealized influence behaviors (IIB), inspirational motivation (IM), intellectual stimulation (IS), and individual consideration (IC) (Avolio & Bass, 2004, pp. 103-104). The appraisal included both self-assessment by participants and rater-assessment of participants by identified supervisors, peers, and subordinates. Results.
    Statistical analysis of overall TL scores on the MLQ™ 5X revealed that participants’ self-assessed scores declined slightly from pre-intervention (M = 3.1, n = 10) to post-intervention (M = 2.9, n = 9). Conversely, participants’ rater-assessed TL scores increased from pre-intervention (M = 3.1) to post-intervention (M = 3.3). Subordinates rated participants’ TL style higher than participants rated themselves at both pre- and post-intervention. Supervisors rated participants’ TL style for all constructs lower at pre-intervention but higher at post-intervention. Conclusions. The global coronavirus pandemic, societal unrest, and fires in the general area may have affected participants’ ability to view themselves as improving transformational leaders during the project. The MLQ™ 5X total mean score for supervisor ratings of participants improved from pre-intervention (M = 2.7) to post-intervention (M = 3.2). Post-intervention, supervisors perceived higher TL levels among those they supervise, based on their performance during a crisis. Specifically, supervisors’ mean scores for “encourages innovative thinking – intellectual stimulation (IS)” and “coaches and develops people – individual consideration (IC)” (Avolio & Bass, 2020b, p. 3) (M = 3.5) exceeded the participants’ self-rated scores (M = 3.0). The scores in these two constructs may reflect the frontline leaders’ innovation and coaching during the pandemic. Participants reported feeling less confident in their TL acumen after learning about TL constructs during the program. Further research is required to design and implement effective, evidence-based leadership development programs and mitigate learning impediments.