309 research outputs found

    Optimal constraint-based decision tree induction from itemset lattices

    In this article we show that there is a strong connection between decision tree learning and local pattern mining. This connection allows us to solve the computationally hard problem of finding optimal decision trees in a wide range of applications by post-processing a set of patterns: we use local patterns to construct a global model. We exploit the connection between constraints in pattern mining and constraints in decision tree induction to develop a framework for categorizing decision tree mining constraints. This framework allows us to determine which model constraints can be pushed deeply into the pattern mining process, and allows us to improve the state of the art in optimal decision tree induction.

    Item-centric mining of frequent patterns from big uncertain data


    Data Mining Algorithms for Internet Data: from Transport to Application Layer

    Nowadays we live in a data-driven world. Advances in data generation, collection and storage technology have enabled organizations to gather data sets of massive size. Data mining is a discipline that blends traditional data analysis methods with sophisticated algorithms to handle the challenges posed by these new types of data sets. The Internet is a complex and dynamic system in which new protocols and applications arise at a constant pace. These characteristics make the Internet a valuable and challenging data source and application domain for research, both at the Transport layer, analyzing network traffic flows, and up to the Application layer, focusing on the ever-growing next generation web services: blogs, micro-blogs, on-line social networks, photo sharing services and many other applications (e.g., Twitter, Facebook, Flickr). In this thesis work we focus on the study, design and development of novel algorithms and frameworks to support large scale data mining activities over huge and heterogeneous data volumes, with a particular focus on Internet data as the data source, targeting network traffic classification, on-line social network analysis, recommendation systems, cloud services and Big Data.

    Querying recurrent convoys over trajectory data

    National Research Foundation (NRF) Singapore under International Research Centres in Singapore Funding Initiative

    A Constraint-based Querying System for Exploratory Pattern Discovery

    In this article we present CONQUEST, a constraint-based querying system able to support the intrinsically exploratory (i.e., human-guided, interactive and iterative) nature of pattern discovery. Following the inductive database vision, our framework provides users with an expressive constraint-based query language, which allows the discovery process to be effectively driven toward potentially interesting patterns. Such constraints are also exploited to reduce the cost of pattern mining computation. CONQUEST is a comprehensive mining system that can access real-world relational databases from which to extract data. Through a friendly graphical user interface (GUI), the user can define complex mining queries with a few clicks. After a pre-processing step, mining queries are answered by an efficient and robust pattern mining engine which incorporates state-of-the-art data and search space reduction techniques. Resulting patterns are then presented to the user in a pattern browsing window, and possibly stored back in the underlying database as relations.

    Validation of an Innovation Mining Framework

    The driving hypothesis of this thesis is that a quantitative approach linking the business objectives of an organization with the technological limitations of the physical product would enable industry to create more innovative products. The main goal of this research is to validate the applicability and reliability of the innovation mining framework developed by Peyyeti (2016) to identify innovation opportunities and components worth innovating in a product. In this work, the innovation mining framework is applied with minor modifications to a mechanical pencil, and the resulting innovation scenarios are compared to existing innovations in mechanical pencils. Based on the success of this feasibility trial, the framework was applied to a Dirt-Devil vacuum and compared to innovations implemented in the Dyson-V6 vacuum to improve a set of chosen value-metrics. Based on this study, the following insights were developed: (1) the model identified several innovation opportunities to improve each value-metric; (2) varying the weighting scheme does not have a significant effect on the filtered data; (3) the top half of the dendrogram contains the most relevant clusters that present viable innovation opportunities; (4) the relevant clusters must be viewed from a systems thinking perspective, as a single chain that must be innovated for the most benefit; (5) implementing this model provokes a systems thinking approach in the user. This gives a substantial advantage over intuitive and qualitative approaches by providing insights into hidden relationships and identifying innovation opportunities in a system that might otherwise be ignored or unexplored.
    Opportunities for future work include developing a transfer-function system representing true relationships, performing SVD at every level of the coupling matrices to gain insights into the nature of transformation and cluster formation, comparing the clusters obtained to the failure modes associated with the corresponding value-metric for systematic prioritization, and comparing dendrogram clusters with the function-structure map to get detailed insights into clusters and their interactions.

    Event Correlated Usage Mapping in an Embedded Linux System - A Data Mining Approach

    A software system composed of applications running on embedded devices can be hard to monitor and debug due to the limited possibilities for extracting information about the complex process interactions. Logging and monitoring the system's behavior help in gaining insight into the system status. The information gathered can be used to improve the system and to help developers understand what caused a malfunction. This thesis explores the possibility of implementing an Event Sniffer that runs on an embedded Linux device and monitors processes and overall system performance to enable mapping between system usage and load on certain parts of the system. It also examines the use of data mining to process the large amount of data logged by the Event Sniffer and thereby find frequent sequential patterns that cause a bug to affect the system's performance. The final prototype of the Event Sniffer logs process CPU usage, memory usage, process function calls, interprocess communication, overall system performance and other application-specific data. To evaluate the data mining of the logged information, a bug pattern that caused a false malfunction was planted in the interprocess communication. The data mining analysis of the logged interprocess communication was able to find the planted bug pattern. A search for a memory leak with the help of data mining was also tested by mining function calls from a process. This test found sequential patterns that were unique to the periods when memory usage increased.

    Multi-level analysis of Malware using Machine Learning


    Is There a Corporate Debt Crisis?

    macroeconomics, Corporate, Debt, Crisis