9,963 research outputs found

    A taxonomy of supply chain innovations

    In this paper, a taxonomy of supply chain and logistics innovations is developed and presented. The taxonomy is based on an extensive literature survey of both theoretical research and case studies. The primary goals are to provide guidelines for choosing the most appropriate innovations for a company, and to help companies position themselves in the supply chain innovations landscape. To this end, the three dimensions of supply chain innovations, namely the goals, the supply chain attributes, and the innovation attributes, were identified and classified. The taxonomy allows for the efficient representation of critical supply chain innovation information, and serves the goals stated above, which are fundamental to companies in a multitude of industries.

    Learning ontology aware classifiers

    Many applications of data-driven knowledge discovery processes call for the exploration of data from multiple points of view that reflect different ontological commitments on the part of the learner. Of particular interest in this context are algorithms for learning classifiers from ontologies and data. Against this background, my dissertation research is aimed at the design and analysis of algorithms for the construction of robust, compact, accurate, and ontology-aware classifiers. We have precisely formulated the problem of learning pattern classifiers from attribute value taxonomies (AVT) and partially specified data. We have designed and implemented efficient and theoretically well-founded AVT-based classifier learners. Based on a general strategy of hypothesis refinement to search in a generalized hypothesis space, our AVT-guided learning algorithm adopts a general learning framework that takes into account the tradeoff between the complexity and the accuracy of the predictive models, which enables us to learn a classifier that is both compact and accurate. We have also extended our approach to learning compact and accurate classifiers from semantically heterogeneous data sources. We presented a principled way to reduce the problem of learning from semantically heterogeneous data to the problem of learning from distributed, partially specified data by reconciling semantic heterogeneity using AVT mappings, and we described a solution based on sufficient statistics.
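
As a rough illustration of how an attribute value taxonomy can absorb partially specified data, the sketch below counts observed values at every level of a small taxonomy. It is not the dissertation's algorithm: the taxonomy, the attribute values, and the function names are all hypothetical, chosen only to show that a value recorded at an abstract level (here "graduate") can still contribute to the counts of its ancestors.

```python
from collections import Counter

# Hypothetical attribute value taxonomy, stored as child -> parent.
PARENT = {
    "freshman": "undergraduate",
    "senior": "undergraduate",
    "masters": "graduate",
    "phd": "graduate",
    "undergraduate": "student",
    "graduate": "student",
}


def ancestors_and_self(value, parent):
    """Yield the value itself, then each ancestor up to the taxonomy root."""
    node = value
    while node is not None:
        yield node
        node = parent.get(node)


def taxonomy_counts(observed_values, parent):
    """Count each observation at its own node and at every ancestor node."""
    counts = Counter()
    for value in observed_values:
        for node in ancestors_and_self(value, parent):
            counts[node] += 1
    return counts


# "graduate" is a partially specified value: the leaf (masters or phd) is unknown.
observed = ["freshman", "phd", "graduate", "senior", "phd"]
counts = taxonomy_counts(observed, PARENT)
```

In this toy data the count at "graduate" includes both the two "phd" observations and the one partially specified "graduate" observation, while the leaf "masters" receives no count.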

    Abstraction, aggregation and recursion for generating accurate and simple classifiers

    An important goal of inductive learning is to generate accurate and compact classifiers from data. In a typical inductive learning scenario, instances in a data set are simply represented as ordered tuples of attribute values. In our research, we explore three methodologies to improve the accuracy and compactness of the classifiers: abstraction, aggregation, and recursion. Firstly, abstraction is aimed at the design and analysis of algorithms that generate and deal with taxonomies for the construction of compact and robust classifiers. In many applications of the data-driven knowledge discovery process, taxonomies have been shown to be useful in constructing compact, robust, and comprehensible classifiers. However, in many application domains, human-designed taxonomies are unavailable. We introduce algorithms for automated construction of taxonomies inductively from both structured (such as UCI Repository) and unstructured (such as text and biological sequences) data. We introduce AVT-Learner, an algorithm for automated construction of attribute value taxonomies (AVT) from data, and Word Taxonomy Learner (WTL), an algorithm for automated construction of word taxonomy from text and sequence data. We describe experiments on the UCI data sets and compare the performance of AVT-NBL (an AVT-guided Naive Bayes Learner) with that of the standard Naive Bayes Learner (NBL). Our results show that the AVTs generated by AVT-Learner are competitive with human-generated AVTs (in cases where such AVTs are available). AVT-NBL using AVTs generated by AVT-Learner achieves classification accuracies that are comparable to or higher than those obtained by NBL, and the resulting classifiers are significantly more compact than those generated by NBL.
Similarly, our experimental results with WTL and WTNBL on protein localization sequences and Reuters newswire text categorization data sets show that the proposed algorithms can generate Naive Bayes classifiers that are more compact and often more accurate than those produced by the standard Naive Bayes learner for the Multinomial Model. Secondly, we apply aggregation to construct features as a multiset of values for the intrusion detection task. For this task, we propose a bag of system calls representation for system call traces and describe misuse and anomaly detection results on the University of New Mexico (UNM) and MIT Lincoln Lab (MIT LL) system call sequences with the proposed representation. With the feature representation as input, we compare the performance of several machine learning techniques for misuse detection and show experimental results on anomaly detection. The results show that standard machine learning and clustering techniques using the simple bag of system calls representation, based on the system call traces generated by the operating system's kernel, are effective and often perform better than approaches that use foreign contiguous sequences in detecting intrusive behaviors of compromised processes. Finally, we construct a set of classifiers by recursive application of the Naive Bayes learning algorithms. The Naive Bayes (NB) classifier relies on the assumption that the instances in each class can be described by a single generative model. This assumption can be restrictive in many real-world classification tasks. We describe the recursive Naive Bayes learner (RNBL), which relaxes this assumption by constructing a tree of Naive Bayes classifiers for sequence classification, where each individual NB classifier in the tree is based on an event model (one model for each class at each node in the tree).
In our experiments on protein sequences, Reuters newswire documents, and UC-Irvine benchmark data sets, we observe that RNBL substantially outperforms the NB classifier. Furthermore, our experiments on the protein sequences and the text documents show that RNBL outperforms the C4.5 decision tree learner (using tests on sequence composition statistics as the splitting criterion) and yields accuracies that are comparable to those of support vector machines (SVM) using similar information.
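
The bag of system calls representation mentioned in the abstract above can be pictured with a minimal sketch: each trace is reduced to a vector of per-call counts, so the order of calls is discarded. The vocabulary, the example traces, and the function name are assumptions made for illustration and are not taken from the UNM or MIT LL data.

```python
from collections import Counter


def bag_of_system_calls(trace, vocabulary):
    """Map a system call trace to a vector of per-call counts, ignoring order.

    Calls that are not in the vocabulary are dropped in this sketch.
    """
    counts = Counter(trace)
    return [counts[call] for call in vocabulary]


# Hypothetical vocabulary and traces.
vocabulary = ["open", "read", "write", "close", "execve"]
trace_a = ["open", "read", "read", "write", "close"]
trace_b = ["open", "write", "read", "read", "close"]

vec_a = bag_of_system_calls(trace_a, vocabulary)
vec_b = bag_of_system_calls(trace_b, vocabulary)
```

The two traces differ only in the order of their calls, so they map to the same count vector. Any standard classifier or clustering method that accepts fixed-length numeric vectors could then take such vectors as input.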

    TAXONOMY DEVELOPMENT IN INFORMATION SYSTEMS: DEVELOPING A TAXONOMY OF MOBILE APPLICATIONS

    The complexity of the information systems field often lends itself to classification schemes, or taxonomies, which provide ways to understand the similarities and differences among objects under study. Developing a taxonomy, however, is a complex process that is often done in an ad hoc way. This research-in-progress paper uses the design science paradigm to develop a systematic method for taxonomy development in information systems. The method we propose uses an indicator or operational level model that combines both empirical-to-deductive and deductive-to-empirical approaches. We evaluate this method by using it to develop a taxonomy of mobile applications, which we have chosen because of their ever-increasing number and variety. The resulting taxonomy contains seven dimensions with fifteen characteristics. We demonstrate the usefulness of this taxonomy by analyzing a range of current and proposed mobile applications. From the results of this analysis we identify combinations of characteristics where applications are missing and thus are candidates for new and potentially useful applications.
    Keywords: taxonomy, design science, mobile application

    Data mining by means of generalized patterns

    The thesis focuses mainly on the study and application of pattern discovery algorithms that aggregate database knowledge to discover and exploit valuable correlations, hidden in the analyzed data, at different abstraction levels. The aim of the research effort described in this work is two-fold: the discovery of associations, in the form of generalized patterns, from large data collections, and the inference of semantic models, i.e., taxonomies and ontologies, suitable for driving the mining process.

    Data Mining

    Algorithm Selection Framework: A Holistic Approach to the Algorithm Selection Problem

    A holistic approach to the algorithm selection problem is presented. The “algorithm selection framework” uses a combination of user input and meta-data to streamline algorithm selection for any data analysis task. The framework removes the conjecture of the common trial-and-error strategy and generates a preference-ranked list of recommended analysis techniques. The framework is applied to nine analysis problems. Each of the recommended analysis techniques is implemented on the corresponding data sets. Algorithm performance is assessed using the primary metric of recall and the secondary metric of run time. In six of the problems, the recall of the top-ranked recommendation is considered excellent, with at least 95 percent of the best observed recall; the average of this metric is 79 percent due to two poorly performing recommendations. The top recommendation is Pareto efficient for three of the problems. The framework measures well against an a priori set of criteria. The framework provides value by filtering the candidate analytic techniques and, often, selecting a high-performing technique as the top-ranked recommendation. The user input and meta-data used by the framework contain information with high potential for effective algorithm selection. Future work should optimize the recommendation logic and expand the scope of techniques for other types of analysis problems. Further, the results of this proposed study should be leveraged in order to better understand the behavior of meta-learning models.
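
The two evaluation criteria named in the abstract above (recall relative to the best observed recall, and Pareto efficiency over recall and run time) can be sketched as follows. The technique names and measurements are invented for illustration, and the functions are one plausible reading of the criteria rather than the framework's actual logic.

```python
def recall_ratio(candidate, results):
    """Recall of the candidate as a fraction of the best observed recall."""
    best_recall = max(recall for recall, _ in results.values())
    return results[candidate][0] / best_recall


def is_pareto_efficient(candidate, results):
    """True if no other technique is at least as good on both recall and
    run time and strictly better on at least one of them."""
    recall, runtime = results[candidate]
    for name, (other_recall, other_runtime) in results.items():
        if name == candidate:
            continue
        no_worse = other_recall >= recall and other_runtime <= runtime
        strictly_better = other_recall > recall or other_runtime < runtime
        if no_worse and strictly_better:
            return False
    return True


# Hypothetical measurements: technique -> (recall, run time in seconds).
results = {
    "logistic_regression": (0.90, 2.0),
    "random_forest": (0.94, 10.0),
    "svm": (0.88, 30.0),
}

top_recommendation = "logistic_regression"
ratio = recall_ratio(top_recommendation, results)
excellent = ratio >= 0.95
efficient = is_pareto_efficient(top_recommendation, results)
```

With these made-up numbers the top recommendation reaches the 95 percent threshold and is Pareto efficient, because the only technique with higher recall is also slower. The sketch assumes the best observed recall is nonzero.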