
    CS 7720: Data Mining

    This course studies the fundamental concepts, issues, and techniques of data mining. Topics include basics of data, data preprocessing, feature selection/extraction, frequent pattern and association/correlation mining, classification, clustering, outlier analysis, OLAP/OLAM, contrast mining, applications, etc.

    CS 3200/5200: Theoretical Foundations of Computing

    CS 3200/5200 is an introduction to (a) formal language and automata theory and (b) computability. For (a), we will examine mechanisms for defining the syntax of languages and devices for recognizing languages. Along with the fundamentals of these two topics, the course will investigate the relationships between language definition mechanisms and language recognition devices. For (b), we will study decision problems, the Church-Turing thesis, the undecidability of the Halting Problem, and problem reduction and undecidability. The text will be the third edition of Languages and Machines: An Introduction to the Theory of Computer Science, by Thomas Sudkamp.

    CS 7700: Advanced Database Systems

    Introduction to design concepts, operating principles, current trends, and research issues in database systems.

    CS 400/600-02: Data Structures and Software Design


    CS 405/605-01: Introduction to Database Management Systems

    Logical and physical aspects of database management systems are surveyed. Data models including entity-relationship (ER) and relational models are presented. Physical implementation (data organization and indexing) methods are discussed. Query languages including SQL, relational algebra, relational calculus, and QBE are studied. Students will gain experience in creating and manipulating a database, and gain knowledge of professional and ethical responsibility and of the importance of data privacy and security.

    Masquerader Detection Using OCLEP: One-Class Classification Using Length Statistics of Emerging Patterns

    We introduce a new method for masquerader detection that uses only a user’s own data for training, called One-class Classification using Length statistics of Emerging Patterns (OCLEP). Emerging patterns (EPs) are patterns whose support increases from one dataset/class to another by a large ratio, and they have been very useful in earlier studies. OCLEP classifies a case T as self or masquerader by using the average length of EPs obtained by contrasting T against sets of samples of a user’s normal data. It is based on the observation that one needs long EPs to differentiate instances from a common class, but short EPs to differentiate instances from different classes. OCLEP has two novel features: for training it uses EPs mined from just the self class; for classification it uses the length statistics instead of the EPs themselves. Experiments show that OCLEP can achieve very good accuracy while keeping the false positive rate low, that it achieves a slightly better area under the ROC curve than SVM, and that it can achieve good results when other approaches cannot. OCLEP requires little effort in choosing parameters, whereas the SVM requires significant tuning and it is hard to reach the theoretical optimal result. These features imply that OCLEP is a good complementary component for a robust masquerader detection system, even though its average false positive rate is not as good as SVM’s.
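The length-based decision rule described in this abstract can be illustrated with a small sketch. It is not the authors' implementation: it assumes cases are sets of items (for example, command names), considers only patterns with zero support in the normal samples, enumerates them by brute force, and uses hypothetical names (`minimal_jumping_eps`, `average_ep_length`, `classify`) and a hypothetical `threshold` value.

```python
from itertools import combinations


def minimal_jumping_eps(case, samples, max_len=4):
    """Minimal itemsets contained in `case` but in none of `samples`.

    Brute-force illustration only; `max_len` caps the pattern size.
    """
    case = frozenset(case)
    samples = [frozenset(s) for s in samples]
    found = []
    for size in range(1, max_len + 1):
        for combo in combinations(sorted(case), size):
            candidate = frozenset(combo)
            # Skip supersets of a pattern already found: they are not minimal.
            if any(ep <= candidate for ep in found):
                continue
            # Keep the candidate if no normal sample contains it.
            if not any(candidate <= s for s in samples):
                found.append(candidate)
    return found


def average_ep_length(case, samples, max_len=4):
    """Average length of the minimal patterns separating `case` from `samples`."""
    eps = minimal_jumping_eps(case, samples, max_len)
    if not eps:
        return float("inf")  # nothing separates the case from the samples
    return sum(len(ep) for ep in eps) / len(eps)


def classify(case, samples, threshold=1.5, max_len=4):
    """Long average patterns suggest 'self'; short ones suggest 'masquerader'.

    The threshold is a hypothetical value chosen for this example.
    """
    if average_ep_length(case, samples, max_len) > threshold:
        return "self"
    return "masquerader"


# Hypothetical normal sessions for one user.
normal = [{"ls", "cd", "vi"}, {"ls", "cd", "gcc"}, {"cd", "vi", "gcc"}]

print(classify({"ls", "vi", "gcc"}, normal))    # average length 3.0 -> "self"
print(classify({"nmap", "ssh", "ls"}, normal))  # average length 1.0 -> "masquerader"
```

In this toy data, a case built from the user's usual commands can only be separated from the normal sessions by a three-item pattern, while an unfamiliar case is separated by single items, which mirrors the long-versus-short observation in the abstract.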

    A Clustering Comparison Measure Using Density Profiles and its Application to the Discovery of Alternate Clusterings

    Data clustering is a fundamental and very popular method of data analysis. Its subjective nature, however, means that different clustering algorithms or different parameter settings can produce widely varying and sometimes conflicting results. This has led to the use of clustering comparison measures to quantify the degree of similarity between alternative clusterings. Existing measures, though, can be limited in their ability to assess similarity and sometimes generate unintuitive results. They also cannot be applied to compare clusterings that contain different data points, which is important for scenarios such as data stream analysis. In this paper, we introduce a new clustering similarity measure, known as ADCO, which aims to address some limitations of existing measures by allowing greater flexibility of comparison via the use of density profiles to characterize a clustering. In particular, it adopts a ‘data mining style’ philosophy of clustering comparison, whereby two clusterings are considered more similar if they are likely to give rise to similar types of prediction models. Furthermore, we show that this new measure can be applied as a highly effective objective function within a new algorithm, known as MAXIMUS, for generating alternate clusterings.