
    Three Modern Roles for Logic in AI

    We consider three modern roles for logic in artificial intelligence, which are based on the theory of tractable Boolean circuits: (1) logic as a basis for computation, (2) logic for learning from a combination of data and knowledge, and (3) logic for reasoning about the behavior of machine learning systems. Comment: To be published in PODS 2020.
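
The abstract stays at the level of roles, but the machinery behind "logic as a basis for computation" is reasoning tasks such as model counting over Boolean representations. The sketch below is only a toy illustration under that reading, not the paper's method: it counts satisfying assignments of a small CNF by brute-force Shannon expansion, whereas the tractable-circuit approach the paper builds on compiles the formula into a circuit (e.g. d-DNNF) on which counting is linear in circuit size.

```python
from typing import List, Set

def count_models(clauses: List[Set[int]], variables: List[int]) -> int:
    """#SAT by Shannon expansion: count assignments of `variables`
    (positive ints) satisfying every clause (a set of +/- literals)."""
    if not clauses:                       # all clauses satisfied
        return 2 ** len(variables)        # remaining variables are free
    if any(not c for c in clauses):       # an empty clause is unsatisfiable
        return 0
    v, rest = variables[0], variables[1:]
    total = 0
    for lit in (v, -v):                   # branch on v = True, then v = False
        reduced = [c - {-lit} for c in clauses if lit not in c]
        total += count_models(reduced, rest)
    return total

# (x1 or x2) and (not x1 or x3) has 4 models over x1..x3.
print(count_models([{1, 2}, {-1, 3}], [1, 2, 3]))
```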

    Exact Learning Augmented Naive Bayes Classifier

    Earlier studies have shown that the classification accuracies of Bayesian networks (BNs) obtained by maximizing the conditional log likelihood (CLL) of a class variable, given the feature variables, were higher than those obtained by maximizing the marginal likelihood (ML). However, the differences between the performances of the two scores in those studies may be attributed to the fact that they used approximate learning algorithms, not exact ones. This paper compares the classification accuracies of BNs learned approximately using the CLL with those learned exactly using the ML. The results demonstrate that the classification accuracies of BNs obtained by maximizing the ML are higher than those obtained by maximizing the CLL on large data. However, the results also demonstrate that the classification accuracies of exactly learned BNs using the ML are much worse than those of other methods when the sample size is small and the class variable has numerous parents. To resolve this problem, we propose an exact learning augmented naive Bayes classifier (ANB), which guarantees that the class variable has no parents. The proposed method is guaranteed to asymptotically estimate the same class posterior as the exactly learned BN. Comparison experiments demonstrated the superior performance of the proposed method. Comment: 29 pages.
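
The ML vs. CLL distinction is easy to state concretely. The sketch below is not the paper's algorithm: it fixes a plain naive Bayes structure, fits Laplace-smoothed maximum-likelihood parameters, and contrasts the joint data log-likelihood with the conditional log-likelihood of the class given the features. The paper itself scores structures with the Bayesian marginal likelihood and searches over augmented naive Bayes structures, both of which this toy omits.

```python
import numpy as np

def nb_log_joint(X, y, n_classes, alpha=1.0):
    """Per-sample, per-class log P(c, x_1..x_d) for a naive Bayes model
    with Laplace-smoothed maximum-likelihood parameters."""
    n, d = X.shape
    log_prior = np.log((np.bincount(y, minlength=n_classes) + alpha)
                       / (n + alpha * n_classes))
    log_joint = np.tile(log_prior, (n, 1))
    for j in range(d):
        vals = int(X[:, j].max()) + 1
        counts = np.zeros((n_classes, vals))
        np.add.at(counts, (y, X[:, j]), 1)
        log_cpt = np.log((counts + alpha)
                         / (counts.sum(axis=1, keepdims=True) + alpha * vals))
        log_joint += log_cpt[:, X[:, j]].T
    return log_joint

def ll_and_cll(X, y, n_classes):
    """Joint data log-likelihood vs. conditional log-likelihood of the class."""
    lj = nb_log_joint(X, y, n_classes)
    m = lj.max(axis=1, keepdims=True)
    log_evidence = (m + np.log(np.exp(lj - m).sum(axis=1, keepdims=True))).ravel()
    ll = lj[np.arange(len(y)), y].sum()                      # scores P(c, x)
    cll = (lj[np.arange(len(y)), y] - log_evidence).sum()    # scores P(c | x)
    return ll, cll

# Tiny synthetic example: 2 classes, 3 binary features.
rng = np.random.default_rng(0)
y = rng.integers(0, 2, size=200)
X = (rng.random((200, 3)) < np.where(y[:, None] == 1, 0.7, 0.3)).astype(int)
print(ll_and_cll(X, y, n_classes=2))
```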

    Development of data-mining technique for seismic vulnerability assessment

    Assessment of the seismic vulnerability of urban infrastructure is a pressing problem, since the damage caused by earthquakes is significant. Despite the complexity of such tasks, today's machine learning methods allow the use of "fast" methods for assessing seismic vulnerability. The article proposes a methodology for assessing the characteristics of typical urban objects that affect their seismic resistance, using classification and clustering methods. For the analysis, we use the k-means and hk-means clustering methods, with the Euclidean distance as the measure of proximity. The optimal number of clusters is determined using the elbow method. A decision-making model for the seismic resistance of an urban object is presented, and the variables that have the greatest impact on seismic resistance are identified. The study shows that the clustering results coincide with expert estimates, and that the characteristics of typical urban objects can be determined through data modeling using clustering algorithms.
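
The clustering-plus-elbow pipeline described here is standard, so a compact sketch is easy to give. The article uses k-means and hk-means (an R-style hierarchical k-means); the version below is only an illustrative Python equivalent with scikit-learn, using a made-up feature matrix in place of the real building survey data and the within-cluster sum of squared Euclidean distances (inertia) for the elbow heuristic.

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical feature matrix: one row per building, columns such as
# construction year, number of storeys, material code, soil category.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))          # stand-in for the real survey data

inertias = {}
for k in range(1, 10):
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    inertias[k] = km.inertia_          # within-cluster sum of squares

# Elbow heuristic: pick the k after which inertia stops dropping sharply.
for k, v in inertias.items():
    print(k, round(v, 1))
```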

    Privacy versus Information in Keystroke Latency Data

    The computer science education research field studies how students learn computer science-related concepts such as programming and algorithms. One of the major goals of the field is to help students learn CS concepts that are often difficult to grasp because students rarely encounter them in primary or secondary education. In order to help struggling students, information on the learning process of students has to be collected. In many introductory programming courses, process data is automatically collected in the form of source code snapshots. Source code snapshots usually include at least the source code of the student's program and a timestamp. Studies ranging from identifying at-risk students to inferring programming experience and topic knowledge have been conducted using source code snapshots. However, replicating source code snapshot-based studies is currently hard, as data is rarely shared due to privacy concerns. Source code snapshot data often includes many attributes that can be used for identification, for example, the name of the student or the student number. There can even be hidden identifiers in the data that can be used for identification even if obvious identifiers are removed. For example, keystroke data from source code snapshots can be used for identification based on the distinct typing profiles of students. Hence, simply removing explicit identifiers such as names and student numbers is not enough to protect the privacy of the users who have supplied the data. At the same time, removing all keystroke data would decrease the value of the data significantly and possibly preclude replication studies. In this work, we investigate how keystroke data from a programming context could be modified to prevent keystroke latency-based identification whilst still retaining valuable information in the data. This study is the first step in enabling the sharing of anonymized source code snapshots. We investigate the degree of anonymization required to make identification of students based on their typing patterns unreliable. Then, we study whether the modified keystroke data can still be used to infer the programming experience of the students, as a case study of whether the anonymized typing patterns have retained at least some informative value. We show that it is possible to modify the data so that keystroke latency-based identification is no longer accurate, but the programming experience of the students can still be inferred, i.e., the data still has value to researchers.
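
The abstract does not spell out how the keystroke data are modified, so the sketch below is only one plausible transformation, not the procedure from the paper: jitter and coarsen inter-keystroke latencies so that fine-grained, per-user timing profiles blur, while coarse speed information that may still correlate with programming experience survives. The function name and parameters are illustrative assumptions.

```python
import numpy as np

def anonymize_latencies(latencies_ms, bucket_ms=100, noise_ms=30, seed=0):
    """Illustrative coarsening of keystroke latencies (not the paper's method):
    add random jitter, then round to coarse buckets, so distinctive per-user
    timing patterns blur while overall typing speed remains roughly visible."""
    rng = np.random.default_rng(seed)
    noisy = (np.asarray(latencies_ms, dtype=float)
             + rng.normal(0, noise_ms, size=len(latencies_ms)))
    return np.clip(np.round(noisy / bucket_ms) * bucket_ms, 0, None)

# Example: inter-keystroke intervals (ms) from one programming session.
print(anonymize_latencies([82, 143, 250, 97, 1200]))
```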

    Machine-Learning Classifiers for Malware Detection Using Data Features

    The spread of ransomware has risen exponentially over the past decade, causing huge financial damage to multiple organizations. Various anti-ransomware firms have suggested methods for preventing malware threats, but the growing pace, scale and sophistication of malware present the anti-malware industry with ever more challenges. Recent literature indicates that academics and anti-virus organizations have begun to use machine learning as well as fundamental modeling techniques for the research and identification of malware, since conventional signature-based anti-virus programs struggle to identify unfamiliar malware and to track new forms of malware. In this study, a machine learning-based malware evaluation framework was adopted that consists of several modules: dataset compilation into two separate classes (malicious and benign software), file disassembly, data processing, decision making, and updated malware identification. The data processing module uses greyscale images, import functions and opcode n-grams to extract malware features. The decision-making module detects malware and recognizes suspected malware. Different classifiers were considered in the research methodology for the detection and classification of malware. Effectiveness was validated on the basis of the accuracy of the complete process.
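
Of the features listed, opcode n-grams are the easiest to illustrate. The sketch below is a minimal stand-in, not the study's pipeline: hypothetical opcode sequences from disassembled binaries are turned into n-gram count features and fed to one of the classifier families such a comparison typically includes (here a random forest); the greyscale-image and import-function features are omitted, and the sequences and labels are invented for the example.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.ensemble import RandomForestClassifier
from sklearn.pipeline import make_pipeline

# Hypothetical opcode sequences from disassembled binaries;
# labels: 1 = malicious, 0 = benign.
samples = [
    "push mov call xor jmp",
    "mov mov add ret",
    "xor xor call jmp push",
    "mov add sub ret",
]
labels = [1, 0, 1, 0]

# Opcode uni- and bi-gram counts as features, then a random-forest classifier.
model = make_pipeline(
    CountVectorizer(analyzer="word", ngram_range=(1, 2)),
    RandomForestClassifier(n_estimators=100, random_state=0),
)
model.fit(samples, labels)
print(model.predict(["xor call jmp push"]))
```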