
    On NIS-Apriori Based Data Mining in SQL

    We have proposed a framework of Rough Non-deterministic Information Analysis (RNIA) for tables with non-deterministic information, and applied RNIA to analyzing tables with uncertainty. We have also developed the RNIA software tool in Prolog and getRNIA in Python. In addition to these two tools, we now consider an RNIA software tool in SQL for handling large data sets. This paper reports the current state of the prototype, named NIS-Apriori in SQL, which will afford us a more convenient environment for data analysis. International Joint Conference on Rough Sets (IJCRS 2016), October 7-11, 2016, Santiago, Chile

    A Proposal of a Privacy-preserving Questionnaire by Non-deterministic Information and Its Analysis

    We focus on a questionnaire consisting of three-choice or multiple-choice questions, and propose a privacy-preserving questionnaire based on non-deterministic information. Each respondent usually selects one of the multiple choices, and each answer is stored as a tuple in a data table. The organizer of the questionnaire analyzes the table and obtains rules and tendencies. If the table contains personal information, the organizer needs to employ analytical procedures with privacy-preserving functionality. In this paper, we propose a new framework in which each respondent intentionally answers with non-deterministic information instead of deterministic information. For example, a respondent answers 'either A, B, or C' instead of the actual choice A, intentionally diluting the choice. This is similar in concept to k-anonymity. Non-deterministic information is desirable for protecting each respondent's information. We follow the framework of Rough Non-deterministic Information Analysis (RNIA), and apply RNIA to the privacy-preserving questionnaire. In current data mining algorithms, tuples with non-deterministic information may be removed during data cleaning. However, RNIA can handle such tuples as well as tuples with deterministic information. By using RNIA, we can consider new types of privacy-preserving questionnaire. 2016 IEEE International Conference on Big Data, December 5-8, 2016, Washington DC, US
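    The diluted answer described in this abstract can be pictured as a set that always contains the true choice. The sketch below is only an illustration: the function name `dilute_answer`, the parameter `k` (how many choices the reported set holds), and the uniform random selection of the extra choices are all assumptions, not details taken from the paper.

```python
import random


def dilute_answer(actual, choices, k, rng=random):
    """Return a set of k choices that contains the respondent's actual choice.

    Hypothetical helper: the paper does not specify how the extra
    choices are selected; here they are drawn uniformly at random.
    """
    if actual not in choices:
        raise ValueError("actual choice must be one of the offered choices")
    if not 1 <= k <= len(choices):
        raise ValueError("k must be between 1 and the number of choices")
    others = [c for c in choices if c != actual]
    # k - 1 decoy choices are added to the true one.
    return frozenset([actual] + rng.sample(others, k - 1))


if __name__ == "__main__":
    # With three choices and k = 3 the answer is always 'either A, B, or C'.
    print(sorted(dilute_answer("A", ["A", "B", "C"], 3)))
```

    With `k = 1` the answer stays deterministic, and larger `k` hides the true choice among more alternatives.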

    On Two Apriori-Based Rule Generators: Apriori in Prolog and Apriori in SQL

    This paper focuses on two Apriori-based rule generators. The first is the rule generator in Prolog and C, and the second is the one in SQL. They are named Apriori in Prolog and Apriori in SQL, respectively. Each rule generator is based on the Apriori algorithm. However, each rule generator has its own properties. Apriori in Prolog employs the equivalence classes defined by table data sets and follows the framework of rough sets. On the other hand, Apriori in SQL employs a search for rule generation and does not make use of equivalence classes. This paper clarifies the properties of these two rule generators and considers effective applications of each to existing data sets

    NIS-Apriori-based rule generation with three-way decisions and its application system in SQL

    This study presents non-deterministic information systems-Apriori-based (NIS-Apriori-based) rule generation from table data sets with incomplete information, its SQL implementation, and the unique characteristics of the new framework. Additionally, a few unsolved research topics are proposed based on the framework. We follow the framework of NISs and propose certain rules and possible rules based on possible world semantics. Although each rule τ depends on a large number of possible tables, we prove that each rule τ is determined by examining only two τ-dependent possible tables. The NIS-Apriori algorithm is an adjusted Apriori algorithm that can handle such tables. Furthermore, it is logically sound and complete with regard to the rules. Subsequently, the implementation of the NIS-Apriori algorithm in SQL is described, and a few new topics arising from NIS-Apriori-based rule generation are confirmed. One of the topics considered is the possibility of estimating missing values via the obtained certain rules. The proposed methodology and the environment yielded by NIS-Apriori-based rule generation in SQL are useful for table data analysis with three-way decisions.
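    To make the possible-world reading concrete, the sketch below enumerates every possible table of a tiny NIS by brute force and classifies an implication. It does not use the two-table result the abstract mentions, and it assumes definitions the abstract does not spell out: an implication is taken to be a certain rule if it meets the support and accuracy thresholds in every possible table, and a possible rule if it does so in at least one. The names `possible_tables`, `holds_in`, and `classify`, and the toy data, are hypothetical.

```python
from itertools import product


def possible_tables(nis):
    """Yield each deterministic table obtained by picking one value per cell.

    `nis` is a non-empty list of rows; each row maps an attribute
    to the set of values that cell may take.
    """
    attrs = list(nis[0].keys())
    cell_choices = [sorted(row[a]) for row in nis for a in attrs]
    for pick in product(*cell_choices):
        it = iter(pick)
        yield [{a: next(it) for a in attrs} for _ in nis]


def holds_in(table, cond, dec, alpha, beta):
    """True if cond => dec meets support >= alpha and accuracy >= beta."""
    n_cond = sum(1 for r in table if all(r[a] == v for a, v in cond.items()))
    both = {**cond, **dec}
    n_both = sum(1 for r in table if all(r[a] == v for a, v in both.items()))
    return n_cond > 0 and n_both / len(table) >= alpha and n_both / n_cond >= beta


def classify(nis, cond, dec, alpha, beta):
    """Brute-force classification over all possible tables (exponential)."""
    results = [holds_in(t, cond, dec, alpha, beta) for t in possible_tables(nis)]
    if all(results):
        return "certain"
    if any(results):
        return "possible"
    return "not a rule"


# Toy NIS: two cells are non-deterministic, giving four possible tables.
NIS = [
    {"P": {"a"}, "Q": {"x"}},
    {"P": {"a", "b"}, "Q": {"x"}},
    {"P": {"b"}, "Q": {"x", "y"}},
]

if __name__ == "__main__":
    print(classify(NIS, {"P": "a"}, {"Q": "x"}, alpha=0.3, beta=1.0))
```

    Enumeration grows exponentially with the number of non-deterministic cells, which is why a result that reduces the check to two tables matters in practice.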

    A Proposal of Machine Learning by Rule Generation from Tables with Non-deterministic Information and Its Prototype System

    A logical framework for Machine Learning by Rule Generation (MLRG) from tables with non-deterministic information is proposed, and its prototype system in SQL is implemented. In MLRG, the certain rules defined in Rough Non-deterministic Information Analysis (RNIA) are obtained first, and each uncertain attribute value is then estimated so as to yield as many certain rules as possible, because the certain rules give the most reliable information. This strategy is similar to maximum likelihood estimation in statistics. By repeating this process, a standard table and the rules in that table are learned (or estimated) from a given table with non-deterministic information. Even though it is hard to know the actual unknown values, MLRG gives a plausible estimate. International Joint Conference on Rough Sets (IJCRS 2017), 3-7 July, 2017, Olsztyn, Poland

    NIS-Apriori Algorithm with a Target Descriptor for Handling Rules Supported by Minor Instances

    For each implication τ: Condition_part ⇒ Decision_part defined in table data sets, we say τ is a rule if τ satisfies appropriate constraints, i.e., support(τ) ≥ α and accuracy(τ) ≥ β for two threshold values α and β (0 < α, β ≤ 1). If τ is a rule for a relatively high α, we say τ is supported by major instances. On the other hand, if τ is a rule only for a lower α, we say τ is supported by minor instances. This paper focuses on rules supported by minor instances, and clarifies some problems. Then, the NIS-Apriori algorithm, which was proposed for handling rules supported by major instances from tables with information incompleteness, is extended to the NIS-Apriori algorithm with a target descriptor. The effectiveness of the new algorithm is examined by some experiments. The Seventh International Symposium on Integrated Uncertainty in Knowledge Modelling and Decision Making (IUKM 2019), 27-29 March, 2019, Nara, Japan
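    The two thresholds can be checked directly on a small deterministic table. The sketch below assumes the usual definitions, which the abstract does not state: support(τ) is the fraction of all rows matching both parts, and accuracy(τ) is the fraction of rows matching the condition part that also match the decision part. The function names and the toy table are hypothetical.

```python
from fractions import Fraction


def support_and_accuracy(rows, condition, decision):
    """Return (support, accuracy) of condition => decision.

    `rows` is a list of dicts; `condition` and `decision` map
    attribute names to required values.
    """
    def matches(row, descriptors):
        return all(row.get(a) == v for a, v in descriptors.items())

    n_cond = sum(1 for r in rows if matches(r, condition))
    n_both = sum(
        1 for r in rows if matches(r, condition) and matches(r, decision)
    )
    support = Fraction(n_both, len(rows)) if rows else Fraction(0)
    accuracy = Fraction(n_both, n_cond) if n_cond else Fraction(0)
    return support, accuracy


def is_rule(rows, condition, decision, alpha, beta):
    """True if support >= alpha and accuracy >= beta."""
    support, accuracy = support_and_accuracy(rows, condition, decision)
    return support >= alpha and accuracy >= beta


TABLE = [
    {"color": "red", "size": "s", "class": "a"},
    {"color": "red", "size": "m", "class": "a"},
    {"color": "red", "size": "s", "class": "b"},
    {"color": "blue", "size": "s", "class": "b"},
    {"color": "blue", "size": "m", "class": "b"},
]

if __name__ == "__main__":
    # color=red => class=a: support 2/5, accuracy 2/3
    print(support_and_accuracy(TABLE, {"color": "red"}, {"class": "a"}))
```

    In this toy table the implication color=red ⇒ class=a is a rule for α = 0.3 but not for α = 0.5, which is the sense in which lowering α admits rules supported by fewer instances.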

    An adjusted Apriori algorithm to itemsets defined by tables and an improved rule generator with three-way decisions

    The NIS-Apriori algorithm, which is extended from the Apriori algorithm, was proposed for rule generation from non-deterministic information systems and implemented in SQL. The realized system handles the concepts of certainty, possibility, and three-way decisions. This paper focuses on a characteristic of table data sets: there is usually a fixed decision attribute. Therefore, it is enough to handle itemsets with one decision attribute, and one frequent itemset defines one implication. We make use of these characteristics and eliminate unnecessary itemsets to improve execution performance. Experiments with the software tool implemented in Python confirm the improved performance. International Joint Conference on Rough Sets, IJCRS 2020, June 29 – July 3, 2020, Havana, Cuba (changed to an online event due to the spread of COVID-19)

    Profiling relational data: a survey

    Profiling data to determine metadata about a given dataset is an important and frequent activity of any IT professional and researcher and is necessary for various use-cases. It encompasses a vast array of methods to examine datasets and produce metadata. Among the simpler results are statistics, such as the number of null values and distinct values in a column, its data type, or the most frequent patterns of its data values. Metadata that are more difficult to compute involve multiple columns, namely correlations, unique column combinations, functional dependencies, and inclusion dependencies. Further techniques detect conditional properties of the dataset at hand. This survey provides a classification of data profiling tasks and comprehensively reviews the state of the art for each class. In addition, we review data profiling tools and systems from research and industry. We conclude with an outlook on the future of data profiling beyond traditional profiling tasks and beyond relational databases
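    The simpler single-column statistics named in this abstract (null count, distinct count, data type, most frequent value) can be sketched in a few lines. This is an illustrative helper only: the name `profile_column` and the shape of the returned dictionary are assumptions, and `None` is assumed to stand for a null value.

```python
from collections import Counter


def profile_column(values):
    """Compute basic single-column metadata for a list of cell values."""
    non_null = [v for v in values if v is not None]
    counts = Counter(non_null)
    return {
        "rows": len(values),
        "nulls": len(values) - len(non_null),
        "distinct": len(counts),
        # Names of the Python types observed among non-null values.
        "types": sorted({type(v).__name__ for v in non_null}),
        # The single most frequent value with its count (empty if all null).
        "most_common": counts.most_common(1),
    }


if __name__ == "__main__":
    print(profile_column(["a", None, "b", "a"]))
```

    Multi-column metadata such as functional or inclusion dependencies needs considerably more machinery and is not attempted here.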

    Intrusion detection and prevention of web service attacks for software as a service: Fuzzy association rules vs fuzzy associative patterns

    Cloud computing inherits all the security vulnerabilities of systems, networks, and Web Services, in particular for software as a service (SaaS), where business applications or services are provided over the Cloud as Web Services (WS). Hence, WS-based applications must be protected against loss of integrity, confidentiality and availability when they are deployed to the Cloud environment. Many existing IDP systems address only attacks that mostly occur at the PaaS and IaaS levels. In this paper, we present our fuzzy association rule-based (FAR) and fuzzy associative pattern-based (FAP) intrusion detection and prevention (IDP) systems for defending against WS attacks at the SaaS level. Our experimental results have validated the capabilities of these two IDP systems in terms of detection of known attacks and prediction of new variant attacks with accuracy close to 100%. For each transaction carried out over the Cloud platform, detection, prevention or prediction is completed in less than five seconds. In load and volume testing on the SaaS where the system is under stress (at a workload of 5000 concurrent users submitting normal, suspicious and malicious transactions over a time interval of 300 seconds), the FAR IDP system provides close to 95% service availability to normal transactions. Future work involves determining more quality attributes besides service availability, such as latency, throughput and accountability, for a more trustworthy SaaS.

    Doctor of Philosophy

    With the growing national dissemination of the electronic health record (EHR), there are expectations that the public will benefit from biomedical research and discovery enabled by electronic health data. Clinical data are needed for many diseases and conditions to meet the demands of rapidly advancing genomic and proteomic research. Many biomedical research advancements require rapid access to clinical data as well as broad population coverage. A fundamental issue in the secondary use of clinical data for scientific research is the identification of study cohorts of individuals with a disease or medical condition of interest. The problem addressed in this work is the need for generalized, efficient methods to identify cohorts in the EHR for use in biomedical research. To approach this problem, an associative classification framework was designed with the goal of accurate and rapid identification of cases for biomedical research: (1) a set of exemplars for a given medical condition are presented to the framework, (2) a predictive rule set comprised of EHR attributes is generated by the framework, and (3) the rule set is applied to the EHR to identify additional patients that may have the specified condition. Based on this functionality, the approach was termed the 'cohort amplification' framework. The development and evaluation of the cohort amplification framework are the subject of this dissertation. An overview of the framework design is presented. Improvements to some standard associative classification methods are described and validated. A qualitative evaluation of predictive rules to identify diabetes cases and a study of the accuracy of identification of asthma cases in the EHR using framework-generated prediction rules are reported. The framework demonstrated accurate and reliable rules to identify diabetes and asthma cases in the EHR and contributed to methods for identification of biomedical research cohorts.