
    Parallel Algorithm for Frequent Itemset Mining on Intel Many-core Systems

    Frequent itemset mining leads to the discovery of associations and correlations among items in large transactional databases. Apriori is a classical frequent itemset mining algorithm that employs iterative passes over the database, combined with generation of candidate itemsets based on the frequent itemsets found at the previous iteration and pruning of clearly infrequent itemsets. The Dynamic Itemset Counting (DIC) algorithm is a variation of Apriori that tries to reduce the number of passes made over a transactional database while keeping the number of itemsets counted in a pass relatively low. In this paper, we address the problem of accelerating DIC on the Intel Xeon Phi many-core system for the case when the transactional database fits in main memory. Intel Xeon Phi provides a large number of small compute cores with vector processing units. The paper presents a parallel implementation of DIC based on OpenMP technology and thread-level parallelism. We exploit a bit-based internal layout for transactions and itemsets. This technique reduces the memory space for storing the transactional database, simplifies the support count via logical bitwise operations, and allows for vectorization of this step. Experimental evaluation on the Intel Xeon CPU and Intel Xeon Phi coprocessor platforms with large synthetic and real databases showed good performance and scalability of the proposed algorithm. (Comment: Accepted for publication in Journal of Computing and Information Technology, http://cit.fer.hr)
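
    As an illustration of the bit-based layout, the sketch below (Python rather than the paper's C/OpenMP implementation; all names are illustrative) maps each item to a bit vector over the transactions, so the support of an itemset reduces to the popcount of the bitwise AND of its items' vectors.

        # Minimal sketch of bit-based support counting, assuming an in-memory
        # transactional database.  The paper's implementation adds thread-level
        # parallelism (OpenMP) and vectorization on top of this idea.

        def build_bit_columns(transactions, items):
            """Map each item to an integer bit vector over transactions:
            bit t is set when transaction t contains the item."""
            columns = {item: 0 for item in items}
            for t, transaction in enumerate(transactions):
                for item in transaction:
                    if item in columns:
                        columns[item] |= 1 << t
            return columns

        def support(itemset, columns):
            """Support of an itemset = popcount of the AND of its item columns."""
            items = iter(itemset)
            acc = columns[next(items)]      # assumes a non-empty itemset
            for item in items:
                acc &= columns[item]
            return bin(acc).count("1")

        # Example: how many transactions contain both 'a' and 'b'?
        transactions = [{"a", "b"}, {"a", "c"}, {"a", "b", "c"}, {"b"}]
        cols = build_bit_columns(transactions, {"a", "b", "c"})
        print(support({"a", "b"}, cols))    # -> 2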

    Improving mining efficiency: A new scheme for extracting association rules

    In the age of information technology, the amount of accumulated data is tremendous. Extracting association rules from this data is one of the important tasks in data mining. Most existing association rule mining algorithms assume that the data set can fit in memory. In this paper, we propose a practical and effective scheme to mine association rules from frequent patterns, called the Prefixfoldtree scheme (PFT scheme). The original dataset is divided into folds, and then from each fold the frequent patterns are mined using the tree projection approach. These frequent patterns are combined into one set, and finally interestingness constraints are used to extract the association rules. Experiments are conducted to illustrate the efficiency of our scheme.
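
    A rough sketch of the fold-based workflow follows (Python; the per-fold tree projection miner of the PFT scheme is replaced here by a naive stand-in, and all names and thresholds are illustrative rather than taken from the paper).

        from collections import Counter
        from itertools import combinations

        def mine_fold(transactions, min_support):
            """Naive per-fold frequent-itemset miner (a stand-in for the
            tree projection step used by the PFT scheme)."""
            counts = Counter()
            for t in transactions:
                items = sorted(t)
                for k in range(1, len(items) + 1):
                    for itemset in combinations(items, k):
                        counts[itemset] += 1
            return {s for s, c in counts.items() if c >= min_support}

        def pft_like_scheme(transactions, n_folds, min_support, min_confidence):
            """Divide the data into folds, mine each fold, combine the frequent
            patterns, then keep rules meeting the interestingness constraint."""
            folds = [transactions[i::n_folds] for i in range(n_folds)]
            patterns = set()
            for fold in folds:
                patterns |= mine_fold(fold, min_support)

            def support(itemset):
                return sum(1 for t in transactions if set(itemset) <= set(t))

            rules = []
            for itemset in patterns:
                if len(itemset) < 2:
                    continue
                for k in range(1, len(itemset)):
                    for antecedent in combinations(itemset, k):
                        consequent = tuple(i for i in itemset if i not in antecedent)
                        confidence = support(itemset) / support(antecedent)
                        if confidence >= min_confidence:
                            rules.append((antecedent, consequent, confidence))
            return rules

        # Example usage with a toy dataset.
        data = [{"bread", "milk"}, {"bread", "milk", "eggs"},
                {"milk", "eggs"}, {"bread", "milk"}]
        print(pft_like_scheme(data, n_folds=2, min_support=2, min_confidence=0.8))
        # -> [(('bread',), ('milk',), 1.0)]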

    How Does Science Come to Speak in the Courts? Citations, Intertexts, Expert Witnesses, Consequential Facts, and Reasoning

    Citations, in their highly conventionalized forms, visibly indicate each text's explicit use of the prior literature that embodies the knowledge and contentions of its field. This relation to prior texts has been called intertextuality in literary and literacy studies. Here, Bazerman discusses the citation practices and intertextuality in science and the law in theoretical and historical perspective, and considers the intersection of science and law by identifying the judicial rules that limit and shape the role of scientific literature in court proceedings. He emphasizes that, from the historical and theoretical analysis, it is clear that, in the US, judicial reasoning is an intertextually tight and self-referring system that pays only limited attention to documents outside the laws, precedents, and judicial rules. The window for scientific literature to enter the courts is narrow, focused, and highly filtered. It serves as a warrant for the expert witnesses' expertise, which in turn makes their opinion admissible in a way not available to ordinary witnesses.

    What is Probable Cause, and Why Should We Care?: The Costs, Benefits, and Meaning of Individualized Suspicion

    Taslitz defines probable cause as having four components: one quantitative, one qualitative, one temporal, and one moral. He focuses on the last of these components. Individualized suspicion, the US Supreme Court has suggested, is perhaps the most important of the four components of probable cause. That is a position with which he heartily agrees. The other three components each play only a supporting role. But individualized suspicion is the beating heart that gives probable cause its vitality.

    The Knowledge Level Approach To Intelligent Information System Design

    Traditional approaches to building intelligent information systems employ an ontology to define a representational structure for the data and information of interest within the target domain of the system. At runtime, the ontology provides a constrained template for the creation of the individual objects and relationships that together define the state of the system at a given point in time. The ontology also provides a vocabulary for expressing domain knowledge, typically in the form of rules (declarative knowledge) or methods (procedural knowledge). The system utilizes the encoded knowledge, often in conjunction with user input, to progress the state of the system towards the specific goals indicated by the users. While this approach has been very successful, it has some drawbacks. Regardless of the implementation paradigm, the knowledge is essentially buried in the code and therefore inaccessible to most domain experts. The knowledge also tends to be very domain specific and is not extensible at runtime. This paper describes a variation on the traditional approach that employs an explicit knowledge level within the ontology to mitigate the identified drawbacks.
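
    A toy contrast may make the distinction concrete (Python; the rule, attributes, and data are hypothetical and not drawn from the paper): in the traditional style the declarative knowledge is buried in code, whereas a knowledge-level representation keeps it as inspectable data that can be extended at runtime.

        # Traditional style: the domain knowledge is hard-coded in the program,
        # so it is invisible to domain experts and cannot be extended at runtime.
        def assess_patient_traditional(patient):
            if patient["temperature"] > 38.0 and patient["cough"]:
                return "possible infection"
            return "no finding"

        # Knowledge-level style: the same rule lives alongside the ontology as
        # data, so it can be listed, edited, or added to while the system runs.
        knowledge_level = [
            {
                "name": "possible-infection",
                "conditions": [("temperature", ">", 38.0), ("cough", "==", True)],
                "conclusion": "possible infection",
            },
        ]

        OPERATORS = {">": lambda a, b: a > b, "==": lambda a, b: a == b}

        def assess_patient(patient, rules):
            for rule in rules:
                if all(OPERATORS[op](patient[attr], value)
                       for attr, op, value in rule["conditions"]):
                    return rule["conclusion"]
            return "no finding"

        print(assess_patient({"temperature": 38.5, "cough": True}, knowledge_level))
        # -> possible infection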

    Doctor of Philosophy

    With the growing national dissemination of the electronic health record (EHR), there are expectations that the public will benefit from biomedical research and discovery enabled by electronic health data. Clinical data are needed for many diseases and conditions to meet the demands of rapidly advancing genomic and proteomic research. Many biomedical research advancements require rapid access to clinical data as well as broad population coverage. A fundamental issue in the secondary use of clinical data for scientific research is the identification of study cohorts of individuals with a disease or medical condition of interest. The problem addressed in this work is the need for generalized, efficient methods to identify cohorts in the EHR for use in biomedical research. To approach this problem, an associative classification framework was designed with the goal of accurate and rapid identification of cases for biomedical research: (1) a set of exemplars for a given medical condition is presented to the framework, (2) a predictive rule set comprised of EHR attributes is generated by the framework, and (3) the rule set is applied to the EHR to identify additional patients that may have the specified condition. Based on this functionality, the approach was termed the 'cohort amplification' framework. The development and evaluation of the cohort amplification framework are the subject of this dissertation. An overview of the framework design is presented. Improvements to some standard associative classification methods are described and validated. A qualitative evaluation of predictive rules to identify diabetes cases and a study of the accuracy of identification of asthma cases in the EHR using framework-generated prediction rules are reported. The framework demonstrated accurate and reliable rules to identify diabetes and asthma cases in the EHR and contributed to methods for identification of biomedical research cohorts.
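
    The three-step workflow can be sketched as follows (Python; the attributes, thresholds, and rule representation are purely illustrative, and the dissertation's actual associative classification methods are only stubbed out).

        from collections import Counter
        from itertools import combinations

        def learn_rules(exemplars, min_support=2, max_len=2):
            """Step 2: derive a predictive rule set from exemplar patients.
            Each rule is a set of (attribute, value) pairs co-occurring in at
            least `min_support` exemplars (a stand-in for the associative
            classification methods described in the dissertation)."""
            counts = Counter()
            for patient in exemplars:
                items = sorted(patient.items())
                for k in range(1, max_len + 1):
                    for combo in combinations(items, k):
                        counts[frozenset(combo)] += 1
            return [rule for rule, c in counts.items() if c >= min_support]

        def amplify_cohort(ehr, rules):
            """Step 3: apply the rule set to the EHR to flag additional
            patients who may have the condition of interest."""
            flagged = []
            for patient_id, record in ehr.items():
                if any(rule <= set(record.items()) for rule in rules):
                    flagged.append(patient_id)
            return flagged

        # Step 1: exemplar patients for a hypothetical condition.
        exemplars = [
            {"dx_code": "E11", "med": "metformin"},
            {"dx_code": "E11", "med": "metformin"},
        ]
        ehr = {
            "p1": {"dx_code": "E11", "med": "metformin", "age": 54},
            "p2": {"dx_code": "J45", "med": "albuterol", "age": 30},
        }
        rules = learn_rules(exemplars)
        print(amplify_cohort(ehr, rules))  # -> ['p1']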

    Asbestos Lessons: The Consequences of Asbestos Litigation

    Abstract not available.

    Unlocking the “Virtual Cage” of Wildlife Surveillance

    The electronic surveillance of wildlife has grown more extensive than ever. For instance, thousands of wolves wear collars transmitting signals to wildlife biologists. Some collars inject wolves with tranquilizers that allow for their immediate capture if they stray outside of the boundaries set by anthropocentric management policies. Hunters have intercepted the signals from surveillance collars and have used this information to track and slaughter the animals. While the ostensible reason for the surveillance programs is to facilitate the peaceful coexistence of humanity and wildlife, the reality is less benign—an outdoor version of Bentham’s Panopticon. This Article reconceptualizes the enterprise of wildlife surveillance. Without suggesting that animals have standing to assert constitutional rights, the Article posits a public interest in protecting the privacy of wildlife. The very notion of wildness implies privacy. The law already protects the bodily integrity of animals to some degree, and a protected zone of privacy is penumbral to this core protection, much the same way that human privacy emanates from narrower guarantees against government intrusion. Policy implications follow that are akin to the rules under the Fourth Amendment limiting the government’s encroachment on human privacy. Just as the police cannot install a wiretap without demonstrating a particularized investigative need for which all less intrusive methods would be insufficient, so too should surveillance of wildlife necessitate a specific showing of urgency. A detached, neutral authority should review all applications for electronic monitoring of wildlife. Violations of the rules should result in substantial sanctions. The Article concludes by considering—and refuting—foreseeable objections to heightened requirements for the surveillance of wildlife.