Integrating E-Commerce and Data Mining: Architecture and Challenges
We show that the e-commerce domain can provide all the right ingredients for
successful data mining and claim that it is a killer domain for data mining. We
describe an integrated architecture, based on our experience at Blue Martini
Software, for supporting this integration. The architecture can dramatically
reduce the pre-processing, cleaning, and data understanding effort often
documented to take 80% of the time in knowledge discovery projects. We
emphasize the need for data collection at the application server layer (not the
web server) in order to support logging of data and metadata that is essential
to the discovery process. We describe the data transformation bridges required
from the transaction processing systems and customer event streams (e.g.,
clickstreams) to the data warehouse. We detail the mining workbench, which
needs to provide multiple views of the data through reporting, data mining
algorithms, visualization, and OLAP. We conclude with a set of challenges.
Comment: KDD workshop: WebKDD 200
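The case for logging at the application server rather than the web server can be illustrated with a minimal, hypothetical sketch. The function name and record fields below are illustrative, not Blue Martini's actual schema; the point is that the application server can attach session state and business metadata that a web server log of raw HTTP requests never sees.

```python
# Hypothetical sketch: one enriched clickstream record built at the
# application server layer. Field names are illustrative only.
import json
from datetime import datetime, timezone

def log_app_event(session_id, user_id, event_type, metadata):
    """Return a JSON line suitable for a bridge into the data warehouse."""
    record = {
        "ts": datetime.now(timezone.utc).isoformat(),
        "session_id": session_id,
        "user_id": user_id,      # resolved from the app server's session state
        "event": event_type,     # a business event, not a raw page hit
        "metadata": metadata,    # e.g. SKU, quantity, search terms
    }
    return json.dumps(record)

line = log_app_event("s-42", "u-7", "add_to_cart", {"sku": "A100", "qty": 2})
```

Because records like this already carry user identity and business context, the downstream warehouse bridge needs far less of the cleaning and joining that the abstract says dominates knowledge discovery projects.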
Applications of concurrent access patterns in web usage mining
This paper builds on the original data mining and modelling research which has proposed the discovery of novel structural relation patterns, applying the approach in web usage mining. The focus of attention here is on concurrent access patterns (CAP), where an overarching framework illuminates the methodology for web access patterns post-processing. Data pre-processing, pattern discovery and pattern analysis all proceed in association with access patterns mining, CAP mining and CAP modelling. Pruning and selection of access patterns takes place as necessary, allowing further CAP mining and modelling to be pursued in the search for the most interesting concurrent access patterns. It is shown that higher-level CAPs can be modelled in a way which brings greater structure to bear on the process of knowledge discovery. Experiments with real-world datasets highlight the applicability of the approach in web navigation.
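As a rough illustration of one simple notion of concurrency (this is not the paper's CAP mining algorithm, and the function below is a made-up toy), two page accesses within a session can be treated as concurrent when their access intervals overlap:

```python
# Toy sketch: find pairs of pages whose access intervals overlap within
# one session. A real CAP miner works over many sessions and patterns.
from itertools import combinations

def concurrent_pairs(session):
    """session: {page: (start, end)} access intervals in seconds."""
    pairs = []
    for (p1, (s1, e1)), (p2, (s2, e2)) in combinations(sorted(session.items()), 2):
        if s1 < e2 and s2 < e1:  # the two intervals overlap
            pairs.append((p1, p2))
    return pairs

found = concurrent_pairs({"home": (0, 10), "search": (5, 20), "help": (30, 40)})
```

Pattern discovery then amounts to finding such co-occurrences that recur across many sessions often enough to be interesting, which is where the pruning and selection steps described above come in.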
Excellent diagnostic characteristics for ultrafast gene profiling of DEFA1-IL1B-LTF in detection of prosthetic joint infections
The timely and exact diagnosis of prosthetic joint infection (PJI) is crucial for surgical decision-making. Intraoperatively, delivery of the result within an hour is required. Alpha-defensin lateral immunoassay of joint fluid (JF) is precise for the intraoperative exclusion of PJI; however, for patients with a limited amount of JF and/or in cases where the JF is bloody, this test is unhelpful. Important information is hidden in periprosthetic tissues that may much better reflect the current status of implant pathology. We therefore investigated the utility of the gene expression patterns of 12 candidate genes (TLR1, -2, -4, -6, and -10, DEFA1, LTF, IL1B, BPI, CRP, IFNG, and DEFB4A) previously associated with infection for detection of PJI in periprosthetic tissues of patients with total joint arthroplasty (TJA) (n = 76) reoperated for PJI (n = 38) or aseptic failure (n = 38), using the ultrafast quantitative reverse transcription-PCR (RT-PCR) Xxpress system (BJS Biotechnologies Ltd.). Advanced data-mining algorithms were applied for data analysis. For PJI, we detected elevated mRNA expression levels of DEFA1 (P < 0.0001), IL1B (P < 0.0001), LTF (P < 0.0001), TLR1 (P = 0.02), and BPI (P = 0.01) in comparison to those in tissues from aseptic cases. A feature selection algorithm revealed that the DEFA1-IL1B-LTF pattern was the most appropriate for detection/exclusion of PJI, achieving 94.5% sensitivity and 95.7% specificity, with likelihood ratios (LRs) for positive and negative results of 16.3 and 0.06, respectively. Taken together, the results show that DEFA1-IL1B-LTF gene expression detection by use of ultrafast qRT-PCR linked to an electronic calculator allows detection of patients with a high probability of PJI within 45 min after sampling. Further testing on a larger cohort of patients is needed.
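The likelihood ratios quoted above follow from sensitivity and specificity by the standard formulas, sketched below. Note that the paper's reported LR+ of 16.3 was presumably computed from unrounded confusion-matrix counts, so plugging in the rounded 94.5%/95.7% figures does not reproduce it exactly; the LR- of 0.06 does come out the same.

```python
# Standard diagnostic-test formulas (not the paper's code):
# LR+ = sensitivity / (1 - specificity), LR- = (1 - sensitivity) / specificity.
def likelihood_ratios(sensitivity, specificity):
    lr_pos = sensitivity / (1.0 - specificity)   # LR+ = TPR / FPR
    lr_neg = (1.0 - sensitivity) / specificity   # LR- = FNR / TNR
    return lr_pos, lr_neg

lr_pos, lr_neg = likelihood_ratios(0.945, 0.957)
```

A large LR+ and an LR- near zero are what make a marker panel useful both for ruling PJI in and for ruling it out.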
Is the Stack Distance Between Test Case and Method Correlated With Test Effectiveness?
Mutation testing is a means to assess the effectiveness of a test suite and
its outcome is considered more meaningful than code coverage metrics. However,
despite several optimizations, mutation testing requires a significant
computational effort and has not been widely adopted in industry. Therefore, we
study in this paper whether test effectiveness can be approximated using a more
light-weight approach. We hypothesize that a test case is more likely to detect
faults in methods that are close to the test case on the call stack than in
methods that the test case accesses indirectly through many other methods.
Based on this hypothesis, we propose the minimal stack distance between test
case and method as a new test measure, which expresses how close any test case
comes to a given method, and study its correlation with test effectiveness. We
conducted an empirical study with 21 open-source projects, which comprise in
total 1.8 million LOC, and show that a correlation exists between stack
distance and test effectiveness. The correlation reaches a strength up to 0.58.
We further show that a classifier using the minimal stack distance along with
additional easily computable measures can predict the mutation testing result
of a method with 92.9% precision and 93.4% recall. Hence, such a classifier can
be taken into consideration as a light-weight alternative to mutation testing
or as a preceding, less costly step to that.
Comment: EASE 201
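The measure itself is easy to state: the minimal stack distance of a method is the smallest depth at which it appears below the test case across all recorded call stacks. The sketch below is an illustrative reconstruction from that definition, not the authors' implementation.

```python
# Illustrative sketch: minimal stack distance between a test case and a
# method, given recorded call chains that each start at the test case.
def minimal_stack_distance(call_stacks, method):
    """Return the smallest index of `method` over all stacks, or None."""
    distances = [stack.index(method) for stack in call_stacks if method in stack]
    return min(distances) if distances else None

stacks = [
    ["testFoo", "service", "repository", "mapper"],
    ["testFoo", "mapper"],
]
d = minimal_stack_distance(stacks, "mapper")
```

Here "mapper" is reached both at depth 3 and, via a direct call, at depth 1, so its minimal stack distance is 1; under the paper's hypothesis, the direct call is the one most likely to detect a fault in it.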
Animals, anthropocentrism, and morality: analysing the discourse of the animal issue
This dissertation identifies and criticises a fundamental characteristic of the philosophical
discourse surrounding the animal issue: the underlying anthropocentric reasoning that informs
the accounts of both philosophy of mind and moral philosophy. Such reasoning works from
human paradigms as the only possible starting point of the analysis.
Accordingly, the aim of my dissertation is to show how anthropocentric reasoning
and its implications distort the inquiry of the animal debate. In extracting the erroneous biases
from the debate, my project enables an important shift in the starting line of the philosophical
inquiry of the animal issue.
In chapters one and two, I focus on philosophy of mind. I show how philosophical
accounts that are based on anthropocentric a priori reasoning are inattentive to the relevant
empirical findings regarding animals' mental capacities. Employing a conceptual line of
argument, I demonstrate that starting the analysis from a human paradigm creates a rigid
conceptual framework that unjustifiably excludes the possibility of associating the relevant
empirical findings in the research. Furthermore, I show how the common approaches to the
issue of animals' belief and intentions deny that animals can have these capacities, and I
demonstrate how such denials can be avoided.
The philosophical discourse that I examine denies intentional mental capacities to
animals. Such denials take place, I maintain, because the analysis is anthropocentric: it uses
humans' most sophisticated capacities as the only possible benchmark for evaluating animals'
mental abilities. A central example of such anthropocentric reasoning is the oft-mentioned
view that there is a necessary link between language and intentionality. Such a link indeed
characterises humans. Yet the claim that there is no intentionality without language is a
problematic framework for analysing the supposed intentionality of non-linguistic and pre-linguistic
creatures. Employing a standard that applies to normal, adult humans excludes the
possibility of animals' intentionality from the outset. It seems, however, that intentionality is a
capacity that evolves in stages, and that simple intentional mental states do not require
language. At the same time, such an analysis ignores, to a large extent, cases of attributing
intentionality to pre-linguistic humans and even normal, adult humans. Thus, I show how the
denial that animals may have intentional mental capacities results in a double standard.
In chapters three to six, I critically examine the anthropocentric nature of the debate
concerning animals' moral status. The anthropocentric reasoning relates to the conditions of
moral status in an oversimplified manner. I show that human prototypes, e.g., rational agency
and autonomy, have mistakenly served as conditions for either moral status in general or of a
particular type. Seemingly, using such conditions excludes from the proffered moral domain
not only animals, but also human moral patients. Yet eventually only animals are excluded
from the proffered moral domain. I identify and criticise the manoeuvre that enables this
outcome. That is, although the proffered conditions are based on individual characteristics of
moral agents, they are applied in a collective manner in order to include human moral patients
in the moral domain under examination. I also show that when animals are granted moral
status, this status appears to be subjugated by human needs and interests, and therefore the
very potential to substantiate animal moral status becomes problematic.
Significantly, I also criticise arguments in favour of animals' moral status, claiming
that they sustain the oversimplified nature of the inquiry, hence reproducing the major
problems of the arguments they were originally designed to refute. As part of my critique
towards both such arguments and anthropocentric reasoning, I suggest a non-anthropocentric
framework that avoids oversimplification with regard to the conditions of moral status. The
aspiration of anthropocentric reasoning as well as of pro-animals philosophers is to find a
common denominator that is allegedly shared by all members of the moral community as the
single foundation of moral status, which consists of individual characteristics. My framework
challenges this aspiration by showing that this common denominator cannot account for all
cases. The framework that I suggest enables establishing moral statuses upon distinctive
foundations, and at the same time, my proposal avoids falling into the trap of speciesism.
Measuring extraordinary rendition and international cooperation
Following the launch of the War on Terror, the United States of America established a global rendition network that saw the US Central Intelligence Agency transfer terrorist suspects to secret detention sites across the world. There has been considerable debate over how many countries participated in rendition, secret detention and interrogation during the post-9/11 period, and conventional accounts of foreign complicity suggest that diverse countries were involved, including many established democracies. However, research on rendition has continually suffered from uncertainty and a lack of data and systematic empirical evidence, due to the secret nature of counterterrorism cooperation. In this article, I argue that it is possible to study the practice of rendition, unlike many other forms of clandestine security cooperation, as it is partially observable. Specifically, suspected extraordinary rendition flight paths can be tracked using publicly available flight data. This article uses the world's largest set of public flight data relating to rendition to estimate cross-country collaboration in rendition, secret detention and interrogation. The results suggest 307 likely rendition flights and 15 new participating countries beyond the 54 known cases, with cross-validation tests demonstrating high levels of model accuracy.
Towards an automated classification of spreadsheets
Many spreadsheets in the wild have neither documentation nor categorization associated with them. This makes it difficult to apply spreadsheet research that targets specific spreadsheet domains such as finance or databases. We introduce with this paper a methodology to automatically classify spreadsheets into different domains. We exploit existing data mining classification algorithms using spreadsheet-specific features. The algorithms were trained and validated with cross-validation using the EUSES corpus, with up to 89% accuracy. The best algorithm was applied to the larger Enron corpus in order to get some insight from it and to demonstrate the usefulness of this work.
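The evaluation setup can be sketched in a self-contained toy form. The feature names and the nearest-centroid classifier below are illustrative stand-ins, not the paper's features or algorithms; the sketch only shows the shape of k-fold cross-validation over labelled spreadsheets.

```python
# Toy sketch of k-fold cross-validation for spreadsheet classification.
# Features and classifier are illustrative, not the paper's.
from statistics import mean

def nearest_centroid_predict(train, x):
    """Classify x by the label whose training centroid is closest."""
    by_label = {}
    for feats, lbl in train:
        by_label.setdefault(lbl, []).append(feats)
    centroids = {
        lbl: tuple(mean(col) for col in zip(*rows))
        for lbl, rows in by_label.items()
    }
    def sq_dist(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return min(centroids, key=lambda lbl: sq_dist(centroids[lbl], x))

def cross_validate(data, k=3):
    """Mean accuracy over k interleaved folds."""
    folds = [data[i::k] for i in range(k)]
    accuracies = []
    for i in range(k):
        test = folds[i]
        train = [row for j, fold in enumerate(folds) if j != i for row in fold]
        correct = sum(nearest_centroid_predict(train, feats) == lbl
                      for feats, lbl in test)
        accuracies.append(correct / len(test))
    return mean(accuracies)

# toy features per spreadsheet: (formula_ratio, numeric_cell_ratio)
data = [
    ((0.90, 0.80), "financial"), ((0.85, 0.90), "financial"),
    ((0.80, 0.85), "financial"), ((0.10, 0.20), "database"),
    ((0.15, 0.10), "database"), ((0.20, 0.15), "database"),
]
acc = cross_validate(data, k=3)
```

In the paper's setting, the rows would be EUSES spreadsheets with domain labels, and a standard classification algorithm would replace the toy nearest-centroid rule; the held-out folds are what yield the reported accuracy figure.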