Integrating E-Commerce and Data Mining: Architecture and Challenges
We show that the e-commerce domain can provide all the right ingredients for
successful data mining and claim that it is a killer domain for data mining. We
describe an integrated architecture, based on our experience at Blue Martini
Software, for supporting this integration. The architecture can dramatically
reduce the pre-processing, cleaning, and data understanding effort often
documented to take 80% of the time in knowledge discovery projects. We
emphasize the need for data collection at the application server layer (not the
web server) in order to support logging of data and metadata that is essential
to the discovery process. We describe the data transformation bridges required
from the transaction processing systems and customer event streams (e.g.,
clickstreams) to the data warehouse. We detail the mining workbench, which
needs to provide multiple views of the data through reporting, data mining
algorithms, visualization, and OLAP. We conclude with a set of challenges.
Comment: KDD workshop: WebKDD 200
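The case for logging at the application server rather than the web server can be illustrated with a minimal, hypothetical sketch. The function name and record fields below are illustrative, not Blue Martini's actual schema; the point is that the application server can attach session state and business metadata that a web server log of raw HTTP requests never sees.

```python
# Hypothetical sketch: one enriched clickstream record built at the
# application server layer. Field names are illustrative only.
import json
from datetime import datetime, timezone

def log_app_event(session_id, user_id, event_type, metadata):
    """Return a JSON line suitable for a bridge into the data warehouse."""
    record = {
        "ts": datetime.now(timezone.utc).isoformat(),
        "session_id": session_id,
        "user_id": user_id,      # resolved from the app server's session state
        "event": event_type,     # a business event, not a raw page hit
        "metadata": metadata,    # e.g. SKU, quantity, search terms
    }
    return json.dumps(record)

line = log_app_event("s-42", "u-7", "add_to_cart", {"sku": "A100", "qty": 2})
```

Because records like this already carry user identity and business context, the downstream warehouse bridge needs far less of the cleaning and joining that the abstract says dominates knowledge discovery projects.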
Applications of concurrent access patterns in web usage mining
This paper builds on the original data mining and modelling research which has proposed the discovery of novel structural relation patterns, applying the approach in web usage mining. The focus of attention here is on concurrent access patterns (CAP), where an overarching framework illuminates the methodology for web access patterns post-processing. Data pre-processing, pattern discovery and pattern analysis all proceed in association with access patterns mining, CAP mining and CAP modelling. Pruning and selection of access patterns takes place as necessary, allowing further CAP mining and modelling to be pursued in the search for the most interesting concurrent access patterns. It is shown that higher-level CAPs can be modelled in a way which brings greater structure to bear on the process of knowledge discovery. Experiments with real-world datasets highlight the applicability of the approach in web navigation.
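As a rough illustration of one simple notion of concurrency (this is not the paper's CAP mining algorithm, and the function below is a made-up toy), two page accesses within a session can be treated as concurrent when their access intervals overlap:

```python
# Toy sketch: find pairs of pages whose access intervals overlap within
# one session. A real CAP miner works over many sessions and patterns.
from itertools import combinations

def concurrent_pairs(session):
    """session: {page: (start, end)} access intervals in seconds."""
    pairs = []
    for (p1, (s1, e1)), (p2, (s2, e2)) in combinations(sorted(session.items()), 2):
        if s1 < e2 and s2 < e1:  # the two intervals overlap
            pairs.append((p1, p2))
    return pairs

found = concurrent_pairs({"home": (0, 10), "search": (5, 20), "help": (30, 40)})
```

Pattern discovery then amounts to finding such co-occurrences that recur across many sessions often enough to be interesting, which is where the pruning and selection steps described above come in.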
Excellent diagnostic characteristics for ultrafast gene profiling of DEFA1-IL1B-LTF in detection of prosthetic joint infections
The timely and exact diagnosis of prosthetic joint infection (PJI) is crucial for surgical decision-making. Intraoperatively, delivery of the result within an hour is required. Alpha-defensin lateral immunoassay of joint fluid (JF) is precise for the intraoperative exclusion of PJI; however, for patients with a limited amount of JF and/or in cases where the JF is bloody, this test is unhelpful. Important information is hidden in periprosthetic tissues that may much better reflect the current status of implant pathology. We therefore investigated the utility of the gene expression patterns of 12 candidate genes (TLR1, -2, -4, -6, and -10, DEFA1, LTF, IL1B, BPI, CRP, IFNG, and DEFB4A) previously associated with infection for detection of PJI in periprosthetic tissues of patients with total joint arthroplasty (TJA) (n = 76) reoperated for PJI (n = 38) or aseptic failure (n = 38), using the ultrafast quantitative reverse transcription-PCR (RT-PCR) Xxpress system (BJS Biotechnologies Ltd.). Advanced data-mining algorithms were applied for data analysis. For PJI, we detected elevated mRNA expression levels of DEFA1 (P < 0.0001), IL1B (P < 0.0001), LTF (P < 0.0001), TLR1 (P = 0.02), and BPI (P = 0.01) in comparison to those in tissues from aseptic cases. A feature selection algorithm revealed that the DEFA1-IL1B-LTF pattern was the most appropriate for detection/exclusion of PJI, achieving 94.5% sensitivity and 95.7% specificity, with likelihood ratios (LRs) for positive and negative results of 16.3 and 0.06, respectively. Taken together, the results show that DEFA1-IL1B-LTF gene expression detection by use of ultrafast qRT-PCR linked to an electronic calculator allows detection of patients with a high probability of PJI within 45 min after sampling. Further testing on a larger cohort of patients is needed.
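The likelihood ratios quoted above follow from sensitivity and specificity by the standard formulas, sketched below. Note that the paper's reported LR+ of 16.3 was presumably computed from unrounded confusion-matrix counts, so plugging in the rounded 94.5%/95.7% figures does not reproduce it exactly; the LR- of 0.06 does come out the same.

```python
# Standard diagnostic-test formulas (not the paper's code):
# LR+ = sensitivity / (1 - specificity), LR- = (1 - sensitivity) / specificity.
def likelihood_ratios(sensitivity, specificity):
    lr_pos = sensitivity / (1.0 - specificity)   # LR+ = TPR / FPR
    lr_neg = (1.0 - sensitivity) / specificity   # LR- = FNR / TNR
    return lr_pos, lr_neg

lr_pos, lr_neg = likelihood_ratios(0.945, 0.957)
```

A large LR+ and an LR- near zero are what make a marker panel useful both for ruling PJI in and for ruling it out.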
Is the Stack Distance Between Test Case and Method Correlated With Test Effectiveness?
Mutation testing is a means to assess the effectiveness of a test suite and
its outcome is considered more meaningful than code coverage metrics. However,
despite several optimizations, mutation testing requires a significant
computational effort and has not been widely adopted in industry. Therefore, we
study in this paper whether test effectiveness can be approximated using a more
light-weight approach. We hypothesize that a test case is more likely to detect
faults in methods that are close to the test case on the call stack than in
methods that the test case accesses indirectly through many other methods.
Based on this hypothesis, we propose the minimal stack distance between test
case and method as a new test measure, which expresses how close any test case
comes to a given method, and study its correlation with test effectiveness. We
conducted an empirical study with 21 open-source projects, which comprise in
total 1.8 million LOC, and show that a correlation exists between stack
distance and test effectiveness. The correlation reaches a strength up to 0.58.
We further show that a classifier using the minimal stack distance along with
additional easily computable measures can predict the mutation testing result
of a method with 92.9% precision and 93.4% recall. Hence, such a classifier can
be taken into consideration as a light-weight alternative to mutation testing
or as a preceding, less costly step to that.
Comment: EASE 201
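The measure itself is easy to state: the minimal stack distance of a method is the smallest depth at which it appears below the test case across all recorded call stacks. The sketch below is an illustrative reconstruction from that definition, not the authors' implementation.

```python
# Illustrative sketch: minimal stack distance between a test case and a
# method, given recorded call chains that each start at the test case.
def minimal_stack_distance(call_stacks, method):
    """Return the smallest index of `method` over all stacks, or None."""
    distances = [stack.index(method) for stack in call_stacks if method in stack]
    return min(distances) if distances else None

stacks = [
    ["testFoo", "service", "repository", "mapper"],
    ["testFoo", "mapper"],
]
d = minimal_stack_distance(stacks, "mapper")
```

Here "mapper" is reached both at depth 3 and, via a direct call, at depth 1, so its minimal stack distance is 1; under the paper's hypothesis, the direct call is the one most likely to detect a fault in it.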
Animals, anthropocentrism, and morality: analysing the discourse of the animal issue
This dissertation identifies and criticises a fundamental characteristic of the philosophical
discourse surrounding the animal issue: the underlying anthropocentric reasoning that informs
the accounts of both philosophy of mind and moral philosophy. Such reasoning works from
human paradigms as the only possible starting point of the analysis.
Accordingly, the aim of my dissertation is to show how anthropocentric reasoning
and its implications distort the inquiry of the animal debate. In extracting the erroneous biases
from the debate, my project enables an important shift in the starting line of the philosophical
inquiry of the animal issue.
In chapters one and two, I focus on philosophy of mind. I show how philosophical
accounts that are based on anthropocentric a priori reasoning are inattentive to the relevant
empirical findings regarding animals' mental capacities. Employing a conceptual line of
argument, I demonstrate that starting the analysis from a human paradigm creates a rigid
conceptual framework that unjustifiably excludes the possibility of associating the relevant
empirical findings in the research. Furthermore, I show how the common approaches to the
issue of animals' belief and intentions deny that animals can have these capacities, and I
demonstrate how such denials can be avoided.
The philosophical discourse that I examine denies intentional mental capacities to
animals. Such denials take place, I maintain, because the analysis is anthropocentric: it uses
humans' most sophisticated capacities as the only possible benchmark for evaluating animals'
mental abilities. A central example of such anthropocentric reasoning is the oft-mentioned
view that there is a necessary link between language and intentionality. Such a link indeed
characterises humans. Yet the claim that there is no intentionality without language is a
problematic framework for analysing the supposed intentionality of non-linguistic and pre-linguistic
creatures. Employing a standard that applies to normal, adult humans excludes the
possibility of animals' intentionality from the outset. It seems, however, that intentionality is a
capacity that evolves in stages, and that simple intentional mental states do not require
language. At the same time, such an analysis ignores, to a large extent, cases of attributing
intentionality to pre-linguistic humans and even normal, adult humans. Thus, I show how the
denial that animals may have intentional mental capacities results in a double standard.
In chapters three to six, I critically examine the anthropocentric nature of the debate
concerning animals' moral status. The anthropocentric reasoning relates to the conditions of
moral status in an oversimplified manner. I show that human prototypes, e.g., rational agency
and autonomy, have mistakenly served as conditions for either moral status in general or of a
particular type. Seemingly, using such conditions excludes from the proffered moral domain
not only animals, but also human moral patients. Yet eventually only animals are excluded
from the proffered moral domain. I identify and criticise the manoeuvre that enables this
outcome. That is, although the proffered conditions are based on individual characteristics of
moral agents, they are applied in a collective manner in order to include human moral patients
in the moral domain under examination. I also show that when animals are granted moral
status, this status appears to be subjugated by human needs and interests, and therefore the
very potential to substantiate animal moral status becomes problematic.
Significantly, I also criticise arguments in favour of animals' moral status, claiming
that they sustain the oversimplified nature of the inquiry, hence reproducing the major
problems of the arguments they were originally designed to refute. As part of my critique
towards both such arguments and anthropocentric reasoning, I suggest a non-anthropocentric
framework that avoids oversimplification with regard to the conditions of moral status. The
aspiration of anthropocentric reasoning as well as of pro-animals philosophers is to find a
common denominator that is allegedly shared by all members of the moral community as the
single foundation of moral status, which consists of individual characteristics. My framework
challenges this aspiration by showing that this common denominator cannot account for all
cases. The framework that I suggest enables establishing moral statuses upon distinctive
foundations, and at the same time, my proposal avoids falling into the trap of speciesism.
Measuring extraordinary rendition and international cooperation
Following the launch of the War on Terror, the United States of America established a global rendition network that saw the US Central Intelligence Agency transfer terrorist suspects to secret detention sites across the world. There has been considerable debate over how many countries participated in rendition, secret detention and interrogation during the post-9/11 period, and conventional accounts of foreign complicity suggest that diverse countries were involved, including many established democracies. However, research on rendition has continually suffered from uncertainty and a lack of data and systematic empirical evidence, due to the secret nature of counterterrorism cooperation. In this article, I argue that it is possible to study the practice of rendition, unlike many other forms of clandestine security cooperation, as it is partially observable. Specifically, suspected extraordinary rendition flight paths can be tracked using publicly available flight data. This article uses the world's largest set of public flight data relating to rendition to estimate cross-country collaboration in rendition, secret detention and interrogation. The results suggest 307 likely rendition flights and 15 new participating countries beyond the 54 known cases, with cross-validation tests demonstrating high levels of model accuracy.
Towards an automated classification of spreadsheets
Many spreadsheets in the wild have neither documentation nor categorization associated with them. This makes it difficult to apply spreadsheet research that targets specific spreadsheet domains such as finance or databases. We introduce with this paper a methodology to automatically classify spreadsheets into different domains. We exploit existing data mining classification algorithms using spreadsheet-specific features. The algorithms were trained and validated with cross-validation using the EUSES corpus, with up to 89% accuracy. The best algorithm was applied to the larger Enron corpus in order to get some insight from it and to demonstrate the usefulness of this work.
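The evaluation setup can be sketched in a self-contained toy form. The feature names and the nearest-centroid classifier below are illustrative stand-ins, not the paper's features or algorithms; the sketch only shows the shape of k-fold cross-validation over labelled spreadsheets.

```python
# Toy sketch of k-fold cross-validation for spreadsheet classification.
# Features and classifier are illustrative, not the paper's.
from statistics import mean

def nearest_centroid_predict(train, x):
    """Classify x by the label whose training centroid is closest."""
    by_label = {}
    for feats, lbl in train:
        by_label.setdefault(lbl, []).append(feats)
    centroids = {
        lbl: tuple(mean(col) for col in zip(*rows))
        for lbl, rows in by_label.items()
    }
    def sq_dist(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return min(centroids, key=lambda lbl: sq_dist(centroids[lbl], x))

def cross_validate(data, k=3):
    """Mean accuracy over k interleaved folds."""
    folds = [data[i::k] for i in range(k)]
    accuracies = []
    for i in range(k):
        test = folds[i]
        train = [row for j, fold in enumerate(folds) if j != i for row in fold]
        correct = sum(nearest_centroid_predict(train, feats) == lbl
                      for feats, lbl in test)
        accuracies.append(correct / len(test))
    return mean(accuracies)

# toy features per spreadsheet: (formula_ratio, numeric_cell_ratio)
data = [
    ((0.90, 0.80), "financial"), ((0.85, 0.90), "financial"),
    ((0.80, 0.85), "financial"), ((0.10, 0.20), "database"),
    ((0.15, 0.10), "database"), ((0.20, 0.15), "database"),
]
acc = cross_validate(data, k=3)
```

In the paper's setting, the rows would be EUSES spreadsheets with domain labels, and a standard classification algorithm would replace the toy nearest-centroid rule; the held-out folds are what yield the reported accuracy figure.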