    Combining classifiers to identify online databases

    Passport: Enabling Accurate Country-Level Router Geolocation using Inaccurate Sources

    When does Internet traffic cross international borders? This question has major geopolitical, legal and social implications and is surprisingly difficult to answer. A critical stumbling block is a dearth of tools that accurately map routers traversed by Internet traffic to the countries in which they are located. This paper presents Passport: a new approach for efficient, accurate country-level router geolocation and a system that implements it. Passport provides location predictions with limited active measurements, using machine learning to combine information from IP geolocation databases, router hostnames, whois records, and ping measurements. We show that Passport substantially outperforms existing techniques, and identify cases where paths traverse countries with implications for security, privacy, and performance.

    Ensemble of Example-Dependent Cost-Sensitive Decision Trees

    Several real-world classification problems are example-dependent cost-sensitive in nature: the costs of misclassification vary between examples and not only between classes. However, standard classification methods do not take these costs into account and assume a constant cost for misclassification errors. Previous work has proposed methods that incorporate financial costs into the training of different algorithms, with the example-dependent cost-sensitive decision tree algorithm giving the highest savings. In this paper we propose a new framework of ensembles of example-dependent cost-sensitive decision trees. The framework consists of creating different example-dependent cost-sensitive decision trees on random subsamples of the training set, and then combining them using three different combination approaches. Moreover, we propose two new cost-sensitive combination approaches: cost-sensitive weighted voting and cost-sensitive stacking, the latter being based on the cost-sensitive logistic regression method. Finally, using five different databases from four real-world applications (credit card fraud detection, churn modeling, credit scoring and direct marketing), we evaluate the proposed method against state-of-the-art example-dependent cost-sensitive techniques, namely cost-proportionate sampling, Bayes minimum risk and cost-sensitive decision trees. The results show that the proposed algorithms achieve higher savings on all databases. Comment: 13 pages, 6 figures, submitted for possible publication.
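
The idea of weighting ensemble members by how much cost they save can be sketched as follows. This is only an illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, correct predictions are assumed to cost nothing, savings are measured against the cheaper of the two trivial "predict all 0" / "predict all 1" baselines, and the member predictions are taken as given rather than produced by cost-sensitive trees.

```python
from typing import List, Sequence, Tuple

# Assumption: each example carries its own (false-positive cost, false-negative cost);
# correct predictions are assumed to cost 0.
Costs = Sequence[Tuple[float, float]]


def total_cost(y_true: Sequence[int], y_pred: Sequence[int], costs: Costs) -> float:
    """Sum the example-dependent cost of every misclassification."""
    total = 0.0
    for y, p, (c_fp, c_fn) in zip(y_true, y_pred, costs):
        if p == 1 and y == 0:
            total += c_fp
        elif p == 0 and y == 1:
            total += c_fn
    return total


def savings(y_true: Sequence[int], y_pred: Sequence[int], costs: Costs) -> float:
    """Relative cost reduction versus the cheaper trivial classifier (assumed baseline)."""
    n = len(y_true)
    base = min(total_cost(y_true, [0] * n, costs), total_cost(y_true, [1] * n, costs))
    if base == 0:
        return 0.0
    return (base - total_cost(y_true, y_pred, costs)) / base


def weighted_vote(member_preds: Sequence[Sequence[int]], weights: Sequence[float]) -> List[int]:
    """Combine binary predictions; members with non-positive weight get no say."""
    w = [max(x, 0.0) for x in weights]
    if sum(w) == 0:  # fall back to plain majority voting
        w = [1.0] * len(w)
    threshold = sum(w) / 2
    n = len(member_preds[0])
    return [
        1 if sum(wj * preds[i] for wj, preds in zip(w, member_preds)) >= threshold else 0
        for i in range(n)
    ]


# Toy data: four examples, three hypothetical ensemble members.
y_true = [1, 0, 1, 0]
costs = [(5, 10), (5, 10), (5, 100), (5, 10)]
members = [[1, 0, 1, 0], [1, 1, 1, 0], [0, 0, 0, 0]]

# In practice the weights would be estimated on held-out data, not on the evaluation set.
weights = [savings(y_true, m, costs) for m in members]
ensemble = weighted_vote(members, weights)
print(weights, ensemble, savings(y_true, ensemble, costs))
```

A member that costs more than the trivial baseline gets negative savings and is effectively silenced, which is the main difference from plain majority voting.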

    Investigating the impact of combining handwritten signature and keyboard keystroke dynamics for gender prediction

    © 2019 IEEE. The use of soft-biometric data as an auxiliary tool for user identification is already well known. Gender, hand orientation and emotional state are some examples of what can be called soft biometrics. These soft-biometric data can be predicted directly from the biometric templates. Research on soft-biometric prediction commonly uses physiological modalities, while behavioural biometrics are often not well explored in this context. Among the behavioural biometric modalities, keystroke dynamics and handwritten signatures have been widely explored for user identification, including some soft-biometric predictions. However, in these modalities, the soft-biometric prediction is usually done for each modality individually. To fill this gap, this study investigates whether combining those two biometric modalities affects the performance of one soft-biometric prediction task, gender prediction. The main aim is to assess the impact of combining data from two different biometric sources on gender prediction. Our findings indicate performance gains for gender prediction when these two biometric modalities are combined, compared with using each one individually.

    A survey on utilization of data mining approaches for dermatological (skin) diseases prediction

    Due to recent technology advances, large volumes of medical data are obtained. These data contain valuable information, so data mining techniques can be used to extract useful patterns. This paper introduces data mining and its various techniques and surveys the available literature on medical data mining. We focus mainly on the application of data mining to skin diseases. A categorization is provided based on the different data mining techniques, and the utility of the various methodologies is highlighted. Generally, association mining is suitable for extracting rules. It has been used especially in cancer diagnosis. Classification is a robust method in medical mining. In this paper, we summarize the different uses of classification in dermatology. It is one of the most important methods for the diagnosis of erythemato-squamous diseases. Methods used in this area include Neural Networks, Genetic Algorithms and fuzzy classification. Clustering is a useful method in medical image mining. The purpose of clustering techniques is to find a structure in the given data by finding similarities between data items according to their characteristics. Clustering has some applications in dermatology. Besides introducing different mining methods, we investigate some challenges that exist in mining skin data.

    Automatic human face detection for content-based image annotation

    In this paper, an automatic human face detection approach using colour analysis is applied to content-based image annotation. In the face detection step, the probable face region is detected by an adaptive boosting algorithm and then combined with a colour filtering classifier to enhance the accuracy of face detection. An initial experimental benchmark shows that the proposed scheme can be efficiently applied to image annotation with higher fidelity.
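
The two-stage idea described above, a detector proposing candidate regions and a colour filter confirming them, might look roughly like the sketch below. Everything here is an assumption for illustration: the abstract does not specify the colour rule, so a simple, commonly cited RGB skin heuristic stands in for it, the candidate regions are plain lists of pixels rather than the output of a real boosted detector, and the `min_skin` threshold is arbitrary.

```python
from typing import List, Tuple

Pixel = Tuple[int, int, int]  # (R, G, B), each in 0..255


def is_skin(pixel: Pixel) -> bool:
    """Simple RGB skin heuristic (an assumed stand-in, not the paper's filter)."""
    r, g, b = pixel
    return (
        r > 95 and g > 40 and b > 20
        and max(pixel) - min(pixel) > 15
        and abs(r - g) > 15
        and r > g and r > b
    )


def skin_fraction(region: List[Pixel]) -> float:
    """Share of pixels in a candidate region that pass the skin test."""
    if not region:
        return 0.0
    return sum(is_skin(p) for p in region) / len(region)


def confirm_faces(candidates: List[List[Pixel]], min_skin: float = 0.4) -> List[List[Pixel]]:
    """Keep only candidate regions whose skin fraction reaches the (arbitrary) threshold."""
    return [region for region in candidates if skin_fraction(region) >= min_skin]


# Hypothetical candidates, as if proposed by a boosted detector.
face_like = [(200, 140, 120)] * 3 + [(128, 128, 128)]
grey_wall = [(128, 128, 128)] * 4
confirmed = confirm_faces([face_like, grey_wall])
print(len(confirmed))
```

Chaining the filters this way can only remove detections, so it trades some recall for fewer false positives.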