16,708 research outputs found
Passport: Enabling Accurate Country-Level Router Geolocation using Inaccurate Sources
When does Internet traffic cross international borders? This question has
major geopolitical, legal and social implications and is surprisingly difficult
to answer. A critical stumbling block is a dearth of tools that accurately map
routers traversed by Internet traffic to the countries in which they are
located. This paper presents Passport: a new approach for efficient, accurate
country-level router geolocation and a system that implements it. Passport
provides location predictions with limited active measurements, using machine
learning to combine information from IP geolocation databases, router
hostnames, whois records, and ping measurements. We show that Passport
substantially outperforms existing techniques, and identify cases where paths
traverse countries with implications for security, privacy, and performance
Passport: enabling accurate country-level router geolocation using inaccurate sources
When does Internet traffic cross international borders? This question has major geopolitical, legal and social implications and is surprisingly difficult to answer. A critical stumbling block is a dearth of tools that accurately map routers traversed by Internet traffic to the countries in which they are located. This paper presents Passport: a new approach for efficient, accurate country-level router geolocation and a system that implements it. Passport provides location predictions with limited active measurements, using machine learning to combine information from IP geolocation databases, router hostnames, whois records, and ping measurements. We show that Passport substantially outperforms existing techniques, and identify cases where paths traverse countries with implications for security, privacy, and performance.First author draf
Ensemble of Example-Dependent Cost-Sensitive Decision Trees
Several real-world classification problems are example-dependent
cost-sensitive in nature, where the costs due to misclassification vary between
examples and not only within classes. However, standard classification methods
do not take these costs into account, and assume a constant cost of
misclassification errors. In previous works, some methods that take into
account the financial costs into the training of different algorithms have been
proposed, with the example-dependent cost-sensitive decision tree algorithm
being the one that gives the highest savings. In this paper we propose a new
framework of ensembles of example-dependent cost-sensitive decision-trees. The
framework consists in creating different example-dependent cost-sensitive
decision trees on random subsamples of the training set, and then combining
them using three different combination approaches. Moreover, we propose two new
cost-sensitive combination approaches; cost-sensitive weighted voting and
cost-sensitive stacking, the latter being based on the cost-sensitive logistic
regression method. Finally, using five different databases, from four
real-world applications: credit card fraud detection, churn modeling, credit
scoring and direct marketing, we evaluate the proposed method against
state-of-the-art example-dependent cost-sensitive techniques, namely,
cost-proportionate sampling, Bayes minimum risk and cost-sensitive decision
trees. The results show that the proposed algorithms have better results for
all databases, in the sense of higher savings.Comment: 13 pages, 6 figures, Submitted for possible publicatio
Recommended from our members
Multi-class protein fold classification using a new ensemble machine learning approach.
Protein structure classification represents an important process in understanding the associations
between sequence and structure as well as possible functional and evolutionary relationships.
Recent structural genomics initiatives and other high-throughput experiments have populated the
biological databases at a rapid pace. The amount of structural data has made traditional methods
such as manual inspection of the protein structure become impossible. Machine learning has been
widely applied to bioinformatics and has gained a lot of success in this research area. This work
proposes a novel ensemble machine learning method that improves the coverage of the classifiers
under the multi-class imbalanced sample sets by integrating knowledge induced from different base
classifiers, and we illustrate this idea in classifying multi-class SCOP protein fold data. We have
compared our approach with PART and show that our method improves the sensitivity of the
classifier in protein fold classification. Furthermore, we have extended this method to learning over
multiple data types, preserving the independence of their corresponding data sources, and show
that our new approach performs at least as well as the traditional technique over a single joined
data source. These experimental results are encouraging, and can be applied to other bioinformatics
problems similarly characterised by multi-class imbalanced data sets held in multiple data
sources
Investigating the impact of combining handwritten signature and keyboard keystroke dynamics for gender prediction
© 2019 IEEE. The use of soft-biometric data as an auxiliary tool on user identification is already well known. Gender, handorientation and emotional state are some examples which can be called soft-biometrics. These soft-biometric data can be predicted directly from the biometric templates. It is very common to find researches using physiological modalities for soft-biometric prediction, but behavioural biometric is often not well explored for this context. Among the behavioural biometric modalities, keystroke dynamics and handwriting signature have been widely explored for user identification, including some soft-biometric predictions. However, in these modalities, the soft-biometric prediction is usually done in an individual way. In order to fill this space, this study aims to investigate whether the combination of those two biometric modalities can impact the performance of a soft-biometric data, gender prediction. The main aim is to assess the impact of combining data from two different biometric sources in gender prediction. Our findings indicated gains in terms of performance for gender prediction when combining these two biometric modalities, when compared to the individual ones
A survey on utilization of data mining approaches for dermatological (skin) diseases prediction
Due to recent technology advances, large volumes of medical data is obtained. These data contain valuable information. Therefore data mining techniques can be used to extract useful patterns. This paper is intended to introduce data mining and its various techniques and a survey of the available literature on medical data mining. We emphasize mainly on the application of data mining on skin diseases. A categorization has been provided based on the different data mining techniques. The utility of the various data mining methodologies is highlighted. Generally association mining is suitable for extracting rules. It has been used especially in cancer diagnosis. Classification is a robust method in medical mining. In this paper, we have summarized the different uses of classification in dermatology. It is one of the most important methods for diagnosis of erythemato-squamous diseases. There are different methods like Neural Networks, Genetic Algorithms and fuzzy classifiaction in this topic. Clustering is a useful method in medical images mining. The purpose of clustering techniques is to find a structure for the given data by finding similarities between data according to data characteristics. Clustering has some applications in dermatology. Besides introducing different mining methods, we have investigated some challenges which exist in mining skin data
Automatic human face detection for content-based image annotation
In this paper, an automatic human face detection approach using colour analysis is applied for content-based image annotation. In the face detection, the probable face region is detected by adaptive boosting algorithm, and then combined with a colour filtering classifier to enhance the accuracy in face detection. The initial experimental benchmark shows the proposed scheme can be efficiently applied for image annotation with higher fidelity
- …