A Broad Evaluation of the Tor English Content Ecosystem
Tor is among the most well-known dark nets in the world. It has noble uses,
including as a platform for free speech and information dissemination under the
guise of true anonymity, but may be culturally better known as a conduit for
criminal activity and as a platform to market illicit goods and data. Past
studies on the content of Tor support this notion, but were carried out by
targeting popular domains likely to contain illicit content. A survey of past
studies may thus not yield a complete evaluation of the content and use of Tor.
This work addresses this gap by presenting a broad evaluation of the content of
the English Tor ecosystem. We perform a comprehensive crawl of the Tor dark web
and, through topic and network analysis, characterize the types of information
and services hosted across a broad swath of Tor domains and their hyperlink
relational structure. We recover nine domain types defined by the information
or service they host and, among other findings, unveil how some types of
domains intentionally silo themselves from the rest of Tor. We also present
measurements that (regrettably) suggest that marketplaces of illegal drugs and
services emerge as the dominant type of Tor domain. Our study is the product
of crawling over 1 million pages from 20,000 Tor seed addresses, yielding a
collection of over 150,000 Tor pages. We intend to make a dataset of the
domain structure publicly available at
https://github.com/wsu-wacs/TorEnglishContent. Comment: 11 pages
Integrating e-commerce standards and initiatives in a multi-layered ontology
The proliferation of different standards and joint initiatives for the classification of products and services (UNSPSC, e-cl@ss, RosettaNet, NAICS, SCTG, etc.) reveals that B2B markets have not reached a consensus on the coding systems, on the level of detail of their descriptions, on their granularity, etc. This paper shows how these standards and initiatives, which are built to cover different needs and functionalities, can be integrated in an ontology using a common multi-layered knowledge architecture. This multi-layered ontology will provide a shared understanding of the domain for e-commerce applications, allowing information sharing between heterogeneous systems. We present a method for designing ontologies from these information sources by automatically transforming, integrating and enriching the existing vocabularies with the WebODE platform. As an illustration, we show an example in the computer domain, presenting the relationships between UNSPSC, e-cl@ss, RosettaNet and an electronic catalogue from an e-commerce platform.
A Topic Recommender for Journalists
The way in which people acquire information on events and form their own
opinion on them has changed dramatically with the advent of social media. For many
readers, the news gathered from online sources becomes an opportunity to share points
of view and information within micro-blogging platforms such as Twitter, mainly
aimed at satisfying their communication needs. Furthermore, the need to deepen the
aspects related to news stimulates a demand for additional information which is often
met through online encyclopedias, such as Wikipedia. This behaviour has also
influenced the way in which journalists write their articles, requiring a careful assessment
of what actually interests the readers. The goal of this paper is to present
a recommender system, What to Write and Why, capable of suggesting to a journalist,
for a given event, the aspects still uncovered in news articles on which the
readers focus their interest. The basic idea is to characterize an event according to
the echo it receives in online news sources and associate it with the corresponding
readers’ communicative and informative patterns, detected through the analysis of
Twitter and Wikipedia, respectively. Our methodology temporally aligns the results
of this analysis and recommends the concepts that emerge as topics of interest from
Twitter and Wikipedia, either not covered or poorly covered in the published news
articles.
Joint Intermodal and Intramodal Label Transfers for Extremely Rare or Unseen Classes
In this paper, we present a label transfer model from texts to images for
image classification tasks. The problem of image classification is often much
more challenging than text classification. On one hand, labeled text data is
more widely available than the labeled images for classification tasks. On the
other hand, text data tends to have natural semantic interpretability, and it
is often more directly related to class labels. In contrast, the image
features are not directly related to concepts inherent in class labels. One of
our goals in this paper is to develop a model for revealing the functional
relationships between text and image features so as to directly transfer
intermodal and intramodal labels to annotate the images. This is implemented by
learning a transfer function as a bridge to propagate the labels between two
multimodal spaces. However, the intermodal label transfers could be undermined
by blindly transferring the labels of noisy texts to annotate images. To
mitigate this problem, we present an intramodal label transfer process, which
complements the intermodal label transfer by transferring the image labels
instead when relevant text is absent from the source corpus. In addition, we
generalize the intermodal label transfer to a zero-shot learning scenario where
there are only text examples available to label unseen classes of images
without any positive image examples. We evaluate our algorithm on an image
classification task and show its effectiveness relative to the other
compared algorithms. Comment: The paper has been accepted by IEEE Transactions on Pattern Analysis
and Machine Intelligence. It will appear in a future issue.
Connectivity of Natura 2000 forest sites in Europe
Background/Purpose: In the context of the European Biodiversity policy, the
Green Infrastructure Strategy is one supporting tool to mitigate fragmentation,
inter alia to increase the spatial and functional connectivity between
protected and unprotected areas. The Joint Research Centre has developed an
integrated model to provide a macro-scale set of indices to evaluate the
connectivity of the Natura 2000 network, which forms the backbone of a Green
Infrastructure for Europe. The model allows a wide assessment and comparison to
be performed across countries in terms of structural (spatially connected or
isolated sites) and functional connectivity (least-cost distances between sites
influenced by distribution, distance and land cover).
Main conclusion: The Natura 2000 network in Europe shows differences among
countries in terms of the sizes and numbers of sites, their distribution as
well as distances between sites. Connectivity has been assessed on the basis of
a 500 m average inter-site distance, roads and intensive land use as barrier
effects as well as the presence of "green" corridors. In all countries the
Natura 2000 network is mostly made of sites which are not physically connected.
The highest functional connectivity values are found for Spain, Slovakia, Romania
and Bulgaria. The more natural landscape in Sweden and Finland does not result
in high inter-site network connectivity due to large inter-site distances. The
distribution of subnets with respect to roads explains the higher share of
isolated subnets in Portugal than in Belgium. Comment: 9 pages, from a poster published in F1000Posters 2014, 5: 48