On Intra-page and Inter-page Semantic Analysis of Web Pages

Author: Wang Jicheng
Wang Jun
Wu Gangshan
津田宏
Publication venue: COLIPS PUBLICATIONS
Publication date: 01/01/2003

A Broad Evaluation of the Tor English Content Ecosystem

Author: Allahyari Mehdi
Doran Derek
Sadeghi Reza
Zabihimayvan Mahdieh
Publication date: 01/01/2019

Tor is among the most well-known dark nets in the world. It has noble uses, including as a platform for free speech and information dissemination under the guise of true anonymity, but may be culturally better known as a conduit for criminal activity and as a platform to market illicit goods and data. Past studies on the content of Tor support this notion, but were carried out by targeting popular domains likely to contain illicit content. A survey of past studies may thus not yield a complete evaluation of the content and use of Tor. This work addresses this gap by presenting a broad evaluation of the content of the English Tor ecosystem. We perform a comprehensive crawl of the Tor dark web and, through topic and network analysis, characterize the types of information and services hosted across a broad swath of Tor domains and their hyperlink relational structure. We recover nine domain types defined by the information or service they host and, among other findings, unveil how some types of domains intentionally silo themselves from the rest of Tor. We also present measurements that (regrettably) suggest that marketplaces of illegal drugs and services do emerge as the dominant type of Tor domain. Our study is the product of crawling over 1 million pages from 20,000 Tor seed addresses, yielding a collection of over 150,000 Tor pages. We intend to make the domain structure publicly available as a dataset at https://github.com/wsu-wacs/TorEnglishContent.

Comment: 11 pages

arXiv.org e-Print Archive

Crossref

CORE

Integrating e-commerce standards and initiatives in a multi-layered ontology

Author: Corcho Oscar
Gómez-Pérez A.
Publication venue: Facultad de Informática (UPM)
Publication date: 01/01/2001

The proliferation of different standards and joint initiatives for the classification of products and services (UNSPSC, e-cl@ss, RosettaNet, NAICS, SCTG, etc.) reveals that B2B markets have not reached a consensus on the coding systems, on the level of detail of their descriptions, on their granularity, etc. This paper shows how these standards and initiatives, which are built to cover different needs and functionalities, can be integrated in an ontology using a common multi-layered knowledge architecture. This multi-layered ontology will provide a shared understanding of the domain for e-commerce applications, allowing information sharing between heterogeneous systems. We will present a method for designing ontologies from these information sources by automatically transforming, integrating and enriching the existing vocabularies with the WebODE platform. As an illustration, we show an example on the computer domain, presenting the relationships between UNSPSC, e-cl@ss, RosettaNet and an electronic catalogue from an e-commerce platform.

Archivo Digital UPM

A Topic Recommender for Journalists

Author: Cucchiarelli Alessandro
Morbidoni Christian
Stilo Giovanni
Velardi Paola
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2018

The way in which people acquire information on events and form their own opinion on them has changed dramatically with the advent of social media. For many readers, the news gathered from online sources becomes an opportunity to share points of view and information within micro-blogging platforms such as Twitter, mainly aimed at satisfying their communication needs. Furthermore, the need to explore news-related aspects in greater depth stimulates a demand for additional information, which is often met through online encyclopedias such as Wikipedia. This behaviour has also influenced the way in which journalists write their articles, requiring a careful assessment of what actually interests the readers. The goal of this paper is to present a recommender system, What to Write and Why, capable of suggesting to a journalist, for a given event, the aspects still uncovered in news articles on which the readers focus their interest. The basic idea is to characterize an event according to the echo it receives in online news sources and associate it with the corresponding readers’ communicative and informative patterns, detected through the analysis of Twitter and Wikipedia, respectively. Our methodology temporally aligns the results of this analysis and recommends the concepts that emerge as topics of interest from Twitter and Wikipedia, either not covered or poorly covered in the published news articles.

IRIS Università Politecnica delle Marche

Archivio della ricerca- Università di Roma La Sapienza

Joint Intermodal and Intramodal Label Transfers for Extremely Rare or Unseen Classes

Author: Aggarwal Charu
Huang Thomas
Liu Wei
Qi Guo-Jun
Publication date: 22/03/2017

In this paper, we present a label transfer model from texts to images for image classification tasks. The problem of image classification is often much more challenging than text classification. On one hand, labeled text data is more widely available than labeled images for classification tasks. On the other hand, text data tends to have natural semantic interpretability and is often more directly related to class labels. By contrast, image features are not directly related to the concepts inherent in class labels. One of our goals in this paper is to develop a model for revealing the functional relationships between text and image features so as to directly transfer intermodal and intramodal labels to annotate the images. This is implemented by learning a transfer function as a bridge to propagate the labels between two multimodal spaces. However, the intermodal label transfers could be undermined by blindly transferring the labels of noisy texts to annotate images. To mitigate this problem, we present an intramodal label transfer process, which complements the intermodal label transfer by transferring the image labels instead when relevant text is absent from the source corpus. In addition, we generalize the intermodal label transfer to the zero-shot learning scenario, where only text examples are available to label unseen classes of images, without any positive image examples. We evaluate our algorithm on an image classification task and show its effectiveness with respect to the other compared algorithms.

Comment: The paper has been accepted by IEEE Transactions on Pattern Analysis and Machine Intelligence. It will appear in a future issue.

arXiv.org e-Print Archive

University of Central Florida (UCF): STARS (Showcase of Text, Archives, Research & Scholarship)

Connectivity of Natura 2000 forest sites in Europe

Author: Caudullo Giovanni
de Rigo Daniele
Estreguil Christine
Publication date: 20/06/2014

Background/Purpose: In the context of the European Biodiversity policy, the Green Infrastructure Strategy is one supporting tool to mitigate fragmentation, inter alia by increasing the spatial and functional connectivity between protected and unprotected areas. The Joint Research Centre has developed an integrated model to provide a macro-scale set of indices to evaluate the connectivity of the Natura 2000 network, which forms the backbone of a Green Infrastructure for Europe. The model allows a wide assessment and comparison to be performed across countries in terms of structural connectivity (spatially connected or isolated sites) and functional connectivity (least-cost distances between sites influenced by distribution, distance and land cover). Main conclusion: The Natura 2000 network in Europe shows differences among countries in terms of the sizes and numbers of sites, their distribution, and the distances between sites. Connectivity has been assessed on the basis of a 500 m average inter-site distance, roads and intensive land use as barrier effects, and the presence of "green" corridors. In all countries the Natura 2000 network is mostly made up of sites which are not physically connected. The highest functional connectivity values are found for Spain, Slovakia, Romania and Bulgaria. The more natural landscape in Sweden and Finland does not result in high inter-site network connectivity due to large inter-site distances. The distribution of subnets with respect to roads explains the higher share of isolated subnets in Portugal than in Belgium.

Comment: 9 pages, from a poster published in F1000Posters 2014, 5: 48

arXiv.org e-Print Archive

CiteSeerX

FigShare