Smart Content Recognition from Images Using a Mixture of Convolutional Neural Networks
With the rapid development of the Internet, the volume of web content has grown enormously. Most
websites are publicly available, and anyone can access their content from
anywhere: the workplace, home, and even schools. Nevertheless, not all
web content is appropriate for all users, especially children. One example
is pornographic images, which should be restricted to certain age
groups. Such images are also not safe for work (NSFW), in that employees
should not be seen accessing them during work. Recently, convolutional
neural networks (CNNs) have been successfully applied to many computer vision
problems. Inspired by these successes, we propose a mixture of convolutional
neural networks for adult content recognition. Unlike other works, our method
is formulated as a weighted sum of multiple deep neural network models, with
the weight of each CNN obtained by solving a linear regression problem
using Ordinary Least Squares (OLS). Experimental results demonstrate that the
proposed model outperforms both a single CNN model and the unweighted average
of the CNN models in adult content recognition.
Comment: To be published in LNEE. Code: github.com/mundher/NSF
Efficient filtering of adult content using textual information
Nowadays adult content represents a non-negligible proportion of Web
content, and it is of the utmost importance to protect children from it.
Search engines, as an entry point for Web navigation, are ideally placed to
deal with this issue.
In this paper, we propose a method that builds a safe, i.e.
adult-content-free, index for search engines. The method is based on a filter
that uses only textual information from the web page and its associated URL.
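As a rough illustration of such a text-only filter (the abstract does not name the features or classifier, so the TF-IDF representation, logistic-regression model, and example data below are assumptions):

import re
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

def url_tokens(url):
    # Split the URL into word-like tokens so terms in the host and path
    # become features alongside the page text.
    return " ".join(re.findall(r"[a-z0-9]+", url.lower()))

def make_document(page_text, url):
    # One "document" per page: its text plus its tokenized URL.
    return page_text + " " + url_tokens(url)

# Hypothetical labeled data: 1 = adult, 0 = safe.
docs = [make_document("explicit gallery ...", "http://example-adult.com/pics"),
        make_document("weather forecast for tomorrow", "http://news.example.org/weather")]
labels = [1, 0]

clf = make_pipeline(TfidfVectorizer(), LogisticRegression())
clf.fit(docs, labels)

# Pages predicted as safe (label 0) would be admitted to the index.
print(clf.predict([make_document("local sports results", "http://sports.example.com")]))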
Link Graph Analysis for Adult Images Classification
In order to protect an image search engine's users from undesirable results,
a classifier for adult images should be built. The information about links
from websites to images is employed to create such a classifier. These links
are represented as a bipartite website-image graph, in which each vertex is
equipped with scores of adultness and decentness. The scores for image
vertices are initialized to zero, while those for website vertices are
initialized according to a text-based website classifier. An iterative
algorithm that propagates scores within the website-image graph is described,
and the resulting scores are used to classify images by choosing an
appropriate threshold. Experiments on Internet-scale data show that the
algorithm under consideration increases classification recall by 17% (at the
same precision level) in comparison with a simple algorithm that classifies
an image as adult if it is connected to at least one adult site.
Comment: 7 pages. Young Scientists Conference, 4th Russian Summer School in
Information Retrieval
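A toy sketch of one plausible propagation scheme on the bipartite graph (the abstract does not give the exact update rule, so the neighbour-averaging step, the mixing factor, and the data below are illustrative):

# edges: website -> set of images it links to (hypothetical data).
edges = {
    "siteA": {"img1", "img2"},
    "siteB": {"img2", "img3"},
}

# Website scores come from a text-based classifier (1.0 = adult,
# 0.0 = decent); image scores start at zero, as in the abstract.
site_score = {"siteA": 0.9, "siteB": 0.1}
img_score = {img: 0.0 for imgs in edges.values() for img in imgs}

for _ in range(10):  # fixed number of propagation rounds
    # Image score <- mean score of the websites linking to it.
    for img in img_score:
        linking = [s for s, imgs in edges.items() if img in imgs]
        img_score[img] = sum(site_score[s] for s in linking) / len(linking)
    # Website score <- mix of its prior and the mean score of its images.
    for s, imgs in edges.items():
        img_mean = sum(img_score[i] for i in imgs) / len(imgs)
        site_score[s] = 0.5 * site_score[s] + 0.5 * img_mean

THRESHOLD = 0.5  # classify an image as adult above this score
print(sorted(i for i, sc in img_score.items() if sc >= THRESHOLD))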
Automated Discovery of Internet Censorship by Web Crawling
Censorship of the Internet is widespread around the world. As access to the
web becomes increasingly ubiquitous, filtering of this resource becomes more
pervasive. Transparency about specific content that citizens are denied access
to is atypical. To counter this, numerous techniques for maintaining URL filter
lists have been proposed by various individuals and organisations that aim to
provide empirical data on censorship for the benefit of the public and the
wider censorship research community.
We present a new approach for discovering filtered domains in different
countries. This method is fully automated and requires no human interaction.
The system uses web crawling techniques to traverse between filtered sites and
implements a robust method for determining if a domain is filtered. We
demonstrate the effectiveness of the approach by running experiments to search
for filtered content in four different censorship regimes. Our results show
that we perform better than the current state of the art and have built domain
filter lists an order of magnitude larger than the most widely available public
lists as of Jan 2018. Further, we build a dataset mapping the interlinking
nature of blocked content between domains and exhibit the tightly networked
nature of censored web resources.
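As a very rough sketch of such a crawl loop (the paper's actual vantage points and its robust filtering test are not described in the abstract; the two injected fetch callables and the reachability-based check below are placeholders):

import re
from collections import deque
from urllib.parse import urljoin, urlparse

LINK_RE = re.compile(r'href="([^"]+)"')

def discover_filtered(seeds, fetch_control, fetch_censored, limit=1000):
    """BFS from seed URLs; a domain counts as filtered here if the control
    vantage can fetch it but the censored vantage cannot. fetch_* are
    injected callables url -> HTML string or None (hypothetical)."""
    seen, filtered = set(), set()
    queue = deque(seeds)
    while queue and len(seen) < limit:
        url = queue.popleft()
        domain = urlparse(url).netloc
        if domain in seen:
            continue
        seen.add(domain)
        html = fetch_control(url)
        if html is None:
            continue                      # dead site, not evidence of filtering
        if fetch_censored(url) is None:   # reachable outside, blocked inside
            filtered.add(domain)
        # Filtered sites tend to interlink, so traverse outgoing links.
        for link in LINK_RE.findall(html):
            queue.append(urljoin(url, link))
    return filtered

# Toy stand-in vantage points for a dry run:
pages = {"http://a.example/": '<a href="http://b.example/">b</a>',
         "http://b.example/": ""}
blocked = {"b.example"}
ctrl = lambda u: pages.get(u)
cens = lambda u: None if urlparse(u).netloc in blocked else pages.get(u)
print(discover_filtered(["http://a.example/"], ctrl, cens))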
Internet Filters: A Public Policy Report (Second edition; fully revised and updated)
No sooner was the Internet upon us than anxiety arose over the ease of accessing pornography and other controversial content. In response, entrepreneurs soon developed filtering products. By the end of the decade, a new industry had emerged to create and market Internet filters. … Yet filters were highly imprecise from the beginning. The sheer size of the Internet meant that identifying potentially offensive content had to be done mechanically, by matching "key" words and phrases; hence, the blocking of Web sites for "Middlesex County," or words such as "magna cum laude". Internet filters are crude and error-prone because they categorize expression without regard to its context, meaning, and value. Yet these sweeping censorship tools are now widely used in companies, homes, schools, and libraries. Internet filters remain a pressing public policy issue to all those concerned about free expression, education, culture, and democracy. This fully revised and updated report surveys tests and studies of Internet filtering products from the mid-1990s through 2006. It provides an essential resource for the ongoing debate.