An evaluation of the role of sentiment in second screen microblog search tasks
The recent prominence of the real-time web is proving both challenging and disruptive for information retrieval and web data mining research. User-generated content on the real-time web is perhaps best epitomised by content on microblogging platforms, such as Twitter. Given the substantial quantity of microblog posts that may be relevant to a user's query at a point in time, automated methods are required to sift through this information. Sentiment analysis offers a promising direction for modelling microblog content. We build and evaluate a sentiment-based filtering system using real-time user studies. We find a significant role played by sentiment in the search scenarios, observing detrimental effects when filtering out certain sentiment types. We make a series of observations regarding associations between document-level sentiment and user feedback, including associations with user profile attributes and users' prior topic sentiment.
Applying Web analysis in Web page filtering
A machine-learning approach that combines web content analysis and web structure analysis was proposed. The approach addresses the issue of filtering out irrelevant documents from a set of relevant documents collected from the web. The proposed approach was found to be useful for vertical search engine development and other web applications. The results show that the approach can be used for web page filtering by effectively applying web content and link analysis.
FilteredWeb: A Framework for the Automated Search-Based Discovery of Blocked URLs
Various methods have been proposed for creating and maintaining lists of
potentially filtered URLs to allow for measurement of ongoing internet
censorship around the world. Whilst testing a known resource for evidence of
filtering can be relatively simple, given appropriate vantage points,
discovering previously unknown filtered web resources remains an open
challenge.
We present a new framework for automating the process of discovering filtered
resources through the use of adaptive queries to well-known search engines. Our
system applies information retrieval algorithms to isolate characteristic
linguistic patterns in known filtered web pages; these are then used as the
basis for web search queries. The results of these queries are then checked for
evidence of filtering, and newly discovered filtered resources are fed back
into the system to detect further filtered content.
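The feedback loop described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: `extract_terms`, `search`, and `is_filtered` are hypothetical stand-ins for the paper's information-retrieval term weighting, search-engine querying, and censorship-measurement components.

```python
from collections import Counter

def extract_terms(pages, top_k=3):
    # Crude stand-in for "characteristic linguistic patterns":
    # the most frequent tokens across known filtered pages.
    counts = Counter(tok for page in pages for tok in page.lower().split())
    return [term for term, _ in counts.most_common(top_k)]

def discover(seed_pages, search, is_filtered, rounds=2):
    """Iteratively query a search engine and feed newly found
    filtered pages back into term extraction."""
    known_pages = list(seed_pages)
    found = set()
    for _ in range(rounds):
        for term in extract_terms(known_pages):
            for url, text in search(term):
                if url not in found and is_filtered(url):
                    found.add(url)
                    known_pages.append(text)  # feed back for the next round
    return found
```

In use, `search` would wrap a real search-engine API and `is_filtered` a measurement probe from an appropriate vantage point; here they can be stubbed with toy data.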
Our implementation of this framework, applied to China as a case study, shows that this approach is demonstrably effective at detecting large numbers of previously unknown filtered web pages, making a significant contribution to the ongoing detection of internet filtering as it develops.
Our tool is currently deployed and has been used to discover 1355 domains
that are poisoned within China as of Feb 2017 - 30 times more than are
contained in the most widely-used public filter list. Of these, 759 are outside
of the Alexa Top 1000 domains list, demonstrating the capability of this
framework to find more obscure filtered content. Further, our initial analysis
of filtered URLs, and the search terms that were used to discover them, gives
further insight into the nature of the content currently being blocked in
China.
Comment: To appear in "Network Traffic Measurement and Analysis Conference 2017" (TMA2017).
Censorship: Filtering Content on the Web
The World Wide Web has become a vehicle of free expression for millions of people around the world. It also represents a type of international library with no geographical or physical boundaries, bringing a vast array of information into private homes, schools and businesses. Because the Web allows anyone to post anything at any time, many believe some sort of censorship should be imposed. Censorship of the Web comes in the form of software which filters Web sites, blocking those which publish content deemed unsuitable by those administering the filtering software. Most content filtering software is used on computers in public schools, businesses, and libraries. The goal is to block sites that have no legitimate use in the workplace or in the classroom. These include sites promoting pornography, drugs, gambling, hacking, violence, and spyware, among others (Sarrel, 2007).
Distributed Agents for Web Content Filtering
This paper describes a Web content filtering system that aims to block out offensive material using distributed agents. The proposed system uses the FCM algorithm together with other page features (title, metadata, warning message) to classify candidate websites into two types: white, considered acceptable, and black, containing harmful material, taking English pornographic websites as a case study.
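Assuming FCM here refers to fuzzy c-means, the two-way white/black grouping can be sketched as a toy clustering of page feature vectors. This is an illustrative implementation of standard fuzzy c-means, not the paper's system; the feature vectors and cluster count are hypothetical.

```python
import numpy as np

def fuzzy_c_means(X, c=2, m=2.0, iters=100, seed=0):
    """Toy fuzzy c-means: returns cluster centers and membership matrix U.
    U[i, k] is the degree to which sample i belongs to cluster k."""
    rng = np.random.default_rng(seed)
    U = rng.random((len(X), c))
    U /= U.sum(axis=1, keepdims=True)      # rows sum to 1
    for _ in range(iters):
        Um = U ** m
        centers = (Um.T @ X) / Um.sum(axis=0)[:, None]
        dist = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2) + 1e-9
        inv = dist ** (-2.0 / (m - 1.0))   # standard FCM membership update
        U = inv / inv.sum(axis=1, keepdims=True)
    return centers, U

# Hypothetical page feature vectors: two well-separated groups.
X = np.array([[0.0, 0.1], [0.1, 0.0], [0.0, 0.0],
              [5.0, 5.1], [5.1, 5.0], [5.0, 5.0]])
centers, U = fuzzy_c_means(X)
labels = U.argmax(axis=1)  # hard white/black assignment from soft memberships
```

A real system would derive the feature vectors from the title, metadata, and warning-message signals the abstract mentions.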
Website Blocked: Filtering Technology in Schools and School Libraries
This paper investigates the impact of filtering software in K-12 schools and school libraries. The Children's Internet Protection Act, or CIPA, requires that public schools and school libraries use filtering technology in order to receive discounted rates on technology. As a result, nearly all public elementary and secondary schools today use filtering technology. While the provisions of CIPA narrowly define the content to be blocked, filters are often set to block much more than is required. Filtering technology is often ineffective, and many unobjectionable sites end up being blocked, including Web 2.0 sites and tools needed to educate students in a 21st century learning environment. Filtering software raises other issues as well, such as First Amendment implications, a possible digital divide between students that have unfiltered access to online content at home and those that do not, and the loss of opportunity to educate students on how to be good digital citizens. These issues should be acknowledged and addressed. There are many options available to librarians, educators, administrators, and other stakeholders that can increase students' access to online information and educational tools while still protecting children from inappropriate online content and complying with the requirements of CIPA
Dynamic Web Content Filtering Based on User's Knowledge
This paper focuses on the development of a maintainable information filtering system. A simple and efficient solution to this problem is to block websites by URL, including IP address. However, this is not effective for unknown websites, and it is difficult to obtain a complete block list. Content-based filtering is suggested to overcome this problem as an additional strategy alongside URL filtering. The manual rule-based method is widely applied in current content filtering systems, but it overlooks the knowledge acquisition bottleneck problem. To solve this problem, we employ the Multiple Classification Ripple-Down Rules (MCRDR) knowledge acquisition method, which allows the domain expert to maintain the knowledge base without the help of knowledge engineers. Throughout this study, we show that the MCRDR-based information filtering system can easily prevent unknown web information from being delivered and that its knowledge base is easy to maintain for the filtering system.
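The two-stage strategy in this abstract, a URL block list backed by content rules for unknown sites, can be sketched as follows. The host list and rule terms are hypothetical placeholders, and the flat keyword check is a simplification of MCRDR's rule tree.

```python
from urllib.parse import urlparse

# Hypothetical block list and content rules (not from the paper).
BLOCKED_HOSTS = {"blocked.example"}
CONTENT_RULES = ("casino", "gambling")  # stand-in for MCRDR rule conditions

def should_block(url, page_text):
    """Stage 1: cheap URL/host lookup. Stage 2: content-based rules,
    which catch sites the block list has never seen."""
    host = urlparse(url).hostname or ""
    if host in BLOCKED_HOSTS:
        return True
    text = page_text.lower()
    return any(term in text for term in CONTENT_RULES)
```

The point of MCRDR in the paper is that stage 2 is a maintainable rule structure a domain expert can extend incrementally, rather than a fixed keyword tuple as shown here.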
A personalized web page content filtering model based on segmentation
In view of the massive content explosion on the World Wide Web through diverse sources, content filtering tools have become mandatory. Filtering of web page content holds greater significance when pages are accessed by minors. Traditional web page blocking systems follow a Boolean methodology: either display the full page or block it completely. With the increased dynamism of web pages, it has become common for different portions of a page to hold different types of content at different times. This paper proposes a model to block content at a fine-grained level, i.e. instead of completely blocking the page, it blocks only those segments which hold the content to be blocked. The advantages of this method over traditional methods are the fine-grained level of blocking and the automatic identification of the portions of the page to be blocked. Experiments conducted on the proposed model indicate 88% accuracy in filtering out the segments.
Comment: 11 pages, 6 figures.
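The segment-level idea can be sketched as below. This is a minimal illustration under two assumptions not in the abstract: the page has already been split into segments upstream, and a toy keyword check stands in for the paper's segment classifier.

```python
# Illustrative terms only, not from the paper.
BLOCKED_TERMS = ("violence", "gambling")

def classify_segment(text):
    """Toy stand-in for the paper's segment classifier."""
    lowered = text.lower()
    return any(term in lowered for term in BLOCKED_TERMS)

def filter_page(segments, placeholder="[segment blocked]"):
    """Replace only the objectionable segments, keeping the rest
    of the page visible instead of blocking it wholesale."""
    return [placeholder if classify_segment(s) else s for s in segments]
```

The contrast with the Boolean approach is visible in the return value: a partially redacted page rather than an all-or-nothing decision.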
Advanced quantum based neural network classifier and its application for objectionable web content filtering
© 2013 IEEE. In this paper, an Advanced Quantum-based Neural Network Classifier (AQNN) is proposed. The proposed AQNN is used to form an objectionable Web content filtering system (OWF). The aim is to design a neural network with a small number of hidden-layer neurons and with optimal connection weights and neuron thresholds. The proposed algorithm uses quantum computing and genetic concepts to evolve the connection weights and neuron thresholds. Quantum computing uses the qubit, a probabilistic representation that is the smallest unit of information in the quantum computing concept. The algorithm also introduces a threshold boundary parameter to find the optimal neuron threshold values. The proposed algorithm forms a neural network architecture which is used to build an objectionable Web content filtering system that detects objectionable Web requests by the user. To judge the performance of the proposed AQNN, a total of 2000 (1000 objectionable + 1000 non-objectionable) websites' contents have been used. The results of the AQNN are also compared with QNN-F and well-known classifiers such as backpropagation, support vector machine (SVM), multilayer perceptron, decision tree, and artificial neural network classifiers. The results show that the AQNN performs better than existing classifiers. The performance of the proposed objectionable Web content filtering system (OWF) is also compared with well-known objectionable Web filtering software and existing models. It is found that the proposed OWF performs better than existing solutions in terms of filtering objectionable content.
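The qubit representation the abstract refers to can be illustrated as follows. This is a generic sketch of quantum-inspired encoding, not the AQNN algorithm itself: each weight bit is stored as an amplitude pair (alpha, beta) with alpha² + beta² = 1, and "observing" it yields 1 with probability beta². The function name and array layout are assumptions.

```python
import numpy as np

def observe(qubits, rng):
    """qubits: array of shape (n, 2), each row an (alpha, beta) amplitude
    pair. Collapses each qubit to a classical bit: 1 with prob beta**2."""
    prob_one = qubits[:, 1] ** 2
    return (rng.random(len(qubits)) < prob_one).astype(int)

rng = np.random.default_rng(0)
# Deterministic corner cases: beta=0 always yields 0, beta=1 always yields 1.
qubits = np.array([[1.0, 0.0],
                   [0.0, 1.0]])
bits = observe(qubits, rng)
```

In a quantum-inspired evolutionary scheme, such observations would generate candidate weight configurations, and the amplitudes would be nudged toward the better-performing bits over generations.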