
    An evaluation of the role of sentiment in second screen microblog search tasks

    The recent prominence of the real-time web is proving both challenging and disruptive for information retrieval and web data mining research. User-generated content on the real-time web is perhaps best epitomised by content on microblogging platforms, such as Twitter. Given the substantial quantity of microblog posts that may be relevant to a user's query at a point in time, automated methods are required to sift through this information. Sentiment analysis offers a promising direction for modelling microblog content. We build and evaluate a sentiment-based filtering system using real-time user studies. We find that sentiment plays a significant role in the search scenarios, and observe detrimental effects when certain sentiment types are filtered out. We make a series of observations regarding associations between document-level sentiment and user feedback, including associations with user profile attributes and users' prior topic sentiment.
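
    As an illustration of the kind of document-level sentiment filtering described above, here is a minimal Python sketch. The lexicon, thresholds, Post structure, and filter_stream function are illustrative assumptions, not the system evaluated in the paper.

        # Hypothetical sketch of document-level sentiment filtering for microblog posts.
        # The lexicon and thresholds are toy assumptions.
        from dataclasses import dataclass

        POSITIVE = {"great", "love", "excellent", "win", "happy"}
        NEGATIVE = {"bad", "hate", "awful", "lose", "angry"}

        @dataclass
        class Post:
            text: str

        def polarity(post: Post) -> float:
            """Crude lexicon score in [-1, 1]: (positive - negative) / tokens."""
            tokens = post.text.lower().split()
            if not tokens:
                return 0.0
            pos = sum(t in POSITIVE for t in tokens)
            neg = sum(t in NEGATIVE for t in tokens)
            return (pos - neg) / len(tokens)

        def filter_stream(posts, keep=("positive", "neutral"), eps=0.05):
            """Drop posts whose sentiment class is not in `keep`."""
            for p in posts:
                s = polarity(p)
                label = "positive" if s > eps else "negative" if s < -eps else "neutral"
                if label in keep:
                    yield p

        # Example: filtering out negative posts, the kind of removal the study
        # found can be detrimental in second-screen search scenarios.
        stream = [Post("Love this match, great goal!"), Post("Awful refereeing, I hate this")]
        print([p.text for p in filter_stream(stream)])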

    Applying Web analysis in Web page filtering

    A machine-learning-based approach that combines web content analysis and web structure analysis was proposed. The approach addressed the issue of filtering out irrelevant documents from a set of relevant documents collected from the web. It was found that the proposed approach is useful for vertical search engine development and other web applications. The results show that the approach can be used for web page filtering by effectively applying web content and link analysis.
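
    A hedged sketch of the general idea of combining content and structure signals: TF-IDF text features are concatenated with simple link counts and fed to a logistic regression classifier. The feature set and learner are assumptions; the abstract does not specify the paper's actual choices.

        # Illustrative sketch: combine content (TF-IDF) and structure (link) features
        # to filter irrelevant pages for a vertical search engine.
        from scipy.sparse import hstack, csr_matrix
        from sklearn.feature_extraction.text import TfidfVectorizer
        from sklearn.linear_model import LogisticRegression

        pages = [
            {"text": "medical research on cancer treatment", "in_links": 120, "out_links": 15},
            {"text": "buy cheap watches online discount", "in_links": 2, "out_links": 300},
        ]
        labels = [1, 0]  # 1 = relevant to the vertical, 0 = filter out

        vec = TfidfVectorizer()
        content = vec.fit_transform([p["text"] for p in pages])
        structure = csr_matrix([[p["in_links"], p["out_links"]] for p in pages], dtype=float)

        X = hstack([content, structure])          # content + link features side by side
        clf = LogisticRegression().fit(X, labels)
        print(clf.predict(X))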

    FilteredWeb: A Framework for the Automated Search-Based Discovery of Blocked URLs

    Various methods have been proposed for creating and maintaining lists of potentially filtered URLs to allow for measurement of ongoing internet censorship around the world. Whilst testing a known resource for evidence of filtering can be relatively simple, given appropriate vantage points, discovering previously unknown filtered web resources remains an open challenge. We present a new framework for automating the process of discovering filtered resources through the use of adaptive queries to well-known search engines. Our system applies information retrieval algorithms to isolate characteristic linguistic patterns in known filtered web pages; these are then used as the basis for web search queries. The results of these queries are then checked for evidence of filtering, and newly discovered filtered resources are fed back into the system to detect further filtered content. Our implementation of this framework, applied to China as a case study, shows that this approach is demonstrably effective at detecting significant numbers of previously unknown filtered web pages, making a substantial contribution to the ongoing detection of internet filtering as it develops. Our tool is currently deployed and has been used to discover 1355 domains that are poisoned within China as of Feb 2017 - 30 times more than are contained in the most widely-used public filter list. Of these, 759 are outside of the Alexa Top 1000 domains list, demonstrating the capability of this framework to find more obscure filtered content. Further, our initial analysis of filtered URLs, and the search terms that were used to discover them, gives further insight into the nature of the content currently being blocked in China. Comment: To appear in "Network Traffic Measurement and Analysis Conference 2017" (TMA2017).
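
    The discovery loop described above can be outlined roughly as follows. Here, search_engine_query and is_filtered are placeholders for the search-engine API and the vantage-point censorship test, and the term scoring is a generic TF-IDF-style heuristic rather than the paper's exact algorithm.

        # Hypothetical outline of the adaptive search-based discovery loop.
        from collections import Counter
        import math

        def characteristic_terms(filtered_pages, background, k=5):
            """Rank terms that are unusually frequent in known-filtered pages (TF-IDF-like)."""
            df = Counter(t for doc in background for t in set(doc.split()))
            n = len(background) or 1
            tf = Counter(t for doc in filtered_pages for t in doc.split())
            scores = {term: f * math.log(n / (1 + df[term])) for term, f in tf.items()}
            return [t for t, _ in sorted(scores.items(), key=lambda x: -x[1])[:k]]

        def discover(seed_pages, background, search_engine_query, is_filtered, rounds=3):
            """Query, test, and feed newly found filtered content back into the term model."""
            known, discovered = list(seed_pages), set()
            for _ in range(rounds):
                query = " ".join(characteristic_terms(known, background))
                for url, text in search_engine_query(query):
                    if url not in discovered and is_filtered(url):
                        discovered.add(url)
                        known.append(text)   # feedback step: refine future queries
            return discovered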

    Censorship: Filtering Content on the Web

    The World Wide Web has become a vehicle of free expression for millions of people around the world. It also represents a type of international library with no geographical or physical boundaries, bringing a vast array of information into private homes, schools and businesses. Because the Web allows anyone to post anything at any time, many believe some sort of censorship should be imposed. Censorship of the Web comes in the form of software which filters Web sites, blocking those which publish content deemed unsuitable by those administering the filtering software. Most content filtering software is used on computers in public schools, businesses, and libraries. The goal is to block sites that have no legitimate use in the workplace or in the classroom. These include sites promoting pornography, drugs, gambling, hacking, violence, and spyware, among others (Sarrel, 2007).

    Distributed Agents for Web Content Filtering

    This paper describes a Web content filtering approach that aims to block offensive material using distributed agents. The proposed system uses the FCM algorithm and other page features (title, metadata, warning message) to classify candidate websites into two types: white, which are considered acceptable, and black, which contain harmful material, taking English pornographic websites as a case study.
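
    A minimal sketch of the fuzzy c-means (FCM) step, assuming pages are reduced to toy numeric features derived from title, metadata, and warning-message cues; the distributed-agent architecture and the paper's real feature extraction are omitted.

        # Minimal fuzzy c-means (FCM) sketch for a white/black split of candidate pages.
        import numpy as np

        def fcm(X, c=2, m=2.0, iters=100, seed=0):
            rng = np.random.default_rng(seed)
            U = rng.random((len(X), c))
            U /= U.sum(axis=1, keepdims=True)          # fuzzy memberships sum to 1 per page
            for _ in range(iters):
                W = U ** m
                centers = (W.T @ X) / W.sum(axis=0)[:, None]
                d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2) + 1e-9
                U = 1.0 / (d ** (2 / (m - 1)))         # closer cluster -> higher membership
                U /= U.sum(axis=1, keepdims=True)
            return U, centers

        # Toy features per page: [adult-keyword count in title, warning-message flag]
        X = np.array([[0, 0], [1, 1], [5, 1], [0, 0]], dtype=float)
        memberships, _ = fcm(X)
        print(memberships.round(2))   # high membership in one cluster marks a "black" candidate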

    Website Blocked: Filtering Technology in Schools and School Libraries

    This paper investigates the impact of filtering software in K-12 schools and school libraries. The Children's Internet Protection Act, or CIPA, requires that public schools and school libraries use filtering technology in order to receive discounted rates on technology. As a result, nearly all public elementary and secondary schools today use filtering technology. While the provisions of CIPA narrowly define the content to be blocked, filters are often set to block much more than is required. Filtering technology is often ineffective, and many unobjectionable sites end up being blocked, including Web 2.0 sites and tools needed to educate students in a 21st century learning environment. Filtering software raises other issues as well, such as First Amendment implications, a possible digital divide between students that have unfiltered access to online content at home and those that do not, and the loss of opportunity to educate students on how to be good digital citizens. These issues should be acknowledged and addressed. There are many options available to librarians, educators, administrators, and other stakeholders that can increase students' access to online information and educational tools while still protecting children from inappropriate online content and complying with the requirements of CIPA.

    Dynamic Web Content Filtering Based on User's Knowledge

    This paper focuses on the development of a maintainable information filtering system. A simple and efficient solution to this problem is to block Web sites by URL, including IP address. However, this is not effective for unknown Web sites, and it is difficult to obtain a complete block list. Content-based filtering is suggested to overcome this problem as an additional strategy alongside URL filtering. The manual rule-based method is widely applied in current content filtering systems, but it overlooks the knowledge acquisition bottleneck problem. To solve this problem, we employed the Multiple Classification Ripple-Down Rules (MCRDR) knowledge acquisition method, which allows the domain expert to maintain the knowledge base without the help of knowledge engineers. Throughout this study, we show that the MCRDR-based information filtering system can easily prevent unknown Web information from being delivered and that its knowledge base can be easily maintained.
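
    A rough sketch of an MCRDR-style rule tree for content filtering: conclusions come from the deepest satisfied rule on each branch, and a domain expert refines behaviour by attaching exception rules under the rule that fired incorrectly. The conditions and conclusions here are toy assumptions, not the paper's knowledge base.

        # Illustrative MCRDR-style rule tree for content filtering.
        class Rule:
            def __init__(self, condition, conclusion, children=None):
                self.condition = condition        # callable: case dict -> bool
                self.conclusion = conclusion      # e.g. "block:gambling", "allow", or None
                self.children = children or []    # exception / refinement rules

            def evaluate(self, case, fired=None):
                """Collect conclusions of the deepest satisfied rules on every branch."""
                fired = [] if fired is None else fired
                if self.condition(case):
                    deeper = [c for c in self.children if c.condition(case)]
                    if deeper:
                        for child in deeper:
                            child.evaluate(case, fired)
                    elif self.conclusion:
                        fired.append(self.conclusion)
                return fired

        root = Rule(lambda c: True, None, [
            Rule(lambda c: "casino" in c["text"], "block:gambling", [
                Rule(lambda c: "addiction help" in c["text"], "allow")  # expert-added exception
            ]),
            Rule(lambda c: c["url"].endswith(".edu"), "allow"),
        ])

        print(root.evaluate({"url": "http://example.com", "text": "online casino bonus"}))
        print(root.evaluate({"url": "http://example.com", "text": "casino addiction help line"}))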

    A personalized web page content filtering model based on segmentation

    In view of the massive content explosion on the World Wide Web through diverse sources, content filtering tools have become mandatory. Filtering the content of web pages holds greater significance in cases of access by minors. Traditional web page blocking systems follow a Boolean methodology of either displaying the full page or blocking it completely. With the increased dynamism of web pages, it has become a common phenomenon that different portions of a web page hold different types of content at different time instances. This paper proposes a model to block content at a fine-grained level, i.e. instead of completely blocking the page, only those segments which hold the content to be blocked are blocked. The advantages of this method over traditional methods are the fine-grained level of blocking and the automatic identification of the portions of the page to be blocked. Experiments conducted on the proposed model indicate 88% accuracy in filtering out the segments. Comment: 11 pages, 6 figures.
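
    A minimal sketch of segment-level blocking, assuming segments have already been extracted (for example, one text block per DOM block element) and using a toy keyword classifier; the paper's segmentation algorithm and its 88% accuracy figure are not reproduced here.

        # Block only offending segments instead of the whole page.
        BLOCKED_TERMS = {"casino", "violence"}

        def classify_segment(text: str) -> bool:
            """Return True if this segment should be blocked (toy keyword test)."""
            tokens = set(text.lower().split())
            return bool(tokens & BLOCKED_TERMS)

        def filter_page(segments):
            """Replace only the offending segments, leaving the rest of the page intact."""
            return ["[segment blocked]" if classify_segment(s) else s for s in segments]

        page = [
            "Local news: community garden opens this weekend.",
            "Advertisement: best online casino bonuses here!",
            "Weather: sunny with light winds.",
        ]
        print("\n".join(filter_page(page)))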

    Advanced quantum based neural network classifier and its application for objectionable web content filtering

    © 2013 IEEE. In this paper, an Advanced Quantum-based Neural Network Classifier (AQNN) is proposed. The proposed AQNN is used to form an objectionable Web content filtering system (OWF). The aim is to design a neural network with a small number of hidden-layer neurons and with optimal connection weights and neuron thresholds. The proposed algorithm uses concepts from quantum computing and genetic algorithms to evolve the connection weights and neuron thresholds. Quantum computing uses the qubit, a probabilistic representation that is the smallest unit of information in the quantum computing concept. The algorithm also introduces a threshold boundary parameter to find the optimal value of the neuron threshold. The resulting neural network architecture is used to form an objectionable Web content filtering system that detects objectionable Web requests by the user. To judge the performance of the proposed AQNN, a total of 2000 (1000 objectionable + 1000 non-objectionable) websites' contents have been used. The results of the AQNN are also compared with QNN-F and well-known classifiers such as backpropagation, support vector machine (SVM), multilayer perceptron, decision tree, and artificial neural network classifiers. The results show that the AQNN performs better than the existing classifiers. The performance of the proposed OWF is also compared with well-known objectionable Web filtering software and existing models, and the proposed OWF is found to perform better than existing solutions in terms of filtering objectionable content.
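
    A quantum-inspired sketch of the general idea: weights are encoded as "qubit" angles whose bit probabilities are sampled and then rotated toward the best candidate found on a toy task. This is a generic quantum-inspired evolutionary scheme under assumed bit widths, rotation step, and toy data, not the AQNN algorithm itself.

        # Quantum-inspired evolution of a threshold neuron's weights (illustrative only).
        import numpy as np

        rng = np.random.default_rng(0)
        N_WEIGHTS, BITS, POP, STEP = 3, 8, 20, 0.05 * np.pi   # 2 inputs + 1 threshold

        def decode(bits):
            """Map each group of BITS bits to a weight in [-2, 2]."""
            ints = bits.reshape(N_WEIGHTS, BITS) @ (2 ** np.arange(BITS))
            return ints / (2 ** BITS - 1) * 4 - 2

        def fitness(w, X, y):
            out = (X @ w[:2] > w[2]).astype(int)      # single threshold neuron
            return (out == y).mean()

        X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
        y = np.array([0, 1, 1, 1])                    # toy OR task

        theta = np.full(N_WEIGHTS * BITS, np.pi / 4)  # equal superposition of 0 and 1
        best_bits, best_fit = None, -1.0
        for gen in range(50):
            prob_one = np.sin(theta) ** 2             # qubit angle -> probability of bit 1
            samples = (rng.random((POP, theta.size)) < prob_one).astype(int)
            fits = np.array([fitness(decode(s), X, y) for s in samples])
            if fits.max() > best_fit:
                best_fit, best_bits = fits.max(), samples[fits.argmax()].copy()
            # rotate each qubit toward the best bit string observed so far
            theta += STEP * np.where(best_bits == 1, 1, -1)
            theta = np.clip(theta, 0.01, np.pi / 2 - 0.01)

        print("best accuracy on toy task:", best_fit, "weights:", decode(best_bits).round(2))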