
    Information Waste on the World Wide Web and Combating the Clutter

    The Internet has become a critical part of the infrastructure supporting modern life. Because information providers operate with a high degree of openness and autonomy, a vast amount of information is accessible on the Internet; however, this same openness makes the web vulnerable to inaccurate, misleading, or outdated content. Such unnecessary and unusable content, referred to as “information waste,” consumes hardware resources and clutters the web. In this paper, we examine the phenomenon of web information waste by developing a taxonomy of it and analyzing its causes and effects. We then explore possible solutions and propose a classification approach that uses quantitative metrics to detect information waste.
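    As a hedged illustration of such metric-based classification (the signals, weights, and threshold below are assumptions made for the sketch, not the paper's actual model), a minimal Python scorer might combine staleness, broken links, and duplication:

    # Minimal sketch: score a page's "waste" from quantitative signals.
    # Signal names, weights, and the threshold are illustrative assumptions.
    from dataclasses import dataclass

    @dataclass
    class PageMetrics:
        days_since_update: int    # staleness signal
        broken_link_ratio: float  # fraction of dead outbound links, 0.0-1.0
        duplicate_ratio: float    # overlap with already-indexed content, 0.0-1.0

    def waste_score(m: PageMetrics) -> float:
        """Combine the signals into a single waste score in [0, 1]."""
        staleness = min(m.days_since_update / 365.0, 1.0)  # saturate after a year
        return 0.4 * staleness + 0.3 * m.broken_link_ratio + 0.3 * m.duplicate_ratio

    def is_information_waste(m: PageMetrics, threshold: float = 0.6) -> bool:
        return waste_score(m) >= threshold

    page = PageMetrics(days_since_update=800, broken_link_ratio=0.5, duplicate_ratio=0.2)
    print(waste_score(page), is_information_waste(page))  # 0.61 True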

    Applying insights from machine learning towards guidelines for the detection of text-based fake news

    Web-based technologies have fostered an online environment where information can be disseminated quickly and cost-effectively to large and diverse audiences. Unfortunately, the rise and evolution of these technologies have also created an environment where false information, commonly referred to as “fake news”, spreads rapidly, and the effects of this spread can be catastrophic. Finding solutions to the problem of fake news is complicated for a myriad of reasons: how fake news is defined, the lack of quality datasets available to researchers, the topics covered in such data, and the fact that datasets exist in a variety of languages. The dissemination of false information can result in reputational damage, financial damage to affected brands, and, ultimately, online news readers who are misinformed and make poorly grounded decisions. The objective of the study is to propose a set of guidelines that other system developers can use to implement misinformation detection tools and systems. The guidelines are constructed from findings of the project's experimentation phase and from the literature review conducted as part of the study. A selection of machine learning and deep learning approaches is examined to test the applicability of cues that could separate fake online articles from real online news articles. Key performance metrics such as precision, recall, accuracy, F1-score, and ROC are used to measure the performance of the selected models. To demonstrate the practicality of the guidelines and allow for reproducibility of the research, each guideline provides background information on the identified problem, a solution to the problem through pseudocode, code excerpts in the Python programming language, and points of consideration that may assist with implementation. Thesis (MA) -- Faculty of Engineering, the Built Environment, and Technology, 202
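    Since the guidelines pair pseudocode with Python excerpts, a hedged baseline of the kind described might look as follows; the toy texts and labels are placeholders rather than the study's data, and the scikit-learn TF-IDF/logistic-regression pairing is an assumption, not the thesis's exact models:

    # Sketch: TF-IDF + logistic regression, scored with the metrics named above.
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import train_test_split
    from sklearn.metrics import (accuracy_score, precision_score, recall_score,
                                 f1_score, roc_auc_score)

    # Placeholder corpus: 1 = fake, 0 = real.
    texts = ["Shock cure doctors hate", "Senate passes budget bill",
             "Aliens endorse candidate", "Central bank holds rates steady"] * 25
    labels = [1, 0, 1, 0] * 25

    X_train, X_test, y_train, y_test = train_test_split(
        texts, labels, test_size=0.25, stratify=labels, random_state=42)

    vectorizer = TfidfVectorizer(ngram_range=(1, 2))
    clf = LogisticRegression(max_iter=1000)
    clf.fit(vectorizer.fit_transform(X_train), y_train)

    X_vec = vectorizer.transform(X_test)
    pred = clf.predict(X_vec)
    prob = clf.predict_proba(X_vec)[:, 1]  # scores for the ROC curve

    print("accuracy :", accuracy_score(y_test, pred))
    print("precision:", precision_score(y_test, pred))
    print("recall   :", recall_score(y_test, pred))
    print("F1       :", f1_score(y_test, pred))
    print("ROC AUC  :", roc_auc_score(y_test, prob))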

    DESIGN AND EXPLORATION OF NEW MODELS FOR SECURITY AND PRIVACY-SENSITIVE COLLABORATION SYSTEMS

    Collaboration has been an area of interest in many domains, including education, research, the healthcare supply chain, the Internet of Things, and music. It enhances problem solving through expertise sharing, idea sharing, learning, resource sharing, and improved decision making. To address limitations in the existing literature, this dissertation presents a design science artifact and a conceptual model for collaborative environments. The first artifact is a blockchain-based collaborative information exchange system that uses blockchain technology and semi-automated ontology mappings to enable secure and interoperable health information exchange among different healthcare institutions. The conceptual model explores the factors that influence professionals' continued use of video-conferencing applications, investigating the role that perceived risks and benefits play in shaping professionals' attitudes towards VC apps and, consequently, their active and automatic use.
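    As an illustrative sketch only (the block fields and the SHA-256 hash chain below are generic assumptions, not the dissertation's actual design), the core tamper-evidence idea behind a blockchain-based exchange fits in a few lines of Python:

    # Toy hash-chained ledger: each block commits to its payload and predecessor.
    import hashlib, json, time

    def make_block(payload: dict, prev_hash: str) -> dict:
        block = {"timestamp": time.time(), "payload": payload, "prev_hash": prev_hash}
        block["hash"] = hashlib.sha256(
            json.dumps(block, sort_keys=True).encode()).hexdigest()
        return block

    def verify_chain(chain: list) -> bool:
        """Recompute every hash and check each block links to its predecessor."""
        for i, block in enumerate(chain):
            body = {k: v for k, v in block.items() if k != "hash"}
            digest = hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()
            if digest != block["hash"]:
                return False
            if i > 0 and block["prev_hash"] != chain[i - 1]["hash"]:
                return False
        return True

    genesis = make_block({"event": "genesis"}, "0" * 64)
    exchange = make_block({"from": "hospital_a", "to": "clinic_b",
                           "record": "rx-001"}, genesis["hash"])
    print(verify_chain([genesis, exchange]))  # True; tampering flips it to False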

    Advanced Machine Learning Techniques and Meta-Heuristic Optimization for the Detection of Masquerading Attacks in Social Networks

    According to a report published by the online protection firm Iovation in 2012, cyber fraud ranged from 1 percent of Internet transactions in North America to 7 percent in Africa, most of it involving credit card fraud, identity theft, and account takeover or hacking attempts. This kind of crime keeps growing thanks to the advantages of a non-face-to-face channel in which an increasing number of unsuspecting victims divulge sensitive information. Interpol classifies these illegal activities into three types:
    • Attacks against computer hardware and software.
    • Financial crimes and corruption.
    • Abuse, in the form of grooming or “sexploitation”.
    Most research efforts have focused on the target of the crime, developing different strategies for each case. Thus, for the well-known phishing, stored blacklists or crime signals in the text are employed, eventually yielding ad hoc detectors that are hardly transferable to other scenarios even when the background is widely shared. Identity theft, or masquerading, can be described as criminal activity oriented towards the misuse of stolen credentials to obtain goods or services by deception. On March 4, 2005, a million pieces of personal and sensitive information, such as credit card and social security numbers, were collected by white-hat hackers at Seattle University who surfed the Web for less than 60 minutes using the Google search engine. They thereby demonstrated the vulnerability and the lack of protection: a mere handful of sophisticated search terms typed into the engine, whose large data warehouse still exposed temporarily cached company and government website data. As noted, platforms that connect distant people, where interaction is undirected, offer a point of forcible entry for unauthorized third parties who impersonate the legitimate user in an attempt to go unnoticed, with malicious, not necessarily economic, interests. In fact, the last point in the list above, regarding abuse, has become a major and terrible risk, together with bullying; both, through threats, harassment, or even self-incrimination, are liable to drive someone to suicide, depression, or helplessness. California Penal Code Section 528.5 states: “Notwithstanding any other provision of law, any person who knowingly and without consent credibly impersonates another actual person through or on an Internet Web site or by other electronic means for purposes of harming, intimidating, threatening, or defrauding another person is guilty of a public offense punishable pursuant to subdivision [...]”. Therefore, impersonation consists of any criminal activity in which someone assumes a false identity and acts in that assumed character with intent to obtain a pecuniary benefit or cause harm. User profiling, in turn, is the process of harvesting user information in order to construct a rich template of the advantageous attributes in the field at hand, for specific purposes. User profiling is often employed as a mechanism for recommending items or useful information that the client has not yet considered. Nevertheless, derived user tendencies and preferences can also be exploited to model a user's inherent behavior and address the problem of impersonation by detecting outliers or strange deviations likely to signal a potential attack.
    This dissertation elaborates on impersonation attacks from a profiling perspective, ultimately developing a two-stage environment that embraces two corresponding levels of privacy intrusion and provides the following contributions:
    • The inference of behavioral patterns from connection-time traces, aimed at avoiding the usurpation of more confidential information. Compared with previous approaches, this procedure abstains from impinging on user privacy by seizing message content, since it relies only on the time statistics of user sessions rather than on what they contain.
    • The application and discussion of two selected algorithms for the previous point:
    – a commonly employed supervised algorithm executed as a binary classifier, which in turn required devising a method to cope with the absence of labeled instances representing an identity theft; and
    – a meta-heuristic algorithm that searches for the most convenient parameters to arrange the instances in a high-dimensional space into properly delimited clusters, so that an unsupervised clustering algorithm can finally be applied.
    • The analysis of message content, which encroaches on more private information but eases user identification by mining discriminative features with Natural Language Processing (NLP) techniques, and, in consequence, the development of a new feature extraction algorithm grounded in linguistic theories, motivated by the massive quantity of features typically gathered from text.
    In summary, this dissertation goes beyond the typical ad hoc approaches adopted by previous identity theft and authorship attribution research: it proposes tailored solutions to this particular and extensively studied paradigm with the aim of introducing a generic, profiling-based approach that is not tightly bound to a single application field. Technical contributions have also been made while formulating the solution, intended to optimize familiar methods for better versatility towards the problem at hand. This thesis thereby establishes an encouraging research basis towards unveiling subtle impersonation attacks in social networks by means of intelligent learning techniques.
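    The absence of labeled theft instances noted above is the classic setting for one-class or anomaly-detection models. As a hedged stand-in for the thesis's actual algorithms, the sketch below profiles a legitimate user's session-time statistics with scikit-learn's IsolationForest and flags deviating sessions; the features and parameters are assumptions:

    # One-class profiling sketch: train only on the legitimate user's sessions.
    # Features and model choice are illustrative assumptions.
    import numpy as np
    from sklearn.ensemble import IsolationForest

    rng = np.random.default_rng(0)
    # Each row: [login hour, session length (min), gap since last session (h)]
    legit = np.column_stack([rng.normal(20, 1.5, 200),   # habitual evening logins
                             rng.normal(45, 10, 200),    # ~45-minute sessions
                             rng.normal(24, 6, 200)])    # roughly daily activity

    model = IsolationForest(contamination=0.05, random_state=0).fit(legit)

    suspect = np.array([[4.0, 180.0, 1.0]])  # 4 a.m., unusually long, back-to-back
    print(model.predict(suspect))            # -1 flags a potential impersonation
    print(model.predict(legit[:3]))          # mostly +1 for the user's own sessions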

    Monte Carlo Method with Heuristic Adjustment for Irregularly Shaped Food Product Volume Measurement

    Volume measurement plays an important role in the production and processing of food products. Various methods based on 3D reconstruction have been proposed to measure the volume of irregularly shaped food products; however, 3D reconstruction carries a high computational cost, and some reconstruction-based methods also have low accuracy. An alternative is the Monte Carlo method, which measures volume using random points: it only requires knowing whether each random point falls inside or outside the object and does not require a 3D reconstruction. This paper proposes a computer vision system that measures the volume of irregularly shaped food products without 3D reconstruction, based on the Monte Carlo method with a heuristic adjustment. Five images of each food product were captured with five cameras and processed into binary images, and Monte Carlo integration with heuristic adjustment was performed to measure the volume from the information extracted from the binary images. The experimental results show that the proposed method provides high accuracy and precision relative to the water displacement method, and that it is more accurate and faster than the space carving method.
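    A hedged sketch of the core idea (not the paper's implementation): sample random points in a bounding box and keep those whose projections fall inside every binary silhouette; the inside fraction times the box volume estimates the object volume. Here an analytic sphere and three orthographic views stand in for the five calibrated cameras:

    # Monte Carlo volume from binary silhouettes (illustrative assumptions:
    # a sphere and three orthographic views replace real camera images).
    import numpy as np

    rng = np.random.default_rng(1)
    n = 200_000
    pts = rng.uniform(-1.0, 1.0, size=(n, 3))  # samples in a [-1, 1]^3 box

    def silhouette_contains(u, v, r=0.8):
        """Binary-image stand-in: a sphere's silhouette is a disk of radius r."""
        return u**2 + v**2 <= r**2

    inside = (silhouette_contains(pts[:, 0], pts[:, 1]) &   # top view
              silhouette_contains(pts[:, 0], pts[:, 2]) &   # front view
              silhouette_contains(pts[:, 1], pts[:, 2]))    # side view

    estimate = inside.mean() * 2.0**3  # hit fraction times box volume
    # The visual hull overestimates the true sphere volume (~2.40 vs ~2.14 here);
    # correcting such bias is the role of a heuristic adjustment.
    print(f"estimate: {estimate:.3f}, sphere volume: {4/3*np.pi*0.8**3:.3f}")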