Information Waste on the World Wide Web and Combating the Clutter
The Internet has become a critical part of the infrastructure supporting modern life. The high degree of openness and autonomy of information providers makes a vast amount of information accessible on the Internet; however, it also leaves the web vulnerable to inaccurate, misleading, or outdated information. This unnecessary and unusable content, referred to as “information waste,” takes up hardware resources and clutters the web. In this paper, we examine the phenomenon of web information waste by developing a taxonomy of it and analyzing its causes and effects. We then explore possible solutions and propose a classification approach using quantitative metrics for information waste detection.
Applying insights from machine learning towards guidelines for the detection of text-based fake news
Web-based technologies have fostered an online environment where information can be disseminated in a fast and cost-effective manner whilst targeting large and diverse audiences. Unfortunately, the rise and evolution of web-based technologies have also created an environment where false information, commonly referred to as “fake news”, spreads rapidly. The effects of this spread can be catastrophic. Finding solutions to the problem of fake news is complicated for a myriad of reasons, such as: what is defined as fake news, the lack of quality datasets available to researchers, the topics covered in such data, and the fact that datasets exist in a variety of languages. The effects of false information dissemination can include reputational damage, financial damage to affected brands, and ultimately, misinformed online news readers who can make misinformed decisions. The objective of the study is to propose a set of guidelines that can be used by other system developers to implement misinformation detection tools and systems. The guidelines are constructed using findings from the experimentation phase of the project and information uncovered in the literature review conducted as part of the study. A selection of machine and deep learning approaches is examined to test the applicability of cues that could separate fake online articles from real online news articles. Key performance metrics such as precision, recall, accuracy, F1-score, and ROC are used to measure the performance of the selected machine learning and deep learning models. To demonstrate the practicality of the guidelines and allow for reproducibility of the research, each guideline provides background information relating to the identified problem, a solution to the problem through pseudocode, code excerpts using the Python programming language, and points of consideration that may assist with the implementation. Thesis (MA) --Faculty of Engineering, the Built Environment, and Technology, 202
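The performance metrics the study names can all be computed directly from a confusion matrix. A minimal, self-contained sketch for a binary fake-news classifier (the labels and predictions below are hypothetical; 1 = fake, 0 = real):

```python
# Compute precision, recall, accuracy, and F1-score for a
# binary classifier from paired true/predicted labels.

def binary_metrics(y_true, y_pred):
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    accuracy = (tp + tn) / len(y_true)
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"precision": precision, "recall": recall,
            "accuracy": accuracy, "f1": f1}

# Hypothetical evaluation run: 6 articles, 1 = fake, 0 = real.
y_true = [1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 1, 0]
print(binary_metrics(y_true, y_pred))
```

In practice a library such as scikit-learn provides these metrics (plus ROC analysis), but the hand-rolled version makes the definitions explicit.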
DESIGN AND EXPLORATION OF NEW MODELS FOR SECURITY AND PRIVACY-SENSITIVE COLLABORATION SYSTEMS
Collaboration has been an area of interest in many domains, including education, research, healthcare supply chains, the Internet of Things, and music. It enhances problem solving through expertise sharing, idea sharing, learning, resource sharing, and improved decision making.
To address the limitations in the existing literature, this dissertation presents a design science artifact and a conceptual model for collaborative environments. The first artifact is a blockchain-based collaborative information exchange system that utilizes blockchain technology and semi-automated ontology mappings to enable secure and interoperable health information exchange among different healthcare institutions. The conceptual model proposed in this dissertation explores the factors that influence professionals' continued use of video-conferencing applications. The conceptual model investigates the role that perceived risks and benefits play in influencing professionals' attitudes towards VC apps and, consequently, their active and automatic use.
Advanced Machine Learning Techniques and Meta-Heuristic Optimization for the Detection of Masquerading Attacks in Social Networks
According to a report published by the online protection firm Iovation in 2012, cyber fraud ranged from 1 percent of Internet transactions in North America to 7 percent in Africa, most of it involving credit card fraud, identity theft, and account takeover or hacking attempts. This kind of crime is still growing due to the advantages offered by a non-face-to-face channel in which an increasing number of unsuspecting victims divulge sensitive information. Interpol classifies these illegal activities into 3 types:
• Attacks against computer hardware and software.
• Financial crimes and corruption.
• Abuse, in the form of grooming or “sexploitation”.
Most research efforts have focused on the target of the crime, developing different strategies depending on the case. Thus, for the well-known phishing attacks, stored blacklists or crime signals in the text are employed, eventually producing ad-hoc detectors that are hard to transfer to other scenarios even when the background is widely shared. Identity theft or masquerading can be described as a criminal activity oriented towards the misuse of stolen credentials to obtain goods or services by deception. On March 4, 2005, a million items of personal and sensitive information, such as credit card and social security numbers, were collected by White Hat hackers at Seattle University who merely surfed the Web for less than 60 minutes by means of the Google search engine. They thereby demonstrated the vulnerability and lack of protection: a mere set of sophisticated search terms typed into the engine, whose large data warehouse still allowed temporarily cached company and government website data to be displayed.
As mentioned above, platforms that connect distant people, in which the interaction is undirected, pose an entry point for unauthorized third parties who impersonate the legitimate user in an attempt to go unnoticed with malicious, not necessarily economic, interests. In fact, the last point in the list above, regarding abuse, has become a major and terrible risk, along with bullying; both, by means of threats, harassment, or even self-incrimination, are liable to drive someone to suicide, depression, or helplessness. California Penal Code Section 528.5 states:
“Notwithstanding any other provision of law, any person who knowingly
and without consent credibly impersonates another actual person through
or on an Internet Web site or by other electronic means for purposes of
harming, intimidating, threatening, or defrauding another person is guilty
of a public offense punishable pursuant to subdivision [...]”.
Therefore, impersonation consists of any criminal activity in which someone assumes a false identity and acts as his or her assumed character with intent to gain a pecuniary benefit or cause some harm. User profiling, in turn, is the process of harvesting user information in order to construct a rich template with all the advantageous attributes in the field at hand and for specific purposes. User profiling is often employed as a mechanism for recommending items or useful information that the client has not yet considered. Nevertheless, derived user tendencies or preferences can also be exploited to define the user's inherent behavior and address the problem of impersonation by detecting outliers or strange deviations likely to signal a potential attack.
This dissertation elaborates on impersonation attacks from a profiling perspective, eventually developing a 2-stage environment that embraces 2 levels of privacy intrusion, thus providing the following contributions:
• The inference of behavioral patterns from connection time traces, aiming to avoid the usurpation of more confidential information. Compared to previous approaches, this procedure abstains from impinging on user privacy by seizing message content, since it relies only on the time statistics of the user's sessions rather than on their content.
• The application and subsequent discussion of two algorithms selected to address the previous point:
– A commonly employed supervised algorithm executed as a binary classifier, which subsequently required us to devise a method for dealing with the absence of labeled instances representing an identity theft.
– A meta-heuristic algorithm that searches for the most convenient parameters to arrange the instances within a high-dimensional space into properly delimited clusters, so that an unsupervised clustering algorithm can finally be applied.
• The analysis of message content, which encroaches on more private information but eases user identification, by mining discriminative features with Natural Language Processing (NLP) techniques. As a consequence, the development of a new feature extraction algorithm based on linguistic theories, motivated by the massive quantity of features typically gathered when working with texts.
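The first contribution profiles a user purely from session timing statistics and flags strong deviations as potential impersonation. A minimal sketch of that idea, assuming session duration as the only feature and a simple z-score rule (the dissertation's actual classifier and meta-heuristic are not reproduced here; the history values and threshold are hypothetical):

```python
# Privacy-preserving profiling sketch: model a user's typical session
# duration, then flag new sessions that deviate strongly. No message
# content is inspected, only timing statistics.
from statistics import mean, stdev

def build_profile(session_durations):
    """Summarize a user's historical session durations (minutes)."""
    return {"mean": mean(session_durations), "std": stdev(session_durations)}

def is_outlier(profile, duration, z_threshold=3.0):
    """Flag a session whose duration deviates more than z_threshold
    standard deviations from the user's historical mean."""
    if profile["std"] == 0:
        return duration != profile["mean"]
    z = abs(duration - profile["mean"]) / profile["std"]
    return z > z_threshold

history = [30, 28, 35, 32, 31, 29, 33]   # typical sessions (minutes)
profile = build_profile(history)
print(is_outlier(profile, 31))    # a normal-looking session
print(is_outlier(profile, 240))   # a strongly anomalous session
```

A real system would combine several timing features (login hour, inter-session gaps, frequency) and learn the decision boundary rather than fix a threshold, which is where the supervised and clustering approaches above come in.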
In summary, this dissertation aims to go beyond the typical ad-hoc approaches adopted by previous identity theft and authorship attribution research. Specifically, it proposes tailored solutions to this particular and extensively studied paradigm, with the aim of introducing a generic approach from a profiling view that is not tightly bound to a unique application field. In addition, technical contributions have been made in the course of the solution formulation, intending to optimize familiar methods for better versatility towards the problem at hand. In short, this thesis establishes an encouraging research basis towards unveiling subtle impersonation attacks in social networks by means of intelligent learning techniques.
Monte Carlo Method with Heuristic Adjustment for Irregularly Shaped Food Product Volume Measurement
Volume measurement plays an important role in the production and processing of food products. Various methods have been proposed to measure the volume of food products with irregular shapes based on 3D reconstruction. However, 3D reconstruction comes at a high computational cost. Furthermore, some of the volume measurement methods based on 3D reconstruction have low accuracy. Another method for measuring the volume of objects uses the Monte Carlo method, which performs volume measurements using random points. The Monte Carlo method only requires information on whether random points fall inside or outside an object and does not require a 3D reconstruction. This paper proposes volume measurement using a computer vision system for irregularly shaped food products, without 3D reconstruction, based on the Monte Carlo method with heuristic adjustment. Five images of the food product were captured using five cameras and processed to produce binary images. Monte Carlo integration with heuristic adjustment was then performed to measure the volume based on the information extracted from the binary images. The experimental results show that the proposed method provides high accuracy and precision compared to the water displacement method. In addition, the proposed method is more accurate and faster than the space carving method.
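The core Monte Carlo idea described above, counting which random points fall inside the object, can be sketched as follows. This is a generic illustration rather than the paper's method: a sphere inside-test stands in for the paper's binary-image membership check, and the heuristic adjustment is omitted.

```python
# Plain Monte Carlo volume estimation: sample random points in a
# bounding box and estimate volume from the fraction that falls
# inside the object.
import random

def monte_carlo_volume(inside, bounds, n_samples=200_000, seed=42):
    """Estimate volume as (hits / samples) * bounding-box volume.

    inside: predicate (x, y, z) -> bool, the object membership test.
    bounds: ((x0, x1), (y0, y1), (z0, z1)) bounding box.
    """
    rng = random.Random(seed)
    (x0, x1), (y0, y1), (z0, z1) = bounds
    box_volume = (x1 - x0) * (y1 - y0) * (z1 - z0)
    hits = sum(
        inside(rng.uniform(x0, x1), rng.uniform(y0, y1), rng.uniform(z0, z1))
        for _ in range(n_samples)
    )
    return hits / n_samples * box_volume

# Sanity check on a unit sphere in a [-1, 1]^3 box;
# the true volume is 4*pi/3 ≈ 4.18879.
in_sphere = lambda x, y, z: x * x + y * y + z * z <= 1.0
print(monte_carlo_volume(in_sphere, [(-1, 1)] * 3))
```

In the paper's setting, the `inside` predicate would instead project each candidate point into the five binary camera images and accept it only when it falls within the object silhouette in all views.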