Self Super-Resolution for Magnetic Resonance Images using Deep Networks
High resolution magnetic resonance (MR) imaging (MRI) is desirable in many clinical applications; however, there is a trade-off between resolution, speed of acquisition, and noise. It is common for MR images to have worse through-plane resolution (slice thickness) than in-plane resolution. In such images, high-frequency information in the through-plane direction is never acquired and cannot be recovered by interpolation. To address this issue, super-resolution methods have been developed to enhance spatial resolution. Because super-resolution is an ill-posed problem, state-of-the-art methods rely on external/training atlases to learn the transform from low resolution (LR) images to high resolution (HR) images. For several reasons, such HR atlas images are often not available for MRI sequences. This paper presents a self super-resolution (SSR) algorithm, which uses no external atlas images, yet can still estimate an HR image relying only on the acquired LR image. We use a blurred version of the input image to create training data for a state-of-the-art super-resolution deep network. The trained network is then applied to the original input image to estimate the HR image. Our SSR results show a significant improvement in through-plane resolution compared to competing SSR methods.
Comment: Accepted by IEEE International Symposium on Biomedical Imaging (ISBI) 201
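The self-training idea described in the abstract can be sketched in a few lines: degrade the acquired volume along a high-resolution in-plane axis to mimic the through-plane blur, then use the (degraded, original) pairs as (LR, HR) training examples for a super-resolution network. This is a minimal NumPy sketch; the function name, the kernel width, and the simple box-blur degradation are illustrative assumptions, not the paper's actual degradation model or deep network.

```python
import numpy as np

def make_ssr_training_pair(volume, axis=1, k=3):
    """Blur `volume` along one in-plane axis with a length-k box kernel.

    The acquired image already has poor through-plane resolution, so the
    extra blur along a high-resolution axis simulates that degradation;
    (blurred, original) then serves as an (LR, HR) training pair.
    The box kernel is an illustrative assumption.
    """
    kernel = np.ones(k) / k
    blur_1d = lambda line: np.convolve(line, kernel, mode="same")
    blurred = np.apply_along_axis(blur_1d, axis, volume.astype(np.float64))
    return blurred, volume

# Toy volume standing in for an MR acquisition (slices, rows, cols)
rng = np.random.default_rng(0)
vol = rng.random((16, 32, 32))
lr, hr = make_ssr_training_pair(vol, axis=1, k=3)
```

In the paper's setting, many such patch pairs would be fed to the network, which is then applied to the unblurred input to sharpen the through-plane direction.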
CleanML: A Study for Evaluating the Impact of Data Cleaning on ML Classification Tasks
Data quality affects machine learning (ML) model performance, and data scientists spend a considerable amount of time on data cleaning before model training. To date, however, there has been no rigorous study of how exactly cleaning affects ML: the ML community usually focuses on developing algorithms that are robust to particular noise types with certain distributions, while the database (DB) community has mostly studied data cleaning in isolation, without considering how the data is consumed by downstream ML analytics. We propose CleanML, a study that systematically investigates the impact of data cleaning on ML classification tasks. The open-source and extensible CleanML study currently includes 14 real-world datasets with real errors, five common error types, seven different ML models, and multiple cleaning algorithms for each error type (including both algorithms commonly used in practice and state-of-the-art solutions from the academic literature). We control the randomness in the ML experiments using statistical hypothesis testing, and we control the false discovery rate using the Benjamini-Yekutieli (BY) procedure. We analyze the results systematically to derive many interesting and nontrivial observations, and we put forward multiple research directions for researchers.
Comment: published in ICDE 202
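The BY step mentioned in the abstract can be illustrated concretely: sort the p-values, compare the k-th smallest to its rank-scaled threshold k·q/(m·c(m)) with c(m) = Σᵢ 1/i, and reject every hypothesis up to the largest rank that passes. A minimal pure-Python sketch, assuming a plain list of p-values; CleanML's actual pipeline wraps this procedure around its hypothesis tests.

```python
def benjamini_yekutieli(pvals, q=0.05):
    """Benjamini-Yekutieli FDR control, valid under arbitrary dependence.

    Returns a list of booleans marking which hypotheses are rejected.
    """
    m = len(pvals)
    c_m = sum(1.0 / i for i in range(1, m + 1))          # harmonic correction
    order = sorted(range(m), key=lambda i: pvals[i])     # indices by p-value
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if pvals[idx] <= rank * q / (m * c_m):
            k_max = rank                                  # largest passing rank
    reject = [False] * m
    for idx in order[:k_max]:
        reject[idx] = True
    return reject

rejected = benjamini_yekutieli([0.001, 0.008, 0.039, 0.041, 0.30, 0.62], q=0.05)
# → [True, False, False, False, False, False]
```

Note how BY is more conservative than Benjamini-Hochberg: the c(m) factor shrinks every threshold, which is the price of allowing arbitrarily dependent tests.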
SecREP: A Framework for Automating the Extraction and Prioritization of Security Requirements Using Machine Learning and NLP Techniques
Gathering and extracting security requirements adequately requires extensive effort, experience, and time, as large amounts of data must be analyzed. While many manual and academic approaches have been developed within the discipline of Security Requirements Engineering (SRE), a need still exists to automate the SRE process. This need stems mainly from the difficult, error-prone, and time-consuming nature of traditional, manual frameworks. Machine learning techniques have been widely used to facilitate and automate the extraction of useful information from software requirements documents and artifacts, and such approaches can yield beneficial results in automating the extraction and elicitation of security requirements. However, extraction alone leaves software engineers with yet another tedious task: prioritizing the most critical security requirements. The competitive and fast-paced nature of software development, together with resource constraints, makes security requirements prioritization crucial for engineers to make educated decisions in risk and trade-off analysis.
To that end, this thesis presents an automated framework/pipeline for extracting and prioritizing security requirements. The proposed framework, called the Security Requirements Extraction and Prioritization Framework (SecREP), consists of two parts. SecREP Part 1 proposes a machine learning approach for identifying and extracting security requirements from natural-language software requirements artifacts (e.g., the Software Requirements Specification, or SRS, document). SecREP Part 2 proposes a scheme for prioritizing the security requirements identified in the previous step.
For the first part of the SecREP framework, three machine learning models (SVM, Naive Bayes, and Random Forest) were trained using an enhanced dataset, the “SecREP Dataset,” created as a result of this work. Each model was validated using resampling (80% for training and 20% for validation) and 5-fold cross-validation. For the second part, a prioritization scheme was established with the aid of NLP techniques. The proposed scheme analyzes each security requirement using part-of-speech (POS) tagging and named entity recognition to extract assets, security attributes, and threats from the requirement. Additionally, using a text-similarity method, each security requirement is compared to a super-sentence defined based on the STRIDE threat model. This prioritization scheme was applied to the list of security requirements extracted in the case study of part one, and the priority score for each requirement was calculated and showcased.
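The comparison of each requirement against a STRIDE-based super-sentence can be sketched with a simple token-overlap similarity. Everything here is illustrative: the super-sentence text, the Jaccard measure, and the function names are assumptions standing in for the thesis's actual NLP similarity method, which it does not fully specify here.

```python
import re

# Illustrative stand-in; the thesis defines its own STRIDE-based super-sentence.
STRIDE_SUPER_SENTENCE = (
    "spoofing tampering repudiation information disclosure "
    "denial of service elevation of privilege authentication "
    "integrity confidentiality availability authorization"
)

def tokens(text):
    """Lowercase word tokens of a sentence."""
    return set(re.findall(r"[a-z]+", text.lower()))

def stride_similarity(requirement, reference=STRIDE_SUPER_SENTENCE):
    """Jaccard token overlap between a requirement and the super-sentence.

    A crude proxy for the thesis's text-similarity method: more shared
    security vocabulary yields a higher score.
    """
    a, b = tokens(requirement), tokens(reference)
    return len(a & b) / len(a | b) if a | b else 0.0

reqs = [
    "The system shall encrypt data to ensure confidentiality and integrity.",
    "The UI shall display the company logo on the landing page.",
]
ranked = sorted(reqs, key=stride_similarity, reverse=True)
```

Under this toy measure the encryption requirement outranks the UI requirement, matching the intuition that requirements mentioning STRIDE concepts deserve higher priority.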
The role of emotional variables in the classification and prediction of collective social dynamics
We demonstrate the power of data mining techniques for the analysis of collective social dynamics in British tweets during the 2012 Olympic Games. The classification accuracy of online activities related to the successes of British athletes improved significantly when the emotional components of tweets were taken into account, but employing emotional variables for activity prediction decreased the classifiers' quality. The approach can easily be adopted for any prediction or classification study with a set of problem-specific variables.
Comment: 16 pages, 9 figures, 2 tables and 1 appendix