96,046 research outputs found

Feature selection to enhance android malware detection using modified term frequency-inverse document frequency (MTF-IDF)

Author: Mazlan Nurul Hidayah
Publication date: 01/02/2019

This research evaluates a feature selection algorithm that uses Term Frequency-Inverse Document Frequency (TF-IDF) as the main algorithm in Android malware detection. TF-IDF is used to filter Android features before the detection process. However, IDF is unaware of the training class labels and gives incorrect weight values to some features. The proposed Modified Term Frequency-Inverse Document Frequency (MTF-IDF) algorithm therefore focuses on both samples and features in order to weight those features correctly. It considers features by their level of importance, with weights assigned based on the number of features involved in the sample. The most relevant features in a sample are selected through a weighting and priority-ranking process using K-means, which ensures that only important malware features are selected from the Android application sample. The experiments were conducted on samples collected from DREBIN. The existing TF-IDF algorithm and the MTF-IDF algorithm were compared under various conditions: different sample sizes, different numbers of features, and combinations of different feature types. The results showed that feature selection using MTF-IDF can improve Android malware detection. MTF-IDF was found to be effective for Android malware detection regardless of the kinds of features or sample sizes used, and to give appropriate scaling for all features.
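As a point of reference for the abstract above, the following is a minimal sketch of the plain TF-IDF baseline, not the MTF-IDF algorithm itself (whose formula the abstract does not give). The function name `tf_idf` and the toy feature lists are illustrative assumptions. Note that the function never sees class labels, which is the limitation the abstract attributes to IDF.

```python
import math
from collections import Counter


def tf_idf(samples):
    """Return one {feature: weight} dict per sample using plain TF-IDF."""
    n = len(samples)
    # Document frequency: number of samples that contain each feature.
    df = Counter(f for s in samples for f in set(s))
    weights = []
    for s in samples:
        tf = Counter(s)
        # Relative term frequency times log inverse document frequency.
        weights.append(
            {f: (c / len(s)) * math.log(n / df[f]) for f, c in tf.items()}
        )
    return weights


# Hypothetical toy samples: each list holds the features of one app.
samples = [
    ["SEND_SMS", "INTERNET", "READ_CONTACTS"],
    ["INTERNET", "ACCESS_FINE_LOCATION"],
    ["INTERNET", "SEND_SMS"],
]
weights = tf_idf(samples)
```

A feature that occurs in every sample (here `INTERNET`) receives weight zero, whether or not it helps separate malware from benign apps.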

UTHM Institutional Repository

Blog Analysis with Fuzzy TFIDF

Author: Ho Chi-Shu
Publication venue: SJSU ScholarWorks
Publication date: 01/01/2007

Blogs are becoming increasingly popular because they allow anyone to share personal diaries, opinions, and comments on the World Wide Web. Many blogs contain valuable information, but extracting it from a large number of blog comments is difficult. The goal is to analyze a large number of blog comments by clustering them into smaller groups according to their similarity, based on keyword relevance. TF-IDF weighting has been used to classify documents by measuring how frequently each keyword appears in a document, but it is not effective at differentiating semantic similarities between words. Applying fuzzy semantics to TF-IDF yields fuzzy TF-IDF, which can rank semantic relevancy. By adopting fuzzy TF-IDF and fuzzy semantics, the Vector Space Model (VSM) is extended to a fuzzy VSM, which can be effective in exploring hidden relationships between blog comments. Fuzzy VSM can therefore cluster a large number of blog comments into a small number of groups based on document similarity and semantic relevancy.
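The limitation described above can be seen in a minimal sketch of cosine similarity in the classical (non-fuzzy) Vector Space Model. The weight values and the `cosine` helper are illustrative assumptions, and the fuzzy extension from the abstract is not reproduced here.

```python
import math


def cosine(u, v):
    """Cosine similarity between two sparse {term: weight} vectors."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    # Treat an all-zero vector as similar to nothing.
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


# Hypothetical TF-IDF vectors for two blog comments on the same topic.
comment_a = {"car": 1.0}
comment_b = {"automobile": 1.0}
similarity = cosine(comment_a, comment_b)
```

Because the two comments share no literal term, their similarity is zero even though the words are synonyms. This is the kind of gap a semantic weighting is meant to close.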

SJSU ScholarWorks

Bug or Not? Bug Report Classification Using N-Gram IDF

Author: Hata Hideaki
Matsumoto Kenichi
Phannachitta Passakorn
Terdchanakul Pannavat
Publication date: 01/01/2017

Previous studies have found that a significant number of bug reports are misclassified between bugs and non-bugs, and that manually classifying bug reports is a time-consuming task. To address this problem, we propose a bug report classification model with N-gram IDF, a theoretical extension of Inverse Document Frequency (IDF) for handling words and phrases of any length. N-gram IDF enables us to extract key terms of any length from texts; these key terms can be used as features to classify bug reports. We build classification models with logistic regression and random forest using features from N-gram IDF and from topic modeling, which is widely used in various software engineering tasks. With a publicly available dataset, our results show that our N-gram IDF-based models outperform the topic-based models in all of the evaluated cases. Our models show promising results and have the potential to be extended to other software engineering tasks. Comment: 5 pages, ICSME 201
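To make the idea of weighting phrases of several lengths concrete, here is a simplified sketch that applies ordinary IDF to word n-grams. It is not the N-gram IDF formulation from the paper, which is a specific theoretical extension. The function names, the whitespace tokenization, and the sample reports are assumptions for illustration only.

```python
import math
from collections import Counter


def ngrams(tokens, max_n):
    """Yield every contiguous n-gram of length 1..max_n as a tuple."""
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            yield tuple(tokens[i:i + n])


def ngram_idf(docs, max_n=3):
    """Map each n-gram to log(N / df), where df counts documents containing it."""
    df = Counter()
    for doc in docs:
        # A set ensures each n-gram is counted once per document.
        df.update(set(ngrams(doc.lower().split(), max_n)))
    return {gram: math.log(len(docs) / count) for gram, count in df.items()}


# Hypothetical report summaries.
reports = [
    "app crashes on startup",
    "please add dark mode",
    "app crashes on login",
]
idf = ngram_idf(reports)
```

Phrases that recur across reports, such as `("app", "crashes", "on")`, get a lower IDF than phrases seen in a single report, such as `("dark", "mode")`.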

arXiv.org e-Print Archive

NAIST Academic Repository

Crossref

From Surviving to Thriving: Evaluation of the International Diabetes Federation Life for a Child Program

Author: Louise Sigfrid
Martin McKee
Miranda Eeles
Sue Atkinson
Zoe Atkinson
Publication venue: International Diabetes Foundation
Publication date: 03/03/2015

IDF-LFAC aims to provide: (1) insulin and syringes; (2) blood glucose monitoring (BGM) equipment; (3) appropriate clinical care; (4) HbA1c testing; (5) diabetes education; and (6) technical support and training for health professionals, as well as (7) facilitating relevant clinical research and, where possible, (8) assisting with capacity building. IDF-LFAC receives financial and in-kind support from private foundations, individuals, and corporations. Insulin and blood glucose monitoring equipment distribution is made possible by donations of insulin and the purchase of blood glucose monitors and strips at a reduced price from large pharmaceutical companies. The goal of this evaluation is to assess IDF-LFAC's organizational structure, strategic framework, processes, program impact, and potential to catalyze long-term sustainable improvements to T1D care delivery systems in its partner countries. LSHTM were commissioned to undertake the evaluation in 2014, when IDF-LFAC had active programs in 45 countries.

IssueLab

The accessibility dimension for structured document retrieval

Author: Kazai Gabriella
Lalmas Mounia
Quicker Stefan
Roelleke Thomas
Ruthven Ian
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2001

Structured document retrieval aims at retrieving the document components that best satisfy a query, instead of merely retrieving pre-defined document units. This paper reports on an investigation of a tf-idf-acc approach, where tf and idf are the classical term frequency and inverse document frequency, and acc is a new parameter, called accessibility, that captures the structure of documents. The tf-idf-acc approach is defined using a probabilistic relational algebra. To investigate the retrieval quality and estimate the acc values, we developed a method that automatically constructs diverse test collections of structured documents from a standard test collection, with which experiments were carried out. The analysis of the experiments provides estimates of the acc values.

CiteSeerX

Crossref

University of Strathclyde Institutional Repository

Silencing CHALCONE SYNTHASE in maize impedes the incorporation of tricin into lignin and increases lignin content

Author: Boerjan Wout
Cesarino Igor
de Lyra Soriano Saleme Marina
Eloy Nubia Barbosa
Goeminne Geert
Lan Wu
Morreel Kris
Nicomedes José Junior
Pallidis Andreas
Ralph John
Smith Rebecca
Vanholme Ruben
Voorend Wannes
Publication venue: 'American Society of Plant Biologists (ASPB)'
Publication date: 09/12/2016

Lignin is a phenolic heteropolymer that is deposited in secondary-thickened cell walls, where it provides mechanical strength. A recent structural characterization of cell walls from monocot species showed that the flavone tricin is part of the native lignin polymer, where it is hypothesized to initiate lignin chains. In this study, we investigated the consequences of altered tricin levels on lignin structure and cell wall recalcitrance by phenolic profiling, nuclear magnetic resonance, and saccharification assays of the naturally silenced maize (Zea mays) C2-Idf (inhibitor diffuse) mutant, defective in the CHALCONE SYNTHASE Colorless2 (C2) gene. We show that the C2-Idf mutant produces highly reduced levels of apigenin- and tricin-related flavonoids, resulting in a strongly reduced incorporation of tricin into the lignin polymer. Moreover, the lignin was enriched in beta-beta and beta-5 units, lending support to the contention that tricin acts to initiate lignin chains and that, in the absence of tricin, more monolignol dimerization reactions occur. In addition, the C2-Idf mutation resulted in strikingly higher Klason lignin levels in the leaves. As a consequence, the leaves of C2-Idf mutants had significantly reduced saccharification efficiencies compared with those of control plants. These findings are instructive for lignin engineering strategies to improve biomass processing and biochemical production.

Ghent University Academic Bibliography

PubMed Central

Are glucose profiles well-controlled within the targets recommended by the International Diabetes Federation in type 2 diabetes? A meta-analysis of results from continuous glucose monitoring based studies

Author: Chastin Sebastien F.M.
Collier Andrew
Kirk Alison F.
Kubiak Thomas
Paing Aye C.
Publication venue: 'Elsevier BV'
Publication date: 01/01/2018

AIMS: To assess continuous glucose monitoring (CGM) derived intra-day glucose profiles against the global guideline for type 2 diabetes recommended by the International Diabetes Federation (IDF). METHODS: The Cochrane Library, MEDLINE, PubMed, CINAHL and Science Direct were searched to identify observational studies reporting intra-day glucose profiles using CGM in people with type 2 diabetes on any anti-diabetes agents. Overall and subgroup analyses were conducted to summarise mean differences between reported glucose profiles (fasting glucose, pre-meal glucose, postprandial glucose and post-meal glucose spike/excursion) and the IDF targets. RESULTS: Twelve observational studies totalling 731 people were included. Pooled fasting glucose (0.81 mmol/L, 95% CI, 0.53-1.09 mmol/L), postprandial glucose after breakfast (1.63 mmol/L, 95% CI, 0.79-2.48 mmol/L) and post-breakfast glucose spike (1.05 mmol/L, 95% CI, 0.13-1.96 mmol/L) were significantly higher than the IDF targets. Pre-lunch glucose, pre-dinner glucose and postprandial glucose after lunch and dinner were above the IDF targets, but not significantly. Subgroup analysis showed significantly higher fasting glucose and postprandial glucose after breakfast in all groups: HbA1c <7% and ≥7% (53 mmol/mol) and duration of diabetes <10 years and ≥10 years. CONCLUSIONS: Independent of HbA1c, fasting glucose and postprandial glucose after breakfast are not well-controlled in type 2 diabetes.
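For readers unfamiliar with how a pooled mean difference and its 95% CI can arise, the following is a minimal sketch of fixed-effect inverse-variance pooling. The abstract does not state which pooling model the study used, so the method here is an assumption, and the per-study values are hypothetical numbers, not data from the paper.

```python
import math


def pool_fixed_effect(estimates, std_errors):
    """Inverse-variance pooled estimate with a 95% confidence interval."""
    weights = [1.0 / se ** 2 for se in std_errors]
    total = sum(weights)
    pooled = sum(w * e for w, e in zip(weights, estimates)) / total
    pooled_se = math.sqrt(1.0 / total)
    # 1.96 is the normal quantile for a two-sided 95% interval.
    return pooled, pooled - 1.96 * pooled_se, pooled + 1.96 * pooled_se


# Hypothetical per-study mean differences from a target (mmol/L) and standard errors.
diffs = [0.6, 1.0, 0.8]
ses = [0.2, 0.2, 0.2]
pooled, low, high = pool_fixed_effect(diffs, ses)
```

A pooled difference is usually called significant at the 5% level when the interval excludes zero, which matches the intervals reported in the abstract.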

University of Strathclyde Institutional Repository

Ghent University Academic Bibliography

ResearchOnline@GCU

Classification of metamorphic virus using n-grams signatures

Author: A Hamid Isredza Rahmi
Abdullah Zubaile
Kipli Kuryati
Md Sani Nur Sakinah
Mohd Foozy Cik Feresa
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2020

A metamorphic virus can change, translate, and rewrite its own code once it has infected a system in order to bypass detection. An undetected metamorphic virus can then seriously damage the computer system. It is therefore vital to design a classification model that can detect such viruses. This paper focuses on the detection of metamorphic viruses using the Term Frequency-Inverse Document Frequency (TF-IDF) technique. The research was conducted using the Second Generation virus dataset. First, the classification model clusters the metamorphic viruses using TF-IDF. Then, the virus clusters are evaluated with the Naïve Bayes algorithm in terms of accuracy. The virus classes and features are extracted from bi-grams of assembly language. The results show that the proposed model was able to classify metamorphic viruses using TF-IDF with an optimal number of virus classes and an average accuracy of 94.2%.
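The bi-gram feature extraction mentioned above might look roughly like the sketch below, which counts adjacent opcode pairs in a disassembled instruction sequence. The function name and the opcode trace are hypothetical. The paper's actual preprocessing, TF-IDF weighting, and Naïve Bayes evaluation are not reproduced.

```python
from collections import Counter


def opcode_bigrams(opcodes):
    """Count each pair of adjacent opcodes in an instruction sequence."""
    return Counter(zip(opcodes, opcodes[1:]))


# Hypothetical opcode trace from one disassembled sample.
trace = ["mov", "push", "call", "mov", "push", "ret"]
features = opcode_bigrams(trace)
```

Counts like these could serve as the term frequencies that a TF-IDF weighting is then applied to.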

UTHM Institutional Repository

Crossref