19 research outputs found
Application of feature selection, learning metrics and dimension reduction in intrusion detection systems
Computer networks were initially designed for a limited number of users;
today they are a necessity for households and for small, medium-sized and large
organizations. Poor structural design of computer networks has created
security gaps that make it difficult to maintain the integrity, confidentiality and availability of the
information transferred over them, hence the need to propose new
strategies for identifying unauthorized access to computer networks.
The purpose of this research is to apply feature selection,
learning metric and dimension reduction techniques in intrusion detection
systems, using the data stored in the NSL-KDD dataset, which contains 225,000
records of connections in a computer network with 41 features. Includes bibliography, appendix
Dataset for estimation of obesity levels based on eating habits and physical condition in individuals from Colombia, Peru and Mexico
This paper presents data for the estimation of obesity levels in individuals from Mexico, Peru and Colombia, based on their eating habits and physical condition. The data contains 17 attributes and 2111 records. The records are labeled with the class variable NObesity (Obesity Level), which allows classification of the data using the values Insufficient Weight, Normal Weight, Overweight Level I, Overweight Level II, Obesity Type I, Obesity Type II and Obesity Type III. 77% of the data was generated synthetically using the Weka tool and the SMOTE filter; the remaining 23% was collected directly from users through a web platform. This data can be used to generate intelligent computational tools to identify the obesity level of an individual and to build recommender systems that monitor obesity levels. For discussion and more information on the dataset creation, please refer to the full-length article “Obesity Level Estimation Software based on Decision Trees” (De-La-Hoz-Correa et al., 2019). Universidad de la Cost
Deep learning of robust representations for multi-instance and multi-label image classification
In multi-instance problems (MIL), an arbitrary number of instances is associated with a class label. Therefore, the labeling of training data becomes simpler (since it is done together, instead of individually) with the disadvantage that a weakly supervised database is produced [9]. In the PCRY, each restaurant is represented by a set of images that share the attribute label(s) of that establishment. This paper explores the use of previously learned attribute extractors, trained in 3 different databases that are similar and complementary to the PCRY databas
Association rules implementation for affinity analysis between elements composing multimedia objects
Multimedia objects are a constantly growing resource on the World Wide Web, which has created the
need to design methods and tools for obtaining new knowledge from the information analyzed.
Association rules are a data mining technique whose purpose is to search for correlations between
the elements of a data collection, so that identifying and analyzing these correlations can support
decision making. Algorithms used for this purpose include Apriori, Frequent Pattern Growth
(FP-Growth), the QFP algorithm, CBA, CMAR and CPAR, among others. On the other hand, multimedia applications
today require the processing of unstructured data provided by multimedia objects, which are made up of
text, images, audio and video. For the storage, processing and management of multimedia objects,
solutions have been developed that allow efficient search for data of interest to the end user, considering that
the semantics of a multimedia object must be expressed by all the elements that compose it. This article
analyzes the state of the art in the application of association rules to the processing of
multimedia objects. In addition, the analysis of the consulted literature raises questions about
the possibility of developing an association rule method for the analysis of these objects.
Universidad de la Costa, Universidad Pontificia Bolivariana
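As an illustration of the correlations that association rules search for, the following minimal sketch computes support and confidence and enumerates single-item rules by brute force. It is not the method of the article above and not an implementation of Apriori or FP-Growth; the transactions, the function names and the thresholds are hypothetical and chosen only for this example.

```python
from itertools import combinations

# Hypothetical transactions: each set lists the element types found in one multimedia object.
transactions = [
    {"text", "image"},
    {"text", "image", "audio"},
    {"image", "video"},
    {"text", "image", "video"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item of itemset."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Estimate of P(consequent | antecedent) from the transactions."""
    both = set(antecedent) | set(consequent)
    return support(both, transactions) / support(antecedent, transactions)


def single_item_rules(transactions, min_support, min_confidence):
    """Brute-force search for rules A -> B between single items (illustration only)."""
    items = sorted(set().union(*transactions))
    found = []
    for a, b in combinations(items, 2):
        for ante, cons in ((a, b), (b, a)):
            s = support({ante, cons}, transactions)
            c = confidence({ante}, {cons}, transactions)
            if s >= min_support and c >= min_confidence:
                found.append((ante, cons, s, c))
    return found


rules = single_item_rules(transactions, min_support=0.5, min_confidence=0.9)
```

With these toy transactions, the rule text -> image holds in every object that contains text, whereas image -> text holds in only three of the four objects that contain an image.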
Classification and features selection method for obesity level prediction
Obesity has become one of the world’s largest health issues: rich and poor countries, without exception,
have larger populations with this condition each year. According to the World Health Organization (WHO),
obesity and overweight are defined as abnormal or excessive fat accumulation that may impair health, and
obesity has nearly tripled since 1975. Data mining has become a strong scientific field for analyzing huge
data sources and providing new information about patterns and behaviors in the population. This study
uses data mining techniques to build a model for obesity prediction, using a dataset based on a survey of
college students in several countries. After cleaning and transforming the data, a set of classification
methods was implemented (Logistic Model Tree - LMT, RandomForest - RF, Multi-Layer Perceptron - MLP
and Support Vector Machines - SVM), together with the feature selection methods InfoGain, GainRatio, Chi-Square
and Relief. Finally, cross-validation was performed for the training and testing processes. The results showed
that LMT had the best precision, 96.65%, compared to RandomForest (95.62%),
MLP (94.41%) and SMO (83.89%), so this study suggests that LMT can be used with confidence to analyze
obesity and similar data
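To make the InfoGain criterion named in the abstract above more concrete, here is a minimal sketch of information gain for a categorical feature. It is an assumption-laden illustration, not the study's code: the feature values, the labels and the function names are hypothetical.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def information_gain(feature_values, labels):
    """Entropy of the labels minus the weighted entropy after splitting on the feature."""
    n = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


# Hypothetical toy data: four survey answers with a two-class label.
labels = ["obese", "obese", "normal", "normal"]
informative = ["yes", "yes", "no", "no"]    # separates the classes perfectly
uninformative = ["a", "b", "a", "b"]        # tells nothing about the class

gain_informative = information_gain(informative, labels)
gain_uninformative = information_gain(uninformative, labels)
```

A feature selection step based on this criterion would rank features by their gain and keep the highest-scoring ones; here the first feature scores one bit and the second scores zero.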
State of the art of the project
The aim of REMIND is to create an international and intersectoral network to facilitate the exchange of staff to progress developments in reminding technologies for persons with dementia that can be deployed in smart environments. The consortium comprises an international network of 7 academic beneficiaries, 5 non-academic beneficiaries and 4 partners from Third Countries, all of whom are committed to progressing the notion of reminding technologies within smart environments. The focus of REMIND is to develop staff and beneficiary/partner skills in the areas of user-centered design and behavioral science, coupled with improved computational techniques, which in turn will offer more appropriate and efficacious reminding solutions. This will be further supported through research involving user-centric studies into the use of reminding technologies and the theory of behaviour change to improve compliance of usage. Research objectives will be focused within the domain of smart environments. A smart environment can be viewed as having the ability to sense its surroundings through embedded sensors and, following processing of the sensed information, adjust the environment through actuators to offer an improved experience for the inhabitant. Even though the availability, cost, size and battery life of sensing technology have all improved in recent years, the uptake of real smart environments has been limited. This is mainly related to the effort required to support the technical deployments and the lack of a business model to support a service provider capable of offering support to a large number of environments. In addition, there is a limit to the number of scenarios which can be facilitated by such environments; this limit is directly related to the number of sensors availabl
Feature selection, learning metrics and dimension reduction in training and classification processes in intrusion detection systems
This research presents an IDS prototype in Matlab that assesses network traffic connections contained in the
NSL-KDD dataset, comparing the feature selection techniques available in the FEAST toolbox and refining prior
results by applying the dimension reduction technique ISOMAP. The classification process used a supervised
learning technique called Support Vector Machines (SVM). The comparative analysis of detection
rates by attack category shows that the combination MRMR+PCA+SVM (selection, reduction and classification
techniques) obtained the most promising results, using just 5 of the 41 features available in the dataset.
The results obtained were: 85.42% normal traffic, 80.77% DoS, 90.41% Probe, 91.78% U2R and 83.25%
R2L
Application of FEAST (Feature Selection Toolbox) in IDS (Intrusion Detection Systems)
Security in computer networks has become a critical point for many organizations, but keeping data integrity demands time and large economic investments. Consequently, several hardware and software solutions have been proposed, but these have sometimes proved inefficient at detecting attacks. This paper presents research results obtained by implementing algorithms from FEAST, a Matlab toolbox, with the purpose of selecting the method with the best precision for detecting different attacks using the fewest features. The NSL-KDD dataset was taken as reference. The Relief method obtained the best precision levels for attack detection: 86.20% (NORMAL), 85.71% (DOS), 88.42% (PROBE), 93.11% (U2R) and 90.07% (R2L), which makes it a promising technique for feature selection in network intrusion detection
Method based on data mining techniques for breast cancer recurrence analysis
Cancer is a constantly evolving disease that affects a large number of people worldwide. Great efforts have been made at the research level to develop tools based on data mining techniques that make it possible to detect or prevent breast cancer. According to the literature consulted, large volumes of data play a fundamental role, and a great variety of datasets oriented to the analysis of the disease has been generated. This research used the Breast Cancer dataset. Its purpose is to compare the J48 and RandomForest, NaiveBayes and NaiveBayes Simple, and SMO Poly-Kernel and SMO RBF-Kernel classification algorithms, integrated with the Simple K-Means clustering algorithm, in order to generate a model that successfully classifies patients as having recurrent or non-recurrent breast cancer after surgery for the treatment of the disease. The best-performing combination was SMO Poly-Kernel + Simple K-Means, with 98.5% precision, 98.5% recall, a 98.5% TP rate and a 0.2% FP rate. The results obtained suggest the possibility of using intelligent computational tools based on data mining methods for the detection of breast cancer recurrence in patients who had previously undergone surgery
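The evaluation measures quoted in the abstract above (precision, recall, TP rate and FP rate) can all be derived from the four cells of a binary confusion matrix. The sketch below shows the standard definitions; the counts are hypothetical and are not the study's data, and the function name is an assumption made for this example.

```python
def binary_metrics(tp, fp, fn, tn):
    """Standard measures from confusion-matrix counts for the positive class."""
    precision = tp / (tp + fp)   # share of predicted positives that are correct
    recall = tp / (tp + fn)      # share of actual positives that are found (TP rate)
    fp_rate = fp / (fp + tn)     # share of actual negatives wrongly flagged
    return {"precision": precision, "recall": recall, "fp_rate": fp_rate}


# Hypothetical counts: 90 true positives, 10 false positives,
# 30 false negatives and 870 true negatives.
metrics = binary_metrics(tp=90, fp=10, fn=30, tn=870)
```

Recall and TP rate are the same quantity, which is consistent with the two identical values reported in the abstract. For multi-class problems, tools commonly report a per-class or averaged version of these measures.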
Use of the RDF query language and protocol for the description and representation of web ontologies
The purpose of this article is to present the metadata structure based on RDF (Resource Description Framework) and the way in which queries can be made using SPARQL (SPARQL Protocol and RDF Query Language), as a basis for searching the Semantic Web. It also describes what must be considered to build a web ontology and the tools that can help the software developer to make queries using SPARQL
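To illustrate the idea behind the abstract above, RDF data can be viewed as a set of (subject, predicate, object) triples, and the simplest SPARQL query matches one triple pattern containing variables against them. The following is a pure-Python sketch of that matching step only; it is not a SPARQL engine and does not use any RDF library. The triples, the shortened prefixes and the function name are hypothetical.

```python
# Hypothetical RDF triples as (subject, predicate, object), with prefixes shortened.
triples = [
    ("ex:Alice", "rdf:type", "foaf:Person"),
    ("ex:Alice", "foaf:name", "Alice"),
    ("ex:Bob", "rdf:type", "foaf:Person"),
    ("ex:Bob", "foaf:name", "Bob"),
    ("ex:Acme", "rdf:type", "foaf:Organization"),
]


def match(pattern, triples):
    """Return one variable binding per triple that matches the pattern.

    Terms starting with '?' are variables; every other term must match exactly.
    """
    results = []
    for triple in triples:
        binding = {}
        for term, value in zip(pattern, triple):
            if term.startswith("?"):
                # A variable that appears twice must bind to the same value.
                if binding.setdefault(term, value) != value:
                    break
            elif term != value:
                break
        else:
            results.append(binding)
    return results


# Roughly corresponds to: SELECT ?person WHERE { ?person rdf:type foaf:Person }
people = [b["?person"] for b in match(("?person", "rdf:type", "foaf:Person"), triples)]
```

A real SPARQL query usually joins several such patterns and adds filters and result modifiers, which this sketch leaves out.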