NATDATA: integrating natural resource data from the Brazilian biomes.
ABSTRACT: Brazilian agriculture demands the intensification of planted areas combined with the preservation of the natural resources of the Brazilian biomes. Rapid answers to questions involving topics such as soil, water resources, biodiversity, and climate are therefore essential. Brazil holds a large collection of data on these topics, distributed across several research institutions. The heterogeneity of standards, combined with this distribution, hinders their combined use. This paper presents an initiative under development by the Brazilian Agricultural Research Corporation (Embrapa) whose main objective is to integrate natural resource data from the different Brazilian biomes, providing users with an environment that allows fast, integrated querying of these data. SBIAgro 2011
DYMS (Dynamic Matcher Selector) – Scenario-based Schema Matcher Selector
Schema matching is one of the main challenges in information system integration. Over the past 20 years, many schema matching methods have been proposed and shown to be successful in various situations. Although numerous advanced matching algorithms have emerged, schema matching remains a critical research issue. Different algorithms are designed to resolve different types of schema heterogeneity, including differences in design methodologies, naming conventions, and the level of specificity of schemas, among others. These algorithms are usually generic, regardless of the schema matching scenario, which indicates that a single matcher cannot be optimal for all matching scenarios. In this research, I propose a dynamic matcher selector (DYMS) as a possible solution to this problem. The proposed DYMS analyzes the schema matching scenario and selects the most appropriate matchers for it. The selected matchers are weighted through a parameter optimization process that adopts a heuristic learning approach. The DYMS then returns the alignment result for the input schemas.
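The idea of combining weighted matchers can be illustrated with a minimal sketch. This is not the DYMS implementation; the schemas, matcher functions, and weights below are all hypothetical, and the "scenario analysis" is reduced to fixed weights for brevity.

```python
from difflib import SequenceMatcher

def name_sim(a, b):
    """String-based matcher: edit similarity of attribute names."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def type_sim(ta, tb):
    """Constraint-based matcher: exact data-type agreement."""
    return 1.0 if ta == tb else 0.0

def match(schema_a, schema_b, w_name=0.7, w_type=0.3):
    """Align each attribute of schema_a to its best match in schema_b
    using a weighted combination of the individual matcher scores."""
    alignment = {}
    for a, ta in schema_a.items():
        best = max(schema_b.items(),
                   key=lambda bt: w_name * name_sim(a, bt[0])
                                + w_type * type_sim(ta, bt[1]))
        alignment[a] = best[0]
    return alignment

src = {"cust_name": "str", "cust_id": "int"}
tgt = {"customer_name": "str", "customer_id": "int"}
print(match(src, tgt))
# {'cust_name': 'customer_name', 'cust_id': 'customer_id'}
```

A scenario-aware selector would choose which matchers enter the combination, and with what weights, based on features of the input schemas rather than fixed constants.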
Web Personalization using Neuro-Fuzzy Clustering Algorithms
Different users have different needs from the same web page, so it is necessary to develop a system that understands users' needs and demands. Web server logs hold abundant information about the users accessing a site. In this paper we discuss how to mine these web server logs over a given period of time using an unsupervised, competitive learning algorithm, Kohonen's self-organizing map (SOM), and how to interpret the results using the unified distance matrix (U-matrix). These algorithms efficiently cluster users by similar web access patterns, so that each cluster contains users with similar browsing behaviour. The clusters are useful in web personalization, allowing a site to communicate better with its users, and in web traffic analysis for predicting traffic over a given period of time.
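The competitive-learning step can be sketched with a tiny 1-D SOM over binary page-access vectors. This is an illustrative toy, not the paper's system: the data, unit count, and decay schedules are all assumptions, and no U-matrix step is shown.

```python
import math
import random

def train_som(data, n_units=3, epochs=100, lr0=0.5, seed=0):
    """Train a tiny 1-D self-organizing map by competitive learning:
    each sample pulls its best-matching unit (and, early on, that
    unit's map neighbours) toward itself."""
    rng = random.Random(seed)
    dim = len(data[0])
    # Initialize units at randomly chosen data points (a common SOM init).
    weights = [list(rng.choice(data)) for _ in range(n_units)]
    for t in range(epochs):
        lr = lr0 * (1.0 - t / epochs)         # decaying learning rate
        radius = max(1.0 - t / epochs, 0.05)  # shrinking neighbourhood
        for x in data:
            bmu = min(range(n_units),
                      key=lambda i: sum((weights[i][d] - x[d]) ** 2
                                        for d in range(dim)))
            for i in range(n_units):
                h = math.exp(-((i - bmu) ** 2) / (2 * radius ** 2))
                for d in range(dim):
                    weights[i][d] += lr * h * (x[d] - weights[i][d])
    return weights

def cluster(data, weights):
    """Assign each sample to its best-matching unit."""
    dim = len(data[0])
    return [min(range(len(weights)),
                key=lambda i: sum((weights[i][d] - x[d]) ** 2
                                  for d in range(dim)))
            for x in data]

# Binary page-access vectors: 1 if the user visited that page.
visits = [[1, 1, 0, 0], [1, 1, 0, 0], [0, 0, 1, 1], [0, 0, 1, 1]]
labels = cluster(visits, train_som(visits))
```

After training, users with the same access pattern land on the same map unit, which is the clustering the abstract relies on.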
Data Warehousing Scenarios for Model Management
Model management is a framework for supporting meta-data related applications in which models and mappings are manipulated as first-class objects using operations such as Match, Merge, ApplyFunction, and Compose. To demonstrate the approach, we show how to use model management in two scenarios related to loading data warehouses. The case study illustrates the value of model management as a methodology for approaching meta-data related problems. It also helps clarify the required semantics of the key operations. These detailed scenarios provide evidence that generic model management is useful and, very likely, implementable.
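Treating mappings as first-class objects can be illustrated with a minimal sketch. The operation names echo the abstract, but the representation (mappings as attribute-renaming dicts) and all identifiers are hypothetical simplifications, not the paper's model-management system.

```python
def compose(m1, m2):
    """Compose two attribute mappings: source -> intermediate -> target.
    Attributes that do not survive both mappings are dropped."""
    return {src: m2[mid] for src, mid in m1.items() if mid in m2}

def merge(schema_a, schema_b):
    """Union of two schemas, keeping schema_a's type on conflicts."""
    merged = dict(schema_b)
    merged.update(schema_a)
    return merged

# A warehouse-loading pipeline as two composable mappings.
source_to_staging = {"cust_id": "customer_key", "amt": "amount"}
staging_to_warehouse = {"customer_key": "dim_customer_id",
                        "amount": "fact_amount"}

print(compose(source_to_staging, staging_to_warehouse))
# {'cust_id': 'dim_customer_id', 'amt': 'fact_amount'}
```

The point of the formalism is exactly this: a source-to-warehouse mapping need not be authored by hand when it can be derived by composing existing mappings.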
Analyzing Web Server Access Log Files Using Data Mining Techniques
Nowadays the web is considered not only a network for acquiring data, buying products, and obtaining services, but also a social environment for interaction and information sharing. As the number of web sites continues to grow, it becomes more difficult for users to find and extract information. As a solution to this problem, over the last decade web mining has been used to evaluate web sites, to personalize the information displayed to a user or set of users, and to adapt the indexing structure of a web site to the needs of its users. In this work we describe a methodology for web usage mining that enables the discovery of user access patterns. In particular, we are interested in whether the topology of a web site matches the desires of its users. The data collections used for the analysis and interpretation of user viewing patterns are taken from web server log files. Data mining techniques such as classification, clustering, and association rules are applied to the preprocessed data. The intent of this research is to propose techniques for improving user perception of, and interaction with, a web site.
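The preprocessing-then-mining pipeline can be sketched as follows. This is a minimal illustration, not the paper's methodology: the log lines are fabricated samples, grouping by client IP is a crude session heuristic, and the "association" step is reduced to co-occurrence counts.

```python
import re
from collections import defaultdict
from itertools import combinations

# Common Log Format: host ident user [timestamp] "GET path HTTP/x.y" status
LOG_RE = re.compile(r'(\S+) \S+ \S+ \[[^\]]+\] "GET (\S+) HTTP/[\d.]+" (\d{3})')

def sessions(lines):
    """Preprocess: keep successful GETs and group pages by client IP."""
    by_ip = defaultdict(set)
    for line in lines:
        m = LOG_RE.match(line)
        if m and m.group(3) == "200":
            by_ip[m.group(1)].add(m.group(2))
    return by_ip

def pair_counts(by_ip):
    """Count how often two pages are visited by the same client,
    a first step toward association rules between pages."""
    counts = defaultdict(int)
    for pages in by_ip.values():
        for a, b in combinations(sorted(pages), 2):
            counts[(a, b)] += 1
    return counts

logs = [
    '1.1.1.1 - - [01/Jan/2024:00:00:00 +0000] "GET /index HTTP/1.1" 200 512',
    '1.1.1.1 - - [01/Jan/2024:00:00:05 +0000] "GET /products HTTP/1.1" 200 2048',
    '2.2.2.2 - - [01/Jan/2024:00:01:00 +0000] "GET /index HTTP/1.1" 200 512',
    '2.2.2.2 - - [01/Jan/2024:00:01:09 +0000] "GET /products HTTP/1.1" 200 2048',
    '2.2.2.2 - - [01/Jan/2024:00:01:30 +0000] "GET /missing HTTP/1.1" 404 128',
]
print(pair_counts(sessions(logs))[("/index", "/products")])  # 2
```

Frequently co-visited page pairs that are far apart in the site's link structure are precisely the signal that the site topology does not match user desires.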
Natdata - a natural resources platform for the Brazilian biomes: geospatial information for sustainable agriculture.
Abstract: This paper presents Natdata, a platform for integrating information on the natural resources of the Brazilian biomes. The paper discusses the process to be adopted in its development to deal with problems such as semantic, data, and spatial heterogeneity. In this context, the contribution of the Pantanal biome stands out: rich in information diversity, with much of its data already structured, it can serve as an environment for validating the activities under development. Geopantanal 2012
Online Real Estate Feed Reader
This report discusses the preliminary research done for, and a basic understanding of, the proposed topic, an "Online Real Estate Feed Reader". The online real estate feed reader is intended to serve people living in developed areas, where it can be difficult to find land or houses, and to offer an easier alternative to finding a real estate agent than traditional channels such as newspapers or approaching an agent directly. A user can simply click the search button to display results, or subscribe to an RSS feed to receive updates over time. The objective of the web site is to help the user find exactly the real estate web sites available on the internet, rather than anonymous results that are sometimes not related to real estate at all. The user can also obtain information about what they are searching for, in terms of descriptions, contact persons, or pictures, ensuring that they get the right information from the selected real estate web site. The site also provides RSS feeds, which can be subscribed to from the top of the page; the feeds are updated frequently and automatically show an outline of the information, along with pictures where available. Initially, the scope of the reader is limited to real estate in the KL area: the site lists only properties from the KL area and stores all their information in a database. The methodology used in this project is the prototyping methodology, which consists of several phases, namely planning, analysis, design, and implementation, with planning, analysis, and design performed repeatedly until the system is complete. An analysis was also performed in the results and discussion section; most users gave positive feedback on the system. In conclusion, the author hopes the project will succeed in achieving its scope and objectives as planned, and that users will benefit from the system.
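The feed-reading part of such a system can be sketched with the standard library. This is a generic illustration of consuming an RSS 2.0 feed, not the project's code; the feed content and field choices are invented for the example.

```python
import xml.etree.ElementTree as ET

def parse_rss(xml_text):
    """Extract (title, link, description) from the <item> entries
    of an RSS 2.0 feed document."""
    root = ET.fromstring(xml_text)
    items = []
    for item in root.iter("item"):
        items.append({
            "title": item.findtext("title", ""),
            "link": item.findtext("link", ""),
            "description": item.findtext("description", ""),
        })
    return items

feed = """<rss version="2.0"><channel><title>KL Listings</title>
<item><title>Bungalow in KL</title><link>http://example.com/1</link>
<description>3 bedrooms</description></item>
</channel></rss>"""

print(parse_rss(feed)[0]["title"])  # Bungalow in KL
```

A reader would fetch the feed periodically, diff the parsed items against those already stored in its database, and surface only the new listings to the user.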
A New Algorithm to Preserve Sensitive Frequents Itemsets (APSFI) in Horizontal or Vertical Database
This research aims to preserve the privacy of sensitive information against adversaries. We propose an Algorithm to Preserve Sensitive Frequent Itemsets (APSFI), with two variants, that hides sensitive frequent itemsets in horizontal or vertical databases while minimizing the number of database scans during the hiding operation. The main approach is to reduce the support of each given sensitive frequent 1-itemset until it is no longer sensitive, converting an insensitive item into a sensitive one within the same transaction, so that the database size and the nature of the transactions remain unchanged and adversaries' suspicion is avoided. Experiments with APSFI showed very encouraging results: it eliminated 91% of database scan operations on vertical databases and 41% on horizontal-layout databases compared with the well-known FHSFI algorithm. The experiments also demonstrate APSFI's tolerance to database size scalability and its linear outperformance of FHSFI in execution time.
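The core hiding idea, lowering a sensitive item's support by swapping it for another item inside the same transaction so the database size never changes, can be sketched minimally. This is an illustrative simplification for a single sensitive 1-itemset, not the published APSFI algorithm, and the item names and threshold are invented.

```python
def support(db, item):
    """Number of transactions containing the item."""
    return sum(item in t for t in db)

def hide_item(db, sensitive, replacement, min_sup):
    """Swap `sensitive` for `replacement` in transactions until its
    support drops below min_sup. Each transaction keeps its size,
    echoing the goal of leaving the database size unchanged."""
    db = [set(t) for t in db]  # work on copies
    for t in db:
        if support(db, sensitive) < min_sup:
            break
        if sensitive in t and replacement not in t:
            t.discard(sensitive)
            t.add(replacement)
    return db

db = [{"a", "b"}, {"a", "c"}, {"a", "b", "c"}, {"b", "c"}]
hidden = hide_item(db, "a", "d", min_sup=2)
print(support(hidden, "a"))  # 1
```

Because every swap preserves transaction sizes and total transaction count, an adversary inspecting gross database statistics sees nothing obviously altered, which is the suspicion-avoidance property the abstract emphasizes.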
Improving the Cuckoo Algorithm for Association Rule Hiding
Nowadays, the problem of data security in data mining is receiving increasing attention. The question is how to balance legitimately exploiting data against avoiding the disclosure of sensitive information. Many approaches exist; one notable approach is privacy preservation in association rule mining, which hides sensitive rules. Recently, a meta-heuristic algorithm, the cuckoo optimization algorithm (COA4ARH), has proven relatively effective for this purpose. In this paper, an improved version of COA4ARH is presented for computing the minimum number of sensitive items that should be removed to hide the sensitive rules, thereby limiting the loss of non-sensitive rules. Experimental results on three real datasets show that the proposed method outperforms the original algorithm in several cases.