11 research outputs found

Flecha: um sistema de recomendação de questões de concurso público (Flecha: a recommendation system for civil service exam questions)

Author: Bomfim Renato Rangel Costa Cruz
Publication date: 04/10/2017

Undergraduate final project (trabalho de conclusão de curso), Universidade de Brasília, Instituto de Ciências Exatas, Departamento de Ciência da Computação, 2017.
This study presents a recommendation system for questions on Basic Computing (Noções de Informática) and Administrative Law that have appeared in Brazilian civil service exams, based on how often each topic occurs in the exams. Candidates can use the tool to optimize their study time by choosing exercises whose topics were covered most often in past exams.
The system extracts questions from unstructured sources such as PDFs, classifies them automatically, and makes recommendations such that the number of questions recommended is proportional to how often the topic was previously examined. For classification, SVM classifiers were induced with the SVC and LinearSVC implementations, and experiments were performed with different parameters. Different types of preprocessing were also tested. For recommendation, the exam questions are clustered into groups of similar questions, and each group is recommended in proportion to its size. Experiments covered different numbers of clusters, the effectiveness of a decay function, and the behavior of recommendation quality as the number of recommended questions increases. In classification, the best results were obtained with LinearSVC. Recommendation obtained the best results without the decay function and with a small number of clusters.
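The proportional recommendation step described in the abstract can be sketched roughly as follows. This is only an illustration under assumptions: the questions are taken to be already clustered, the function name `proportional_quota` is hypothetical, and largest-remainder rounding is one possible way to turn proportions into whole numbers of questions, not necessarily the one used in the thesis.

```python
from collections import Counter


def proportional_quota(cluster_labels, n_recommend):
    """Split n_recommend slots across clusters in proportion to cluster size.

    Hypothetical helper: largest-remainder rounding is an assumption made
    here so that the quotas always sum exactly to n_recommend.
    """
    sizes = Counter(cluster_labels)
    total = sum(sizes.values())
    # Ideal (fractional) share of each cluster.
    exact = {c: n_recommend * s / total for c, s in sizes.items()}
    # Start from the integer part of each share.
    quota = {c: int(v) for c, v in exact.items()}
    leftover = n_recommend - sum(quota.values())
    # Give the remaining slots to the clusters with the largest fractional parts.
    by_remainder = sorted(exact, key=lambda c: exact[c] - quota[c], reverse=True)
    for c in by_remainder[:leftover]:
        quota[c] += 1
    return quota


# Ten past questions falling into three clusters of sizes 6, 3 and 1.
labels = [0] * 6 + [1] * 3 + [2] * 1
quota = proportional_quota(labels, n_recommend=7)
```

Larger clusters receive more of the recommended questions, which mirrors the idea that topics examined more often should be practised more.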

Biblioteca Digital de Monografias

Recommended from our members

FP-tree Based Spatial Co-location Pattern Mining

Author: Yu Ping
Publication venue: University of North Texas Libraries
Publication date: 01/05/2005

A co-location pattern is a set of spatial features frequently located together in space. A frequent pattern is a set of items that frequently appears in a transaction database. Since its introduction, the paradigm of frequent pattern mining has shifted from candidate generation-and-test based approaches to projection based approaches. Co-location patterns resemble frequent patterns in many respects. However, the lack of a transaction concept, which is crucial in frequent pattern mining, makes a similar paradigm shift in co-location pattern mining very difficult. This thesis investigates a projection based co-location pattern mining paradigm. In particular, an FP-tree based co-location mining framework and an algorithm called FP-CM (FP-tree based co-location miner) are proposed. FP-CM is proved to be complete and correct, and to require only a small constant number of database scans. The experimental results show that FP-CM outperforms a candidate generation-and-test based co-location miner by an order of magnitude.
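The contrast between the two definitions can be made concrete with a small sketch. It is not FP-CM: the helper names, the toy data, and the use of a Euclidean distance threshold (`radius`) as the neighbor relation are all assumptions for illustration. The point is that itemset support is counted over explicit transactions, while spatial instances come with no transactions, only proximity.

```python
from itertools import combinations
from math import dist


def support(itemset, transactions):
    """Fraction of transactions that contain every item of `itemset`."""
    items = set(itemset)
    return sum(items <= set(t) for t in transactions) / len(transactions)


def neighbor_pairs(instances, radius):
    """Pairs of instances of different features lying at most `radius` apart.

    `instances` is a list of (feature, (x, y)) tuples. The distance
    threshold is an assumed neighbor relation, not taken from the thesis.
    """
    pairs = []
    for (fa, pa), (fb, pb) in combinations(instances, 2):
        if fa != fb and dist(pa, pb) <= radius:
            pairs.append(((fa, pa), (fb, pb)))
    return pairs


# Frequent patterns: support is defined over explicit transactions.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]
pair_support = support({"bread", "milk"}, transactions)

# Co-location: there are no transactions, only instances placed in space.
instances = [
    ("A", (0.0, 0.0)),
    ("B", (0.5, 0.0)),
    ("A", (5.0, 5.0)),
    ("C", (5.0, 5.4)),
    ("B", (9.0, 9.0)),
]
close = neighbor_pairs(instances, radius=1.0)
```

Because each instance can be a neighbor of many others, there is no natural way to cut the space into disjoint transactions, which is the difficulty the abstract refers to.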

UNT Digital Library

An Investigation in Efficient Spatial Patterns Mining

Author: Wang Lizhen

Technical progress in computerized spatial data acquisition and storage has led to the growth of vast spatial databases. Faced with large and increasing amounts of spatial data, an end user has difficulty understanding them without helpful knowledge extracted from the spatial databases. Thus, spatial data mining has been brought under the umbrella of data mining and is attracting more attention. Spatial data mining presents challenges. Unlike ordinary data, spatial data includes not only positional data and attribute data, but also spatial relationships among spatial events. Further, the instances of spatial events are embedded in a continuous space and share a variety of spatial relationships, so the mining of spatial patterns demands new techniques. In this thesis, several contributions were made. Some new techniques were proposed, i.e., fuzzy co-location mining, CPI-tree (Co-location Pattern Instance Tree), maximal co-location pattern mining, AOI-ags (Attribute-Oriented Induction based on Attributes’ Generalization Sequences), and fuzzy association prediction. Three algorithms were put forward for co-location pattern mining: the fuzzy co-location mining algorithm, the CPI-tree based co-location mining algorithm (CPI-tree algorithm), and the order-clique-based maximal prevalence co-location mining algorithm (order-clique-based algorithm). An attribute-oriented induction algorithm based on attributes’ generalization sequences (AOI-ags algorithm) is also given, which unifies the attribute thresholds and the tuple thresholds. A fuzzy association prediction algorithm was designed on two real-world databases with time-series data. A cell-based spatial object fusion algorithm is also proposed. Two fuzzy clustering methods using domain knowledge were proposed, the Natural Method and the Graph-Based Method, both of which are controlled by a threshold. The threshold was confirmed by polynomial regression.
Finally, a prototype system for spatial co-location pattern mining was developed; it shows the relative efficiencies of the proposed co-location techniques. The techniques presented in the thesis focus on improving the feasibility, usefulness, effectiveness, and scalability of the related algorithms. In the design of the fuzzy co-location mining algorithm, a new data structure, the binary partition tree, was proposed to improve the process of fuzzy equivalence partitioning. A prefix-based approach that partitions the prevalent event set search space into subsets, where each sub-problem can be solved in main memory, was also presented. The scalability of the CPI-tree algorithm is guaranteed since it does not require expensive spatial joins or instance joins to identify co-location table instances. In the order-clique-based algorithm, the co-location table instances need not be stored after computing the Pi value of the corresponding co-location, which dramatically reduces the execution time and space needed to mine maximal co-locations. Several techniques, for example partitions, equivalence partition trees, pruning optimization strategies, and interestingness, were used to improve the efficiency of the AOI-ags algorithm. To implement the fuzzy association prediction algorithm, the “growing window” and proximity computation pruning were introduced to reduce both I/O and CPU costs in computing the fuzzy semantic proximity between time series. For the new techniques and algorithms, theoretical analysis and experimental results on synthetic and real-world data sets are presented and discussed in the thesis.

University of Huddersfield Repository

Design and analysis of clustering algorithms for numerical, categorical and mixed data

Author: Suarez Alvarez Maria Del Mar
Publication date: 01/01/2010

In recent times, several machine learning techniques have been applied successfully to discover useful knowledge from data. Cluster analysis, which aims to find similar subgroups in a large heterogeneous collection of records, is one of the most useful and popular data mining techniques. The purpose of this research is to design and analyse clustering algorithms for numerical, categorical and mixed data sets. Most clustering algorithms are limited to either numerical or categorical attributes. Data sets with mixed types of attributes are common in real life, so designing and analysing clustering algorithms for mixed data sets is timely. Determining the optimal solution to the clustering problem is NP-hard. Therefore, it is necessary to find solutions that are regarded as “good enough” quickly. Similarity is a fundamental concept for the definition of a cluster. It is very common to calculate the similarity or dissimilarity between two features using a distance measure. Attributes with large ranges implicitly contribute more to the metric than attributes with small ranges. There are only a few papers specifically devoted to normalisation methods. Usually data is scaled to unit range. This does not ensure equal average contributions of all features to the similarity measure. For that reason, a main part of this thesis is devoted to normalisation.
EThOS - Electronic Theses Online Service, GB, United Kingdom
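The remark that unit-range scaling does not equalise feature contributions can be illustrated with a small sketch. The choice of mean pairwise absolute difference as a proxy for a feature's "average contribution" to a distance is an assumption made here, as are the function names and the toy data; the thesis may define the contribution differently.

```python
from itertools import combinations


def min_max_scale(values):
    """Scale values linearly so that the minimum maps to 0 and the maximum to 1."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


def mean_pairwise_abs_diff(values):
    """Average |a - b| over all unordered pairs of values.

    Used here as an assumed proxy for how much one feature contributes,
    on average, to a distance between two records.
    """
    pairs = list(combinations(values, 2))
    return sum(abs(a - b) for a, b in pairs) / len(pairs)


# Feature 1: eleven evenly spread values. Feature 2: ten equal values and one outlier.
evenly_spread = [float(i) for i in range(11)]
one_outlier = [0.0] * 10 + [10.0]

spread_contrib = mean_pairwise_abs_diff(min_max_scale(evenly_spread))
outlier_contrib = mean_pairwise_abs_diff(min_max_scale(one_outlier))
```

Both features span exactly the unit range after scaling, yet the evenly spread feature contributes more than twice as much on average as the feature dominated by a single outlier.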

OpenGrey Repository

Design and analysis of clustering algorithms for numerical, categorical and mixed data

Author: Suarez Alvarez Maria Del Mar

Online Research @ Cardiff

An investigation in efficient spatial patterns mining

Author: Wang Lizhen
Publication date: 01/01/2002

Publikationer från KTH

Digitala Vetenskapliga Arkivet - Academic Archive On-line

OpenGrey Repository