3,365 research outputs found

    Evolutionary Data Mining Design to Visualize the Examination Timetabling Data at a University: A First Round Development.

    Examination scheduling ("timetabling") at a university is a persistent challenge. Allocating exams to stipulated time slots requires advanced quantitative techniques. This study takes an alternate approach, applying the principles of data mining (DM), specifically undirected data mining and data preprocessing, to find patterns in the data and then understand the relationships between them.

    From public data to private information: The case of the supermarket

    The background to this paper is that in our world of massively increasing personal digital data, any control over the data about me seems illusory: informational privacy seems a lost cause. On the other hand, the production of this digital data seems a necessary component of our present life in the industrialized world. A framework for resolving this apparent dilemma is provided by the distinction between (meaningless) data and (meaningful) information. I argue that computational data processing is necessary for many present-day processes and is not a breach of privacy, while the collection and processing of private information is often not necessary and is a breach of privacy. The problem and the sketch of its solution are illustrated in a case study: supermarket customer cards.

    Datamining in MS SQL Using Incremental Algorithms

    This work deals with data stream mining, which is currently a very dynamic area of information technology. The thesis describes the general principles of data mining and the principles of mining data streams. The implemented algorithm, CluStream, is analyzed in detail. In the practical part, a data stream processing solution was designed and implemented in MSSQL using this algorithm. The functionality of the algorithm was then verified using a custom data stream generator.
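
The abstract above does not describe the thesis's MSSQL implementation, so the following is only a minimal Python sketch of the additive summary that CluStream-style micro-clusters are generally built on: counts plus linear and squared sums of the points and their timestamps. The class name `MicroCluster` and its attribute names are assumptions chosen for illustration, not taken from the thesis.

```python
class MicroCluster:
    """Additive summary of the stream points absorbed by one micro-cluster."""

    def __init__(self, dim):
        self.n = 0                        # number of absorbed points
        self.linear_sum = [0.0] * dim     # per-dimension sum of values
        self.squared_sum = [0.0] * dim    # per-dimension sum of squared values
        self.time_sum = 0.0               # sum of timestamps
        self.time_squared_sum = 0.0       # sum of squared timestamps

    def insert(self, point, timestamp):
        # Absorb one point by updating the running sums; the point itself is not stored.
        self.n += 1
        for i, x in enumerate(point):
            self.linear_sum[i] += x
            self.squared_sum[i] += x * x
        self.time_sum += timestamp
        self.time_squared_sum += timestamp * timestamp

    def merge(self, other):
        # Because every statistic is a sum, merging two summaries is element-wise addition.
        self.n += other.n
        for i in range(len(self.linear_sum)):
            self.linear_sum[i] += other.linear_sum[i]
            self.squared_sum[i] += other.squared_sum[i]
        self.time_sum += other.time_sum
        self.time_squared_sum += other.time_squared_sum

    def centroid(self):
        # Mean of the absorbed points, recovered from the linear sums.
        return [s / self.n for s in self.linear_sum]
```

Keeping only sums is what makes such a summary suitable for incremental processing: each arriving row updates a fixed-size record, which is also the kind of state that could plausibly be held in a database table.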

    Principles of Classification and Data Mining with the C4.5 Algorithm

    The rapid growth and integration of databases provides scientists, engineers, and business people with a vast new resource that can be analyzed to make scientific discoveries, optimize industrial systems, and uncover financially valuable patterns. To undertake these large data analysis projects, researchers and practitioners have adopted established algorithms from statistics, machine learning, neural networks, and databases, and have also developed new methods targeted at large data mining problems. Principles of Data Mining by David Hand, Heikki Mannila, and Padhraic Smyth gives practitioners and students an introduction to the wide range of algorithms and methodologies in this exciting area. This study uses the C4.5 algorithm. The interdisciplinary nature of the field is matched by these three authors, whose expertise spans statistics, databases, and computer science. The result is a book that not only provides the technical details and mathematical principles underlying data mining methods, but also offers a valuable perspective on the entire enterprise.
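
The abstract names C4.5 without describing it. As a hedged illustration, C4.5 is commonly described as choosing split attributes by gain ratio (information gain divided by split information). The sketch below computes that quantity for one categorical attribute; the function names and the toy data in the usage note are assumptions for illustration and do not come from the study.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of discrete labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def gain_ratio(values, labels):
    """Gain ratio of splitting `labels` by the categorical attribute `values`."""
    n = len(labels)
    # Group the class labels by attribute value.
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(v, []).append(y)
    # Expected entropy remaining after the split.
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    info_gain = entropy(labels) - remainder
    # Split information is the entropy of the attribute's own value distribution.
    split_info = entropy(values)
    return info_gain / split_info if split_info > 0 else 0.0
```

For example, `gain_ratio(["a", "a", "b", "b"], ["y", "y", "n", "n"])` scores an attribute that separates the classes perfectly, while swapping the attribute values to `["a", "b", "a", "b"]` gives an attribute that carries no information about the class.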

    Identifying Terrorist Affiliations Through Social Network Analysis Using Data Mining Techniques

    In a technologically enabled world, local ideologically inspired warfare becomes global all too quickly. Specifically, terrorist groups like Al Qaeda and ISIS (Daesh) have successfully used modern computing technology and social networking environments to broadcast their message, recruit new members, and plot attacks. This is especially true for such platforms as Twitter and encrypted mobile apps like Telegram or the clandestine Alrawi. As early detection of such activity is crucial to attack prevention, data mining techniques have become increasingly important in the fight against the spread of global terrorist activity. This study employs data mining tools to mine Twitter for terrorist ‘organizing’ vocabulary and to pinpoint, through the analysis of (admittedly sometimes sparse) tweet metadata, the most likely geographical location and connected identities behind the user accounts used to transmit such organizing or post-event information. To accomplish this goal, R code and the twitteR package were used to connect through the existing Twitter API in order to validate a relevant word/search term list. I then determine, with “most likely” frequency counts and word clouds, the number of K-means clusters into which to separate the linguistic uses of these words and, by virtue of association, their user accounts. These user accounts are then investigated with network graphs built using R, NodeXL, and Gephi, which plot the user network as the final step. For the sake of user-friendly visualization, these networks are shown using three verified ISIS-sympathizing accounts that contain activist language and have emerged through analysis as holding leadership positions, either in terms of communication or in terms of internet activism.
    Within the limits of this thesis and available computing resources, an analysis of these three accounts will have to suffice; however, this technique could be used in a larger framework to produce more analytical layers and identify high-rank leaders. One challenge to this approach has been the meaningful extraction of Arabic terms in R, which has required UTF-8 workarounds to overcome challenges relating to character sets; another is the transient nature of social network activity, in which user accounts change frequently, one user is found to own several accounts, and tweets can be deleted at any time. As is customary with Natural Language Processing, a third challenge emerges through variations in spelling, orthography, and the use of abbreviations and special characters (especially the underscore character), which must be taken into account because they affect the composition of stop lists and edge lists and likely introduce false positives into the overall analysis. This is why, at present, visual verification of the analysis results is requisite; with greater refinement of the analysis, which exceeds the scope of this study, this need can be greatly reduced.

    Applying Data Mining to Scheduling Courses at a University

    Scheduling courses ("timetabling") at a university is a persistent challenge. Allocating course sections to prescribed time slots requires advanced quantitative techniques, such as goal programming, and collecting a large amount of multi-criteria data at least six to eight months in advance of a semester. This study takes an alternate approach. It demonstrates the feasibility of applying the principles of data mining. Specifically, it uses association rules to evaluate a non-standard ("aberrant") timetabling pilot study undertaken in one college at a university. The results indicate that (1) inductive methods are indeed applicable, (2) both summary and detailed results can be understood by key decision-makers, and (3) straightforward, repeatable SQL queries can be used as the chief analytical technique on a recurring basis. In addition, this study was one of the first empirical studies to provide an accurate measure of the discernible, but negligible, scheduling exclusionary effects that may negatively affect course availability and diversity.
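
The abstract says association rules were evaluated with SQL queries but gives no detail. As a rough illustration of what an association rule measures, the Python sketch below counts support and confidence for pairwise rules over a handful of made-up course sections. The attribute names, thresholds, and the function `pair_rules` are all hypothetical and are not drawn from the study.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy data: each set holds the attributes of one scheduled course section.
sections = [
    {"MWF", "morning", "lower_division"},
    {"MWF", "morning", "upper_division"},
    {"TR", "afternoon", "upper_division"},
    {"MWF", "morning", "lower_division"},
    {"TR", "morning", "lower_division"},
]


def pair_rules(transactions, min_support=0.4, min_confidence=0.7):
    """Return (lhs, rhs, support, confidence) for pairwise rules lhs -> rhs."""
    n = len(transactions)
    item_counts = Counter(item for t in transactions for item in t)
    pair_counts = Counter(
        pair for t in transactions for pair in combinations(sorted(t), 2)
    )
    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n  # fraction of sections containing both items
        if support < min_support:
            continue
        for lhs, rhs in ((a, b), (b, a)):
            # Confidence: of the sections containing lhs, the fraction that also contain rhs.
            confidence = count / item_counts[lhs]
            if confidence >= min_confidence:
                rules.append((lhs, rhs, support, confidence))
    return sorted(rules)


rules = pair_rules(sections)
```

Both measures are plain counts and ratios, which is consistent with the abstract's point that repeatable grouped queries can serve as the main analytical technique.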

    Topological visual localization using decentralized galois lattices

    This paper presents a new decentralized method for selecting visual landmarks in a structured environment. Images taken at different places are analyzed, and primitives are extracted to determine whether or not features are present in the images. Subsequently, landmarks are selected as combinations of these features using a mathematical formalism called Galois (or concept) lattices. The main drawback of the general approach is the exponential complexity of lattice-building algorithms. A decentralized approach is therefore defined and detailed here: it leads to smaller lattices, and thus to better performance as well as improved legibility.
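
To make the formalism concrete, here is a minimal brute-force sketch of how the formal concepts (the nodes of a Galois lattice) can be enumerated from a table of places and detected features. The place and feature names are invented for illustration, and this is not the paper's decentralized algorithm. It loops over every subset of places, which also illustrates the exponential cost the abstract mentions.

```python
from itertools import combinations

# Hypothetical context: places (objects) mapped to the visual features found in their images.
context = {
    "corridor": {"door", "edge"},
    "office": {"door", "window"},
    "lab": {"door", "edge", "window"},
}


def common_features(places, context, all_features):
    """Features shared by every place in `places` (all features if `places` is empty)."""
    result = set(all_features)
    for p in places:
        result &= context[p]
    return result


def places_with(features, context):
    """Places whose images contain every feature in `features`."""
    return {p for p, feats in context.items() if features <= feats}


def concepts(context):
    """Enumerate all (extent, intent) pairs by closing every subset of places."""
    all_features = set().union(*context.values())
    places = list(context)
    found = set()
    for r in range(len(places) + 1):
        for subset in combinations(places, r):
            intent = common_features(subset, context, all_features)
            extent = places_with(intent, context)
            found.add((frozenset(extent), frozenset(intent)))
    return found


lattice = concepts(context)
```

Each resulting pair couples a set of places with exactly the features they share; a feature combination whose extent is a single place is a natural candidate landmark for that place.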