47 research outputs found

    A New Similarity Measure for Document Classification and Text Mining

    Accurate, efficient, and fast processing of textual data and classification of electronic documents have become a key factor in knowledge management and related businesses. Text mining, information retrieval, and document classification systems have a strong positive impact on digital libraries and electronic content management, e-marketing, electronic archives, customer relationship management, decision support systems, and copyright infringement and plagiarism detection, all of which directly affect economies, businesses, and organizations. In this study, we propose a new similarity measure that can be used with the k-nearest neighbors (k-NN) and Rocchio algorithms, two well-known algorithms for document classification, information retrieval, and other text mining purposes. We tested the new similarity measure on several structured textual data sets and compared the results with standard distance metrics and similarity measures such as cosine similarity, Euclidean distance, and the Pearson correlation coefficient. The results are promising and suggest that the proposed similarity measure could be used as an alternative within suitable algorithms, methods, and models for text mining, document classification, and related knowledge management systems. Keywords: text mining, document classification, similarity measures, k-NN, Rocchio algorithm
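The abstract above does not define the proposed similarity measure, so the following is only a minimal sketch of where such a measure would plug in. It shows k-NN and a simplified centroid-based (Rocchio-style) classifier that both accept a `similarity` function as a parameter, with plain cosine similarity as a stand-in. The function names, the whitespace tokenizer, and the toy documents are all assumptions made for illustration and are not taken from the paper.

```python
import math
from collections import Counter, defaultdict


def vectorize(text):
    """Turn text into a sparse term-frequency vector (assumed toy tokenizer)."""
    return Counter(text.lower().split())


def cosine_similarity(a, b):
    """Cosine similarity between two sparse vectors; a stand-in measure."""
    dot = sum(a[t] * b.get(t, 0.0) for t in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def knn_classify(query, training, k=3, similarity=cosine_similarity):
    """Majority vote among the k training documents most similar to query."""
    ranked = sorted(training, key=lambda item: similarity(query, item[0]),
                    reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


def rocchio_classify(query, training, similarity=cosine_similarity):
    """Assign query to the class whose centroid (mean vector) is most similar."""
    sums = defaultdict(Counter)
    counts = Counter()
    for vec, label in training:
        sums[label].update(vec)   # accumulate term frequencies per class
        counts[label] += 1
    centroids = {
        label: {t: v / counts[label] for t, v in total.items()}
        for label, total in sums.items()
    }
    return max(centroids, key=lambda label: similarity(query, centroids[label]))


# Hypothetical toy corpus, for illustration only.
training = [
    (vectorize("stock market shares profit"), "finance"),
    (vectorize("bank interest profit loan"), "finance"),
    (vectorize("football match goal team"), "sport"),
    (vectorize("team coach match league"), "sport"),
]
query = vectorize("the team scored a late goal in the match")

print(knn_classify(query, training, k=3))
print(rocchio_classify(query, training))
```

Because both classifiers take the measure as an argument, comparing a new similarity measure against cosine similarity on the same data only requires passing a different function.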

    A proposed model for qualitative information security risk assessment based on machine learning

    Doctoral thesis. Abstract: Defining, analysing, and assessing information security risks with quantitative or qualitative methods is the most crucial process in information security management in institutions and corporations. Deriving and using non-linear models for information security risk assessment, as an alternative to the common methodologies and models, has become an active research topic. In this study, a new model is proposed for the qualitative assessment of information security risks. The dissertation's aim is twofold. The first aim is to design and implement an information security risk analysis survey tailored to a specific institution and to evaluate its results. The second aim is to implement an original machine learning classification model that deduces and prioritizes the risks from the data set derived from the survey results, and to determine which classifiers perform best for it. The model is refined by observing and comparing the performance of binary classifier algorithms on training and test data. The results show that the model can be accepted as a successful prototype that is open to improvement. In addition, it is shown that some machine learning classifiers could be used as a cross-check control mechanism in information security risk evaluations and assessments. It is also shown that machine learning algorithms can be adapted and used as a supporting control mechanism for discovering biased answers in general-purpose surveys. Recommendations for further improvements and new research areas are also included in the study.

    A genetic algorithmic approach to the differential and linear cryptanalysis

    Thesis (Master)--Izmir Institute of Technology, Computer Engineering, Izmir, 1999. Includes bibliographical references (leaves: 193-197). Text in English; abstract in Turkish and English. xii, 204 leaves. The two most well-known and recently developed methods in the cryptanalysis of DES and DES-like symmetric block ciphers are differential and linear cryptanalysis. But these cryptanalytic attacks need to be improved because of computational performance and storage capacity problems. On the other hand, genetic algorithms can be a good solution in cases where the optimum value or near-optimum solutions are sought in complex systems or for non-linear problems. This is the case for cryptanalysis, where DES and DES-like ciphers are non-linear in structure, making differential and linear cryptanalysis a complex system with a very large search landscape and an extreme number of conditional and probabilistic candidates for the key being sought. In this study, a new and promising method with better performance is to be developed for differential/linear cryptanalysis of DES and similar symmetric cryptosystems, exploiting genetic algorithms' broadened search and optimum-finding capacity.

    A new similarity measure for vector space models in text classification and information retrieval

    Various models, methodologies, and algorithms can be used today for document classification, information retrieval, and other text mining applications and systems. Among them are vector space-based models, which have distance metrics or similarity measures at their core. The vector space-based model is a fast and simple alternative for processing textual data; however, its accuracy, precision, and reliability still need significant improvement. In this study, a new similarity measure is proposed that can be used effectively with vector space models and related algorithms such as k-nearest neighbours (k-NN) and Rocchio, as well as clustering algorithms such as K-means. The proposed similarity measure is tested on benchmark data sets in Turkish and English, and the results are compared with standard metrics such as Euclidean distance, Manhattan distance, Chebyshev distance, Canberra distance, Bray-Curtis dissimilarity, Pearson correlation coefficient, and cosine similarity. The results are promising and suggest that the proposed similarity measure could be used as an alternative within suitable algorithms and models for information retrieval, document clustering, and text classification.
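For reference, the seven standard baselines named in this abstract can be written out in a few lines each. This is a minimal sketch over dense, equal-length numeric vectors using their textbook definitions; it does not include the proposed measure, which the abstract does not specify. The zero-handling conventions (skipping 0/0 terms in Canberra, returning 0.0 for a zero denominator) and the example vectors are assumptions made here.

```python
import math


def euclidean(x, y):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def manhattan(x, y):
    return sum(abs(a - b) for a, b in zip(x, y))


def chebyshev(x, y):
    return max(abs(a - b) for a, b in zip(x, y))


def canberra(x, y):
    # Components where both values are zero are skipped (assumed convention).
    return sum(abs(a - b) / (abs(a) + abs(b))
               for a, b in zip(x, y) if a != 0 or b != 0)


def bray_curtis(x, y):
    denom = sum(abs(a + b) for a, b in zip(x, y))
    return manhattan(x, y) / denom if denom else 0.0


def cosine_similarity(x, y):
    dot = sum(a * b for a, b in zip(x, y))
    norm = math.sqrt(sum(a * a for a in x)) * math.sqrt(sum(b * b for b in y))
    return dot / norm if norm else 0.0


def pearson(x, y):
    # Pearson correlation equals the cosine similarity of mean-centered vectors.
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    return cosine_similarity([a - mean_x for a in x],
                             [b - mean_y for b in y])


# Hypothetical term-weight vectors, for illustration only.
x = [1.0, 2.0, 3.0]
y = [2.0, 4.0, 6.0]
z = [3.0, 2.0, 1.0]

for name, fn in [("euclidean", euclidean), ("manhattan", manhattan),
                 ("chebyshev", chebyshev), ("canberra", canberra),
                 ("bray_curtis", bray_curtis), ("pearson", pearson),
                 ("cosine", cosine_similarity)]:
    print(name, round(fn(x, y), 4))
```

Note that the first five functions are distances (smaller means closer) while Pearson and cosine are similarities (larger means closer), so a comparison harness has to sort in opposite directions for the two groups.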

    A Deep Reinforcement Learning Approach for Pathfinding in Computer Games

    One of the biggest challenges of game development is to produce a pathfinding algorithm that both yields satisfactorily realistic movement and can handle different scenarios in worlds created by game developers' unlimited imagination. Furthermore, since games are programs for the end user, the systems they contain should use as few computer resources as possible and, for cost reasons, be developed as quickly as possible. Although existing solutions can produce strong answers to the problem, they also have some chronic problems that take a long development time to solve. Existing solutions work very well on maps that are continuous and can be navigated from start to finish. However, they cannot find a solution when various obstacles must be overcome using movement mechanics other than walking, such as jumping or flying. Most of the time, developers must manually connect pathfinding meshes to each other with links, which significantly prolongs the game development process on maps with such features. At the same time, since the links are established manually, the movement of an object travelling along a link does not look natural. The focus of this study is to create a system that generates a node network by using artificial neural networks and deep reinforcement learning to overcome the difficulties of existing pathfinding algorithms. Finally, since artificial neural networks will not be used during the build phase, the aim is a system that is fast and uses fewer resources for the end user.