2 research outputs found

    Similarity Detection between Turkish Text Documents with Distance Metrics

    No full text
    2017 International Conference on Computer Science and Engineering (UBMK) -- OCT 05-08, 2017 -- Antalya, TURKEY
    WOS: 000426856900059

    The aim of this study is to compare the performance of various distance metrics and to determine the most appropriate methods for detecting similarities among text documents written in Turkish. Computing similarities between text documents is the basic step of plagiarism detection and of text mining methods such as author detection, text classification and clustering. Plagiarism detection and text mining applications should therefore be more successful when they use the distance metrics identified by the results of this study. For this purpose, chunks of text of different lengths were selected as the experimental dataset. Preprocessing methods were then applied to the dataset, creating new experimental scenarios by removing stopwords and Turkish characters and by stemming words with Zemberek. The experimental results show that the preprocessing phase increases the accuracy of similarity detection. In particular, stemming with Zemberek increases the success rate. In all cases, the Cosine Similarity method was observed to be more successful than the other distance metrics, because it produces more realistic results.

    IEEE Adv Technol Human, Istanbul Teknik Univ, Gazi Univ, Atilim Univ, TBV, Akdeniz Univ, Tmmob Bilgisayar Muhendisleri Odas
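
The pipeline this abstract describes (preprocess, then compare with cosine similarity) can be illustrated with a minimal sketch. It is not the authors' implementation: the stopword list, the function names `tokenize` and `cosine_similarity`, and the term-count weighting are assumptions made for illustration. Stemming is left out, because Zemberek is a separate Java library.

```python
import math
import re
from collections import Counter

# Hypothetical, tiny stopword list for illustration only;
# the list used in the study is not given in the abstract.
STOPWORDS = {"ve", "bir", "bu", "ile", "de", "da"}


def tokenize(text, stopwords=STOPWORDS):
    """Lowercase, split into word tokens, and drop stopwords.

    Note: str.lower() is not locale-aware, so the Turkish dotted and
    dotless I are not handled correctly here. No stemming is applied.
    """
    tokens = re.findall(r"\w+", text.lower())
    return [t for t in tokens if t not in stopwords]


def cosine_similarity(doc_a, doc_b):
    """Cosine of the angle between the term-count vectors of two texts."""
    vec_a = Counter(tokenize(doc_a))
    vec_b = Counter(tokenize(doc_b))
    # Only terms present in both documents contribute to the dot product.
    dot = sum(vec_a[t] * vec_b[t] for t in vec_a.keys() & vec_b.keys())
    norm_a = math.sqrt(sum(c * c for c in vec_a.values()))
    norm_b = math.sqrt(sum(c * c for c in vec_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty document is treated as dissimilar to everything
    return dot / (norm_a * norm_b)
```

With this sketch, two texts that differ only in stopwords score close to 1, and texts with no shared terms score 0. A stemmer would be applied inside `tokenize` so that inflected forms of the same word map to one term.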

    Construction Crew Productivity Prediction By Using Data Mining Methods

    No full text
    4th World Conference on Learning, Teaching and Educational Leadership (WCLTA) -- OCT 27-29, 2013 -- Univ Barcelona, Barcelona, SPAIN
    WOS: 000345351800205

    The ceramic tiling industry has become one of Turkey's fastest growing industries, owing to the achievements of Turkish ceramic producers in delivering high quality products at lower costs than their equivalents worldwide. Conversely, the high cost of the end product of the Turkish building industry in general shows that there is an important problem with the productivity and quality of construction crews. For this reason, most construction firms have begun to recognize the need for detailed research on the factors affecting construction crew productivity. The purpose of this study is thus to classify the factors that affect the productivity of ceramic tiling crews by using data mining methods. To this end, a systematic time study was undertaken with ceramic tiling crews in Turkey. Daily productivity values of the crews were collected together with information on factors such as crew size and the age and experience of crew members. The collected data were classified using the Weka program. Outlier values were first removed from the dataset, and the decision tree method was then used to classify the resulting dataset. The decision tree method was preferred for its ease of use and its speed in classification. The Apriori algorithm, which is the most commonly preferred association algorithm in previous studies, was also used to highlight the general trend in the dataset.

    (C) 2014 The Authors. Published by Elsevier Ltd
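
To make the Apriori step mentioned in this abstract concrete, here is a minimal sketch of level-wise frequent-itemset mining. The study itself used Weka, so this is not its implementation. The function name `apriori`, the `min_support` parameter, and the attribute=value records in the usage note are invented for illustration and are not data from the paper.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {itemset: support} for all itemsets meeting min_support.

    transactions: iterable of collections of hashable items.
    min_support: minimum fraction of transactions containing the itemset.
    """
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)

    def support(itemset):
        return sum(1 for t in transactions if itemset <= t) / n

    # Level 1: frequent single items.
    items = {item for t in transactions for item in t}
    current = {frozenset([i]) for i in items}
    current = {s for s in current if support(s) >= min_support}

    frequent = {}
    k = 1
    while current:
        for s in current:
            frequent[s] = support(s)
        k += 1
        # Join step: merge frequent (k-1)-itemsets into k-item candidates.
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Prune step: keep a candidate only if every (k-1)-subset is
        # frequent, then check its own support.
        current = {
            c for c in candidates
            if all(frozenset(sub) in frequent for sub in combinations(c, k - 1))
            and support(c) >= min_support
        }
    return frequent
```

As a usage example with made-up crew records, calling `apriori` on rows such as `{"crew=small", "exp=high", "prod=high"}` with `min_support=0.5` returns the attribute combinations that occur in at least half of the rows. That is the kind of general trend an association algorithm is used to surface.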