13 research outputs found

    DETEKSI PLAGIARISME MENGGUNAKAN ALGORITMA LEVENSHTEIN DISTANCE (Plagiarism Detection Using the Levenshtein Distance Algorithm)

    Detecting document similarity for plagiarism systems is part of Natural Language Processing research within artificial intelligence. Plagiarism is common in documents in academic settings, including at PSMTS ULM. Plagiarism detection is needed to safeguard the originality of students' theses. Previous researchers have used several algorithms to detect plagiarism. However, a fast algorithm is required, because student theses contain a relatively large number of strings and the thesis collection keeps growing, which slows the algorithm down. The Levenshtein Distance algorithm outperforms the adaptive algorithm. Preprocessing, which consists of case folding, tokenizing, stopword removal, and stemming, can make the system's processing faster. The Levenshtein Distance algorithm detects plagiarism well; the average processing time of the system is 6,283 ms without preprocessing and 4,920 ms with preprocessing.
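The pipeline this abstract describes (preprocessing followed by a Levenshtein comparison) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the stopword list, the function names, and the normalisation by the longer string's length are all assumptions, and stemming is left out because it would need a language-specific stemmer.

```python
import re

# Hypothetical stopword list, for illustration only; a real system would
# use a list suited to the language of the theses.
STOPWORDS = {"the", "a", "an", "of", "and", "is", "in", "to"}


def preprocess(text):
    """Case folding, tokenizing, and stopword removal (stemming omitted)."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [t for t in tokens if t not in STOPWORDS]


def levenshtein(a, b):
    """Edit distance between two sequences, keeping only two rows in memory."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter sequence on the inner loop
    previous = list(range(len(b) + 1))
    for i, item_a in enumerate(a, start=1):
        current = [i]
        for j, item_b in enumerate(b, start=1):
            cost = 0 if item_a == item_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def similarity(doc_a, doc_b):
    """Assumed score: 1 - distance / length of the longer preprocessed text."""
    a = " ".join(preprocess(doc_a))
    b = " ".join(preprocess(doc_b))
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(a, b) / longest
```

Preprocessing shortens the strings being compared, which is one plausible reason it would reduce running time: the dynamic-programming table grows with the product of the two string lengths.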

    Unsupervised Algorithms for Microarray Sample Stratification

    The amount of data made available by microarrays gives researchers the opportunity to delve into the complexity of biological systems. However, the noisy and extremely high-dimensional nature of this kind of data poses significant challenges. Microarrays allow for the parallel measurement of thousands of molecular objects spanning different layers of interactions. In order to be able to discover hidden patterns, the most disparate analytical techniques have been proposed. Here, we describe the basic methodologies to approach the analysis of microarray datasets that focus on the task of (sub)group discovery. Peer reviewed.

    A taxonomy for similarity metrics between Markov decision processes

    Although the notion of task similarity is potentially interesting in a wide range of areas such as curriculum learning or automated planning, it has mostly been tied to transfer learning. Transfer is based on the idea of reusing the knowledge acquired in the learning of a set of source tasks in a new learning process in a target task, assuming that the target and source tasks are close enough. In recent years, transfer learning has succeeded in making reinforcement learning (RL) algorithms more efficient (e.g., by reducing the number of samples needed to achieve (near-)optimal performance). Transfer in RL is based on the core concept of similarity: whenever the tasks are similar, the transferred knowledge can be reused to solve the target task and significantly improve the learning performance. Therefore, the selection of good metrics to measure these similarities is a critical aspect when building transfer RL algorithms, especially when this knowledge is transferred from simulation to the real world. In the literature, there are many metrics to measure the similarity between MDPs; hence, many definitions of similarity, or of its complement, distance, have been considered. In this paper, we propose a categorization of these metrics and analyze the definitions of similarity proposed so far, taking into account such categorization. We also follow this taxonomy to survey the existing literature, as well as suggesting future directions for the construction of new metrics. Open Access funding provided thanks to the CRUE-CSIC agreement with Springer Nature. This work has also been supported by the Madrid Government (Comunidad de Madrid-Spain) under the Multiannual Agreement with UC3M in the line of Excellence of University Professors (EPUC3M17), and in the context of the V PRICIT (Regional Programme of Research and Technological Innovation).

    A Machine Learning System for Glaucoma Detection using Inexpensive Machine Learning

    This thesis presents a neural network system which segments images of the retina to calculate the cup-to-disc ratio, one of the diagnostic indicators of the presence or continuing development of glaucoma, a disease of the eye which causes blindness. The neural network is designed to run on commodity hardware and to be run with minimal skill required from the user by packaging the software required to run the network into a Singularity image. The RIGA dataset used to train the network provides images of the retina which have been annotated with the location of the optic cup and disc by six ophthalmologists, and six separate models have been trained, one for each ophthalmologist. Previous work with this dataset has combined the annotations into a consensus annotation, or taken all annotations together as a group to create a model, as opposed to creating individual models by annotator. The interannotator disagreements in the data are large and the method implemented in this thesis captures their differences rather than combining them together. The mean error of the pixel label predictions across the six models is 10.8%; the precision and recall for the predictions of the cup-to-disc ratio across the six models are 0.920 and 0.946, respectively
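As a rough illustration of the quantity this abstract centres on, the sketch below derives a cup-to-disc ratio from two binary segmentation masks. It assumes the ratio is taken between vertical diameters, which is a common convention but is not stated in the abstract; the mask layout (lists of rows of 0/1 values) and the function names are hypothetical.

```python
def vertical_diameter(mask):
    """Number of rows spanned by the non-zero pixels of a binary mask."""
    rows = [r for r, row in enumerate(mask) if any(row)]
    if not rows:
        return 0
    return rows[-1] - rows[0] + 1


def cup_to_disc_ratio(cup_mask, disc_mask):
    """Assumed definition: vertical cup diameter / vertical disc diameter."""
    disc = vertical_diameter(disc_mask)
    if disc == 0:
        raise ValueError("disc mask is empty")
    return vertical_diameter(cup_mask) / disc


# Toy 4x4 masks: the disc spans all four rows, the cup the middle two.
disc_mask = [[0, 1, 1, 0],
             [1, 1, 1, 1],
             [1, 1, 1, 1],
             [0, 1, 1, 0]]
cup_mask = [[0, 0, 0, 0],
            [0, 1, 1, 0],
            [0, 1, 1, 0],
            [0, 0, 0, 0]]
ratio = cup_to_disc_ratio(cup_mask, disc_mask)
```

With one model per annotator, as the thesis describes, a ratio like this would be computed separately from each model's predicted masks.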

    Weather persistence on sub-seasonal to seasonal timescales: a methodological review

    Persistence is an important concept in meteorology. It refers to surface weather or the atmospheric circulation either remaining in approximately the same state (stationarity) or repeatedly occupying the same state (recurrence) over some prolonged period of time. Persistence can be found at many different timescales; however, the sub-seasonal to seasonal (S2S) timescale is especially relevant in terms of impacts and atmospheric predictability. For these reasons, S2S persistence has been attracting increasing attention from the scientific community. The dynamics responsible for persistence and their potential evolution under climate change are a notable focus of active research. However, one important challenge facing the community is how to define persistence, from both a qualitative and quantitative perspective. Despite a general agreement on the concept, many different definitions and perspectives have been proposed over the years, among which it is not always easy to find one’s way. The purpose of this review is to present and discuss existing concepts of weather persistence, associated methodologies and physical interpretations. In particular, we call attention to the fact that persistence can be defined as a global or as a local property of a system, with important implications in terms of methods but also impacts. We also highlight the importance of timescale and similarity metric selection, and illustrate some of the concepts using the example of summertime atmospheric circulation over Western Europe.

    Concept Drift Adaptation in Text Stream Mining Settings: A Comprehensive Review

    Due to the advent and increase in the popularity of the Internet, people have been producing and disseminating textual data in several ways, such as reviews, social media posts, and news articles. As a result, numerous researchers have been working on discovering patterns in textual data, especially because social media posts function as social sensors, indicating people's opinions, interests, etc. However, most tasks regarding natural language processing are addressed using traditional machine learning methods and static datasets. This setting can lead to several problems, such as an outdated dataset, which may not correspond to reality, and an outdated model, whose performance degrades over time. Concept drift, which corresponds to changes in data distribution and patterns, is another aspect that emphasizes these issues. In a text stream scenario, it is even more challenging due to the stream's characteristics, such as high speed and data arriving sequentially. In addition, models for this type of scenario must adhere to the constraints mentioned above while learning from the stream by storing texts for a limited time and consuming little memory. In this study, we performed a systematic literature review regarding concept drift adaptation in text stream scenarios. Considering well-defined criteria, we selected 40 papers to unravel aspects such as text drift categories, types of text drift detection, model update mechanism, the addressed stream mining tasks, types of text representations, and text representation update mechanism. In addition, we discussed drift visualization and simulation and listed real-world datasets used in the selected papers. Therefore, this paper comprehensively reviews concept drift adaptation in text stream mining scenarios. Comment: 49 pages