6 research outputs found

Accélération de k-means par pré-calcul dynamique d'agrégats

Authors: El Malki Nabil, Ravat Franck, Teste Olivier
Publication venue: Editions RNTI
Publication date: 01/01/2019

The unsupervised clustering algorithm k-means requires iterative, repeated access to the data, to the point of performing the same computation on the same data several times. These repeated computations can be costly when clustering massive data sets. We propose extending the k-means algorithm with an optimization approach based on the dynamic pre-computation of aggregates, which can then be reused to avoid redundant computations.
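
The abstract above does not say how the aggregates are built or stored, so the following is only a minimal sketch of the general idea of reusing aggregates in k-means, not the authors' method. It assumes the aggregates are per-cluster coordinate sums and counts, kept between iterations and adjusted only when a point changes cluster, so centroids are never recomputed from scratch. The function name `kmeans_cached` and its parameters are hypothetical.

```python
import random


def kmeans_cached(points, k, max_iter=100, seed=0):
    """Lloyd-style k-means that keeps per-cluster sums and counts between passes.

    Illustrative assumption: the reusable aggregates are (sum, count) per cluster.
    """
    rng = random.Random(seed)
    dim = len(points[0])
    # Pick k distinct input points as initial centroids.
    centroids = [list(p) for p in rng.sample(points, k)]
    assign = [-1] * len(points)               # -1 means "not yet assigned"
    sums = [[0.0] * dim for _ in range(k)]    # cached aggregate: coordinate sums
    counts = [0] * k                          # cached aggregate: cluster sizes

    for _ in range(max_iter):
        moved = 0
        for i, p in enumerate(points):
            # Nearest centroid by squared Euclidean distance.
            best = min(
                range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])),
            )
            old = assign[i]
            if best != old:
                # Touch only the aggregates affected by this move.
                if old != -1:
                    counts[old] -= 1
                    for d in range(dim):
                        sums[old][d] -= p[d]
                counts[best] += 1
                for d in range(dim):
                    sums[best][d] += p[d]
                assign[i] = best
                moved += 1
        if moved == 0:
            break  # no point changed cluster, so the centroids are stable
        # Derive the new centroids from the cached aggregates.
        for c in range(k):
            if counts[c] > 0:  # an empty cluster keeps its previous centroid
                centroids[c] = [s / counts[c] for s in sums[c]]
    return centroids, assign
```

Calling `kmeans_cached(points, 2)` on a list of coordinate tuples returns the centroids and one cluster index per point. In this sketch the saving is limited to the centroid update step; the distance computations are still repeated on every pass.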

Scientific Publications of the University of Toulouse II Le Mirail

Open Archive Toulouse Archive Ouverte

PENERAPAN METODE K-MEANS CLUSTERING UNTUK MENGELOMPOKKAN DATA PELABUHAN DAN BONGKAR MUAT BARANG DI INDONESIA

Author: ABDI HAZMAN
Publication venue: Universitas Telkom
Publication date: 30/03/2019

Abstract: Ports play an important role in connecting islands and countries. Beyond that, ports also play an important role in a country's economic growth. A port is a meeting point for transporting goods and people from land to sea and vice versa. In terms of operation, ports are divided into commercially operated ports and non-commercially operated ports. In this study, the author clusters the number of ports in each region in order to analyze the distribution of commercially and non-commercially operated ports over the past several years, using the K-Means algorithm. Keywords: Ports, Clustering, K-Means Algorithm

Open Library

CLUSTER ANALYSIS IN BIOTECHNOLOGY

Author: Kitching
Publication venue: National Academy of Sciences of Ukraine (Co. LTD Ukrinformnauka)

Crossref

Big Data mining and machine learning techniques applied to real world scenarios

Author: Pagliarani Andrea <1990>
Publication venue: Alma Mater Studiorum - Università di Bologna
Publication date: 04/04/2019

Data mining techniques allow the extraction of valuable information from heterogeneous and possibly very large data sources, which can be either structured or unstructured. Unstructured data, such as text files, social media, and mobile data, far outnumber structured data and grow at a higher rate. Their high volume and the inherent ambiguity of natural language make unstructured data very hard to process and analyze. Appropriate text representations are therefore required to capture word semantics while preserving statistical information, e.g. word counts. In Big Data scenarios, scalability is also a primary requirement. Data mining and machine learning approaches should take advantage of large-scale data, exploiting abundant information and avoiding the curse of dimensionality. The goal of this thesis is to enhance text understanding in the analysis of big data sets, introducing novel techniques that can be employed to solve real-world problems. The presented Markov methods temporarily achieved the state of the art on well-known Amazon reviews corpora for cross-domain sentiment analysis, before being outperformed by deep approaches in the analysis of large data sets. A noise detection method for identifying relevant tweets leads to 88.9% accuracy in daily prediction of the Dow Jones Industrial Average, which is the best result in the literature based on social networks. Dimensionality reduction approaches are used in combination with LinkedIn users' skills to perform job recommendation. A framework based on deep learning and a Markov Decision Process is designed to model job transitions and recommend pathways towards a given career goal. Finally, parallel primitives for vendor-agnostic implementation of Big Data mining algorithms are introduced to foster multi-platform deployment, code reuse, and optimization.

AMS Tesi di Dottorato

A Parallel Clustering Algorithm with MPI – MKmeans

Authors: Arthur, Blake, Frank, Gongqing Wu, Gropp, Jain, Jing Zhang, MacQueen, Shiying Li, Shuilong Hao, Xuegang Hu
Publication venue: Academy Publisher

Crossref