Distributed Single Pass Clustering Algorithm Based on MapReduce


Abstract: The volume of available data grows every moment, eventually producing a flood of big data. Hence there is an urgent need to exploit big data in order to extract valuable knowledge from it. Adopting distributed architectures and data-intensive algorithms facilitates handling and processing big data. This paper introduces a distributed single pass clustering algorithm based on MapReduce to reduce the running time of processing big data. It also introduces median-based single pass clustering to mitigate the input-order problem associated with single pass clustering, in which the resulting clusters depend on the order in which the data arrive. Furthermore, it introduces a new hybrid approach that integrates median-based single pass clustering with the k-means algorithm. The proposed integration enables median-based clustering to work well with sparse data such as text.
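
To illustrate the input-order problem mentioned in the abstract, the following is a minimal sketch of generic, textbook single pass clustering. It is not the authors' distributed or median-based algorithm, whose details are not given on this page. The function name `single_pass_cluster`, the Euclidean distance, the mean-based centroid update, and the `threshold` parameter are all assumptions made for illustration.

```python
import math


def single_pass_cluster(points, threshold):
    """Generic single pass clustering (illustrative sketch, not the paper's method).

    Each point is visited exactly once. It joins the nearest existing
    cluster if that cluster's centroid lies within `threshold`;
    otherwise it starts a new cluster.
    """
    centroids = []  # running mean of each cluster
    members = []    # points assigned to each cluster
    for p in points:
        # Find the nearest existing centroid, if any.
        best, best_dist = None, math.inf
        for i, c in enumerate(centroids):
            d = math.dist(p, c)
            if d < best_dist:
                best, best_dist = i, d
        if best is not None and best_dist <= threshold:
            # Join the nearest cluster and recompute its mean.
            members[best].append(p)
            n = len(members[best])
            centroids[best] = tuple(
                sum(q[k] for q in members[best]) / n for k in range(len(p))
            )
        else:
            # No cluster is close enough: start a new one.
            centroids.append(tuple(p))
            members.append([p])
    return members


# The same four 1-D points, presented in two different orders.
order_a = single_pass_cluster([(0.0,), (1.0,), (2.0,), (3.0,)], threshold=1.6)
order_b = single_pass_cluster([(0.0,), (3.0,), (1.0,), (2.0,)], threshold=1.6)
print(order_a)
print(order_b)
```

With these inputs the first ordering groups 0, 1, and 2 together and leaves 3 alone, while the second ordering pairs 0 with 1 and 3 with 2. This dependence on arrival order is the weakness that the median-based variant described in the abstract aims to mitigate.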

Publication year

2018

Organization Name

Climate Change Information Center & Renewable Energy & Expert Systems

Serial title

Eighth International Conference on Intelligent Computing and Information Systems

Author(s) from ARC

Abd Elrahman Elsayed Mohamed

External authors (outside ARC)

Osama Ismael

Hoda M. O. Mokhtar

Publication Type

Conference/Workshop

Agricultural Research Center 2024
