Implementing K-Means for Achievement Study between Apache Spark and Map Reduce

Dr. E. Laxmi Lydia; Dr. A. Krishna Mohan; Dr. M. Ben Swarup

Abstract

Big Data has long been a subject of interest for computer scientists around the world, and it has gained even more prominence recently with the continuous explosion of data from sources such as social media and the drive of technology giants to analyze their data more deeply. MapReduce and its variants have been highly successful in implementing large-scale, data-intensive applications on commodity clusters. However, most of these systems are built around an acyclic data flow model that is not suitable for other popular applications. Classic MapReduce executes jobs in a simple but rigid structure: a transformation step ("map"), a synchronization step ("shuffle"), and a step that combines the results from all the nodes in a cluster ("reduce"). To overcome the rigid structure of map and reduce, we propose the recently introduced Apache Spark; both systems provide a processing model for analyzing big data. The leading candidate for "successor to MapReduce" today is Apache Spark. Like MapReduce, it is a general-purpose engine, but it is designed to run many more workloads, and to do so much faster than the older system. In this paper we compare these two systems and present a performance analysis using a standard machine learning algorithm for clustering (KMeans), together with additional parameters such as scheduling delay, speedup, and energy consumption relative to the existing systems.
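As a rough illustration of the map, shuffle, and reduce steps described above, one K-Means iteration can be sketched in plain Python. This is a single-machine sketch, not the implementation evaluated in the paper: the function names (map_step, shuffle_step, reduce_step, kmeans), the sample points, and the starting centroids are all hypothetical and chosen only for the example.

```python
from collections import defaultdict
from math import dist  # Euclidean distance, Python 3.8+


def map_step(points, centroids):
    """Map: emit (index of nearest centroid, point) for every point."""
    for p in points:
        nearest = min(range(len(centroids)), key=lambda i: dist(p, centroids[i]))
        yield nearest, p


def shuffle_step(pairs):
    """Shuffle: group the emitted points by centroid index."""
    groups = defaultdict(list)
    for key, p in pairs:
        groups[key].append(p)
    return groups


def reduce_step(groups, centroids):
    """Reduce: average each group to obtain the updated centroid."""
    updated = list(centroids)  # a centroid with no points keeps its position
    for key, pts in groups.items():
        updated[key] = tuple(sum(coord) / len(pts) for coord in zip(*pts))
    return updated


def kmeans(points, centroids, max_iter=20):
    """Repeat map -> shuffle -> reduce until the centroids stop moving."""
    for _ in range(max_iter):
        new = reduce_step(shuffle_step(map_step(points, centroids)), centroids)
        if new == centroids:
            break
        centroids = new
    return centroids


# Hypothetical sample data: two well-separated groups of 2-D points.
points = [(1.0, 1.0), (1.0, 2.0), (9.0, 9.0), (10.0, 9.0)]
result = kmeans(points, [(0.0, 0.0), (10.0, 10.0)])
print(result)  # [(1.0, 1.5), (9.5, 9.0)]
```

The outer loop in kmeans is the part an acyclic data flow handles awkwardly: in a MapReduce setting each pass would typically be submitted as a separate job that reads its input again, whereas an engine such as Spark can keep the points in memory across iterations.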

Keywords

Spark, MapReduce, Hadoop, Big Data

Cite this article as

E. Laxmi Lydia, A. Krishna Mohan, M. Ben Swarup, "Implementing K-Means for Achievement Study between Apache Spark and Map Reduce", International Journal of Innovative Research in Engineering & Management (IJIREM), Vol-3, Issue-5, Page No-431-435, 2016. Available from:

Corresponding Author

Dr.E.Laxmi Lydia

Associate Professor, Department of Computer Science and Engineering, Vignan's Institute Of Information Technology, Visakhapatnam, Andhra Pradesh, India.