Volume- 2
Issue- 1
Year- 2015
Bharat Parte, Umesh Jamdade, Pranita Sonavane, Sheetal Jadhav
Squid Log Analyzer with Distributed System is log analytics software that parses Squid log files from a server and derives indicators about when, how, and by whom a web server is visited. Today, 80% of captured data is unstructured; it comes from sensors that gather climate information, posts to social media sites, digital pictures and videos, purchase transaction records, and cell phone GPS signals. The log files, which are maintained by the web servers, contain the user name, IP address, time stamp, access request, number of bytes transferred, result status, referring URL, and user agent. Squid Log Analyzer uploads these web log files into the Hadoop distributed framework, where they are processed in parallel in a master-slave structure. Pig scripts are then written over the classified log files to answer specific queries. Analyzing the log files gives an idea of user behavior; this involves effective mining of the data and the use of tools to process the log files. The work also presents the idea of creating an extended log file and learning user behavior from it. Analyzing user activity is particularly useful for studying user behavior in highly interactive systems. The main focus of our project is to build a prototype log analyzer, study the information-seeking process, and present the log files graphically, date-wise and month-wise. Information on hits and visits to a particular website is also obtained.
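As a rough illustration of the parsing and date-wise and month-wise hit counting described above, the following single-machine Python sketch may help. It is not the authors' Hadoop and Pig implementation. It assumes the log uses an Apache-style "combined" line layout, which carries the fields the abstract lists; the regular expression, the function names `parse_line` and `hits_by_day_and_month`, and the record keys are all hypothetical and chosen only for this example.

```python
import re
from collections import Counter
from datetime import datetime

# Assumed layout (not taken from the paper): an Apache-style "combined" line:
# ip ident user [timestamp] "METHOD url PROTOCOL" status bytes "referrer" "agent"
# Anything after the user agent (for example a cache result code) is ignored.
LINE_RE = re.compile(
    r'(?P<ip>\S+) \S+ (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<url>\S+)[^"]*" '
    r'(?P<status>\d{3}) (?P<bytes>\d+|-) '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)


def parse_line(line):
    """Return a dict of fields for one log line, or None if it does not match."""
    match = LINE_RE.match(line)
    if match is None:
        return None
    record = match.groupdict()
    # Timestamps look like 12/Jan/2015:10:15:32 +0530 (English month names).
    record["time"] = datetime.strptime(record["time"], "%d/%b/%Y:%H:%M:%S %z")
    record["status"] = int(record["status"])
    # A "-" byte count means nothing was transferred; treat it as zero.
    record["bytes"] = 0 if record["bytes"] == "-" else int(record["bytes"])
    return record


def hits_by_day_and_month(lines):
    """Count hits per calendar day and per month, skipping unparseable lines."""
    by_day, by_month = Counter(), Counter()
    for line in lines:
        record = parse_line(line)
        if record is None:
            continue
        by_day[record["time"].strftime("%Y-%m-%d")] += 1
        by_month[record["time"].strftime("%Y-%m")] += 1
    return by_day, by_month
```

Calling `hits_by_day_and_month(open("access.log"))` on a file in the assumed layout would return two counters that could feed the date-wise and month-wise charts; in the distributed setting the abstract describes, the same per-line parsing and grouping would instead be expressed as Pig operations over files stored in Hadoop.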
Department of CSE, Department of CSE, VIIT, Pune, India