Volume I, Issue VI, August 2014


Author(s) :

Archana S. Kakade, Suhas Raut


Disk capacity has grown substantially, but disk access time has not improved at the same pace. As a result, systems that rely on disk-based storage struggle to meet the performance demands of large cluster-based systems. Hadoop is an open-source framework for big data that supports applications running on large clusters. In an attempt to eliminate disk accesses, this paper presents the design of a caching mechanism based on primary memory and integrates it with Hadoop. The most frequently accessed data is kept in primary memory and can therefore be processed much more quickly. The paper describes a system architecture that adds a cache system to HDFS, so that unnecessary trips to the hard disk drive (HDD) to fetch data, and the delay they cause, can be avoided.
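The caching idea in the abstract can be illustrated with a minimal sketch. It is not the authors' design: the abstract does not name an eviction policy or an interface, so the least-recently-used (LRU) policy, the class name `BlockCache`, and the `read_from_disk` callback below are all assumptions made for illustration. The sketch does not use any Hadoop API; the "disk" is a plain Python function standing in for a slow HDFS block read.

```python
from collections import OrderedDict
from typing import Callable


class BlockCache:
    """Hypothetical in-memory cache placed in front of a slow block reader.

    Eviction is least-recently-used (LRU); this policy is an assumption,
    not something stated in the paper's abstract.
    """

    def __init__(self, capacity: int, read_from_disk: Callable[[str], bytes]):
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self.capacity = capacity
        self.read_from_disk = read_from_disk  # stand-in for a disk/HDFS read
        self._blocks: "OrderedDict[str, bytes]" = OrderedDict()
        self.hits = 0
        self.misses = 0

    def get(self, block_id: str) -> bytes:
        # Cache hit: serve from primary memory and mark as recently used.
        if block_id in self._blocks:
            self._blocks.move_to_end(block_id)
            self.hits += 1
            return self._blocks[block_id]

        # Cache miss: take the slow path, then keep the block in memory.
        self.misses += 1
        data = self.read_from_disk(block_id)
        self._blocks[block_id] = data
        if len(self._blocks) > self.capacity:
            # Evict the least recently used block (front of the ordering).
            self._blocks.popitem(last=False)
        return data


if __name__ == "__main__":
    def slow_read(block_id: str) -> bytes:
        # Hypothetical stand-in for fetching a block from the HDD.
        return f"contents of {block_id}".encode()

    cache = BlockCache(capacity=2, read_from_disk=slow_read)
    for block in ["blk_1", "blk_2", "blk_1", "blk_3", "blk_1"]:
        cache.get(block)
    print(f"hits={cache.hits} misses={cache.misses}")
```

Repeated reads of the same block are served from memory, so only the first access to each block (and any access after eviction) reaches the disk. A real integration would also have to handle cache coherence across nodes and block invalidation, which this sketch leaves out.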


Big Data, Hadoop, Hadoop Distributed File System (HDFS), Cache system, Primary memory

How to Cite this Paper? [APA Style]
Archana S. Kakade, Suhas Raut (2014), HADOOP DISTRIBUTED FILE SYSTEM WITH CACHE TECHNOLOGY, Industrial Science Journal, http://industrialscience.org/Article.aspx?aid=45&vid=6 (August, 2014)