An Algorithm to Enhance Cache Efficiency in Multi-core Processors

  • Ali Ghaffari, Department of Computer Engineering, Tabriz Branch, Islamic Azad University, Tabriz, Iran
  • Majid Babaei, Department of Computer Engineering, Tabriz Branch, Islamic Azad University, Tabriz, Iran
Keywords: Asymmetric multi-core processors, Processor performance, Cache performance, Shared and dedicated structure, Partitioning of the last-level cache

Abstract

Cache efficiency is one of the major challenges in multi-core processors, so the cache space in such processors must be managed carefully across the cores. This paper addresses the issue of cache re-access and proposes an algorithm that divides the last-level cache into local and global shares for the cores. The rationale behind the proposed algorithm is to activate or deactivate cache ways for any given core. Consequently, conflicts between cores are reduced and each core can use the cache space dynamically. To simulate the proposed algorithm, three groups of applications were used, and the results were evaluated in two phases. The first phase concerns the number of active cache ways for each core; in the merged state, the proposed algorithm improved the number of active ways by up to 19%. The second phase considers the cache miss rate, where an improvement of about 7% was observed.
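
The abstract does not spell out the algorithm itself, so the following is only a minimal sketch of the general idea of way-based partitioning, not the authors' method. It assumes a toy set-associative last-level cache in which each core has a set of active ways made up of local (private) ways and global (shared) ways, with LRU replacement restricted to the ways active for the requesting core. The class name `WayPartitionedCache`, the method names, and all parameters are hypothetical and chosen for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class WayPartitionedCache:
    """Toy set-associative last-level cache with per-core active ways.

    Illustrative assumption only: the paper's actual policy for choosing
    which ways to activate or deactivate is not given in the abstract.
    """
    num_sets: int
    num_ways: int
    line_size: int = 64
    # way_mask[core] lists the way indices currently active for that core.
    way_mask: dict = field(default_factory=dict)

    def __post_init__(self):
        # tags[s][w] is the tag stored in way w of set s (None means empty).
        self.tags = [[None] * self.num_ways for _ in range(self.num_sets)]
        # last_use[s][w] is a logical timestamp used for LRU replacement.
        self.last_use = [[0] * self.num_ways for _ in range(self.num_sets)]
        self.clock = 0
        self.hits = 0
        self.misses = 0

    def set_ways(self, core, local_ways, global_ways):
        """Activate the given local (private) and global (shared) ways for a core."""
        self.way_mask[core] = sorted(set(local_ways) | set(global_ways))

    def access(self, core, address):
        """Return True on a hit, False on a miss (the line is then filled)."""
        self.clock += 1
        block = address // self.line_size
        s, tag = block % self.num_sets, block // self.num_sets
        ways = self.way_mask[core]
        for w in ways:
            if self.tags[s][w] == tag:
                self.last_use[s][w] = self.clock
                self.hits += 1
                return True
        # Miss: evict the least recently used way among those active for this core.
        victim = min(ways, key=lambda w: self.last_use[s][w])
        self.tags[s][victim] = tag
        self.last_use[s][victim] = self.clock
        self.misses += 1
        return False
```

In this sketch, giving core 0 the ways `[0, 1]` as local and `[6, 7]` as global, and core 1 the ways `[2, 3]` as local and the same global ways, means that a burst of misses from core 1 can evict core 0's lines only from the shared ways 6 and 7, never from ways 0 and 1. Calling `set_ways` again with a different list is how a way would be activated or deactivated for a core at run time.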

Published
2017-06-01
How to Cite
Ghaffari, A., & Babaei, M. (2017). An Algorithm to Enhance Cache Efficiency in Multi-core Processors. Majlesi Journal of Electrical Engineering, 11(2). Retrieved from http://mjee.iaumajlesi.ac.ir/index/index.php/ee/article/view/2089