Variable Latency Speculative Multiply-Accumulator Architectures
Abstract
This paper introduces variable latency speculative Multiply-Accumulator (MAC) architectures. The proposed architectures integrate the result vectors of the multiplier in parallel with the accumulator to create an asynchronous data path design. Each variable latency speculative MAC consists of a short and a long data path, and a selection circuit chooses the suitable path with minimal overhead. To evaluate their performance, the proposed architectures were synthesized using Faraday's 90 nm technology library for operand lengths of 8, 16, and 32 bits. The results show that the proposed MAC architectures offer a variety of trade-offs in the power-delay-area space that outperform existing designs that use only the integration technique.
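The short-path/long-path idea can be illustrated with a small behavioral model. The sketch below is not the architecture proposed in the paper: it assumes a simplified MAC in which the product is added to the accumulator by a speculative adder whose carries see only the previous `window` bits, a conservative detector that flags any run of `window` consecutive propagate bits, and illustrative latencies of 1 cycle (short path) and 2 cycles (long path). The names `speculative_add`, `needs_long_path`, `variable_latency_mac`, and the parameters `width` and `window` are hypothetical.

```python
def speculative_add(x, y, width, window):
    """Add x and y; each carry is computed from at most `window` lower bits."""
    result = 0
    for i in range(width):
        lo = max(0, i - window)
        seg_mask = (1 << (i - lo)) - 1
        # Carry into bit i from bits lo..i-1, assuming carry-in 0 at bit lo.
        seg_sum = ((x >> lo) & seg_mask) + ((y >> lo) & seg_mask)
        carry = (seg_sum >> (i - lo)) & 1
        bit = ((x >> i) ^ (y >> i) ^ carry) & 1
        result |= bit << i
    return result


def needs_long_path(x, y, width, window):
    """Conservative detector: flag any run of `window` propagate bits."""
    propagate = (x ^ y) & ((1 << width) - 1)
    run = 0
    for i in range(width):
        run = run + 1 if (propagate >> i) & 1 else 0
        if run >= window:
            return True
    return False


def variable_latency_mac(acc, a, b, width=16, window=4):
    """Return (new_acc, cycles) for acc + a*b modulo 2**width.

    Cycle counts (1 short, 2 long) are assumptions for illustration only.
    """
    mask = (1 << width) - 1
    acc &= mask
    product = (a * b) & mask
    if needs_long_path(acc, product, width, window):
        # Long path: speculation may be wrong, so use the exact sum.
        return (acc + product) & mask, 2
    # Short path: the speculative sum is guaranteed correct here.
    return speculative_add(acc, product, width, window), 1
```

In this model, `variable_latency_mac(1, 2, 1)` takes the short path, while `variable_latency_mac(0, 3, 5)` takes the long path because the operands form a run of four propagate bits. The detector is deliberately pessimistic: it sometimes selects the long path when the speculative sum would have been correct, trading some average latency for a simpler check.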