Abstract: Graphics Processing Units (GPUs) have emerged as the predominant hardware platforms for massively parallel computing. However, their inherent von-Neumann architecture still suffers ...