Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences, volume 383, issue 2288

AxLaM: energy-efficient accelerator design for language models for edge computing

Tom Glint 1
Bhumika Mittal 2
Santripta Sharma 2
Abdul Qadir Ronak 3
Abhinav Goud 3
Neerja Kasture 3
Zaqi Momin 3
Aravind Krishna 3
Joycee Mekie 3
Publication type: Journal Article
Publication date: 2025-01-16
Scimago: Q1
WoS: Q1
SJR: 0.870
CiteScore: 9.3
Impact factor: 4.3
ISSN: 1364-503X, 1471-2962
Abstract

Modern language models such as bidirectional encoder representations from transformers (BERT) have revolutionized natural language processing (NLP) tasks but are computationally intensive, limiting their deployment on edge devices. This paper presents an energy-efficient accelerator design tailored for encoder-based language models, enabling their integration into mobile and edge computing environments. AxLaM, a data-flow-aware hardware accelerator for language models inspired by Simba, uses approximate fixed-point POSIT-based multipliers and high-bandwidth memory (HBM) to achieve significant improvements in computational efficiency, power consumption, area and latency over the hardware-realized scalable accelerator Simba. Compared to Simba, AxLaM achieves a ninefold energy reduction, a 58% area reduction and a 1.2× latency improvement, making it suitable for deployment in edge devices. The energy efficiency of AxLaM is 1.8 TOPS/W, 65% higher than that of FACT, which requires pre-processing of the language model before implementing it on the hardware.

This article is part of the theme issue ‘Emerging technologies for future secure computing platforms’.
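The abstract states that AxLaM relies on approximate fixed-point POSIT-based multipliers to cut energy, but does not detail the approximation scheme. As a purely illustrative sketch, the snippet below shows a generic truncation-based approximate fixed-point multiply, a common way such accelerators trade a small amount of accuracy for energy; every function name and parameter here is hypothetical and not taken from the paper.

```python
# Illustrative only: a truncation-based approximate fixed-point multiplier,
# sketching the kind of accuracy/energy trade-off the abstract describes.
# All names and bit widths are assumptions, not AxLaM's actual design.

def to_fixed(x: float, frac_bits: int = 8) -> int:
    """Quantize a float to a signed fixed-point integer (Q-format)."""
    return round(x * (1 << frac_bits))

def to_float(x: int, frac_bits: int = 8) -> float:
    """Convert a fixed-point integer back to a float."""
    return x / (1 << frac_bits)

def approx_fixed_mul(a: int, b: int, frac_bits: int = 8,
                     trunc_bits: int = 4) -> int:
    """Multiply two fixed-point values, zeroing the `trunc_bits` lowest
    bits of the double-width product before renormalizing. In hardware,
    dropping low partial products saves adder area and energy."""
    full = a * b                       # exact double-width product
    full &= ~((1 << trunc_bits) - 1)   # approximate: discard low bits
    return full >> frac_bits           # shift back to Q(frac_bits)

# Quantization plus truncation keeps the result close to the exact
# float product for typical activation/weight magnitudes.
x, y = 1.51, 2.27
approx = to_float(approx_fixed_mul(to_fixed(x), to_fixed(y)))
```

In an actual design, the truncated multiplier would sit inside each processing element's multiply-accumulate unit, so the per-operation savings compound across the whole accelerator array.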
