Abstract
Artificial Neural Networks (ANNs) are typically trained via the back-propagation (BP) algorithm. This approach has been extremely successful: current models like GPT-3 have O(10^11) parameters, are trained on O(10^11) words, and produce awe-inspiring results. However, there are good reasons to look for alternative training methods: with current algorithms and hardware, sometimes only half of the available computing power is actually utilized. This is due to a complicated interplay between the size of the ANN, the available memory, the throughput limitations of interconnects, the architecture of the network of computers, and the training algorithm. Training a model like the aforementioned GPT-3 takes months and costs millions of dollars. A different training paradigm, one that could make clever use of specialized hardware, may train large ANNs more efficiently.
© 2023 IEEE
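For concreteness, the following is a minimal sketch (not taken from the paper) of the back-propagation baseline the abstract refers to: one gradient-descent step for a two-layer network in plain NumPy. The network size, tanh activation, mean-squared-error loss, and learning rate are all illustrative assumptions.

```python
import numpy as np

# Illustrative sketch: one back-propagation step for a two-layer network
# on a toy regression batch. All shapes and hyperparameters are assumed.
rng = np.random.default_rng(0)
x = rng.normal(size=(32, 4))        # batch of 32 inputs
y = rng.normal(size=(32, 1))        # matching targets
W1 = rng.normal(size=(4, 8))        # first-layer weights
W2 = rng.normal(size=(8, 1))        # second-layer weights
lr = 1e-2                           # learning rate

# Forward pass.
h = np.tanh(x @ W1)                 # hidden activations
y_hat = h @ W2                      # network output
loss = np.mean((y_hat - y) ** 2)    # mean-squared error

# Backward pass: propagate the error signal layer by layer.
d_y_hat = 2 * (y_hat - y) / y.size  # dL/dy_hat
d_W2 = h.T @ d_y_hat                # dL/dW2
d_h = d_y_hat @ W2.T                # error at the hidden layer
d_W1 = x.T @ (d_h * (1 - h ** 2))   # tanh'(z) = 1 - tanh(z)^2

# Parameter update (plain gradient descent).
W1 -= lr * d_W1
W2 -= lr * d_W2
```

Note how the backward pass reuses the forward activations (`h`, `y_hat`): this memory requirement, together with the strictly sequential layer-by-layer dependency, is one source of the hardware-utilization issues the abstract describes.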