Abstract
Modern advances in the field of Natural Language Processing, driven primarily by Large Language Models, open the door not only to a plethora of AI-driven applications, but also to the ability to produce such applications at or near the industry standard without industrial-grade computational resources. In particular, GPU memory requirements have dropped drastically as a result of recent advances. However, it is not always clear what these improvements imply for smaller organizations planning to work with Large Language Models. We review the extent to which consumer-grade hardware is capable of training language models of varying parameter counts and analyze the performance of the resulting models. Furthermore, we share our experience with training methodology and evaluation measures through an intuitive, practical use case.
Talk in the series “Train Your Engineering Network”.