Category: GPT language model

The construction of a large language model (LLM) depends on many things: banks of GPUs, vast reams of training data, massive amounts of power, and matrix manipulation libraries like NumPy. For models with lower requirements, though, it’s possible to do away with all of that, including the software dependencies. As someone who’d already built a […]
Last Thursday, two lowly master’s grad students, Aaron Gokaslan and Vanya Cohen, managed to replicate the secretive OpenAI model and cheekily named their version OpenGPT-2. The code can be downloaded from this Google Colab page, and apparently no prior experience in language modeling is required to use it. More useful might be the skills required […]