This model was created as a test of the Jean Zay infrastructure and an exploration of the potential difficulties and instabilities that could arise from scaling up a model.
Progress on this project was recorded here.
It is a 13B-parameter, English, decoder-only model.
The final checkpoints can be found here: [to be added]
The code is here: [to be added]