We evaluate davinci-003 across a range of classification, summarization, and generation tasks. Using Scale Spellbook, the platform for large language model apps, we show where davinci-003 significantly outperforms the prior version and where it still has room to improve.