Meta LLaMA, 4 foundation models ranging from 7B to 65B parameters. LLaMA-13B outperforms OPT and GPT-3 175B on most benchmarks. LLaMA-65B is competitive with Chinchilla 70B and PaLM 540B. The weights for all models are open and available.

Unlike Chinchilla, PaLM, or GPT-3, we only use datasets publicly available, making our work compatible with open-sourcing and reproducible, while most existing models rely on data that is either not publicly available or undocumented.

All our models were trained on at least 1T token, much more than what is typically used at this scale. Interestingly, even after 1T tokens, the 7B model still improved.

On Common Sense Reasoning, Closed-book Question Answering, and Reading Comprehension, LLaMA-65B outperforms Chinchilla 70B and PaLM 540B on almost all benchmarks.

LLaMA-65B outperforms Minerva-62B on GSM8k, even though it has not been fine-tuned on any mathematical dataset. On the MATH benchmark, it outperforms PaLM-62B (but is quite below Minerva-62B).

On code generation benchmarks, LLaMA-62B outperforms cont-PaLM (62B) and PaLM-540B.

We also briefly tried instruction finetuning using the approach of Chung et al. (2022). The resulting model, LLaMA-I, outperforms Flan-PaLM-cont (62B) on MMLU and showcases some interesting instruct capabilities.

Leave a Reply

Help us find great AI content


Never miss a thing! Sign up for our AI Hackr newsletter to stay updated.


AI curated tools and resources. Find the best AI tools, reports, research entries, writing assistants, chrome extensions and GPT tools.