TRL 0.2.0 – A library to train language models with reinforcement learning
With trl you can train transformer language models with Proximal Policy Optimization (PPO). The library is built on top of the
transformers library by 🤗 Hugging Face, so pre-trained language models can be loaded directly via
transformers. At this point, most decoder and encoder-decoder architectures are supported.
PPOTrainer: A PPO trainer for language models that only needs (query, response, reward) triplets to optimise the language model.
AutoModelForSeq2SeqLMWithValueHead: A transformer model with an additional scalar output for each token which can be used as a value function in reinforcement learning.