Improving few-shot generalisation by exploring and exploiting auxiliary data
Few-shot learning involves learning an effective model from only a few labeled datapoints. A small training set makes it difficult to avoid overfitting, but it also makes few-shot learning applicable to many important real-world settings. In this work, we focus on Few-shot Learning with Auxiliary Data (FLAD), a training paradigm that assumes access to auxiliary data during few-shot learning in the hope of improving generalization. Introducing auxiliary data raises essential design choices, and hand-designed heuristics for making them can lead to sub-optimal performance. We study automated sampling strategies for FLAD and relate them to the explore-exploit dilemma that is central to multi-armed bandit settings. Based on this connection, we propose two algorithms, EXP3-FLAD and UCB1-FLAD, and compare them with methods that either only explore or only exploit, finding that the combination of exploration and exploitation is crucial. Using our proposed algorithms to train T5 yields a 9% absolute improvement over the explicitly multi-task pre-trained T0 model across 11 datasets.
Large language models can perform various reasoning tasks by using chain-of-thought prompting, which guides them to find answers through step-by-step demonstrations. However, the quality of the prompts depends on the demonstrations given to the models, and creating many of them by hand is costly. We introduce Synthetic prompting, a method that leverages a few handcrafted […]
Recent works have successfully leveraged the ability of Large Language Models (LLMs) to capture abstract knowledge about the world’s physics to solve decision-making problems. Yet, the alignment between LLMs’ knowledge and the environment can be wrong and limit functional competence due to lack of grounding. In this paper, we study an approach to achieve this alignment through functional grounding: […]
It is well established that increasing scale in deep transformer networks leads to improved quality and performance. This increase in scale often comes with an increase in compute cost and inference latency. Consequently, research into methods which help realize the benefits of increased scale without leading to an increase in the compute cost becomes important. […]