06-06, 13:30–15:00 (Europe/London), Hardwick Hub
Large Language Models like GPT-4 are now a key part of the technology landscape, but how do they really work? And can you code one up at home? In this tutorial we'll create a simple GPT and train it on a simplified dataset of children's jokes. We'll work against a new set of transformer decoder flow diagrams that intuitively match the code, and look at visualisations of GPT's internal representations to understand transformers from the inside out!
In this tutorial we'll work step by step through creating a simple GPT model in PyTorch. We'll train it on simplified children's jokes and watch how its internal representations evolve as it tries to tell (hopefully) funnier and funnier jokes. Intermediate Python programming skills are assumed, as well as a basic understanding of matrix algebra. No familiarity with PyTorch, GPT or LLMs is assumed.
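To give a flavour of what we'll build, here is a minimal sketch of a single-block, decoder-style GPT in PyTorch. The class name TinyGPT and all hyperparameters here are illustrative placeholders (not nanoGPT's actual API), and the random tokens stand in for an encoded joke dataset:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyGPT(nn.Module):
    """Illustrative one-block GPT; names and sizes are placeholders."""
    def __init__(self, vocab_size, block_size=64, n_embd=32, n_head=4):
        super().__init__()
        self.block_size = block_size
        self.tok_emb = nn.Embedding(vocab_size, n_embd)   # token embeddings
        self.pos_emb = nn.Embedding(block_size, n_embd)   # learned positions
        # causal self-attention: each token may only attend to earlier tokens
        self.attn = nn.MultiheadAttention(n_embd, n_head, batch_first=True)
        self.ln1 = nn.LayerNorm(n_embd)
        self.mlp = nn.Sequential(
            nn.Linear(n_embd, 4 * n_embd), nn.GELU(),
            nn.Linear(4 * n_embd, n_embd),
        )
        self.ln2 = nn.LayerNorm(n_embd)
        self.head = nn.Linear(n_embd, vocab_size)         # next-token logits

    def forward(self, idx):
        B, T = idx.shape
        pos = torch.arange(T, device=idx.device)
        x = self.tok_emb(idx) + self.pos_emb(pos)
        # boolean mask: True above the diagonal blocks attention to the future
        mask = torch.triu(torch.ones(T, T, dtype=torch.bool, device=idx.device), 1)
        a, _ = self.attn(self.ln1(x), self.ln1(x), self.ln1(x), attn_mask=mask)
        x = x + a                      # residual connection around attention
        x = x + self.mlp(self.ln2(x))  # residual connection around the MLP
        return self.head(x)            # (B, T, vocab_size)

# one training step: predict each next character of a (stand-in) joke
model = TinyGPT(vocab_size=96)
tokens = torch.randint(0, 96, (8, 65))  # random tokens in place of real jokes
logits = model(tokens[:, :-1])
loss = F.cross_entropy(logits.reshape(-1, 96), tokens[:, 1:].reshape(-1))
loss.backward()
print(loss.item())
```

Running this prints the cross-entropy loss for one untrained batch; in the session we'll train on real joke text and visualise the attention and embedding layers as the loss falls.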
Before coming to the session, please clone https://github.com/karpathy/nanoGPT onto your laptop and follow the README.md instructions to install the dependencies:

pip install torch numpy transformers datasets tiktoken wandb tqdm
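If you want to confirm the install worked before the session, this quick check (just imports and version prints, nothing nanoGPT-specific) should run without errors:

```python
# Sanity check: the tutorial's core dependencies import cleanly.
import torch
import numpy
import tiktoken  # tokeniser used by nanoGPT

print("torch", torch.__version__)
print("numpy", numpy.__version__)
print("CUDA available:", torch.cuda.is_available())  # False on most laptops; that's expected
```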
Previous knowledge expected
I am currently the lead AI developer at Qualis Flow, a company using the latest AI technology to help decarbonise the construction industry. Previously I was CTO of NeuroGrid Ltd., a software consultancy providing data science and software engineering services. Before that I was a co-founder and CTO of AgileVentures, where we supported multiple international open source charity projects. Further back I was Head of Education and Engineering at the Makers Academy bootcamp, following many years as Associate Professor in Computer Science at Hawaii Pacific University, where I taught courses on AI, mobile, games and software engineering. It all started with a Ph.D. in Machine Learning from the University of Edinburgh.