-
RLHF from Shakespeare
I tried to fine-tune an LLM with RLHF to generate positive-tone messages from the Shakespeare corpus. Here is what I learnt.
-
ARENA learning experience
I summarized my learning experience with ARENA.
-
How to get a gold medal in a Kaggle competition, from a Competition Master's perspective
I summarized 7 key points on how to get a gold medal in a Kaggle competition.
-
Implementing PPO from scratch
I tried implementing PPO from scratch and applying it to the Procgen environment. Here is what I learnt.
-
Replicating Scaling Laws by using MNIST data
I tried replicating scaling-law results using MNIST data. Here is what I learnt.