Phuc Nguyen

High school dropout. Research Engineer at Hugging Face.

DMs open. The best way to reach me is via Twitter, Discord (neuralink), or email: phucnh791 [at] gmail [dot] com. If you're ever in Paris, feel free to drop me a message through any of these!

I'm currently a research engineer at Hugging Face in 🥐 Paris, where I'm part of the distributed training team nanotron and work on various research reproduction efforts on the Hugging Face science team (FP8 research, Infini-Attention, MoE's Expert Parallelism, DoMiNo, DoReMi). These Twitter threads document my research experiments; scroll through earlier and later tweets to see the full journey and discoveries.

I maintain a life-long learning progress thread where I've shared my study notes over the past two years (3D parallelism). Scroll down the Twitter thread to see the learning journey.

Before Hugging Face, I designed a study plan that spanned many subjects, then consistently studied from 3:30 AM to 3:30 PM and slept from 5:20 PM to 3:00 AM, repeating this routine for 2 years.

Selected Work

A failed experiment: Infini-Attention, and why we should keep trying?

Research reproduction of the Infini-Attention paper. TL;DR: Infini-Attention's performance gets worse as we increase the number of times we compress the memory. To the best of our knowledge, Ring Attention, YaRN, and RoPE scaling are still the best ways to extend a pretrained model to a longer context length.

Selected Study Notes

Talks

Get in Touch

Twitter, GitHub, Discord, Twitch, Email, LinkedIn, CV