Dong Won Lee

Hi, my name is Dong Won. My advisors and friends call me “Dong” or “Don”. Whatever is convenient for you!

I’m a 5th-year Master's student (4 years B.S. + 1 year M.S.) in the Machine Learning Department at Carnegie Mellon University (expected graduation: May 2022). I recently completed my B.S. in Statistics and Machine Learning at CMU as well.

I am fascinated by designing ML models to understand the relationship between what we see (vision) and what we hear (language), and by applying these findings to interesting applications in education, human-robot interaction, and robotics control.

Email  /  CV  /  Google Scholar  /  Linkedin  /  Github  /  Medium

profile photo

07/2021: Our paper on Crossmodal clustered contrastive learning: Grounding of spoken language to gestures is accepted to GENEA Workshop @ ICMI 2021.
06/2021: Graduated from my undergraduate degree at CMU!
05/2021: We are organizing the First Workshop on Crossmodal Social Animation at ICCV 2021.
09/2020: Our paper on No Gestures Left Behind: Learning Relationships between Spoken Language and Freeform Gestures is accepted at Findings at EMNLP 2020.
07/2020: Our paper on Style Transfer for Co-Speech Gesture Animation is accepted at ECCV 2020.

During my Bachelor’s and Master’s, I’ve primarily worked in the MultiComp Lab, advised by Professor Louis-Philippe Morency. I'm currently dedicating my efforts to understanding the fine-grained grounding between visual elements and language. Previously, I focused on developing models that generate nonverbal behavioral cues (gestures) conditioned on language to improve naturalness in human-robot interaction. I also work with Dr. Hae Won Park at the MIT Media Lab to predict people’s engagement with an agent using graphical models, and I am collaborating with Professor Ruslan Salakhutdinov to teach robots to learn optimal policies using only language.

Role-Aware Graph-Based Next Speaker Prediction in Multi-party Human-Robot Interaction
Dong Won Lee, Pedro Colon-Hernandez, Ishaan Grover, Cynthia Breazeal, Hae Won Park
CSCW, 2022, In Submission
code / demo video
Low-resource Adaptation for Personalized Co-Speech Gesture Generation
Chaitanya Ahuja, Dong Won Lee, Louis-Philippe Morency
CVPR, 2022, In Submission
Crossmodal clustered contrastive learning: Grounding of spoken language to gestures
Dong Won Lee, Chaitanya Ahuja, Louis-Philippe Morency
ICMI GENEA Workshop, 2021
paper / presentation video / code

We propose a new crossmodal contrastive learning loss to encourage a stronger grounding between gestures and spoken language.

No Gestures Left Behind: Learning Relationships between Spoken Language and Freeform Gestures
Chaitanya Ahuja, Dong Won Lee, Ryo Ishii, Louis-Philippe Morency
Findings at EMNLP, 2020
paper / code

We study relationships between spoken language and co-speech gestures to account for the long tail of text-gesture distribution.

Style Transfer for Co-Speech Gesture Animation: A Multi-Speaker Conditional Mixture Approach
Chaitanya Ahuja, Dong Won Lee, Yukiko I. Nakano, Louis-Philippe Morency
ECCV, 2020
project page / paper / code

We propose a new style transfer model to learn the individual gesture styles of different speakers.

PATS Dataset: Pose, Audio, Transcripts and Style
Chaitanya Ahuja, Dong Won Lee, Yukiko I. Nakano, Louis-Philippe Morency
dataset page / download link / code

PATS was collected to study the correlation of co-speech gestures with audio and text signals. The dataset consists of a large and diverse set of aligned pose, audio, and transcript data.

CMU LTI 11-777: Multimodal Machine Learning
Graduate TA, Spring 2022
CMU MLD 10-725: Convex Optimization
Graduate TA, Spring 2021
CMU Stat & DS 36-202: Statistics & Data Science Methods
Undergraduate TA, Fall 2019, Spring 2020, Fall 2020 (3 Semesters)
CMU Stat & DS 36-200: Reasoning with Data
Undergraduate TA, Fall 2020, Spring 2021 (2 Semesters)
First Workshop on Crossmodal Social Animation @ ICCV 2021
Publication Chair
International Conference on Multimodal Interaction (ICMI 2021)
Empirical Methods in Natural Language Processing (EMNLP 2021)

I spend some of my time writing about interesting things I come across while I study ML :)

Key Intuition Behind Positional Encodings
Explaining the intuition behind the formulation of the Transformer’s positional embeddings.
C-Learning: No reward function needed for Goal-Conditioned RL
A more detailed summary of C-Learning: Learning to Achieve Goals via Recursive Classification, which studies the problem of predicting and controlling the future state distribution of an autonomous agent.
ROK Special Operations Unit Deployed in UAE
Previously, I had the incredible opportunity to be a member of a South Korean special operations unit deployed to Abu Dhabi (AKH14).
Please excuse the hideous sunglasses in this photo.
Planned Submissions

1/15/2022: CSCW 2022
1/28/2022: ICML 2022
3/07/2022: ECCV 2022


I have been blessed to meet amazing mentors who have guided me to become a better researcher (and more importantly, a good person). I believe that I can only repay what they've done for me by assisting others in their journey in any way I can. Please don't hesitate to reach out!

Mentors and Advisors (in alphabetical order):
  • Ben Eysenbach - CMU MLD
  • Chaitanya Ahuja - CMU LTI
  • David Kosbie - CMU CSD
  • Louis-Philippe Morency - CMU LTI
  • Mark Stehlik - CMU CSD

Source: source code