Dong Won Lee

Hi, my name is Dong Won. My advisors and friends call me “Dong” or “Don”. Whatever is convenient for you!

I’m a 5th-year Master's student (4 years B.S. + 1 year M.S.) in the Machine Learning Department at Carnegie Mellon University (expected graduation: May 2022). I recently completed my B.S. in Statistics and Machine Learning at CMU as well.

I am fascinated by designing ML models to understand the relationship between what we see (vision) and what we hear (language), and by applying these findings to interesting applications in education, human-robot interaction, and robotics control.

Email  /  Resume  /  Google Scholar  /  LinkedIn  /  GitHub  /  Medium

News

07/2021: Our paper Crossmodal Clustered Contrastive Learning: Grounding of Spoken Language to Gestures is accepted to the GENEA Workshop @ ICMI 2021.
06/2021: Completed my undergraduate degree at CMU!
05/2021: We are organizing the First Workshop on Crossmodal Social Animation at ICCV 2021.
09/2020: Our paper No Gestures Left Behind: Learning Relationships between Spoken Language and Freeform Gestures is accepted to Findings of EMNLP 2020.
07/2020: Our paper Style Transfer for Co-Speech Gesture Animation is accepted to ECCV 2020.
Research

During my Bachelor’s and Master’s, I’ve primarily worked in the MultiComp Lab, advised by Professor Louis-Philippe Morency. I'm currently dedicating my efforts to understanding the fine-grained grounding between visual elements and language. Previously, I focused on developing models that generate nonverbal behavioral cues (gestures) conditioned on language, to improve naturalness in human-robot interaction. I also work with Dr. Hae Won Park at the MIT Media Lab to predict people’s engagement with an agent using graphical models, and I am collaborating with Professor Ruslan Salakhutdinov to teach robots to learn optimal policies from language alone.

Role-Aware Graph-Based Next Speaker Prediction in Multi-party Human-Robot Interaction
Dong Won Lee, Ishaan Grover, Pedro Colon-Hernandez, Cynthia Breazeal, Hae Won Park
HRI, 2022 (in submission)
Low-resource Adaptation for Personalized Co-Speech Gesture Generation
Chaitanya Ahuja, Dong Won Lee, Louis-Philippe Morency
CVPR, 2022 (in submission)
Crossmodal Clustered Contrastive Learning: Grounding of Spoken Language to Gestures
Dong Won Lee, Chaitanya Ahuja, Louis-Philippe Morency
ICMI GENEA Workshop, 2021
paper / video / code

We propose a new crossmodal contrastive learning loss to encourage a stronger grounding between gestures and spoken language.

No Gestures Left Behind: Learning Relationships between Spoken Language and Freeform Gestures
Chaitanya Ahuja, Dong Won Lee, Ryo Ishii, Louis-Philippe Morency
Findings of EMNLP, 2020
paper / code

We study relationships between spoken language and co-speech gestures to account for the long tail of the text-gesture distribution.

Style Transfer for Co-Speech Gesture Animation: A Multi-Speaker Conditional Mixture Approach
Chaitanya Ahuja, Dong Won Lee, Yukiko I. Nakano, Louis-Philippe Morency
ECCV, 2020
project page / paper / code

We propose a new style transfer model to learn individual speakers' gesture styles.

Resources
PATS Dataset: Pose, Audio, Transcripts and Style
Chaitanya Ahuja, Dong Won Lee, Yukiko I. Nakano, Louis-Philippe Morency
dataset page / download link / code

PATS was collected to study the correlation of co-speech gestures with audio and text signals. The dataset consists of a large and diverse set of aligned pose, audio, and transcript data.

Teaching
CMU MLD 10-725: Convex Optimization (PhD)
Graduate TA, Spring 2021
CMU Stat & DS 36-202: Statistics & Data Science Methods (Undergrad)
Undergraduate TA, Fall 2019, Spring 2020, Fall 2020 (3 Semesters)
CMU Stat & DS 36-200: Reasoning with Data (Undergrad)
Undergraduate TA, Fall 2020, Spring 2021 (2 Semesters)
Services
First Workshop on Crossmodal Social Animation @ ICCV 2021
Publication Chair
International Conference on Multimodal Interaction (ICMI 2021)
Reviewer
Empirical Methods in Natural Language Processing (EMNLP 2021)
Reviewer
Blog

I spend some of my time writing about interesting things I come across while I study ML :)

Key Intuition Behind Positional Encodings
Explaining the intuition behind the formulation of the Transformer’s positional embeddings (see the short sketch after this list).
C-Learning: No reward function needed for Goal-Conditioned RL
A more detailed summary of C-Learning: Learning to Achieve Goals via Recursive Classification, which studies the problem of predicting and controlling the future state distribution of an autonomous agent.
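
As a quick reference for the positional-encodings post above, here is a minimal NumPy sketch of the standard sinusoidal formulation from “Attention Is All You Need” (my own illustration, not code from the post): PE(pos, 2i) = sin(pos / 10000^(2i/d_model)) and PE(pos, 2i+1) = cos(pos / 10000^(2i/d_model)).

import numpy as np

def sinusoidal_positional_encoding(seq_len: int, d_model: int) -> np.ndarray:
    """Return a (seq_len, d_model) matrix of sinusoidal positional encodings."""
    assert d_model % 2 == 0, "this sketch assumes an even model dimension"
    positions = np.arange(seq_len)[:, None]        # shape: (seq_len, 1)
    dims = np.arange(0, d_model, 2)[None, :]       # shape: (1, d_model / 2)
    # Each pair of dimensions shares one frequency: 1 / 10000^(2i / d_model).
    angles = positions / np.power(10000.0, dims / d_model)
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angles)                   # even indices: sine
    pe[:, 1::2] = np.cos(angles)                   # odd indices: cosine
    return pe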
Misc.
ROK Special Operations Unit Deployed in UAE
Previously, I had the incredible opportunity to be a member of a South Korean special operations unit deployed to Abu Dhabi (AKH14).
Please excuse the hideous sunglasses in this photo.
Planned Submissions

11/16/2021 or 11/17/2021: ACL 2022 or CVPR 2022
Mentors

I have been blessed to meet amazing mentors (listed below in alphabetical order) who have guided me to become a better researcher (and more importantly, a good person). I believe that I can only repay what they've done for me by assisting others in their journey in any way I can. Please don't hesitate to reach out!



Mentors and Advisors (in alphabetical order):
  • Ben Eysenbach - CMU MLD
  • Chaitanya Ahuja - CMU LTI
  • David Kosbie - CMU CSD
  • Louis-Philippe Morency - CMU LTI
  • Mark Stehlik - CMU CSD

