Dong Won Lee

Hi, my name is Dong Won. My advisors and friends usually call me “Don” :)

I’m a PhD student at MIT working on the foundations of multimodal social agents, where I focus on developing machine learning methods and evaluations for social intelligence. I am extremely grateful to be co-advised by Professor Cynthia Breazeal in the Personal Robots Group at the Media Lab and Professor Louis-Philippe Morency at the Language Technologies Institute at CMU. Prior to MIT, I graduated from Carnegie Mellon University with a Master's in Machine Learning and a Bachelor's in Statistics and Machine Learning.

Email  /  CV  /  Google Scholar  /  Linkedin  /  Twitter  /  Github  /  Medium

profile photo
News
09/2024: Our paper on Global Reward to Local Rewards: Multimodal-Guided Decomposition for Improving Dialogue Agents is accepted at EMNLP 2024. This work extends alignment techniques such as RLHF to multimodal signals, using a single final score instead of intermediate annotations.
06/2024: I've started a research internship at Microsoft Research in NYC working on Human-Oriented AI!
08/2023: I'll be TA'ing and mentoring students for MIT MAS.630 Affective Computing and Ethics taught by Rosalind Picard at the MIT Media Lab, where I'll be giving lectures on advancements in Machine Learning for Affective Computing and Social Intelligence in LLMs. We encourage interested MIT/Harvard students to sign up for our class to learn about the future of Socio-Emotional AI!
07/2023: Our paper on Lecture Presentations Multimodal Dataset: Towards Understanding Multimodality in Educational Videos is accepted at ICCV 2023!
07/2023: Our paper on HIINT: Historical, Intra- and Inter-personal Dynamics Modeling with Cross-person Memory Transformer is accepted at ICMI 2023!
04/2023: Our paper on Multipar-T: Multiparty-Transformer for Capturing Contingent Behaviors in Group Conversations is accepted at IJCAI 2023!
03/2023: Our proposal for the 1st Workshop on Social and Affective Intelligence (SAI) has been accepted at ACII 2023! Please consider submitting your work!
03/2022: Our paper on Low-resource Adaptation for Personalized Co-Speech Gesture Generation is accepted at CVPR 2022.
07/2021: Our paper on Crossmodal clustered contrastive learning: Grounding of spoken language to gestures is accepted to GENEA Workshop @ ICMI 2021.
05/2021: We are organizing the First Workshop on Crossmodal Social Animation at ICCV 2021.
09/2020: Our paper on No Gestures Left Behind: Learning Relationships between Spoken Language and Freeform Gestures is accepted to Findings of EMNLP 2020.
07/2020: Our paper on Style Transfer for Co-Speech Gesture Animation is accepted at ECCV 2020.
Research
Global Reward to Local Rewards: Multimodal-Guided Decomposition for Improving Dialogue Agents
Dong Won Lee, Hae Won Park, Yoon Kim, Cynthia Breazeal, Louis-Philippe Morency
To appear at EMNLP (Oral), 2024
paper / code / huggingface

We introduce GELI, an approach that automatically decomposes a single Global Explicit post-interaction score into local rewards, incorporating Local Implicit feedback from multimodal signals, to adapt a language model to be more conversational.

Lecture Presentations Multimodal Dataset: Towards Understanding Multimodality in Educational Videos
Dong Won Lee, Chaitanya Ahuja, Paul Pu Liang, Sanika Natu, Louis-Philippe Morency
ICCV, 2023
paper / code

We introduce the Multimodal Lecture Presentations dataset and PolyViLT, a multimodal transformer trained with a multi-instance learning loss. We propose a large-scale benchmark testing the capabilities of machine learning models in multimodal understanding of educational content.

HIINT: Historical, Intra- and Inter-personal Dynamics Modeling with Cross-person Memory Transformer
Yubin Kim, Dong Won Lee, Paul Pu Liang, Sharifa Algohwinem, Cynthia Breazeal, Hae Won Park
ICMI, 2023
paper

We model the historical, intra- and inter-personal (HIINT) dynamics in conversation by incorporating memory modules into the Cross-person Memory Transformer to address temporal coherence and better represent the context of conversational behaviors.

Multipar-T: Multiparty-Transformer for Capturing Contingent Behaviors in Group Conversations
Dong Won Lee, Yubin Kim, Rosalind Picard, Cynthia Breazeal, Hae Won Park
IJCAI, 2023 (Oral)
paper

We introduce a new transformer architecture to model contingent behaviors in multiparty group conversations.

Low-resource Adaptation for Personalized Co-Speech Gesture Generation
Chaitanya Ahuja, Dong Won Lee, Louis-Philippe Morency
CVPR, 2022
paper

We propose a new approach to crossmodal generative modeling in low-resource settings, with the goal of creating a personalized gesture generation model (e.g., as part of a personalized avatar) with limited data from a new speaker.

Crossmodal clustered contrastive learning: Grounding of spoken language to gestures
Dong Won Lee, Chaitanya Ahuja, Louis-Philippe Morency
ICMI, GENEA Workshop, 2021
paper / presentation video / code

We propose a new crossmodal contrastive learning loss to encourage a stronger grounding between gestures and spoken language.

No Gestures Left Behind: Learning Relationships between Spoken Language and Freeform Gestures
Chaitanya Ahuja, Dong Won Lee, Ryo Ishii, Louis-Philippe Morency
EMNLP, Findings, 2020
paper / code

We study relationships between spoken language and co-speech gestures to account for the long tail of the text-gesture distribution.

Style Transfer for Co-Speech Gesture Animation: A Multi-Speaker Conditional Mixture Approach
Chaitanya Ahuja, Dong Won Lee, Yukiko I. Nakano, Louis-Philippe Morency
ECCV, 2020
project page / paper / code

We propose a new style transfer model that learns the individual gesture styles of speakers.

Resources
PATS Dataset: Pose, Audio, Transcripts and Style
Chaitanya Ahuja, Dong Won Lee, Yukiko I. Nakano, Louis-Philippe Morency
dataset page / download link / code

PATS was collected to study the correlation of co-speech gestures with audio and text signals. The dataset consists of a large and diverse set of aligned pose, audio, and transcripts.

Teaching
MIT MAS.630: Advanced Seminar: Affective Computing and Ethics
Graduate TA, Fall 2023
CMU LTI 11-777: Multimodal Machine Learning
Graduate TA, Spring 2022
CMU MLD 10-725: Convex Optimization
Graduate TA, Spring 2021
CMU Stat & DS 36-202: Statistics & Data Science Methods
Undergraduate TA, Fall 2019, Spring 2020, Fall 2020 (3 Semesters)
CMU Stat & DS 36-200: Reasoning with Data
Undergraduate TA, Fall 2020, Spring 2021 (2 Semesters)
Services
Empirical Methods in Natural Language Processing (EMNLP 2021, 2024)
Reviewer
Social and Affective Intelligence (SAI) @ ACII 2023
Co-Organizing Chair
workshop page
First Workshop on Crossmodal Social Animation @ ICCV 2021
Publication Chair
workshop page / video
International Conference on Multimodal Interaction (ICMI 2021)
Reviewer
Misc.
ROK Special Operations Unit Deployed in UAE
Previously, I had the incredible opportunity to be a member of a South Korean special operations unit deployed to Abu Dhabi (AKH14).
Find me here: photo
Planned Submissions

06/2024: EMNLP 2024

Mentors

I have been blessed to meet amazing mentors who have guided me to become a better researcher (and more importantly, a good person). I believe that I can only repay what they've done for me by assisting others in their journey in any way I can. Please don't hesitate to reach out!



Mentors and Advisors: (in Alphabetical Order)
  • Ben Eysenbach - CMU
  • Chaitanya Ahuja - CMU
  • Cynthia Breazeal - MIT
  • David Kosbie - CMU
  • Hae Won Park - MIT
  • Louis-Philippe Morency - CMU
  • Mark Stehlik - CMU
  • Paul Pu Liang - CMU
  • Roz Picard - MIT
  • Sid Sen - Microsoft Research
  • Yoon Kim - MIT


Website credits: source code