In-person

Research in Motion - Bowen Baker, OpenAI

This event has passed.

Kline Tower, 13th Floor, Rm. 1327
219 Prospect Street, New Haven, CT 06511

Webcast Option: https://yale.hosted.panopto.com/Panopto/Pages/Viewer.aspx?id=a64e8618-52ef-4715-b7ee-b40100e20a15

Chain-of-Thought Monitorability: A fragile opportunity for AI Safety

Speaker Bio

Bowen Baker is a research scientist at OpenAI, where he leads the Chain-of-Thought Interpretability Team. Bowen has always been interested in self-improving systems, and his first work in machine learning was on deep learning architecture search during his master's at MIT. He joined OpenAI in 2017, where he has worked on sim-to-real transfer and dexterous manipulation with humanoid robotic hands, multi-agent autocurricula and cooperation, constructing behavioral priors from unsupervised video, LLM reasoning, and, most recently, AI safety and alignment.

Abstract

Observability into the decision-making of modern AI systems may be required to safely deploy increasingly capable agents. Monitoring the chain-of-thought (CoT) of today’s reasoning models has proven effective for detecting misbehavior. However, this “monitorability” may be fragile under different training procedures, data sources, or even continued system scaling. This talk will cover OpenAI’s recent work on chain-of-thought monitoring, discuss the importance and fragility of chain-of-thought monitorability, and introduce three evaluation archetypes that we are using to measure and hopefully maintain monitorability.