I am a second-year PhD student in the Computer Science Department at Stanford University, advised by Marco Pavone and Clark Barrett. I received my Bachelor of Science (2020-2023) and Master of Science (2023-2024) in Computer Science from UC San Diego, where I was advised by Sicun Gao and Sylvia Herbert. My research interests are multimodal reasoning and out-of-distribution (OOD) generalization for robotics.
Embodied Chain-of-Thought (CoT) reasoning has enhanced Vision-Language-Action (VLA) models, but rigid templates over reasoning primitives (objects, plans, affordances) force policies to process irrelevant information that distracts from action-prediction signals. We introduce R&B-EnCoRe, which treats reasoning as a latent variable within importance-weighted variational inference. This lets models bootstrap embodiment-specific reasoning from internet-scale knowledge through self-supervised refinement, without external rewards, verifiers, or human annotation. We validate R&B-EnCoRe across manipulation, legged navigation, and autonomous driving, using VLA architectures with 1B, 4B, 7B, and 30B parameters.
As autonomous systems are deployed in unstructured, open-world environments, they face potentially hazardous out-of-distribution (OOD) failure scenarios that differ from their training data. Current methods rely on handcrafted intervention policies, which limits their ability to plan generalizable, safe motions. FORTRESS is a framework that generates and reasons about semantically safe fallback strategies in real time to prevent OOD failures, bridging open-world, multimodal reasoning with dynamics-aware planning.
A journal survey of the recent literature on scalable Hamilton-Jacobi reachability estimation in reinforcement learning, intended as a foundation for research into reliability in high-dimensional systems. We review how this technique has been used to solve challenging tasks, such as those involving dynamic obstacles and lidar-based or RGB image-based observations.
Hamilton-Jacobi reachability estimation for model-free safe RL in deterministic and stochastic environments, with safety guarantees and convergence analysis. Tasks feature lidar-based observations, dynamic obstacles, and multiple hard and soft constraints.
Learning Lyapunov-like models offline from observation-only expert data to solve stabilization control tasks online. Deployed on hardware for robustness testing.