I am a dedicated Ph.D. computer vision researcher with a strong passion for large-scale vision systems and generative models. My background spans 2D/3D perception and diffusion-based generative modeling. I enjoy working on problems at the intersection of research and engineering: designing new methods, carefully implementing them, and stress-testing them on real data.
Full-Time
Computer Vision Engineer at Path Robotics, Nov. 2024 - present.
Designing and productionizing 2D and 3D foundation models for critical vision tasks—semantic segmentation, point-cloud completion, and point-cloud registration—powering reliable perception for autonomous robotic welding.
Internships
Machine Learning Intern at ASML, Jun. 2021 - Aug. 2021.
Developed and evaluated a new deep learning model for an image-to-image translation problem: mapping circuit layout patterns to unconventional configurations that significantly improve manufacturability.
Research Work
Computer Vision PhD Research Assistant, CoVIS Lab, May 2019 - 2024.
Computer Vision PhD Research Assistant, BioMed-AI Lab, May 2020 - Aug. 2020.
Direct Preference Optimization for Text-to-Video/Text-to-Image Generation
SPADe: Spatial Plaid Attention Decoder for Semantic Segmentation
Created an efficient and powerful semantic segmentation decoder, “SPADe”, compatible with both CNN and Transformer backbones.
Achieved state-of-the-art performance on Cityscapes and ADE20K while cutting computational cost by 34.95% and 32.85%, respectively, compared with the second-best decoder.
Reduced decoder size by 89.74% relative to UPerNet.
Irrigation Practice Mapping
Developed AI models that accurately classify irrigation practices across U.S. lands.
Managed data acquisition, annotation, and preprocessing, as well as model design, training, and deployment across a large image dataset.
The proposed model segmented irrigation practices with an mIoU above 93.0%.
Delivered a nationwide irrigation map to USGS and NASA, offering insight into water use across the U.S.
A Task-Aware Network for Multi-task Learning
Designed a task‑aware multi‑task network for landmark localization, pose estimation, and landmark visibility estimation.
Reduced error by 25% for face pose estimation, 15% for landmark localization, and 10% for landmark visibility estimation.
Secured the best cumulative performance with a 5.7% error reduction, despite landmark visibility being the most challenging task.
Delivered more consistent results across tasks, with smaller standard deviations in error rates.