I'm a Research Associate at Harvard Medical School's Department of Biomedical Informatics and a Research Scholar at the Harvard–MIT Center for Regulatory Science, with appointments at VA Boston Healthcare System and Brigham and Women's Hospital. I also serve as Deputy Director of the HMS/CELEHS Data Science in Action Summer Program.
My current work centers on clinical NLP and medical language models, deployed on HPC clusters across the Veterans Affairs and Mass General Brigham healthcare systems, that support phenotyping research, multi-institutional EHR harmonization, medical code crosswalks, clinical representation learning, and biomedical knowledge graphs.
My work sits at the boundary of applied ML, clinical informatics, and software engineering: turning unstructured EHR data into research-ready datasets and building the pipelines that produce them. I'm interested in how representation learning, language models, and harmonization infrastructure can scale across institutions while handling the messiness of real-world clinical data.
I develop clinical NLP systems for codified and narrative EHR data, ranging from rule-based pipelines to fine-tuned language models for entity extraction, phenotyping, and medical code retrieval.
I maintain multi-institutional pipelines that harmonize EHR data into analysis-ready datasets, and I build medical code crosswalks, cross-institutional knowledge graphs, and HPC-scale NLP deployments.
Co-led a computer lab session on working with EHR data for master's and PhD students at HMS.
Led the computer lab workshop on working with EHR data for AI in Medicine PhD students at HMS.
I lead curriculum design and operations for Harvard's data science summer program for high school students, covering machine learning, statistics, and Python.
Python/Jupyter tutorial hosted by the NIH AIM-AHEAD consortium as training material for AI/ML health-equity research.
Computational workflows for processing EHR data: extraction, preprocessing, and feature engineering for clinical NLP modeling.