Vidul Ayakulangara Panickan

I'm a Research Associate at Harvard Medical School's Department of Biomedical Informatics and a Research Scholar at the Harvard–MIT Center for Regulatory Science, with appointments at VA Boston Healthcare System and Brigham and Women's Hospital. I also serve as Deputy Director of the HMS/CELEHS Data Science in Action Summer Program.

My current work centers on clinical NLP and medical language models deployed on HPC clusters across the Veterans Affairs and Mass General Brigham healthcare systems. Applications include phenotyping research, multi-institutional EHR harmonization, medical code crosswalks, clinical representation learning, and biomedical knowledge graphs.

Email / Google Scholar / PubMed / ORCID / GitHub / LinkedIn / HMS

Harvard Medical School, USA

vidul [at] hms [dot] harvard [dot] edu

News

Apr 2026

Serving as Judge for the Health Systems Innovation Lab Hackathon 2026, Boston Hub, Harvard School of Public Health.

Sep 2025

Guest instructor for EHR computer lab, BMIF 204, Harvard Medical School.

Sep 2025

PEHRT preprint released — a pipeline for harmonizing EHR data for translational research.

Nov 2024

Guest instructor for EHR computer lab, BMIF 300qc, Harvard Medical School.

Aug 2024

CIPHER phenomics platform featured by VA Boston Healthcare: "Research Platform Optimizes Use of Electronic Health Data."

Jul 2024

VA Million Veteran Program genetic architecture study published in Science.

Jul 2024

CELEHS/HMS Data Science Summer Program wrap-up covered by the Harvard T.H. Chan School of Public Health.

Oct 2023

EVAR postmarket surveillance study published in JAMA Internal Medicine.

Dec 2022

Secure Science with CITADEL: scaled NLP and co-occurrence computation on the Summit supercomputer to analyze veterans' health records. Covered in HPCwire.

Research

My work sits at the boundary of applied ML, clinical informatics, and software engineering — turning unstructured EHR data into research-ready datasets and building the pipelines that produce them. I'm interested in how representation learning, language models, and harmonization infrastructure can scale across institutions while handling the messiness of real-world clinical data.

Natural Language Processing

I develop clinical NLP systems for codified and narrative EHR data, ranging from rule-based pipelines to fine-tuned language models for entity extraction, phenotyping, and medical code retrieval.

EHR Infrastructure & Knowledge Graphs

I maintain multi-institutional pipelines that harmonize EHR data into analysis-ready datasets, and I build medical code crosswalks, cross-institutional knowledge graphs, and HPC-scale NLP deployments.

Selected Publications

arXiv Sep 2025

PEHRT: A Common Pipeline for Harmonizing EHR Data for Translational Research

Gronsbell J, Panickan VA, Zhou D, Lin C, Charlon T, Hong C, Xiong X, Wang L, Gao J, Zhou S, Tian Y, Shi Y, Gan Z, Cai T.

JAMA Internal Med Oct 2023

Endovascular Aneurysm Repair Devices as a Use Case for Postmarketing Surveillance of Medical Devices

Wang X*, Panickan VA*, Cai T*, Xiong X, Cho K, Cai T, Bourgeois FT.

Science Jul 2024

Diversity and Scale: Genetic Architecture of 2,068 Traits in the VA Million Veteran Program

Verma A, Huffman JE, Rodriguez A, Conery M, Liu M, Ho Y-L, Kim Y, Heise DA, Guare L, Panickan VA, et al.

Nature Medicine 2023

Potential Pitfalls in the Use of Real-World Data for Studying Long COVID

Zhang HG, Honerlaw JP, Maripuri M, Samayamuthu MJ, Beaulieu-Jones BR, Baig HS, L'Yi S, Ho Y-L, Morris M, Panickan VA, Wang X, Weber GM, Liao KP, et al. (4CE Consortium).

J Biomed Inform 2025

DOME: Directional Medical Embedding Vectors from Electronic Health Records

Wen J, Xue H, Rush E, Panickan VA, Cai T, Zhou D, Ho Y-L, Costa L, Begoli E, Hong C, Gaziano JM, Cho K, Liao KP, Lu J, Cai T.

J Biomed Inform 2025

ARCH: Large-Scale Knowledge Graph via Aggregated Narrative Codified Health Records Analysis

Gan Z, Zhou D, Rush E, Panickan VA, Ho Y-L, Ostrouchov G, Xu Z, Shen S, Xiong X, et al.

NPJ Digital Med 2025

Label-Efficient Phenotyping for Long COVID Using Electronic Health Records

Hong C, Wen J, Zhang HG, Panickan VA, Yang DY, Chen AW, Xiong X, Wang X, Morris M, et al.

JAMIA 2024

CIPHER: Centralized Interactive Phenomics Resource — An Integrated Online Phenomics Knowledgebase for Health Data Users

Honerlaw J, Ho Y-L, Fontin F, Murray M, Galloway A, Heise D, Connatser K, Davies L, Gosian J, Maripuri M, Russo J, Sangar R, Tanukonda V, Zielinski E, Dubreuil M, Zimolzak AJ, Panickan VA, et al.

Patterns 2024

LATTE: Label-Efficient Incident Phenotyping from Longitudinal Electronic Health Records

Wen J, Hou J, Bonzel CL, Zhao Y, Castro VM, Gainer VS, Weisenfeld D, Cai T, Ho Y-L, Panickan VA, Costa L, Hong C, Gaziano JM, Liao KP, Lu J, Cho K, Cai T.

NPJ Digital Med 2024

Multisource Representation Learning for Pediatric Knowledge Extraction from Electronic Health Records

Li M, Li X, Pan K, Geva A, Yang D, Sweet SM, Bonzel CL, Panickan VA, Xiong X, Mandl K, Cai T.

NPJ Digital Med 2021

KESER: Clinical Knowledge Extraction via Sparse Embedding Regression with Multi-Center Large-Scale EHR Data

Hong C, Rush E, Liu M, Zhou D, Sun J, Sonabend A, Castro VM, Schubert P, Panickan VA, et al.

→ Full publication list on PubMed

Teaching

BMIF 204 · 2025

Foundations of Clinical Data - Computer Lab

Co-led a computer lab session on working with EHR data for master's and PhD students at HMS.

Lab Instructor · Harvard Medical School

BMIF 300qc · 2024

Working with MIMIC-IV data - Computer Lab

Led the computer lab workshop on working with EHR data for AI in Medicine PhD students at HMS.

Lab Instructor · Harvard Medical School

2023–present

Data Science in Action Summer Program

I lead curriculum design and operations for Harvard's data science summer program for high school students, covering machine learning, statistics, and Python.

Deputy Director · CELEHS / HMS

NIH AIM-AHEAD

MIMIC-IV Data Preparation

Python/Jupyter tutorial hosted by the NIH AIM-AHEAD consortium as training material for AI/ML health-equity research.

Tutorial Developer

Open Methods

EHR Processing Tutorial

Computational workflows for processing EHR data — extraction, preprocessing, and feature engineering for clinical NLP modeling.

Tutorial Developer

Service

2026

Judge, HSIL Hackathon 2026, Boston Hub

Health Systems Innovation Lab, Harvard School of Public Health

2026

Reviewer, AMIA 2026 Amplify Informatics Conference

American Medical Informatics Association

2026

Reviewer, IEEE ICHI 2026

IEEE International Conference on Healthcare Informatics

Appointments

2019 – present

Research Associate, Biomedical Informatics

Harvard Medical School, Department of Biomedical Informatics

2022 – present

Research Scholar

Harvard–MIT Center for Regulatory Science

2020 – present

Data Scientist (Contractor)

VA Boston Healthcare System — Million Veteran Program

2020 – present

Education

2016 – 2019

M.S. in Computer Science

University of Massachusetts Amherst

2011 – 2015

B.Tech. in Computer Science and Engineering

Amrita University, Amritapuri, Kerala, India

Affiliations

Harvard Medical School, DBMI

Harvard–MIT Center for Regulatory Science

VA Boston Healthcare System

Brigham and Women's Hospital

NIH AIM-AHEAD Consortium

4CE Consortium

Contact

vidul@hms.harvard.edu

Department of Biomedical Informatics
Harvard Medical School
10 Shattuck St, Suite 514
Boston, MA 02115