$ cat address-parser.json
title: Enterprise Address NER Pipeline
category: ml
stack: Python, PyTorch, IndicBERTv2, CRF, FastAPI, Cloud Run, ONNX

Project Overview
During my ML internship at BSES Delhi, I built a Named Entity Recognition system that parses unstructured Indian addresses into 15 structured fields: house number, floor, block, gali, colony, area, sector, pincode, city, and more. The model uses IndicBERTv2 (pretrained on 20B+ tokens of Indian language text) with a CRF layer for sequence labeling, trained on hand-annotated Delhi addresses in both Hindi and English. It reaches 94.3% F1 on a balanced test set with sub-25 ms inference per address. I added a rule-based post-processor that uses a Delhi locality gazetteer for corrections, optimized inference with ONNX Runtime, and deployed the service via FastAPI on Google Cloud Run, with automatic deploys through Cloud Build.
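
To make the post-processing step concrete, here is a minimal sketch of how token-level tags might be grouped into fields and then corrected against a gazetteer. It is not the production code: the BIO tag names, the four gazetteer entries, the similarity cutoff, and the function names `bio_to_fields` and `postprocess` are all assumptions for illustration, and fuzzy matching with Python's standard `difflib` stands in for whatever matching rules the real post-processor uses.

```python
import difflib
import re

# Hypothetical gazetteer: a tiny stand-in for the Delhi locality list.
GAZETTEER = ["Lajpat Nagar", "Karol Bagh", "Rohini", "Dwarka"]


def bio_to_fields(tokens, tags):
    """Collect tokens by label from BIO tags (e.g. B-area, I-area, O)."""
    fields = {}
    for token, tag in zip(tokens, tags):
        if tag == "O":
            continue  # token is outside every entity
        label = tag.split("-", 1)[1]  # drop the B-/I- prefix
        fields.setdefault(label, []).append(token)
    return {label: " ".join(parts) for label, parts in fields.items()}


def postprocess(fields, gazetteer=GAZETTEER, cutoff=0.8):
    """Apply two illustrative rules: gazetteer snapping and pincode cleanup."""
    fixed = dict(fields)

    # Rule 1 (assumed): snap a misspelled area to its closest gazetteer entry.
    area = fixed.get("area")
    if area:
        match = difflib.get_close_matches(area.title(), gazetteer, n=1, cutoff=cutoff)
        if match:
            fixed["area"] = match[0]

    # Rule 2 (assumed): keep a pincode only if it reduces to exactly six digits.
    pin = fixed.get("pincode")
    if pin:
        digits = re.sub(r"\D", "", pin)
        fixed["pincode"] = digits if re.fullmatch(r"\d{6}", digits) else None

    return fixed


# Example input: in the real pipeline the tags would come from the model.
tokens = ["12", "lajpat", "nagr", "delhi", "110024"]
tags = ["B-house_number", "B-area", "I-area", "B-city", "B-pincode"]

raw = bio_to_fields(tokens, tags)
clean = postprocess(raw)
print(clean)
```

In this example the misspelled "lajpat nagr" is close enough to the gazetteer entry "Lajpat Nagar" to be replaced, while fields without a rule (such as the city) pass through unchanged. Keeping corrections in a separate rule layer like this means the gazetteer can be updated without retraining the model.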