project_details.sh

$ cat address-parser.json

title: Enterprise Address NER Pipeline

category: ml

stack: Python, PyTorch, IndicBERTv2, CRF, FastAPI, Cloud Run, ONNX

Image: Enterprise Address NER dashboard showing raw unstructured text parsed into structured geographic entities

Project Overview

During my ML internship at BSES Delhi, I built a Named Entity Recognition system that parses unstructured Indian addresses into 15 structured fields: house number, floor, block, gali, colony, area, sector, pincode, city, and more. The model pairs IndicBERTv2 (pretrained on 20B+ tokens of Indian language text) with a CRF layer for sequence labeling, and was trained on hand-annotated Delhi addresses in both Hindi and English. It reaches 94.3% F1 on a balanced test set with sub-25ms inference per address. I added a rule-based post-processor backed by a Delhi locality gazetteer to correct model output, optimized inference with ONNX Runtime, and deployed the service via FastAPI on Google Cloud Run, with automatic deployment through Cloud Build.
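
To make the post-processing step more concrete, here is a minimal sketch of how BIO tags from a sequence labeler could be grouped into fields and then corrected with simple rules. It is an illustration under stated assumptions, not the production code: the label names (`house_number`, `area`, and so on), the five-entry `GAZETTEER`, the 0.8 fuzzy-match cutoff, and the helper names `spans_from_bio` and `postprocess` are all hypothetical. The only domain fact it relies on is that Delhi pincodes start with 110.

```python
import re
from difflib import get_close_matches

# Hypothetical mini-gazetteer; a real one would hold many more Delhi localities.
GAZETTEER = ["Lajpat Nagar", "Karol Bagh", "Saket", "Dwarka", "Rohini"]

# Delhi pincodes are six digits beginning with 110.
PINCODE_RE = re.compile(r"\b110\d{3}\b")


def spans_from_bio(tokens, tags):
    """Group BIO-tagged tokens into a {field: text} dict (first span per field wins)."""
    fields = {}
    label, buffer = None, []

    def flush():
        # Store the span collected so far, if any.
        if label is not None and buffer:
            fields.setdefault(label, " ".join(buffer))

    for token, tag in zip(tokens, tags):
        if tag.startswith("B-") or (tag.startswith("I-") and tag[2:] != label):
            # A new span starts (an orphan I- tag is treated like B-).
            flush()
            label, buffer = tag[2:], [token]
        elif tag.startswith("I-"):
            buffer.append(token)
        else:  # "O": outside any entity
            flush()
            label, buffer = None, []
    flush()
    return fields


def postprocess(fields, raw_text):
    """Apply two illustrative correction rules to the model's fields."""
    fixed = dict(fields)

    # Rule 1: if the pincode is missing or malformed, recover it from the raw text.
    match = PINCODE_RE.search(raw_text)
    if match and not PINCODE_RE.fullmatch(fixed.get("pincode", "")):
        fixed["pincode"] = match.group()

    # Rule 2: snap a near-miss area name to its closest gazetteer entry.
    area = fixed.get("area")
    if area:
        by_lower = {name.lower(): name for name in GAZETTEER}
        close = get_close_matches(area.lower(), list(by_lower), n=1, cutoff=0.8)
        if close:
            fixed["area"] = by_lower[close[0]]
    return fixed


# Example: the tagger missed the pincode and the input misspells the area.
tokens = ["H-12", "2nd", "floor", "Lajpat", "Nagr", "New", "Delhi", "110024"]
tags = ["B-house_number", "B-floor", "I-floor", "B-area", "I-area",
        "B-city", "I-city", "O"]
result = postprocess(spans_from_bio(tokens, tags), " ".join(tokens))
print(result)
```

Keeping the rules separate from the model means a gazetteer update or a new regex can ship without retraining; the trade-off is that each rule needs its own tests so it does not overwrite correct predictions.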