Residual blocks
Skip connections mitigate vanishing gradients and help regularize, which is critical for sketch data, where everyone draws differently.
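The gradient half of that claim can be illustrated with a minimal NumPy sketch. It compares the input-to-output Jacobian of a plain stack of layers with that of a residual stack. It assumes purely linear layers with small random weights, so each Jacobian is just a matrix product. `jacobian_norms`, `width`, and `scale` are illustrative names and values, not the model's actual blocks.

```python
import numpy as np


def jacobian_norms(depth: int, width: int = 16, scale: float = 0.1, seed: int = 0):
    """Return spectral norms of d(output)/d(input) for a plain and a residual stack.

    Simplifying assumption: every layer is linear (no activations), so each
    Jacobian is a product of matrices:
      plain stack:    W_L @ ... @ W_1
      residual stack: (I + W_L) @ ... @ (I + W_1)
    """
    rng = np.random.default_rng(seed)
    eye = np.eye(width)
    plain = eye.copy()
    resid = eye.copy()
    for _ in range(depth):
        # Small random weights; `scale` is an illustrative assumption.
        w = rng.standard_normal((width, width)) * scale / np.sqrt(width)
        plain = w @ plain          # chain rule through y = W x
        resid = (eye + w) @ resid  # chain rule through y = x + W x (identity shortcut)
    # ord=2 on a matrix gives its largest singular value.
    return np.linalg.norm(plain, 2), np.linalg.norm(resid, 2)


plain_norm, resid_norm = jacobian_norms(depth=20)
print(f"plain stack: {plain_norm:.1e}   residual stack: {resid_norm:.2f}")
```

With 20 layers, the plain stack's Jacobian norm collapses toward zero. The residual stack's norm stays roughly of order one, because every factor I + W keeps an identity path for the gradient.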
DW-Sep convolutions
Blocks 3 & 4 use depthwise separable convolutions: ~9× fewer parameters than a standard 3×3 Conv2d, with similar accuracy.
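The ~9× figure follows from parameter arithmetic. A standard k×k convolution has k·k·C_in·C_out weights. A depthwise separable one has k·k·C_in + C_in·C_out weights. It consists of a k×k depthwise conv with one filter per input channel (in PyTorch, `nn.Conv2d(c_in, c_in, k, groups=c_in)`) followed by a 1×1 pointwise conv. The ratio is 1/C_out + 1/k², which approaches 1/9 for 3×3 kernels as C_out grows. This sketch ignores biases, and the channel widths are hypothetical.

```python
def conv2d_params(c_in: int, c_out: int, k: int) -> int:
    """Weights in a standard k x k convolution (bias ignored)."""
    return k * k * c_in * c_out


def dw_separable_params(c_in: int, c_out: int, k: int) -> int:
    """Weights in a depthwise separable convolution (bias ignored)."""
    depthwise = k * k * c_in  # one k x k filter per input channel
    pointwise = c_in * c_out  # 1 x 1 convolution that mixes channels
    return depthwise + pointwise


c_in, c_out, k = 128, 256, 3  # hypothetical widths for blocks 3 & 4
standard = conv2d_params(c_in, c_out, k)
separable = dw_separable_params(c_in, c_out, k)
ratio = standard / separable
print(standard, separable, f"{ratio:.2f}")  # 294912 33920 8.69
```

At these widths the saving is about 8.7×. It creeps toward 9× as C_out increases, because the 1×1 pointwise term then dominates.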
Global avg pool
GAP gives translation invariance: a cat drawn in any corner of the canvas produces approximately the same feature vector.
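The pooling step can be shown with a minimal NumPy sketch. It assumes the convolutional layers have already turned a shifted drawing into a correspondingly shifted feature map. In a real CNN this holds only approximately, because of padding and striding. The 4-channel 16×16 maps and the random "cat" activations are hypothetical.

```python
import numpy as np


def global_avg_pool(fmap: np.ndarray) -> np.ndarray:
    """Average a (C, H, W) feature map over its spatial axes, giving a (C,) vector."""
    return fmap.mean(axis=(1, 2))


rng = np.random.default_rng(0)
pattern = rng.random((4, 3, 3))  # hypothetical activations produced by a "cat" sketch

# The same pattern placed in two different corners of a 16x16 feature map.
top_left = np.zeros((4, 16, 16))
top_left[:, 1:4, 1:4] = pattern
bottom_right = np.zeros((4, 16, 16))
bottom_right[:, 12:15, 12:15] = pattern

feat_a = global_avg_pool(top_left)
feat_b = global_avg_pool(bottom_right)
print(np.allclose(feat_a, feat_b))                          # True: same pooled vector
print(np.allclose(top_left.ravel(), bottom_right.ravel()))  # False: flattened maps differ
```

Flattening the maps into a dense layer would instead give different vectors for the two positions, so the classifier would have to learn each location separately.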
Mixup training
Training blends pairs of drawings and their labels (mixup, α=0.4). This forces smooth decision boundaries between confusable pairs like clock/sun and cat/dog.
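Below is a minimal NumPy sketch of standard mixup. Each example is blended with a shuffled partner using a weight λ ~ Beta(α, α), and the one-hot labels are blended with the same λ. The following are all assumptions, not details from the original training code: drawing one λ per batch, the function name `mixup_batch`, and the 4×1×28×28 batch shape.

```python
import numpy as np


def mixup_batch(x, y_onehot, alpha=0.4, rng=None):
    """Blend each example (and its one-hot label) with a randomly chosen partner.

    One mixing weight lam ~ Beta(alpha, alpha) is drawn for the whole batch
    (a common choice, assumed here).
    """
    rng = np.random.default_rng() if rng is None else rng
    lam = rng.beta(alpha, alpha)
    perm = rng.permutation(len(x))  # partner index for every example
    x_mix = lam * x + (1.0 - lam) * x[perm]
    y_mix = lam * y_onehot + (1.0 - lam) * y_onehot[perm]
    return x_mix, y_mix, lam


rng = np.random.default_rng(0)
x = rng.random((4, 1, 28, 28))        # hypothetical batch of 4 grayscale drawings
labels = np.array([0, 1, 2, 1])       # hypothetical class ids
y = np.eye(3)[labels]                 # one-hot targets for 3 classes
x_mix, y_mix, lam = mixup_batch(x, y, alpha=0.4, rng=rng)
print(round(float(lam), 3), y_mix.sum(axis=1))
```

The mixed labels are soft targets, so the loss is cross-entropy against `y_mix`. Because cross-entropy is linear in the target, this equals lam·CE(y_a) + (1 − lam)·CE(y_b). With α=0.4 the Beta distribution is U-shaped, so most mixes stay close to one of the two drawings.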
SGD + OneCycleLR
SGD with Nesterov momentum + OneCycleLR warmup/cosine decay. Converges in ~25 epochs from random init.
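Here is a stdlib-only sketch of the learning-rate curve and the Nesterov update. It is modelled on PyTorch's `torch.optim.lr_scheduler.OneCycleLR` with its default cosine annealing and on `torch.optim.SGD(..., nesterov=True)`. `div_factor=25`, `final_div_factor=1e4`, and `pct_start=0.3` are PyTorch's defaults. `max_lr=0.1` and `momentum=0.9` are placeholders, not the values used for this model.

```python
import math


def one_cycle_lr(step: int, total_steps: int, max_lr: float = 0.1,
                 pct_start: float = 0.3, div_factor: float = 25.0,
                 final_div_factor: float = 1e4) -> float:
    """Learning rate at 0-based `step` of a one-cycle schedule with cosine annealing.

    Phase 1 (warmup): initial_lr -> max_lr over the first pct_start of the steps.
    Phase 2 (decay):  max_lr -> min_lr over the remaining steps.
    Assumes pct_start * total_steps > 1 so the warmup phase is non-empty.
    """
    initial_lr = max_lr / div_factor
    min_lr = initial_lr / final_div_factor
    warmup_end = pct_start * total_steps - 1  # last step of the warmup phase

    def cos_anneal(start: float, end: float, pct: float) -> float:
        # Moves from `start` (pct=0) to `end` (pct=1) along half a cosine wave.
        return end + (start - end) / 2.0 * (math.cos(math.pi * pct) + 1.0)

    if step <= warmup_end:
        return cos_anneal(initial_lr, max_lr, step / warmup_end)
    pct = (step - warmup_end) / (total_steps - 1 - warmup_end)
    return cos_anneal(max_lr, min_lr, pct)


def sgd_nesterov_step(params, grads, bufs, lr, momentum=0.9):
    """One in-place SGD step with Nesterov momentum (no dampening or weight decay)."""
    for i, g in enumerate(grads):
        bufs[i] = momentum * bufs[i] + g            # velocity buffer
        params[i] -= lr * (g + momentum * bufs[i])  # Nesterov look-ahead update


# Usage: trace the schedule over 100 optimizer steps.
lrs = [one_cycle_lr(s, total_steps=100) for s in range(100)]
peak_step = lrs.index(max(lrs))
print(f"start {lrs[0]:.4f}  peak {max(lrs):.4f} at step {peak_step}  end {lrs[-1]:.1e}")
```

In PyTorch the scheduler is stepped once per batch, not once per epoch. The real `OneCycleLR` also cycles momentum inversely to the learning rate by default (`cycle_momentum=True`), which this sketch omits.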
Client inference
Model exported to ONNX and runs entirely in-browser via ONNX Runtime Web (WebGPU / WASM). No inference server needed.