Not All Split Points Are Created Equal

by Martinraj Nadar | Friday, Mar 27, 2026

Abstract: Split inference enables deep neural networks to be partitioned between edge devices and servers. However, most prior work examines only a narrow set of split points and overlooks the trade-off between computational offload and feature compressibility. This paper introduces a framework for systematically analyzing split points across an entire backbone, evaluating both compute distribution and compression efficiency. Results show that later layers offer significantly higher compressibility, achieving BD-Rate reductions of up to 57% compared to remote inference, while earlier and mid-level layers provide greater compute offload for the edge device. A saturation effect is observed in very deep layers, where further gains diminish. These findings demonstrate that optimal split points depend on jointly balancing compute and compression, and the proposed framework provides a foundation for dynamic split inference strategies in practical edge–server deployments.

Authors: Juan Merlos, Ashan Perera, Velibor Adzic & Hari Kalva

Conference / Journal: 2026 IEEE International Conference on Consumer Electronics (ICCE)
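
To make the compute-versus-compression trade-off in the abstract concrete, here is a minimal Python sketch of how a split point might be chosen from per-layer measurements. It is not the paper's framework: the `SplitCandidate` type, the `pick_split` function, the weighted-sum cost, and all numbers are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class SplitCandidate:
    layer: str           # hypothetical layer name
    edge_flops: float    # compute run on the edge device up to this layer
    feature_bits: float  # compressed size of the features sent to the server


def pick_split(candidates: Sequence[SplitCandidate],
               compute_weight: float) -> SplitCandidate:
    """Return the candidate with the lowest weighted cost.

    The cost is an assumed weighted sum of normalized edge compute and
    normalized transmitted bits; it is not taken from the paper.
    """
    if not candidates:
        raise ValueError("candidates must not be empty")
    if not 0.0 <= compute_weight <= 1.0:
        raise ValueError("compute_weight must be in [0, 1]")

    # Normalize each term by its maximum so both lie in [0, 1].
    max_flops = max(c.edge_flops for c in candidates) or 1.0
    max_bits = max(c.feature_bits for c in candidates) or 1.0

    def cost(c: SplitCandidate) -> float:
        return (compute_weight * c.edge_flops / max_flops
                + (1.0 - compute_weight) * c.feature_bits / max_bits)

    return min(candidates, key=cost)


# Made-up measurements: deeper splits cost more edge compute but compress
# better, with little further gain at the deepest layer (saturation).
EXAMPLE = [
    SplitCandidate("early", edge_flops=1.0, feature_bits=8.0),
    SplitCandidate("mid", edge_flops=4.0, feature_bits=4.0),
    SplitCandidate("late", edge_flops=9.0, feature_bits=2.0),
    SplitCandidate("very_late", edge_flops=10.0, feature_bits=1.9),
]

if __name__ == "__main__":
    for w in (0.0, 0.5, 1.0):
        print(w, pick_split(EXAMPLE, compute_weight=w).layer)
```

With these illustrative values, weighting only bitrate favors the deepest split, weighting only edge compute favors the earliest, and an even weighting lands on a mid-level layer. This mirrors the abstract's point that the best split depends on balancing both factors jointly.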