The current implementation is clearly broken, and I think it should be fixed with (optimized) compiler-agnostic asm code. This should also fix the backwards-compatibility problem. In the long term, it ...