Accessing the NPU with TensorFlow Lite on Cortex-A: Understanding the Cortex-M33 Role and NPU Configuration on i.MX 93 (Digi ConnectCore 93)

I am running a TensorFlow Lite model on the Cortex-A55 cores, using an external delegate to access the NPU (Ethos-U65). I have converted my int8.tflite model to a Vela-compiled model (int8_vela.tflite). According to the documentation, when using TensorFlow Lite on the Cortex-A side, the NPU is accessed directly without waking up the Cortex-M33. Is this correct?
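For context, this is roughly how I load and run the Vela-compiled model from Python. It is only a minimal sketch: the delegate library path (/usr/lib/libethosu_delegate.so) is what I assume the BSP ships, so please correct me if the ConnectCore 93 image uses a different name or location.

```python
import numpy as np
from tflite_runtime.interpreter import Interpreter, load_delegate

# Ethos-U external delegate path (assumed; adjust to your BSP if it differs)
ETHOSU_DELEGATE = "/usr/lib/libethosu_delegate.so"

# Load the Vela-compiled model and attach the NPU delegate
delegate = load_delegate(ETHOSU_DELEGATE)
interpreter = Interpreter(model_path="int8_vela.tflite",
                          experimental_delegates=[delegate])
interpreter.allocate_tensors()

input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()

# Feed a dummy int8 input just to exercise the NPU inference path
dummy = np.zeros(input_details[0]["shape"], dtype=input_details[0]["dtype"])
interpreter.set_tensor(input_details[0]["index"], dummy)
interpreter.invoke()
result = interpreter.get_tensor(output_details[0]["index"])
```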

If so, can I disable the Cortex-M33 when running the model on the NPU? If not, how much load does the Cortex-M33 bear while the model is running on the NPU?

Additionally, in the Ethos-U software architecture, the NPU is typically accessed via the Cortex-M33. I am unclear on how to properly configure NPU access in this scenario. Could you clarify how this works?