That is actually a rather tricky question, since the answer is very individual for each setup. The basic idea of hardware acceleration is to move calculations out of software and let specialized HW do them, which lowers the CPU usage and frees the CPU for other things.
So far we have always found Software Internal to be the most efficient option, with the lowest CPU load and the fastest encoding to MP4. Software VLC only uses VLC and is mostly for testing purposes. The two hardware options depend on your hardware and on whether your graphics and processor support them. I see from the other post that you run ESXi like I do, so none of the HW accelerations will work. Our experience so far is that DXVA2 has no or extremely little effect. One user tried QuickSync on his quad-core i5: the Turbo really kicked in and NCS "got faster", but at the same time the CPU load increased a lot, so he went back to Software Internal instead.
Then of course it will also depend on the cameras (MJPEG or RTSP), the resolution, and so on. So it depends on the setup. In principle both QuickSync and DXVA2 are great, but in practice the good old software version is very good. I would also say it is Intel processors.