NCS Server fails to use Nvidia GPU


#1

After trying various settings, I have been unable to get your SW to use Nvidia CUDA encoding and decoding. I have gone as far as replacing the FFMPEG with my own compiled version, which does use the GPU when run from the command line in your program directory. I never see your SW spawn the FFMPEG process, so you must be using Moonware.FFMPEG.dll natively to do the encoding and decoding. Is that correct? I ask because the “Hardware_DXVA2” setting does not invoke the GPU. I have also used the NVIDIA control panel to force your executable onto its hardware. Apparently the DLL has not been compiled to implement the “-hwaccel cuvid” option. Your feedback would be appreciated.


#2

Hi Frank,
Thanks for this very interesting investigation. This is beyond my knowledge, but I am very interested in finding out whether something is wrong. I will forward this to our developer for further investigation.
Thanks,
Henrik
@Steve


#3

I would be more than happy to compile the FFMPEG.dll if you can tell me whether Moonware.FFMPEG.dll is the unmodified .dll.


#4

I don’t know, but please test.
-Henrik

I ran some tests, and the Hardware_DXVA2 setting does use my Nvidia graphics card. As many have mentioned, for some reason it does not have any major effect on the overall CPU load.


#6

Henrik,
Actually, the machine I have has its HDMI monitors connected via Thunderbolt 3. Windows uses one GPU for its primary displays and manages traffic to the displays. But if you have an image/video display application, you can assign a GPU to handle the rendering if that rendering is compute intensive. VLC is a prime example of such an application (this is “experimental”, and the executable has to be renamed “vlc_test.exe” to have it rendered using a GPU). In effect, the computing GPU then does a “COPY” to the primary GPU for display. I actually have three GPUs in the machine and can play multiple 4K videos simultaneously without a hiccup (I have only tried four) by having the non-primary GPUs do the compute-intensive decoding.
Now, I don’t know the SW architecture of NCS, and in the case of display decoding it may not make a big difference with low-resolution cameras, BUT in effect you are taking multiple H.264 streams (in the case of RTSP) and displaying them. Where customers are deploying 1080p cameras, the display function can be offloaded from the CPU.
When we WERE RUNNING NCS (because of the pagefile issue) I was unable to even get it to use the primary GPU for display decoding. Nevertheless, I will try it again, BUT I did try to get your client app (the blue one) to display utilizing ANY GPU.
The bigger question is encoding. What encoding does NCS typically do in its video processing for customers who are deploying a large number of cameras pumping out H.264 streams? And more importantly, can those encoding functions utilize GPU hardware? My first question as an outsider to your code was: “Was Moonware.FFMPEG.dll compiled with the GPU flags?” (i.e., --enable-cuda --enable-cuvid --enable-nvenc). I understand that this functionality was added in late 2016, so if your .dll source tree is older, it will be unable to use the GPU. I do not see these questions being addressed on your forum, but answering them will eventually improve your product as customers deploy newer high-resolution video cameras. Many customers have very capable GPUs in their systems that sit idle because software does not support them. GPUs are no longer a “GAMER” luxury but can be used in a variety of applications, and yours is a prime example. I appreciate your attention to this, not only to benefit our own in-house app, but also for our customers and others.
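For anyone who wants to check a build for the three flags named above, here is a minimal Python sketch. It only scans a configuration string of the kind the ffmpeg command-line tool prints in its banner or with `ffmpeg -buildconf`. The two sample strings are made up for illustration, and whether Moonware.FFMPEG.dll exposes its configuration at all is not known from this thread.

```python
# The three GPU-related configure flags named in the post above.
GPU_FLAGS = ("--enable-cuda", "--enable-cuvid", "--enable-nvenc")


def missing_gpu_flags(build_config: str) -> list[str]:
    """Return the GPU flags that do not appear in an ffmpeg configuration string."""
    present = set(build_config.split())
    return [flag for flag in GPU_FLAGS if flag not in present]


# Hypothetical configuration line for an older build without the GPU flags.
old_build = "configuration: --enable-gpl --enable-dxva2 --enable-libx264"
# Hypothetical configuration line for a build compiled with the GPU flags.
new_build = "configuration: --enable-gpl --enable-cuda --enable-cuvid --enable-nvenc"

print("old build is missing:", missing_gpu_flags(old_build))
print("new build is missing:", missing_gpu_flags(new_build))
```

An empty list only means the flags were passed at configure time. It does not show that the application actually requests a hardware decoder at run time.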


#7

Hi,
Thanks. Of course I agree with you, and that is how NCS should work. Since it works with my Nvidia card, I am not sure what the problem might be, so I will forward this for further discussion.
@Steve
-Henrik


#8

We currently don’t have CUDA support. We rely exclusively on ffmpeg and, as you have noticed, the version we use is from before this was available on the ffmpeg side.

We have experimented with the other hw accelerations that were available, such as DXVA2, and unfortunately the results weren’t really impressive.

The main problem / limitation is that we work with RGB32 pictures for the overlay, for the motion detection, for generating the live JPEGs, etc., so what we gain by using hw accel is lost by having to 1) transfer images from the GPU to the CPU and 2) convert the images back and forth to RGB32.
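To put rough numbers on the transfer and conversion cost described above, here is a back-of-the-envelope sketch in Python. The 1080p resolution comes from earlier in the thread. The frame rate, the camera count, and the assumption that the hardware decoder outputs a 4:2:0 format such as NV12 are illustrative only and are not details of NCS.

```python
def frame_bytes(width: int, height: int, bytes_per_pixel: float) -> int:
    """Size in bytes of one uncompressed frame."""
    return int(width * height * bytes_per_pixel)


WIDTH, HEIGHT = 1920, 1080  # 1080p, as mentioned earlier in the thread
FPS = 25                    # assumption: frames per second per camera
CAMERAS = 16                # assumption: number of cameras

# RGB32 stores 4 bytes per pixel.
rgb32 = frame_bytes(WIDTH, HEIGHT, 4)
# NV12 (4:2:0) stores 12 bits, i.e. 1.5 bytes, per pixel (assumed decoder output).
nv12 = frame_bytes(WIDTH, HEIGHT, 1.5)

# Bytes of RGB32 data handled on the CPU side every second across all cameras.
per_second = rgb32 * FPS * CAMERAS

print(f"RGB32 frame: {rgb32} bytes")
print(f"NV12 frame:  {nv12} bytes")
print(f"RGB32 data per second for {CAMERAS} cameras at {FPS} fps: {per_second} bytes")
```

Under these assumptions each RGB32 frame is about 8.3 MB, and 16 cameras at 25 fps produce roughly 3.3 GB of RGB32 data per second, which is consistent with the copy and conversion steps eating the gain from hardware decoding.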

This whole hw acceleration is generally tricky and quite complex to put in place. We already struggled a lot to get the existing modes to work (even though they don’t provide much improvement over pure CPU mode).

I will check with my ffmpeg expert, who helped me put this in place, whether there’s any interest in looking again at CUDA now that it’s in ffmpeg, but I doubt it’s something that will be available on our end in the short or mid term.


#9

Sorry I have not gotten back, just busy elsewhere. I naively thought you could also get a performance boost if you are doing encoding but it appears (at least from RTSP cameras) you leave the video stream unmolested (except the audio) so no video encoding is being performed for saved captures. Your use of RGB32 data internally would certainly lend itself to direct display not requiring intervention by the GPU. Thanks for the insight. Frank